- 18 novembre 2024
- Mistral AI team
Pixtral Large in short:
- Frontier-class multimodal performance
- State-of-the-art on MathVista, DocVQA, VQAv2
- Extends Mistral Large 2 without compromising text performance
- 123B multimodal decoder, 1B parameter vision encoder
- 128K context window: fits minimum of 30 high-resolution images
- Use:
Today we announce Pixtral Large, a 124B open-weights multimodal model built on top of Mistral Large 2. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.
The model is available under the Mistral Research License (MRL) for research and educational use; and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.
Performance
We evaluate Pixtral Large against frontier models on a set of standard multimodal benchmarks, through a common testing harness. On MathVista, which evaluates complex mathematical reasoning over visual data, our model achieves 69.4%, outperforming all other models. To assess reasoning capabilities over complex charts and documents, we evaluate performance using ChartQA and DocVQA, where Pixtral Large surpasses GPT-4o and Gemini-1.5 Pro.
Finally, Pixtral Large demonstrates competitive capabilities on MM-MT-Bench, outperforming all of Claude-3.5 Sonnet (new), Gemini-1.5 Pro and GPT-4o (latest). MM-MT-Bench is an open-source, judge-based evaluation intended to reflect real-world use cases of multimodal LLMs (see the Pixtral 12B technical report for details).
Qualitative Samples
Multilingual OCR and reasoning…
Prompt:
I bought the coffee and the sausage. How much do I owe? Add a 18% tip.
Response:
To calculate the total amount you owe, including the 18% tip, follow these steps:
Calculate the cost of the coffee and the sausage:
- 2x Latte Macchiato: 2 * 4.50 CHF = 9.00 CHF
- 1x Schweinschnitzel: 22.00 CHF
- Total cost for coffee and sausage: 9.00 CHF + 22.00 CHF = 31.00 CHF
Calculate the 18% tip:
- Tip: 31.00 CHF * 0.18 = 5.58 CHF
Add the tip to the total cost:
- Total amount owed: 31.00 CHF + 5.58 CHF = 36.58 CHF
So, you owe a total of 36.58 CHF.
Chart understanding…
Prompt:
When did things start to go wrong for dark dragon?
Response:
The training loss for the “dark-dragon-50” model starts to exhibit significant instability around the 10,000 steps mark. There is a noticeable spike in the loss value at this point, and the instability continues with another major spike around the 20,000 steps mark. These spikes indicate that the training process for the “dark-dragon-50” model began to encounter issues starting around the 10,000 steps mark.
And more…
Prompt:
Which companies use models made by Mistral AI?
Response:
The companies that use models made by Mistral AI, as shown in the image, are:
- BNP Paribas
- Brave
- Cloudflare
- CMA CGM
- Front
One more thing…
Along with Pixtral Large, Mistral Large, our state-of-the-art text
model, also gets an update. The model is available as pixtral-large-latest
on our API, and as Mistral Large 24.11 on HuggingFace under the Mistral Research License (MRL) for research, or with a commercial license from Mistral AI for commercial use.
This newest model provides a significant upgrade on the previous Mistral Large 24.07, with notable improvements in long context understanding, a new system prompt, and more accurate function calling. The model is highly capable for RAG and agentic workflows, making it a suitable choice for enterprise use cases such as knowledge exploration and sharing, semantic understanding of documents, task automation, and improved customer experiences.
Mistral Large 24.11 will be available from our cloud provider partners soon, starting with Google Cloud and Microsoft Azure within a week.