Bringing open AI models to the frontier

Why we’re building Mistral AI.

  • September 27, 2023
  • Mistral AI team

Generative AI, particularly large language models, is revolutionising content creation, knowledge retrieval, and problem-solving by generating human-quality text and commands from human instructions. In the coming years, generative AI will redefine our culture and our lives: the way we interact with machines, and with each other.

As in previous ages of software, proprietary solutions were developed first—and we’re grateful they revealed the power of generative models to the world. Yet, as with the Web, with web browsers (WebKit), with operating systems (Linux), and with cloud orchestration (Kubernetes), open solutions will quickly outperform proprietary solutions for most use cases. They will be driven by the power of community, and by the standard of technical excellence that successful open-source projects have always promoted.

At Mistral AI, we believe that an open approach to generative AI is necessary. Community-backed model development is the surest path to fight censorship and bias in a technology shaping our future.

We strongly believe that by training our own models, releasing them openly, and fostering community contributions, we can build a credible alternative to the emerging AI oligopoly. Open-weight generative models will play a pivotal role in the upcoming AI revolution.

Mistral AI’s mission is to spearhead the revolution of open models.

Generative AI needs open models

Working with open models is the best way for both vendors and users to build a sustainable business around AI solutions. Open models can be finely adapted to solve many new core business problems, in all industry verticals—in ways unmatched by black-box models. The future will be made of many different specialised models, each adapted to specific tasks, compressed as much as possible, and connected to specific modalities.

In the open model paradigm, the developer has full control over the engine that powers their application. Model sizes and costs can be adapted to the difficulty of the task at hand, keeping costs and latency under control. For enterprises, deploying open models on their own infrastructure using well-packaged solutions simplifies dependencies and preserves data privacy.

Closed and opaque APIs introduce well-known technical liabilities, in particular the risk of IP leakage; in the case of generative AI, they also introduce a cultural liability, since the generated content is fully under the control of the API provider, with limited customisation capacity. With model weights at hand, end-user application developers can customise the guardrails and the editorial tone they desire, instead of depending on the choices and biases of black-box model providers.

Open models will also be precious safeguards against the misuse of generative AI. They will allow public institutions and private companies to audit generative systems for flaws, and to detect misuse of generative models. They are our strongest bet for efficiently detecting misinformation, which will inevitably increase in volume in the coming years.

Mistral AI’s first steps

Our ambition is to become the leading supporter of the open generative AI community, and to bring open models to state-of-the-art performance, making them the go-to solutions for most generative AI applications. Many of us played pivotal roles in the development of LLMs; we’re thrilled to be working together on new frontier models with a community-oriented mindset.

In the coming months, Mistral AI will progressively and methodically release new models that close the performance gap between black-box and open solutions, making open solutions the best option for a growing range of enterprise use cases. Simultaneously, we will seek to empower community efforts to improve the next generations of models.

As part of this effort, we’re releasing Mistral 7B today: our first 7B-parameter model, which outperforms all currently available open models of up to 13B parameters on all standard English and code benchmarks. This is the result of three months of intense work, in which we assembled the Mistral AI team, built a high-performance MLOps stack, and designed a sophisticated data-processing pipeline from scratch.

Mistral 7B’s performance demonstrates what small models can do with enough conviction. Tracking the smallest model performing above 60% on MMLU is quite instructive: in two years, it went from Gopher (280B, DeepMind, 2021), to Chinchilla (70B, DeepMind, 2022), to Llama 2 (34B, Meta, July 2023), and now to Mistral 7B.

Mistral 7B is only a first step toward building the frontier models on our roadmap. Yet it can already be used to solve many tasks: summarisation, structured output, and question answering, to name a few. It processes and generates text much faster than large proprietary solutions, and runs at a fraction of their cost.

Mistral 7B is released under the Apache 2.0 license, making it usable without restriction anywhere.

What’s next?

To actively engage with our user community and promote responsible usage of the models and tools we release under open-source licences, we are opening a GitHub repository and a Discord channel. These will provide a platform for collaboration, support, and transparent communication.

We are committed to releasing the strongest open models while developing our commercial offering in parallel. We will propose optimised proprietary models for on-premise and virtual private cloud deployment. These models will be distributed as white-box solutions, with both weights and source code made available. We are also actively working on hosted solutions and dedicated deployments for enterprises.

We’re already training much larger models, and are shifting toward novel architectures. Stay tuned for further releases this fall.