
Microsoft unveils Phi-3, its smallest AI model to run on smartphones


Microsoft has introduced the next iteration of its lightweight artificial intelligence (AI) model, called Phi-3. The updated family includes the Phi-3 Mini with 3.8 billion parameters, the Phi-3 Small with 7 billion parameters, and the Phi-3 Medium with 14 billion parameters.

This release comes after the Phi-2 model, introduced in December 2023, was outperformed by models such as Meta's Llama 3 family. In the face of increasing competition, Microsoft Research has implemented newer techniques in its curriculum-learning approach.

The new 3.8-billion-parameter model improves on the previous Phi-2 while using significantly fewer resources than larger language models. At just 3.8 billion parameters, Phi-3 Mini surpasses Meta's 8-billion-parameter Llama 3 and OpenAI's GPT-3.5, according to Microsoft's own benchmarks.

We present phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals models such as Mixtral 8x7B and GPT-3.5 (for example, phi-3-mini scores 69% on MMLU and 8.38 on MT-bench), despite being small enough to deploy on a phone.

We also provide some preliminary parameter scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both of which are significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).

Due to its smaller size, the Phi-3 family is better suited to low-power devices than larger models. Microsoft Vice President Eric Boyd said (via The Verge) that the new model is capable of advanced natural language processing directly on a smartphone. This makes Phi-3 Mini well suited to new applications that require ubiquitous AI assistance.
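For readers who want to try a model of this size locally, the sketch below shows one common route: loading an instruction-tuned checkpoint through Hugging Face transformers and generating a short reply. The repository name microsoft/Phi-3-mini-4k-instruct, the chat prompt, and the generation settings are illustrative assumptions, not details taken from the article.

```python
# Minimal sketch: running a small instruction-tuned model locally with
# Hugging Face transformers. The model ID and generation settings are
# assumptions for illustration, not details from the article.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # place weights on GPU or CPU, whichever is available
    torch_dtype="auto",      # keep the checkpoint's native precision
    trust_remote_code=True,  # some transformers versions need the repo's custom code
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Build a chat-style prompt with the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Explain in one sentence what a small language model is."}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

result = generator(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
print(result[0]["generated_text"])
```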

While Phi-3 Mini outperforms the competition in its weight class, it cannot match the breadth of knowledge of massive web-trained models. However, Boyd notes that smaller, higher-quality models can often perform better in practice, because the internal datasets most companies work with are limited in scale anyway.
