Phi-4 reasoning models: small models, high results

The Phi-4 family is Microsoft's latest development in small language models (SLMs), designed to handle complex reasoning tasks while remaining efficient. The series includes three key models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. The newly published models are built with a clear focus: deliver advanced reasoning without demanding the infrastructure of trillion-parameter models. They strike a balance between size and performance using techniques such as distillation, reinforcement learning, and carefully curated data.

Phi-4-reasoning is a 14-billion-parameter model with a 32K-token context window, trained on high-quality web data and OpenAI o3-mini prompts. It stands out on tasks requiring detailed, multi-step reasoning, such as mathematics, coding, and algorithmic problem solving.

Phi-4-reasoning-plus builds on the base model with additional tuning on 1.5x more tokens plus reinforcement learning, delivering even higher accuracy and efficiency.

Phi-4-mini-reasoning, with only 3.8 billion parameters, was trained on about one million synthetic math problems generated by DeepSeek-R1. It is aimed at uses such as educational tools and mobile applications, proving able to solve problems step by step under limited resources.
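As a rough sketch of what resource-constrained use could look like, the snippet below loads the model in 4-bit precision with the transformers and bitsandbytes libraries. The microsoft/Phi-4-mini-reasoning repo id and the example prompt are illustrative assumptions, not details confirmed by this article; check the model card for the exact name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed Hugging Face repo id -- verify against the published model card.
model_id = "microsoft/Phi-4-mini-reasoning"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # shrink memory footprint
    device_map="auto",
)

# Ask for an explicit step-by-step solution, the use case the model targets
messages = [{"role": "user", "content": "Solve step by step: if 3x + 7 = 22, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```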

What distinguishes Phi-4 is not just efficiency but outright capability. On benchmarks such as HumanEval+ and MATH-500:

  • Phi-4-reasoning-plus outperforms DeepSeek-R1 (671B parameters) on some tasks, showing that smarter training can beat brute force.
  • It also competes with OpenAI o3-mini and surpasses DeepSeek-R1-Distill-70B on complex reasoning and planning tasks.
  • Phi-4-mini-reasoning performs competitively with much larger models, and even surpasses some on mathematics.

In line with Microsoft's Responsible AI framework, all Phi-4 models are trained with strong safety protocols. Post-training includes supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF). Microsoft uses public datasets focused on safety, helpfulness, and honesty, ensuring broad utility while minimizing risk.
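To make the DPO terminology concrete, here is a minimal, illustrative PyTorch sketch of the standard DPO objective (Rafailov et al., 2023). This is not Microsoft's actual training code; the function and tensor names are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward: beta-scaled log-ratio between the policy being
    # trained and a frozen reference (SFT) model, for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss that pushes the policy to rank the human-preferred
    # ("chosen") response above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy summed log-probabilities for a batch of two preference pairs
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-14.0, -10.5]))
print(loss.item())
```

The appeal of DPO over full RLHF is that it needs no separate reward model: the preference signal is folded directly into this supervised-style loss.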

All three models are freely available on Hugging Face and Azure AI Foundry, enabling researchers, startups, and educators to integrate high-performance reasoning into their own applications.
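As one possible integration path, a model deployed on Azure AI Foundry can be queried with the azure-ai-inference SDK roughly as sketched below; the environment variable names are placeholders you would replace with your own deployment's endpoint and key.

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder configuration -- taken from your own Foundry deployment
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    messages=[UserMessage(content="Plan a 3-step proof that sqrt(2) is irrational.")],
    max_tokens=512,
)
print(response.choices[0].message.content)
```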
