Meta AI presents a series of language models – LLaMA

Meta AI has launched LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. According to its developers, LLaMA can compete with, and even outperform, the best existing models such as GPT-3, Chinchilla, and PaLM.

Large language models (LLMs) trained on massive datasets have shown the ability to perform a wide range of tasks, from basic ones such as text summarization, following written instructions, and writing poetry, to more complex ones such as generating descriptions for AI art.

For training data, LLaMA's developers used a mix of several sources covering a diverse set of domains: English CommonCrawl, C4, GitHub, Wikipedia, books, ArXiv, and Stack Exchange. Unlike Chinchilla, PaLM, or GPT-3, LLaMA uses only publicly available data, which makes it compatible with open sourcing, whereas most existing models rely on data that is either not publicly available or undocumented.
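As a rough illustration of how such a multi-source corpus might be sampled during pre-training, here is a minimal Python sketch; the weights below are placeholders rather than the exact proportions reported in the paper, and the sampling function is hypothetical.

```python
import random

# Illustrative sampling weights over the publicly available sources listed above.
# These are placeholders; the exact proportions used for LLaMA are given in the paper.
DATA_MIXTURE = {
    "CommonCrawl (English)": 0.67,
    "C4": 0.15,
    "GitHub": 0.045,
    "Wikipedia": 0.045,
    "Books": 0.045,
    "ArXiv": 0.025,
    "Stack Exchange": 0.02,
}

def sample_source(rng: random.Random) -> str:
    """Pick the source of the next training document according to the mixture weights."""
    sources, weights = zip(*DATA_MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_source(rng) for _ in range(5)])
```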

To speed up training, LLaMA models use an efficient implementation of causal multi-head attention that reduces memory usage and computation. To improve training efficiency further, the developers rely on checkpointing to reduce the number of activations recomputed during the backward pass.
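Meta's actual implementation is not reproduced here, so the following PyTorch sketch is only a hypothetical illustration of the two ideas: a causal multi-head attention layer built on the fused scaled_dot_product_attention kernel (which avoids materializing the full attention matrix), and generic activation checkpointing via torch.utils.checkpoint, which trades extra recomputation in the backward pass for lower memory use. LLaMA itself uses a more selective scheme, saving activations that are expensive to compute, such as the outputs of linear layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

class CausalSelfAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim).
        q, k, v = (z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
                   for z in (q, k, v))
        # Fused, memory-efficient attention; is_causal=True masks future tokens
        # without building the full (seq x seq) attention matrix in memory.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(b, t, d)
        return self.proj(y)

class Block(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = CausalSelfAttention(dim, n_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.mlp(self.norm2(x))

def forward_with_checkpointing(blocks, x):
    # Activation checkpointing: intermediate activations inside each block are
    # discarded after the forward pass and recomputed during backward,
    # trading extra compute for a large reduction in memory.
    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return x

if __name__ == "__main__":
    blocks = nn.ModuleList([Block(dim=64, n_heads=4) for _ in range(2)])
    x = torch.randn(1, 16, 64, requires_grad=True)
    out = forward_with_checkpointing(blocks, x)
    out.sum().backward()  # block activations are recomputed during this backward pass
```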

Unlike previous work, Meta's research on LLaMA shows that state-of-the-art results can be achieved by training solely on publicly available data, without resorting to proprietary datasets. The developers hope that releasing these models to the research community will accelerate the development of large language models, help improve their reliability, and mitigate known problems such as toxicity and bias.

Read more details about the research in the paper.
