t3chnowolf Jan 11 2024 at 14:00

Microsoft has introduced a small model, Phi-2, that outperforms its "big sisters". What is this project?

МТС corporate blog, Machine learning, Artificial Intelligence, IT-companies

Comments 10

Nurked Jan 11 2024 at 17:55

Well, who would have thought! If you feed a computer the right data, it is more willing to give the right answers.

I can only imagine using one math model to condense the training data for a more advanced math model.

It reminds me of the song written by ~~GlaDOS~~ Aperture Science Psychoacoustic Laboratories, titled (defun botsbuildbots () (botsbuildbots))

EugeneH Jan 11 2024 at 19:18

del

EugeneH Jan 11 2024 at 19:20

The article is missing a link to the model.

MountainGoat Jan 11 2024 at 19:41

What I heard was: "Our new tractor plows better than a Niva, and better than their hybrid of a tractor, an airplane, and a pressure cooker (but ours doesn't fly and doesn't cook)"

EugeneH Jan 11 2024 at 23:09

I played around with this Phi-2 a bit. Thanks to Microsoft for publishing everything on HF, unlike some others.

What can I say?

For 2.7B the results are good, of course. But any comparison with Llama-2 70B is out of the question. Parameter count decides; there are no miracles.

The documentation states a context length of 2048 tokens, but the model usually loses the thread at around 1000 already, which is typical and expected for micro-models.

The unquantized model takes about 9 GB of VRAM; we're waiting for llama.cpp support to be added and for quantized versions to be published. There is still something appealing about the idea of tiny LLMs that can run on anything.

Example response to an empty prompt, <endoftext> after ~600 tokens:

Topic: Mathematics--Logic--Logical Train of Thought
Once upon a time, in a small town called Mathville, there lived three friends named Alice, Bob, and Carol. They were all passionate about mathematics and loved discussing logical train of thought. One sunny afternoon, they gathered at their favorite coffee shop to have their usual intellectual conversations.
Alice, who was known for her logical reasoning, started the conversation by saying, "You know, Bob, I always solve math problems faster than you because I have a better understanding of the concepts."
Bob chuckled and replied, "That's true, Alice. I admire your ability to grasp complex mathematical concepts quickly. It's like you have a natural talent for it."
Carol, who was listening intently, chimed in, "Speaking of talents, did you know that the famous mathematician, Isaac Newton, was also a skilled musician? He composed beautiful symphonies in his free time."
Alice nodded and added, "Yes, I've read about that. It's fascinating how people can excel in multiple fields. It just goes to show that our abilities are not limited to one area."
Bob, intrigued by the conversation, asked, "Do you think it's possible for someone to be a genius in both mathematics and music?"
Alice pondered for a moment and replied, "Well, it's highly unlikely. While some individuals may have a natural inclination towards both, it requires a significant amount of dedication and practice to excel in both fields."
Carol, who had been silently listening, finally spoke up, "I believe that passion plays a crucial role in achieving greatness. If someone is truly passionate about something, they will put in the necessary effort to master it."
Alice nodded in agreement and said, "Absolutely, Carol. Passion fuels our determination and motivates us to overcome challenges. It's what sets apart the average from the extraordinary."
As their conversation continued, they delved deeper into the world of mathematics, discussing various logical train of thought and problem-solving techniques. They shared their knowledge and insights, each contributing to the intellectual growth of the group.
Hours passed by, and the sun began to set, casting a warm glow over the coffee shop. Alice, Bob, and Carol bid farewell, promising to meet again soon for another stimulating conversation.
As they walked out of the coffee shop, their minds filled with new ideas and perspectives, they couldn't help but feel grateful for their shared passion for mathematics and the logical train of thought that brought them together.
And so, their journey of intellectual exploration continued, fueled by their curiosity and love for the world of numbers and logic.

vassabi Jan 12 2024 at 14:32

Well, 2.7B is of course a "micro-model", but on the other hand you still can't run it "on anything" :)
By the way, have you tried https://huggingface.co/cloudyu/Mixtral_34Bx2_MoE_60B ?

It has a decent rating and it generates quite well even past 10k tokens.

EugeneH Jan 12 2024 at 15:39

I haven't tried that one specifically; I didn't know about it. Thanks for the tip.

Mixtral-8x7B, on the other hand, didn't impress me. It felt more like a plain 7B model.

As for "anything", I meant that a model squeezed down to 4 bits will take less than 3 GB of RAM, so it can run on a Raspberry Pi or on a phone (and not even the newest one).

Of course, if someone had told me 3 years ago that models the size of GPT-2 would be considered "micro" and run on coffee makers, I wouldn't have believed it.
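The "less than 3 GB at 4 bits" estimate above can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration: `weight_memory_gb` is a hypothetical helper, it counts the weights alone, and it assumes exactly the stated number of bits per parameter. Real quantized formats typically store extra scale data per block, and the runtime needs additional memory for the context, so actual usage will be higher than these figures.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10**9 bytes).

    Hypothetical helper for illustration: ignores quantization scales,
    the KV cache, and any runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


PHI2_PARAMS = 2.7e9  # parameter count quoted in the thread

for label, bits in [("fp32", 32), ("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(PHI2_PARAMS, bits):.2f} GB")
```

Under these assumptions the weights come to roughly 10.8 GB at fp32 and 1.35 GB at 4 bits, which leaves headroom under the 3 GB figure for the overhead the sketch ignores.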

vassabi Jan 12 2024 at 16:47

Well, there are different Mixtrals. Also, some neural nets work poorly when heavily compressed (for example, that particular Mixtral is best run at 4 bits or more).

Besides Mixtral_34Bx2_MoE_60B, I also liked
https://huggingface.co/sophosympatheia/Aurora-Nights-70B-v1.0
It's not small either, of course (and therefore slow on my hardware), but it produces comparable output even at 2 bits.

And yes, I mostly run 7B to 13B nets (and sometimes compressed 30B ones), because they're "small" :D

(and a year ago I wouldn't have believed myself)

EugeneH Jan 12 2024 at 17:06

Wow, a new uncensored 70B model with an 8k-token context. That's what I like.

In general, the Llama-2 family seems to lose almost nothing in generation quality when compressed to q5_k_s. At least on benchmarks, the difference is within the margin of error. The 4-bit versions also offer a good cost/quality trade-off.

EugeneH Jan 12 2024 at 16:58

del