Introduction
I had wanted to get into video generation for a long time but never got around to it. Since I like modern animation much less than Soviet animation (to my taste, Soviet cartoons have better, calmer characters and colors and are far more pleasant to watch), I decided to start experimenting with cartoons.
I picked a simple fairy tale, «Kolosok» ("The Ear of Wheat"), and started looking into how to generate video content on my own.
If you have wanted to try generating video with open models and see what comes out, this article is about exactly that, and you can look at the results yourself. If you would rather skip the write-up, here are links to the three cartoons generated by three different neural networks:
https://stickystick.ru/videos/?id=606 - kolosok_ltx_2_0
https://stickystick.ru/videos/?id=605 - kolosok_wan_2_2
https://stickystick.ru/videos/?id=604 - kolosok_kandinsky_5_0_lite
Hardware and environment
For the first iteration I decided to do everything on my home machine. The specs:
CPU: 13th Gen Intel(R) Core(TM) i9-13900K
RAM: 32 GB
GPU: NVIDIA GeForce RTX 5070 Ti 16 GB
SSD: Samsung SSD 990 EVO Plus 1 TB
On top of that I added another 10 GB of swap, because the models kept dying during generation with an OOMKill error (out of memory).
Framework: ComfyUI
Google and DeepSeek suggested ComfyUI, a framework for running neural networks locally (and not only locally):
🔗 https://github.com/comfy-org/comfyui
I installed it and had a look. Installation is fairly easy: clone the repository and install the dependencies. I used Python 3.10 and everything worked.
Building pipelines from scratch is hard if you lack the technical background. But there are many ready-made templates: download the model, put it in the right directory, and start generating. That is what I did.
Which neural networks I ran
In the end I got the following models running on my machine:
1. Kandinsky 5.0 Lite
🔗 https://huggingface.co/collections/kandinskylab/kandinsky-50-video-lite
Model: kandinsky5lite_t2v_sft_5s.safetensors
Parameters: 2B
Precision: FP16
Architecture: DiT
VAE: hunyuan_video_vae_bf16.safetensors
Text encoder 1: qwen_2.5_vl_7b_fp8_scaled.safetensors (7B, FP8, embedding size 3584)
Text encoder 2: clip_l.safetensors (CLIP-L, embedding size 768)
2. LTX (LTX-Video)
🔗 https://comfyanonymous.github.io/ComfyUI_examples/ltxv/
Model: ltx-video-2b-v0.9.5.safetensors
Parameters: 2B
Precision: bfloat16
Architecture: DiT
VAE: built in (no separate file needed)
Text encoder: t5xxl_fp16.safetensors (11B, FP16, embedding size 4096)
License: OpenRAIL-M (commercial use allowed)
3. Wan 2.2
🔗 https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged
Model: wan2.2_ti2v_5B_fp16.safetensors
Parameters: 5B
Precision: FP16
Architecture: Dense Transformer (DiT)
VAE: wan2.2_vae.safetensors (3D VAE, 16×16×4 compression)
Text encoder: umt5_xxl_fp8_e4m3fn_scaled.safetensors (11B, FP8, embedding size 4096)
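A rough back-of-the-envelope estimate helps explain the out-of-memory kills. The sketch below adds up the weights of the diffusion model and the main text encoder for each pipeline, taking the parameter counts and precisions listed above at face value. It assumes 2 bytes per parameter for FP16/BF16 and 1 byte for FP8, ignores activations, latents and the VAE, and leaves out CLIP-L because its parameter count is not given above. The real checkpoint files may be smaller or larger than this.

```python
# Bytes per parameter for the precisions mentioned in the model list.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}

# (parameters in billions, precision) for the diffusion model and the
# main text encoder of each pipeline, copied from the list above.
PIPELINES = {
    "Kandinsky 5.0 Lite": [(2, "fp16"), (7, "fp8")],
    "LTX": [(2, "bf16"), (11, "fp16")],
    "Wan 2.2": [(5, "fp16"), (11, "fp8")],
}


def weights_gb(params_billion, precision):
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def pipeline_weights_gb(name):
    """Sum of the weight sizes of all listed components of a pipeline."""
    return sum(weights_gb(p, prec) for p, prec in PIPELINES[name])


for name in PIPELINES:
    print(f"{name}: ~{pipeline_weights_gb(name)} GB of weights")
```

If the listed counts are accurate, this gives roughly 11 GB for Kandinsky, 26 GB for LTX and 21 GB for Wan 2.2, which is at or above the 16 GB of VRAM in two cases out of three. That is at least consistent with the need for system RAM plus extra swap.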
4. CogVideo (did not make it into the final result)
I also found and ran a CogVideo template, but I lost the results somewhere and could not be bothered to regenerate them. Since it did not produce any wow effect, I dropped it for now.
How I generated the videos
I did not add any fine-tuning or extra components to the pipelines. I wanted to see what the models can do out of the box.
All cartoons were generated with the same parameters:
fps = 24
width = 768
height = 512
length = 192
Everything else: the templates' default values (steps, cfg, shift, noise)
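The clip duration follows directly from these settings. A minimal sketch of the arithmetic, with variable names that simply mirror the parameters above:

```python
fps = 24
width, height = 768, 512
length = 192  # number of frames per clip

duration_s = length / fps             # seconds of video per clip
pixels_per_frame = width * height     # pixels in one frame
total_pixels = pixels_per_frame * length  # pixels the model produces per clip

print(f"duration: {duration_s:.0f} s")
print(f"pixels per frame: {pixels_per_frame:,}")
print(f"pixels per clip: {total_pixels:,}")
```

So 192 frames at 24 fps is an 8-second clip, and each clip amounts to about 75 million output pixels.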
Process:
Broke the fairy tale down into scenes.
Generated prompts for each model with DeepSeek/Gemini.
Sent the requests to the API with curl.
Left the computer running for a day.
Edited the result in Premiere Pro.
I also wanted to add sound and speech but have not done so yet. If the results in the next iterations are more interesting, I will add them.
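The requests for this article were sent with curl. The same queueing step could be sketched in Python roughly as follows. This is an illustration under assumptions, not the script used here: it assumes a local ComfyUI server at its default address with the `POST /prompt` endpoint, a workflow exported in API format, and a prompts file shaped like `{scene_name: prompt_text}`. The node id `"6"` for the positive text prompt is hypothetical and depends on the template.

```python
import copy
import json
from urllib import request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # assumption: default local address
TEXT_NODE_ID = "6"  # hypothetical id of the positive-prompt node in the workflow


def build_payload(workflow, prompt_text):
    """Return the JSON request body for one job with the scene prompt set."""
    job = copy.deepcopy(workflow)  # keep the template untouched between scenes
    job[TEXT_NODE_ID]["inputs"]["text"] = prompt_text
    return json.dumps({"prompt": job}).encode("utf-8")


def queue_scene(workflow, prompt_text):
    """POST one job to the ComfyUI queue (requires a running server)."""
    req = request.Request(
        COMFY_URL,
        data=build_payload(workflow, prompt_text),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req).close()


def queue_all(workflow_path, prompts_path):
    """Queue every scene from a {scene_name: prompt} JSON file."""
    with open(workflow_path, encoding="utf-8") as f:
        workflow = json.load(f)
    with open(prompts_path, encoding="utf-8") as f:
        prompts = json.load(f)
    for scene_name, text in prompts.items():
        queue_scene(workflow, text)
        print("queued:", scene_name)
```

Calling `queue_all("workflow_api.json", "prompts.json")` (both file names are placeholders) would queue every scene and let the server work through them one after another.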
Generation results
https://stickystick.ru/videos/?id=606 - kolosok_ltx_2_0
https://stickystick.ru/videos/?id=605 - kolosok_wan_2_2
https://stickystick.ru/videos/?id=604 - kolosok_kandinsky_5_0_lite
| Model | Quality | Generation time (one 8-second video) |
|---|---|---|
| Kandinsky | 🏆 winner | 30–40 minutes |
| Wan 2.2 | acceptable, but the style is off | ~30–40 minutes |
| LTX | mostly just noise | ~30–40 minutes |
👉 Kandinsky is the winner in my view. It handled this task better than the others.
Example prompts for Kandinsky
{
  "два_мышонка_выбегает_на_двор_и_начинают_бросать_друг_к_другу_мяч": "Two little mice run out of a wooden house onto a sunny green yard. They stop facing each other at some distance. They start throwing a small ball back and forth. The ball flies smoothly and slowly through the air. The mice laugh, jump, and wave their paws. Their movements are soft, playful. In the background: wooden house, well, blue sky with white clouds. Camera: static wide shot.\n\nSoviet Soyuzmultfilm animation style, lyrical branch, 1970s-1980s. Warm, naive, fairy-tale atmosphere. NO thick black outlines. NO bold heavy contour lines. Lines are very thin, barely visible, or almost absent. Color is the main element. Flat cel shading, solid flat color fills, no gradients, no volumetric shadows. NO visible paint texture, NO watercolor stains, NO brush strokes. Soft borders between colors. Bright, saturated, cheerful palette: lush green grass, bright blue sky, warm yellow-orange house, red or blue ball. Soft diffused sunlight. Light film grain. No 3D, no photorealism. The mice have consistent character design — small, rounded, soft, no sharp features, no teeth, same as in previous scenes.",
  "в_кадер_мышата_они_бросают_другу_другу_мяч_мяч_бросает_1_мышонок_второму_второй_ловит_второй_ловит_перебрасывает_второму": "Medium shot on two little mice on a sunny green yard. They stand facing each other at some distance. One mouse throws a small ball to the second mouse. The second mouse catches the ball with his paws. Then the second mouse throws the ball back to the first mouse. The ball flies smoothly and slowly through the air. The mice laugh, smile, and wave their paws. Their movements are soft, playful. In the background: wooden house, well, blue sky with white clouds. Camera: static medium shot.\n\nSoviet Soyuzmultfilm animation style, lyrical branch, 1970s-1980s. Warm, naive, fairy-tale atmosphere. NO thick black outlines. NO bold heavy contour lines. Lines are very thin, barely visible, or almost absent. Color is the main element. Flat cel shading, solid flat color fills, no gradients, no volumetric shadows. NO visible paint texture, NO watercolor stains, NO brush strokes. Soft borders between colors. Bright, saturated, cheerful palette: lush green grass, bright blue sky, warm yellow-orange house, red or blue ball. Soft diffused sunlight. Light film grain. No 3D, no photorealism. The mice have consistent character design — small, rounded, soft, no sharp features, no teeth, same as in previous scenes.",
  "в_кадре_изба_двор_на_котором_петушок_петя_метет_метлой_на_полу_лежит_колосок": "Wide shot. A wooden log house with a porch and a wooden door stands in the background. In front of the house is a sunny green yard. Rooster Petya stands in the yard, holding a broom. He slowly sweeps the ground with calm, gentle movements — like a kind tired grandfather doing his chores. He has a bright red comb and a fluffy tail. He sweeps the porch and the dirt area in front of the house. In the background: wooden house, well, blue sky with white clouds. No other characters are visible. Camera: static wide shot, showing the house, the yard, and the rooster sweeping.\n\nSoviet Soyuzmultfilm animation style, lyrical branch, 1970s-1980s. Warm, naive, fairy-tale atmosphere. NO thick black outlines. NO bold heavy contour lines. Lines are very thin, barely visible, or almost absent. Color is the main element. Flat cel shading, solid flat color fills, no gradients, no volumetric shadows. NO visible paint texture, NO watercolor stains, NO brush strokes. Soft borders between colors. Bright, saturated, cheerful palette: lush green grass, bright blue sky, warm yellow-orange house, bright red rooster comb, brown wooden broom. Soft diffused sunlight. Light film grain. No 3D, no photorealism.",
  "в_кадре_крупно_колосок_некоторое_время_потом_петушок_подносит_лицо_к_колоску_и_что_то_говорит_эй_мышата_смотрите_колосок_нашел": "Extreme close-up shot on a small golden ear of wheat lying on the ground. The ear fills most of the frame. For a few seconds, nothing happens — just the ear on the ground. Then, a rooster's face slowly comes into the frame from above or from the side. The rooster brings his face very close to the ear of wheat. He opens his beak slightly and moves it, as if saying something to the ear or about the ear. No actual words are heard. His expression is curious, gentle, slightly surprised. In the blurred background: yard, grass. Camera: static extreme close-up, then the rooster's face enters the frame.\n\nSoviet Soyuzmultfilm animation style, lyrical branch, 1970s-1980s. Warm, naive, fairy-tale atmosphere. NO thick black outlines. NO bold heavy contour lines. Lines are very thin, barely visible, or almost absent. Color is the main element. Flat cel shading, solid flat color fills, no gradients, no volumetric shadows. NO visible paint texture, NO watercolor stains, NO brush strokes. Soft borders between colors. Bright, saturated, cheerful palette: golden ear of wheat, bright red rooster comb, green grass. Soft diffused sunlight. Light film grain. No 3D, no photorealism.",
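Each Kandinsky prompt has two parts separated by a blank line: a scene description and a style block that is nearly identical from scene to scene, with only the palette changing. A minimal sketch of how such a prompts file could be assembled is shown below. The function name, the scene key, the shortened style text and the `palette` parameter are all illustrative assumptions, not the exact procedure behind the prompts in this article (those were written with DeepSeek/Gemini).

```python
import json

# Shortened version of the shared style block; {palette} varies per scene.
STYLE = (
    "Soviet Soyuzmultfilm animation style, lyrical branch, 1970s-1980s. "
    "Warm, naive, fairy-tale atmosphere. NO thick black outlines. "
    "Flat cel shading, solid flat color fills, no gradients. "
    "Bright, saturated, cheerful palette: {palette}. "
    "Soft diffused sunlight. Light film grain. No 3D, no photorealism."
)


def build_prompt(scene, palette):
    """Join a scene description and the style block with a blank line."""
    return scene.strip() + "\n\n" + STYLE.format(palette=palette)


# Hypothetical scene table: key -> (scene description, palette).
scenes = {
    "rooster_sweeps_yard": (
        "Wide shot. Rooster Petya stands in the yard, holding a broom. "
        "Camera: static wide shot.",
        "lush green grass, bright blue sky, bright red rooster comb",
    ),
}

prompts = {
    key: build_prompt(scene, palette)
    for key, (scene, palette) in scenes.items()
}
print(json.dumps(prompts, ensure_ascii=False, indent=2))
```

Keeping the style block in one place makes it easy to change the style for all scenes at once, or to shorten it for a model that handles long prompts poorly.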
👉 Wan 2.2: the result is acceptable, but the characters come out looking far too "cool". The model does not hold the style, even though I specified it explicitly in the prompts. Most likely it simply was not trained on Soviet-style animation (or saw very few examples of it).
Sometimes it generated shots that did not match the prompt at all. I think this can be fixed with fine-tuning. There is also a dedicated Comfy node for Wan that can pass context from job to job, but it needs testing.
Example prompts for Wan 2.2
{ "1_два_мышонка_выбегает_на_двор_и_начинают_бросать_друг_к_другу_мяч": "Medium shot. Two little mice run out of a wooden house onto a sunny green yard. They stop facing each other at some distance. They throw a small ball back and forth. The ball flies smoothly and slowly through the air. The mice laugh, jump, and wave their paws. Their movements are soft and playful. Background: wooden house, well, blue sky with white clouds. Camera: static wide shot. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright, saturated palette: green grass, blue sky, warm yellow-orange house, red or blue ball. Soft sunlight. Light film grain. No 3D, no photorealism. Mice are small, rounded, soft, no sharp features, no teeth.", "2_мышата_бросают_мяч_друг_другу": "Medium shot on two little mice on a sunny green yard. They stand facing each other at some distance. One mouse throws a small ball to the second mouse. The second mouse catches the ball with his paws. Then the second mouse throws the ball back to the first mouse. The ball flies smoothly and slowly through the air. The mice laugh, smile, and wave their paws. Their movements are soft and playful. Background: wooden house, well, blue sky with white clouds. Camera: static medium shot. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright, saturated palette: green grass, blue sky, warm yellow-orange house, red or blue ball. Soft sunlight. Light film grain. No 3D, no photorealism. Mice are small, rounded, soft, no sharp features, no teeth.", "3_петушок_метет_двор_колосок_на_полу": "Wide shot. A wooden log house with a porch and a wooden door stands in the background. 
In front of the house is a sunny green yard. Rooster Petya stands in the yard holding a broom. He slowly sweeps the ground with calm, gentle movements like a kind tired grandfather doing chores. He has a bright red comb and a fluffy tail. He sweeps the porch and the dirt area in front of the house. Background: wooden house, well, blue sky with white clouds. No other characters visible. Camera: static wide shot, showing the house, the yard, and the rooster sweeping. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright, saturated palette: green grass, blue sky, warm yellow-orange house, bright red rooster comb, brown broom. Soft sunlight. Light film grain. No 3D, no photorealism.", "4_петушок_подносит_лицо_к_колоску_эй_мышата_смотрите": "Extreme close-up shot on a small golden ear of wheat lying on the ground. The ear fills most of the frame. For a few seconds, nothing happens — just the ear on the ground. Then a rooster's face slowly comes into the frame from above or from the side. The rooster brings his face very close to the ear of wheat. He opens his beak slightly and moves it, as if saying something to the ear. No actual words. His expression is curious, gentle, slightly surprised. Blurred background: yard, grass. Camera: static extreme close-up, then the rooster's face enters the frame. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: golden ear, green grass, bright red rooster comb. Soft sunlight. Light film grain. No 3D, no photorealism.", "5_мышата_говорят_надо_колосок_перемолоть": "Medium shot. Rooster Petya stands on a sunny green yard holding a broom. Next to him stand two little mice. 
The rooster looks down at the mice with a kind, tired, attentive expression. The mice look up at the rooster. They open their mouths and move their lips, as if saying something to him. First one mouse speaks, then the other. No actual words. Background: wooden house, well, blue sky with white clouds. An ear of wheat lies on the ground nearby. Camera: static medium shot, showing the rooster and the mice together. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: green grass, blue sky, warm yellow-orange house, golden ear, bright red rooster comb. Soft sunlight. Light film grain. No 3D, no photorealism. Mice are small, rounded, soft, no sharp features, no teeth.", "6_петушок_вопросительно_говорит": "Medium shot. Rooster Petya stands on a sunny green yard in front of a wooden log house. He looks directly at the camera. He has a bright red comb and a fluffy tail. He holds a broom in his wings or has it leaning next to him. His expression is questioning, curious, slightly expectant — like a grandfather asking a question. He tilts his head slightly to one side. He opens his beak and moves his lips, as if asking something in a questioning tone. No actual words. His tone is gentle, not angry. Background: wooden house, well, blue sky with white clouds. No other characters visible. Camera: static medium shot, focused on the rooster's face and upper body. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: green grass, blue sky, warm yellow-orange house, bright red rooster comb. Soft sunlight. Light film grain. 
No 3D, no photorealism.", "7_первый_мышонок_подпрыгивает_чур_не_я": "Close-up on the first little mouse in profile view on a sunny green meadow. He has a soft, rounded face, small ears, no sharp features, no visible teeth. He does a little jump upwards. At the same moment, he breaks into a wide, playful, mischievous smile. Then he opens his mouth and moves his lips, as if saying something short and cheerful like 'Not me!' or 'Catch!' No actual words. His expression is joyful, mischievous, energetic. Blurred background: green meadow, wooden house, blue sky. Camera: static profile shot, capturing the jump, the smile, and the lip movement together. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: green grass, blue sky, warm house. Soft sunlight. Light film grain. No 3D, no photorealism. Mouse is small, rounded, soft, no sharp features, no teeth.", "8_второй_мышонок_чур_не_я": "Close-up on the second little mouse in profile view on a sunny green meadow. She stands on the opposite side of the screen from the first mouse. She has a soft, rounded face, small ears, no sharp features, no visible teeth. She does a little jump upwards. At the same moment, she breaks into a wide, playful, mischievous smile. Then she opens her mouth and moves her lips, as if saying something short and cheerful like 'Me!' or 'I will!' or 'My turn!' No actual words. Her expression is joyful, mischievous, energetic. Blurred background: green meadow, wooden house, blue sky. Camera: static profile shot, capturing the jump, the smile, and the lip movement together. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. 
Bright palette: green grass, blue sky, warm house. Soft sunlight. Light film grain. No 3D, no photorealism. Mouse is small, rounded, soft, no sharp features, no teeth.", "9_петушок_слева_мышата_справа_кто_будет_делать": "Medium wide shot. On the left side stands Rooster Petya holding a broom, leaning on it slightly. He has a bright red comb and a fluffy tail. On the right side stand two little mice, smiling cheerfully and looking at the rooster. They have soft rounded bodies, no sharp features, no visible teeth. Between them on the ground lies a small golden ear of wheat. The rooster looks at the mice with a kind, tired, slightly questioning expression. He opens his beak and moves his lips, as if asking 'Who will do it?' gently. No actual words. Background: wooden house, well, blue sky with white clouds. Camera: static medium wide shot, showing rooster on left, mice on right, ear of wheat between them. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: green grass, blue sky, warm yellow-orange house, golden ear, bright red rooster comb. Soft sunlight. Light film grain. No 3D, no photorealism. Mice are small, rounded, soft, no sharp features, no teeth.",
👉 LTX: for the most part it generated plain noise. Some clips were nothing but a shimmering gradient for the entire 8 seconds.
Example prompts for LTX
{ "1_два_мышонка_выбегает_на_двор_и_начинают_бросать_друг_к_другу_мяч": "Medium shot. Two little mice run out of a wooden house onto a sunny green yard. They stop facing each other at some distance. They throw a small ball back and forth. The ball flies smoothly and slowly through the air. The mice laugh, jump, and wave their paws. Their movements are soft and playful. Background: wooden house, well, blue sky with white clouds. Camera: static wide shot. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright, saturated palette: green grass, blue sky, warm yellow-orange house, red or blue ball. Soft sunlight. Light film grain. No 3D, no photorealism. Mice are small, rounded, soft, no sharp features, no teeth.", "2_мышата_бросают_мяч_друг_другу": "Medium shot on two little mice on a sunny green yard. They stand facing each other at some distance. One mouse throws a small ball to the second mouse. The second mouse catches the ball with his paws. Then the second mouse throws the ball back to the first mouse. The ball flies smoothly and slowly through the air. The mice laugh, smile, and wave their paws. Their movements are soft and playful. Background: wooden house, well, blue sky with white clouds. Camera: static medium shot. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright, saturated palette: green grass, blue sky, warm yellow-orange house, red or blue ball. Soft sunlight. Light film grain. No 3D, no photorealism. Mice are small, rounded, soft, no sharp features, no teeth.", "3_петушок_метет_двор_колосок_на_полу": "Wide shot. A wooden log house with a porch and a wooden door stands in the background. 
In front of the house is a sunny green yard. Rooster Petya stands in the yard holding a broom. He slowly sweeps the ground with calm, gentle movements like a kind tired grandfather doing chores. He has a bright red comb and a fluffy tail. He sweeps the porch and the dirt area in front of the house. Background: wooden house, well, blue sky with white clouds. No other characters visible. Camera: static wide shot, showing the house, the yard, and the rooster sweeping. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright, saturated palette: green grass, blue sky, warm yellow-orange house, bright red rooster comb, brown broom. Soft sunlight. Light film grain. No 3D, no photorealism.", "5_петушок_подносит_лицо_к_колоску_эй_мышата_смотрите": "Extreme close-up shot on a small golden ear of wheat lying on the ground. The ear fills most of the frame. For a few seconds, nothing happens — just the ear on the ground. Then a rooster's face slowly comes into the frame from above or from the side. The rooster brings his face very close to the ear of wheat. He opens his beak slightly and moves it, as if saying something to the ear. No actual words. His expression is curious, gentle, slightly surprised. Blurred background: yard, grass. Camera: static extreme close-up, then the rooster's face enters the frame. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: golden ear, green grass, bright red rooster comb. Soft sunlight. Light film grain. No 3D, no photorealism.", "6_мышата_говорят_надо_колосок_перемолоть": "Medium shot. Rooster Petya stands on a sunny green yard holding a broom. Next to him stand two little mice. 
The rooster looks down at the mice with a kind, tired, attentive expression. The mice look up at the rooster. They open their mouths and move their lips, as if saying something to him. First one mouse speaks, then the other. No actual words. Background: wooden house, well, blue sky with white clouds. An ear of wheat lies on the ground nearby. Camera: static medium shot, showing the rooster and the mice together. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: green grass, blue sky, warm yellow-orange house, golden ear, bright red rooster comb. Soft sunlight. Light film grain. No 3D, no photorealism. Mice are small, rounded, soft, no sharp features, no teeth.", "7_петушок_вопросительно_говорит": "Medium shot. Rooster Petya stands on a sunny green yard in front of a wooden log house. He looks directly at the camera. He has a bright red comb and a fluffy tail. He holds a broom in his wings or has it leaning next to him. His expression is questioning, curious, slightly expectant — like a grandfather asking a question. He tilts his head slightly to one side. He opens his beak and moves his lips, as if asking something in a questioning tone. No actual words. His tone is gentle, not angry. Background: wooden house, well, blue sky with white clouds. No other characters visible. Camera: static medium shot, focused on the rooster's face and upper body. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: green grass, blue sky, warm yellow-orange house, bright red rooster comb. Soft sunlight. Light film grain. 
No 3D, no photorealism.", "8_первый_мышонок_подпрыгивает_чур_не_я": "Close-up on the first little mouse in profile view on a sunny green meadow. He has a soft, rounded face, small ears, no sharp features, no visible teeth. He does a little jump upwards. At the same moment, he breaks into a wide, playful, mischievous smile. Then he opens his mouth and moves his lips, as if saying something short and cheerful like 'Not me!' or 'Catch!' No actual words. His expression is joyful, mischievous, energetic. Blurred background: green meadow, wooden house, blue sky. Camera: static profile shot, capturing the jump, the smile, and the lip movement together. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. Bright palette: green grass, blue sky, warm house. Soft sunlight. Light film grain. No 3D, no photorealism. Mouse is small, rounded, soft, no sharp features, no teeth.", "9_второй_мышонок_чур_не_я": "Close-up on the second little mouse in profile view on a sunny green meadow. She stands on the opposite side of the screen from the first mouse. She has a soft, rounded face, small ears, no sharp features, no visible teeth. She does a little jump upwards. At the same moment, she breaks into a wide, playful, mischievous smile. Then she opens her mouth and moves her lips, as if saying something short and cheerful like 'Me!' or 'I will!' or 'My turn!' No actual words. Her expression is joyful, mischievous, energetic. Blurred background: green meadow, wooden house, blue sky. Camera: static profile shot, capturing the jump, the smile, and the lip movement together. Soviet Soyuzmultfilm style, 1970s. Warm, naive, fairy-tale mood. NO thick black outlines. Lines are very thin, barely visible. Flat cel shading, solid colors, no gradients, no shadows. No paint texture, no watercolor. 
Bright palette: green grass, blue sky, warm house. Soft sunlight. Light film grain. No 3D, no photorealism. Mouse is small, rounded, soft, no sharp features, no teeth.",
Main drawbacks (so far)
❌ None of the models can keep context between shots or scenes.
Solution: either fine-tune the model, or generate still images first and then turn them into video.
❌ Within a single shot, the movements are very similar to each other or simply repeat. This repetitiveness makes the video rather dull.
Perhaps the prompts are to blame, or again the model needs fine-tuning for the specific task. Editing may also help.
Plans for the next iteration
I want to run the same prompts on more powerful models:
LTX 2.3
Hunyuan Video
Wan 2.2 Turbo
CogVideoX 1.5-5B-I2V
For this I rented a virtual machine with an NVIDIA H100 card (luckily, almost every provider offers GPUs with hourly billing).
After that I will try fine-tuning the models. I am very curious to see what comes of it.