Not satisfied after watching “Squid Game”? Create your own ending.
Can’t wait for “Dune: Part Three”? Make your own version.
Previously, maintaining consistent character appearances required significant time. Now, with just a screenshot, AI can start making movies.
This is thanks to Hailuo AI’s “Subject Reference” feature, powered by the new S2V-01 model. It accurately identifies the subject in an uploaded image and casts it as the character in generated videos. The rest is simple: create freely with basic instructions.
Advantages of the “Subject Reference” Feature
Many companies are developing “Subject Reference” features, but not all can tackle the challenges of stability and coherence, especially maintaining consistency in motion.
While others may struggle, Hailuo AI excels. With just one image, it accurately understands a character’s traits, identifies it as the subject, and places it in various scenes.
One moment Spider-Man is saving the world, the next he’s riding a motorcycle.
The Mother of Dragons, who should be training dragons in “Game of Thrones,” is now playing with a little wolf.
The breakthrough in “subject reference” lies in balancing creative freedom with fidelity. It is like giving creators a “universal actor” whose appearance never distorts yet adapts naturally to actions and poses, performing any action in any scene the director requires.
Not Just a New Feature, But a Unique Technical Solution
Hands-on testing shows that subject reference is a distinct capability, with technical challenges and requirements that differ from text-to-video or image-to-video generation.
Traditional image-to-video generation merely animates a static image, mostly through partial modifications. For example, given this still of Song Hye-kyo, image-to-video only brings the picture to life within a limited range, without any significant movement.
With the same photo, “subject reference” can create a complete segment based on text prompts, allowing free movement while maintaining stable facial features.
There are currently two technical routes to generating videos around a subject. The first is based on LoRA, which fine-tunes a pre-trained large generative model. LoRA demands significant computation when generating new videos and requires users to upload multiple angles of the same subject, sometimes even specifying different elements for each segment to ensure quality. It also consumes many tokens and involves long wait times.
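For readers unfamiliar with LoRA (Low-Rank Adaptation), the idea is to freeze a pre-trained model and train only small low-rank adapter matrices for the new subject. The minimal PyTorch sketch below illustrates LoRA in general, not any vendor’s actual implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update.

    Fine-tuning a subject this way trains only lora_a and lora_b (a tiny
    fraction of the weights), but still requires a training run per
    subject, which is why LoRA-based subject reference needs multiple
    images and wait time before the first video can be generated.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pre-trained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # start as a zero update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Example: wrap one projection layer of a stand-in generative model.
layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(1, 512))  # only lora_a / lora_b receive gradients
```

Because the adapters must be trained separately for each subject, this route inherently demands reference material and compute before generation can even begin.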
After extensive technical exploration, MiniMax chose a route based on image reference: an image carries the most accurate visual information, which matches the creative logic of physical shooting. On this route, the protagonist in the image is the model’s top recognition priority; whatever the subsequent scenes or plot, the subject must stay consistent. All other visual information remains open and is steered by the text prompt. This approach achieves the goal of “precise reproduction + high freedom.”
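MiniMax has not published S2V-01’s architecture, so the following is only a conceptual sketch of how an image-reference route can work: an identity embedding is extracted once from the reference image and injected into every generation step alongside the varying text conditioning. All class and variable names here are hypothetical:

```python
import torch
import torch.nn as nn

class SubjectConditioner(nn.Module):
    """Illustrative only: fuse a fixed subject embedding with per-clip
    text conditioning via cross-attention, so the subject's identity is
    held constant while the text prompt controls everything else."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, frame_tokens, subject_emb, text_emb):
        # The subject embedding is computed ONCE from the reference image
        # and reused for every frame; text tokens vary with the prompt.
        context = torch.cat([subject_emb, text_emb], dim=1)
        fused, _ = self.attn(frame_tokens, context, context)
        return frame_tokens + fused

# Toy shapes: 1 clip, 16 frame tokens, 1 subject token, 8 text tokens.
cond = SubjectConditioner()
frames = torch.randn(1, 16, 256)
subject = torch.randn(1, 1, 256)    # from the single reference image
text = torch.randn(1, 8, 256)       # from the prompt
out = cond(frames, subject, text)   # identity fixed, scene left free
```

Because the subject embedding is a fixed input rather than trained weights, no per-subject fine-tuning is needed, which is what makes a single upload and seconds-long waits possible.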
In this video, the model was given only one picture of the Mother of Dragons. The final video accurately rendered the camera language and visual elements described in the prompt, demonstrating strong comprehension.
Compared with the LoRA solution, this approach sharply reduces the material users must upload, from dozens of video segments down to a single image. Waiting time is measured in seconds, comparable to generating text or images, and the result combines the accuracy of image-to-video with the freedom of text-to-video.
A Highlight of Chinese Tech That Meets Multiple Needs
Meeting multiple needs at once is not an unreasonable demand. Only by delivering both an accurate, consistent character image and free movement can a model go beyond simple entertainment and offer broader value in industry applications.
For example, in product advertising, a single photo of a model can directly generate a range of product videos, with only the prompt changing between them.
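As a sketch of that workflow, the loop below generates several product videos from one reference image by varying only the prompt. The endpoint URL and request fields are invented for illustration; only the model name S2V-01 comes from the article:

```python
import requests

API_URL = "https://example.com/v1/video_generation"  # placeholder endpoint
API_KEY = "YOUR_KEY"

prompts = [
    "The model holds the perfume bottle up to soft morning light",
    "The model sprays the perfume and smiles at the camera",
    "Close-up: the model places the bottle on a marble counter",
]

# One reference image of the model; only the prompt changes per video.
with open("model_photo.jpg", "rb") as f:
    reference_image = f.read()

for prompt in prompts:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"subject_reference": ("model_photo.jpg", reference_image)},
        data={"model": "S2V-01", "prompt": prompt},
        timeout=600,
    )
    resp.raise_for_status()
    print(prompt, "->", resp.json())
```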
With image-to-video methods, the current mainstream solution is to set the first and last frames, so the result is limited by the existing images; it also takes repeated attempts to collect different angles and then stitch the material together into a sequence of shots.
The advantage of “subject reference” is that it combines the strengths of these technologies to better fit the video creation workflow. In the future, more than 80% of marketing professionals are expected to use generative tools at various stages, leaving them free to focus only on story and plot.
According to Statista, the market for generative AI products in advertising and marketing exceeded $15 billion in 2021 and is projected to reach $107.5 billion by 2028. In earlier workflows, pure text-to-video involved too many uncontrollable factors and suited only the early stages of creation. In the European and American advertising and marketing industries, generative AI is already common: 52% of use cases are drafts and planning, and 48% are brainstorming.
For now, Hailuo AI opens the reference capability for a single character first. It will later expand to multiple characters, objects, scenes, and more, further unleashing creativity, in line with Hailuo’s slogan, “Every idea is a blockbuster.”
Since MiniMax released its video model in August 2024, it has continuously attracted a large international user base, and from the quality and smoothness of the generated footage to its consistency and stability, it has received extensive positive feedback and professional recognition.
Over the past year of technological competition, the landscape of AI video generation has begun to take shape. Sora’s debut demonstrated the potential of video generation and prompted major tech companies to invest heavily in the field.
But Sora’s product launch slipped to the end of 2024 and drew middling user reviews; it failed to meet market expectations, giving other players an opening to seize the market.
Now, as generative video enters the second half of the race, only three companies are truly demonstrating technical strength and development potential: MiniMax’s Hailuo AI, Kuaishou’s Kling AI, and ByteDance’s Jimeng AI.
As a startup founded just three years ago, MiniMax has delivered products and technology that compete at the top level despite its lean size. From the I2V-01-Live image-to-video model in December 2024 to the new S2V-01 model, it has been steadily solving the problems that held back earlier video generation.
As technology continues to mature and application scenarios expand, video generation AI will spark a new revolution in content creation, film production, marketing, and communication. These companies, representing the highest level of China’s video generation AI field, are not only leading the Chinese market but are also expected to compete globally with international giants. Meanwhile, ensuring product stability and controllability while maintaining technological innovation will be a continuous challenge for these enterprises.
Source from ifanr