Achieving consistency in AI-generated videos has always been challenging. While generating virtual models and clothing is straightforward, creating a video where Elon Musk models a fur coat remains a complex task.
Pika’s recently updated 2.0 model offers a clever solution: by uploading multiple photos, Pika can reference specific elements to generate videos with remarkable accuracy.
By providing photos of people, products, and settings, users can create a basic commercial video where the visuals closely match the original images.
Does this mean AI has solved video consistency, creating new challenges for advertisers? Not quite. While Pika is fun to use, its practicality still has room for improvement.
Creating Unreal Scenes With Pika
Pika’s multi-image input feature, called “Scene Ingredients,” allows users to combine photos and generate unique scenarios. Here’s how it works:
- Upload up to six images by clicking the “+” button.
- Add a simple prompt in the text box.
For instance, let’s have Elon Musk and Ultraman watch a movie together. Prompt: Two people sit in a dark theater, holding popcorn and focused on the screen with anticipation…
Simply upload their photos, and the theater environment is created from the prompt. While Elon Musk looks realistic, Ultraman’s appearance feels exaggerated and disconnected from the original photo.
A standout feature of Pika is its ability to “reuse” elements. For example, we can dress Musk and Ultraman in matching green coats and create a fashion photoshoot.
The photos of the two people were both sourced from ready-made images. The green coat and the icy snowy background were generated separately using AI, with the “AIGC” text on the coat serving as a challenge for Pika.
The result showed decent consistency between the scene and the coat, and the “AIGC” text was faintly recognizable. The models’ poses also followed the instructions. But the biggest issue is: who are these two people? The faces in the video are not just slightly off from the photos; they are completely unrelated.
Next, we tested Pika’s outfit customization by generating a black T-shirt with the phrase “I was human.” We added a photo of Mark Zuckerberg and a photo of a ukulele to create a musical performance.
Pika followed the prompt well, and the camera movement was smooth. The T-shirt was also put on seamlessly, but the right hand, especially the thumb, was still imperfect.
Compared to Google Veo and OpenAI Sora, Pika’s model isn’t top-notch. Solving one problem often reveals more errors.
After trying a realistic style, let’s switch to an anime style. To put Gintoki Sakata and Naruto Uzumaki in the same frame, I chose two images with blue sky and white clouds as the background.
The background blends naturally, and the expressions are well-captured, with the wind effect on hair and clothes fitting nicely. However, the head-turning motion is quite unsettling, and Gintoki’s eyes look lifeless rather than truly rolling back.
You can also have famous paintings interact across eras—like Mona Lisa and the Girl with a Pearl Earring eating fries at McDonald’s. The effect isn’t ideal. Seeing Mona Lisa, one wonders if Da Vinci would turn in his grave. The characters look like stickers placed in the video, with odd head movements.
Sometimes, returning to simplicity yields unexpectedly good results. Uploading a Starbucks image and Monet’s Water Lilies painting results in a “lotus-like” coffee cup.
Competing With Chinese-Made Models: Controlling AI Video Is Now Easier
To some extent, Pika has improved video controllability, though not entirely successfully. As the tests above show, Pika keeps scenes, clothing, and objects consistent, but faces tend to distort regardless of style.
Additionally, Pika’s basic capabilities need improvement. Actions like eating or playing an instrument still pose challenges. Can these issues be alleviated by “drawing cards,” that is, regenerating the same video over and over until a good result comes out?
In two words: not affordable. Pika 2.0 is currently available only to Pro and Fancy users, costing at least $35 per month with no free trial. Moreover, Pro users get only 2,000 points per month, and each Scene Ingredients video costs 100 points, which works out to at most 20 such videos a month.
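To put that pricing in per-video terms, here is a rough back-of-the-envelope sketch. It assumes the Pro plan is the $35 tier and that every Scene Ingredients video costs exactly 100 points; the attempts_per_keeper value is a hypothetical retry rate for “drawing cards,” not a measured one.

```python
# Figures from the article. Treating $35 as the Pro price is an assumption:
# the article only says plans cost "at least $35 per month".
MONTHLY_PRICE_USD = 35
MONTHLY_POINTS = 2000
POINTS_PER_VIDEO = 100


def videos_per_month(points: int, points_per_video: int) -> int:
    """Number of whole videos the monthly point budget covers."""
    return points // points_per_video


def cost_per_usable_video(price, points, points_per_video, attempts_per_keeper=1):
    """Dollar cost of one usable video when each keeper takes several attempts.

    attempts_per_keeper is a hypothetical retry rate, not a measured value.
    """
    usable = videos_per_month(points, points_per_video) // attempts_per_keeper
    if usable == 0:
        raise ValueError("the point budget does not cover a single usable video")
    return price / usable


print(videos_per_month(MONTHLY_POINTS, POINTS_PER_VIDEO))  # 20
print(cost_per_usable_video(MONTHLY_PRICE_USD, MONTHLY_POINTS, POINTS_PER_VIDEO))  # 1.75
print(cost_per_usable_video(MONTHLY_PRICE_USD, MONTHLY_POINTS, POINTS_PER_VIDEO,
                            attempts_per_keeper=4))  # 7.0
```

Under these assumptions, a single generation costs $1.75, and needing four tries per usable clip would push the price to $7 per clip, with only five clips a month.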
In fact, the Chinese-made AI video model Vidu implemented the “multi-image reference” feature earlier than Pika. More appealing to users, it offers free trial points.
I tested some of Pika’s cases on Vidu. Mona Lisa and the Girl with a Pearl Earring eating fries look like they just emerged from the ground, but Mona Lisa’s likeness is better than Pika’s.
With Elon Musk and Ultraman watching a movie together, Musk’s face is about 70-80% accurate, but Ultraman’s face still isn’t great.
With Gintoki Sakata and Naruto Uzumaki in the same frame, Vidu can generate a side view of a face from a front view, but the style differs from the original image.
Additionally, Vidu has a limitation compared to Pika: it accepts a maximum of three images. So, when I used Vidu to create a fashion shoot for Musk and Ultraman, I only uploaded their photos and the green coat, leaving out the background.
The resulting faces felt unfamiliar. It’s clear that maintaining facial stability is still a challenge.
When comparing Vidu to Pika, opinions may vary. I tested Pika on a paid plan and Vidu on its free version, which accounts for some of the differences. However, the two take a similar approach: using just a few reference images and simple prompts to generate relatively stable objects.
In AI video generation, subject consistency is currently achieved more reliably with LoRA (low-rank adaptation). This involves fine-tuning the model on a certain amount of material featuring the specific subject. With adequate material and training, the model gradually learns the character’s appearance.
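For readers unfamiliar with the technique, the sketch below shows the core idea of LoRA on a single linear layer, using the standard formulation in which a frozen weight W is augmented with a trainable low-rank update scaled by alpha / rank. It is a toy illustration only: the layer sizes, rank, and the LoRALinear class are hypothetical, and the article does not say how any particular video model applies LoRA.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen during fine-tuning
        # A starts small and random, B starts at zero, so the update is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Hypothetical sizes, chosen only for illustration.
rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(128)] for _ in range(64)]
layer = LoRALinear(W, rank=4)
x = [rng.gauss(0.0, 1.0) for _ in range(128)]

print(layer.trainable_params())  # 768
print(len(W) * len(W[0]))        # 8192
```

Only A and B would be updated during training, which is why a modest amount of subject material can be enough: in this toy layer the adapter has 768 trainable values against 8,192 in the frozen weight.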
But to make AI videos more accessible and commercially valuable, the entry barrier needs to be lowered. At least with Vidu and Pika, we see the potential.
Going Viral with AI Short Videos: A One-Way Ticket to Creativity
Shortly after the release of Pika’s 2.0 model, international users were already having a blast. By repeatedly generating videos in different scenes using their own photos, they could achieve “instant universe travel.” With AI, trying on clothes is just a click away. Models and outfits flow seamlessly, saving the cost of real shoots.
Playing around with Pika gave me a feeling similar to playing “QQ Show” and “The Sims,” where we decide how to dress up the characters in the video.
If you want to fulfill Musk’s “dream,” it’s easy. First, use other AI tools to generate a “Conquer Mars” T-shirt and a red hat with “MAGA” written on it.
Then, upload these images to Pika, along with a Mars scene, Musk’s photo, Tesla’s Optimus humanoid robot, and the dog behind Doge, his favorite internet meme.
In the end, a sunny and cheerful young man appears, with a dog on the left and a robot on the right, looking friendly but not quite like Musk.
Whether it looks like him or not is one thing; as long as you keep an open mind, the possibilities are endless. Using photos of ourselves and celebrities, we can easily engage in fandom. Upload hats, clothes, and instruments to dress ourselves from head to toe. Gather scenes, products, and models, and you have a simple commercial video…
Photos + AI images + Pika 2.0 + prompts can generate many interesting visuals. This method also sidesteps some of the shortcomings of video models, such as rendering text, which can be handled by image models instead. Without competing directly with Google’s model capabilities or trying to match Runway’s Hollywood ambitions, Pika has its own unique approach.
Pika has always been a master of creativity. Its earlier series of AI special effects, Pikaffects, went viral across platforms like RedNote and TikTok, pushing Pika’s user base past 11 million.
Pika has tapped into a group of users with a high demand for entertaining short videos. Even if these videos are templated and fleeting, as long as they’re fun, people will flock to them.
Who says winning is all about taking it all? The AI market is vast, and while simulating the physical world is a grand dream, achieving the small goal of making AI short videos fun is also a form of success.
Source from ifanr