The Chinese multinational Alibaba, best known for its e-commerce operations, also invests heavily in technology research. Researchers at the company’s Institute for Intelligent Computing have shown off their new AI video generator, EMO.
EMO, short for Emote Portrait Alive, is an “audio-based expressive portrait video generation framework” that converts a single reference image and vocal audio into an animated avatar video with matching facial expressions and head poses.
Among the many examples the team created is one that takes the AI-generated woman in sunglasses from OpenAI’s Sora debut and makes her sing “Don’t Start Now” by Dua Lipa. Fortunately, the character is one of Sora’s least terrifying creations.
Another example animates an AI-generated rendering of Da Vinci’s Mona Lisa, making her sing “Flowers” by Miley Cyrus, in a cover version by YUQI. In another clip, Audrey Hepburn sings a cover of an Ed Sheeran song. The RINKI YouTube channel collected all of Alibaba’s demo videos and upscaled them to 4K.
A key capability of EMO is that it can lip-sync the synthesized video to the input audio, so the model supports songs in multiple languages. It also handles a range of art styles, whether photography, painting, or anime-style cartoons, and accepts other audio inputs, such as ordinary speech.
In theory, the audio input wouldn’t have to be “authentic” either. This week, Adobe introduced a new generative AI platform capable of creating music from text prompts. And as celebrities like Taylor Swift know all too well, it’s now easy to generate realistic-sounding voices.
The model, built on a Stable Diffusion backbone, is not the first of its kind, but it is arguably the most effective so far. There are notable imperfections in this initial effort, such as a fairly strong smoothing effect on people’s skin and occasionally jarring mouth movements. Still, the overall accuracy of the lip movements in response to the input audio is remarkable.
The complete research from Alibaba’s Institute for Intelligent Computing is published on GitHub, and the associated paper is available on arXiv.