
Microsoft’s new AI creates super-realistic talking head fakes, and it made the Mona Lisa rap


Microsoft Research Asia has published a new paper introducing VASA, a framework for generating lifelike talking faces. The researchers presented their model, called VASA-1, which can create realistic videos from just one static image and an audio clip of speech. The full paper is available on arXiv.

The results are impressive: according to the researchers, VASA-1 outperforms previous generative AI tools for producing realistic deepfakes.

What is particularly interesting about VASA-1 is its ability to imitate natural facial expressions and a wide range of emotions, and to lip sync with very few artifacts.

The researchers admit that the model, like all other models, still struggles with non-rigid elements such as hair. Even in this area, however, the model performs above average, weakening one of the known red flags for identifying an inauthentic, deepfake video.

The technical cornerstone, Microsoft says, is an innovative holistic model for generating facial dynamics and head movement that operates in an expressive and disentangled face latent space. VASA-1 also offers real-time efficiency:

“Our method generates video frames of 512×512 size at 45fps in the offline batch processing mode, and can support up to 40fps in the online streaming mode with a preceding latency of only 170ms, evaluated on a desktop PC with a single NVIDIA RTX 4090 GPU.”
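To put those figures in perspective, the short sketch below converts the quoted frame rates into a per-frame time budget and expresses the 170ms latency as a number of frame intervals. The function and constant names are illustrative and do not come from the paper; only the three numbers (45fps, 40fps, 170ms) are taken from the quote.

```python
# Rough arithmetic on the figures quoted above. Names are hypothetical;
# only the numeric values (45 fps, 40 fps, 170 ms) come from the quote.

def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Number of frame intervals that fit inside a given latency."""
    return latency_ms / frame_budget_ms(fps)


OFFLINE_FPS = 45          # offline batch processing mode
STREAMING_FPS = 40        # online streaming mode
STREAM_LATENCY_MS = 170   # latency quoted for streaming mode

if __name__ == "__main__":
    print(f"Offline budget:   {frame_budget_ms(OFFLINE_FPS):.1f} ms per frame")
    print(f"Streaming budget: {frame_budget_ms(STREAMING_FPS):.1f} ms per frame")
    print(f"Latency spans about {latency_in_frames(STREAM_LATENCY_MS, STREAMING_FPS):.1f} frames")
```

In other words, the model has roughly 22 to 25 milliseconds to produce each frame, and the quoted streaming latency corresponds to just under seven frames of video.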

The tool based on the new model is very easy to use and even accepts “optional signals as a condition”, meaning the user can set the main eye gaze direction, head distance and emotion offset.

VASA-1 also handles non-photorealistic inputs, such as artwork, so it can bring paintings to life as well.

The model can also make the images sing, rap or speak in languages other than English. As one example, Microsoft presented an amusing clip of the Mona Lisa rapping.

It is important to highlight the potential harm that such technology can cause when it is used to produce content that imitates real people – not just politicians and celebrities, but also ordinary citizens. The good news is that Microsoft researchers are aware of the risk:

“We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are confident that the technology will be used responsibly and in accordance with appropriate regulations.”

Microsoft recognizes the possibility of abuse. However, it also highlights the technology's potential benefits, ranging from improving educational equity and accessibility for people with communication challenges to offering companionship or therapeutic support to those in need.

It is worth mentioning that Microsoft’s competitor, OpenAI, is also facing a similar dilemma. Just recently, OpenAI introduced a powerful AI model for voice cloning, but chose not to make it public. The company argues that the wider release of this technology should go hand in hand with policies and countermeasures to prevent its misuse.
