Microsoft now wants a piece of the "AI image generator" market.

Updated: Jan 25

According to Microsoft, NUWA-Infinity goes beyond DALL-E, Imagen, and Midjourney in that it can create long-duration videos and high-resolution images of any size.

Text-to-image generative models, which create images from nothing more than a text prompt, are garnering a lot of interest. OpenAI's DALL-E 2 is the best known; other emerging AI image generators include Google's "Imagen," "Midjourney" from the independent research lab of the same name, Hugging Face's "Craiyon," and Meta's "Make-A-Scene."

It appears that Microsoft now wants a piece of the "AI image generator" action. Microsoft Research Asia recently unveiled NUWA-Infinity, a multimodal generative model designed to produce high-quality images and videos from any given text, image, or video input.


In its research paper, "NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis," Microsoft said that it evaluated NUWA-Infinity on five high-resolution visual synthesis tasks:

· Unconditional Image Generation

· Text-to-Image

· Text-to-Video

· Image Animation

· Image Outpainting

In terms of quality and variable-size generation, NUWA-Infinity offers better visual synthesis capabilities than its predecessor, "NUWA," which also handles both images and videos.

Because NUWA-Infinity is focused on generating high-resolution images and long-duration videos, most existing datasets cannot be used for training or evaluation. To train the model, the team therefore built four new high-resolution datasets.

The researchers also said that they will pre-train the next iteration of NUWA-Infinity on additional collected visual data and report how well it generalises to open-domain inputs.

This could give a boost to computer vision technology for video and image analytics.

The biggest draw, though, is that NUWA-Infinity can create videos from text. It can produce previously unseen videos from a simple prompt, and it can turn doodles into videos. The open-domain videos it produces are temporally consistent.

It can also predict which frames will come next in a video. If a user supplies an image and asks the model to predict the following frames, NUWA-Infinity will do so, whether the image is of a landscape or a person's face.

How does it compare to the competition?

The first thing that distinguishes NUWA-Infinity from its rivals is that it is built to produce high-quality videos as well as images from a given text, image, or video, something that none of its rivals can do.

According to Microsoft, "compared to DALL-E 2, Imagen, and MidJourney, NUWA-Infinity can create high-resolution pictures of any size and allow long-duration video creation."

AI picture generators are popular on the internet.

OpenAI recently announced that a million people on its waiting list would be able to purchase access to DALL-E 2. Microsoft is also an investor in OpenAI. Even before that announcement, users who already had access to DALL-E 2 were using it to create inventive images from prompts and sharing them on social media.

The most recent social media sensation came from a TikTok user who gave DALL-E 2 the prompt "selfie at the end of the world" and shared the results. Given the apocalyptic vibe, however, some may find the results unsettling.


