NVIDIA introduces AI that generates high-resolution videos based on text descriptions

NVIDIA has introduced a text-to-video AI model named VideoLDM, developed in collaboration with researchers from Cornell University. The model can generate videos from a text description at a resolution of up to 2048 × 1280 pixels, at 24 frames per second, with a duration of up to 4.7 seconds.
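For a sense of scale, the figures quoted above can be turned into frame and pixel counts. The short Python sketch below is only arithmetic on the numbers in this article; the function name `clip_stats` is hypothetical, and rounding the frame count to the nearest whole frame is an assumption.

```python
def clip_stats(width, height, fps, seconds):
    """Return frame and raw pixel counts for a clip (illustrative arithmetic only)."""
    frames = round(fps * seconds)  # assumption: round to the nearest whole frame
    pixels_per_frame = width * height
    return {
        "frames": frames,
        "pixels_per_frame": pixels_per_frame,
        "total_pixels": frames * pixels_per_frame,
    }


# Figures quoted in the article: 2048 x 1280 pixels, 24 frames per second, 4.7 seconds.
stats = clip_stats(width=2048, height=1280, fps=24, seconds=4.7)
print(stats)
```

Under these assumptions a maximum-length clip comes to 113 frames of about 2.6 million pixels each.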

    Image source: NVIDIA

The model builds on the Stable Diffusion neural network. NVIDIA's solution has up to 4.1 billion parameters, but only 2.7 billion of them were trained on video. This is quite modest by the standards of modern AI. However, thanks to the efficient Latent Diffusion Model (LDM) approach, the developers were able to generate diverse, temporally consistent, high-resolution videos of very high quality.

The researchers highlight two features of this model: personalized video generation and convolutional-in-time synthesis. The temporal layers trained in VideoLDM for text-to-video generation are inserted into image LDM backbones that were previously fine-tuned on a set of images with DreamBooth. The temporal layers generalize to the DreamBooth checkpoints, which makes personalized text-to-video generation possible. By applying the learned temporal layers convolutionally in time, you can get slightly longer clips with little degradation in quality.
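The idea of reusing learned temporal layers on longer clips can be illustrated with a toy example: a convolution kernel has a fixed size along the time axis, so the same weights can slide over a sequence longer than any seen in training. The sketch below is not VideoLDM code. The function `temporal_conv`, the kernel values, and the use of one number per frame as a stand-in for frame features are all hypothetical.

```python
def temporal_conv(frames, kernel):
    """Slide a 1D kernel along the time axis, repeating the edge frames as padding.

    The output has the same length as the input, whatever that length is.
    """
    k = len(kernel)
    assert k % 2 == 1, "an odd kernel size keeps the output aligned with the input"
    pad = k // 2
    padded = [frames[0]] * pad + list(frames) + [frames[-1]] * pad
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]


kernel = [0.25, 0.5, 0.25]  # hypothetical "learned" weights, fixed after training

short_clip = [float(t) for t in range(8)]   # stand-in for a training-length clip
long_clip = [float(t) for t in range(20)]   # longer clip, same kernel, no retraining

short_out = temporal_conv(short_clip, kernel)
long_out = temporal_conv(long_clip, kernel)
print(len(short_out), len(long_out))
```

Nothing in the kernel depends on the clip length, which is the property that lets a temporal layer be applied to longer sequences. Whether quality holds up on those longer sequences is a separate, empirical question.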

The model can also generate videos of driving scenes. These videos have a resolution of 1024 × 512 pixels and are up to 5 minutes long. A specific driving scenario can be simulated by placing bounding boxes to construct a scene of interest, synthesizing a matching starting frame, and then generating plausible videos from it. In addition, the model can make multimodal predictions of driving scenarios by generating multiple plausible continuations from a single initial frame.
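Generating several plausible outcomes from one initial frame amounts to sampling a stochastic model repeatedly from the same starting point. The toy sketch below shows only that sampling pattern, assuming a simple random walk in place of a video model. The function `rollout`, the scalar state, and the Gaussian update are hypothetical and have nothing to do with NVIDIA's actual network.

```python
import random


def rollout(initial_state, steps, seed):
    """Generate one stochastic continuation from a fixed starting state (toy stand-in)."""
    rng = random.Random(seed)  # separate generator per rollout, so results are reproducible
    state = initial_state
    trajectory = [state]
    for _ in range(steps):
        state = state + rng.gauss(0.0, 1.0)  # hypothetical random motion update
        trajectory.append(state)
    return trajectory


initial_frame = 0.0  # stand-in for the single starting frame
rollouts = [rollout(initial_frame, steps=10, seed=s) for s in range(3)]
for r in rollouts:
    print([round(x, 2) for x in r])
```

Every trajectory starts from the same state and then diverges, because each one draws different random numbers.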

The research will be presented at the Conference on Computer Vision and Pattern Recognition (CVPR), taking place June 18-22 in Vancouver. For now, the neural network is only a research project, and it is not clear when NVIDIA will release anything like it to the public.

About the author

Robbie Elmers

Robbie Elmers is a staff writer for Tech News Space, covering software, applications and services.
