Visual generative AI is the process of creating images from text prompts. The technology is based on vision-language foundation models that are pretrained on web-scale data. These foundation models are used in many applications by providing a multimodal representation. Examples include image captioning and video retrieval, creative 3D and 2D image synthesis, and robotic manipulation.
]]>