Google has announced the launch of Imagen 3, its latest and most sophisticated text-to-image generation model, on its Vertex AI platform. This next-generation foundation model is now available in preview for select customers in the United States, offering developers enhanced capabilities compared to its predecessor.
The launch comes six months after Google made Imagen 2 widely accessible on Vertex AI. The company also introduced text-to-live-image capabilities for that model in April. The release is central to Google's effort to stay competitive in artificial intelligence, as numerous rivals have developed their own image generation technologies, including OpenAI’s DALL-E, Midjourney, Adobe’s Firefly, Meta’s AI, and Microsoft’s Designer.
Imagen 3 brings several key improvements: faster image generation, better understanding of natural language prompts, more realistic rendering of people, and greater control over how text is incorporated into generated images.
“It’s our most capable image generation model yet,” said Douglas Eck, senior research director at Google DeepMind.
“Imagen 3 is more photorealistic, with richer details and fewer visual artifacts or distorted images. It understands prompts written the way people write—the more creative and detailed you are, the better. And Imagen 3 remembers to incorporate small details in longer prompts. Plus, this is our best model yet for rendering text, which has been a challenge for image generation models,” he added.
The launch on Vertex AI brings additional features, including multi-language support, safety measures such as Google DeepMind’s SynthID digital watermarking, and support for various aspect ratios. Stock photography provider Shutterstock, which has been using Imagen, has already generated millions of images with the model and is excited about the enhancements promised by Imagen 3.
Imagen 3 comes with several restrictions, such as declining to generate images of public figures or weapons, although users can still create content reminiscent of copyrighted characters by describing them. These guardrails contrast with Grok, an AI image generator on Elon Musk’s X platform, which has been used to create more controversial content.
Google’s AI tools have faced their own challenges, with the company halting image generation capabilities in its Gemini AI chatbot earlier this year due to concerns over historically inaccurate visuals. However, Gemini and Imagen are distinct models with different purposes, as explained by Google Cloud Chief Executive Thomas Kurian.
The release of Imagen 3 comes amid a surge of advancements in image and video generation models from various tech giants. As the technology evolves, it continues to push the boundaries of what is possible in digital imagery.