Thursday, May 21, 2026

‘Indians among most avid users’: Team behind ChatGPT Images 2.0 on multilingual AI image generation

India is playing a growing role in shaping how AI image generation models are developed, with OpenAI’s ChatGPT Images 2.0 now capable of generating everything from Manga-style panels in Hindi to more realistic depictions of crowded and chaotic Indian streets.
Earlier this week, OpenAI CEO Sam Altman said that Indian users have generated more than one billion visuals using Images 2.0 since its release in April 2026. The milestone comes a year after OpenAI first introduced the ‘Images for ChatGPT’ feature that kicked off the viral Studio Ghibli-style AI images trend.
However, OpenAI is also reportedly undergoing a broader strategic reset, pulling the plug on experimental side projects while redirecting talent and computing resources toward enterprise products. In a surprise move, the company shut down Sora, its popular AI video-generation tool, just six months after releasing it to the public.

In this context, The Indian Express sat down with members of the San Francisco-based team that built Images 2.0 to understand how exactly the latest model is a step change from previous versions and, more importantly, how it was iterated for multilingual, culturally diverse markets like India, an approach that seems to be paying off in terms of adoption and user engagement.
“Previously, most of our work, including model evaluations, were done in English. Our models also struggled with a lot of details, especially in Asian languages. In Chinese, Japanese, Korean, Hindi and others, there are thousands of characters compared to just 26 letters in English,” Boyuan Chen, a research scientist at OpenAI, said.
“However, this time, we spent a lot of time making sure cultures from around the world were covered in our internal iteration process. Whenever we saw that a language was not performing well, we added a lot more data to ensure broader cultural and linguistic coverage,” Chen explained.
With ChatGPT Images 2.0, OpenAI said it has achieved significant gains in non-Latin text rendering, particularly in Japanese, Korean, Chinese, Hindi, and Bengali. The model’s multilingual understanding is said to go beyond simple translation, extending to cases where language is embedded in visual outputs such as posters, comics, and diagrams.

Abhi Muchhal, a product manager at OpenAI, offered another example of the model’s India-specific realism. “In the previous model, if you prompted it to make a city scene in India, it wouldn’t be crowded at all. While this model is not perfect, now you can see a realistic representation where there’s rickshaws moving left and right, and there’s a lot of people, there’s hustle and bustle,” he said.
Beyond multilingual capabilities, Images 2.0 can generate images across a wide range of aspect ratios at much higher quality, with support for up to 2K resolution, and is said to demonstrate improved fidelity across many visual styles.
Challenge of multilingual image generation
As recently as 2024, text-to-image generators like DALL-E 3 struggled to spell words accurately inside images. Because diffusion models generate images by reconstructing pixels from noise, small text elements received less attention during training. The issue became more complex with regard to outputs in different languages.
But that limitation has now largely gone the way of the infamous ‘extra fingers’ problem that plagued earlier image generators.

Declining to share details of how OpenAI achieved this, Chen said that the key was training the model to follow instructions from users better. “With this image-generation model, we wanted it to follow the user’s intent. So we trained it on both types of data, publicly available casual data and studio-style images,” he said.
“We made sure the model follows what people actually want, instead of simply outputting good-looking images,” he added.
OpenAI was able to improve the model’s ability to accurately render text by applying the same advances used to improve its text-based chatbots. “It’s similar to text intelligence in ChatGPT. Depending on the prompt, it can respond robotically or more naturally and conversationally. The same idea applies here,” Chen said.
Images 2.0 is also OpenAI’s first image generation model built on top of its reasoning models, and it can use the web to find relevant information. It also has much more up-to-date knowledge of the world and is more likely to understand that context than Images 1.5 was, according to Muchhal.

According to Chen, inaccurate text placement in AI-generated images is also a problem of the past.
Unexpected ways Indians use Images 2.0
Stating that Indians have consistently been one of the most avid users of image generation, Muchhal said, “We were very happy to see the level of adoption in India, but more than the numbers, what surprised me most was the diversity of use cases.”
He also said that not all of the usage trends pertained to generating photorealistic outputs, pointing to the latest trend of asking ChatGPT to turn nice photos into scribbly drawings like the ones made in Microsoft Paint decades ago.
When asked whether viral AI image trends are intentionally shaped by OpenAI or driven organically by user behaviour, Muchhal said that it was a combination of both: “We try to pick a representative set of use cases where we know that either the model has struggled with it in the past or areas that we want to improve, and we try to improve on those. But to be honest, a lot of the things that go viral are also unexpected to us.”

The OpenAI executives also said some of the most unexpected trends in India included AI-generated hair-colour previews, the ‘younger me’ portraits, and Y2K-style romantic portraits.
On enterprise adoption of AI image generators, Muchhal said, “In the past, the model struggled with accurately following instructions which made it very hard for users to be able to use this for a professional use case.”
“But what we’ve seen now with Images 2.0 is not only the personal use cases, but there’s been overwhelming enterprise demand because now you’re able to make the creative workflow go so much faster,” he added.
Safety, watermarks, and deepfake risks
Images 2.0 is also able to generate fine-grained elements, including the tiny flaws that add realism to its visuals.

Asked about the dangers of photorealistic outputs in spreading misinformation, Muchhal said that OpenAI looks to strike a constant balance between users’ creative freedom and user safety and transparency. “We have very high standards around copyright infringement, and we make sure there is no misuse in those areas. One thing we care deeply about is ensuring there is nothing deceptive or impersonating in the outputs,” he said.
ChatGPT-generated images support the open C2PA standard, which adds a clear signal in an image’s metadata that it was generated by AI.
Earlier this week, OpenAI also announced a partnership with Google to include an invisible watermark called SynthID. But the AI-generated images do not carry a visible watermark so as not to tarnish the output, as per Muchhal.
When asked for comment on the Indian government’s recently notified AI labelling rules, which require social media platforms to attach a prominent label on AI-generated content, Muchhal said, “We believe the system needs to be built in collaboration with stakeholders […] We have shared a lot of what we are doing with government stakeholders, continue to incorporate their input, and are working to find the right balance between giving users control and meeting the trust and safety expectations set by governments.”

 
