AI coding startup Cursor has launched a new model called Composer 2.5 that has been specifically trained for long-running coding tasks.
Composer 2.5 also follows complex instructions more reliably and shows other behavioural improvements, such as in communication style and effort calibration, Cursor said in a blog post on Monday, May 18. According to the company, the gains come from scaling training, generating more complex reinforcement learning (RL) environments, and introducing new learning methods.
Composer 2.5’s debut comes months after Composer 2, which drew some backlash when users found that it was an RL-modified version of Kimi K2.5, an open-weight AI model recently released by Moonshot AI, a Chinese AI startup backed by Alibaba and HongShan (formerly Sequoia China).
Acknowledging that Composer 2 was built on top of Kimi K2.5, Lee Robinson, Cursor’s vice president of developer education, said, “Yep, Composer 2 started from an open-source base!” He added, “Only ~1/4 of the compute spent on the final model came from the base, the rest is from our training.”
“It was a miss to not mention the Kimi base in our blog from the start. We’ll fix that for the next model,” Aman Sanger, the co-founder of Cursor, said.
To be sure, Composer 2.5 is built on the same open-source checkpoint (Kimi K2.5) as Composer 2. Beyond the fact that Cursor did not develop its coding model from scratch, its reliance on a Chinese base model could raise concerns amid the global AI arms race, which is often framed as an existential battle between the United States and China.
Last year, the US-based startup raised a $2.3 billion round at a $29.3 billion valuation, and is reportedly exceeding $2 billion in annualised revenue. In April, Elon Musk-owned SpaceX, which is also now the parent firm of xAI, announced plans to acquire Cursor for $60 billion sometime later this year.
Cursor on Monday said that it is already working with SpaceXAI (the new AI division of SpaceX) to train a “significantly larger model” from scratch using 10 times more total compute from millions of H100-equivalent GPU clusters that make up the Colossus 2 supercomputer.
Under the hood
Meanwhile, Cursor said that it made several changes to the training stack of Composer 2.5, focused on improving model intelligence and usability. For starters, Composer 2.5 was trained with targeted textual feedback during reinforcement learning (RL), which allowed the company to give the model feedback at the exact point in a trajectory where it could have behaved better.
“For a target model message, we construct a short hint describing the desired improvement, insert that hint into the local context, and use the resulting model distribution as a teacher,” Cursor said. “This gives us a localised training signal for the behavior we want to change, while still retaining the broader RL objective over the full trajectory,” it added.
For example, when Composer 2.5 attempts to call a tool that is not available during a long rollout, it receives text feedback on the mistake: a hint such as “Reminder: Available tools…” is inserted into the context of the problematic turn.
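The mechanism Cursor describes (insert a hint, treat the hinted distribution as a teacher) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `toy_policy` is a hypothetical stand-in for a language model, the two-action vocabulary is invented, and the KL divergence from teacher to student is one plausible choice of distillation loss. Cursor has not published the loss it actually uses.

```python
import math

# Hypothetical two-action "vocabulary" for the problematic turn.
ACTIONS = ["call_missing_tool", "call_available_tool"]

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def toy_policy(context, bias):
    """Stand-in for a model: a hint in the context lowers the score of the bad action."""
    hinted = any("Reminder: Available tools" in msg for msg in context)
    logits = [bias - (3.0 if hinted else 0.0), 0.0]
    return softmax(logits)

def hint_distillation_loss(context, hint, bias):
    """Localised signal: the same policy, with the hint inserted, acts as the teacher."""
    student = toy_policy(context, bias)           # distribution without the hint
    teacher = toy_policy(context + [hint], bias)  # distribution with the hint in context
    return kl_divergence(teacher, student), student, teacher

context = ["user: fix the failing test", "assistant: (about to call a tool)"]
hint = "Reminder: Available tools…"
loss, student, teacher = hint_distillation_loss(context, hint, bias=1.0)
```

In this toy setup the loss is positive only while the unhinted policy still disagrees with the hinted one, which is the sense in which the signal is local to the turn being corrected. In a real system this term would sit alongside the broader RL objective rather than replace it.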
Composer 2.5 is also trained on 25 times more synthetic data (in the form of difficult coding tasks) than its predecessor. However, Cursor warned that the latest model is more susceptible to reward hacking as a consequence of training on synthetic tasks. “We were able to find and diagnose these problems using agentic monitoring tools, but they demonstrate the increasing care necessary for large scale RL,” it said.
Performance on benchmarks
Composer 2.5 matched leading AI models such as Anthropic’s Opus 4.7 and OpenAI’s GPT-5.5 when evaluated on benchmark tests such as SWE-Bench Multilingual (79.8 percent) and CursorBench v3.1 (63.2 percent).
However, Composer 2.5 is much cheaper to use per task as it is priced at $0.50 per million input tokens and $2.50 per million output tokens, a fraction of what Anthropic and OpenAI currently charge.
There’s also a faster variant with the same intelligence at $3.00 per million input and $15.00 per million output tokens. Composer 2.5 includes double usage for the first week.
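To make the per-token prices concrete, the following sketch computes the cost of a single task under both price points quoted above. The variant labels and the token counts are hypothetical and chosen only for illustration; they are not figures from Cursor.

```python
# Prices in US dollars per million tokens, as quoted in the article.
# The dictionary keys are hypothetical labels, not official model identifiers.
PRICES = {
    "composer-2.5": (0.50, 2.50),        # (input, output)
    "composer-2.5-fast": (3.00, 15.00),  # faster variant
}

def task_cost(variant, input_tokens, output_tokens):
    """Return the dollar cost of one task for the given token counts."""
    input_price, output_price = PRICES[variant]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Assumed workload: a long agentic task that reads far more than it writes.
standard = task_cost("composer-2.5", 2_000_000, 200_000)
fast = task_cost("composer-2.5-fast", 2_000_000, 200_000)
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}")
```

Because both the input and output prices of the faster variant are six times higher, the same task costs six times as much on it regardless of the mix of input and output tokens.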