October 09, 2025
By Hubert Brychczynski
Generative AI, Software Development
How many generative AI tools for coding are there? Depends on the perspective. If you look solely at models, the options boil down to three major players: ChatGPT, Claude, and Gemini. Then there are tools like GitHub Copilot or Cursor, which integrate those major LLMs under a single interface.
If you care about the engine more than the bells and whistles, this article is for you. The idea is to compare the three major LLM providers with respect to coding assistance, in the hope of helping you choose the right foundation for your workflow. A breakdown of tools like Cursor or Copilot is a topic for another time.
Full disclosure: I'm worried this article might get outdated in the time it takes you to read it. Apparently, that is the pace of progress we're dealing with in the AI industry right now. When I was still ideating, Claude was in version 4.1, which had premiered on August 5, 2025. Then, less than two months after that, Claude 4.5 dropped on September 29 with the bombshell that it had spent 30 hours building a Slack clone by itself.
So, you can imagine my apprehension about writing a rundown of the current AI models and their applications in coding.
Nevertheless, I will do my best to give you the latest information - at least as of now, which is (checks watch) 2:30pm Central European Time on October 9, 2025.
Engineers at Janea Systems have never shied away from using large language models to assist in and accelerate coding. In fact, we have even run in-house experiments to see how exactly the use of generative AI (genAI) is impacting software development at the company.
We have broken down these experiments in a series of three articles, complete with tables, graphs, and tangible numbers to give you a sense of what was happening at the intersection of artificial intelligence and programming. If you think these experiments were a no-brainer and AI invariably added value to the work, you'd be wrong. At least at the time of writing those reports, AI wasn't unequivocally beneficial in every use case we tried it on.
"Time" is the key word here. We ran and reported on those experiments just a few months ago, but when I reached out to one of the developers behind the project to dig up some additional insights for this write-up, he said they were already outdated.
What is the situation now?
ChatGPT's Codex is a cloud-based software engineering agent that handles writing features, fixing bugs, answering codebase questions, and proposing pull requests for review. Operating in a secure, isolated cloud sandbox, Codex can read and edit files, run commands, and work on multiple tasks in parallel. According to OpenAI, tasks typically complete in 1 to 30 minutes, with real-time progress monitoring and verifiable evidence through terminal logs and test outputs. You can also guide Codex with AGENTS.md files, and there's a lightweight open-source Codex CLI for use in the terminal, for which OpenAI released the codex-mini-latest model.
What sets Codex apart is that it's built on the specialized codex-1 model, optimized for software engineering and asynchronous delegation. The security profile is also impressive: analysis by Sonar reveals that GPT-5 produces the lowest vulnerability density among leading models, at just 0.12 vulnerabilities per 1,000 lines of code (KLOC).
The drawbacks? In comparative testing by YouTube creator Cole Medin, Codex proved significantly slower, taking 1 hour and 20 minutes to build a Stripe implementation, which Claude completed in 15 minutes. In addition, according to Sonar, GPT-5 generates substantially larger, more complex code than any other model tested, with the highest cyclomatic and cognitive complexity scores. This verbosity creates high technical debt and maintainability challenges, plus a notably high code smell density (25.28 per KLOC).
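For readers unfamiliar with per-KLOC figures like the ones quoted from Sonar, here is a minimal sketch of how such a density is computed. The function name and the sample counts are illustrative assumptions chosen to reproduce the quoted magnitudes; they are not Sonar's raw data or tooling.

```python
def density_per_kloc(issue_count: int, lines_of_code: int) -> float:
    """Return the number of issues per 1,000 lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return issue_count * 1000 / lines_of_code


# Hypothetical counts for a 100,000-line sample (assumed, not measured).
vulnerability_density = density_per_kloc(12, 100_000)
code_smell_density = density_per_kloc(2_528, 100_000)

print(f"vulnerabilities per KLOC: {vulnerability_density:.2f}")
print(f"code smells per KLOC: {code_smell_density:.2f}")
```

Because the metric is normalized by code size, a model that writes more lines for the same task can show a low density while still producing more issues in absolute terms, so it is worth reading densities alongside total line counts.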
Claude Code is Anthropic's agentic coding tool powered by their latest model, Sonnet 4.5. Capable of building features from descriptions, debugging, codebase navigation, and large refactors, it can also do things like edit files, run commands, or create commits. In addition, agentic search enables the model to understand an entire codebase without manual context selection. Available integrations include a VS Code Extension (Beta) and JetBrains IDE support. Finally, Claude uses the Model Context Protocol (MCP) to pull information from Google Drive, Jira, Figma, and Slack.
Anthropic positions Claude as a leader for agents, coding, and computer use, pointing to benchmarks such as 77.2% on SWE-bench Verified and 61.4% on OSWorld. The company states the model excels at extended autonomous operation, maintaining clarity over hours-long tasks, with Extended Thinking mode designed to boost complex reasoning quality. In Cole Medin's testing, Claude demonstrated strong speed advantages in specific feature-building tasks.
The benefits come at a cost—quite literally. Claude Code is entirely closed-source and only available with paid tiers or standard API pricing, which can add up with high-volume use.
Another cost is verbosity. Per Sonar's earlier analysis, Claude Sonnet 4 (the predecessor to 4.5) demonstrated a "highly verbose personality," generating almost twice as many lines of code for the same solution as GPT-4o (OpenAI's leading model at the time), and 30% more than its own predecessor, Claude 3.7. It's not yet clear whether Sonnet 4.5 has improved on this front, but given the rising trend across successive versions, developers should be aware that it may produce significantly more code than other models for equivalent functionality.
Google's Gemini Code Assist comes in three editions: Individuals (free), Standard, and Enterprise. It is also the only option in the list without a dedicated front-end for coding. Instead, it provides conversational assistance in VS Code, JetBrains, and Android Studio using opened file context, with automatic code completions and full function generation. It is also available as an open-source CLI, handling file manipulation, command execution, and troubleshooting. Its Agent Mode supports complex, multi-step tasks, integrates via Model Context Protocol (MCP), and allows GitHub users to automatically review pull requests with suggested changes. The underlying models are Gemini 2.5 for the chat interaction and a tailored version of Gemini 2.0 for coding.
In terms of advantages, Gemini’s CLI is the only major, fully open-source solution with generous free limits—1,000 model requests daily in the free tier for individuals. In addition, testing by Amanda Caswell at Tom's Guide showed Gemini 2.5 Pro excelling in agentic task simulation, structuring tutorials to mirror realistic IDE workflows.
Limitation-wise, the free tier isn't available for Google Workspace accounts, minors, or certain locations/networks. The YouTube channel Fireship also noted that the CLI experience can be "rougher around the edges" than that of Gemini's competitors.
Although I'd like to tell you otherwise, there isn't a clear winner. The available data shows trade-offs, not a hierarchy. Claude demonstrated faster task completion in limited testing but tends toward verbose, complex code. Gemini showed strength in workflow simulation; Codex offers the strongest security profile at the cost of speed.
What matters is your specific context. What languages or fields are you working in? Do you value concise code or comprehensive solutions? Is cost a factor? Are you building greenfield projects or maintaining legacy systems? Is security paramount?
If you have access, try all three on a representative task from your actual work. The "best" tool is the one that fits your workflow, codebase, and brain.
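If you want to make such a trial a little more systematic, one option is a small harness that runs the same prompt through each tool and records how long it took and how much code came back. The sketch below is only an illustration: `compare_assistants`, `count_code_lines`, and the two stub generators are hypothetical names, and the stubs stand in for whatever wrapper you write around the tool you are actually evaluating.

```python
import time
from typing import Callable, Dict


def count_code_lines(source: str) -> int:
    """Count the non-blank lines in a generated solution."""
    return sum(1 for line in source.splitlines() if line.strip())


def compare_assistants(
    task: str, assistants: Dict[str, Callable[[str], str]]
) -> Dict[str, Dict[str, float]]:
    """Run one task through each assistant; record wall-clock time and size."""
    results: Dict[str, Dict[str, float]] = {}
    for name, generate in assistants.items():
        start = time.perf_counter()
        output = generate(task)
        elapsed = time.perf_counter() - start
        results[name] = {
            "seconds": elapsed,
            "code_lines": count_code_lines(output),
        }
    return results


# Placeholder generators (assumptions): swap in your own wrappers around
# the assistants you want to compare.
def stub_a(task: str) -> str:
    return "def add(a, b):\n    return a + b\n"


def stub_b(task: str) -> str:
    return "def add(a, b):\n\n    result = a + b\n    return result\n"


if __name__ == "__main__":
    report = compare_assistants(
        "Write an add function", {"tool_a": stub_a, "tool_b": stub_b}
    )
    for name, metrics in report.items():
        print(f"{name}: {metrics['seconds']:.4f}s, {metrics['code_lines']} lines")
```

Time and line count are only rough proxies. They say nothing about correctness, security, or maintainability, so treat the numbers as a starting point for reviewing the generated code yourself.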
Important caveats: These comparisons are based on limited testing scenarios, and individual results will vary significantly based on programming language, project type, specific use cases, and personal workflow preferences. All information is accurate as of October 9, 2025 - check vendor websites for current features and pricing.
The leading generative AI models for coding are OpenAI’s GPT-5 with Codex, Anthropic’s Claude Sonnet 4.5 with Claude Code 2.0, and Google’s Gemini 2.5 with Code Assist. Each enables developers to write, debug, and refactor code through natural-language interaction, often integrating directly with IDEs or command-line tools.
A model is the underlying engine that generates code, text, or other content based on training data and fine-tuning. An interface is the tool or environment that allows users to interact with that engine. For example, ChatGPT, Claude, and Gemini are models, while GitHub Copilot or Cursor are interfaces that integrate one or more of these models into development environments.
GPT-5 Codex is optimized for security and precision, producing verifiable outputs but often generating verbose, complex code. Claude Sonnet 4.5 with Claude Code 2.0 is designed for speed and extended reasoning, performing well in long, complex tasks but available only through paid tiers. Gemini 2.5 with Code Assist emphasizes openness and accessibility, offering generous free limits and open-source tools but lacking the polished experience of its competitors.