
Code in a Suit of Armor: Thoughts on AI and Software Engineering

November 25, 2025

By Hubert Brychczynski

  • Artificial Intelligence,

  • Developer Tools,

  • Software Engineering


Coworkers at Janea Systems know Federico Zambelli as a Senior Data Engineer and an early adopter of generative AI. He was exactly who I needed for an article on the topic, so we set up a call for an interview. 

Keen as he is to try out new things and share his experience with specific tools and models, Federico struck me as anything but an AI evangelist. A few minutes into our conversation, it was clear that his vast experience had imbued him with realism. 

Federico is not only knowledgeable on the subject but also outspoken and opinionated. These qualities, which he recognizes in himself and credits to his Italian descent, make him a perfect candidate for a discussion of a topic as hot as generative AI and coding. 

We planned our original exchange for half an hour but ended up talking for more than twice that - long enough to fill two articles. This is the second one. 

What's Behind, What's Ahead

The previous article recounted Federico's thoughts on autocomplete, hallucinations, the impact of genAI on developer productivity and code value, and the paradoxes of using AI in coding. 

This time, we'll focus on his experience with specific LLMs and his views on AI coding tools and their relationship with LLMs, finishing with a discussion of agents and agentic mode. 

On Model Idiosyncrasies

"You may have noticed yourself," Federico starts, "that each model has a unique tone of voice and style by default." I couldn't agree more. Generative AI users saw this in close-up when GPT-4o was released to widespread criticism of its "sycophancy". Similarly, many decried (or even mourned) the loss of the chatbot's "personality" with the subsequent release of GPT-5. These out-of-the-ordinary incidents galvanized public opinion, but they reflect a broader trend: model behavior and performance vary. 

We’ll look at two that Zambo swears by for coding: Gemini and Claude. 

Gemini

Zambo praises Gemini for keeping its knowledge cutoff recent through frequent updates. Gemini also seems better at following instructions (except those related to the Model Context Protocol, which we'll get to later), generates more succinct code, and handles simple coding tasks well. Lastly, a generous free tier sweetens the deal, allowing two Pro requests per minute - enough for regular coding assistance. Vibe coders, however, might need to switch to the paid version. 

How does Claude compare?

Claude

Let's start with a caveat: at the time of our conversation, Claude 4.5 had just come out of the oven. Zambo took it for a spin at vibe coding but couldn't do the same with Gemini due to its free tier's limitations, so the comparison between the two is necessarily imbalanced. 

That being said, Zambo has two major grievances about the model: verbosity and high cost. The output looks good, even beautiful, and on the whole it might outperform Gemini's. Nevertheless, it's long-winded and far from production-ready, while its outward elegance hinders effective debugging. 

On the bright side, Claude excels at interacting with the variety of tools embedded in modern coding clients, including but not limited to the Model Context Protocol, which we'll explore in more detail below. That versatility makes Claude a more powerful addition to your coding environment, though it comes at considerable cost: the price per token is already quite high, and the more tools you connect, the more tokens you'll use. 

The Verdict

The choice between Claude and Gemini comes down to what you want to do. For those who value tool compatibility, don't mind a higher cost, and are inclined towards vibe coding, Claude is the better alternative. It can turn rough prompts like "Design an SQL model for this API" into workable solutions without much intervention on your part. 

If you don't feel comfortable handing an LLM the wheel and prefer to be more involved in the coding process, Gemini will fit the bill. It may not be able to vibe code a custom database from scratch on the free plan, but it will help you get there faster through a back-and-forth exchange - almost as fast as Claude in the same scenario, minus the cost. 

AI Coding Clients

When I messaged Federico before our call, he made a point about the distinction between large language models and AI coding tools. Large language models such as Claude or Gemini lie at the heart of AI-assisted coding: they do all the heavy lifting, translating developers' sometimes fuzzy prompts into hopefully decent code. In my mind, that meant AI coding clients were little more than interfaces. 

I couldn’t have been more wrong. Zambo set me straight at the first opportunity with a colorful analogy. To him, a good AI coding assistant is like an Iron Man suit: where the suit enables Tony Stark to get the most out of his intellectual prowess, a good coding assistant lets developers get the most out of the computational capacity of large language models. 

Model Context Protocol: Standardizing Agentic AI

AI coding clients like Cursor had already enabled LLMs to interact with external tools, but each client relied on its own implementation. Model Context Protocol (MCP) solved this by introducing a standardization layer. Rolled out by Anthropic in November 2024, MCP established a universal protocol based on JSON-RPC 2.0 that lets AI clients connect to external services without custom integrations. Engineers like Zambo see this development as significant: standardization lets developers plug in external tools while keeping tool calling programmatic under the hood, which makes interactions more constrained and predictable. 
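To make that concrete, here is a minimal sketch of the JSON-RPC 2.0 messages an MCP client exchanges with a server. The `tools/list` and `tools/call` method names come from the MCP specification; the tool name and arguments are made up purely for illustration.

```python
import json

# A JSON-RPC 2.0 request asking an MCP server which tools it exposes.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Once the client knows a tool exists, it invokes it with tools/call.
# The tool name and arguments below are hypothetical, not a real server's API.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_issues",
        "arguments": {"query": "open bugs", "limit": 5},
    },
}

print(json.dumps(call_request, indent=2))
```

Because every client and server speaks this same envelope, a tool written once works everywhere - that is the whole point of the standardization layer.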

MCP Lifecycle: From Ideation to Integration

While the Model Context Protocol remains relevant, many of the more popular solutions it championed have since been natively integrated into LLMs and AI coding clients. 

When we talked, Zambo mentioned two MCP servers he liked to use: Basic Memory and Sequential Thinking. Today, LLMs provide extended reasoning capabilities natively, and many AI clients have built-in memory features, eliminating the need to invoke additional scripts that ramp up token usage. 

We'll go through these examples anyway, as they offer a glimpse into how MCP can extend functionality beyond what's natively built into LLMs and their clients. 

MCP Use Case Examples

Note-Taking With Basic Memory

Basic Memory (Fig. 1) is an MCP server that allows the LLM to take, save, and retrieve notes on your behalf. 

Fig. 1: Basic Memory is used to update a note in Gemini CLI (source: Basic Memory official YouTube channel)


Suppose you've been brainstorming some ideas. Ordinarily, those ideas would be forgotten the minute you closed the chat window. With Basic Memory in the background, you can simply ask the LLM to write a note listing and summarizing everything you've discussed. And then… well, the LLM does just that. For you. No navigating to a note-taking app or the folder where you keep your notes; no manually creating a new note or copying and pasting anything, let alone typing it by hand. The LLM performs the equivalents of those actions. The next day, when you need a refresher, the ideas are all neatly structured in a Markdown note on your computer. 

Sequential Thinking For Enhanced Reasoning

Another MCP server Zambo tested is Sequential Thinking. It's a bit meta in that it instructs the LLM on how to think through problems better. Once you add Sequential Thinking to your setup and activate it, the chatbot starts breaking the challenge down and recording its thought process in JSON for you to inspect in real time or afterwards (Fig. 2).

Fig. 2: Claude is using Sequential Thinking to answer a user’s query (source: JeredBlu on YouTube)

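What such a recorded thought process might look like is easy to sketch: a list of JSON objects, one per reasoning step. The field names below are illustrative, not Sequential Thinking's exact schema.

```python
# Illustrative reasoning trace recorded as a list of JSON-style objects.
# The field names are made up; the real server's schema may differ.
trace = [
    {"step": 1, "total": 3, "thought": "Restate the problem in my own words."},
    {"step": 2, "total": 3, "thought": "Break it into subproblems and solve each."},
    {"step": 3, "total": 3, "thought": "Combine the partial results into an answer."},
]

# Because the trace is structured data, a client can render it live
# or let you review it after the fact.
for entry in trace:
    print(f"[{entry['step']}/{entry['total']}] {entry['thought']}")
```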

Exploring MCPs

Sequential Thinking had its (brief) moment. LLMs can now perform similar reasoning on their own, so adding an MCP server that does the same thing would be pointless. But new MCPs appear every day. If you're interested in exploring them, head to Pulse MCP or mcp-get, which collect hundreds of MCPs in a single place for you to browse and experiment with. 

MCP Overload: A Word of Caution

While MCP servers can be great because they expand LLMs' functionality beyond mere chatting, there are drawbacks. LLMs browse the available MCP servers in search of the ones that best fit a given query, and as they do so, the prompt bloats with all the server descriptions. Below a certain number of servers, that doesn't affect performance; but once the count climbs into the hundreds, the model can slow down significantly and make more mistakes as the excess contextual information confuses it. 

Janea Systems' engineers found a workaround for this challenge by developing JSPLIT, an intelligent agent framework that preprocesses MCP servers and preselects them for the LLM, which improves accuracy, speed, and cost efficiency in high-density scenarios. You can read more about it in our recent article.

From MCP to AI Coding Clients

Model Context Protocol expanded what AI coding clients could do. Beyond core coding assistance, MCP enabled integrations with external services—automatically posting to Slack when opening pull requests, querying databases, or triggering custom workflows—giving developers new ways to automate their development processes. 

AI Coding Tools: The Power of the Feedback Loop

The "selling point" of AI coding clients is twofold. First, they integrate seamlessly with IDEs such as VS Code, so working still feels like old times, except now you can chat with a 24/7 helper from within the VS Code interface; alternatively, you can interact with the AI via the command-line interface (Fig. 3).

Fig. 3: VS Code interface with chatbot, agentic, and CLI options (source: VS Code official website)


Second, AI coding clients engage in a feedback loop with the LLM: they feed the generated code back to the LLM for review. Oftentimes, the LLM will "spot" mistakes in the code and try to fix them. This process leads to frequent self-corrections and ultimately results in better code. Of course, you could do the same in the browser, painstakingly pasting the generated code back into the LLM and manually asking it for a review. But that would eat up your time and mental capacity, much like hand-crafting notes instead of having Basic Memory do it for you. That's the magic of agentic AI: the code is reviewed and iterated on without your involvement while you solve other problems. 
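The loop itself is simple to outline. Below is a deliberately minimal, hypothetical Python sketch of the generate-review-fix cycle such a client runs for you; the `llm` callable stands in for any model API, and nothing here is a specific client's actual implementation.

```python
def feedback_loop(llm, task, max_rounds=3):
    """Generate code, ask the model to review it, and iterate.

    `llm` is any callable mapping a prompt string to a response
    string -- a stand-in for a real model API, not a specific one.
    """
    code = llm(f"Write code for this task:\n{task}")
    for _ in range(max_rounds):
        review = llm(f"Review this code and list any bugs:\n{code}")
        if "no issues" in review.lower():
            break  # the model is satisfied; stop iterating
        # Otherwise, feed the critique back in and get a revised version.
        code = llm(f"Fix the issues below.\nIssues:\n{review}\nCode:\n{code}")
    return code
```

The `max_rounds` cap matters in practice: each round costs tokens, so real clients bound the loop rather than iterating until the model stops complaining.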

Zambo’s Recommended Coding Assistants

There are plenty of coding clients to choose from; features to look for include IDE integration, native MCP access, CLI support, and an internal feedback loop. Zambo swears by three main options: Continue and RooCode are open source with optional paid tiers (the former notably better in the feedback-loop department), while Claude Code, also a favorite, is closed-source and paid.

Overall, clients such as Continue or RooCode give LLMs a wireframe for acting more independently - so much so that the industry calls them "agentic", a reference to the latest AI buzzword: agents.

How Much Agency in Agents?

Silicon Valley has a knack for coming up with catchy names that sound ironic but aren't. Artificial "intelligence"; "hallucinations". Agents, however, stand out as a rare exception. Their name aptly reflects a narrow definition of agency: one devoid of self-determination. 

Agency in the basic sense involves the ability to orient oneself in an environment and take autonomous actions towards a goal. AI agents can do that – as long as the goal is provided. When told “buy me the cheapest iPhone 13 you can find”, an AI agent will proceed to plan and execute a series of steps to achieve the objective without anybody telling it to. 
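That narrow sense of agency can be captured in a toy sketch: a loop that plans and executes steps toward a goal it was handed, never one it chose. The functions below are hypothetical stand-ins for the LLM calls a real agent would make.

```python
def run_agent(goal, plan, execute):
    """Carry out a user-supplied goal: plan steps, then execute each one.

    The agent never chooses the goal itself -- agency without
    self-determination. `plan` and `execute` are hypothetical
    stand-ins for the LLM calls a real agent framework would make.
    """
    steps = plan(goal)  # e.g. "find listings", "compare prices", "buy"
    return [execute(step) for step in steps]
```

Everything interesting happens inside `plan` and `execute`; the agency itself is just this loop over someone else's objective.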

This kind of agency is useful but nowhere near that exhibited by humans. 

What It Means To Be Human

Perhaps the starkest way to illuminate the difference between an AI agent and a human is this: agents "want" to make you – not themselves - happy. They will even go as far as to make things up if that's what it takes to please you. 

Humans, on the other hand, want to break things, cross boundaries, "boldly go where no one has gone before". We're rebellious and irreverent. Do you think someone had to prompt the first person who came up with the dying-grandma workaround for wrenching forbidden information out of LLMs? And do you think that person had any "objective" in mind other than "challenge accepted"? Until LLMs start doing things for shits and giggles, we're still way ahead of back-propagation automatons. 

Which is not to say that these automatons are pointless. "Agent mode" is useful. After all, let's not underestimate the fact that you can ask an LLM to plan you an itinerary in the Maldives and it will do so of its own accord. That's nothing short of miraculous, even if it consumes the energy equivalent of a million hamburgers while you could do the same on one.

Frequently Asked Questions

What is the Model Context Protocol?

Model Context Protocol (MCP) is a standard that allows AI models to interact with external tools through a unified interface. It removes the need for custom integrations and enables predictable, structured tool calling across different coding environments.

What is agentic AI?

Agentic AI refers to systems that can plan and execute multi-step tasks toward a defined goal. They don't have independent intentions; instead, they take autonomous actions within boundaries set by the user or environment.

What is an AI coding client?

An AI coding client is a development tool that connects an LLM to your IDE or coding workflow. It manages context, integrates external tools, automates feedback loops, and helps the model generate, review, and refine code more effectively.
