June 04, 2025
By Hubert Brychczynski
Artificial Intelligence,
Software Engineering,
Generative AI,
Machine Learning,
Data Engineering,
DevOps
This is Part 2 of our engineering experiment: testing AI in machine learning, data engineering, and DevOps.
For a deeper dive into how AI impacts front end and back end development, read Part 1.
Anthropic CEO Dario Amodei recently wrote: “When a generative AI system does something, we have no idea, at a specific or precise level, why it makes the choices it does—why it chooses certain words over others, or why it occasionally makes a mistake.”
If even the CEO of an AI company says we don’t fully understand why generative AI does what it does, how are engineering teams supposed to make smart decisions about integrating it with their workflows?
At Janea Systems, we turned to experimentation. After working with AI-assisted coding for some time, we posed the inevitable question: how much is it actually worth?
We recruited four senior- to expert-level engineers for each of the following five domains: front end, back end, machine learning, data engineering, and DevOps. The twenty participants completed specific coding tasks with and without AI, then submitted quantitative and qualitative feedback on their performance and experience.
This is the second and final installment in our series discussing the experiment’s results.
Here’s a list of tasks that each domain expert tackled:
We suspect that performance gains across all five domains were influenced by engineers’ domain knowledge, tool familiarity, and prompt engineering proficiency. That impact, however, was particularly notable in the three domains discussed in this article.
This may also explain why the results in these three domains - especially DevOps - appear underwhelming compared to the two domains covered in Part 1.
Table 1 presents average self-reported assessments of expertise, tool proficiency, and prompt engineering familiarity across all domains.
Table 1: Engineer self-assessment
Figure 1 illustrates the domain-by-domain speedup. Machine learning and data engineering saw a 24% and 10% uptick, respectively. However, DevOps engineering experienced a decline, with AI slowing progress by 5%.
Fig. 1: Task performance improvement across domains
Figure 2 shows the proportion of AI-generated solutions that worked out of the box. Machine learning solutions worked in 87.5% of cases, followed by data engineering and DevOps at approximately 75% and 50%, respectively.
Fig. 2: Percentage of AI-generated solutions working out of the box
Even when AI-generated solutions in these three domains worked out of the box, none of them were perfect. In every case, engineers needed to spend additional time refining or adjusting the output to make it viable. The following figures show how much effort went into this fine-tuning process.
Figure 3 reflects how much time engineers spent refining AI-generated solutions, where “1” indicates extensive time spent and “5” indicates minimal time.
Fig. 3: Time spent improving AI-generated solutions
Figure 4 shows how many changes engineers made to AI-generated solutions, where “1” indicates few changes and “5” indicates many.
Fig. 4: Number of changes made to AI-generated solutions
The use of AI in machine learning offered familiar advantages, but also revealed distinct limitations. Use cases included initial code structure generation, neural network architecture scaffolding, and auto-suggestions. Here too, AI was best suited for generating boilerplate and supporting quick prototyping.
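To give a sense of what that boilerplate looks like, here is a minimal sketch of the kind of model scaffold an assistant typically produces on request. It is purely illustrative: the class name, layer sizes, and random smoke-test data are our own placeholders, not code from the experiment.

```python
import torch
import torch.nn as nn

class TabularClassifier(nn.Module):
    """Small feed-forward network of the sort AI assistants scaffold on request."""

    def __init__(self, in_features: int, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Quick smoke test on random data to confirm the shapes line up.
model = TabularClassifier(in_features=10)
logits = model(torch.randn(8, 10))
print(logits.shape)  # torch.Size([8, 2])
```

Scaffolds like this are cheap to generate and easy to review, which is exactly the territory where our participants found AI most useful.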
Machine learning was also the first domain where AI faltered frequently enough that continuous, critical human oversight became essential. AI-generated code, statistics, and data were often plainly wrong, illogical, broken, outdated, or inconsistent. Moreover, models struggled when handling large datasets and complex data scenarios.
Data engineers saw a modest 10% improvement when solving tasks with AI. This result, however, warrants context. Participants’ self-reported proficiency in the technologies used was intermediate, averaging just 2.875, even though their general domain expertise was high (4.0). Additionally, only one in four participants had studied prompt engineering. Both factors likely contributed to the more limited gains observed.
That said, AI did provide "technically correct" starter templates and accelerated solution validation. Even when AI suggestions fell short, engineers were able to leverage them to finish tasks faster. Nonetheless, outdated, incorrect, or incomplete AI-generated responses frequently prompted additional debugging, refinement, and iterative prompting.
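A typical starter template of this kind might look like the bare-bones extract-transform-load script below. The shape, file paths, and column names are our own illustration, not output from the experiment.

```python
import pandas as pd

# Hypothetical source and destination; replace with real locations.
SOURCE_CSV = "raw/events.csv"
TARGET_PARQUET = "curated/events.parquet"

def run_pipeline() -> None:
    # Extract: read the raw feed, parsing the timestamp column up front.
    df = pd.read_csv(SOURCE_CSV, parse_dates=["event_time"])

    # Transform: deduplicate, normalize column names, drop rows without a timestamp.
    df = df.drop_duplicates()
    df.columns = [c.strip().lower() for c in df.columns]
    df = df[df["event_time"].notna()]

    # Load: write the curated output as Parquet (requires pyarrow or fastparquet).
    df.to_parquet(TARGET_PARQUET, index=False)

if __name__ == "__main__":
    run_pipeline()
```

A template like this runs, which is presumably what “technically correct” meant in practice; the heavier work of schema handling, incremental loads, and error recovery still falls to the engineer.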
DevOps engineers reported the lowest domain expertise (3.25) and only moderate tool proficiency (3.5), with just one in four having studied prompt engineering. These factors likely impacted completion times, as participants needed to iterate prompts and spend additional effort verifying AI outputs.
Still, the engineers praised AI’s ability to generate infrastructure code snippets, standardize scripts through references to documentation, suggest best practices, and accelerate drafting of deployment pipelines for routine CI/CD steps.
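As an illustration of what such an infrastructure snippet might look like, here is a minimal sketch using Pulumi’s Azure Native provider to declare a resource group and a storage account. We are assuming Pulumi purely for the example; the resource names are placeholders and the code is not taken from the experiment.

```python
import pulumi
from pulumi_azure_native import resources, storage

# Placeholder names; a real project would parameterize these per environment.
resource_group = resources.ResourceGroup("demo-rg")

account = storage.StorageAccount(
    "demosa",
    resource_group_name=resource_group.name,
    sku=storage.SkuArgs(name=storage.SkuName.STANDARD_LRS),
    kind=storage.Kind.STORAGE_V2,
)

# Surface the generated account name as a stack output.
pulumi.export("storage_account_name", account.name)
```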
Those advantages, however, were consistently offset by extensive manual correction whenever AI responses proved overly generic or off-target. Certain recommendations also caused Azure configuration mismatches, requiring extra troubleshooting. Taken together, these issues compounded, and engineers often spent more time fixing AI-generated code than they would have spent writing it from scratch.
Machine learning, data engineering, and DevOps did not see anywhere near as dramatic an improvement from using AI as front end and back end. Machine learning and data engineering accelerated by 24% and 10%, respectively, while DevOps actually slowed down by 5%.
These gains are not insignificant, but they pale in comparison with the 66.94% improvement in front end and the 55.93% in back end.
The truth is that the more an engineer knows about their domain, the tools of the trade, and prompt engineering, the more effectively they use AI for coding. Investing in engineer education is the way forward.
What’s 24% today could be 50% tomorrow - and 80% the day after. We believe there’s always room to improve, and we have experience to back it up:
We re-engineered Microsoft Bing’s deep learning pipelines, making TensorFlow 50x faster and accelerating training by 7x.
We designed and implemented a future-proof data architecture with Delta Lake and SCD Type 2 tracking, enabling large-scale predictive modeling and AI analytics.
We enabled PyTorch support on ARM64 architecture, facilitating AI development on new Windows machines and AI applications at the edge.
We don’t just play with AI - we make it better.
Ready to discuss your software engineering needs with our team of experts?