
Unifying Data, Simplifying AI: Data + AI Summit 2025 Recap

June 19, 2025

By Anastasiia D.

  • Data Management
  • Data Engineering
  • Databricks
  • Data & AI Summit


Earlier this month, the Data + AI Summit 2025, hosted by Databricks, took place at the Moscone Center in San Francisco. As one of the largest gatherings of data professionals, engineers, AI researchers, and enterprise leaders, the summit offered a comprehensive look into the evolving data and AI ecosystem.

The central theme of this year was bringing data and AI closer together, not just in tooling but in practice. Our colleague, Bill Sanders, was on-site to take in the keynotes, breakout sessions, and hallway conversations that make this event as much about people as it is about platforms. Here’s what stood out.

The Evolution of the Data + AI Summit

Databricks co-founder and CEO Ali Ghodsi opened the summit by tracing the conference’s decade-long journey — a story of exponential growth and a shift in industry focus from data wrangling to intelligent automation.

  • 10 Years Ago. The conference (then Spark Summit) drew over 3,000 attendees. The primary focus was data engineering and simplifying big data, following the then-recent release of Apache Spark 1.0.
  • 3 Years Ago. Attendance grew to over 5,000, and the focus shifted toward artificial intelligence, leading to the conference's renaming as the Spark + AI Summit. Ghodsi recalled live-coding demos with TensorFlow and generative AI from that era, reflecting the expanded focus on both data and AI.
  • Today. The conference hosts over 22,000 in-person attendees, making it a "citywide conference" and the largest data and AI conference in the world, with another 65,000 viewers joining online from 150 countries. The program includes 700 sessions and trainings, over 350 customer presentations, and 180 exhibitors.

While the event may now draw tens of thousands, its foundations remain deeply rooted in one principle — openness. And that’s where the conversation naturally turned next.

The Mission to Democratize Data & AI

Ali Ghodsi underscored what has always been at the core of Databricks’ growth: a deep commitment to open source and a mission to democratize data and AI.

Databricks has contributed to and supported some of the most widely adopted tools in the modern data stack:

  • Apache Spark — with more than 2 billion downloads, Spark remains the foundation for distributed data processing at scale.  
  • Delta Lake — with over 1 billion downloads, it introduced ACID transactions to data lakes, setting the groundwork for the lakehouse architecture.  
  • Apache Iceberg — with 360+ million downloads, and fully supported by Databricks since last year, advancing open table formats for performance and interoperability.  
  • MLflow — with over 300 million downloads, MLflow powers much of the experimentation and reproducibility in MLOps and generative AI development today.
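
To make that last bullet concrete, here is a minimal experiment-tracking sketch with MLflow, assuming the mlflow package is installed; the run name, parameters, and metric values are illustrative:

```python
import mlflow

# Everything logged inside this block is recorded against a single run.
with mlflow.start_run(run_name="demo-run"):
    # Hyperparameters (illustrative names and values).
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("max_depth", 8)

    # ... model training would happen here ...

    # An evaluation metric, so runs can be compared in the MLflow UI.
    mlflow.log_metric("rmse", 0.42)
```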

Beyond the numbers, Ghodsi reinforced the broader goal: making advanced data and AI capabilities available to as many people and organizations as possible, regardless of their infrastructure, cloud provider, or technical maturity.

This vision isn’t pursued in isolation. Ghodsi acknowledged the critical role of key partners in realizing it: AWS, Google Cloud, Microsoft Azure, Accenture, Deloitte, and the growing network of technology and marketplace partners.

He also highlighted the data providers available through the Databricks Marketplace, and the ongoing work to make Delta Sharing more seamless so that businesses can collaborate over live data.
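
As an illustration of what collaborating over live data can look like, here is a minimal sketch using the open source delta-sharing Python connector; the profile file and table coordinates below are placeholders that a data provider would supply:

```python
import delta_sharing

# A profile file from the data provider contains the sharing server
# endpoint and an access token (placeholder path).
profile = "config.share"

# Shared tables are addressed as <profile>#<share>.<schema>.<table>.
table_url = f"{profile}#sales_share.transactions.orders"

# Load the provider's live table straight into a pandas DataFrame.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```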

And at the center of it all: 15,000 customers using Databricks to power real-world outcomes. As Ghodsi put it, “That's the real impact. That's where the interesting things are happening.”


Barriers to Data and AI Adoption

While the momentum around data and AI has never been greater, many organizations — especially those with long histories and legacy systems — struggle with AI adoption. The excitement is there. The potential is clear. But the path forward remains blocked by deep architectural and organizational complexity.

"For data and analytics leaders, this moment presents an opportunity to re-evaluate architectural choices, explore new modes of user engagement, and consider emerging frameworks for modernization and AI adoption."

– Bill Sanders

Here’s the most common scenario Databricks encounters across industries: an ecosystem of fragmented, overlapping technologies that have grown organically over time, often without a cohesive plan.

Most enterprises deal with:

  • A mix of data warehouses, both on-prem and cloud-based
  • Massive data lakes filled with unstructured or semi-structured data
  • Custom-built real-time streaming systems
  • Bespoke ETL pipelines tied to legacy infrastructure
  • Multiple generations of BI tools, some in use for decades
  • New layers of data science platforms, ML models, and generative AI experiments added on top

Each component may solve a specific problem. But together, they form a brittle, disjointed architecture — difficult to evolve and expensive to maintain. This results in:

  • Operational drag — projects slow down, innovation stalls.
  • Runaway costs — with overlapping tools and redundant data movement.
  • Vendor lock-in — often invisible until it’s too late.

Ghodsi argued that the real problem lies beneath the surface — in the metadata.

Each system not only stores data but also metadata, access controls, security models, and governance logic. Enterprises don’t just manage data silos — they manage policy silos. This makes end-to-end visibility, governance, and control nearly impossible. Without a unified approach to data and metadata, AI becomes harder to operationalize — and even harder to trust.

These challenges mirror what we observe across many of our client engagements, where disconnected data pipelines, legacy BI tooling, and fragmented governance delay innovation. In our experience, optimizing data engineering tasks can result in up to 24% faster ML workflows.

The Lakehouse and Unified Governance

In response to the architectural and operational fragmentation, Databricks continues to advance a solution that is both technically rigorous and strategically bold: the Lakehouse.

It’s a concept introduced by Databricks five years ago — initially met with skepticism, particularly from traditional data warehousing vendors and cloud hyperscalers. But today, the model has gained widespread traction. And in Ghodsi’s view, it’s not just a compelling architecture — it’s a pragmatic path forward.

Step 1: Centralizing Data with Open Formats

The first move is deceptively simple: centralize data using open formats.

That means getting data out of proprietary systems and into affordable, open cloud object storage, such as Amazon S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage (GCS), using non-proprietary table formats like Delta Lake or Apache Iceberg.

The goal is not just interoperability. It’s control. The Lakehouse gives organizations full ownership of their data, while preserving performance and structure that bridges the historical gap between lakes and warehouses.
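
As a minimal sketch of this step, assuming a Spark session configured with the delta-spark package (the source path and bucket below are placeholders), landing data in an open format can be as simple as:

```python
from pyspark.sql import SparkSession

# Assumes the Delta Lake extensions are configured
# (e.g., installed via the delta-spark package).
spark = SparkSession.builder.appName("centralize-open-formats").getOrCreate()

# A CSV export stands in for data pulled out of a proprietary system.
df = spark.read.option("header", True).csv("/data/exports/orders.csv")

# Land it in open cloud object storage as a Delta table.
df.write.format("delta").mode("overwrite").save("s3://my-bucket/lake/orders")
```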

Step 2: Unified Governance

But open storage alone isn’t enough. The real breakthrough comes with governance — and more specifically, a unified governance layer that spans the entire data estate.

Key Idea 1: Govern All Data Assets

Modern governance must extend beyond databases. That includes:

  • Unstructured data (images, text, logs)
  • AI and ML models
  • Dashboards — which Ghodsi rightly pointed out are one of the easiest ways to leak sensitive information

The vision is lineage-aware governance: track and secure data across its full lifecycle in one consistent framework.

Key Idea 2: Unified Capabilities on Top of Data

True governance isn’t just about access permissions. It’s about enabling clarity, coordination, and control across teams. Unity Catalog, Databricks’ governance layer, is designed to support:

  • Discovery – so teams can easily find relevant datasets
  • Collaboration – by eliminating redundant work and resolving inconsistent metrics
  • Lineage Tracking – to understand how data moves and transforms
  • Business Semantics – defining authoritative data (e.g., what constitutes “revenue”)
  • Cost Management – centralized visibility into resource usage
  • Data Quality – surfacing issues early and systematically

While many solutions in the market (like Polaris) focus narrowly on access control for structured data, Databricks is positioning Unity Catalog as a governance platform for the full data + AI lifecycle.
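
As a rough sketch of what this looks like day to day on Databricks (all catalog, schema, table, and group names below are illustrative, and spark is the session handle predefined in a Unity Catalog-enabled workspace):

```python
# Unity Catalog uses a three-level namespace: catalog -> schema -> table.
spark.sql("CREATE CATALOG IF NOT EXISTS main")
spark.sql("CREATE SCHEMA IF NOT EXISTS main.sales")

# Governed access: grant an analyst group read access to a single table.
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")

# Lineage, audit, and discovery flow through the same layer, so
# permissions and usage are visible in one place.
```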

AI Inside the Lakehouse

Databricks extends the Lakehouse into a Data Intelligence Platform, where AI is embedded throughout to make data and insights accessible to everyone. The vision focuses on two goals:

  1. Democratizing Data Access – enabling users to query and understand data using natural language with two tools: Genie and Assistant.  
  2. Democratizing AI – empowering businesses to build their own intelligent agents that reason over proprietary data with Mosaic AI Agent Framework and classic ML & Generative AI.

By combining open data infrastructure with intelligent automation, Databricks aims to simplify the user experience and prepare organizations for the next wave of AI-native workflows. We've explored this in depth in our experiment on AI in frontend and backend engineering, where AI improved developer velocity across the stack.

A New Era for Enterprise Data

Jamie Dimon, CEO of JPMorgan Chase and one of the guest speakers, hammered home the message: “AI isn’t the hard part — data is.”

That message echoed throughout the Data + AI Summit 2025, where the spotlight was not on flashy AI demos but on the groundwork required to make AI viable at scale. The summit revealed a decisive shift in how enterprises approach data: moving beyond fragmented stacks and legacy dashboards toward simplified, AI-native platforms that emphasize usability, governance, and trust.

While some of the announcements will take time to translate into real-world impact, the direction is unmistakable: simplified experiences, governed access, and more intelligent consumption of data.


As data complexity and expectations around AI grow, data management takes center stage. The Janea Systems team brings over two decades of experience helping organizations manage and modernize their data infrastructure, from open-source platforms to enterprise-grade software.

We're proud to be one of Microsoft’s long-standing technical partners; our engineering team has co-developed mission-critical Microsoft projects.

Whether you're modernizing pipelines or adopting AI across the SDLC, our team has experience delivering intelligent systems that scale. We’ve explored AI-assisted development across the software development lifecycle and now apply those findings to our clients’ projects.

Get in touch – let's explore ways to simplify and scale your data architecture.
