MLPerf’s Breakout Year: Benchmarking Expands with AI

October 13, 2025

As AI spreads into every corner of the technology ecosystem, the industry is under mounting pressure to measure real performance consistently and transparently. Whether it's hyperscale training runs, edge deployments, or domain-specific AI applications like automotive, the need for shared benchmarks is coming into sharper focus.

At the AI Infra Conference in Santa Clara, Jeniece Wnorowski and I sat down with David Kanter, Founder of MLCommons and Head of MLPerf, for a Data Insights conversation that captured a pivotal moment for the benchmarking community. 2025 has been, as Kanter described it, “the summer of MLPerf.”

Benchmarking Matures as AI Expands  

“This has actually been a really breakout year for us,” Kanter said. “...In the last couple months, I was jotting an e-mail down to the team and I was describing it as the summer of MLPerf.”  

The momentum is tangible. He went on to describe new initiatives that reflect how AI adoption is spreading into new domains.

“We talked about MLPerf Storage. We also have MLPerf Automotive, which came out very recently, [and] MLPerf Client. And so, part of this is, as we’re seeing AI being adopted in more and more places, we have to come in and help fill those gaps. Storage was sort of us working with some storage folks to spot some coming challenges and...automotive was sort of in our response to the automotive folks saying, ‘You know, OK, we are going to be using more AI, we’re making more intelligent vehicles, we need to get our hands around this.’”

These expansions reflect the growing role of benchmarking beyond traditional training and inference in hyperscale environments. Storage and automotive, in particular, highlight the diversification of AI workloads across industries.  

Still in “Benchmarking Grade School”  

Kanter offered a candid perspective on MLPerf’s age compared to other well-established benchmarks.

“Some of the most established benchmarking organizations that we look up to are 30 years old and they’ve been honing their craft,” he said. “We’re seven years old...We’re still in grade school. The babies of benchmarks.”

That self-awareness underscores the pace at which MLPerf is evolving. Unlike older benchmarks that matured slowly over decades, MLPerf operates in a landscape where new accelerators, models, and deployment modalities emerge every quarter.

“Just keeping a pace of things is both exciting, a little stressful, and a bit of a challenge,” he said.  

Building Community Through Standards  

Kanter emphasized that MLCommons isn’t just about benchmarking—it’s a volunteer-driven community.  

“I always say to anyone...if any of these resonate with you, please show up. We are a community of volunteers...It came together with a bunch of folks who just saw a problem and said we all want to solve it together.”  

Beyond MLPerf, he stressed that MLCommons works on a broad range of data-centric projects, including AI risk and reliability, research, and foundational datasets and schemas.

“How do we standardize making data accessible to AI?” he said. “How do we make AI more reliable and responsive to what humans want?”  

That openness has been key to MLPerf’s rise as the de facto performance yardstick across the AI ecosystem. Submissions from vendors large and small now shape how the industry evaluates real-world performance.  

Why Benchmarks Matter Now  

As enterprises wrestle with where to deploy AI—in centralized facilities, at the edge, or in hybrid environments—benchmarks like MLPerf give them objective tools to evaluate trade-offs. They’re also increasingly relevant for sustainability strategies, allowing organizations to understand performance per watt or per dollar across platforms.  
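
For readers who want to turn raw benchmark numbers into those comparisons, the arithmetic is straightforward. The sketch below uses purely hypothetical throughput, power, and price figures (not actual MLPerf submissions) to show how results can be normalized to performance per watt and per dollar.

```python
# Minimal sketch: normalizing hypothetical benchmark results to
# performance per watt and per dollar. All numbers are illustrative
# placeholders, not actual MLPerf submissions.

systems = [
    # (name, throughput in samples/sec, avg power in watts, system price in USD)
    ("System A", 12_000, 3_500, 250_000),
    ("System B", 9_500, 2_400, 180_000),
]

for name, throughput, watts, price in systems:
    perf_per_watt = throughput / watts      # samples/sec per watt
    perf_per_dollar = throughput / price    # samples/sec per dollar
    print(f"{name}: {perf_per_watt:.2f} samples/s/W, "
          f"{perf_per_dollar * 1_000:.2f} samples/s per $1k")
```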

In an era of rapid infrastructure buildout and diversification, shared benchmarks provide a common language for vendors, operators, and developers alike.  

TechArena Take  

Benchmarking is becoming a strategic force in AI infrastructure. What started as a niche performance measurement initiative has grown into a foundational layer that shapes how chips are designed, systems are built, and deployments are optimized.

David Kanter’s reflections highlight both the rapid maturity of MLPerf and the youthful dynamism of a community racing to keep up with AI’s evolution. As AI spreads into storage systems, automotive environments, and edge devices, the role of shared, open benchmarks will only deepen.  

The bottom line: In the AI era, you can’t scale what you can’t measure. And MLCommons is ensuring the industry has the tools—and the community—to do just that.

Watch the full podcast | Subscribe to our newsletter
