
Solidigm Redefines How Storage Efficiency Is Measured for AI Workloads
As AI workloads devour data center power budgets, storage efficiency and optimization have become fundamental design requirements. In this conversation, I sat down with Solidigm’s Dave Sierra to unpack how storage teams should be thinking about watts, racks, and wear in 2025.
Dave, who is part of Solidigm’s Data Center Solutions Marketing team, walks us through the real crossover point between HDD and QLC flash for AI workloads, the architectural choices delivering measurable efficiency gains today, and why the metrics that mattered last year are being replaced by new KPIs tied directly to GPU utilization and power consumption.
Q1: Dave, since our 2024 discussion, what’s changed in data center compute efficiency? If you had to pick one KPI storage teams should optimize in 2025 (e.g., watts per TB, PB per rack, GB/s per GPU), what is it—and why?
I think last year it was an idea, but now it's a fact: there isn't enough energy to go around for AI. Accounting for every watt consumed in the data center is now a design imperative. To that end, one major efficiency story in 2025 is the mainstreaming of liquid cooling in the AI data center. With upcoming GPU systems exceeding 150kW per rack, liquid cooling is becoming essential for effective heat management, and it can also improve overall data center energy efficiency by 15% or more. Storage efficiency metrics will vary depending on usage, but in compute scenarios the best measure of energy efficiency is throughput per watt (GB/s per W). These workloads, particularly training, involve massive sequential data transfers for loading datasets, checkpoints, and model updates, where high bandwidth correlates directly with GPU utilization and overall system performance.
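To make that metric concrete, here is a minimal back-of-envelope sketch of how throughput per watt might be tallied for a single storage node. All drive counts and power figures are illustrative assumptions, not measured values or Solidigm specifications.

```python
# Back-of-envelope throughput-per-watt calculation for a storage node.
# All figures below are illustrative placeholders, not measured values.

drives_per_node = 24
seq_read_gbps_per_drive = 7.0      # sustained sequential read, GB/s per SSD (assumed)
active_power_w_per_drive = 20.0    # active power per SSD in watts (assumed)
node_overhead_w = 400.0            # CPU, NICs, fans, etc. (assumed)

total_throughput_gbps = drives_per_node * seq_read_gbps_per_drive
total_power_w = drives_per_node * active_power_w_per_drive + node_overhead_w

gbps_per_watt = total_throughput_gbps / total_power_w
print(f"{gbps_per_watt:.3f} GB/s per W")   # ~0.191 GB/s per W with these assumptions
```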
Q2: AI pipelines stress storage differently across pre-training, fine-tuning, and RAG. How do you advise customers to place hot/warm/cold data across TLC and QLC tiers so GPUs stay fed without over-provisioning compute or power?
It's apparent that AI and analytics are making data warmer. Managing more frequent data access across TLC and QLC comes down to striking a balance between performance, capacity, and efficiency. For pre-training, where speed and throughput are critical, you'd place the hot data, like training batches, on TLC to ensure GPUs are never starved, while larger but still warm datasets can sit on QLC to save power and space. For fine-tuning, which deals with smaller, targeted datasets, use TLC, with QLC taking care of archived checkpoints and reference data. And for RAG workloads, where latency is king, you'd want your most frequently accessed knowledge corpora on TLC while QLC stores the bulk of less frequently accessed but potentially relevant data. This tiered approach keeps data flowing smoothly without over-provisioning power or compute, keeping AI infrastructure efficient and scalable.
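As a rough illustration of that placement logic, a simple tiering rule might look like the sketch below. The dataset categories and access-frequency threshold are hypothetical examples, not a Solidigm recommendation.

```python
# Illustrative hot/warm tier-placement rule for AI pipeline data.
# Categories and the accesses-per-day threshold are hypothetical examples.

def place_tier(dataset_kind: str, accesses_per_day: float) -> str:
    """Return 'TLC' for latency/throughput-critical data, 'QLC' otherwise."""
    hot_kinds = {"training_batches", "fine_tune_set", "rag_hot_corpus"}
    if dataset_kind in hot_kinds or accesses_per_day > 100:
        return "TLC"   # keep GPUs fed with the hottest data
    return "QLC"       # capacity tier for warm datasets, checkpoints, bulk corpora

print(place_tier("training_batches", 5_000))    # TLC
print(place_tier("archived_checkpoint", 0.2))   # QLC
```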
Q3: On the architecture side, which choices have delivered the biggest real-world efficiency gains: EDSFF (E1/E3) density, NVMe-oF disaggregation, QLC consolidation, or firmware-level power management? If you can, quantify in servers removed, watts saved, or racks reclaimed.
To some extent, all of these factors can work together to improve efficiency in a modern data center. But a recent Solidigm and VAST Data paper highlights how QLC SSD consolidation paired with E1.L density is transforming data center efficiency, especially compared to traditional nearline HDDs. The study shows that high-density QLC SSDs, such as the D5-P5336 in the E1.L form factor, can reduce storage power consumption by 77% and reclaim 90% of the physical rack space compared to legacy HDD infrastructure. This consolidation eliminates the need for multi-tiered storage systems, offering a unified flash architecture with the scalability and efficiency that AI, ML, and other data-intensive workloads demand. For those scaling to multi-petabyte or even exabyte levels, this approach is the clearest path to reducing cost, energy, and infrastructure overhead.
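To see what those percentages can mean at facility scale, here is a hypothetical back-of-envelope translation. The baseline HDD figures are assumed purely for illustration; only the 77% and 90% reductions come from the cited study.

```python
# Hypothetical translation of the reported 77% power and 90% rack-space
# reductions into absolute terms. Baseline values are assumed, not measured.

hdd_power_kw = 100.0        # assumed storage power for an HDD deployment
hdd_racks = 50              # assumed racks for the same usable capacity

qlc_power_kw = hdd_power_kw * (1 - 0.77)   # 77% less power
qlc_racks = hdd_racks * (1 - 0.90)         # 90% less rack space

print(f"Power: {hdd_power_kw:.0f} kW -> {qlc_power_kw:.0f} kW")  # 100 kW -> 23 kW
print(f"Racks: {hdd_racks} -> {qlc_racks:.0f}")                  # 50 -> 5
```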
Q4: Write amplification and endurance are top of mind for AI data lakes and vector indexes. How are innovations like storage acceleration software, smarter caching, or autonomous power states reducing both power draw and wear—and what proof points should operators ask vendors to show?
Solidigm's Cloud Storage Acceleration Layer (CSAL) is a game-changer for AI data lakes and vector indexes. CSAL reduces the write amplification factor (WAF) to near 1.0 and improves QLC endurance through smart data placement and efficient write handling. It is free-to-use software that delivers smarter caching to prioritize key data and autonomous power states for Solidigm SSDs. Together, these innovations lower wear, boost performance, and reduce energy draw during idle periods. Operators should ask for proof points like measured WAF reduction, watts per terabyte, and cache hit rates to validate how these features translate into real-world efficiency gains.
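For operators gathering those proof points, the underlying arithmetic is straightforward. The sketch below shows how WAF, watts per terabyte, and cache hit rate are derived from basic telemetry; the input values are placeholders, not output from CSAL or any specific drive.

```python
# How the requested proof points are computed from basic telemetry.
# All input values are placeholders, not CSAL output or drive specifications.

host_writes_tb = 120.0     # data written by the host over the period (assumed)
nand_writes_tb = 126.0     # data actually written to NAND over the period (assumed)
avg_power_w = 18.5         # average drive power over the period (assumed)
capacity_tb = 61.44        # drive capacity (assumed)
cache_hits, cache_lookups = 9_200_000, 10_000_000   # assumed cache counters

waf = nand_writes_tb / host_writes_tb            # ~1.05, near the ideal 1.0
watts_per_tb = avg_power_w / capacity_tb         # ~0.30 W per TB of capacity
cache_hit_rate = cache_hits / cache_lookups      # 92%

print(f"WAF={waf:.2f}  W/TB={watts_per_tb:.2f}  hit rate={cache_hit_rate:.0%}")
```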
Q5: Operators are optimizing for power, space, and service time. From a purely operational lens, where do you see the HDD-to-flash crossover in 2025 for AI and analytics if the goal is maximum throughput per watt and petabytes per rack?
There are a couple of factors at play here, not the least of which is a pronounced and well-documented nearline HDD supply shortage. That supply situation alone has directed more attention to high-capacity SSDs for AI and analytics. When it comes to meeting performance and density needs, our 122TB SSD delivers more than 20x the throughput and nearly 10x the petabytes per rack of a 30TB HDD. These advantages translate directly into real-world cost benefits, whether that's reduced energy consumption, far fewer racks to manage, or less space to fill with HVAC and power infrastructure. AI customers are increasingly motivated to have the QLC SSD crossover conversation, and as they do, we see an audience that's now much more receptive to a TCO-driven analysis versus legacy storage.
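For readers who want to run their own crossover numbers, here is a small helper that computes the two figures of merit from this question, petabytes per rack and throughput per watt, for any candidate configuration. The example inputs are placeholders chosen for illustration; they are not the configuration behind the throughput and density figures quoted above.

```python
# Compute PB per rack and GB/s per W for a candidate drive/rack configuration.
# Example inputs are placeholders; substitute your own chassis and power numbers.

def rack_metrics(drive_tb, drive_gbps, drive_w, drives_per_rack, rack_overhead_w):
    pb_per_rack = drive_tb * drives_per_rack / 1000
    gbps_per_w = (drive_gbps * drives_per_rack) / (drive_w * drives_per_rack + rack_overhead_w)
    return pb_per_rack, gbps_per_w

# Hypothetical high-capacity QLC SSD rack (illustrative values only).
pb, eff = rack_metrics(drive_tb=122, drive_gbps=7.0, drive_w=25.0,
                       drives_per_rack=500, rack_overhead_w=3000)
print(f"{pb:.0f} PB/rack, {eff:.3f} GB/s per W")   # 61 PB/rack, ~0.226 GB/s per W
```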