Your Next AI Query May Travel Where the Power Is

The rise of electricity-guzzling data centers has forced the AI industry to get creative about finding power. One of the latest ideas: Build micro data centers next to utility substations and operate them in concert, shifting the computation around based on power availability.

That’s the approach Nvidia and its collaborators are taking in a new pilot project they plan to build later this year. They’ll construct about 25 of these small data centers, each ranging from 5 to 20 megawatts, across five utilities in the United States. If one substation is overloaded with power demand, or if there’s an outage, the compute will be shifted to a different data center near a substation that has spare capacity.

To develop the fleet, Nvidia is partnering with data center builder InfraPartners, real estate service provider Prologis, and the nonprofit EPRI (formerly known as the Electric Power Research Institute).

The project aims to demonstrate a new way for data centers to be more flexible and accommodating of electricity availability. It’s also a way for data center developers to quickly secure power from the grid—an increasingly precious commodity, even in small chunks.

“We started looking at how much [unused] power is available at individual substations, and what we found was that on average, like 5 MW is nominally available…max 20 MW,” says Ben Sooter, director of Agentic AI Initiatives and Distributed AI Architecture at EPRI.

That’s too small to interest most data center operators, but building several at that size and operating them as if they’re one larger one is useful, Sooter says. Plus, shifting compute away from over-burdened substations to those with more headroom can double the overall available power, he says.

“There are 55,000 substations in the U.S., and if they each have 5, 10, or 20 MW of spare capacity, that number adds up pretty fast,” adds Marc Spieler, senior director of energy at Nvidia.
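A quick back-of-envelope calculation shows why that number adds up. The sketch below simply multiplies the substation count by the per-site headroom figures quoted above; it is purely illustrative, since real headroom varies widely and many substations have none to spare.

```python
# Illustrative only: aggregate spare capacity if every U.S. substation
# had the 5-20 MW of headroom EPRI observed on average at some sites.
NUM_SUBSTATIONS = 55_000

def aggregate_gw(spare_mw_per_substation: float) -> float:
    """Total nominal spare capacity across all substations, in gigawatts."""
    return NUM_SUBSTATIONS * spare_mw_per_substation / 1_000

low = aggregate_gw(5)    # 275 GW
high = aggregate_gw(20)  # 1,100 GW
print(f"{low:.0f}-{high:.0f} GW of nominal headroom")
```

Even the low end of that hypothetical range dwarfs the capacity of any single gigawatt-scale AI campus, which is Spieler’s point.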

Building energy flexibility into data centers

Squeezing every spare megawatt out of the grid will become increasingly important as data center construction continues to ramp up. In the United States, where half of all new data centers are being built, data centers could consume 9 to 17 percent of electricity generation by 2030. That’s more than double the current use, according to EPRI’s estimates. Facilities that train AI models are being built at the gigawatt scale, drawing about the same amount of power as a midsize U.S. city.

As grid operators figure out how to accommodate such massive new loads, data center developers sometimes end up waiting up to a decade to get approved for a grid connection. In response, the developers are making incredibly bold decisions around power—moves that would have been unthinkable just two years ago.

Many are building their own gas power plants on site. Some are offering to pay for the cost of new transmission lines and other grid infrastructure. And a few are even investing in startup companies that are developing fusion and next-generation nuclear fission reactors, in the hope of meeting power needs a decade from now.

But there’s a lot more power available on the grid than is used day to day. U.S. grid operators use only about 53 percent of their generation capacity on average, according to a landmark 2025 report from Duke University’s Nicholas Institute for Energy, Environment and Sustainability.

That’s because the U.S. electricity supply was built to meet peak demand—periods of the highest energy use of the year, such as the hottest days of the summer. Those peak loads can be almost double the load on a mild-temperature day, and typically occur for less than 200 hours a year. The rest of the time, whole power plants sit idle.

If AI data centers can find a way to reduce or shift power consumption during these periods of peak demand, the extraordinary measure of building on-site power generation may not always be necessary. U.S. grids could provide an additional 76 GW—about 10 percent of peak demand—if large loads like data centers curtailed their power use just 0.25 percent of the time, according to a report from the Brattle Group published in March.
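It’s worth making the Brattle figures concrete: 0.25 percent of a year is only about 22 hours of curtailment, and 76 GW at 10 percent of peak implies a U.S. peak demand on the order of 760 GW. The sketch below works through that arithmetic; the variable names are my own, not the report’s.

```python
# Sanity-check the Brattle Group figures cited in the article.
HOURS_PER_YEAR = 8760

# Curtailing 0.25 percent of the time...
curtail_fraction = 0.0025
curtail_hours = curtail_fraction * HOURS_PER_YEAR  # about 22 hours per year

# ...unlocks 76 GW, stated to be about 10 percent of peak demand.
unlocked_gw = 76
implied_peak_gw = unlocked_gw / 0.10  # roughly 760 GW of U.S. peak demand

print(f"~{curtail_hours:.0f} hours of curtailment per year, "
      f"implying ~{implied_peak_gw:.0f} GW peak")
```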

Energy flexibility could also allow data centers to connect to the grid faster, because they wouldn’t have to wait for new power plants to be built. And placing small data centers right next to substations reduces the need for new grid infrastructure, such as power lines and poles, and upgraded transformers and switchgear. As a bonus, these substations already have fiber-optic lines for high-speed internet, Nvidia’s Spieler points out, so the small data centers can connect to those existing lines.

The inference advantage

The type of flexibility data centers can offer depends, in part, on the workload. The two main types of workload are AI training (the process of developing, say, a large language model or image generation model) and inference (using that model to, say, generate responses to users’ chatbot questions and requests for images).

Training requires huge data centers with tightly interconnected GPUs. For example, Meta’s Llama 3.1 405B model took about two and a half months to train on 16,000 GPUs. During training, updating all the model weights in sync at each step requires the GPUs to be connected via high-speed links, such as Nvidia’s NVLink and InfiniBand interconnects. It wouldn’t be practical to spread out AI training workloads among a fleet of mini data centers. On the bright side, because training takes months, it’s possible to pause for short periods of time to curtail energy use during peak demand.

Inference doesn’t require as many GPUs or as much fancy networking. Instead of a huge corpus of data, a single user’s query is fed into the model, and the model spits out the answer. No backpropagation is involved, so there’s no need for the large-scale gradient synchronization across GPUs that training demands. And so inference is amenable to smaller data centers. However, timing is key. When you ask an image generator for a picture of your face pasted onto a cute cat, you understandably expect to see the result right away. So rather than briefly pausing compute during peak demand, the energy flexibility can come through creatively shifting the workload to a different location.

“Inference is one of the few workloads that can be dynamically routed,” says Valerie Crafton, senior vice president of strategy and operations at modular data center company Mod42. “Which means that you can align the compute with wherever the power is actually available. That’s one unique piece that’s really driving the push for a lot of these smaller data centers where the power exists.”
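The routing logic Crafton describes can be sketched in a few lines: send each inference request to whichever micro data center’s substation currently has the most spare power. This is a minimal illustration of the idea, not Nvidia’s or Mod42’s actual scheduler, and all names and numbers in it are hypothetical.

```python
# Minimal sketch of power-aware inference routing (hypothetical, not a
# real product API): pick the site whose substation has the most headroom.
from dataclasses import dataclass

@dataclass
class MicroDataCenter:
    name: str
    capacity_mw: float   # substation headroom allocated to this site
    load_mw: float       # current draw

    @property
    def headroom_mw(self) -> float:
        return self.capacity_mw - self.load_mw

def route(fleet: list[MicroDataCenter], request_mw: float):
    """Send the request to the site with the most spare power, or None."""
    candidates = [dc for dc in fleet if dc.headroom_mw >= request_mw]
    return max(candidates, key=lambda dc: dc.headroom_mw, default=None)

# Hypothetical fleet of three substation-adjacent sites.
fleet = [
    MicroDataCenter("sub-a", capacity_mw=10, load_mw=9.5),
    MicroDataCenter("sub-b", capacity_mw=20, load_mw=12),
    MicroDataCenter("sub-c", capacity_mw=5, load_mw=1),
]
best = route(fleet, request_mw=2.0)
print(best.name)  # sub-b, with 8 MW of headroom
```

A real scheduler would also weigh network latency and GPU availability, but the core idea is the same: the compute follows the power.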

Both Nvidia and EPRI have been on a tear to demonstrate different kinds of data center flexibility. They’re calling their substation-based strategy “distributed inference.” Announced in February, the project aims to begin construction of the pilot fleet of small data centers by the end of 2026. Nvidia and EPRI estimate that compute workloads will need to be moved to a different substation only about 0.1 percent of the time.

Going micro in data center size is an idea that’s picking up speed. “We’re in this compute wave currently where everybody’s building these really large data centers—five gigawatt, mammoth things,” says Sooter. But “there’s a second compute wave coming,” involving much smaller data centers handling inference, he says. Tech companies are “really beating the drum on this because they see demand for inference compute really picking up in 2027,” he says.


Source

IEEE Spectrum AI - spectrum.ieee.org

View the original publication