Why AI is so resource hungry and how it will impact data centres

The rapid adoption of AI has raised many questions across the sector about its power consumption, with little hard information available to answer them. Ed Ansett, Founder and Chairman of i3 Solutions Group, explores how AI will place growing demands on data centres.

Ed Ansett, Founder and Chairman, i3 Solutions Group

At the end of 2023, any forecast of how much energy Generative AI will require is inexact. Headlines tend towards guesstimates of ‘5x, 10x, 30x power needed for AI’ and ‘enough power to run hundreds of thousands of homes’. Meanwhile, reports in specialist publications such as the data centre press talk of power densities rising to 50kW or even 100kW per rack.

Why is Generative AI so resource hungry? What moves are being made to calculate its potential energy cost and carbon footprint? Or as one research paper puts it, what is the ‘huge computational cost of training these behemoths’? Today, much of this information is not readily available.

Analysts have forecast their own estimates for specific workload scenarios, but with few disclosed numbers from the cloud hyperscalers at the forefront of model building, there is very little hard data to go on at this time.

Where analysis has been conducted, the carbon cost of AI model building, from training through to inference, has produced some sobering figures. According to a report in the Harvard Business Review, researchers estimate that training a single large language model such as OpenAI’s GPT-4 or Google’s PaLM emits around 300 tons of CO2. Other researchers calculated that training a medium-sized Generative AI model using a technique called ‘neural architecture search’ consumed electricity responsible for the equivalent of 626,000 pounds of CO2 emissions.

So, what’s going on to make AI so power hungry?

Is it the data set, i.e. the volume of data? The number of parameters used? The transformer model? The encoding, decoding and fine-tuning? The processing time? The answer is, of course, a combination of all of the above.

Data

It is often said that GenAI Large Language Models (LLMs) and Natural Language Processing (NLP) require large amounts of training data. However, measured in terms of traditional data storage, this is not actually the case.

For example, ChatGPT was trained on Common Crawl data (www.commoncrawl.org). Common Crawl describes itself as the primary training corpus in every LLM, and says it supplied 82% of the raw tokens used to train GPT-3: “We make wholesale extraction, transformation and analysis of open web data accessible to researchers… Over 250 billion pages spanning 16 years. 3-5 billion new pages added each month.”

It is thought that GPT-3 was trained on 45TB of Common Crawl plaintext, filtered down to 570GB of text data. The corpus is hosted on AWS for free as Common Crawl’s contribution to open source AI data.

But these storage volumes, the billions of web pages or data tokens scraped from the web, Wikipedia and elsewhere, then encoded, decoded and fine-tuned to train ChatGPT and other models, should have no major impact on a data centre. Similarly, the terabytes or petabytes of data needed to train a text-to-speech, text-to-image or text-to-video model should put no extraordinary strain on the power and cooling systems of a data centre built to host IT equipment that stores and processes hundreds or thousands of petabytes of data.

An example of a text-to-image data set comes from LAION (Large-scale Artificial Intelligence Open Network), a German AI project with billions of images. One of its data sets, LAION-400M, is a 10TB web data set; another, LAION-5B, contains 5.85 billion CLIP-filtered image-text pairs.

One reason that training data volumes remain manageable is that it has become the norm among AI model builders to use pre-trained models (PTMs) rather than training models from scratch. Two familiar examples of PTMs are Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT) series, as in ChatGPT.

Parameters

Another measure of AI training that is of interest to data centre operators is the parameter count. Parameters are the values a Generative AI model learns during training; broadly, the greater the number of parameters, the greater the accuracy of the model’s predictions. GPT-3 was built on 175 billion parameters, and the number is already rising rapidly: Wu Dao, a Chinese LLM that also offers text-to-image and text-to-video generation, uses 1.75 trillion parameters. Expect the numbers to continue to grow. With no hard data available, it is reasonable to surmise that the computational power required to train and run a model with 1.75 trillion parameters will be significant. As AI video generation takes off, the data volumes and parameter counts used in models will surge.
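None of the model builders disclose training compute, but the scaling-laws literature offers a common rule of thumb: total training compute is roughly 6 × N × D floating-point operations for a dense model with N parameters trained on D tokens. The sketch below applies it to GPT-3-scale figures; the GPU throughput and utilisation numbers are illustrative assumptions, not disclosed values.

```python
# Rough training-compute estimate using the ~6 * N * D FLOPs rule of
# thumb (N parameters, D training tokens). All hardware figures are
# illustrative assumptions, not vendor disclosures.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * params * tokens

def gpu_days(total_flops: float, flops_per_gpu: float = 312e12,
             utilisation: float = 0.4) -> float:
    """Convert FLOPs to GPU-days, assuming A100-class peak throughput
    (312 TFLOPS) and an assumed 40% hardware utilisation."""
    seconds = total_flops / (flops_per_gpu * utilisation)
    return seconds / 86_400

# GPT-3-scale inputs: 175 billion parameters, ~300 billion tokens
flops = training_flops(175e9, 300e9)
print(f"{flops:.2e} FLOPs, about {gpu_days(flops):,.0f} GPU-days")
```

The exercise shows why parameter counts matter to data centres: compute, and hence power draw, grows multiplicatively with both model size and data size.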

Transformers

Transformers are a type of neural network architecture developed to solve the problem of sequence transduction, or neural machine translation: any task that transforms an input sequence into an output sequence. Rather than processing tokens one at a time, transformer layers use self-attention, which lets every position in the sequence weigh every other position at once; during generation, each predicted token is fed back in as input to predict the next. This improves the predictive quality of what comes next in tasks such as speech recognition and text-to-speech.
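The self-attention operation at the heart of a transformer layer can be sketched in a few lines of NumPy. This is a minimal single-head version with made-up dimensions, not any production model's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every position attends to
    every other position, so the whole sequence is processed at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq, seq) affinities
    return softmax(scores) @ V                # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                       # illustrative sizes
X = rng.normal(size=(seq_len, d_model))       # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

The (seq, seq) score matrix is why attention cost grows quadratically with sequence length, one of the reasons large models are so compute-hungry.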

How much power is enough? What researchers, analysts and the press are saying

A report by S&P Global, titled POWER OF AI: Wild predictions of power demand from AI put industry on edge, quotes several sources. “Regarding US power demand, it’s really hard to quantify how much demand is needed for things like ChatGPT,” said David Groarke, Managing Director at consultancy Indigo Advisory Group, in a recent phone interview. “In terms of macro numbers, by 2030 AI could account for 3% to 4% of global power demand. Google said right now AI is representing 10% to 15% of their power use or 2.3 TWh annually.”

A calculation of the actual power used to train AI models was offered by RI.SE – the Research Institute of Sweden. It said: “Training a super-Large Language Model like GPT-4, with 1.7 trillion parameters and using 13 trillion tokens (word snippets), is a substantial undertaking. OpenAI has revealed that it cost them US$100 million and took 100 days, utilising 25,000 NVIDIA A100 GPUs. Servers with these GPUs use about 6.5kW each, resulting in an estimated 50GWh of energy usage during training.”
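The RI.SE figure can be sanity-checked with back-of-envelope arithmetic. The quote does not say how many GPUs fit in each 6.5kW server, so eight per server (typical of A100-class systems) is an assumption:

```python
# Sanity-check of the RI.SE estimate for GPT-4-scale training energy.
# The 8-GPUs-per-server packing is an assumption, not stated in the quote.
gpus = 25_000
gpus_per_server = 8          # assumed A100-class server configuration
server_kw = 6.5              # per-server draw from the quote
days = 100                   # training duration from the quote

servers = gpus / gpus_per_server              # 3,125 servers
power_mw = servers * server_kw / 1_000        # ~20.3 MW sustained
energy_gwh = power_mw * days * 24 / 1_000     # ~48.8 GWh
print(f"{energy_gwh:.1f} GWh")
```

The result lands within a few percent of the quoted 50GWh, suggesting the estimate is internally consistent. Note this excludes cooling and other facility overhead, so the data centre total would be higher.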

This is important because the energy used by AI is rapidly becoming a topic of public discussion.

Data centres are already on the map, and ecologically focused organisations are taking note. According to the site 8billiontrees, there are no published estimates yet for the AI industry’s total footprint, and the field of AI is exploding so rapidly that an accurate number would be nearly impossible to obtain. Looking at the carbon emissions of individual AI models is the gold standard at this time. The majority of the energy goes to powering and cooling the hyperscale data centres where the computation occurs.

Conclusion

As we wait for numbers on past and present ML and AI power use to emerge, what is clear is that once models move into production and widespread use, computation will reach exabyte and exaflop scale. For data centre power and cooling, that is when things become really interesting, and more challenging.

Intelligent Data Centres

View Magazine Archive