Generative AI Breaks The Data Center: Data Center Infrastructure And Operating Costs Projected To Increase To Over $76 Billion By 2028


With the launch of Large Language Models (LLMs) for Generative Artificial Intelligence (GenAI), the world has become both enamored and concerned with the potential for AI. Holding a conversation, passing a test, developing a research paper, and writing software code are tremendous feats of AI, but they are only the beginning of what GenAI will be able to accomplish over the next few years. All this innovative capability comes at a high cost in terms of processing performance and power consumption. So, while the potential for AI may be limitless, physics and costs may ultimately be the boundaries.

Tirias Research forecasts that on the current course, generative AI data center server infrastructure plus operating costs will exceed $76 billion by 2028, with growth challenging the business models and profitability of emergent services such as search, content creation, and business automation incorporating GenAI. For perspective, this cost is more than twice the estimated annual operating cost of Amazon’s cloud service AWS, which today holds one third of the cloud infrastructure services market according to Tirias Research estimates. This forecast incorporates an aggressive 4X improvement in hardware compute performance, but this gain is overrun by a 50X increase in processing workloads, even with a rapid rate of innovation around inference algorithms and their efficiency. Neural Networks (NNs) designed to run at scale will be even more highly optimized and will continue to improve over time, which will increase each server’s capacity. However, this improvement is countered by increasing usage, more demanding use cases, and more sophisticated models with orders of magnitude more parameters. The cost and scale of GenAI will demand innovation in optimizing NNs and is likely to push the computational load out from data centers to client devices like PCs and smartphones.

For background, the vast majority of NN inferences today are executed on servers accelerated by Graphics or Tensor Processing Units (GPUs or TPUs), which are designed to perform the parallel math of matrix calculations. Each accelerator applies thousands of coefficient “parameters” (whose analogue is a synapse) to each “node” (whose analogue is a neuron). Networks are arranged in layers, where each layer consists of thousands of nodes, and each node has thousands of connections to nodes in the prior and subsequent layers. In LLMs, these nodes ultimately map to tokens, or text language objects and symbols. The history of previously generated tokens – such as a prompt and the subsequent generated response – is then employed to assign probabilities and choose one from among the most likely next tokens.

The next wave of LLMs, such as GPT-4, is being trained on massive data sets with the goal of creating neural networks estimated to exceed one trillion parameters. Today, a trained LLM must often run across multiple accelerators and multiple servers, which will drive costs up rapidly. Even smaller models with tens or hundreds of billions of parameters can easily exceed the memory capacity and performance of powerful, cloud-based GPU or TPU accelerators that are designed with large amounts of memory to run the algorithms efficiently.
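To see why a single model can spill across many accelerators, it helps to estimate the memory needed just to hold the weights. The sketch below is illustrative only: the 2 bytes per parameter (16-bit weights) and the 80 GB and 24 GB memory sizes are assumptions chosen for the example, not figures from the Tirias Research model, and the function name is hypothetical. It also ignores activations and other runtime memory, so real requirements would be higher.

```python
import math


def accelerators_needed(num_params, bytes_per_param, accel_mem_gb):
    """Minimum accelerator count required only to store the model weights.

    Ignores activations, caches, and framework overhead, so this is a
    lower bound rather than a deployment estimate.
    """
    weight_bytes = num_params * bytes_per_param
    accel_bytes = accel_mem_gb * 10**9
    return math.ceil(weight_bytes / accel_bytes)


# Assumed values: 16-bit weights (2 bytes each) and 80 GB per accelerator.
trillion_param_model = accelerators_needed(10**12, 2, 80)

# Assumed values: a 13 billion parameter model on a 24 GB device.
thirteen_b_model = accelerators_needed(13 * 10**9, 2, 24)

print(trillion_param_model, thirteen_b_model)
```

Under these assumptions, the weights of a one-trillion-parameter model alone occupy about 2 TB, far beyond any single accelerator, which is consistent with the multi-server deployments described above.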

To forecast the operating cost of GenAI, Tirias Research applies a Forecast Total Cost of Operations (FTCO) model of complex data center workloads on various hardware configurations. The FTCO model incorporates advances in technology, changes in end user demand, and changes to workloads like media streaming, cloud gaming, and machine learning (ML). In the case of GenAI, this means factoring in processing advances, which for the foreseeable future will continue to be driven by GPU accelerator technology; exponential increases in the data sets and the resulting number of parameters of trained NN models; improvements to model optimizations; and the insatiable demand for GenAI.

First, let’s address user demand. Today, GenAI is being used to generate text, software code, and images, along with emerging applications including video, sound, and 3D animation. In the future, these foundational capabilities will power increasingly sophisticated GenAI applications, including generating video entertainment, creating metaverses, teaching, and even generating processes for urban, industrial, and business applications. Today, OpenAI’s ChatGPT is rapidly approaching 2 billion monthly visitors, and Midjourney, the popular GenAI art community, has over 15 million users.

To forecast the demand, Tirias Research analyzed three foundational GenAI capabilities – text, imagery, and video – and segmented the emerging markets into ad-driven consumers, paid subscription users, and automated tasking. For text GenAI, demand for tokens, which are analogous to words or symbols, is forecast to exceed 10 trillion by the end of 2023, with over 400 million monthly active users concentrated in developed markets. By the end of 2028, the forecast estimates over 6 billion users, or about 90% penetration of the smartphone market, and over 1 quadrillion annual tokens, a 100X increase. For image GenAI, the increase is forecast to be significantly higher, at over 400X to over 10 trillion images. That growth is driven by the emergence of video, which will require producing sequences of thematically and visually connected images using more sophisticated image generation tools and prompting loops.
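The 100X and 400X multiples are easier to compare as year-over-year rates. The following is a minimal sketch that assumes smooth compound growth over the five years from the end of 2023 to the end of 2028; the forecast itself does not state that growth is evenly compounded, and the function name is hypothetical.

```python
def annual_growth_multiple(total_multiple, years):
    """Constant yearly multiple that compounds to total_multiple over years."""
    return total_multiple ** (1 / years)


YEARS = 5  # end of 2023 to end of 2028

text_yearly = annual_growth_multiple(100, YEARS)   # text tokens: 100X overall
image_yearly = annual_growth_multiple(400, YEARS)  # images: 400X overall

print(f"text tokens: about {text_yearly:.2f}X per year")
print(f"images:      about {image_yearly:.2f}X per year")
```

Under that assumption, token demand would need to grow roughly 2.5X every year, and image demand somewhat more than 3X every year, to reach the forecast totals.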

Second, let’s address the computational workload. GenAI models are improving in efficiency as an unprecedented amount of academic and business knowledge pours into the field of machine learning (ML) and GenAI. The quality of GenAI imagery and tokens varies across segments and by factors such as resolution and model size, with paid usage assigned to higher-quality outputs and a correspondingly higher utilization of data center compute resources. Projected workloads will combine demanding large models with more efficient, computationally optimized, smaller NNs. “The emergence of more efficient neural networks, trained by more sophisticated NNs, will be one of several forces that drive generative AI to more viable economics and lower environmental impact,” said Simon Solotko, Senior Analyst at Tirias Research and developer of the FTCO model. Massive parameter networks will be employed to rapidly train smaller networks that are able to run more cost-effectively and on distributed platforms including PCs, smartphones, vehicles, and mobile XR. HuggingFace recently demonstrated two newly trained ChatGPT-like LLMs, the 30 billion parameter vicuna-30B and the 13 billion parameter vicuna-13B, built on Meta’s LLaMA LLM and trained on ChatGPT user logs. This clever technique resulted in a ChatGPT-like LLM that can run on a single consumer device with responses that are not dissimilar to those of the larger models that trained it. Highly optimized models, or even simpler and more specialized ones, are expected to reduce data center costs at scale, both by reducing model sizes in the cloud and by pushing the workload out of the cloud entirely, enabling distribution of GenAI applications to smartphones and PCs.

Tirias Research forecasts 2028 data center power consumption of close to 4,250 megawatts, a 212X increase over 2023, at a total amortized server capital plus operational cost of over $76 billion in today’s dollars. This cost excludes the cost of the data center building structure but includes labor, power, cooling, ancillary hardware, and 3-year amortized server costs. The FTCO model is baselined on benchmarks of servers with 10 Nvidia GPU accelerators, which have a peak power of just over 3,000 watts and an operating power, at 50% average utilization, of just over 60% of peak. “Using high density 10 GPU servers provided by data center innovator Krambu, Tirias Research is able to benchmark multiple open-source generative AI models to derive the computational demands of future, higher-parameter models,” continued Mr. Solotko. The forecast includes insights into GPU and TPU accelerator roadmaps over the next five years and uses these roadmaps to compute the workload that could be accomplished by each server in each of the use cases – text, imagery, and video. Perhaps the FTCO model’s biggest insight is that there is an equilibrium – as workloads become more complex, and server performance improves by about 4X, the server throughput per token or image remains relatively stable year over year.
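The power figures above can be cross-checked with simple arithmetic. This is a rough back-of-the-envelope sketch, not the FTCO model: it rounds "just over 3,000 watts" and "just over 60%" down to 3,000 W and 0.60, and the server count additionally assumes that all 4,250 MW is drawn by today's baseline servers, ignoring cooling overhead and future hardware. That last assumption is not made in the forecast, so the count is an order-of-magnitude figure only. All function names are hypothetical.

```python
def implied_baseline_mw(forecast_mw, growth_multiple):
    """Starting power level implied by a forecast and its growth multiple."""
    return forecast_mw / growth_multiple


def server_operating_watts(peak_watts, fraction_of_peak):
    """Average draw of one server given its peak power and operating fraction."""
    return peak_watts * fraction_of_peak


def servers_for_power(total_mw, watts_per_server):
    """Server count if the whole power budget were consumed by servers."""
    return total_mw * 1_000_000 / watts_per_server


baseline_2023_mw = implied_baseline_mw(4250, 212)    # implied 2023 level
per_server_w = server_operating_watts(3000, 0.60)    # rounded-down inputs
server_count = servers_for_power(4250, per_server_w)

print(f"implied 2023 power: about {baseline_2023_mw:.0f} MW")
print(f"per-server draw:    about {per_server_w:.0f} W")
print(f"servers at 2028:    about {server_count / 1e6:.1f} million (rough)")
```

With these simplifications, the 212X multiple implies roughly 20 MW of GenAI data center power in 2023, and each 10-GPU baseline server draws on the order of 1,800 W in operation.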

As demand for GenAI continues to grow exponentially, breakthroughs in processing or chip design seem like long bets given the slowing of Moore’s Law. There is no free lunch – consumers will demand better GenAI output, and that will counteract efficiency and performance gains. As consumer usage increases, costs will inevitably increase. Mr. Solotko concludes, “We are just starting to understand the data center economics of machine learning. By modeling the entire cycle of demand, processing, and cost, we can discover what ultimately will shift the workload and economics in favorable directions. Moving compute down to the edge and distributing it to clients like PCs, smartphones and XR devices is on the critical path to lowering capital and operating costs.”

Companies began sounding the alarm about data center power consumption five years ago at the annual Hot Chips semiconductor technology conference, predicting that worldwide compute demand could exceed total world electric power generation within a decade. That was prior to the rapid adoption of GenAI, which has the potential to grow compute demand at an even faster rate. Technology enhancements alone will not overcome the processing challenges posed by the adoption of GenAI. Meeting them will require changes in the way that processing is performed, significant improvements in model optimization without a significant loss of accuracy, and new business models to cover the cost of the workloads that will still need to be processed in the cloud. These points will be covered in Part 2 of GenAI Breaks The Data Center: Moving GenAI To The Edge.
