NVIDIA Keeps The Performance Crown For AI Inference For The 6th Time In A Row

In the data center and on the edge, the bottom line is that the H100 (Hopper-based) GPU is up to four times faster than the NVIDIA A100 on the newly released MLPerf v2.1 benchmark suite. The A100 retains leadership in many benchmarks versus other available products, while the NVIDIA Jetson AGX Orin leads in edge computing.

Every three months, the MLCommons consortium coordinates the publication of new AI benchmarks for inference and training, and we share our perspective on the results. While NVIDIA ceded the absolute performance crown in the last training smack-down as it awaited the H100, it decisively retook the performance lead in inference processing with the H100 this time around, and we expect it to repeat that dominance in the training round three months from now. Let’s take a look.

The Results

As is typical with successive generations of NVIDIA GPUs, the Hopper GPU delivers 1.5 to 2.5 times the performance of the Ampere A100 on most AI benchmarks. But in the fast-growing world of Natural Language Processing (NLP), the H100’s Transformer Engine and software stack showed better than four times the performance of the A100 and handily beat the Chinese-built Biren BR104 GPU. NLP is extremely compute-intensive: training the largest models can take hundreds or even thousands of GPUs, and inference on those models is performance-hungry as well.
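For readers unfamiliar with how these throughput numbers are produced, MLPerf’s offline scenario is essentially a batched-throughput test: feed the system a large pool of samples and report samples processed per second. Here is a minimal sketch of that idea in Python; the stand-in model, batch size, and sample count are illustrative assumptions, not MLPerf’s actual LoadGen harness.

```python
import time

def offline_throughput(model, samples, batch_size=32):
    """Crude offline-scenario measurement: push all samples through in
    large batches and report samples processed per second."""
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        model(samples[i:i + batch_size])  # one batched inference call
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed

# Toy stand-in "model"; a real submission would run BERT on the accelerator.
dummy_model = lambda batch: [len(x) for x in batch]
queries = ["what kind of flower is this?"] * 10_000
print(f"{offline_throughput(dummy_model, queries):,.0f} samples/sec")
```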

Once again, NVIDIA’s mature software stack enabled it to submit results for all seven inference benchmarks, while its competitors cherry-picked the one or two benchmarks they have been tuning for customer projects. This is a critical point for companies evaluating new technologies, especially cloud service providers, since a) multi-model inference processing is quickly becoming critical, and b) the fungibility of a processor across a large number of workloads greatly simplifies IT management, reduces capital expenses, and dramatically improves utilization rates, increasing profits for CSPs.

Here is an example of how a multi-model inference workflow looks today for a natural language interface to a visual query. It takes nine AI models working together to answer the simple question, “what kind of flower is this?” You certainly don’t want to split that query across multiple chips, especially when you need the answer in milliseconds; a simplified sketch of such a pipeline follows below. We strongly encourage all the vendors to submit across all models in the MLPerf suite.
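To make the chaining concrete, here is a hedged sketch of what such a multi-model pipeline looks like in code. The stage functions are hypothetical stubs standing in for real neural networks, and the chain is abbreviated to five stages (speech recognition through text-to-speech) rather than the full nine; the point is that every hop adds latency against a millisecond-scale budget.

```python
import time

# Hypothetical stages of a multi-model voice query pipeline. Each stub
# stands in for a real neural network; the stage list is illustrative,
# not NVIDIA's exact nine-model demo.
def asr(audio):            return "what kind of flower is this?"  # speech-to-text
def nlu(text):             return {"intent": "identify", "subject": "flower"}
def image_classifier(img): return "gerbera daisy"
def answer_generator(intent, label): return f"It looks like a {label}."
def tts(text):             return b"<synthesized audio>"

def handle_query(audio, image):
    start = time.perf_counter()
    text = asr(audio)
    intent = nlu(text)
    label = image_classifier(image)
    answer = answer_generator(intent, label)
    speech = tts(answer)
    latency_ms = (time.perf_counter() - start) * 1000
    return speech, latency_ms  # the whole chain must fit the latency budget

speech, ms = handle_query(b"<mic input>", b"<camera frame>")
print(f"end-to-end latency: {ms:.2f} ms")
```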

Before we wrap up, we also want to note the significant performance improvements NVIDIA has realized for its edge platform, the Jetson AGX Orin.

Speaking of encouraging vendors, I have a message for those still MIA when it comes to these benchmarking efforts (I’m talking to you, AMD, AWS, Groq, SambaNova, Tenstorrent, ….). (Note that Intel, Google, and Graphcore have frequently submitted training results to MLPerf.) You can’t hide. You know you run these models all the time, both to support sales efforts and internally to identify performance enhancements (or bottlenecks). The ML community values openness and transparency. Get on board, or I will assume you don’t have a chip good enough to compete. If you think you do, prove it. (OK, I’m sure I just lost a lot of business!)

Speaking of new submissions, this round featured the first from SK Telecom’s SAPEON X220-Enterprise and from the Biren BR104 GPU out of China; the latter showed impressive image-processing performance with just 32GB of HBM2E memory. We look forward to seeing how these chips perform on the training benchmarks in a few months. We congratulate SK Telecom and Biren on their openness and transparency, and on excellent results.

(As an aside, the BR104 makes the US Government’s prohibition on A100 and H100 GPU shipments to China look a little odd to us; that horse has already left the barn.)

Power Efficiency Is Another Story

While the NVIDIA H100 is undoubtedly the fastest chip for AI, Qualcomm steals the show when it comes to power efficiency, a critical requirement for many edge and edge-cloud applications. Once again, the Qualcomm Cloud AI 100 bested all submissions for power efficiency, albeit only for image processing and NLP (BERT). The recent adoption of the Cloud AI 100 by a string of server vendors, including Dell, HPE, and Lenovo, is an early indicator of significant customer interest in using the Qualcomm platform for edge deployments. We have covered the Qualcomm submissions in more detail in another article here on Forbes and on our website.
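For those who want to see what the power-efficiency comparison boils down to, MLPerf’s power runs report measured throughput alongside measured wall power, and efficiency is simply the ratio of the two. A minimal sketch, using made-up placeholder numbers rather than actual MLPerf results:

```python
# Illustrative only: these figures are placeholders, not MLPerf results.
submissions = {
    "Accelerator A": {"samples_per_sec": 40_000, "system_watts": 400},
    "Accelerator B": {"samples_per_sec": 22_000, "system_watts": 150},
}

for name, s in submissions.items():
    efficiency = s["samples_per_sec"] / s["system_watts"]  # inferences/sec/W
    print(f"{name}: {efficiency:.1f} samples/sec per watt")
```

On these placeholder numbers, the slower accelerator is the more efficient one, which is exactly the pattern the Qualcomm results illustrate: raw throughput and performance per watt are different races.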

Conclusions

The story is pretty simple for now. In inference processing, NVIDIA wins the performance race (albeit with a chip that is not yet generally available) and Qualcomm wins the power-efficiency battle (albeit only for image and natural language processing to date). But stay tuned; new vendors will eventually see the value and, we hope, submit their results. Unfortunately, I suspect some of these organizations will adhere to the old adage, “Don’t release any benchmark you didn’t win!” In other words, the wall of shame listed above probably includes a lot of chips that just aren’t as good as their marketing hype, or as good as NVIDIA and Qualcomm.

I hope you aren’t growing tired of these MLPerf stories. I am not, as I firmly believe that the MLCommons organization, which has grown and evolved significantly over the last 2 years, is providing a valuable service to chip makers, server vendors, and their customers.

Disclosures: This article expresses the opinions of the authors, and is not to be taken as advice to purchase from nor invest in the companies mentioned. Cambrian AI Research is fortunate to have many, if not most, semiconductor firms as our clients, including Blaize, Cerebras, D-Matrix, Esperanto, FuriosaAI, Graphcore, GML, IBM, Intel, Mythic, NVIDIA, Qualcomm Technologies, Si-Five, SiMa.ai, Synopsys, and Tenstorrent. We have no investment positions in any of the companies mentioned in this article and do not plan to initiate any in the near future. For more information, please visit our website at https://cambrian-AI.com.
