At Huawei Innovative Data Infrastructure Forum 2024 in Baku, themed ‘Data Awakening: Building Leading AI-Ready Data Infrastructure’, Dr. Peter Zhou, Vice President of Huawei and President of Huawei Data Storage Product Line, stole the show with a speech about ‘Redefining Data Storage in the Data Awakening Era’.
Dr. Zhou envisioned the future of data storage as being driven by multiple capabilities, including ultra-performance, data resilience, new data paradigm, scalability, sustainability and data fabric.
With the rise of generative AI, the demand for robust data storage solutions has become even more critical in today’s technological landscape. As the cluster scale of large AI models has grown to include tens of thousands, and even hundreds of thousands, of GPUs, this expansion has resulted in more frequent cluster faults and training interruptions.
The lengthy process of repeatedly writing checkpoint data and resuming training extends the idle time of computing cards, causing cluster utilisation to drop below 50%. What’s more, by 2026 the power consumption of global data centers is expected to reach 2.3 times that of 2022, equivalent to the annual power consumption of Japan. More than half of the power in data centers will be consumed by AI.
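To make the utilisation argument concrete, here is a minimal sketch of how checkpoint writes and fault recovery eat into useful training time. The function name and the hour figures are hypothetical and chosen purely for illustration; they are not numbers from Huawei or from the speech.

```python
def cluster_utilisation(compute_hours: float,
                        checkpoint_hours: float,
                        recovery_hours: float) -> float:
    """Return the fraction of wall-clock time spent on useful training.

    Time spent writing checkpoints or resuming after a fault is treated
    as idle time for the computing cards (a simplifying assumption).
    """
    total = compute_hours + checkpoint_hours + recovery_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return compute_hours / total


# Hypothetical 100-hour window: 45 h of training, 25 h writing
# checkpoints, 30 h recovering from cluster faults.
utilisation = cluster_utilisation(45.0, 25.0, 30.0)
print(f"Cluster utilisation: {utilisation:.0%}")
```

Under this simple model, cutting checkpoint and recovery time is the only way to raise utilisation without adding hardware, which is why storage bandwidth matters for training clusters.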
AI is set to disrupt traditional data storage, which must now address not only performance, reliability and data paradigm, but also scalability, sustainability and data fabric. In the data awakening era, Huawei will redefine data storage through leading innovation in these six dimensions:
- Ultra-performance: Huawei enhances storage performance tenfold compared to traditional storage. The storage also supports PB/s-level bandwidth and 100 million IOPS, which greatly improves the efficiency of the entire generative AI process.
- Data resilience: Innovative architecture and technologies deliver 99.9999% reliability. The built-in ransomware detection engine raises detection accuracy to 99.99%, and checkpoint recovery time during AI training is shortened to less than a minute.
- New data paradigm: Multi-dimensional tensor data supports fast data retrieval via an intelligent search engine. The retrieval-augmented generation (RAG) technology works with the embedded knowledge base to eliminate hallucination in large AI models.
- Scalability: A single storage cluster can be scaled out for EB-level capacity and each engine can be scaled up with more GPUs, DPUs or NPUs for near-storage computing.
- Sustainability: Innovations in storage media and devices have brought about outstanding storage energy efficiency (less than 1 watt/TB) and storage density (greater than 1 PB/U).
- Data fabric: The capabilities of storage metadata management and search enable global data visibility and manageability, as well as data mobility that is 10 times more efficient.
These innovations have laid the groundwork for the release of the high-performance OceanStor A800, a powerful addition to Huawei’s OceanStor A series storage models. Tailored to AI, OceanStor A800 can increase AI cluster utilisation by 30%. In terms of performance, it delivers bandwidth and IOPS that are four and eight times higher, respectively, than those of a peer vendor’s product.
Regarding scalability, OceanStor A800 supports scaling out to EB-level capacity with up to 512 controllers, as well as scaling up to a maximum of 4,096 computing cards. To conserve space and energy, it achieves a storage density of 1 PB/U and an energy efficiency of 0.7 watt/TB. It also provides a new data paradigm with vector index, tensor data and RAG. In terms of data resilience, the accuracy of ransomware detection is improved from 99.9% to 99.99%. In addition, the data fabric capability facilitates data asset management.
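As a rough illustration of what the quoted density and efficiency figures mean at scale, the sketch below estimates rack space and power draw for a given capacity. It assumes decimal units (1 PB = 1,000 TB) and that both figures scale linearly with capacity. The function and constant names are hypothetical, and a real deployment would also need to account for controllers, networking and cooling.

```python
import math

WATTS_PER_TB = 0.7   # energy efficiency quoted for OceanStor A800
PB_PER_RACK_UNIT = 1  # storage density quoted for OceanStor A800
TB_PER_PB = 1000      # assumption: decimal units


def storage_footprint(capacity_pb: float) -> tuple[int, float]:
    """Estimate (rack units, watts) for `capacity_pb` of storage.

    Assumes the quoted per-unit figures scale linearly with capacity.
    """
    rack_units = math.ceil(capacity_pb / PB_PER_RACK_UNIT)
    power_watts = capacity_pb * TB_PER_PB * WATTS_PER_TB
    return rack_units, power_watts


units, watts = storage_footprint(10)
print(f"10 PB: about {units} U of rack space and {watts:.0f} W for storage media")
```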
At the same time, storage media innovations are driving sustainable development. Huawei’s newly released high-capacity SSDs provide 10 times more capacity with the same disk size, which can further reduce energy consumption of a data center.
With a capacity of 128 TB per disk, the new SSDs take up 88% less space and consume 92% less energy than the peer vendor’s SSDs for every 1 PB of data stored.
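The space saving can be sanity-checked with simple drive-count arithmetic. The sketch below assumes decimal units (1 PB = 1,000 TB) and ignores redundancy overhead. The article does not say which peer product was compared, so the 15.36 TB comparison drive is a hypothetical choice made only to show how a saving of this order could arise.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units


def drives_needed(capacity_tb: float, drive_tb: float) -> int:
    """Return the number of whole drives needed for `capacity_tb` of raw capacity."""
    return math.ceil(capacity_tb / drive_tb)


new_ssd_count = drives_needed(TB_PER_PB, 128)   # 128 TB drive from the article
peer_count = drives_needed(TB_PER_PB, 15.36)    # hypothetical peer drive size

# Fewer drives of the same physical size means proportionally fewer slots.
slot_saving = 1 - new_ssd_count / peer_count
print(f"{new_ssd_count} drives vs {peer_count} drives: {slot_saving:.0%} fewer slots")
```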
To be AI-ready, enterprises must get data-ready. The Omni-Dataverse global file system built in the DME makes enterprise data assets visible, manageable, and mobile across regions, thereby building a solid AI data lake storage foundation for enterprises.
Dr. Peter Zhou ended by emphasising Huawei’s commitment to redefining data storage with a focus on customer challenges and demands in the data awakening era, and to building leading AI-ready data infrastructure for greater customer value.