IBM brings the speed of light to the Generative AI era with optics breakthrough

New co-packaged optics innovation could replace electrical interconnects in data centers.

IBM researchers have pioneered a new process for co-packaged optics (CPO) that could dramatically improve how data centers train and run generative AI models.

This next generation of optics technology enables connectivity within data centers at the speed of light, complementing existing short-reach electrical wires.

By designing and assembling the first publicly announced successful polymer optical waveguide (PWG) to power this technology, IBM researchers have shown how CPO will redefine the way the computing industry transmits high-bandwidth data between chips, circuit boards and servers.

Although data centers use fiber optics for their external communications networks, racks inside data centers still predominantly rely on copper-based electrical wires. These wires connect GPU accelerators that may spend more than half of their time idle, waiting for data from other devices in large, distributed training runs, idle time that incurs significant cost and wasted energy.

IBM researchers have demonstrated a way to bring optics’ speed and capacity inside data centers.

In a technical paper, IBM introduces a new CPO prototype module that can enable high-speed optical connectivity. This technology could significantly increase the bandwidth of data center communications, minimizing GPU downtime while drastically accelerating AI processing. This research innovation, as described, would enable:  

  • Lower costs for scaling generative AI through a more than 5x reduction in energy consumption compared to mid-range electrical interconnects, while extending the length of data center interconnect cables from one meter to hundreds of meters.
  • Faster AI model training, enabling developers to train a Large Language Model (LLM) up to five times faster with CPO than with conventional electrical wiring. CPO could cut the time needed to train a standard LLM from three months to three weeks, with performance gains growing as models get larger and more GPUs are used (see the rough calculation after this list).
  • Dramatically increased energy efficiency for data centers, saving the energy equivalent of 5,000 U.S. homes’ annual power consumption per AI model trained.
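As a rough back-of-the-envelope illustration, the sketch below simply replays the rounded figures quoted above: a 5x speedup maps a three-month training run to roughly three weeks, and the per-model energy saving scales with the number of homes cited. The household consumption figure is an assumed U.S. average used only for illustration; none of these values come from IBM's technical paper.

```python
# Back-of-the-envelope check of the training-time and energy figures quoted
# in this article. Inputs are rounded article numbers or labelled assumptions,
# not values from IBM's technical paper.

SPEEDUP = 5                      # article: up to 5x faster training with CPO
BASELINE_TRAINING_DAYS = 90      # article: "three months" for a standard LLM

cpo_training_days = BASELINE_TRAINING_DAYS / SPEEDUP
print(f"Training time with CPO: ~{cpo_training_days:.0f} days "
      f"(~{cpo_training_days / 7:.1f} weeks)")   # ~18 days, roughly three weeks

HOMES_EQUIVALENT = 5_000         # article: energy of 5,000 U.S. homes per model trained
KWH_PER_HOME_PER_YEAR = 10_500   # assumption: approximate U.S. average annual household use

energy_saved_kwh = HOMES_EQUIVALENT * KWH_PER_HOME_PER_YEAR
print(f"Estimated energy saved per model trained: ~{energy_saved_kwh / 1e6:.1f} GWh")
```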

“As generative AI demands more energy and processing power, the data center must evolve – and co-packaged optics can make these data centers future-proof,” said Dario Gil, SVP and Director of Research, IBM.

“With this breakthrough, tomorrow’s chips will communicate much like how fiber optics cables carry data in and out of data centers, ushering in a new era of faster, more sustainable communications that can handle the AI workloads of the future.”

CPO technology aims to scale the interconnection density between accelerators by letting chipmakers add optical pathways between chips on an electronic module, beyond the limits of today’s electrical pathways. IBM’s paper outlines how these new high-bandwidth-density optical structures, combined with transmitting multiple wavelengths per optical channel, have the potential to boost chip-to-chip bandwidth to as much as 80 times that of electrical connections.
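The bandwidth argument can be pictured as a simple multiplication: more optical channels fit along a chip edge than electrical lanes, and each optical channel can carry several wavelengths at once. The sketch below is a minimal illustration of that scaling; the channel densities, wavelength count, per-lane data rate and edge length are hypothetical placeholders chosen only to show how the factors combine, and the up-to-80x figure itself comes from IBM's paper, not from these numbers.

```python
# Illustrative scaling of aggregate chip-edge bandwidth for optical vs. electrical
# I/O. Every number here is a hypothetical placeholder chosen to show how the
# factors multiply, not a specification from IBM's CPO paper.

def aggregate_bandwidth_gbps(channels_per_mm: float, edge_mm: float,
                             gbps_per_lane: float, wavelengths: int = 1) -> float:
    """Total bandwidth across one chip edge: channel density x edge length
    x per-lane rate x wavelengths multiplexed onto each channel."""
    return channels_per_mm * edge_mm * gbps_per_lane * wavelengths

EDGE_MM = 20  # assumed length of one chip edge, in millimeters

electrical = aggregate_bandwidth_gbps(channels_per_mm=5, edge_mm=EDGE_MM,
                                      gbps_per_lane=100)              # dense electrical lanes
optical = aggregate_bandwidth_gbps(channels_per_mm=50, edge_mm=EDGE_MM,
                                   gbps_per_lane=100, wavelengths=8)  # waveguides + WDM

print(f"Electrical edge bandwidth: {electrical / 1000:.1f} Tb/s")
print(f"Optical edge bandwidth:    {optical / 1000:.1f} Tb/s")
print(f"Scaling factor:            ~{optical / electrical:.0f}x")
```

With these placeholder values (10x the channel density and eight wavelengths per channel), the two factors multiply to an 80x aggregate gain, matching the order of magnitude the article describes.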
