
AWS enters into an inference chip deal

Amazon Web Services (AWS) plans to use chips from start-up Cerebras Systems alongside its own in-house processors, in a partnership the companies claim will deliver the fastest AI inference offering available on Amazon Bedrock.

The integrated system pairs Cerebras’ CS‑3 systems, which are specialised for the slower, memory‑intensive decode phase of AI inference, with AWS’ Trainium processors, which handle the prefill stage.

By disaggregating inference into those two stages and linking the hardware with high‑bandwidth Elastic Fabric Adapter (EFA) networking, the companies aim to dramatically increase throughput and reduce latency for generative AI and large language model workloads.
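The prefill/decode split the article describes can be illustrated with a minimal, hypothetical Python sketch. Everything below is invented for illustration; the function names and toy logic are not AWS or Cerebras code, only a conceptual model of the two stages running as separate workers.

```python
# Illustrative sketch only: a toy model of disaggregated inference, where the
# prompt-processing "prefill" stage and the token-by-token "decode" stage run
# on separate hardware. All names here are hypothetical.

from dataclasses import dataclass


@dataclass
class KVCache:
    """Stand-in for the key/value cache the prefill stage hands to decode."""
    tokens: list[str]


def prefill(prompt: str) -> KVCache:
    # Compute-bound stage: process the whole prompt in parallel and build
    # the KV cache. Per the article, this is the stage Trainium handles.
    return KVCache(tokens=prompt.split())


def decode(cache: KVCache, max_new_tokens: int) -> list[str]:
    # Memory-bandwidth-bound stage: generate tokens one at a time, re-reading
    # the growing cache each step. This is the phase Cerebras hardware targets.
    generated = []
    for i in range(max_new_tokens):
        # A real model would sample from logits; here we just emit a placeholder.
        token = f"token_{i}"
        generated.append(token)
        cache.tokens.append(token)
    return generated


if __name__ == "__main__":
    cache = prefill("Explain disaggregated inference")
    print(decode(cache, max_new_tokens=3))
```

In this toy framing, the KV cache is the hand-off point between the two stages; in the real system that hand-off crosses the EFA network link between the two processor types.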

AWS stated the approach will deliver order‑of‑magnitude performance gains for demanding applications such as real‑time coding assistants.

Cerebras CEO Andrew Feldman and AWS VP David Brown both emphasised the benefits of accelerating inference speed for global enterprise customers.

The collaboration makes AWS the first hyperscaler to offer Cerebras’ disaggregated inference platform. Financial terms and the duration of the arrangement were not disclosed. The deal also sets up more direct competition between Cerebras and market leader Nvidia.

AWS plans to offer a new service based on the partnership in the second half of 2026.

Source: Mobile World Live


