Llama 3.1

Today, Cerebras Systems launched its new AI inference solution, Cerebras Inference, which the company says is the fastest AI inference solution in the world. It delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, which the company says is 20x faster than NVIDIA GPU-based hyperscale clouds.

Llama 3.1 Tokens Verified By Cerebras

Unlike approaches that trade accuracy for performance, Cerebras says it delivers high performance while maintaining industry-leading accuracy by keeping the entire inference process in the 16-bit domain.

Alongside its benchmark results, Cerebras Inference is priced at a fraction of its GPU-based rivals. Under the pay-as-you-go model, it costs 10 cents per million tokens for Llama 3.1 8B and 60 cents per million tokens for Llama 3.1 70B.
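To put the per-million-token pricing in concrete terms, here is a minimal sketch that estimates the bill for a given token count. The two prices are the ones quoted in this article; the model keys and the function name are hypothetical labels chosen for illustration, not identifiers from any Cerebras API.

```python
# Pay-as-you-go prices quoted in the article, in US dollars per million tokens.
# The dictionary keys are illustrative labels, not official model identifiers.
PRICE_PER_MILLION_USD = {
    "llama-3.1-8b": 0.10,
    "llama-3.1-70b": 0.60,
}


def estimate_cost_usd(model: str, tokens: int) -> float:
    """Return the estimated cost in USD for processing `tokens` tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return PRICE_PER_MILLION_USD[model] * tokens / 1_000_000


if __name__ == "__main__":
    for model in PRICE_PER_MILLION_USD:
        cost = estimate_cost_usd(model, 5_000_000)
        print(f"{model}: 5M tokens cost about ${cost:.2f}")
```

At these rates, five million tokens would come to roughly 50 cents on the 8B model and roughly 3 dollars on the 70B model.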

Cerebras says it addresses the memory bandwidth bottleneck inherent to GPUs, where the model's weights must be moved from memory to the compute cores for every output token. This limits inference speed, especially for larger models such as Llama 3.1 70B, which has 70 billion parameters and requires 140GB of memory at 16-bit precision.
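The 140GB figure follows from simple arithmetic: 70 billion parameters at 2 bytes (16 bits) each. The sketch below reproduces that number and shows why memory bandwidth caps speed when every weight must be read once per output token. It is a simplified model that assumes a single request at a time and ignores caching and other overheads. The 3 TB/s bandwidth in the example is a hypothetical round number for illustration, not a figure from this article or from any specific chip.

```python
def weight_bytes(num_params: int, bits_per_param: int = 16) -> int:
    """Memory needed to hold the model weights, in bytes."""
    return num_params * bits_per_param // 8


def max_tokens_per_second(
    num_params: int,
    memory_bandwidth_bytes_per_s: float,
    bits_per_param: int = 16,
) -> float:
    """Upper bound on single-stream tokens per second when all weights
    must be read from memory once for every output token."""
    return memory_bandwidth_bytes_per_s / weight_bytes(num_params, bits_per_param)


if __name__ == "__main__":
    params_70b = 70_000_000_000
    size_gb = weight_bytes(params_70b) / 1e9
    print(f"Llama 3.1 70B weights at 16-bit: {size_gb:.0f} GB")

    # Hypothetical bandwidth of 3 TB/s, chosen only to illustrate the bound.
    assumed_bandwidth = 3e12
    bound = max_tokens_per_second(params_70b, assumed_bandwidth)
    print(f"Bandwidth-bound ceiling: about {bound:.1f} tokens per second")
```

Under this simplified model, the ceiling scales directly with memory bandwidth, which is why the article frames bandwidth as the limiting factor for large models.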

Beyond its performance claims, Cerebras is marketing the service as cheaper than existing solutions. The company said pricing starts at $0.10 per million tokens, which it says amounts to 100x higher price-performance for AI inference.

By Yash Verma

