Kimi K2.5 is now on Workers AI, helping you power agents entirely on Cloudflare’s Developer Platform. Learn how we optimized our inference stack and reduced inference costs for an internal agent use case.
AI Summary
Cloudflare has integrated Moonshot AI's Kimi K2.5 model into its Workers AI platform, enabling large models to power developer agents. The model's 256k context window and support for multi-turn tool calling make it suitable for a wide range of agentic tasks. By serving large models directly within the Cloudflare Developer Platform, the company aims to make agent development efficient and cost-effective. The integration of Kimi K2.5 has already produced significant savings: a 77% cost reduction for one internal agent that processes over 7B tokens per day. Cloudflare has also optimized its inference stack to serve large models efficiently, using custom kernels and advanced techniques to improve performance and GPU utilization. To support agentic workloads, Cloudflare has released new features, including prefix caching and surfacing cached tokens, as well as a new session affinity header to improve cache hit rates and reduce inference costs. These improvements make it easier for developers to build and deploy efficient, cost-effective agents using Workers AI.
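As a rough sketch of what this looks like from a Worker, the snippet below builds a multi-turn chat payload and runs it through the Workers AI binding. The model ID `@cf/moonshotai/kimi-k2.5` and the system prompt are assumptions for illustration; keeping the system prompt and earlier turns as a stable prefix is what lets prefix caching pay off across turns of the same session.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build a chat payload. The system prompt and prior turns form a stable
// prefix, so repeated calls in one session can hit the prefix cache.
function buildPayload(history: ChatMessage[], userTurn: string) {
  return {
    messages: [
      { role: "system", content: "You are a helpful agent." }, // assumed prompt
      ...history,
      { role: "user", content: userTurn },
    ],
  };
}

export default {
  async fetch(_request: Request, env: { AI: { run: Function } }): Promise<Response> {
    const payload = buildPayload([], "Summarize today's deploy logs.");
    // Model ID is an assumption -- check the Workers AI model catalog.
    const result = await env.AI.run("@cf/moonshotai/kimi-k2.5", payload);
    return Response.json(result);
  },
};
```

In a real agent loop, each tool-call round trip appends to `history`, so the growing conversation reuses the cached prefix instead of recomputing it from scratch.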