Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads
Meta's engineers designed the Meta Adaptive Ranking Model to efficiently serve Large Language Model (LLM)-scale ads recommendation models at runtime. This is achieved through three key innovations: Inference-Efficient Model Scaling, Model/System Co-Design, and Reimagined Serving Infrastructure. Together, these bend the inference scaling curve: the model maintains sub-second latency and reduces serving costs even as model complexity grows.
The Meta Adaptive Ranking Model uses Request-Oriented Optimization and Request-Oriented Sequence Scaling to minimize redundant computation and shrink storage footprints, significantly reducing costs. Additionally, the Wukong Turbo architecture refines the model design to improve throughput, stability, and efficiency, enabling LLM-scale models to be served without compromising latency or cost.
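As a rough illustration of the request-oriented idea, the sketch below assumes a two-tower-style split in which the expensive user-side computation is performed once per request and reused across every ad candidate, so cost grows with the number of candidates only through a lightweight interaction step. All function and variable names here are hypothetical, not part of Meta's actual system.

```python
# Hypothetical sketch of request-oriented optimization: in one ranking
# request, a single user is scored against many ad candidates, so the
# expensive user-side encoding runs once and is shared, rather than being
# recomputed per (user, ad) pair.

user_encoder_calls = 0  # instrumentation to show the encoder runs once

def encode_user(user_features):
    """Stand-in for an expensive user-tower forward pass."""
    global user_encoder_calls
    user_encoder_calls += 1
    return sum(user_features)

def encode_ad(ad_features):
    """Stand-in for a cheap ad-side encoding."""
    return sum(ad_features)

def rank_request(user_features, candidate_ads):
    """Score all candidates in one request, encoding the user only once."""
    user_emb = encode_user(user_features)            # shared across candidates
    scored = [(ad_id, user_emb * encode_ad(feats))   # lightweight interaction
              for ad_id, feats in candidate_ads]
    return sorted(scored, key=lambda t: t[1], reverse=True)

ranked = rank_request([1.0, 2.0], [("ad_a", [0.5]), ("ad_b", [2.0])])
```

In this toy run the user encoder executes a single time even though two ads are scored; in a production ranker the same principle amortizes the dominant per-user cost over hundreds of candidates per request.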
By leveraging these innovations, the Meta Adaptive Ranking Model delivers a 3% lift in ad conversions and a 5% lift in ad click-through rate for targeted users.