Achieving Near-Linear Training Scalability for Pinterest’s Foundation Models
Here's a 3-sentence summary of the blog post on achieving near-linear training scalability for Pinterest's foundation models: To achieve near-linear training scalability for Pinterest's Foundation Models, Pinterest Engineering team optimized distributed training by reducing the volume and cost of distributed embedding communication, with a focus on communication bandwidth. They implemented quantized communications (QComms) to compress embedding tensors, balanced sharding to ensure an even distribution of embedding workload among GPUs, and bandwidth-aware embedding optimization to reshape the payload itself, leading to near-linear scaling across multiple nodes. These optimizations enabled the team to unlock larger models that drove significant engagement gains across Pinterest's recommendation surfaces, with a 7.5x scaling factor achieved on 8 nodes.