Climber: Toward Efficient Scaling Laws for Large Recommendation Models
Authors: Songpei Xu, Shijia Wang, Da Guo, Xianwen Guo, Qiang Xiao, Bin Huang, Guanlin Wu, Chuanjiang Luo
Abstract: Transformer-based generative models have achieved remarkable success across domains with various scaling law manifestations. However, our extensive experiments reveal persistent challenges when applying Transformer to recommendation systems: (1) Transformer scaling is not ideal with increased computational resources, due to structural incompatibilities with recommendation-specific features such as multi-source data heterogeneity; (2) critical online inference latency constraints (tens of milliseconds) that intensify with longer user behavior sequences and growing computational demands. We propose Climber, an efficient recommendation framework comprising two synergistic components: the model architecture for efficient scaling and the co-designed acceleration techniques. Our proposed model adopts two core innovations: (1) multi-scale sequence extraction that achieves a time complexity reduction by a constant factor, enabling more efficient scaling with sequence length; (2) dynamic temperature modulation adapting attention distributions to the multi-scenario and multi-behavior patterns. Complemented by acceleration techniques, Climber achieves a 5.15$\times$ throughput gain without performance degradation by adopting a "single user, multiple item" batched processing and memory-efficient Key-Value caching. Comprehensive offline experiments on multiple datasets validate that Climber exhibits a more ideal scaling curve. To our knowledge, this is the first publicly documented framework where controlled model scaling drives continuous online metric growth (12.19\% overall lift) without prohibitive resource costs. Climber has been successfully deployed on Netease Cloud Music, one of China's largest music streaming platforms, serving tens of millions of users daily.
Explore the paper tree
Click on the tree nodes to be redirected to a given paper and access their summaries and virtual assistant
Look for similar papers (in beta version)
By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.