Live Session
Session 1: Large Language Models 1
Industry
A Hybrid Multi-Agent Conversational Recommender System with LLM and Search Engine in E-commerce
Guangtao Nie (JD.com), Rong Zhi (JD.com), Xiaofan Yan (JD.com), Yufan Du (JD.com), Xiangyang Zhang (JD.com), Jianwei Chen (JD.com), Mi Zhou (JD.com), Hongshen Chen (JD.com), Tianhao Li (JD.com), Sulong Xu (JD.com), Jinghe Hu (JD.com) and Ziguang Cheng (JD.com)
Abstract
Multi-agent collaboration is central to building conversational recommender systems (CRS), especially given the recent widespread adoption of Large Language Models (LLMs). Typically, these systems employ several LLM agents, each serving a distinct role in meeting user needs. In an industrial setting, it is essential for a CRS to exhibit low first-token latency (i.e., the time from a user’s input until the system outputs its first response token) and high scalability, for instance by minimizing the number of LLM inferences per user request, to enhance user experience and boost platform revenue. For example, JD.com’s baseline CRS features two LLM agents and a search API but suffers from high first-token latency and requires two LLM inferences per request (LIPR), hindering its performance. To address these issues, we introduce a Hybrid Multi-Agent Collaborative Recommender System (Hybrid-MACRS). It comprises a central agent powered by a fine-tuned proprietary LLM and a search agent that combines a related-search module with a search engine. This hybrid system reduces first-token latency by about 70% and cuts the LIPR from 2 to 1. We conducted thorough online A/B testing to confirm the approach’s effectiveness.