Learnings From Building a Multilingual AI Support System with Guided Chat and RAG

Customer support systems rarely fail because of a lack of documentation. They fail because users cannot find the right answer when they need it. 

When we partnered with a global live-streaming platform focused on gaming, entertainment, and creator-driven content, their support model had reached that exact breaking point. Every support request—whether it was a simple FAQ or a complex account issue—entered a human agent queue. The average resolution time was around eight minutes per case, and as the platform expanded globally, costs and inconsistencies grew quickly. 

The goal was clear: reduce support costs, improve response accuracy, and scale support across multiple languages without degrading quality. 

To achieve that, we redesigned the support system from the ground up. 

 

The Problem: Knowledge Exists, But Retrieval Fails 

The platform already had a large knowledge base covering most common user issues. The problem was not a lack of information—it was information retrieval. 

We identified several structural problems: 

  • Every support request required a human agent, even when the answer already existed in documentation. 
  • The knowledge base was difficult for customer support agents to navigate, especially across languages. 
  • Users frequently left the platform to search externally for answers. 
  • Support quality varied depending on the language and agent assigned. 
  • The existing infrastructure could not scale with demand. 

At its core, the support system lacked the ability to reliably deliver the right information, to the right user, in the right language, at the right moment. 

This is precisely the type of problem where retrieval-augmented generation (RAG) can work—if implemented carefully. 

 

Architecture Overview 

We built a RAG-powered guided chat system integrated with Salesforce that combines knowledge retrieval, multilingual support, and human escalation. 

The architecture includes five core components: 

  1. Knowledge ingestion and structuring 
  2. Retrieval and context construction 
  3. Language-aware query handling 
  4. LLM response generation 
  5. Continuous evaluation and monitoring 

Each component solved a specific failure point in the original support system. 

 

Structured Knowledge Ingestion 

Most RAG failures begin with poor data preparation. 

The platform’s knowledge base consisted primarily of HTML documentation containing tables, FAQs, and step-by-step guides. Standard chunking approaches often break these structures apart, producing fragmented retrieval results. 

To address this issue, we built a structure-preserving chunking engine. 

Instead of blindly splitting documents by token length, the system: 

  • Detects structural elements like tables and FAQ blocks 
  • Preserves semantic groupings 
  • Generates retrieval chunks that maintain instructional context 

This ensures that when the system retrieves content, it returns complete, usable answers instead of disconnected fragments. 
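To make the idea concrete, here is a minimal sketch of structure-preserving chunking, not the production engine. It assumes the source is raw HTML and only handles one structural element, `<table>` blocks; the real system detects more element types (FAQ blocks, step-by-step guides), and the `max_chars` budget is an illustrative stand-in for token-based sizing.

```python
import re


def chunk_html(html: str, max_chars: int = 800) -> list[str]:
    """Split HTML into chunks, keeping <table> blocks intact."""
    chunks: list[str] = []
    # Pull tables out as whole chunks so rows are never split apart.
    table = re.compile(r"<table.*?</table>", re.DOTALL | re.IGNORECASE)
    last = 0
    for m in table.finditer(html):
        chunks.extend(split_text(html[last:m.start()], max_chars))
        chunks.append(m.group(0))  # the table stays as a single chunk
        last = m.end()
    chunks.extend(split_text(html[last:], max_chars))
    return chunks


def split_text(text: str, max_chars: int) -> list[str]:
    """Greedy paragraph packing for the non-structural text in between."""
    paras = [p.strip() for p in re.split(r"\n\s*\n|</p>", text) if p.strip()]
    out, buf = [], ""
    for p in paras:
        if buf and len(buf) + len(p) > max_chars:
            out.append(buf)
            buf = p
        else:
            buf = f"{buf}\n{p}" if buf else p
    if buf:
        out.append(buf)
    return out
```

The key design choice is the order of operations: structural elements are extracted first and protected from splitting, and only the free-flowing prose between them is packed by size.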

 

Multilingual Retrieval Without Duplicating Knowledge Bases 

The platform supports users across seven languages, but maintaining a separate documentation set for each language would have created massive operational overhead. 

Instead, we implemented language-aware retrieval using: 

  • AWS Translate for query normalization 
  • AWS Bedrock Knowledge Base for retrieval orchestration 

The workflow works like this: 

  1. A user asks a question in their native language. 
  2. The query is normalized and translated for retrieval. 
  3. Relevant knowledge is retrieved from the shared knowledge base. 
  4. The final answer is generated and delivered in the user’s language. 

This approach enables true multilingual support without duplicating documentation. 
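The four steps above can be sketched as a single pipeline function. This is a simplified illustration with the AWS calls injected as plain callables so the flow is readable and testable; in production those callables would wrap AWS Translate's `translate_text` API and a Bedrock Knowledge Base retrieval call, and the pivot language ("en" here) is an assumption.

```python
from typing import Callable


def answer_query(
    query: str,
    user_lang: str,
    translate: Callable[[str, str, str], str],   # (text, src, dst) -> text
    retrieve: Callable[[str], list[str]],        # query -> knowledge chunks
    generate: Callable[[str, list[str], str], str],  # (query, context, lang) -> answer
) -> str:
    """Language-aware RAG over a single shared knowledge base."""
    # 1-2. Normalize: translate the query into the knowledge-base language.
    kb_query = query if user_lang == "en" else translate(query, user_lang, "en")
    # 3. Retrieve relevant chunks from the shared knowledge base.
    context = retrieve(kb_query)
    # 4. Generate the final answer in the user's own language.
    return generate(kb_query, context, user_lang)
```

Because retrieval always happens in one pivot language, the documentation is authored and maintained exactly once.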

 

Guided Chat and Intelligent Escalation 

Automation should handle routine questions, but not everything can—or should—be automated. 

We designed the system as a guided support experience, not just a chatbot. 

The system includes predefined triggers for escalation when: 

  • Query intent is ambiguous 
  • The retrieved knowledge confidence is low 
  • The request involves sensitive account actions 

When escalation occurs, the case is routed to a human agent with full conversation context, eliminating the common frustration of repeating the problem after transfer. 

This creates a hybrid support model where automation handles scale and humans handle complexity. 

 

Observability: Evaluating RAG in Production 

Many AI systems perform well in controlled testing but degrade in production. 

To prevent this, we implemented continuous evaluation using Ragas, with metrics stored in DynamoDB. 

The system measures: 

  • Faithfulness – whether responses remain grounded in retrieved knowledge 
  • Relevancy – whether the answer addresses the user query 
  • Context utilization – whether retrieved context is actually used 

This evaluation pipeline runs continuously, providing real-time insight into system quality rather than relying on static evaluation sets. 
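A practical detail when persisting such scores: DynamoDB rejects Python floats, so metric values must be stored as `Decimal`. The sketch below shapes one evaluation result into an item; the key schema and attribute names are illustrative assumptions, and `scores` stands in for the output of a Ragas evaluation run.

```python
from datetime import datetime, timezone
from decimal import Decimal


def build_metrics_item(conversation_id: str, scores: dict[str, float]) -> dict:
    """Shape one evaluation result as a DynamoDB item.

    DynamoDB does not accept Python floats, so each score is converted
    to Decimal via its string form to avoid binary-float artifacts.
    """
    return {
        "pk": f"CONV#{conversation_id}",
        "sk": datetime.now(timezone.utc).isoformat(),
        **{name: Decimal(str(value)) for name, value in scores.items()},
    }


# In production this item would be written with boto3, e.g.:
#   table = boto3.resource("dynamodb").Table("rag-eval-metrics")  # name illustrative
#   table.put_item(Item=build_metrics_item(conv_id, scores))
```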

Before rollout, we also load-tested the system to confirm it could handle approximately 7,500 requests per minute. 

 

Results in Production 

After deployment, several improvements became immediately visible. 

Answer accuracy improved across all seven languages. 

Because responses were grounded in documentation rather than agent interpretation, the system eliminated much of the inconsistency that previously existed between languages. 

Routine support requests became automated. 

High-volume issues such as FAQs and documentation-based questions were resolved instantly, reducing agent workload significantly. 

User behavior changed. 

Instead of leaving the platform to search for answers externally, users began resolving issues directly within the support experience. 

Operationally, the system delivered three key outcomes: 

  • Lower support costs 
  • Faster response times 
  • Consistent multilingual support 

Most importantly, the platform gained something it previously lacked: observability into support performance. 

For the first time, support quality could be measured and improved continuously. 

 

Beyond Customer Support 

The architecture we built is not limited to support systems. 

The same pattern applies to many enterprise problems: 

  • Internal knowledge assistants 
  • Developer documentation search 
  • Operations runbooks 
  • Enterprise workflow automation 

In all of these cases, the core challenge is the same: 

Retrieve the right knowledge, apply the right context, and deliver the right answer. 

 
