Retrieval-Augmented Generation (RAG) with LangChain, Llama2 and ChromaDB on PropulsionAI
Understanding Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of language models and information retrieval systems to generate more accurate and contextually relevant responses. By leveraging a knowledge base alongside a language model, RAG enables the model to access and incorporate relevant information when generating answers, leading to improved performance on a wide range of natural language processing tasks, such as question answering, dialogue systems, and content generation.
How Does RAG Work?
The RAG architecture consists of two main components: a retriever and a generator.
The retriever is responsible for searching and identifying relevant passages or documents from a knowledge base based on the input query. This is typically achieved using vector similarity search, where the query and the documents are embedded into a shared vector space, and the most similar documents are retrieved.
The generator, usually a language model like Llama 2, takes the input query and the retrieved passages as input and generates a response that incorporates the relevant information. By conditioning the language model on the retrieved passages, RAG ensures that the generated response is informed by the most relevant and up-to-date information available in the knowledge base.
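Before moving on, here is a minimal sketch of the retrieval step on its own, using the sentence-transformers library (already in the prerequisites). The model name and sample documents below are illustrative placeholders, not the tutorial's actual data:

from sentence_transformers import SentenceTransformer, util

# Embed a query and a few candidate documents into the same vector space,
# then rank the documents by cosine similarity to the query.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Margherita pizza with fresh tomato and mozzarella.",
    "Pepper Barbecue Chicken pizza with a smoky sauce.",
    "Garlic Breadsticks served with a cheesy dip.",
]
query = "vegetarian pizza options"

doc_embeddings = model.encode(docs, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best = int(scores.argmax())
print(f"Best match: {docs[best]} (score: {float(scores[best]):.3f})")

In a full RAG pipeline, the top-ranked documents are then passed to the generator as context, which is exactly what the chain we build below automates.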
Benefits of RAG
RAG offers several benefits over traditional language model-based approaches:
Improved Accuracy: By incorporating relevant information from a knowledge base, RAG can generate more accurate and factually correct responses, reducing the risk of hallucination or inconsistency.
Enhanced Contextual Relevance: RAG allows the language model to access and utilize relevant information from the knowledge base, enabling it to generate responses that are more contextually relevant to the input query.
Scalability: RAG can handle large-scale knowledge bases efficiently, thanks to the use of vector similarity search for retrieval. This enables the system to scale to vast amounts of information without compromising performance.
Flexibility: RAG can be applied to a wide range of natural language processing tasks, making it a versatile and powerful technique for enhancing language model performance.
In this tutorial, we’ll demonstrate how RAG can augment the Domino’s India Copilot, a specialized language model, by ensuring that the latest menu items and prices are available in all responses, guaranteeing correctness and up-to-date information.
Prerequisites
To follow along with this tutorial, you’ll need:
A fine-tuned Llama2 model on PropulsionAI. If you haven’t fine-tuned a model yet, check out our blog post on How to Fine-Tune LLaMA 2 on Your Own Data to learn how you can use PropulsionAI to fine-tune Llama2 using your own data.
Python 3.8.1 or higher (required by recent versions of LangChain)
Required libraries: chromadb, langchain, requests, sentence-transformers, langchain-community, jq
GitHub Repository
You can find the complete code for this tutorial, including the Jupyter notebook and the dominos-menu.jsonl file, in our GitHub repository: propulsion-ai/tutorial-rag-with-llama2-chromadb.
Step 1: Data Preparation
Begin by loading the Domino’s Pizza India menu data from the dominos-menu.jsonl file using the JSONLoader from the langchain_community library. This data will serve as our knowledge base for the question-answering system.
from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(
    file_path="./dominos-menu.jsonl",
    jq_schema="{name: .name, price: .price, description: .description, category: .category}",
    text_content=False,
    json_lines=True,
)
data = loader.load()
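For reference, each line of dominos-menu.jsonl is a standalone JSON object with the fields selected by the jq_schema above. A hypothetical line might look like this (the values are illustrative, not taken from the actual file):

{"name": "Margherita", "price": 239, "description": "Classic delight with 100% real mozzarella cheese", "category": "Veg Pizza"}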
Step 2: Vector Database Setup
Next, set up the ChromaDB vector database using the loaded data and HuggingFaceEmbeddings for efficient similarity search.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings()
vectordb = Chroma.from_documents(data, embeddings)
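Before wiring the store into a chain, you can sanity-check retrieval by querying the index directly (the query string here is just an example):

results = vectordb.similarity_search("paneer pizza", k=3)
for doc in results:
    print(doc.page_content)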
Step 3: Llama2 API Configuration
Configure the Llama2 API by providing your PropulsionAI API key and model version ID.
PROPULSION_API_KEY = "your api key"
PROPULSION_VERSION_ID = "your model version id"
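The custom LLM class in the next step delegates to a call_llama2 helper that sends the prompt to your hosted model. Here is a minimal sketch; the endpoint URL and the request/response field names are assumptions for illustration, so check the PropulsionAI API reference (or the notebook in the repo) for the exact contract:

import requests

def call_llama2(prompt: str) -> str:
    # Hypothetical endpoint and payload shape -- substitute the actual URL
    # and field names from your PropulsionAI dashboard.
    url = f"https://api.propulsionai.example/v1/{PROPULSION_VERSION_ID}/run"
    headers = {
        "Authorization": f"Bearer {PROPULSION_API_KEY}",
        "Content-Type": "application/json",
    }
    response = requests.post(url, json={"prompt": prompt}, headers=headers, timeout=60)
    response.raise_for_status()
    return response.json()["response"]  # assumed response field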
Step 4: Custom LLM Setup
Create a custom LLM class called PropulsionLLM that interfaces with the Llama2 API. This class will handle the API calls and return the generated responses.
from typing import Any, List, Optional

from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM


class PropulsionLLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return call_llama2(prompt)


llm = PropulsionLLM()
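You can smoke-test the wrapper on its own before attaching a retriever (the prompt here is just an example):

print(llm.invoke("What is on the Domino's menu?"))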
Step 5: RAG Chain Setup
Set up the Retrieval-Augmented Generation (RAG) chain using the custom LLM and the ChromaDB vector database retriever.
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_llm(llm, retriever=vectordb.as_retriever())
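If you also want to inspect which menu entries were retrieved for each answer, a variant of the chain can return the source documents (this uses LangChain's RetrievalQA.from_chain_type; the tutorial itself sticks with from_llm):

qa_with_sources = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
)

Invoking this variant returns a "source_documents" key alongside "result".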
Step 6: Ask Questions
With the RAG chain in place, you can now ask questions related to Domino’s Pizza in India. The system will retrieve relevant information from the knowledge base and generate accurate responses, ensuring that the latest menu items and prices are included in the answers.
query = "What are some vegetarian options on the menu? Give me the prices as well."
result = qa.invoke(query)
print(result)
Example output:
{'query': 'What are some vegetarian options on the menu? Give me the prices as well.', 'result': 'There are several vegetarian options on the menu, including the Veg Paradise (INR 589) and Veg Extravaganza (INR 619) meals, which feature a regular pizza and a regular side.'}
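Since qa.invoke returns a dict containing both the query and the result, you can print just the answer text with:

print(result["result"])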
Conclusion
In this tutorial, we demonstrated how to leverage Retrieval-Augmented Generation using Llama2 and ChromaDB on PropulsionAI to create a powerful question-answering system focused on Domino’s Pizza in India. By augmenting the Domino’s India Copilot with RAG, we ensure that the system provides accurate and up-to-date responses, incorporating the latest menu items and prices.
To further explore the capabilities of Llama2 and learn how to fine-tune it using your own data, check out our blog post on How to Fine-Tune LLaMA 2 on Your Own Data.
Happy coding!