Have a chat with a range of document types over Cisco Webex, using Retrieval Augmented Generation with a Large Language Model such as ChatGPT
The service is started with:
uvicorn main:app --reload --host <host-ip> --port 80
e.g.
$ uvicorn main:app --reload --host 172.31.22.244 --port 80
This happens at boot time and creates a pipeline for ingesting data from a source and indexing it.
First we load our data. This is done with a range of Document Loaders that can accept documents in the following formats:
Text splitters will break the Documents into smaller chunks. This is useful both for indexing data and passing it into a model, as large chunks are harder to search over and won't fit in a model's finite context window.
We will store and index our splits, so that they can be searched over later. This will be done using a VectorStore and Embeddings model.
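A minimal sketch of that boot-time pipeline is shown below. It is illustrative only; the loader, splitter, embedding model and vector store named here are assumptions, not necessarily the components used in this project.

# Illustrative indexing pipeline: load -> split -> embed -> store (assumed components)
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = TextLoader("data/example.txt").load()                     # load a source document
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200                           # see chunk size / overlap notes below
).split_documents(docs)                                          # break it into smaller chunks
vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings())   # embed and index the chunks
retriever = vectorstore.as_retriever()                           # used at run time to answer queries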
This is the actual RAG chain, which takes the user query (from Webex) at run time and retrieves the relevant data from the index, then passes that to the model.
Taking the user's input (typically a question), relevant splits are retrieved from storage using a Retriever.
A ChatModel / LLM produces an answer using a prompt that includes both the question and the retrieved data. This project uses a private instance of Azure ChatGPT, though other LLMs may be substituted.
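The following is a minimal sketch of that run-time chain, assuming a LangChain-style retriever from the indexing step and an Azure OpenAI deployment; the deployment name, API version and prompt wording are placeholders, not the project's actual values.

# Illustrative RAG chain (assumed names; endpoint and key are read from the environment)
from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-02-01", temperature=0.0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only this context:\n{context}"),
    ("human", "{question}"),
])

def answer(question: str) -> str:
    docs = retriever.invoke(question)                        # retrieve relevant splits
    context = "\n\n".join(d.page_content for d in docs)
    return (prompt | llm).invoke({"context": context, "question": question}).content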
The system was tested using Python 3.12.3 running on Ubuntu 24.04.1 LTS in AWS EC2
The solution requires a chatbot in Webex, which can be created via https://developer.webex.com/docs/bots. The following details are available via this process for inclusion in the config/settings.toml file:
message_callback_url should provide the publicly accessible URL of the server running the solution, with '/webexMessage' appended; e.g. a server running in AWS EC2 would take the form
"http://ec2-14-59-23-21.eu-south-1.compute.amazonaws.com/webexMessage"
This URL is used for the receipt of webhook notifications from Webex when user messages are received.
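As a rough illustration of that webhook handling (an assumed implementation, not copied from this project's main.py):

# Illustrative FastAPI receiver for Webex webhook notifications (assumed implementation)
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webexMessage")
async def webex_message(request: Request):
    notification = await request.json()
    message_id = notification["data"]["id"]   # the webhook carries the message id, not its text
    # ...fetch the full message via the Webex API, run the RAG chain, and reply in the space
    return {"status": "received"}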
The OpenAI retriever requires an API key. This can be obtained from OpenAI via https://platform.openai.com/settings/organization/api-keys
This key should then be placed in a .env file.
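An illustrative example of that file, with a placeholder value (OPENAI_API_KEY is the conventional variable name; the exact name this project's code expects may differ):

# .env (placeholder value)
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx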
The following shows the form of details required for a private Azure LLM instance (with substituted values). The form of these details is likely to vary between LLM providers.
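An illustrative sketch with placeholder values; the variable names, and whether they belong in .env or settings.toml, may differ for your provider and for this project's code.

# Placeholder Azure OpenAI details (names and values are illustrative)
AZURE_OPENAI_ENDPOINT=https://my-instance.openai.azure.com/
AZURE_OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxx
AZURE_OPENAI_DEPLOYMENT=gpt-4o
AZURE_OPENAI_API_VERSION=2024-02-01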
The tool has been enabled for multi-user concurrency, though this feature has not been extensively tested.
The tool has not been tested for scalability.
User state (the ephemeral_chat_history object for a particular user) is currently held in memory, in the form of the userStore{} dict in GPTInterface.py. This should be moved to a database as a next development step to aid scalability. The object is overwritten during each token refresh cycle (once per hour).
The LLM token in use times out every 60 minutes; in this event, a new user interaction will trigger a token update.
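A minimal sketch of that in-memory structure (illustrative; the real GPTInterface.py may differ):

# Illustrative per-user state held in memory; lost on restart and overwritten on token refresh
from langchain_community.chat_message_histories import ChatMessageHistory

userStore = {}

def get_ephemeral_chat_history(user_id: str) -> ChatMessageHistory:
    if user_id not in userStore:
        userStore[user_id] = ChatMessageHistory()
    return userStore[user_id]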
LLM temperature is a parameter that controls how random a large language model (LLM) is when it generates text. It is set within GPTInterface.py, e.g.
temperature=0.00
Low temperature: the LLM is more likely to choose the most probable tokens, resulting in more predictable and conservative outputs.
High temperature: the LLM is more likely to choose less probable tokens, resulting in more varied and creative outputs.
Sets a similarity score threshold and only returns documents with a score above that threshold.
Sets the number of results to return.
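For example (values are illustrative), assuming the retriever is built from a LangChain-style vector store as in the indexing sketch above:

# Illustrative retriever settings: similarity score threshold and number of results (k)
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5, "k": 3},
)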
Chunk size is the maximum number of characters that a chunk can contain.
Chunk overlap is the number of characters that should overlap between two adjacent chunks.
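For example (values are illustrative; the splitter shown is an assumption, not necessarily the one used here):

# Illustrative splitter settings: chunk size and overlap, both in characters
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)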
Code Exchange Community
Get help, share code, and collaborate with other developers in the Code Exchange community.