Server Components
API Server
An ExpressJS server that acts as an interface to the large language model server (llm) and the vector database (vectordb) via LangChain.
Vector Database
ChromaDB is a self-hosted vector database. It stores the documents that are supplied to the llm as context, as well as the documents that are to be summarized.
LLM
The large language model is accessed through a self-hosted Ollama (ollama.ai) server. Ollama runs open models locally, filling the role that a hosted model such as GPT-3 would otherwise play.
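As an illustration of what a call to the llm server looks like, here is a minimal sketch that builds a non-streaming generate request. It talks to Ollama's REST API directly rather than going through LangChain, and it assumes Ollama's default address (http://localhost:11434) and its /api/generate endpoint. The helper name buildGenerateRequest is hypothetical and not part of this project.

```typescript
// Body of a generate request. "stream: false" asks for one complete
// JSON object instead of a stream of partial responses.
interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// Assumption: Ollama is listening on its default host and port.
const OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate";

// Hypothetical helper: returns the URL and the options object that
// could be handed to an HTTP client.
function buildGenerateRequest(model: string, prompt: string) {
  const body: GenerateRequest = { model, prompt, stream: false };
  return {
    url: OLLAMA_GENERATE_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const example = buildGenerateRequest("mistral", "Summarize this text.");
console.log(example.url, example.init.body);
```

The returned url and init can be passed to an HTTP client such as fetch(url, init). Building the request separately from sending it keeps the sketch testable without a running server.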
Each response includes the following information:
total_duration: time in nanoseconds spent generating the response
load_duration: time in nanoseconds spent loading the model
prompt_eval_count: number of tokens in the prompt
prompt_eval_duration: time in nanoseconds spent evaluating the prompt
eval_count: number of tokens in the response
eval_duration: time in nanoseconds spent generating the response
context: an encoding of the conversation used in this response; it can be sent in the next request to keep a conversational memory
response: the full response if it was not streamed; empty if it was streamed
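The fields above can be turned into more readable numbers. The sketch below converts the nanosecond durations to milliseconds, derives a generation speed in tokens per second from eval_count and eval_duration, and shows how the context value might be carried into a follow-up request. The helper names (summarizeMetrics, followUpRequest) are hypothetical, and the context is assumed to be an array of numbers, as returned by Ollama's generate endpoint.

```typescript
// Timing fields from a generate response; all durations are nanoseconds.
interface GenerateMetrics {
  total_duration: number;
  load_duration: number;
  prompt_eval_count: number;
  prompt_eval_duration: number;
  eval_count: number;
  eval_duration: number;
}

const NS_PER_SECOND = 1e9;
const NS_PER_MS = 1e6;

// Hypothetical helper: durations in milliseconds plus tokens per second.
function summarizeMetrics(m: GenerateMetrics) {
  const evalSeconds = m.eval_duration / NS_PER_SECOND;
  return {
    totalMs: m.total_duration / NS_PER_MS,
    loadMs: m.load_duration / NS_PER_MS,
    promptEvalMs: m.prompt_eval_duration / NS_PER_MS,
    evalMs: m.eval_duration / NS_PER_MS,
    // Guard against a zero duration to avoid dividing by zero.
    tokensPerSecond: evalSeconds > 0 ? m.eval_count / evalSeconds : 0,
  };
}

interface FollowUpRequest {
  model: string;
  prompt: string;
  stream: boolean;
  context: number[];
}

// Hypothetical helper: reuse the context from the previous response so
// the model keeps the conversation so far.
function followUpRequest(
  model: string,
  prompt: string,
  previousContext: number[],
): FollowUpRequest {
  return { model, prompt, stream: false, context: previousContext };
}

// Made-up sample values, for illustration only.
const sample: GenerateMetrics = {
  total_duration: 3_000_000_000,
  load_duration: 500_000_000,
  prompt_eval_count: 20,
  prompt_eval_duration: 400_000_000,
  eval_count: 100,
  eval_duration: 2_000_000_000,
};
console.log(summarizeMetrics(sample));
```

With the sample values, 100 tokens generated in 2 seconds works out to 50 tokens per second.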
TinyDolphin
This is a small 1.1B-parameter model that is used for summarization and other tasks.
Mistral
This is a larger 7B-parameter model that is used for summarization and other tasks.