Server Components

API Server

An ExpressJS server that acts as an interface to the large language model server (llm) and the vector database (vectordb) via LangChain.

Vector Database

ChromaDB is a self-hosted vector database. It stores the data that is retrieved and passed to the llm as context, as well as the data to be summarized.

LLM

The large language models are served by a self-hosted Ollama (ollama.ai) server, which runs open models locally as an alternative to a hosted model such as GPT-3.
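The API server reaches Ollama through LangChain, so the snippet below is not this project's code. It is a minimal sketch of the JSON body that Ollama's `POST /api/generate` endpoint accepts (by default the server listens on port 11434). The helper name `buildGenerateRequest` is hypothetical, and the model tags used in the usage note are assumed to match the models installed locally.

```typescript
// Fields of a request body for Ollama's POST /api/generate endpoint.
interface GenerateRequest {
  model: string;      // model tag, e.g. "mistral"
  prompt: string;     // text to send to the model
  stream: boolean;    // false: return one JSON object instead of a stream
  context?: number[]; // encoding returned by a previous response, if any
}

// Hypothetical helper: builds a non-streaming request body.
// The context is only attached when a previous response supplied one.
function buildGenerateRequest(
  model: string,
  prompt: string,
  context?: number[],
): GenerateRequest {
  const body: GenerateRequest = { model, prompt, stream: false };
  if (context !== undefined && context.length > 0) {
    body.context = context;
  }
  return body;
}
```

For example, `buildGenerateRequest("mistral", "Summarize this text.")` yields a body with no `context` field, and passing the `context` array from an earlier response as the third argument carries the conversation forward.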

Each request returns the following information:

total_duration: total time in nanoseconds spent generating the response
load_duration: time in nanoseconds spent loading the model
prompt_eval_count: number of tokens in the prompt
prompt_eval_duration: time in nanoseconds spent evaluating the prompt
eval_count: number of tokens in the response
eval_duration: time in nanoseconds spent generating the response
context: an encoding of the conversation used in this response; it can be sent in the next request to keep a conversational memory
response: the full response if it was not streamed; empty if the response was streamed
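Because the durations are reported in nanoseconds, the counts and durations above can be combined into a generation speed. The following is a minimal sketch, not part of this project: the names `GenerateMetrics`, `tokensPerSecond`, and `nsToMs` are hypothetical, and the sample values are made up for illustration.

```typescript
// Timing fields reported with each response (durations in nanoseconds).
interface GenerateMetrics {
  total_duration: number;
  load_duration: number;
  prompt_eval_count: number;
  prompt_eval_duration: number;
  eval_count: number;
  eval_duration: number;
}

const NS_PER_SECOND = 1e9;
const NS_PER_MILLISECOND = 1e6;

// Generation speed: response tokens divided by generation time in seconds.
function tokensPerSecond(m: GenerateMetrics): number {
  if (m.eval_duration <= 0) {
    return 0; // avoid dividing by zero when no timing was reported
  }
  return (m.eval_count * NS_PER_SECOND) / m.eval_duration;
}

// Converts a nanosecond duration to milliseconds for display.
function nsToMs(ns: number): number {
  return ns / NS_PER_MILLISECOND;
}

// Made-up sample values, for illustration only.
const sample: GenerateMetrics = {
  total_duration: 3e9,
  load_duration: 5e8,
  prompt_eval_count: 20,
  prompt_eval_duration: 5e8,
  eval_count: 100,
  eval_duration: 2e9,
};

console.log(`${tokensPerSecond(sample)} tokens/s`);
console.log(`${nsToMs(sample.eval_duration)} ms generating`);
```

With the sample above, 100 tokens over 2 seconds gives 50 tokens per second.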

TinyDolphin

This is a small 1.1B-parameter model that is used for summarization and other tasks.

Mistral

This is a larger 7B-parameter model that is used for summarization and other tasks.