Server Components

API Server

An ExpressJS server that acts as an interface to the large language model server (llm) and the vector database (vectordb) via LangChain.

Vector Database

ChromaDB is a self-hosted vector database. It stores the data that is given to the llm as context, along with the data that is to be summarized.

LLM

The Ollama.ai server provides access to the large language model. Ollama is self-hosted and runs open models, such as the two listed below, locally.

Each request returns the following information:

  • total_duration: total time in nanoseconds spent generating the response
  • load_duration: time in nanoseconds spent loading the model
  • prompt_eval_count: number of tokens in the prompt
  • prompt_eval_duration: time in nanoseconds spent evaluating the prompt
  • eval_count: number of tokens in the response
  • eval_duration: time in nanoseconds spent generating the response
  • context: an encoding of the conversation used in this response; it can be sent in the next request to keep a conversational memory
  • response: the full response if it was not streamed; empty if the response was streamed
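Because the durations are reported in nanoseconds, they usually need converting before they are logged or compared. The following is a minimal sketch of that conversion, using the field names from the list above. The `GenerateMetrics` interface and the `tokensPerSecond` and `summarizeMetrics` helpers are hypothetical names introduced for illustration and are not part of this project.

```typescript
// Timing fields as listed above; every duration is in nanoseconds.
// (Hypothetical type name, introduced for this example only.)
interface GenerateMetrics {
  total_duration: number;
  load_duration: number;
  prompt_eval_count: number;
  prompt_eval_duration: number;
  eval_count: number;
  eval_duration: number;
}

const NS_PER_SECOND = 1e9;

// Tokens processed per second; returns 0 when no time was recorded
// so that a missing duration never causes a division by zero.
function tokensPerSecond(tokenCount: number, durationNs: number): number {
  if (durationNs <= 0) return 0;
  return tokenCount / (durationNs / NS_PER_SECOND);
}

// Convert the raw metrics into seconds and token throughput.
function summarizeMetrics(m: GenerateMetrics) {
  return {
    totalSeconds: m.total_duration / NS_PER_SECOND,
    loadSeconds: m.load_duration / NS_PER_SECOND,
    promptTokensPerSecond: tokensPerSecond(m.prompt_eval_count, m.prompt_eval_duration),
    responseTokensPerSecond: tokensPerSecond(m.eval_count, m.eval_duration),
  };
}
```

Comparing `responseTokensPerSecond` across models is one simple way to see how much generation speed is traded for model size.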

TinyDolphin

This is a small 1.1B parameter model that is used for summarization and other tasks.

Mistral

This is a bigger 7B parameter model that is used for summarization and other tasks.
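As a rough illustration of how a request might select one of these models and carry the `context` value forward, here is a minimal sketch that only builds the JSON body for Ollama's generate endpoint. The model tags `tinydolphin` and `mistral`, the `ModelName` type, and the `buildGenerateRequest` helper are assumptions made for this example. The API server described above goes through LangChain, so its actual calls may look different.

```typescript
// Assumed Ollama model tags for the two models described above.
type ModelName = "tinydolphin" | "mistral";

// Shape of the JSON body sent to Ollama's /api/generate endpoint.
interface GenerateRequest {
  model: ModelName;
  prompt: string;
  stream: boolean;
  context?: number[];
}

// Hypothetical helper: build a non-streamed request body, optionally
// passing along the context returned by the previous response so the
// model keeps a conversational memory.
function buildGenerateRequest(
  model: ModelName,
  prompt: string,
  previousContext?: number[],
): GenerateRequest {
  const request: GenerateRequest = { model, prompt, stream: false };
  if (previousContext && previousContext.length > 0) {
    request.context = previousContext;
  }
  return request;
}

// First turn uses the small model; the follow-up reuses the context
// values (placeholder numbers here) from the first response.
const firstTurn = buildGenerateRequest("tinydolphin", "Summarize this paragraph.");
const followUp = buildGenerateRequest("mistral", "Make it shorter.", [1, 2, 3]);
```

The body would then be sent as JSON in a POST request to the Ollama server. With `stream` set to `false`, the `response` field of the reply holds the full text.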