These instructions are a community contribution (thanks @candidosales!) and are provided on a best-effort basis.
These instructions walk through setting up this project from scratch, from Python environment setup through deployment of the backend. They do not cover Python installation or creation of the Discord bot.
If you notice and resolve an error during setup, we'd be grateful if you submitted a PR that fixes it and updates these docs.
The Makefile gets configuration information like usernames and secrets from dotenv files.
We've included an empty template file, .env.example. Copy it to .env with:
cp .env.example .env
If you want a dev environment, you can also copy it to .env.dev:
cp .env.example .env.dev
To switch between the production and development environments, set the ENV variable to prod or dev, respectively.
export ENV=dev # set it as an environment variable
ENV=prod make help # or set it for each make command
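As a rough illustration of how the ENV variable could select a dotenv file, here is a minimal Python sketch. The function names dotenv_path and parse_dotenv are hypothetical, the prod to .env and dev to .env.dev mapping is inferred from the steps above, and the real Makefile may load these files differently.

```python
import os


def dotenv_path(env: str) -> str:
    """Map an ENV value to a dotenv filename.

    Assumed mapping for this sketch: prod -> .env, dev -> .env.dev.
    """
    if env == "prod":
        return ".env"
    if env == "dev":
        return ".env.dev"
    raise ValueError(f"ENV must be 'prod' or 'dev', got {env!r}")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    # Falling back to prod when ENV is unset is an assumption of this sketch.
    path = dotenv_path(os.environ.get("ENV", "prod"))
    print(f"Would read configuration from {path}")
```

This only handles plain KEY=VALUE lines; dotenv features such as multi-line values or variable expansion are out of scope here.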
There are many ways to manage Python environments.
We use pyenv + pyenv-virtualenv, but you're welcome to use another method.
The only restriction is that the Makefile presumes that python -m pip install works for installing into the intended environment.
If you violate this assumption, you will not be able to use any of the make commands.
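If you manage environments another way, one quick sanity check is to ask the interpreter itself where it lives, since python -m pip install installs into whichever interpreter python resolves to. The following is a minimal sketch using only the standard library; the function name environment_summary is hypothetical and not part of the project.

```python
import sys


def environment_summary() -> dict:
    """Report which interpreter is running and whether it is in a virtual environment."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Inside a venv-style environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for name, value in environment_summary().items():
        print(f"{name}: {value}")
```

Run it with the same python command the Makefile would use. If the reported executable is not inside the environment you intended, installs will likely land somewhere else.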
If you're not using pyenv + pyenv-virtualenv, skip to step 4.
Follow the installation instructions for pyenv and for pyenv-virtualenv.
Don't forget to follow the instructions for setting up your shell environment!
To activate pyenv and pyenv-virtualenv, restart your shell after setting up the shell environment.
pyenv installs Python.
We use Python 3.10 and later in our development. You can install it with:
pyenv install 3.10.9
Environments isolate libraries used in one context from those used in another context.
For example, we can use them to isolate the libraries used in this project from those used in other projects.
Done naively, this would result in an explosion of space taken up by duplicated libraries.
Virtual environments allow the sharing of Python libraries across environments if they happen to be using the same version.
We create one for this project with:
pyenv virtualenv 3.10.9 ask-fsdl
To start using it, we need to "activate" it:
pyenv activate ask-fsdl
We've set it as the default environment for this directory with:
pyenv local ask-fsdl
which generates a .python-version file in the current directory.
Now that we have an environment for our project, we can install the dependencies.
If you're interested in contributing, run
make dev-environment
which adds a few code quality checkers. Otherwise, run
make environment
From here, the Makefile will handle a lot of the heavy lifting for coordinating all the pieces of the project.
Run make help to see all of the things it can do.
make help
However, it needs some information from you, and some resources, like accounts on managed services, cannot be created automatically.
1 - Create a Modal account: https://modal.com/
We'll be running our application on Modal, which provides serverless infrastructure for data science/ML projects with a best-in-class developer experience.
Modal includes a free tier that should support hundreds of requests per day for our app.
Follow the instructions for getting an account -- at time of writing, Modal is in private beta but requests are generally approved quickly.
Once you've done that, run
make modal-token
and follow the instructions in the terminal.
Run
make modal-auth
to confirm that you've set up your Modal account correctly.
2 - Create an OpenAI account: https://openai.com/
We use language models and embeddings from OpenAI's language-modeling-as-a-service API.
So you'll need to create an OpenAI account.
Sign up here and get an API key.
Make sure you set up a payment method!
Creating the embeddings is cheap, on the order of fifty cents.
But each query of the chatbot costs a few cents, so you might also want to set a limit on your account.
Add the OpenAI API key to the dotenv file.
We store our source corpus in MongoDB, which is a document database. The things we store look like JSON objects, which makes it easy to work with them in Python and JavaScript.
The easiest way to use MongoDB is with the managed service from the creators of MongoDB, MongoDB Atlas. The project fits within the free tier very comfortably.
Alternatively, you can run a MongoDB instance locally or on a server you control, but we don't include any instructions for this path. Once you have the database set up, jump to the final step below.
Follow instructions here.
A document collection is like a table in a relational database.
Name them fsdl and ask-fsdl, respectively.
See instructions here. Add the username and password to the dotenv file.
Make sure you save the password somewhere safe as well, like a password manager.
To allow the application to connect to the database, we need to construct a connection string, a URI for connecting to databases.
To construct the connection string, we need three pieces of information:
- The name of the host where the database is running
- The name of a database user with read/write access to the database and its collections
- The password for that user
This information goes in the dotenv file. You should have already entered the username and password while creating the user. If not, you'll need to retrieve the password from wherever you stored it.
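To make the shape of a connection string concrete, here is a minimal sketch that assembles one from the three pieces listed above. The mongodb+srv:// scheme, the function name build_connection_string, and the example host, user, and password are all assumptions for illustration; the project may build its URI differently.

```python
from urllib.parse import quote_plus


def build_connection_string(host: str, user: str, password: str) -> str:
    """Assemble a MongoDB URI, percent-encoding the credentials."""
    # The mongodb+srv:// scheme is an assumption of this sketch; a self-hosted
    # instance would more typically use mongodb:// with an explicit port.
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


if __name__ == "__main__":
    # Placeholder values only; substitute the ones from your dotenv file.
    print(build_connection_string("cluster0.example.mongodb.net", "fsdl-user", "p@ss/word"))
```

Percent-encoding matters because characters such as @ or / in a password would otherwise be read as part of the URI structure.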
You can find the rest of the information in the "Connect" tab of the MongoDB Atlas dashboard.
For the application to run, it needs the information in the dotenv file.
We push that information to Modal with
make secrets
As a side effect, this will confirm that all of the necessary information is provided.
The sources used by the chatbot come from many places and are stored in a variety of formats.
We bring them in and format them all as JSON documents with a particular structure.
We also make them searchable with a vector index.
make document-store
Optionally, to learn more about the process, check out the included Jupyter notebook.
Run all steps to create the full document store.
We use a vector index to find the most similar documents to a given query.
You can create it with
make vector-index
At time of writing, the vector index is a FAISS index that is read from disk at query time.
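For intuition about what "most similar" means here, the sketch below ranks a handful of toy vectors against a query by brute force. It is not the project's FAISS index: the cosine metric, the function names, and the two-dimensional example vectors are assumptions chosen for illustration, and a real index would hold the document embeddings and search them far more efficiently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], documents: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k documents whose vectors are closest to the query."""
    ranked = sorted(
        documents,
        key=lambda doc_id: cosine_similarity(query, documents[doc_id]),
        reverse=True,  # highest similarity first
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy vectors standing in for document embeddings.
    toy_documents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
    print(most_similar([1.0, 0.1], toy_documents, k=2))
```

Brute force compares the query against every document, which is fine for a toy example but is the cost a dedicated vector index is meant to avoid.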
You're now ready to ask the chatbot a question!
Use this command, and feel free to substitute your own query.
make cli-query QUERY="What can you do?"
You can turn this into a web service with
make backend
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: Your account is not active, please check your billing details on our website.
Set up a payment method: https://platform.openai.com/account/billing/overview