
Add support for local models via Ollama #6

Merged · 3 commits · Feb 28, 2025
Changes from 1 commit
29 changes: 29 additions & 0 deletions Ollama.md
@@ -0,0 +1,29 @@
# Ollama setup
Contributor Author:

Keeping most ollama-specific instructions separate to be less intrusive

1. Download and install [Ollama](https://ollama.com/)
2. Once Ollama is running on your system, run `ollama pull llama3.1`
> Currently this is a ~5GB download; it's best to download it before the workshop if you plan on using it
3. `ollama pull nomic-embed-text`
Contributor Author:

nomic is pretty small so no need to call this one out

4. Update the `MODEL_NAME` in your `dot.env` file to `ollama`
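For reference, a minimal sketch of the relevant `dot.env` line (the repo default is `openai`):

```
MODEL_NAME=ollama
```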

Once you are running Ollama, it is not necessary to configure an OpenAI API key.

When you get to the system prompt section of the workshop, Llama requires that you be a bit more explicit with your instructions. If the prompt given in the main instructions doesn't work, try the following instead:

```
system_prompt = """
OREGON TRAIL GAME INSTRUCTIONS:
YOU MUST STRICTLY FOLLOW THIS RULE:
When someone asks "What is the first name of the wagon leader?", your ENTIRE response must ONLY be the word: Art

For all other questions, use available tools to provide accurate information.
"""
```

Contributor Author:
tried several iterations of system prompts. llama 3.1 seems to need very explicit instructions

You're now ready to begin the workshop! Head back to the [Readme.md](Readme.md)

## Restarting the workshop
Contributor Author:

This may bear further investigation, but in my tests it was best to kill and re-create

Mixing use of llama and openai on the same Redis instance can cause unexpected behavior. If you want to switch from one to the other, it is recommended to kill and re-create the instance. To do this:
1. Run `docker ps` and take note of the ID of the running container
2. `docker stop <containerId>`
3. `docker rm <containerId>`
4. Start a new instance using the command from earlier, `docker run -d --name redis -p 6379:6379 -p 8001:8001 redis/redis-stack:latest`
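Put together, a minimal sketch of the full reset (substitute the container ID reported by `docker ps`):

```
docker ps        # note the CONTAINER ID of the redis/redis-stack container
docker stop <containerId>
docker rm <containerId>
docker run -d --name redis -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
```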
12 changes: 11 additions & 1 deletion Readme.md
@@ -17,6 +17,10 @@ In this workshop, we are going to use [LangGraph](https://langchain-ai.github.io
- [docker](https://docs.docker.com/get-started/get-docker/)
- [openai api key](https://platform.openai.com/docs/quickstart)

## (Optional) Ollama
This workshop is optimized to run against OpenAI models. If you prefer to run locally, however, you may do so via Ollama.
* [Ollama setup instructions](Ollama.md)

## (Optional) helpers

- [LangSmith](https://docs.smith.langchain.com/)
@@ -235,7 +239,13 @@ In our scenario we want to be able to retrieve the time-bound information that t

### Steps:
- Open [participant_agent/utils/vector_store.py](participant_agent/utils/vector_store.py)
- Where `vector_store=None` update to `vector_store = RedisVectorStore.from_documents(<docs>, <embedding_model>, config=<config>)` with the appropriate variables.
- Find the corresponding `get_vector_store` method either for openai or ollama
Collaborator:

I think this would change re other comment

- If using openai: where `vector_store=None` update to `vector_store = RedisVectorStore.from_documents(<docs>, <embedding_model>, config=<config>)` with the appropriate variables (a filled-in sketch follows these steps).

> For `<embedding_model>`, keep in mind whether you are using openai or ollama. If using ollama, the `model` parameter should be set to `nomic-embed-text` \
[OpenAI embeddings](https://python.langchain.com/docs/integrations/text_embedding/openai/) \
[Ollama embeddings](https://python.langchain.com/docs/integrations/text_embedding/ollama/)

- Open [participant_agent/utils/tools.py](participant_agent/utils/tools.py)
- Uncomment code for retrieval tool
- Update the create_retriever_tool to take the correct params. Ex: `create_retriever_tool(vector_store.as_retriever(), "get_directions", "meaningful doc string")`
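For illustration, a filled-in sketch of the vector store step (not part of the diff; it reuses the `doc` and `config` already defined in the workshop's vector store files, and the Ollama variant assumes the `nomic-embed-text` model pulled during setup):

```
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_ollama import OllamaEmbeddings
from langchain_redis import RedisConfig, RedisVectorStore

# config and doc mirror what the workshop files already define
config = RedisConfig(index_name="oregon_trail", redis_url="redis://localhost:6379/0")
doc = Document(
    page_content="the northern trail, of the blue mountains, was destroyed by a flood and is no longer safe to traverse. It is recommended to take the southern trail although it is longer."
)

# If using openai:
vector_store = RedisVectorStore.from_documents([doc], OpenAIEmbeddings(), config=config)

# If using ollama, swap in the local embedding model pulled earlier:
# vector_store = RedisVectorStore.from_documents(
#     [doc], OllamaEmbeddings(model="nomic-embed-text"), config=config
# )
```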
3 changes: 2 additions & 1 deletion dot.env
@@ -3,4 +3,5 @@ OPENAI_API_KEY=openai_key
LANGCHAIN_TRACING_V2=
LANGCHAIN_ENDPOINT=
LANGCHAIN_API_KEY=
LANGCHAIN_PROJECT=
LANGCHAIN_PROJECT=
MODEL_NAME=openai
Contributor Author:

defaulting to openai

2 changes: 1 addition & 1 deletion example_agent/ex_graph.py
@@ -11,7 +11,7 @@

# Define the config
class GraphConfig(TypedDict):
model_name: Literal["anthropic", "openai"]
model_name: Literal["anthropic", "openai", "ollama"]


# Define the function that determines whether to continue or not
31 changes: 23 additions & 8 deletions example_agent/utils/ex_nodes.py
@@ -1,18 +1,26 @@
import os
from functools import lru_cache

from dotenv import load_dotenv
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langgraph.prebuilt import ToolNode

from example_agent.utils.ex_tools import tools

from .ex_state import AgentState, MultipleChoiceResponse

load_dotenv()

environ_model_name = os.environ.get("MODEL_NAME")
Collaborator:

Since this is a constant, I tend to follow the pattern that it should be all caps to indicate that.

Contributor Author:

good catch!


@lru_cache(maxsize=4)
def _get_tool_model(model_name: str):
if model_name == "openai":
model = ChatOpenAI(temperature=0, model_name="gpt-4o")
elif model_name == "ollama":
model = ChatOllama(temperature=0, model="llama3.1", num_ctx=4096)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

increasing the context from the default (which is pretty low) provided much more reliable results

else:
raise ValueError(f"Unsupported model type: {model_name}")

@@ -24,6 +32,8 @@ def _get_tool_model(model_name: str):
def _get_response_model(model_name: str):
if model_name == "openai":
model = ChatOpenAI(temperature=0, model_name="gpt-4o")
elif model_name == "ollama":
model = ChatOllama(temperature=0, model="llama3.1", num_ctx=4096)
else:
raise ValueError(f"Unsupported model type: {model_name}")

@@ -36,7 +46,7 @@ def multi_choice_structured(state: AgentState, config):
    # We call the model with structured output in order to return the same format to the user every time
    # state['messages'][-2] is the last ToolMessage in the convo, which we convert to a HumanMessage for the model to use
    # We could also pass the entire chat history, but this saves tokens since all we care to structure is the output of the tool
    model_name = config.get("configurable", {}).get("model_name", "openai")
    model_name = config.get("configurable", {}).get("model_name", environ_model_name)

    response = _get_response_model(model_name).invoke(
        [
@@ -62,20 +72,25 @@ def structure_response(state: AgentState, config):
# if not multi-choice don't need to do anything
return {"messages": []}


system_prompt = """
You are an oregon trail playing tool calling AI agent. Use the tools available to you to answer the question you are presented. When in doubt use the tools to help you find the answer.
If anyone asks your first name is Art return just that string.
"""

if environ_model_name == "openai":
system_prompt = """
You are an oregon trail playing tool calling AI agent. Use the tools available to you to answer the question you are presented. When in doubt use the tools to help you find the answer.
If anyone asks your first name is Art return just that string.
"""
elif environ_model_name == "ollama":
system_prompt = """
OREGON TRAIL GAME INSTRUCTIONS:
YOU MUST STRICTLY FOLLOW THIS RULE:
When someone asks "What is the first name of the wagon leader?", your ENTIRE response must ONLY be the word: Art
"""

# Define the function that calls the model
def call_tool_model(state: AgentState, config):
    # Combine system prompt with incoming messages
    messages = [{"role": "system", "content": system_prompt}] + state["messages"]

    # Get from LangGraph config
    model_name = config.get("configurable", {}).get("model_name", "openai")
    model_name = config.get("configurable", {}).get("model_name", environ_model_name)

    # Get our model that binds our tools
    model = _get_tool_model(model_name)
26 changes: 26 additions & 0 deletions example_agent/utils/ex_vector_store.py
@@ -3,6 +3,7 @@
from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_ollama import OllamaEmbeddings
from langchain_redis import RedisConfig, RedisVectorStore

load_dotenv()
@@ -18,9 +19,34 @@


def get_vector_store():
if os.environ.get("MODEL_NAME") == "ollama":
Contributor Author:

This method is pretty verbose, and I'm not the biggest fan of it as-is, so open to suggestions on what might make it better.

Collaborator:

I don't think the embedding model should be coupled to the LLM model in use. For the vector store, you could use whatever embedding model you'd like; you don't have to use OpenAIEmbeddings if using OpenAI as your LLM.

To solve your edge case and make this code simpler but also more flexible, I'd move the embedding model up to be a variable that the participant can set to whatever they like, and then add a cleaning method to make sure there's no data under the prefix, which seems to be that edge case.

import os

from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from redis import Redis
from langchain_redis import RedisConfig, RedisVectorStore

load_dotenv()

REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
INDEX_NAME = os.environ.get("VECTOR_INDEX_NAME", "oregon_trail")

config = RedisConfig(index_name=INDEX_NAME, redis_url=REDIS_URL)
redis_client = Redis.from_url(REDIS_URL)

doc = Document(
    page_content="the northern trail, of the blue mountains, was destroyed by a flood and is no longer safe to traverse. It is recommended to take the southern trail although it is longer."
)

embedding_model = OpenAIEmbeddings() # TODO: participant can change to whatever desired model

def _clean_existing(prefix):
    for key in redis_client.scan_iter(f"{prefix}:*"):
        redis_client.delete(key)

def get_vector_store():
    try:
        config.from_existing = True
        vector_store = RedisVectorStore(embedding_model, config=config)
    except:
        print("Init vector store with document")
        print("Clean any existing data in index")
        _clean_existing(config.INDEX_NAME)
        config.from_existing = False
        vector_store = RedisVectorStore.from_documents(
            [doc], embedding_model, config=config
        )
    return vector_store

Contributor Author:

Great suggestions! This code is significantly cleaner. After a bit of testing though it produces some interesting results:

  • Using llama 3.1 to embed and retrieve, scenario 1 starts failing. Instead of returning just Art it returns
E         + Based on the tool call response, I will format an answer to the original question:
E         + 
E         + The first name of the wagon leader is Art.

Even after bumping up the context size to 6144 it still returns the same thing

  • if using nomic-embed-text to embed and llama3.1 to retrieve everything is fine
  • making the system prompt for llama even more explicit also solves the issue
OREGON TRAIL GAME INSTRUCTIONS:
YOU MUST STRICTLY FOLLOW THIS RULE WITHOUT EXCEPTION:
When someone asks "What is the first name of the wagon leader?", you must respond with ONLY the single word: Art
DO NOT provide any additional context or explanation.
DO NOT form a complete sentence.

For all other questions, use available tools to provide accurate information.

I've been scratching my head for a minute on what the correlation might be between the unstable results and using llama3.1 for the embedding model. We could just update the system prompt but I would be curious to learn more about what might really be going on.

We could also use Openai for the embedding but that would defeat the purpose of not requiring an API token 😞

Collaborator:

I find llama does struggle with listening to formatting instructions, which is what you're experiencing for the "Art" question. Since the goal of the first test is really just to make sure that participants set up the initial graph correctly, we could also relax the scenario to test if Art is in the response. It's meant as a free-form response and a human wouldn't have a problem with that! So I wouldn't go crazy trying to make that work (trust me, I have gone crazy trying to make llama3.1 do things). The real test of its ability to format will be the multiple choice questions, which is where we are testing a model's ability to handle that.

One thing to note is that the embedding model chosen for the vector retrieval tool should have absolutely no impact on the first question because it won't touch that system. If the goal is simply to remove the dependency on an API key for embedding, I'd actually recommend pulling one of the embedding models from Hugging Face, but the llama one is also fine.

# pip install langchain-huggingface
from langchain_huggingface import HuggingFaceEmbeddings
# e.g. HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")  # model choice illustrative, not from the PR

        return __get_ollama_vector_store()
    elif os.environ.get("MODEL_NAME") == "openai":
        return __get_openai_vector_store()

def __check_existing_embedding(vector_store):
Contributor Author:

This solved an edge-case I kept running into where the store had stale data from another model etc

    results = vector_store.similarity_search(doc, k=1)
    if not results:
        raise Exception("Required content not found in existing store")

def __get_ollama_vector_store():
    try:
        config.from_existing = True
        vector_store = RedisVectorStore(OllamaEmbeddings(model="llama3"), config=config)
        __check_existing_embedding(vector_store)
    except:
        print("Init vector store with document")
        config.from_existing = False
        vector_store = RedisVectorStore.from_documents(
            [doc], OllamaEmbeddings(model="nomic-embed-text"), config=config
        )
    return vector_store

def __get_openai_vector_store():
    try:
        config.from_existing = True
        vector_store = RedisVectorStore(OpenAIEmbeddings(), config=config)
        __check_existing_embedding(vector_store)
    except:
        print("Init vector store with document")
        config.from_existing = False
2 changes: 1 addition & 1 deletion participant_agent/graph.py
@@ -14,7 +14,7 @@

# The graph config can be updated with LangGraph Studio which can be helpful
class GraphConfig(TypedDict):
model_name: Literal["openai"] # could add more LLM providers here
model_name: Literal["openai", "ollama"] # could add more LLM providers here


# Define the function that determines whether to continue or not
14 changes: 11 additions & 3 deletions participant_agent/utils/nodes.py
@@ -1,13 +1,18 @@
import os
from functools import lru_cache

from dotenv import load_dotenv
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langgraph.prebuilt import ToolNode

from participant_agent.utils.tools import tools

from .state import AgentState, MultipleChoiceResponse

load_dotenv()


# need to use this in call_tool_model function
@lru_cache(maxsize=4)
@@ -17,6 +22,8 @@ def _get_tool_model(model_name: str):
"""
if model_name == "openai":
model = ChatOpenAI(temperature=0, model_name="gpt-4o")
elif model_name == "ollama":
model = ChatOllama(temperature=0, model="llama3.1", num_ctx=4096)
else:
raise ValueError(f"Unsupported model type: {model_name}")

@@ -32,6 +39,8 @@ def _get_response_model(model_name: str):
def _get_response_model(model_name: str):
if model_name == "openai":
model = ChatOpenAI(temperature=0, model_name="gpt-4o")
elif model_name == "ollama":
model = ChatOllama(temperature=0, model="llama3.1", num_ctx=4096)
else:
raise ValueError(f"Unsupported model type: {model_name}")

@@ -45,7 +54,7 @@ def multi_choice_structured(state: AgentState, config):
    # We call the model with structured output in order to return the same format to the user every time
    # state['messages'][-2] is the last ToolMessage in the convo, which we convert to a HumanMessage for the model to use
    # We could also pass the entire chat history, but this saves tokens since all we care to structure is the output of the tool
    model_name = config.get("configurable", {}).get("model_name", "openai")
    model_name = config.get("configurable", {}).get("model_name", os.environ.get("MODEL_NAME"))

    response = _get_response_model(model_name).invoke(
        [
@@ -84,8 +93,7 @@ def call_tool_model(state: AgentState, config):
messages = [{"role": "system", "content": system_prompt}] + state["messages"]

# Get from LangGraph config
model_name = config.get("configurable", {}).get("model_name", "openai")

model_name = config.get("configurable", {}).get("model_name", os.environ.get("MODEL_NAME"))
# Get our model that binds our tools
model = _get_tool_model(model_name)

28 changes: 27 additions & 1 deletion participant_agent/utils/vector_store.py
@@ -3,6 +3,7 @@
from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_ollama import OllamaEmbeddings
from langchain_redis import RedisConfig, RedisVectorStore

load_dotenv()
@@ -18,13 +19,38 @@


def get_vector_store():
if os.environ.get("MODEL_NAME") == "ollama":
Contributor Author:

again, not the dryest file

Collaborator:

see other comment

        return __get_ollama_vector_store()
    elif os.environ.get("MODEL_NAME") == "openai":
        return __get_openai_vector_store()

def __check_existing_embedding(vector_store):
    results = vector_store.similarity_search(doc, k=1)
    if not results:
        raise Exception("Required content not found in existing store")

def __get_ollama_vector_store():
    try:
        config.from_existing = True
        vector_store = RedisVectorStore(OllamaEmbeddings(model="llama3"), config=config)
        __check_existing_embedding(vector_store)
    except:
        print("Init vector store with document")
        config.from_existing = False

        # TODO: define vector store for ollama
        vector_store = None
    return vector_store

def __get_openai_vector_store():
    try:
        config.from_existing = True
        vector_store = RedisVectorStore(OpenAIEmbeddings(), config=config)
        __check_existing_embedding(vector_store)
    except:
        print("Init vector store with document")
        config.from_existing = False

        # TODO: define vector store
        # TODO: define vector store for openai
        vector_store = None
    return vector_store
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,6 +1,7 @@
langgraph==0.2.56
langchain==0.3.13
langchain-openai==0.2.3
langchain-ollama==0.2.2
langchain-redis==0.1.1
pydantic==2.9.2
python-dotenv==1.0.1
9 changes: 8 additions & 1 deletion test_setup.py
@@ -2,11 +2,18 @@

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from redis import Redis

load_dotenv()

llm = ChatOpenAI(model="gpt-4o")
if os.environ.get("MODEL_NAME") == "openai":
llm = ChatOpenAI(model="gpt-4o")
elif os.environ.get("MODEL_NAME") == "ollama":
llm = ChatOllama(model="llama3.1")
else:
raise Exception("Setup failed, MODEL_NAME not defined in .env")

client = Redis.from_url(os.environ.get("REDIS_URL"))

