RC: LangCache public preview #1703
82 changes: 82 additions & 0 deletions content/operate/rc/langcache/_index.md
---
alwaysopen: false
categories:
- docs
- operate
- rc
description: Store LLM responses for AI applications in Redis Cloud.
hideListLinks: true
linktitle: LangCache
title: Semantic caching with LangCache
weight: 36
---

LangCache is a semantic caching service, available as a REST API, that stores LLM responses for faster and cheaper retrieval. It is built on the Redis vector database. By using semantic caching, you can significantly reduce API costs and lower the average latency of your generative AI applications.

## LangCache overview

LangCache uses semantic caching to store and reuse previous LLM responses for repeated queries. Instead of calling the LLM for every request, LangCache checks if a similar response has already been generated and is stored in the cache. If a match is found, LangCache returns the cached response instantly, saving time and resources.

Imagine you’re using an LLM to build an agent to answer questions about your company's products. Your users may ask questions like the following:

- "What are the features of Product A?"
- "Can you list the main features of Product A?"
- "Tell me about Product A’s features."

These prompts may have slight variations, but they essentially ask the same question. LangCache can help you avoid calling the LLM for each of these prompts by caching the response to the first prompt and returning it for any similar prompts.

Using LangCache as a semantic caching service in Redis Cloud has the following benefits:

- **Lower LLM costs**: Reduce costly LLM calls by easily storing the most frequently requested responses.
- **Faster AI app responses**: Get faster AI responses by retrieving previously stored responses from memory.
- **Simpler deployments**: Access the managed service through a REST API with automated embedding generation and configurable controls.
- **Advanced cache management**: Manage data access, privacy, and eviction policies, and monitor usage and cache hit rates.

### LLM cost reduction with LangCache

LangCache reduces your LLM costs by caching responses and avoiding repeated API calls. When a response is served from the cache, you don't pay for output tokens; any savings on input tokens are typically offset by embedding and storage costs.

For every cached response, you'll save the output token cost. To calculate your monthly savings with LangCache, you can use the following formula:

```bash
Est. monthly savings with LangCache =
(Monthly output token costs) × (Cache hit rate)
```

The more requests you serve from LangCache, the more you save, because you’re not paying to regenerate the output.

Here’s an example:
- Monthly LLM spend: $200
- Percentage of output tokens in your spend: 60%
- Cost of output tokens: $200 × 60% = $120
- Cache hit rate: 50%
- Estimated savings: $120 × 50% = $60/month

{{<note>}}
The formula and numbers above provide a rough estimate of your monthly savings. Actual savings will vary depending on your usage.
{{</note>}}
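
If you prefer to script this estimate, the following minimal sketch applies the same formula with the example numbers above; substitute your own spend, output-token share, and expected hit rate.

```bash
# Estimate monthly LangCache savings using the formula above.
MONTHLY_SPEND=200   # total monthly LLM spend, in dollars
OUTPUT_SHARE=0.60   # fraction of that spend going to output tokens
HIT_RATE=0.50       # expected cache hit rate

echo "$MONTHLY_SPEND $OUTPUT_SHARE $HIT_RATE" | \
  awk '{ printf "Estimated savings: $%.2f/month\n", $1 * $2 * $3 }'
```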

## LangCache architecture

The following diagram shows how you can integrate LangCache into your GenAI app:

{{< image filename="images/rc/langcache-process.png" >}}

1. A user sends a prompt to your AI app.
1. Your app sends the prompt to LangCache through the `POST /v1/caches/{cacheId}/search` endpoint.
1. LangCache calls an embedding model service to generate an embedding for the prompt.
1. LangCache searches the cache to see if a similar response already exists by matching the embeddings of the new query with the stored embeddings.
1. If a semantically similar entry is found (also known as a cache hit), LangCache gets the cached response and returns it to your app. Your app can then send the cached response back to the user.
1. If no match is found (also known as a cache miss), your app receives an empty response from LangCache. Your app then queries your chosen LLM to generate a new response.
1. Your app sends the prompt and the new response to LangCache through the `POST /v1/caches/{cacheId}/entries` endpoint.
1. LangCache stores the embedding with the new response in the cache for future use.
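
The following is a minimal shell sketch that puts these steps together. It assumes the `HOST`, `CACHE_ID`, and `API_KEY` variables are set for your LangCache service, that `call_llm` is a hypothetical placeholder for however your app calls its LLM, and that search matches are returned under a `data` array; adjust the response parsing to the actual shape returned by the API.

```bash
#!/usr/bin/env bash
# Sketch of the cache-aside flow above; requires curl and jq.
PROMPT="What are the features of Product A?"

# Search the cache for a semantically similar prompt.
MATCH=$(curl -s -X POST "https://$HOST/v1/caches/$CACHE_ID/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d "{\"prompt\": \"$PROMPT\"}" | jq -r '.data[0].response // empty')
  # The .data[0].response path is an assumed response shape.

if [ -n "$MATCH" ]; then
  # Cache hit: return the stored response to the user.
  echo "$MATCH"
else
  # Cache miss: call your LLM, then store the new response for next time.
  RESPONSE=$(call_llm "$PROMPT")   # hypothetical helper
  curl -s -X POST "https://$HOST/v1/caches/$CACHE_ID/entries" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d "{\"prompt\": \"$PROMPT\", \"response\": \"$RESPONSE\"}" > /dev/null
  echo "$RESPONSE"
fi
```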

## Get started with LangCache on Redis Cloud

To set up LangCache on Redis Cloud:

1. [Create a database]({{< relref "/operate/rc/databases/create-database" >}}) on Redis Cloud.
2. [Create a LangCache service]({{< relref "/operate/rc/langcache/create-service" >}}) for your database.
3. [Use the LangCache API]({{< relref "/operate/rc/langcache/use-langcache" >}}) from your client app.

After you set up LangCache, you can [view and edit the cache]({{< relref "/operate/rc/langcache/view-edit-cache" >}}) and [monitor the cache's performance]({{< relref "/operate/rc/langcache/monitor-cache" >}}).
132 changes: 132 additions & 0 deletions content/operate/rc/langcache/create-service.md
---
alwaysopen: false
categories:
- docs
- operate
- rc
description: null
hideListLinks: true
linktitle: Create service
title: Create a LangCache service
weight: 5
---

Redis LangCache provides vector search capabilities and efficient caching for AI-powered applications. This guide walks you through creating and configuring a LangCache service in Redis Cloud.

## Prerequisites

To create a LangCache service, you will need:

- A Redis Cloud database. If you don't have one, see [Create a database]({{< relref "/operate/rc/databases/create-database" >}}).

{{< note >}}
LangCache does not support the following databases during public preview:
- Databases with a [CIDR allow list]({{< relref "/operate/rc/security/cidr-whitelist" >}})
- [Active-Active]({{< relref "/operate/rc/databases/configuration/active-active-redis" >}}) databases
- Databases with the [default user]({{< relref "/operate/rc/security/access-control/data-access-control/default-user" >}}) turned off
{{< /note >}}

- An [OpenAI API key](https://platform.openai.com/api-keys). LangCache uses OpenAI to generate embeddings for prompts and responses.

## Create a LangCache service

From the [Redis Cloud console](https://cloud.redis.io/), select **LangCache** from the left-hand menu.

When you access the LangCache page for the first time, you will see a page with an introduction to LangCache. Select **Let's create a service** to create your first service.

{{<image filename="images/rc/langcache-create-first-service.png" alt="The Let's create a service button." width="200px" >}}

If you have already created a LangCache service, select **New service** to create another one.

{{<image filename="images/rc/langcache-new-service.png" alt="The New service button." width="150px" >}}

This takes you to the **Create LangCache service** page. The page is divided into the following sections:

1. The [General settings](#general-settings) section defines basic properties of your service.
1. The [Embedding settings](#embedding-settings) section defines the embedding model used by your service.
1. The [Attributes settings](#attributes-settings) section allows you to define attributes for your service.

### General settings

The **General settings** section defines basic properties of your service.

{{<image filename="images/rc/langcache-general-settings.png" alt="The General settings section." >}}

| Setting name |Description|
|:----------------------|:----------|
| **Service name** | Enter a name for your LangCache service. We recommend you use a name that describes your service's purpose. |
| **Select database** | Select the Redis Cloud database to use for this service from the list. |
| **TTL** | The number of seconds to cache entries before they expire. Default: `No expiration` - items in the cache will remain until manually removed. |
| **User** | The [database access user]({{< relref "/operate/rc/security/access-control/data-access-control/role-based-access-control" >}}) to use for this service. LangCache only supports the [`default` user]({{< relref "/operate/rc/security/access-control/data-access-control/default-user" >}}) during public preview. |

### Embedding settings

The **Embedding settings** section defines the embedding model used by your service.

{{<image filename="images/rc/langcache-embedding-settings.png" alt="The Embedding settings section." >}}

| Setting name |Description|
|:----------------------|:----------|
| **Supported Embedding Provider** | The embedding provider to use for your service. LangCache only supports OpenAI during public preview. |
| **Embedding provider API key** | Enter your embedding provider's API key. |
| **Model** | Select the embedding model to use for your service. |
| **Similarity threshold** | Set the minimum similarity score required to consider a cached response a match. Range: `0.0` to `1.0`. Default: `0.9`<br/><br/>A higher value requires more precise matches but may exclude relevant ones; a lower value returns more matches but may include less relevant ones. We recommend starting between `0.8` and `0.9` and then fine-tuning based on your results. |

### Attributes settings

Attributes let you scope your LangCache operations. They act as tags or labels that help you organize and manage your cached data.

The **Attributes settings** section allows you to define attributes for your service. It is collapsed by default.

{{<image filename="images/rc/langcache-attribute-settings.png" alt="The Attributes settings section, expanded." >}}

LangCache allows you to define up to 5 custom attributes that align with your specific use case. To add a new attribute:

1. Select **Add attribute**.

{{<image filename="images/rc/langcache-add-attribute.png" alt="The Add attribute button." width="150px" >}}

1. Give your custom attribute a descriptive name and select the check mark button to save it.

{{<image filename="images/rc/langcache-custom-attributes.png" alt="The custom attributes section. Select the Confirm add attribute button to save your attribute." >}}

After you save your custom attribute, it will appear in the list of custom attributes. Use the **Delete** button to remove it.

{{<image filename="images/rc/icon-delete-teal.png" width="36px" alt="Select the Delete button to delete the selected attribute." >}}

You can also select **Add attribute** again to add an additional attribute.

{{<image filename="images/rc/langcache-add-attribute.png" alt="The Add attribute button." width="150px" >}}

### Create service

When you are done setting the details of your LangCache service, select **Create** to create it.

{{<image filename="images/rc/button-access-management-user-key-create.png" alt="Use the Create button to create a LangCache service." >}}

A window containing your LangCache service key will appear. Select **Copy** to copy the key to your clipboard.

{{<image filename="images/rc/langcache-service-key.png" alt="The LangCache service key window. Use the Copy button to save the service key to the clipboard." >}}

{{<warning>}}
This is the only time the value of the service key is available. Save it to a secure location before closing the dialog box.<br/><br/>

If you lose the service key value, you will need to [replace the service key]({{< relref "/operate/rc/langcache/view-edit-cache#replace-service-api-key" >}}) to be able to use the LangCache API.
{{</warning>}}

You'll be taken to your LangCache service's **Configuration** page. You'll also be able to see your LangCache service in the LangCache service list.

{{<image filename="images/rc/langcache-service-list.png" alt="The LangCache service in the LangCache service list." >}}

If an error occurs, verify that:
- Your database is active.
- You have provided a valid OpenAI API key.
- You have provided valid values for all the required fields.

For help, [contact support](https://redis.io/support/).

## Next steps

After your cache is created, you can [use the LangCache API]({{< relref "/operate/rc/langcache/use-langcache" >}}) from your client app.

You can also [view and edit the cache]({{< relref "/operate/rc/langcache/view-edit-cache" >}}) and [monitor the cache's performance]({{< relref "/operate/rc/langcache/monitor-cache" >}}).
55 changes: 55 additions & 0 deletions content/operate/rc/langcache/monitor-cache.md
---
alwaysopen: false
categories:
- docs
- operate
- rc
description: null
hideListLinks: true
linktitle: Monitor cache
title: Monitor a LangCache service
weight: 20
---

You can monitor a LangCache service's performance from the **Metrics** tab of the service's page.

{{<image filename="images/rc/langcache-metrics.png" alt="The metrics tab of the LangCache service's page." >}}

The **Metrics** tab provides a series of graphs showing performance data for your LangCache service.

You can switch between daily and weekly stats using the **Day** and **Week** buttons at the top of the page. Each graph also includes minimum, average, maximum, and latest values.

## LangCache metrics reference

### Cache hit ratio

The percentage of requests that were successfully served from the cache without needing to call the LLM API. A healthy cache will generally show an increasing hit ratio over time as it becomes more populated with cached responses.

To optimize your cache hit ratio:

- Tune similarity thresholds to capture more semantically related queries.
- Analyze recurring query patterns to fine-tune your embedding strategies.
- Test different embedding models to understand their impact on cache hit rates.

A higher cache hit ratio does not always mean better performance. If the cache is too lenient in its similarity matching, it may return irrelevant responses, leading to a higher cache hit rate but poorer overall performance.

### Cache search requests

The number of read attempts against the cache during the selected time period. This metric can help you understand the load on your cache and identify periods of high or low activity.

### Cache latency

The average time to process a cache lookup request. This metric can help you identify performance bottlenecks and optimize your cache configuration.

Cache latency is highly dependent on embedding model performance, since the cache must generate embeddings for each request in order to compare them to the cached responses.

High cache latency may indicate one of the following:

- Inefficient embedding generation from the embedding provider
- Large cache requiring longer comparison times
- Network latency between the cache and embedding provider
- Resource constraints

### Cache items

The total number of entries stored in your cache. Each item includes the query string, embedding, response, and other metadata.
121 changes: 121 additions & 0 deletions content/operate/rc/langcache/use-langcache.md
---
alwaysopen: false
categories:
- docs
- operate
- rc
description: null
hideListLinks: true
linktitle: Use LangCache
title: Use the LangCache API with your GenAI app
weight: 10
---

You can use the LangCache API from your client app to store and retrieve LLM, RAG, or agent responses.

To access the LangCache API, you need:

- LangCache API base URL
- LangCache service API key
- Cache ID

The base URL and cache ID are available in the LangCache service's **Configuration** page in the [**Connectivity** section]({{< relref "/operate/rc/langcache/view-edit-cache#connectivity" >}}).

The LangCache API key is only available immediately after you create the LangCache service. If you have lost this value, you will need to [replace the service API key]({{< relref "/operate/rc/langcache/view-edit-cache#replace-service-api-key" >}}) to be able to use the LangCache API.

When you call the API, you need to pass the LangCache API key in the `Authorization` header as a Bearer token and the Cache ID as the `cacheId` path parameter.

For example, to check the health of the cache using `cURL`:

```bash
curl -s -X GET "https://$HOST/v1/caches/$CACHE_ID/health" \
-H "accept: application/json" \
-H "Authorization: Bearer $API_KEY"
```

This example expects several variables to be set in the shell:

- **$HOST** - the LangCache API base URL
- **$CACHE_ID** - the cache ID of your cache
- **$API_KEY** - the LangCache service API key

{{% info %}}
This example uses `cURL` and Linux shell scripts to demonstrate the API; you can use any standard REST client or library.
{{% /info %}}

## Check cache health

Use `GET /v1/caches/{cacheId}/health` to check the health of the cache.

```sh
GET https://[host]/v1/caches/{cacheId}/health
```

## Search LangCache for similar responses

Use `POST /v1/caches/{cacheId}/search` to search the cache for matching responses to a user prompt.

```sh
POST https://[host]/v1/caches/{cacheId}/search
{
"prompt": "User prompt text"
}
```

Place this call in your client app right before you call your LLM's REST API. If LangCache returns a response, you can send that response back to the user instead of calling the LLM.

If LangCache does not return a response, you should call your LLM's REST API to generate a new response. After you get a response from the LLM, you can [store it in LangCache](#store-a-new-response-in-langcache) for future use.

You can also scope the responses returned from LangCache by adding an `attributes` object to the request. LangCache will only return responses that match the attributes you specify.

```sh
POST https://[host]/v1/caches/{cacheId}/search
{
"prompt": "User prompt text",
"attributes": {
"customAttributeName": "customAttributeValue"
}
}
```
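
For example, as a cURL sketch using the same shell variables as the health check example above:

```bash
curl -s -X POST "https://$HOST/v1/caches/$CACHE_ID/search" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{ "prompt": "User prompt text" }'
```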

## Store a new response in LangCache

Use `POST /v1/caches/{cacheId}/entries` to store a new response in the cache.

```sh
POST https://[host]/v1/caches/{cacheId}/entries
{
"prompt": "User prompt text",
"response": "LLM response text"
}
```

Place this call in your client app after you get a response from the LLM. This will store the response in the cache for future use.

You can also store the responses with custom attributes by adding an `attributes` object to the request.

```sh
POST https://[host]/v1/caches/{cacheId}/entries
{
"prompt": "User prompt text",
"response": "LLM response text",
"attributes": {
"customAttributeName": "customAttributeValue"
}
}
```
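
As a cURL sketch, using the same shell variables as before:

```bash
curl -s -X POST "https://$HOST/v1/caches/$CACHE_ID/entries" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{ "prompt": "User prompt text", "response": "LLM response text" }'
```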

## Delete cached responses

Use `DELETE /v1/caches/{cacheId}/entries/{entryId}` to delete a single cached response by its entry ID.

You can also use `DELETE /v1/caches/{cacheId}/entries` to delete multiple cached responses at once. If you provide an `attributes` object, LangCache will delete all responses that match the attributes you specify.

```sh
DELETE https://[host]/v1/caches/{cacheId}/entries
{
"attributes": {
"customAttributeName": "customAttributeValue"
}
}
```
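
The same calls as cURL sketches, where `$ENTRY_ID` is a placeholder for the ID of the entry you want to remove:

```bash
# Delete a single cached response by its entry ID.
curl -s -X DELETE "https://$HOST/v1/caches/$CACHE_ID/entries/$ENTRY_ID" \
  -H "accept: application/json" \
  -H "Authorization: Bearer $API_KEY"

# Delete all cached responses that match the given attributes.
curl -s -X DELETE "https://$HOST/v1/caches/$CACHE_ID/entries" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{ "attributes": { "customAttributeName": "customAttributeValue" } }'
```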