Cache and manage the embeddings in persistent storage #616
Replies: 2 comments
-
Posting here some notes from the PR:
In an early SK prototype I used to cache embeddings in the underlying HTTP layer of the embedding generator, so I could use as a cache key the AI provider (e.g. the OpenAI endpoint), the AI model name, and the other params contributing to uniqueness. These params are more easily available in the generator than in the code calling it. My recommendation would be to integrate caching behavior inside the generators, rather than caching in each client calling an embedding generator.
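A minimal sketch of that idea, assuming a simplified `ITextEmbeddingGenerator` interface (the real SK/KM generator interfaces differ) and an in-memory dictionary standing in for a real cache; the key combines the endpoint, the model name, and the text itself:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface, not the actual SK/KM abstraction.
public interface ITextEmbeddingGenerator
{
    Task<float[]> GenerateEmbeddingAsync(string text, CancellationToken ct = default);
}

// Decorator that caches inside the generator, where endpoint/model are known.
public sealed class CachedEmbeddingGenerator : ITextEmbeddingGenerator
{
    private readonly ITextEmbeddingGenerator _inner;
    private readonly Dictionary<string, float[]> _cache = new(); // swap for a distributed store
    private readonly string _endpoint;
    private readonly string _modelName;

    public CachedEmbeddingGenerator(ITextEmbeddingGenerator inner, string endpoint, string modelName)
    {
        _inner = inner;
        _endpoint = endpoint;
        _modelName = modelName;
    }

    public async Task<float[]> GenerateEmbeddingAsync(string text, CancellationToken ct = default)
    {
        // The key must include endpoint and model: the same text embedded by a
        // different model must not be served from the same cache entry.
        string key = ComputeKey(text);
        if (_cache.TryGetValue(key, out float[]? cached)) { return cached; }

        float[] embedding = await _inner.GenerateEmbeddingAsync(text, ct);
        _cache[key] = embedding;
        return embedding;
    }

    private string ComputeKey(string text)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes($"{_endpoint}|{_modelName}|{text}"));
        return Convert.ToHexString(hash);
    }
}
```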
-
Looks like the PR has become stale, with a few things to address. If this is a pressing problem, the approach should be reusable (e.g. not having to add caching logic in every handler; caching is usually a cross-cutting concern solved with generic KV stores decoupled from specific scenarios), scale over multiple VMs (e.g. allow extending the solution with Redis/Memcache), and be optional via config settings.
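To make the "cross-cutting, optional" shape concrete, here is one possible sketch; all names below (`IEmbeddingCache`, `NullEmbeddingCache`, `RedisEmbeddingCache`, `cacheEnabled`) are hypothetical, not existing KM types:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Generic cache abstraction, so handlers never embed caching logic themselves.
public interface IEmbeddingCache
{
    Task<float[]?> GetAsync(string key, CancellationToken ct = default);
    Task SetAsync(string key, float[] embedding, CancellationToken ct = default);
}

// No-op implementation registered when caching is disabled in config,
// so calling code never has to branch on the setting.
public sealed class NullEmbeddingCache : IEmbeddingCache
{
    public Task<float[]?> GetAsync(string key, CancellationToken ct = default)
        => Task.FromResult<float[]?>(null);

    public Task SetAsync(string key, float[] embedding, CancellationToken ct = default)
        => Task.CompletedTask;
}

// Wiring sketch: pick the implementation once at startup, e.g.
//   services.AddSingleton<IEmbeddingCache>(
//       cacheEnabled ? new RedisEmbeddingCache(/* connection */) : new NullEmbeddingCache());
```

The no-op implementation is what makes the feature optional via config without scattering `if (cacheEnabled)` checks through the handlers.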
-
Context / Scenario
This post dives deeper into the PR for the related topic: #389
The problem
The problem is simple: we want to avoid calling the embedding API as much as possible since it is often slow and expensive.
One quick and cheap solution is to cache the embeddings by content hash, and check whether a collision could realistically happen when feeding KM a large document, or multiple documents with repeated content (that's what the PR above is all about).
BUT, I don't think this is an ideal solution for real-world scenarios. Why? Because:
Let's skip the first one and go straight to the second scenario:
There are lots of cases where we want to update existing documents or re-ingest them as their content gets refreshed, whether it's a text document or a web page. In both cases most of the content remains the same, yet the embeddings are generated again and again, even when re-importing with the same document id. This, I believe, is the scenario where a persistent embedding cache is needed, to improve the speed and reduce the cost of continuously ingested documents.
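To illustrate why hash-keyed caching helps with re-ingestion, here is a minimal sketch (the `PartitionKey` helper below is hypothetical, not a KM API): unchanged partitions hash to the same key across imports, so only modified partitions trigger new embedding calls.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Key a partition by a hash of the model name plus the partition text, so the
// same text embedded by a different model gets a different key.
static string PartitionKey(string modelName, string partitionText)
{
    byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(modelName + "\n" + partitionText));
    return Convert.ToHexString(hash);
}

// First ingestion: every partition misses the cache and is embedded.
// Re-ingestion after editing one paragraph: unchanged partitions produce
// the same key as before, so only the edited partition calls the API.
Console.WriteLine(PartitionKey("text-embedding-ada-002", "Unchanged paragraph text."));
```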
Proposed solution
In addition to the FileStorageDb and MemoryDb for the vectors and text, we could have another abstraction + implementation for an EmbeddingsCacheDb, which can be configured and used by the GenerateEmbeddingsHandler to avoid re-generating the embeddings for the same partitioned content over time and across workers. Ideally we would store the content hash in a distributed cache storage like Redis, and the associated embeddings in blob storage, so the cache works across multiple workers.
We might just need to redesign or update how we store the embeddings, to make it easy to check whether an embedding already exists for a given content hash, so we don't need to store them twice. Ideally just an additional hash mapping between the two is needed, or maybe we include the hash in the entity name itself, etc. A sketch of what this could look like follows.
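One possible shape for the proposed abstraction, assuming StackExchange.Redis and Azure.Storage.Blobs as the backing stores; all names below (`IEmbeddingsCacheDb`, `RedisBlobEmbeddingsCacheDb`) are hypothetical, not existing KM types. Redis keeps the small hash-to-blob-name mapping shared by all workers, while the larger embedding vectors live in blob storage:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using StackExchange.Redis;

public interface IEmbeddingsCacheDb
{
    Task<float[]?> TryGetAsync(string contentHash, CancellationToken ct = default);
    Task StoreAsync(string contentHash, float[] embedding, CancellationToken ct = default);
}

public sealed class RedisBlobEmbeddingsCacheDb : IEmbeddingsCacheDb
{
    private readonly IDatabase _redis;
    private readonly BlobContainerClient _blobs;

    public RedisBlobEmbeddingsCacheDb(IDatabase redis, BlobContainerClient blobs)
    {
        _redis = redis;
        _blobs = blobs;
    }

    public async Task<float[]?> TryGetAsync(string contentHash, CancellationToken ct = default)
    {
        // Redis answers "is it cached, and where?" cheaply for every worker.
        RedisValue blobName = await _redis.StringGetAsync($"emb:{contentHash}");
        if (blobName.IsNullOrEmpty) { return null; }

        var download = await _blobs.GetBlobClient(blobName.ToString()).DownloadContentAsync(ct);
        byte[] bytes = download.Value.Content.ToArray();
        var floats = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, floats, 0, bytes.Length);
        return floats;
    }

    public async Task StoreAsync(string contentHash, float[] embedding, CancellationToken ct = default)
    {
        string blobName = $"embeddings/{contentHash}.bin";
        var bytes = new byte[embedding.Length * sizeof(float)];
        Buffer.BlockCopy(embedding, 0, bytes, 0, bytes.Length);

        // Simplified: UploadBlobAsync throws if the blob already exists;
        // real code would overwrite or tolerate the conflict.
        await _blobs.UploadBlobAsync(blobName, new BinaryData(bytes), ct);
        await _redis.StringSetAsync($"emb:{contentHash}", blobName);
    }
}
```

Including the content hash in the blob name (as above) is one way to realize the "hash in the entity name" idea: the mapping then needs no second copy of the vectors, only a pointer.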
User should be able to:
Importance
would be great to have