Performance for very large dataset #663
Unanswered
roldengarm asked this question in 1. Q&A
1 comment · 8 replies
-
I'm assuming this is related to #666. A few recommendations:
-
We're using Kernel Memory as a service to ingest about 9 million text records. It's deployed as a service on an Azure App Service, with Azure Queues, embedding-3-large on Azure OpenAI, and Postgres as the database.
To ingest, we're using an Azure Function that calls the KM Web Service to ingest each document and waits until it's ready. It's configured to run at most 12 ingestions in parallel.
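That bounded-parallelism pattern can be sketched with a semaphore. This is a minimal illustration only: the simulated work stands in for the real "POST document to the KM web service, then poll until ready" call, which is not shown here.

```python
import asyncio

MAX_PARALLEL = 12   # the throttling level described above
peak = current = 0  # track how many ingestions run at once

async def ingest(doc_id: str, sem: asyncio.Semaphore) -> None:
    global peak, current
    async with sem:
        current += 1
        peak = max(peak, current)
        # Placeholder for the real work: POST the document to the KM
        # web service, then poll until the pipeline reports completion.
        await asyncio.sleep(0.01)
        current -= 1

async def main(doc_ids) -> None:
    sem = asyncio.Semaphore(MAX_PARALLEL)
    await asyncio.gather(*(ingest(d, sem) for d in doc_ids))

asyncio.run(main([f"doc-{i}" for i in range(50)]))
print("peak concurrency:", peak)  # never exceeds MAX_PARALLEL
```

The semaphore guarantees at most 12 in-flight ingestions even though all 50 tasks are started at once, which matches the throttling described above.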
The current throughput is about 150-200 text records per minute, so the entire data set will take 30-40 days.
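For reference, the arithmetic behind that estimate:

```python
# Back-of-envelope: 9M records at the observed 150-200 records/min.
RECORDS = 9_000_000
for rate in (150, 200):
    days = RECORDS / rate / 60 / 24
    print(f"{rate}/min -> {days:.1f} days")
# -> 150/min gives 41.7 days, 200/min gives 31.2 days
```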
Initially I had it running without any throttling, i.e. the Azure Function would just keep ingesting documents, but then the KM service would fall over; for further details see here. That's when I added the parallelism cap.
In that topic, batching was discussed, but that isn't ready yet.
The App Service Plan runs at ~10-20% CPU. The main bottleneck seems to be embedding generation: when I let it run unthrottled, the service went down because we hit our Azure OpenAI quota.
I've just been on the Semantic Kernel Office Hours chat, and they recommended reaching out here.
In the interim, is there anything we can do to improve the performance?