[Bug]: Having trouble just to make it work #1082

Closed · 1 of 2 tasks
GTimothee opened this issue Mar 13, 2025 · 6 comments
Labels: bug (Something isn't working)

@GTimothee commented Mar 13, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

I am trying to use LightRAG with my LLM deployment (OpenAI-API-compatible), and it fails at multiple points.

While ingesting data I get this:

Processing documents:  25%|████████▎ | 2/8 [00:01<00:04, 1.35it/s]
INFO:lightrag:Inserting 1 to doc_status
INFO:lightrag:Stored 1 new unique documents
INFO:lightrag:Number of batches to process: 1.
INFO:lightrag:Start processing batch 1 of 1.
INFO:lightrag:Inserting 1 to doc_status
INFO:lightrag:Inserting 1 to chunks
INFO:lightrag:Inserting 1 to full_docs
INFO:lightrag:Inserting 1 to text_chunks
INFO:lightrag:Non-embedding cached missed(mode:default type:extract)
ERROR:lightrag:Failed to process document doc-40723ec49f1dad04b4823be95d04b22c: index 0 is out of bounds for axis 0 with size 0

or this:

INFO:lightrag:Non-embedding cached missed(mode:default type:extract)
INFO:lightrag:Non-embedding cached missed(mode:default type:extract)
INFO:lightrag:Non-embedding cached missed(mode:default type:extract)
ERROR:lightrag:Failed to process document doc-984c029a349b75d6a5d1293e65c59695: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4096 and the array at index 1 has size 1024

And when I try to query the RAG, of course it does not work either, giving errors like:

INFO:lightrag:Inserting 1 to llm_response_cache
INFO:lightrag:Non-embedding cached missed(mode:default type:extract)
INFO:lightrag:Inserting 1 to llm_response_cache
INFO:lightrag:Non-embedding cached missed(mode:default type:extract)
INFO:lightrag:Inserting 1 to llm_response_cache
INFO:lightrag:Non-embedding cached missed(mode:default type:extract)
INFO:lightrag:Inserting 1 to llm_response_cache
INFO:lightrag:Query nodes: SQuAD, Stanford Question Answering Dataset, Natural Language Processing, Machine learning, top_k: 60, cosine: 0.2
INFO:lightrag:Query edges: Question answering, Language models, Artificial intelligence, top_k: 60, cosine: 0.2
ERROR:lightrag:Error in get_kg_context: shapes (0,4096) and (1024,) not aligned: 4096 (dim 1) != 1024 (dim 0)
Sorry, I'm not able to provide an answer to that question.[no-context]

The errors are really not explicit; we don't even know which component fails. I cannot enable logging because your README example is not up to date with the PyPI package, so these imports fail (ModuleNotFoundError):

from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger
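
As a stopgap until the README imports match the installed release, verbose logging can be enabled with the standard library alone; the only assumption is the "lightrag" logger name, which the INFO:lightrag: lines above confirm:

```python
import logging

# Send all log records to the console, and raise LightRAG's own
# "lightrag" logger (visible in the output above) to DEBUG verbosity.
logging.basicConfig(level=logging.INFO)
logging.getLogger("lightrag").setLevel(logging.DEBUG)
```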

Steps to reproduce

Here is the file I am using: https://gist.github.com/GTimothee/32027026e8aef7dc5cb290b9b913953b

Expected Behavior

Just work without error

LightRAG Config Used

Default

Logs and screenshots

No response

Additional Information

  • LightRAG Version: lightrag-hku==1.2.3
  • Operating System: linux
  • Python Version: 3.10.14
  • Related Issues:
GTimothee added the bug label on Mar 13, 2025
@ekinsenler

It looks like you didn't set EMBEDDING_DIM to match your embedding model's dimension (4096 in your case).
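
Concretely, both failure modes above are plain NumPy shape mismatches between two embedding dimensionalities. A minimal illustration (the arrays below are hypothetical stand-ins, not LightRAG internals):

```python
import numpy as np

stored = np.empty((0, 4096))   # index built at the model's true output dim
query = np.random.rand(1024)   # query embedded under the configured 1024 dim

try:
    # Similarity scoring boils down to a dot product, so mismatched
    # dimensions fail exactly as in the query log above:
    scores = np.dot(stored, query)
except ValueError as e:
    print(e)  # shapes (0,4096) and (1024,) not aligned: 4096 (dim 1) != 1024 (dim 0)
```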

@GTimothee (Author)

Thanks for your help. Where should I put EMBEDDING_DIM, then? I am passing the embedding model to the same RAG object that embeds the data and processes the queries, so I would expect it to do both with the model I pass. What am I missing?

@JoramMillenaar (Contributor)

The error looks unfamiliar, but after you set EMBEDDING_DIM you might need to drop your current vector DBs, since they may hold vectors whose dimensions mismatch the new setting.
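
With the default file-based storage that could look like the sketch below. The vdb_*.json file names are an assumption based on the default NanoVectorDB backend; adapt the pattern (or simply clear the whole working directory) for other backends:

```python
from pathlib import Path

# "./rag_storage" is a placeholder: use the working_dir you passed to LightRAG.
working_dir = Path("./rag_storage")

# Remove the stale vector indexes so they get rebuilt at the new dimension.
for vdb_file in working_dir.glob("vdb_*.json"):
    print(f"removing stale index: {vdb_file}")
    vdb_file.unlink()
```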

@ekinsenler

You have to set that inside the .env file
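A minimal sketch of that, using the variable name given above and the 4096-dim model from the logs:

```
# .env
EMBEDDING_DIM=4096
```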

@bastianwegge

@GTimothee for me this response helped: #727 (comment)
Essentially, you want to select an embedding model and the corresponding dimension it comes with, if I understood this correctly.

@GTimothee (Author) commented Mar 20, 2025

Thanks for your answers. There was no need to set the env variable; I just had to change the argument at embedding-model creation time, as I had set an embedding size that did not match my model. I also had the same problem with the LLM context size; setting the max context length parameter solved it. I have a working project now 👍 (Found my answer in @bastianwegge's suggestion, thanks.)
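
For anyone landing here later, the resulting setup looks roughly like the sketch below. Model names and URLs are placeholders, the 4096/8192 numbers follow this thread's fixes, and the helper names (EmbeddingFunc, openai_embed, openai_complete_if_cache) match recent lightrag-hku releases; double-check them against your installed version:

```python
import numpy as np
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc

API_BASE = "http://localhost:8000/v1"  # placeholder: your OpenAI-compatible endpoint

async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model="my-embedding-model",  # placeholder deployment name
        base_url=API_BASE,
        api_key="EMPTY",
    )

async def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs) -> str:
    return await openai_complete_if_cache(
        "my-chat-model",  # placeholder deployment name
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        base_url=API_BASE,
        api_key="EMPTY",
        **kwargs,
    )

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=llm_model_func,
    llm_model_max_token_size=8192,  # the "max context length" fix mentioned above
    embedding_func=EmbeddingFunc(
        embedding_dim=4096,   # must equal the model's true output dim, not the mismatched 1024
        max_token_size=8192,  # max tokens the embedding model accepts per call
        func=embedding_func,
    ),
)
```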
