[Bug]: <title>value too long for type character varying(255) #1088

Open
2 tasks done
jasperchen01 opened this issue Mar 14, 2025 · 3 comments
Labels
bug Something isn't working

Comments


jasperchen01 commented Mar 14, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

I'm using the main branch with the following demo code:

import asyncio
import logging
import os
import time
from dotenv import load_dotenv

from lightrag import LightRAG, QueryParam
from lightrag.llm.zhipu import zhipu_complete
from lightrag.llm.ollama import ollama_embedding
from lightrag.llm.openai import openai_embed,openai_complete_if_cache
from lightrag.utils import EmbeddingFunc
from lightrag.kg.shared_storage import initialize_pipeline_status
import numpy as np


CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
WORKING_DIR = f"{CURRENT_DIR}/lightrag_data"

FILE_PATH = f"{CURRENT_DIR}/../../data_dir/marker_output/2305_15323v1.md"
FILE_NAME = os.path.basename(FILE_PATH)

logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.DEBUG)

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

# PG
os.environ["AGE_GRAPH_NAME"] = "dickens"
os.environ["POSTGRES_HOST"] = "localhost"
os.environ["POSTGRES_PORT"] = "5432"
os.environ["POSTGRES_USER"] = "postgres"
os.environ["POSTGRES_PASSWORD"] = ""
os.environ["POSTGRES_DATABASE"] = "lightrag"

# neo4j
os.environ["NEO4J_URI"] = "neo4j://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "admin123"

async def _llm_model_func(
    prompt, system_prompt=None, history_messages=None, keyword_extraction=False, **kwargs
) -> str:
    # keyword_extraction is accepted here but not forwarded to the OpenAI-compatible API
    return await openai_complete_if_cache(
        model="qwen-plus-latest",
        prompt=prompt,
        system_prompt=system_prompt,
        history_messages=history_messages or [],  # avoid a shared mutable default
        api_key="***",
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
        **kwargs
    )

async def _embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model="text-embedding-v3",
        api_key="***",
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
    )

async def initialize_rag():
    rag = LightRAG(
        namespace_prefix=FILE_NAME,
        working_dir=WORKING_DIR,
        llm_model_func=_llm_model_func,
        llm_model_max_async=4,
        llm_model_max_token_size=32768,
        enable_llm_cache_for_entity_extract=True,
        embedding_func=EmbeddingFunc(
            embedding_dim=1024,
            max_token_size=8192,
            func=_embedding_func,
        ),
        embedding_batch_num=10,
        embedding_func_max_async=10,
        embedding_cache_config={
            "enabled": True,
            "similarity_threshold": 0.95,
            "use_llm_check": False,
        },
        kv_storage="PGKVStorage",
        doc_status_storage="PGDocStatusStorage",
        graph_storage="Neo4JStorage",
        vector_storage="PGVectorStorage",
        auto_manage_storages_states=False,
        # llm_model_kwargs={
        #     "response_format": {"type": "json_object"},
        #     "extra_body": {"enable_search": True}
        #     },
        addon_params={
            "language": "Chinese"
        },
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag


async def main():
    # Initialize RAG instance
    rag = await initialize_rag()


    # Re-attach embedding_func to the graph storage; it was removed in commit 5661d76860436f7bf5aef2e50d9ee4a59660146c
    rag.chunk_entity_relation_graph.embedding_func = rag.embedding_func

    with open(FILE_PATH, "r", encoding="utf-8") as f:
        await rag.ainsert(f.read(), ids=[FILE_NAME])


if __name__ == "__main__":
    asyncio.run(main())

Running it produces these errors:

error:value too long for type character varying(255)
Failed to extract entities and relationships
Failed to process document doc-6b187f963bb8be55d3cc73c6faf9f7db: value too long for type character varying(255)

Steps to reproduce

No response

Expected Behavior

No response

LightRAG Config Used

Paste your config here

Logs and screenshots

No response

Additional Information

  • LightRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:
jasperchen01 added the bug (Something isn't working) label on Mar 14, 2025

JoramMillenaar (Contributor) commented Mar 14, 2025

I ran into a similar issue. I had a lot of data centered on a single entity, and LightRAG appends a new description to an entity every time the LLM recognizes it in the data. If that entity is detected in many parts of your data, its descriptions keep accumulating until the stored value exceeds the 255 characters the Postgres column allows, and you get this error.
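The accumulation described above can be sketched in a few lines of Python. This is a minimal illustration, not LightRAG's code: `merge_description`, `fits_varchar`, and `mentions_until_overflow` are hypothetical names, and the `<SEP>` separator is an assumption about how descriptions are joined. Because PostgreSQL's `varchar(n)` limit counts characters rather than bytes, Chinese text (as in the report's `"language": "Chinese"` setting) hits the limit after the same number of characters as English text.

```python
from typing import List, Optional

# Minimal sketch of how merged entity descriptions can outgrow VARCHAR(255).
# Assumption: descriptions are concatenated with a "<SEP>" separator.
VARCHAR_LIMIT = 255  # PostgreSQL varchar(n) counts characters, not bytes
SEP = "<SEP>"


def merge_description(existing: Optional[str], new: str, sep: str = SEP) -> str:
    """Append a newly extracted description to the one already stored."""
    if not existing:
        return new
    return f"{existing}{sep}{new}"


def fits_varchar(value: str, limit: int = VARCHAR_LIMIT) -> bool:
    """True if value fits in varchar(limit); len() counts characters, like PostgreSQL."""
    return len(value) <= limit


def mentions_until_overflow(descriptions: List[str], limit: int = VARCHAR_LIMIT) -> Optional[int]:
    """Return the number of merged mentions at which the value first exceeds limit, else None."""
    merged: Optional[str] = None
    for count, desc in enumerate(descriptions, start=1):
        merged = merge_description(merged, desc)
        if not fits_varchar(merged, limit):
            return count
    return None


if __name__ == "__main__":
    # Twenty hypothetical mentions, each producing a 40-character description.
    mentions = [f"{i:03d} " + "x" * 36 for i in range(20)]
    print(mentions_until_overflow(mentions))  # prints 6 (6 * 40 + 5 * 5 = 265 > 255)
```

With 40-character descriptions the merged value overflows on the sixth mention, so an entity that appears in only a handful of chunks is already enough to trigger the error.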

We should probably look into fixing this. For a quick fix, though, you can change the column type from VARCHAR(255) to TEXT and rebuild your database.
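A sketch of that quick fix, assuming you prefer to alter the existing columns in place rather than edit the table definitions and recreate the database. It only builds SQL strings. The `lightrag%` table-name pattern is an assumption, the column names are discovered from `information_schema` rather than guessed, and the helper names are hypothetical. Back up the database before running the generated statements with your usual Postgres client.

```python
from typing import Iterable, List, Tuple

# Hypothetical helpers for widening VARCHAR(255) columns to TEXT in PostgreSQL.
# Assumption: the LightRAG tables match the name pattern 'lightrag%'.
FIND_VARCHAR_255_SQL = """
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_name ILIKE 'lightrag%'
  AND data_type = 'character varying'
  AND character_maximum_length = 255
ORDER BY table_schema, table_name, column_name;
"""


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def widen_to_text_sql(schema: str, table: str, column: str) -> str:
    """Build an ALTER statement that changes one column's type to TEXT."""
    return (
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"ALTER COLUMN {quote_ident(column)} TYPE TEXT;"
    )


def build_migration(rows: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Turn (schema, table, column) rows from FIND_VARCHAR_255_SQL into ALTER statements."""
    return [widen_to_text_sql(schema, table, column) for schema, table, column in rows]


if __name__ == "__main__":
    # Placeholder row standing in for the query result; not real LightRAG names.
    for stmt in build_migration([("public", "example_table", "example_column")]):
        print(stmt)
```

Converting `varchar` to `text` needs no `USING` clause, because PostgreSQL treats the two types as binary-coercible. Array columns such as `varchar(255)[]` report `data_type = 'ARRAY'` in `information_schema.columns`, so this query does not match them. If LightRAG later recreates these tables from its own definitions, the change may need to be repeated.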


bzImage commented Mar 20, 2025

After doing a git pull, recreating the Postgres database, and processing a batch of input files, I still get:

error:value too long for type character varying(255)
Failed to extract entities and relationships
Failed to process document EW2401-004: value too long for type character varying(255)

JoramMillenaar (Contributor) commented Mar 20, 2025

I looked into it and tested again, and it's working for me (values longer than 255 characters are stored).
Try rebuilding your graph with a fresh database.

PR #1120, merged recently, changed the field again (to a VARCHAR(255) array), which might be why you're experiencing issues.
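If the column is now `VARCHAR(255)[]`, the 255-character limit applies to each array element rather than to the whole value. Several short values stored as separate elements therefore fit, but any single element over 255 characters still raises the same error. The check below is a minimal pre-insert sketch. It assumes the values are held as a Python list of strings and that a joined value uses a `<SEP>` separator (both are assumptions), and the helper names are hypothetical.

```python
from typing import Iterable, List

VARCHAR_LIMIT = 255  # per-element limit of a varchar(255)[] column


def split_joined(value: str, sep: str = "<SEP>") -> List[str]:
    """Split a separator-joined string into array elements, dropping empty parts."""
    return [part for part in value.split(sep) if part]


def oversized_elements(values: Iterable[str], limit: int = VARCHAR_LIMIT) -> List[int]:
    """Return the indices of elements longer than limit characters."""
    return [i for i, value in enumerate(values) if len(value) > limit]


if __name__ == "__main__":
    joined = "a" * 200 + "<SEP>" + "b" * 200  # 405 characters: too long for varchar(255)
    elements = split_joined(joined)
    print(len(joined), oversized_elements(elements))  # prints 405 []
```

An empty result means every element fits. A non-empty one points at the exact values that would still fail after the switch to an array column.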
