[Question]: Why status for full_docs and chunks storage in OracleDB #1008

Open

lselch opened this issue Mar 5, 2025 · 0 comments
Labels
question Further information is requested

Comments

lselch commented Mar 5, 2025

Do you need to ask a question?

  • I have searched the existing questions and discussions, and this question is not already answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Your Question

What is the purpose of storing the document status in the LIGHTRAG_DOC_FULL and LIGHTRAG_DOC_CHUNKS tables in the Oracle implementation? I haven't seen a status column in any of the other storage implementations, only in the dedicated document status storage, which is currently not implemented for Oracle. When I try to run the graph indexing, I get a KeyError in the upsert method of OracleKVStorage, because the document status is not provided here:

async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
    logger.info(f"Inserting {len(data)} to {self.namespace}")
    if not data:
        return

    if is_namespace(self.namespace, NameSpace.KV_STORE_TEXT_CHUNKS):
        list_data = [
            {
                "id": k,
                **{k1: v1 for k1, v1 in v.items()},
            }
            for k, v in data.items()
        ]
        contents = [v["content"] for v in data.values()]
        batches = [
            contents[i : i + self._max_batch_size]
            for i in range(0, len(contents), self._max_batch_size)
        ]
        embeddings_list = await asyncio.gather(
            *[self.embedding_func(batch) for batch in batches]
        )
        embeddings = np.concatenate(embeddings_list)
        for i, d in enumerate(list_data):
            d["__vector__"] = str(embeddings[i].tolist())

        merge_sql = SQL_TEMPLATES["merge_chunk"]
        for item in list_data:
            _data = {
                "id": item["id"],
                "content": item["content"],
                "workspace": self.db.workspace,
                "tokens": item["tokens"],
                "chunk_order_index": item["chunk_order_index"],
                "full_doc_id": item["full_doc_id"],
                "content_vector": item["__vector__"],
                "status": item["status"],  # KeyError here: the chunk dicts passed in carry no "status" key
            }
            await self.db.execute(merge_sql, _data)
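
For reference, here is a minimal sketch of one possible workaround, assuming the chunk dicts built upstream never carry a "status" key: fall back to a default instead of indexing the key directly. The helper name and the default value are hypothetical, not part of LightRAG:

DEFAULT_CHUNK_STATUS = "processed"  # assumed default; would need to match whatever values the column expects

def build_merge_params(item: dict, workspace: str) -> dict:
    """Build the bind-parameter dict for the merge_chunk SQL, tolerating a missing "status" key."""
    return {
        "id": item["id"],
        "content": item["content"],
        "workspace": workspace,
        "tokens": item["tokens"],
        "chunk_order_index": item["chunk_order_index"],
        "full_doc_id": item["full_doc_id"],
        "content_vector": item["__vector__"],
        # .get() avoids the KeyError raised when upstream code never sets "status"
        "status": item.get("status", DEFAULT_CHUNK_STATUS),
    }

Whether defaulting like this is correct, or whether the status column should instead be dropped from these two tables to match the other storage backends, is exactly what this issue asks the maintainers to clarify.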

Additional Context

No response

@lselch lselch added the question Further information is requested label Mar 5, 2025