Rewrite llm query operator #1216

bohou-aryn · 2025-03-06T23:34:32Z

This is one of a series of tasks to convert the LLM operators to use Henry's new LLMMap or LLMElementMap.

HenryL27

Thanks! Now can you delete execute_query / _query_text_object (assuming this thing is equivalent)?

HenryL27 · 2025-03-06T23:39:38Z

lib/sycamore/sycamore/llms/prompts/prompts.py

+        if self.include_image and len(result.messages) > 0:
+            from sycamore.utils.pdf_utils import get_element_image
+
+            result.messages[-1].images = [get_element_image(elt, doc)]


might also want to add the image of prev here?

HenryL27 · 2025-03-06T23:44:12Z

lib/sycamore/sycamore/transforms/base_llm.py

+        skips = []
+        counter = 0
+        for e, _ in elt_doc_pairs:
+            if self._filter(e) and (not self._number_of_elements or counter < self._number_of_elements):
+                counter += 1
+            else:
+                skips.append(e)


skips should be a list of bools (with length = number of elements) if the rest of its usage remains the same. As written, it collects the skipped elements themselves.
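A minimal sketch of the bool-per-element shape suggested above. Plain dicts stand in for sycamore elements, and `compute_skips`, `keep`, and `limit` are hypothetical names for illustration, not the actual `base_llm.py` code:

```python
from typing import Callable, Optional


def compute_skips(
    elements: list[dict],
    keep: Callable[[dict], bool],
    limit: Optional[int] = None,
) -> list[bool]:
    """Return one bool per element: True means "skip this element"."""
    skips = []
    counter = 0
    for e in elements:
        # Same condition as in the diff: pass the filter and stay under the limit.
        if keep(e) and (not limit or counter < limit):
            counter += 1
            skips.append(False)
        else:
            skips.append(True)
    return skips


# Hypothetical input: dicts standing in for Element objects.
elements = [{"type": "table"}, {"type": "text"}, {"type": "table"}, {"type": "table"}]
skips = compute_skips(elements, lambda e: e["type"] == "table", limit=2)
```

With this shape `skips[i]` always lines up with `elements[i]`, so later code can index both by position.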

dhruvkaliraman7 · 2025-03-07T00:44:18Z

lib/sycamore/sycamore/transforms/base_llm.py

+        skips = []
+        counter = 0
+        for e, _ in elt_doc_pairs:
+            if self._filter(e) and (not self._number_of_elements or counter < self._number_of_elements):


Why are we filtering twice when I want to run Table Merger? Once here for table and then again in render_elements.

For table merger, I think the main issue is that the prompt side does not have the filter or skip information that's collected in llm_map_elements.

dhruvkaliraman7 · 2025-03-07T00:46:23Z

lib/sycamore/sycamore/llms/prompts/prompts.py

+        self._user_templates: Union[None, list[Template]] = None
+
+    def render_element(self, elt: Element, doc: Document) -> RenderedPrompt:
+        filtered = [e for e in doc.elements if e.type == "table"]


Am I understanding this wrong, or is this O(len(doc.elements)^2) for every document?

Yes, it looks quadratic. Any suggestions to avoid this?

@HenryL27 no wiggle room in render_elements for this?

I think the alternative is to do a single pass beforehand and record on each element a pointer to prev.
Then in render elements you're O(1), right?

You might be able to avoid needing this particular class entirely since you can do that lookup in Jinja - something like

""" {%- set prev = doc.elements[elt.properties["_prev_table"]] if "_prev_table" in elt.properties else None -%} """

Though I guess if you want to include the image of prev, you need to know prev in the render fn.
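A rough sketch of the single-pass idea from this thread, assuming plain dicts in place of sycamore `Element` objects. The `_prev_table` property name comes from the Jinja snippet above; `annotate_prev_table` and `get_prev` are hypothetical helpers, not existing sycamore functions:

```python
from typing import Optional


def annotate_prev_table(elements: list[dict]) -> None:
    """One O(N) pass: store on each table the index of the previous table."""
    prev_idx: Optional[int] = None
    for i, e in enumerate(elements):
        if e["type"] != "table":
            continue
        if prev_idx is not None:
            e["properties"]["_prev_table"] = prev_idx
        prev_idx = i


def get_prev(elements: list[dict], elt: dict) -> Optional[dict]:
    """O(1) lookup at render time, instead of re-filtering doc.elements."""
    idx = elt["properties"].get("_prev_table")
    return elements[idx] if idx is not None else None


# Hypothetical document: two tables separated by a text element.
elements = [
    {"type": "table", "properties": {}},
    {"type": "text", "properties": {}},
    {"type": "table", "properties": {}},
]
annotate_prev_table(elements)
prev_of_last = get_prev(elements, elements[2])
```

Having `prev` available in Python as well as in the template would also cover the case where the render function needs the previous table's image.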

dhruvkaliraman7

This is the only transform where you care about the index of the previous filtered element. We can revisit later if processing is too slow.

bohou-aryn requested review from HenryL27 and dhruvkaliraman7 March 6, 2025 23:34

HenryL27 reviewed Mar 6, 2025

dhruvkaliraman7 reviewed Mar 7, 2025

bohou-aryn force-pushed the llm_query branch 2 times, most recently from d6ff7c5 to 969d929 Compare March 7, 2025 22:02

dhruvkaliraman7 approved these changes Mar 8, 2025

Rewrite llm query operator

52ec710

This is part of the series of tasks for converting llm operator to use Henry's new LLMMap or LLMElementMap.

bohou-aryn force-pushed the llm_query branch from 969d929 to 52ec710 Compare March 8, 2025 00:39
