-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite llm query operator #1216
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Now can you delete execute_query / _query_text_object (assuming this thing is equivalent)?
if self.include_image and len(result.messages) > 0: | ||
from sycamore.utils.pdf_utils import get_element_image | ||
|
||
result.messages[-1].images = [get_element_image(elt, doc)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might also want to add the image of prev here?
skips = [] | ||
counter = 0 | ||
for e, _ in elt_doc_pairs: | ||
if self._filter(e) and (not self._number_of_elements or counter < self._number_of_elements): | ||
counter += 1 | ||
else: | ||
skips.append(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skips should be an array of bools (with length = num of elements) if the rest of its usage remains the same
skips = [] | ||
counter = 0 | ||
for e, _ in elt_doc_pairs: | ||
if self._filter(e) and (not self._number_of_elements or counter < self._number_of_elements): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we filtering twice when I want to run Table Merger? Once here for table and then again in render_elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For table merger, I think mostly, the prompt side does not have the filter or skip information that's collected in the llm_map_elements.
self._user_templates: Union[None, list[Template]] = None | ||
|
||
def render_element(self, elt: Element, doc: Document) -> RenderedPrompt: | ||
filtered = [e for e in doc.elements if e.type == "table"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I understanding this wrong or for every document we have O(len(doc.elements)^2)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, seems N square, any suggestion to avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HenryL27 no wiggle room in render_elements for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the alternative is to do a single pass beforehand and write down a pointer on each element to prev.
then in render elements you're O(1) right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to avoid needing this particular class entirely since you can do that lookup in Jinja - something like
"""
{%- set prev = doc.elements[elt.properties["_prev_table"]] if "_prev_table" in elt.properties else None -%}
"""
tho I guess if you want to include the image of prev you need to know prev in the render fn
d6ff7c5
to
969d929
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only transform where you care about index of prev filtered element. Can revisit later if processing is too slow.
This is part of the series of tasks for converting llm operator to use Henry's new LLMMap or LLMElementMap.
This is part of the series of tasks for converting llm operator to use Henry's new LLMMap or LLMElementMap.