Add document-retrieval to Hub as a task #1097

merveenoyan · 2025-01-13T10:11:35Z

This PR adds document retrieval to the Hub as a task, for the following models, which are heavily used and growing in number:

Icon looks like this:

Other names I thought of:

multimodal-feature-extraction (ambiguous, and overlaps with zero-shot image classification)
document-feature-extraction (sounds like plain document embeddings from a document backbone; best avoided, with those models covered under image-feature-extraction instead)
zero-shot-document-classification (too long and sounds odd; it would be accurate, but we mostly use these models for retrieval, not classification)
The current name is very much to the point, so it's best if it stays as it is.

merveenoyan · 2025-01-13T12:29:14Z

@NielsRogge

merveenoyan · 2025-01-15T13:55:25Z

@pcuenca can you leave a review? 👀💗

julien-c

is it not already very close to image-feature-extraction? How many models would be covered by this? (i'm wondering if it isn't too specific)

julien-c · 2025-01-17T17:55:47Z

(no strong opinion though)

pcuenca · 2025-01-17T17:59:52Z

packages/tasks/src/pipelines.ts

@@ -676,6 +676,11 @@ export const PIPELINE_DATA = {
 		color: "red",
 		hideInDatasets: true,
 	},
+	"document-retrieval": {


Sounds a bit NLP-y to me. Could we use something like "visual-document-retrieval", for symmetry with "visual-question-answering"?

pcuenca · 2025-01-17T18:02:16Z

> How many models would be covered by this? (i'm wondering if it isn't too specific)

I have the same question. Would, for example, all the ColPali models be included here?

pcuenca · 2025-01-17T18:03:27Z

We'd also have to tag multiple models before merging, as usual.

merveenoyan · 2025-01-18T09:11:27Z

@pcuenca I wanted to wait for naming consensus before opening PRs to tag the models. we can go with visual-document-retrieval, yes.

@julien-c it's actually not. those are image-only backbones used to train traditional vision models. these models, by contrast, are zero-shot models built on VLMs to do document retrieval in multimodal RAG pipelines. they're a bit like CLIP, but for documents, and they're not used for classification (they have long context lengths and fine-grained image understanding). the number of models keeps increasing as the number of VLMs increases, and they're all wrongly tagged, hence this PR. (there's ColPali, ColQwen, ColSmolVLM, DSE models and more now)

here's a tl;dr explainer on how they're used: https://x.com/mervenoyann/status/1831409380040044762?s=46
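For readers following along, the "CLIP, but for documents" idea can be sketched as embedding-based ranking. The snippet below is a minimal illustration, assuming ColBERT-style late-interaction (MaxSim) scoring as used by ColPali-style multi-vector models. The names `Vec`, `maxSimScore`, and `rankPages` are hypothetical, and the hand-written arrays stand in for embeddings that a real model would produce.

```typescript
// Hypothetical representation: a query or a page is a list of embedding vectors.
type Vec = number[];

// Dot product of two vectors (missing components are treated as 0).
function dot(a: Vec, b: Vec): number {
	return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Late-interaction (MaxSim) score: for every query vector, take its best
// match among the page vectors, then sum those best matches.
function maxSimScore(query: Vec[], page: Vec[]): number {
	let total = 0;
	for (const q of query) {
		let best = -Infinity;
		for (const p of page) {
			best = Math.max(best, dot(q, p));
		}
		total += best;
	}
	return total;
}

// Returns page indices ordered from most to least relevant.
function rankPages(query: Vec[], pages: Vec[][]): number[] {
	return pages
		.map((page, index) => ({ index, score: maxSimScore(query, page) }))
		.sort((a, b) => b.score - a.score)
		.map((r) => r.index);
}

// Toy data only: two query vectors and three "pages".
const toyQuery: Vec[] = [[1, 0], [0, 1]];
const toyPages: Vec[][] = [
	[[1, 0], [1, 0]],
	[[1, 0], [0, 1]],
	[[0.2, 0.2]],
];
console.log(rankPages(toyQuery, toyPages));
```

No classification head or label set is involved: the query is free text and pages are ranked purely by similarity, which is why a retrieval task fits better than a classification or feature-extraction one.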

julien-c · 2025-01-20T11:22:41Z

ok, sounds good

Vaibhavs10

maybe image-document-retrieval, so that it's more in sync with the current vision/audio-related task names?

Vaibhavs10 · 2025-01-21T13:51:22Z

packages/tasks/src/pipelines.ts

+	"document-retrieval": {
+		name: "Document Retrieval",
+		modality: "multimodal",
+		color: "yellow",


Suggested change

-		color: "yellow",
+		color: "yellow",
+		hideInDatasets: true,

Don't think there are many datasets related to this task?

merveenoyan · 2025-01-22

there's a whole benchmark called ViDoRe, plus more datasets outside of it, and similar datasets exist in multiple languages, so I would expect more datasets: https://huggingface.co/spaces/vidore/vidore-leaderboard

for instance, these are the ones I found for Turkish:

https://huggingface.co/datasets/selimc/tr-textbook-ColPali

https://huggingface.co/datasets/muhammetfatihaktug/bilim_teknik_mini_colpali

merveenoyan · 2025-01-22

I'm not in favor of image-document-retrieval: it sounds a bit like images are involved separately, and it sounds off imo. we actually have a similar task, visual-question-answering, which is kept separate from textual QA.
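To summarize where the thread seems to land, here is a rough sketch of the entry under the proposed `visual-document-retrieval` key. It is an assumption-laden illustration, not the merged code: `TaskEntry` is a simplified, hypothetical stand-in for the real type in `pipelines.ts`, the display name under the new key is a guess, the `example-hidden-task` entry is invented for contrast, and `tasksShownInDatasets` is a hypothetical helper that assumes `hideInDatasets` simply excludes a task from dataset listings.

```typescript
// Simplified, hypothetical shape; the real type in pipelines.ts may differ.
interface TaskEntry {
	name: string;
	modality: string;
	color: string;
	hideInDatasets?: boolean;
}

const TASKS: Record<string, TaskEntry> = {
	// Invented neighbouring entry, only here to show the flag in use.
	"example-hidden-task": {
		name: "Example Hidden Task",
		modality: "multimodal",
		color: "red",
		hideInDatasets: true,
	},
	// Fields taken from the diff in this PR; key renamed as discussed above.
	// hideInDatasets is left unset because datasets for this task exist.
	"visual-document-retrieval": {
		name: "Visual Document Retrieval", // assumed display name
		modality: "multimodal",
		color: "yellow",
	},
};

// Hypothetical helper: ids of tasks that would still appear for datasets.
function tasksShownInDatasets(tasks: Record<string, TaskEntry>): string[] {
	return Object.keys(tasks).filter((id) => tasks[id]?.hideInDatasets !== true);
}

console.log(tasksShownInDatasets(TASKS));
```

Leaving `hideInDatasets` unset keeps the task available for datasets such as the ViDoRe benchmark sets mentioned above.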

add document-retrieval to Hub as a task

79eb092

merveenoyan requested review from SBrandeis, gary149, Wauplin, julien-c, pcuenca and ngxson as code owners January 13, 2025 10:11

julien-c reviewed Jan 17, 2025

pcuenca reviewed Jan 17, 2025

Vaibhavs10 reviewed Jan 21, 2025

