
Text filtering consumes way too many messages; why not set max.poll.records? #1162

@credmond

Description


Issue submitter TODO list

  • I've looked up my issue in FAQ
  • I've searched for already existing issues here
  • I've tried running the main-labeled Docker image and the issue still persists there
  • I'm running a supported version of the application which is listed here

Describe the bug (actual behavior)

For example, applying a text filter (in NEWEST order) to a topic with 300k messages consumes 1.5 million messages.

Why?!


For all searches (not just text-filtered ones), there is a large amount of waste on every poll after page 1. The reason is code like this:

      polledRecords.records(tp).stream()
            .filter(r -> r.offset() < fromTo.to)  // <--- this returns false a lot
            .forEach(result::add);

If you poll 500 records at a time (the Kafka client default for max.poll.records) but display only 25, and then advance the offsets by 25 for the next page, the next poll receives up to 475 records it doesn't need (they were already shown on previous pages) and filters them out.
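To put rough numbers on this, here is a small Java model of the paging pattern just described. It is a simplification assumed purely for illustration (single partition, ascending offsets, one poll() per page, no text filter), not kafbat's actual emitter logic, and the class and method names are hypothetical.

```java
// Simplified model (an assumption, not kafbat's real code): every page seeks to the
// start of a pageSize-wide offset range and issues a single poll() that returns up to
// maxPollRecords records from that offset. Only records inside the range are kept;
// everything past the range end is fetched, counted as consumed, and filtered out.
class PollWasteEstimate {

    /** Total records returned by poll() while paging through the whole topic. */
    static long consumedForScan(long topicSize, int pageSize, int maxPollRecords) {
        long consumed = 0;
        for (long from = 0; from < topicSize; from += pageSize) {
            long available = topicSize - from;            // records left from this offset
            consumed += Math.min(maxPollRecords, available);
        }
        return consumed;
    }

    public static void main(String[] args) {
        long topicSize = 1_000;
        int pageSize = 25;
        System.out.println("max.poll.records=500 -> consumed "
                + consumedForScan(topicSize, pageSize, 500));
        System.out.println("max.poll.records=25  -> consumed "
                + consumedForScan(topicSize, pageSize, pageSize));
    }
}
```

For a 1,000-message topic this model reports 15,250 records consumed with the default of 500, versus exactly 1,000 when max.poll.records equals the page size.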

With text filtering the problem is compounded, especially when matches are too sparse to fill a page. All of these re-fetched, wasted messages are counted as consumed (hence 1.5 million for a 300k-message topic).

Is there some reason kafbat does not set max.poll.records in the consumer config to the page limit (25, or whatever the page size / limit is)?

That would reduce a lot of wasted processing and filtering in general.
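A minimal sketch of what that could look like, assuming the consumer is built from a plain java.util.Properties map. The helper name is hypothetical; only the "max.poll.records" key is the real Kafka consumer setting (the kafka-clients constant ConsumerConfig.MAX_POLL_RECORDS_CONFIG holds the same string). String keys are used so the sketch runs without the Kafka dependency.

```java
import java.util.Properties;

// Hypothetical helper: copies an existing consumer config and caps the number of
// records a single poll() may return at the UI page size (Kafka's default is 500).
class PageSizedConsumerConfig {

    static Properties consumerPropsForPage(Properties base, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive: " + pageSize);
        }
        Properties props = new Properties();
        props.putAll(base);                                  // keep bootstrap servers, deserializers, etc.
        props.put("max.poll.records", String.valueOf(pageSize));
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.put("bootstrap.servers", "localhost:9092");     // placeholder address
        Properties props = consumerPropsForPage(base, 25);
        System.out.println(props.getProperty("max.poll.records")); // prints 25
    }
}
```

One caveat from the Kafka consumer documentation: max.poll.records limits how many records a single poll() returns but does not change the underlying fetch behavior; the consumer still caches records from each fetch request and hands them out across polls. So this should cut the records iterated over and counted per poll, while the bytes pulled from the broker may not shrink by the same factor.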

Expected behavior

Kafbat doesn't consume more messages than the size of the topic.

Your installation details

master: 7ef51b5

Steps to reproduce

Apply a text filter to any topic with more than a few hundred or thousand messages and observe the consumed-message stats.

Screenshots

No response

Logs

No response

Additional context

No response

Metadata


Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
