Description
Issue submitter TODO list
- I've looked up my issue in the FAQ
- I've searched for already existing issues here
- I've tried running the `main`-labeled docker image and the issue still persists there
- I'm running a supported version of the application which is listed here
Describe the bug (actual behavior)
E.g., a topic with 300k rows requires consumption of 1.5 million messages when applying a text filter (in NEWEST order).
Why?!
For all searches (not just those with a text filter), there's a large amount of waste per poll after page 1. The reason for this is:
```java
polledRecords.records(tp).stream()
    .filter(r -> r.offset() < fromTo.to) // <--- this, returning false a lot
    .forEach(result::add);
```
If you're polling 500 records each time -- which is the Kafka library default -- but only displaying 25 and then moving offsets by, say, 25 on the next page, then your next poll is going to receive up to 475 messages it doesn't need (i.e., messages already shown on previous pages) and filter them out.
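To make the mechanism concrete, here is a minimal sketch of NEWEST-order paging against the plain Kafka consumer API. The class, method, and window variables are illustrative assumptions, not kafbat's actual code; only `seek`/`poll`/`records` are the real consumer API:

```java
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.utils.Bytes;

class BackwardPageSketch {
    // Reads one page covering offsets [from, to). In NEWEST order, offsets >= to
    // were already displayed on earlier (newer) pages.
    static void readPage(Consumer<Bytes, Bytes> consumer, TopicPartition tp,
                         long from, long to, List<ConsumerRecord<Bytes, Bytes>> result) {
        consumer.seek(tp, from);
        ConsumerRecords<Bytes, Bytes> polled = consumer.poll(Duration.ofSeconds(1));
        polled.records(tp).stream()
                // With a page size of 25 and the default max.poll.records of 500,
                // up to 475 records fail this check: they were already shown on
                // previous pages, yet they still count as consumed.
                .filter(r -> r.offset() < to)
                .forEach(result::add);
    }
}
```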
When it comes to text filtering, the problem is compounded, especially if you cannot fill a page. But all of these re-retrieved, wasted messages are counted as consumed (hence 1.5 million for a 300k-message topic).
Is there some reason kafbat is not setting `max.poll.records` in the consumer config to 25 (or whatever the page size / limit is)?
That would reduce a lot of wasted processing and filtering in general.
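A sketch of what that could look like, assuming the consumer is created per browse request (the class and method names here are made up for illustration; `ConsumerConfig.MAX_POLL_RECORDS_CONFIG` is the real `max.poll.records` key):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.BytesDeserializer;
import org.apache.kafka.common.utils.Bytes;

class CappedPollSketch {
    static KafkaConsumer<Bytes, Bytes> newConsumer(String bootstrapServers, int pageSize) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, BytesDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, BytesDeserializer.class);
        // Cap each poll() at the page size so a single poll never returns
        // more records than one page displays (the default cap is 500).
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, pageSize);
        return new KafkaConsumer<>(props);
    }
}
```

Note that `max.poll.records` only caps how many records a single `poll()` returns; broker fetch sizes are governed separately (e.g. `fetch.max.bytes`), so this would mainly cut the wasted deserialization/filtering and the inflated consumed count rather than network traffic.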
Expected behavior
Kafbat doesn't consume more messages than the topic contains.
Your installation details
master: 7ef51b5
Steps to reproduce
Text-filter any topic with more than a few hundred/thousand messages and observe the consumed-message stats.
Screenshots
No response
Logs
No response
Additional context
No response