Current paging solution > 10k is not RESTful #109
I have been looking this week at possible solutions for random, deep pagination (past the 10,000th result) and I have not been able to come up with one that provides timely responses. I had thought that I could build a cache. Random pagination from a cached response is indeed very quick; the challenge comes when the query response is not cached and the cache has to be built.

Some simple analysis for building these cache objects:

Number of items in the dataset:

Processing time (based on building a cache for a dataset with 59,700 items and a page size of 1,000, with processing time = 6.23 * number_of_pages + 5.37):

The lower times might be acceptable as a one-off, since the cache would then allow parallelised workflows to interact with the whole result set and would reduce subsequent response times, but the upper end clearly is not.

A cache might be useful more generally if your workflows often repeat the same query to the same endpoints. This would require minimal engineering but might still provide a useful improvement.

In my research around the subject, it seems the answer to deep pagination is that you don’t do it. Instead you:
To help me figure out the next step, please could you answer the following questions:
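The linear cost model quoted in the first comment (processing time = 6.23 * number_of_pages + 5.37) can be evaluated for any dataset size with a few lines of Python. This is only a sketch: the function name is hypothetical, the time unit is not stated in the issue, and rounding a partial page up to a whole page is an assumption.

```python
import math

# Coefficients quoted in the issue; the time unit is not stated there.
PER_PAGE_COST = 6.23
FIXED_COST = 5.37


def estimate_cache_build_time(n_items: int, page_size: int = 1000) -> float:
    """Evaluate processing time = 6.23 * number_of_pages + 5.37.

    Assumption: a partially filled last page costs as much as a full page,
    so the page count is rounded up.
    """
    number_of_pages = math.ceil(n_items / page_size)
    return PER_PAGE_COST * number_of_pages + FIXED_COST


if __name__ == "__main__":
    # The dataset the model was fitted on: 59,700 items, pages of 1,000.
    print(estimate_cache_build_time(59_700))
```

Under these assumptions the 59,700-item dataset maps to 60 pages, which gives roughly 379.17 in the model's units.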
Spacebel use an extra parameter in the GET request to pass state. This parameter is passed in the
We can use base64 encoding and decoding to convert the Elasticsearch sort key into a string which can be sent in the URL:

import json
import base64

# Send with the response: serialise the sort key into a URL-safe token
sort_key = response['sort']
sort_b = json.dumps(sort_key).encode('utf-8')
b64 = base64.urlsafe_b64encode(sort_b).decode('ascii')

# Process with the next request: decode the token back into the sort key
search_after = request.GET['search_after']
sort_b = base64.urlsafe_b64decode(search_after)
sort_key = json.loads(sort_b)
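To show where the decoded token would end up, here is a minimal sketch that builds the body of the next Elasticsearch search request from it. The helper names (`encode_cursor`, `decode_cursor`, `build_search_body`) and the sort fields `date` and `id` are hypothetical; only the `sort`, `size` and `search_after` keys come from the Elasticsearch search API. It assumes the sort includes a unique tiebreaker field so that pages do not overlap.

```python
import base64
import json


def encode_cursor(sort_key):
    """Serialise an Elasticsearch sort key (a list) into a URL-safe string."""
    raw = json.dumps(sort_key).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token):
    """Reverse encode_cursor: URL-safe string back to the sort key list."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def build_search_body(query, sort, page_size, token=None):
    """Build a search request body, resuming after `token` if one is given."""
    body = {"query": query, "sort": sort, "size": page_size}
    if token:
        # search_after must list one value per sort clause, in the same order.
        body["search_after"] = decode_cursor(token)
    return body


if __name__ == "__main__":
    # Hypothetical sort: a date field plus a unique id field as tiebreaker.
    sort = [{"date": "asc"}, {"id": "asc"}]
    token = encode_cursor([1577836800000, "abc"])
    print(build_search_body({"match_all": {}}, sort, 10, token))
```

The first request is sent without a token; each response then returns the encoded sort key of its last hit so the client can request the following page. This gives sequential paging only, not random access.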
How can we make random access pagination work on top of Elasticsearch?