Description
Another problem with the power JSON:API provides is that it puts a lot of responsibility on the client side. A client can easily self-DoS the server with queries it thinks are innocuous. This is what admin search does now.
First of all, it sends a search filter query on every keystroke, for which an issue is already open: #3532
But that is not the main issue. Let's say I search for fossasia. This is the query it'll send to the server:
See any problem? It requests the tickets, the sessions, the speakers, organizers, coorganizers, track-organizers, registrars and moderators of the event along with the event itself. Just to render this:
Why anybody thought of this as a good idea is beyond me. This is already so bad that I was shocked. But OK, at least the page size is set to 10. How much data could it possibly fetch? Right?
This is the response:
700 nodes in included! 700. The response is 400 KB and the request takes 30 seconds to complete. 30 seconds. It's a relief that there are not many events with a lot of speakers and sessions, or you could cook and eat instant ramen before the request completes.
But it's just one request, right?
Because of this request, every other request, even one of 70 bytes, is slowed down because the server is being pummeled.
So what is happening? Apparently, the JSON:API spec doesn't talk about restricting included items in the payload, or the library used on the server is poorly written. Either way, the results are limited to 10, but the included items contain every single ticket, session and speaker of all 10 filtered events!!!
Every . single . one
Just imagine the poor database. No wonder the server crashes so frequently.
Hundreds of speakers and hundreds of sessions are fetched, serialized and sent to the frontend to render the above row for an event. And the fact that the frontend does that on every keystroke doesn't help; it just makes it countless times worse.
Once a request has reached the server, the client has no way of calling it back: what is sent is sent. If you type FOSSASIA, 8 requests are sent, one for each letter appended, and the database and server are busy fetching 700 items for each request even though the result won't be used. So, 7 out of 8 queries are useless.
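One common way to avoid the 7 wasted queries is to debounce the input, so a request only goes out once typing pauses. The real fix belongs in the Ember frontend; the sketch below only shows the logic in Python with asyncio, and the `DebouncedSearch` class, its `send` callback and the delay value are all hypothetical names chosen for illustration.

```python
import asyncio


class DebouncedSearch:
    """Send only the last query typed within `delay` seconds (hypothetical helper)."""

    def __init__(self, send, delay=0.3):
        self._send = send      # async callable that performs the actual request
        self._delay = delay
        self.pending = None    # task waiting to fire, if any

    def on_input(self, text):
        # A new keystroke makes the previously scheduled request obsolete.
        if self.pending is not None:
            self.pending.cancel()
        self.pending = asyncio.create_task(self._fire(text))

    async def _fire(self, text):
        await asyncio.sleep(self._delay)
        await self._send(text)


sent = []


async def fake_send(query):
    # Stand-in for the network call; it just records what would be sent.
    sent.append(query)


async def demo():
    search = DebouncedSearch(fake_send, delay=0.01)
    word = "FOSSASIA"
    for i in range(1, len(word) + 1):
        search.on_input(word[:i])  # simulate typing one letter at a time
    await search.pending           # wait for the single surviving request


asyncio.run(demo())
print(sent)
```

Debouncing only reduces how many requests are sent. It does nothing about the cost of each request that does reach the server.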
That's why people hate ORMs: they allow such easy access to the database that it becomes very easy to overfetch things in an incredibly inefficient fashion. Ember Data is an ORM, but a thousand times worse, because when it makes things easy to fetch (shoot yourself in the foot), the impact is not just overfetching data from the DB (300~500 ms). It is going over the network, pummelling the DB, serializing thousands of items and then rendering a row (30 seconds), and crashing its own server.
I don't even know how to prevent it. It is a high priority issue for the frontend to not fetch anything other than the event for admin search, but JSON:API is essentially a time bomb in the server: anyone can use the above query to take down the server, and no amount of rate limiting can help when a single query can become a DoS vector.
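As a rough sketch of what a leaner admin search request could look like: only the event resource, no include, a fixed page size and a sparse fieldset. The `/v1/events` path, the host, the field names and the `ilike` filter shape below are assumptions for illustration, not the exact query the frontend sends.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs


def build_admin_search_url(base_url, term, page_size=10):
    """Build a search URL that asks for events only (no `include`)."""
    # Assumed filter shape: a JSON list of {name, op, val} objects.
    filters = [{"name": "name", "op": "ilike", "val": f"%{term}%"}]
    params = {
        "filter": json.dumps(filters),
        "page[size]": page_size,
        # Sparse fieldset: hypothetical field names needed for the result row.
        "fields[event]": "name,starts-at,ends-at",
    }
    return f"{base_url}?{urlencode(params)}"


url = build_admin_search_url("https://api.example.org/v1/events", "fossasia")
query = parse_qs(urlsplit(url).query)  # decoded back for inspection
print(sorted(query))
```

Without an `include` parameter, the response size is bounded by the page size instead of by how many sessions or speakers the matching events happen to have.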
The only thing I feel can fix this is to patch the flask-rest-jsonapi library to not include more than 10 items, but it'll definitely break a lot of stuff, on both the site and the apps.
Simply speaking, don't use include for one-to-many relationships; use the specific endpoints to get paged data. So, if you want to get speakers, don't use event/1?include=speakers.
Use /event/1/speakers, which is automatically paginated.
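A minimal sketch of consuming such a paginated relationship endpoint, assuming the server returns a JSON:API `links.next` URL on each page. The `iter_resources` helper and the `fetch_json` callable are hypothetical; here a dictionary of canned pages stands in for the real HTTP call, and the `page[number]` parameter in the canned link is likewise an assumption.

```python
def iter_resources(fetch_json, first_url, max_pages=100):
    """Yield resources page by page, following JSON:API `links.next`."""
    url = first_url
    for _ in range(max_pages):  # hard cap so a bad `next` link cannot loop forever
        if not url:
            return
        document = fetch_json(url)
        yield from document.get("data", [])
        url = (document.get("links") or {}).get("next")


# Stand-in for the real HTTP call: two canned pages keyed by URL.
fake_pages = {
    "/event/1/speakers": {
        "data": [{"type": "speaker", "id": "1"}, {"type": "speaker", "id": "2"}],
        "links": {"next": "/event/1/speakers?page[number]=2"},
    },
    "/event/1/speakers?page[number]=2": {
        "data": [{"type": "speaker", "id": "3"}],
        "links": {},
    },
}

speaker_ids = [
    speaker["id"]
    for speaker in iter_resources(fake_pages.__getitem__, "/event/1/speakers")
]
print(speaker_ids)
```

Because the helper is a generator, a caller that only needs the first page can stop iterating early and no further pages are requested.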
If the queries become more problematic, we'll have to disable the include feature for one-to-many relationships, because we don't want to send 1000 speakers if there are 1000 speakers, and there is no way to paginate included data AFAIK. If I am wrong on any point, please let me know.
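If it comes to that, the check itself could be as simple as the sketch below. This is not how the server library works today and not a patch for it; `split_includes`, the relationship names and the idea of returning the rejected paths are all assumptions made for illustration.

```python
def split_includes(include_param, to_many_relationships):
    """Split an `include` query value into allowed and rejected paths."""
    allowed, rejected = [], []
    for path in filter(None, (p.strip() for p in include_param.split(","))):
        # Simplification: one flat set of names is checked against every
        # segment of a nested path such as "sessions.speakers".
        if any(part in to_many_relationships for part in path.split(".")):
            rejected.append(path)
        else:
            allowed.append(path)
    return allowed, rejected


# Hypothetical relationship names for an event resource.
EVENT_TO_MANY = {"tickets", "sessions", "speakers"}

allowed, rejected = split_includes(
    "tickets,sessions,speakers,event-copyright", EVENT_TO_MANY
)
print(allowed, rejected)
```

Whether the server should silently drop the rejected paths or answer with a 400 error is a separate design choice. An error is louder, but it breaks existing clients sooner.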