`HEAD` request unnecessarily sends query data back to PostgREST #2849
Comments
It sounds like a possible optimization on `HEAD`, but I don't think we should do it. RFC 2616 says:

> The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response.

Note "identical to GET". Also, I've used HEAD to debug a GET before, and just a slight change in the query wouldn't make it reliable for this.
If you only need the count, I think we can add a […]. How does that sound?
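As an aside, a cheaper count is already possible without transferring any rows; a sketch, assuming a PostgREST version that supports the `planned` count preference (table name illustrative):

```sh
# count=planned reads the planner's row estimate instead of scanning the table;
# limit=0 keeps the body empty either way
curl -sI 'localhost:3000/big_table?limit=0' -H 'Prefer: count=planned'
```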
I don't have a very strong opinion on this; it's just that the current behavior seems a bit unexpected and unintuitive. I think sticking with the current behavior is fine as long as the documentation is updated to make this clear.
On second thought, I think you're right. It's expected that HEAD is a cheap/fast operation. Will fix this.
To test this, the simplest way that comes to mind is comparing the […]. However, we can't do that since we select columns for the count and custom headers. So the way should be comparing the generated query:

```sql
EXPLAIN (FORMAT TEXT, VERBOSE)
WITH pgrst_source AS (
  SELECT "test"."projects"."id", "test"."projects"."name" FROM "test"."projects"
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]') AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status
FROM (SELECT * FROM pgrst_source) _postgrest_t;
```
```
-- we get the aggregation here, json_agg
Aggregate  (cost=28.00..28.02 rows=1 width=112)
  Output: NULL::bigint, count(ROW(projects.id, projects.name)), COALESCE(json_agg(ROW(projects.id, projects.name)), '[]'::json), NULLIF(current_setting('response.headers'::text, true), ''::text), NULLIF(current_setting('response.status'::text, true), ''::text)
  ->  Seq Scan on test.projects  (cost=0.00..22.00 rows=1200 width=36)
        Output: projects.id, projects.name, projects.client_id
```
```sql
EXPLAIN (FORMAT TEXT, VERBOSE)
WITH pgrst_source AS (
  SELECT "test"."projects"."id", "test"."projects"."name" FROM "test"."projects"
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  '' AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status
FROM (SELECT * FROM pgrst_source) _postgrest_t;
```
```
-- here we get the empty body, ''::text
Aggregate  (cost=25.00..25.02 rows=1 width=112)
  Output: NULL::bigint, count(ROW(projects.id, projects.name)), ''::text, NULLIF(current_setting('response.headers'::text, true), ''::text), NULLIF(current_setting('response.status'::text, true), ''::text)
  ->  Seq Scan on test.projects  (cost=0.00..22.00 rows=1200 width=36)
        Output: projects.id, projects.name, projects.client_id
```

Edit 1: One problem here is that for getting the EXPLAIN output we also use HTTP requests, so if we tested a HEAD request there would be no body. I'm thinking of adding an internal config that would let us obtain the plan output in a header.

Edit 2: Considering #2825, something that seems useful now and later is to have a shortcut for the pg aggregate used. So I believe we can add this:

```
HEAD /users?select=name&order=id HTTP/1.1
Accept: application/vnd.pgrst.plan; options=verbose

## which will respond

200 OK
X-Pg-Aggregate: json_agg(builtin)
```

So when no aggregate is used, we can reply with an […].

Edit 3: The above is considerable work, so for now I'm just enforcing this behavior with new types.
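For reference, the shortcut from Edit 2 would be exercised like this; a sketch, assuming the `application/vnd.pgrst.plan` media type proposed above is available and enabled server-side:

```sh
# request the verbose plan for the same query a GET/HEAD would run;
# the X-Pg-Aggregate header from Edit 2 would then be inspectable in tests
curl -si 'localhost:3000/users?select=name&order=id' \
  -H 'Accept: application/vnd.pgrst.plan; options=verbose'
```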
To note, the […]:

A request without `limit=0`:

```
curl 'localhost:3000/big_table' -H 'Prefer: count=exact' -I

HTTP/1.1 200 OK
...
Content-Range: 0-99999/100000
```

A request with `limit=0`:

```
curl 'localhost:3000/big_table?limit=0' -H 'Prefer: count=exact' -I

HTTP/1.1 206 Partial Content
...
Content-Range: */100000
```

As seen in the `Content-Range`, […]. I'm not proposing to change anything (not sure what you think), since right now […].
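For context, a client that only wants the total can read it straight off that header; a minimal sketch, assuming the `big_table` endpoint above:

```sh
# HEAD with limit=0 keeps the body empty; the total sits after the '/' in Content-Range
total=$(curl -sI 'localhost:3000/big_table?limit=0' -H 'Prefer: count=exact' \
        | tr -d '\r' \
        | sed -n 's/^[Cc]ontent-[Rr]ange:.*\///p')
echo "$total"   # e.g. 100000
```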
Why is the page count slow? Shouldn't it be quick compared to the whole-table count? Maybe share `EXPLAIN ANALYZE` output of these requests?
It's quicker if […].

(These requests do not have […].)

Without `limit=0`:
```
curl 'localhost:3030/big_table' -H 'Prefer: count=exact' -I
```

```sql
EXPLAIN (FORMAT TEXT, ANALYZE, VERBOSE)
WITH pgrst_source AS
  (SELECT "public"."big_table".* FROM "public"."big_table"),
pgrst_source_count AS
  (SELECT 1 FROM "public"."big_table")
SELECT
  (SELECT pg_catalog.count(*) FROM pgrst_source_count) AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  ''::text AS body,
  nullif(current_setting('response.headers', TRUE), '') AS response_headers,
  nullif(current_setting('response.status', TRUE), '') AS response_status,
  '' AS response_inserted
FROM
  (SELECT * FROM pgrst_source) _postgrest_t;
```

```
Aggregate  (cost=4562.01..4562.03 rows=1 width=144) (actual time=33.566..33.566 rows=1 loops=1)
  Output: (InitPlan 1).col1, count(ROW(big_table.id, big_table.name, big_table.val, big_table.num)), ''::text, NULLIF(current_setting('response.headers'::text, true), ''::text), NULLIF(current_setting('response.status'::text, true), ''::text), ''::text
  InitPlan 1
    ->  Aggregate  (cost=2281.00..2281.01 rows=1 width=8) (actual time=4.900..4.900 rows=1 loops=1)
          Output: count(*)
          ->  Seq Scan on public.big_table big_table_1  (cost=0.00..2031.00 rows=100000 width=0) (actual time=0.009..2.627 rows=100000 loops=1)
                Output: big_table_1.id, big_table_1.name, big_table_1.val, big_table_1.num
  ->  Seq Scan on public.big_table  (cost=0.00..2031.00 rows=100000 width=50) (actual time=0.009..7.667 rows=100000 loops=1)
        Output: big_table.id, big_table.name, big_table.val, big_table.num
Planning Time: 0.151 ms
Execution Time: 33.598 ms
```

With `limit=0`:
```
curl 'localhost:3030/big_table?limit=0' -H 'Prefer: count=exact' -I
```

```sql
EXPLAIN (FORMAT TEXT, ANALYZE, VERBOSE)
WITH pgrst_source AS
  (SELECT "public"."big_table".* FROM "public"."big_table" LIMIT 0 OFFSET 0),
pgrst_source_count AS
  (SELECT 1 FROM "public"."big_table")
SELECT
  (SELECT pg_catalog.count(*) FROM pgrst_source_count) AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  ''::text AS body,
  nullif(current_setting('response.headers', TRUE), '') AS response_headers,
  nullif(current_setting('response.status', TRUE), '') AS response_status,
  '' AS response_inserted
FROM
  (SELECT * FROM pgrst_source) _postgrest_t;
```

```
Aggregate  (cost=2281.03..2281.05 rows=1 width=144) (actual time=17.579..17.580 rows=1 loops=1)
  Output: (InitPlan 1).col1, count(ROW(big_table.id, big_table.name, big_table.val, big_table.num)), ''::text, NULLIF(current_setting('response.headers'::text, true), ''::text), NULLIF(current_setting('response.status'::text, true), ''::text), ''::text
  InitPlan 1
    ->  Aggregate  (cost=2281.00..2281.01 rows=1 width=8) (actual time=17.566..17.566 rows=1 loops=1)
          Output: count(*)
          ->  Seq Scan on public.big_table big_table_1  (cost=0.00..2031.00 rows=100000 width=0) (actual time=0.015..9.224 rows=100000 loops=1)
                Output: big_table_1.id, big_table_1.name, big_table_1.val, big_table_1.num
  ->  Limit  (cost=0.00..0.02 rows=1 width=50) (actual time=0.001..0.002 rows=0 loops=1)
        Output: big_table.id, big_table.name, big_table.val, big_table.num
        ->  Seq Scan on public.big_table  (cost=0.00..2031.00 rows=100000 width=50) (never executed)
              Output: big_table.id, big_table.name, big_table.val, big_table.num
Planning Time: 0.289 ms
Execution Time: 17.657 ms
```
The whole table always needs to be counted for the "total number of items". We either do so once (with `limit=0`) or twice (without a limit), so it makes sense that the time required doubles. A possible optimization would be to skip the extra counting query for requests without a limit. Of course, that only works without `max-rows`, so maybe it's a pointless edge case to optimize for. A minimal sketch of the idea follows.
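This sketch assumes no limit and no `max-rows` cap, so the page total necessarily equals the total result set; it is an illustration, not PostgREST's actual query builder:

```sql
-- with no LIMIT in pgrst_source, one pass over the rows can serve both counts,
-- dropping the separate pgrst_source_count CTE and its extra Seq Scan
WITH pgrst_source AS
  (SELECT "public"."big_table".* FROM "public"."big_table")
SELECT
  pg_catalog.count(_postgrest_t) AS total_result_set,  -- same value as page_total here
  pg_catalog.count(_postgrest_t) AS page_total,
  ''::text AS body
FROM (SELECT * FROM pgrst_source) _postgrest_t;
```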
Environment

Description of issue

Sending a `HEAD` request actually executes the query and sends the aggregated JSON query results back to PostgREST, even though the response will just be discarded. I think PostgREST should just compute the `count` and send that back alone, to avoid unnecessary CPU and I/O operations for something which is just discarded.

I ran into this whilst using a `HEAD` request with `Prefer: count=exact`. Adding an explicit `limit=0` to avoid actually sending any data decreased the response time by nearly half.
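A rough way to reproduce that timing difference (host, port, and table illustrative):

```sh
# HEAD without limit=0: the server still aggregates the JSON body internally
curl -s -o /dev/null -w '%{time_total}s\n' -I 'localhost:3000/big_table' \
  -H 'Prefer: count=exact'

# HEAD with limit=0: the row source is empty, so only the count query does real work
curl -s -o /dev/null -w '%{time_total}s\n' -I 'localhost:3000/big_table?limit=0' \
  -H 'Prefer: count=exact'
```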