HEAD request unnecessarily sends query data back to PostgREST #2849

Closed · bnjmnt4n opened this issue Jul 5, 2023 · 8 comments · Fixed by #2857

@bnjmnt4n commented Jul 5, 2023

Environment

  • PostgreSQL version: 15.2
  • PostgREST version: 10.2
  • Operating system: Ubuntu 20.04, WSL

Description of issue

Sending a HEAD request actually executes the query and sends the aggregated JSON results back to PostgREST, even though the body will just be discarded. I think PostgREST should compute only the count and send that back, to avoid unnecessary CPU and I/O on data that is never used.

I ran into this whilst using a HEAD request with Prefer: count=exact. Adding an explicit limit=0 to avoid actually sending any data decreased the response timing by nearly half.
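For reference, a rough sketch of the comparison (big_table is a placeholder table name; timings will vary):

# plain HEAD with an exact count: the full JSON body is built and then thrown away
curl -I 'localhost:3000/big_table' -H 'Prefer: count=exact'

# with limit=0 no rows are materialized, which roughly halved the response time
curl -I 'localhost:3000/big_table?limit=0' -H 'Prefer: count=exact'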

@steve-chavez (Member)

It sounds like a possible optimization on HEAD, but I don't think we should do it.

RFC 2616 says:

> The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.

Note "identical to GET". Also, I've used HEAD to debug a GET before and just a slight change in the query wouldn't make it reliable for this.

> I ran into this whilst using a HEAD request with Prefer: count=exact. Adding an explicit limit=0 to avoid actually sending any data decreased the response timing by nearly half.

If you only need the count, I think we can add a Prefer: return=headers-only for GET (INSERT already has this).
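A sketch of what that could look like (hypothetical, not implemented):

GET /users HTTP/1.1
Prefer: count=exact, return=headers-only

## which would respond with the count but no body
HTTP/1.1 200 OK
Content-Range: */100000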

How does that sound?

@steve-chavez added the idea label (Needs of discussion to become an enhancement, not ready for implementation) on Jul 5, 2023
@bnjmnt4n (Author) commented Jul 6, 2023

I don't have a very strong opinion on this, just that the current behavior seems a bit unexpected and unintuitive. I think sticking with the current behavior is fine as long as the documentation is updated to make this explicit.

@steve-chavez (Member)

> just that the current behavior seems a bit unexpected and unintuitive

On second thought, I think you're right. It's expected that HEAD is a cheap/fast operation. Will fix this.

@steve-chavez added the bug and perf labels and removed the idea label on Jul 7, 2023
@steve-chavez (Member) commented Jul 7, 2023

To test this, the simplest way that comes to mind is comparing the width of the query:

-- we get 40 when selecting columns
postgres=# explain select * from projects ;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on projects  (cost=0.00..22.00 rows=1200 width=40)
(1 row)

-- we get 0 when not selecting columns
postgres=# explain select  from projects ;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on projects  (cost=0.00..22.00 rows=1200 width=0)
(1 row)

However, we can't do that, since we select columns for the count and custom headers.
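For instance, even selecting a single column for counting keeps the width above zero (a sketch; the cost numbers are illustrative):

-- width is non-zero as soon as any column is selected
postgres=# explain select id from projects ;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on projects  (cost=0.00..22.00 rows=1200 width=4)
(1 row)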

So the way to test it should be comparing the generated queries:

EXPLAIN (FORMAT TEXT, VERBOSE)
WITH pgrst_source AS
  (SELECT "test"."projects"."id", "test"."projects"."name" FROM "test"."projects")
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]') AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status
FROM (SELECT * FROM pgrst_source) _postgrest_t;

-- we get the aggregation here, json_agg
 Aggregate  (cost=28.00..28.02 rows=1 width=112)
   Output: NULL::bigint, count(ROW(projects.id, projects.name)), COALESCE(json_agg(ROW(projects.id, projects.name)), '[]'::json), NULLIF(current_setting('response.headers'::text, true), ''::text), NULLIF(current_setting('response.status'::text, true), ''::text)
   ->  Seq Scan on test.projects  (cost=0.00..22.00 rows=1200 width=36)
         Output: projects.id, projects.name, projects.client_id

EXPLAIN (FORMAT TEXT, VERBOSE)
WITH pgrst_source AS
  (SELECT "test"."projects"."id", "test"."projects"."name" FROM "test"."projects")
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  '' AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status
FROM (SELECT * FROM pgrst_source) _postgrest_t;

-- here we get the empty body, ''::text
 Aggregate  (cost=25.00..25.02 rows=1 width=112)
   Output: NULL::bigint, count(ROW(projects.id, projects.name)), ''::text, NULLIF(current_setting('response.headers'::text, true), ''::text), NULLIF(current_setting('response.status'::text, true), ''::text)
   ->  Seq Scan on test.projects  (cost=0.00..22.00 rows=1200 width=36)
         Output: projects.id, projects.name, projects.client_id

Edit 1: One problem here is that we also use HTTP requests to get the EXPLAIN output. So if we tested a HEAD request, there would be no body to read the plan from.

I'm thinking of adding an internal config that would let us obtain the plan output in a header.

Edit 2: Considering #2825, something that seems useful now and later is to have a shortcut for the pg aggregate used. So I believe we can add this:

HEAD /users?select=name&order=id HTTP/1.1
Accept: application/vnd.pgrst.plan; options=verbose

## which will respond
200 OK
X-Pg-Aggregate: json_agg(builtin)

So when no aggregate is used, we can reply with an X-Pg-Aggregate: none. This would only happen when the verbose option is provided, to match EXPLAIN behavior.

Edit 3: The above is considerable work, so for now I'm just enforcing this behavior with new types.

@laurenceisla (Member) commented Mar 18, 2025

To note: limit=0 is still faster for HEAD requests with Prefer: count=exact. Even after the JSON aggregate was replaced with ''::text (which reduced the HEAD execution time), the pagination count is still executed. For example:

A request without limit=0:

curl 'localhost:3000/big_table' -H 'Prefer: count=exact' -I

HTTP/1.1 200 OK
...
Content-Range: 0-99999/100000

A request with limit=0:

curl 'localhost:3000/big_table?limit=0' -H 'Prefer: count=exact' -I

HTTP/1.1 206 Partial Content
...
Content-Range: */100000

As seen in the Content-Range, both count the total and return 100000, but they differ in the "rows per page" part. With limit=0 it's faster because it counts 0 rows, but without limit=0 it counts the whole response (takes almost twice the time). This is more noticeable when db-max-rows is not set or when it's big enough.

Not proposing to change anything (not sure what you think), since right now HEAD behaves as close as possible to a GET request (sans processing the result with a JSON agg). Maybe we could add this difference to the docs somewhere?

@steve-chavez (Member) commented Mar 18, 2025

> but without limit=0 it counts the whole response (takes almost twice the time).
> Content-Range: */100000

Why is the page count slow? Shouldn't it be quick compared to the whole table count?

Maybe share EXPLAIN analyze output of these requests?

@laurenceisla (Member) commented Mar 19, 2025

> Why is the page count slow? Shouldn't it be quick compared to the whole table count?

It's quicker if db-max-rows is set to a lower value. If it's not set or if it's big enough then it will take more time.
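For instance, capping the page size via the db-max-rows config keeps the page count cheap (the value is illustrative):

# postgrest.conf: the page count now scans at most 100 rows
db-max-rows = 100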

> Maybe share EXPLAIN analyze output of these requests?

(These requests do not have db-max-rows set)

Without `limit=0`

curl 'localhost:3030/big_table' -H 'Prefer: count=exact' -I
EXPLAIN (FORMAT TEXT, ANALYZE, VERBOSE)
WITH pgrst_source AS
  (SELECT "public"."big_table".* FROM "public"."big_table") ,
 pgrst_source_count AS
  (SELECT 1 FROM "public"."big_table")
SELECT
  (SELECT pg_catalog.count(*) FROM pgrst_source_count) AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  ''::text AS body,
  nullif(current_setting('response.headers', TRUE), '') AS response_headers,
  nullif(current_setting('response.status', TRUE), '') AS response_status,
  '' AS response_inserted
FROM
  (SELECT * FROM pgrst_source) _postgrest_t
Aggregate  (cost=4562.01..4562.03 rows=1 width=144) (actual time=33.566..33.566 rows=1 loops=1)
  Output: (InitPlan 1).col1, count(ROW(big_table.id, big_table.name, big_table.val, big_table.num)), ''::text, NULLIF(current_setting('response.headers'::text, true), ''::text), NULLIF(current_setting('response.status'::text, true), ''::text), ''::text
  InitPlan 1
    ->  Aggregate  (cost=2281.00..2281.01 rows=1 width=8) (actual time=4.900..4.900 rows=1 loops=1)
          Output: count(*)
          ->  Seq Scan on public.big_table big_table_1  (cost=0.00..2031.00 rows=100000 width=0) (actual time=0.009..2.627 rows=100000 loops=1)
                Output: big_table_1.id, big_table_1.name, big_table_1.val, big_table_1.num
  ->  Seq Scan on public.big_table  (cost=0.00..2031.00 rows=100000 width=50) (actual time=0.009..7.667 rows=100000 loops=1)
        Output: big_table.id, big_table.name, big_table.val, big_table.num
Planning Time: 0.151 ms
Execution Time: 33.598 ms

With `limit=0`

curl 'localhost:3030/big_table?limit=0' -H 'Prefer: count=exact' -I
EXPLAIN (FORMAT TEXT, ANALYZE, VERBOSE)
WITH pgrst_source AS
  (SELECT "public"."big_table".* FROM "public"."big_table" LIMIT 0 OFFSET 0) ,
 pgrst_source_count AS
  (SELECT 1 FROM "public"."big_table")
SELECT
  (SELECT pg_catalog.count(*) FROM pgrst_source_count) AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  ''::text AS body,
  nullif(current_setting('response.headers', TRUE), '') AS response_headers,
  nullif(current_setting('response.status', TRUE), '') AS response_status,
  '' AS response_inserted
FROM
  (SELECT * FROM pgrst_source) _postgrest_t
Aggregate  (cost=2281.03..2281.05 rows=1 width=144) (actual time=17.579..17.580 rows=1 loops=1)
  Output: (InitPlan 1).col1, count(ROW(big_table.id, big_table.name, big_table.val, big_table.num)), ''::text, NULLIF(current_setting('response.headers'::text, true), ''::text), NULLIF(current_setting('response.status'::text, true), ''::text), ''::text
  InitPlan 1
    ->  Aggregate  (cost=2281.00..2281.01 rows=1 width=8) (actual time=17.566..17.566 rows=1 loops=1)
          Output: count(*)
          ->  Seq Scan on public.big_table big_table_1  (cost=0.00..2031.00 rows=100000 width=0) (actual time=0.015..9.224 rows=100000 loops=1)
                Output: big_table_1.id, big_table_1.name, big_table_1.val, big_table_1.num
  ->  Limit  (cost=0.00..0.02 rows=1 width=50) (actual time=0.001..0.002 rows=0 loops=1)
        Output: big_table.id, big_table.name, big_table.val, big_table.num
        ->  Seq Scan on public.big_table  (cost=0.00..2031.00 rows=100000 width=50) (never executed)
              Output: big_table.id, big_table.name, big_table.val, big_table.num
Planning Time: 0.289 ms
Execution Time: 17.657 ms

@wolfgangwalther (Member)

> Why is the page count slow? Shouldn't it be quick compared to the whole table count?

The whole table always needs to be counted for the "total number of items". We either do so once (with limit=0) or twice (without limit=0). So it makes sense that the time required doubles.

A possible optimization would be to skip the extra counting query for requests without a limit, as sketched below. Of course, that only works without max-rows, so maybe it's a pointless edge case to optimize for.
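A rough sketch against the generated query shown above (assuming no limit and no db-max-rows; response_headers/response_status omitted for brevity):

WITH pgrst_source AS
  (SELECT "public"."big_table".* FROM "public"."big_table")
SELECT
  -- without a limit the page is the whole table, so one count can serve as both
  pg_catalog.count(_postgrest_t) AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  ''::text AS body
FROM
  (SELECT * FROM pgrst_source) _postgrest_t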
