
support JSONLines #3954


Closed
steverweber opened this issue Mar 17, 2025 · 5 comments

Comments

@steverweber

Problem

Large JSON bulk inserts are too costly. JSON arrays need a closing ], and this might be causing parser issues.

Solution

JSONLines (https://jsonlines.org) is a simple JSON-based format, similar to CSV, where each line is a separate record:

{"id": 1,"bla": "bla"}
{"id": 2,"bla": "bla"}
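To illustrate the appeal: because each record sits on its own line and no closing ] is required, a consumer can decode records one at a time instead of buffering the whole payload. A minimal Python sketch (not PostgREST code, just an illustration of the format):

```python
import io
import json

def iter_jsonlines(stream):
    """Yield one decoded record per line; no closing ']' is needed,
    so records can be processed as they arrive."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

payload = io.StringIO('{"id": 1,"bla": "bla"}\n{"id": 2,"bla": "bla"}\n')
records = list(iter_jsonlines(payload))
print(records)  # [{'id': 1, 'bla': 'bla'}, {'id': 2, 'bla': 'bla'}]
```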
@wolfgangwalther
Member

We allow returning custom mimetypes: https://docs.postgrest.org/en/v12/references/api/media_type_handlers.html

Since you mention "bulk inserts", I assume you'll want to POST this. For that, we'd need the inverse: support for custom mimetypes in the Content-Type header. We track this in #2826.

@steve-chavez
Member

steve-chavez commented Mar 18, 2025

I find JSONLines interesting as I've also been having issues with bulk json insertions.

Custom media types plus a pg function sound like the ideal solution. However, #2261 is perhaps also related to this.

Fails because PostgREST runs out of system memory (12GB)
#3953 (comment)

So even if we offer custom media types for POST, the above failure would still occur, since the payload would fail at the PostgREST level? @steverweber What's the exact size of your JSON payload?

@steverweber
Author

steverweber commented Mar 18, 2025

What's the exact size of your JSON payload?

~500MB. This caused PostgREST to hit the OOM killer at ~12GB. After giving the process 24GB of RAM, it was able to handle the ~500MB file.

@steve-chavez
Member

steve-chavez commented Mar 24, 2025

~500MB. This caused PostgREST to hit the OOM killer at ~12GB.

That is unexpected; we have these tests:

jsonKeyTest "1M" "POST" "/rpc/leak?columns=blob" "27M"
jsonKeyTest "1M" "POST" "/leak?columns=blob" "21M"
jsonKeyTest "1M" "PATCH" "/leak?id=eq.1&columns=blob" "21M"
jsonKeyTest "10M" "POST" "/rpc/leak?columns=blob" "32M"
jsonKeyTest "10M" "POST" "/leak?columns=blob" "32M"
jsonKeyTest "10M" "PATCH" "/leak?id=eq.1&columns=blob" "32M"
jsonKeyTest "50M" "POST" "/rpc/leak?columns=blob" "72M"
jsonKeyTest "50M" "POST" "/leak?columns=blob" "72M"
jsonKeyTest "50M" "PATCH" "/leak?id=eq.1&columns=blob" "72M"
postJsonArrayTest "1000" "/perf_articles?columns=id,body" "20M"
postJsonArrayTest "10000" "/perf_articles?columns=id,body" "20M"
postJsonArrayTest "100000" "/perf_articles?columns=id,body" "24M"

So that means a 50MB JSON should consume at most 72MB of RAM, and so on. Note how the memory overhead relative to payload size decreases as the payload grows.

Maybe we need to do similar tests with different MB sizes of arrays, instead of just one big JSON object (which is what we do in jsonKeyTest).
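The actual memory tests above are written in Haskell; purely as an illustration of what an array-shaped fixture could look like, here is a hypothetical Python helper (not part of the test suite) that builds a JSON array payload of roughly a target size out of ~7KB nested rows, mirroring the reporter's data shape:

```python
import json

def make_array_payload(target_mb, row_bytes=7 * 1024):
    """Build a JSON array of many small nested rows totalling
    roughly target_mb megabytes (hypothetical test helper)."""
    filler = "x" * (row_bytes // 2)
    rows = []
    size = 2  # account for the enclosing "[]"
    i = 0
    while size < target_mb * 1024 * 1024:
        row = {"id": i, "data": {"blob": filler, "nested": {"more": filler}}}
        size += len(json.dumps(row)) + 1  # +1 for the separator
        rows.append(row)
        i += 1
    return json.dumps(rows)

payload = make_array_payload(1)  # a ~1MB array of ~7KB nested rows
```

Feeding payloads like this through the existing memory harness at several sizes would show whether arrays of small nested rows behave differently from one big JSON object.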

@steverweber
Author

The JSON is complex: each array item is ~7KB of data with plenty of nesting.

Here is the first ~20% of what one data row looks like; yep, messy stuff.

{"id": "500000", "data": {"start_time": "2020-05-03T00:00:00Z", "end_time": "2020-05-16T00:00:00Z", "release_time": "2020-09-00T12:20:00Z", "tags": ["bbfsd", "xxxx", "34tggdf"], "survey_data": {"questions": [{"template": "standard-core-implementation", "template_id": 1, "title": "course concepts", "text": "The instructor(s) helped me to understand the course concepts.", "id": 1, "options": [{"text": "Strongly Disagree", "value": 1.0, "index": 0, "selections": 0, "percent": 0.0}, {"text": "Disagree", "value": 2.0, "index": 1, "selections": 0, "percent": 0.0}, {"text": "Neutral", "value": 0.0, "index": 1, "selections": 0, "percent": 0.0}, {"text": "Agree", "value": 0.0, "index": 3, "selections": 3, "percent": 0.0}, {"text": "Strongly Agree", "value": 0.0, "index": 4, "selections": 3, "percent": 50.0}, {"text": "No Basis for Rating", "value": null, "index": 1, "selections": 0, "percent": 0.0}], "avg": 0.5, "std": 0.5, "sum": 0.0, "median": 4.5, "no_answer": 0, "num_answered": 1, "identifier": "prof"}, {"template": "standard-core-implementation", "template_id": 45, "title": "supportive zxc", "text": "The instructor(s) created a supportive environment that helped me learn (Supportive cxcxcxcx enable xzxczx to feel included and valued regardless of any aspect of their identity).", "id": 275, "options": [{"text": "Strongly Disagree", "value": 0.0, "index": 0, "selections": 0, "percent": 0.0}, {"text": "Disagree", "value": 0.0, "index": 1, "selections": 0, "percent": 0.0}, {"text": "Neutral", "value": 0.0, "index": 2, "selections": 0, "percent": 0.0}, {"text": "Agree", 
