
support JSONLines #3954


Closed
steverweber opened this issue Mar 17, 2025 · 5 comments

Comments

@steverweber

Problem

Large JSON bulk inserts are too costly. JSON arrays need a closing ], and this might be causing parser issues.

Solution

JSONLines (https://jsonlines.org) is a simple JSON-based format, similar to CSV, where each line is a separate record:

{"id": 1,"bla": "bla"}
{"id": 2,"bla": "bla"}
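To illustrate the appeal: because each record sits on its own line and no closing ] is required, a consumer can decode records one at a time instead of buffering the whole payload. A minimal Python sketch (not PostgREST code, just an illustration of the format):

```python
import io
import json

def iter_jsonlines(stream):
    """Yield one decoded record per line; no closing ']' is needed,
    so records can be processed as they arrive."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

payload = io.StringIO('{"id": 1,"bla": "bla"}\n{"id": 2,"bla": "bla"}\n')
records = list(iter_jsonlines(payload))
print(records)  # [{'id': 1, 'bla': 'bla'}, {'id': 2, 'bla': 'bla'}]
```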
@wolfgangwalther
Member

We allow returning custom mimetypes: https://docs.postgrest.org/en/v12/references/api/media_type_handlers.html

Since you mention "bulk inserts", I assume you'll want to POST this. For that, we'd need the inverse: support for custom mimetypes in the Content-Type header. We track this in #2826.

@steve-chavez
Member

steve-chavez commented Mar 18, 2025

I find JSONLines interesting as I've also been having issues with bulk json insertions.

Custom media types plus a pg function sound like the ideal solution. However, #2261 is perhaps also related to this.

Fails because PostgREST runs out of system memory (12GB)
#3953 (comment)

So even if we offer custom media types for POST, the above failure would still occur, since the payload would fail at the PostgREST level? @steverweber What's the exact size of your JSON payload?

@steverweber
Author

steverweber commented Mar 18, 2025

What's the exact size of your JSON payload?

~500MB. This caused PostgREST to hit the OOM killer at ~12GB. After giving the process 24GB of RAM, it was able to handle the ~500MB file.

@steve-chavez
Member

steve-chavez commented Mar 24, 2025

~500MB. This caused PostgREST to hit the OOM killer at ~12GB.

That is unexpected; we have these tests:

jsonKeyTest "1M" "POST" "/rpc/leak?columns=blob" "27M"
jsonKeyTest "1M" "POST" "/leak?columns=blob" "21M"
jsonKeyTest "1M" "PATCH" "/leak?id=eq.1&columns=blob" "21M"
jsonKeyTest "10M" "POST" "/rpc/leak?columns=blob" "32M"
jsonKeyTest "10M" "POST" "/leak?columns=blob" "32M"
jsonKeyTest "10M" "PATCH" "/leak?id=eq.1&columns=blob" "32M"
jsonKeyTest "50M" "POST" "/rpc/leak?columns=blob" "72M"
jsonKeyTest "50M" "POST" "/leak?columns=blob" "72M"
jsonKeyTest "50M" "PATCH" "/leak?id=eq.1&columns=blob" "72M"
postJsonArrayTest "1000" "/perf_articles?columns=id,body" "20M"
postJsonArrayTest "10000" "/perf_articles?columns=id,body" "20M"
postJsonArrayTest "100000" "/perf_articles?columns=id,body" "24M"

So that means a 50MB JSON should consume at most 72MB of RAM, and so on. Note how the memory overhead relative to payload size decreases as the payload grows.

Maybe we need to do similar tests with different MB sizes of arrays, instead of just one big JSON object (which is what we do in jsonKeyTest).
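The actual memory tests above are written in Haskell; purely as an illustration of what an array-shaped fixture could look like, here is a hypothetical Python helper (not part of the test suite) that builds a JSON array payload of roughly a target size out of ~7KB nested rows, mirroring the reporter's data shape:

```python
import json

def make_array_payload(target_mb, row_bytes=7 * 1024):
    """Build a JSON array of many small nested rows totalling
    roughly target_mb megabytes (hypothetical test helper)."""
    filler = "x" * (row_bytes // 2)
    rows = []
    size = 2  # account for the enclosing "[]"
    i = 0
    while size < target_mb * 1024 * 1024:
        row = {"id": i, "data": {"blob": filler, "nested": {"more": filler}}}
        size += len(json.dumps(row)) + 1  # +1 for the separator
        rows.append(row)
        i += 1
    return json.dumps(rows)

payload = make_array_payload(1)  # a ~1MB array of ~7KB nested rows
```

Feeding payloads like this through the existing memory harness at several sizes would show whether arrays of small nested rows behave differently from one big JSON object.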

@steverweber
Author

The JSON is complex: each array item is ~7KB of data with plenty of nesting.

Here is the first ~20% of what one data row looks like; yep, messy stuff.

{"id": "500000", "data": {"start_time": "2020-05-03T00:00:00Z", "end_time": "2020-05-16T00:00:00Z", "release_time": "2020-09-00T12:20:00Z", "tags": ["bbfsd", "xxxx", "34tggdf"], "survey_data": {"questions": [{"template": "standard-core-implementation", "template_id": 1, "title": "course concepts", "text": "The instructor(s) helped me to understand the course concepts.", "id": 1, "options": [{"text": "Strongly Disagree", "value": 1.0, "index": 0, "selections": 0, "percent": 0.0}, {"text": "Disagree", "value": 2.0, "index": 1, "selections": 0, "percent": 0.0}, {"text": "Neutral", "value": 0.0, "index": 1, "selections": 0, "percent": 0.0}, {"text": "Agree", "value": 0.0, "index": 3, "selections": 3, "percent": 0.0}, {"text": "Strongly Agree", "value": 0.0, "index": 4, "selections": 3, "percent": 50.0}, {"text": "No Basis for Rating", "value": null, "index": 1, "selections": 0, "percent": 0.0}], "avg": 0.5, "std": 0.5, "sum": 0.0, "median": 4.5, "no_answer": 0, "num_answered": 1, "identifier": "prof"}, {"template": "standard-core-implementation", "template_id": 45, "title": "supportive zxc", "text": "The instructor(s) created a supportive environment that helped me learn (Supportive cxcxcxcx enable xzxczx to feel included and valued regardless of any aspect of their identity).", "id": 275, "options": [{"text": "Strongly Disagree", "value": 0.0, "index": 0, "selections": 0, "percent": 0.0}, {"text": "Disagree", "value": 0.0, "index": 1, "selections": 0, "percent": 0.0}, {"text": "Neutral", "value": 0.0, "index": 2, "selections": 0, "percent": 0.0}, {"text": "Agree", 
