Open
Description
This issue replaces #30.
User-supplied data that contains any of these line-break characters:
- \u2028
- \u2029
- \x85
causes the dump to treat a single line as if it were split into several. As a result, the dump raises:
ValueError("Mismatch between column names and values.")
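One plausible explanation, not confirmed against the library's source, is that the dump output is split with Python's `str.splitlines()`, which treats all three characters as line boundaries, whereas `pg_dump` only ends a row with `\n`. The sketch below uses a made-up row (`row` is purely illustrative) to show the difference:

```python
# A hypothetical tab-separated row whose text column contains U+2028.
row = "1\tfirst part\u2028second part"

# str.splitlines() also breaks on \u2028, \u2029 and \x85,
# so one row turns into two fragments.
fragments = row.splitlines()
print(len(fragments))

# Splitting on "\n" only keeps the row intact.
intact = row.split("\n")
print(len(intact))
```

If that is indeed the cause, the column count of each fragment no longer matches the column names, which would produce the mismatch error.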
To work around it, I piped the pg_dump output through sed in the Python code so that these characters are replaced with spaces:
    process = subprocess.Popen(
        (
            "pg_dump",
            # Force output to be UTF-8 encoded.
            "--encoding=utf-8",
            # Quote all table and column names, just in case.
            "--quote-all-identifiers",
            # Luckily `pg_dump` supports DB URLs, so we can just pass it the
            # URL as argument to the command.
            "--dbname",
            url.geturl().replace('postgis://', 'postgresql://'),
        ) + tuple(extra_params),
        stdout=subprocess.PIPE,
    )
    # Replace the problematic line-break characters with spaces.
    process = subprocess.Popen(
        "sed $'s/\u2028/ /g'",
        shell=True,
        stdin=process.stdout,
        stdout=subprocess.PIPE)
    process = subprocess.Popen(
        "sed $'s/\u2029/ /g'",
        shell=True,
        stdin=process.stdout,
        stdout=subprocess.PIPE)
    process = subprocess.Popen(
        "sed $'s/\x85/ /g'",
        shell=True,
        stdin=process.stdout,
        stdout=subprocess.PIPE)
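As a possible alternative, the same replacement could be done in Python itself, which avoids three extra processes and the `$'...'` quoting that not every `/bin/sh` supports. This is a minimal sketch, not the library's code: `iter_clean_lines` and `_EXTRA_LINE_BREAKS` are hypothetical names, and it assumes the consumer can read from an iterator of decoded lines rather than from a raw pipe.

```python
import io

# Map the three extra line-break characters to a plain space.
_EXTRA_LINE_BREAKS = str.maketrans({"\u2028": " ", "\u2029": " ", "\x85": " "})


def iter_clean_lines(binary_stream):
    """Yield UTF-8 decoded lines from binary_stream, split on "\n" only.

    Hypothetical helper: binary_stream would be process.stdout from the
    pg_dump Popen call.
    """
    # newline="\n" makes "\n" the only line terminator when iterating.
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")
    for line in text:
        # Replace the characters that would later be seen as line breaks.
        yield line.translate(_EXTRA_LINE_BREAKS)


# Usage with an in-memory stand-in for the pg_dump output.
sample = "a\u2028b\tc\x85d\ne\u2029f\n".encode("utf-8")
cleaned = list(iter_clean_lines(io.BytesIO(sample)))
print(cleaned)
```

Like the sed version, this replaces the characters with spaces, so the original characters are lost from the dumped data.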
I'd be happy to submit this as a PR if it's helpful. Or is there a better way to handle the issue?