implement sync_diffs_filediffs #47

@Cattes

To reduce the memory required for writing large dataframes, a new mode, sync_filediffs, is being implemented in the mysql.Connection class.

The approach is to keep as much of the work as possible on disk rather than in memory.
When a dataframe is received, it is first written to disk.

The database table to be updated is also downloaded to disk in chunks.
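A minimal sketch of the chunkwise download, assuming a DB-API 2.0 connection. The function name `dump_table_to_csv` and its parameters are hypothetical and not taken from the sync_filediffs branch; `sqlite3` is what the test setup uses so the example runs without a MySQL server, but the `fetchmany` loop is the same for any DB-API cursor.

```python
import csv


def dump_table_to_csv(conn, table, path, chunk_size=1000):
    """Stream `table` to a CSV file, holding at most `chunk_size` rows in memory."""
    cur = conn.cursor()
    # Table names cannot be bound as parameters, so `table` must come from trusted code.
    cur.execute(f"SELECT * FROM {table}")
    rows_written = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        # First line: column names taken from the cursor description.
        writer.writerow([col[0] for col in cur.description])
        while True:
            chunk = cur.fetchmany(chunk_size)
            if not chunk:
                break
            writer.writerows(chunk)
            rows_written += len(chunk)
    return rows_written
```

For the later line-by-line comparison to work, the dataframe and the table dump would need to be serialized the same way (same column order, same quoting, same formatting of numbers and dates).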

Then the filediffs package is used to find the differences between the two files and save them to disk.
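The idea behind this step can be illustrated with a standard-library stand-in. This is not the filediffs API; `diff_lines` and its arguments are hypothetical names. It writes the lines that occur only in the new file and the lines that occur only in the old file to two separate output files. Unlike a fully on-disk approach, this sketch keeps the lines of the old file in a set, so it only shows the logic, not the memory behaviour.

```python
def diff_lines(new_path, old_path, only_new_path, only_old_path):
    """Write lines unique to each input file to the two output files.

    Returns (lines only in new, lines only in old).
    """
    with open(old_path) as fh:
        old_lines = {line.rstrip("\n") for line in fh}

    shared = set()  # lines present in both files
    n_only_new = 0
    with open(new_path) as new_fh, open(only_new_path, "w") as out:
        for line in new_fh:
            line = line.rstrip("\n")
            if line in old_lines:
                shared.add(line)
            else:
                out.write(line + "\n")
                n_only_new += 1

    n_only_old = 0
    with open(old_path) as old_fh, open(only_old_path, "w") as out:
        for line in old_fh:
            line = line.rstrip("\n")
            if line not in shared:
                out.write(line + "\n")
                n_only_old += 1
    return n_only_new, n_only_old
```

A row whose values changed shows up in both outputs: its new version in the "only new" file and its old version in the "only old" file.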

After that, only the rows to update and the rows to delete are read back into memory, and the database is updated.
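A rough sketch of this last step, assuming the two diff outputs are headerless CSV files and the table has a single-column primary key. `apply_diff` and its parameters are hypothetical. The SQL uses SQLite syntax (`INSERT OR REPLACE`, `?` placeholders) so it can run locally; against MySQL the statement would be something like `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`, with the placeholder style of the driver in use.

```python
import csv


def apply_diff(conn, table, columns, key, upsert_csv, stale_csv):
    """Upsert rows from `upsert_csv`, then delete rows that exist only in `stale_csv`.

    Returns (number of upserted rows, number of deleted keys).
    """
    with open(upsert_csv, newline="") as fh:
        upserts = list(csv.reader(fh))
    with open(stale_csv, newline="") as fh:
        stale = list(csv.reader(fh))

    key_idx = columns.index(key)
    upsert_keys = {row[key_idx] for row in upserts}
    # A changed row appears in both files; only keys absent from the
    # upsert file are real deletions.
    delete_keys = [(row[key_idx],) for row in stale if row[key_idx] not in upsert_keys]

    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            f"INSERT OR REPLACE INTO {table} ({column_list}) VALUES ({placeholders})",
            upserts,
        )
        conn.executemany(f"DELETE FROM {table} WHERE {key} = ?", delete_keys)
    return len(upserts), len(delete_keys)
```

Only the differing rows are loaded here, so memory use scales with the size of the change rather than with the size of the table.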

A first version is already implemented on the sync_filediffs branch.

Open issues:

  1. The verbose logging has to be improved so it integrates better into the codebase.
  2. The temporary file management has to be improved.
  3. The query method's output format. Changing it seems to be a breaking change.
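For item 2, one possible direction is to keep all intermediate files in a single temporary directory that is removed when the sync finishes or fails. This is only a sketch; `sync_workspace` and the file names are hypothetical and do not reflect what the branch currently does.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def sync_workspace(prefix="sync_filediffs_"):
    """Yield paths for the intermediate files; the directory is removed on exit."""
    with tempfile.TemporaryDirectory(prefix=prefix) as tmp:
        yield {
            "new": os.path.join(tmp, "new.csv"),            # dataframe written to disk
            "old": os.path.join(tmp, "old.csv"),            # table dump
            "only_new": os.path.join(tmp, "only_new.csv"),  # rows to insert or update
            "only_old": os.path.join(tmp, "only_old.csv"),  # candidate rows to delete
        }
```

Used as `with sync_workspace() as paths: ...`, the cleanup also happens when an exception is raised inside the block.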

Labels: enhancement
