Added support for Odoo Filestore #368

Open
odoo-service wants to merge 6 commits into GreenmaskIO:main from odoo-service:main

@odoo-service

The basic idea behind filestore anonymisation is reduction: files containing sensitive information, such as invoices, are not transferred to the testing system. Instead, a dummy invoice is uploaded, and the affected attachments are pointed at it with a post-data UPDATE statement.

Example YAML configuration:

dump:
  pg_dump_options:
    jobs: 4
    pgzip: true

  filestore:
    enabled: true
    root_path: "/var/lib/odoo/.local/share/Odoo/filestore/odoo_db_name"  # example filestore path
    file_list: "/srv/greenmask/config/filestore-keep.txt" # optional allowlist of relative paths; omit to pack everything
    subdir: "filestore"              # default
    archive_name: "filestore.tar.gz" # default
    metadata_name: "filestore.json"  # default
    use_pgzip: true                  # override dump.pg_dump_options.pgzip if needed
    fail_on_missing: false
    split:
      max_size_bytes: 536870912   # start new tar after 512 MB uncompressed
      max_files: 10000            # or after every 10k files

restore:
  pg_restore_options:

  filestore:
    enabled: true
    target_path: "/var/lib/odoo/.local/share/Odoo/filestore/odoo_db_name" # example filestore path
    subdir: "filestore"
    metadata_name: "filestore.json"
    use_pgzip: true        # set if different from dump
    clean_target: false    # remove existing target before extract
    skip_existing: false   # leave files untouched if already present
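
To make the `split` settings concrete, here is a minimal Python sketch of how the two thresholds could partition files into archives. It is not the code from this PR: the function name `plan_archives` is hypothetical, and it assumes a batch is closed before a file would push it past `max_size_bytes`, or once it already holds `max_files` entries. The PR may instead close a batch only after a threshold has been crossed.

```python
from typing import Iterable, List, Tuple


def plan_archives(
    files: Iterable[Tuple[str, int]],
    max_size_bytes: int,
    max_files: int,
) -> List[List[str]]:
    """Group (relative_path, size_in_bytes) pairs into archive batches.

    Hypothetical helper: a new batch starts when adding the next file would
    exceed max_size_bytes (uncompressed) or the batch already has max_files.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    for path, size in files:
        overflow = bool(current) and (
            current_size + size > max_size_bytes or len(current) >= max_files
        )
        if overflow:
            batches.append(current)
            current, current_size = [], 0
        # A single file larger than the limit still gets its own batch.
        current.append(path)
        current_size += size

    if current:
        batches.append(current)
    return batches
```

With the values from the example configuration, this would be called as `plan_archives(files, 536870912, 10000)`.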

Example of a filestore keep list (the file referenced by file_list), one path per line, relative to root_path:

06/062d1f643e075b789fb54b5cc26784995ea7299b
06/06a538b8b393cede62a0c00a4e683e54b1268441
0c/0c78f98ea899bf4e50afdbc01263fcc9fd0a3206
0c/0cca0074e044af0ea96a40ab5048228cc6f00175
18/180ea12a2fb2289fcfcedea56049c17a840a0736
1f/1f3b37adc61b802fb704cbd1202bafeb55b9e91d
20/201e7b500f0c18996a6c81c63ce2acf04658588e
24/2470aa607bdf1f128394703173f150fcc671f746
2b/2bbfba99b1749239c49d753f9b8fca48490d370a
...
...
...
41/4145d1ac5da031fcc04611d5218db61224acd536
52/52430a11dd26a23a0ce513777909d4fcfb9c395a
57/5758ca1a0d1e4fbe0744488c54ed743cc6cc67d2
58/58a54b94f7ed9dcb9f1d2baa8a5e385f3abe0597
63/63d4ba80def55da294e5ecc6a94b9d450860cb65
6a/6aba4dd7477ebcee9d86feaad8e3cf4828d5321c
6c/6c9541e39119d72b2a5707076f90f7f3eab3ea32
76/76987d68d19a0117b73f8da0a3e5495dc016f63a
7d/7d2eb6cad924d15a60cf92bad5f00a6685dd29ac
7d/7dbfe519d334d518b6f8c8e3afcafec5e758112e
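
A keep list like this can be sanity-checked before a dump. The following Python sketch is an illustration only, not part of the PR. It assumes every entry has the shape seen in the example above (a two-character hex directory followed by a 40-character hex name that starts with the same two characters) and that blank lines are ignored. The names `parse_keep_list` and `missing_files` are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Shape inferred from the example entries: "6a/6aba4dd7...321c".
_ENTRY = re.compile(r"^([0-9a-f]{2})/([0-9a-f]{40})$")


def parse_keep_list(text: str) -> List[str]:
    """Return the validated relative paths from a keep-list file's content."""
    entries: List[str] = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are allowed and skipped
        match = _ENTRY.match(line)
        if match is None or not match.group(2).startswith(match.group(1)):
            raise ValueError(f"line {number}: unexpected entry {line!r}")
        entries.append(line)
    return entries


def missing_files(root_path: str, entries: List[str]) -> List[str]:
    """List the entries that do not exist as regular files under root_path."""
    root = Path(root_path)
    return [entry for entry in entries if not (root / entry).is_file()]
```

A non-empty result from `missing_files` is the situation that the `fail_on_missing` option presumably governs. Rejecting anything that does not match the pattern also keeps entries such as `../secret` out of the archive.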

The update script, which runs after the database and the (reduced) filestore have been restored:

scripts:
  post-data:
    - name: "replace all PDF documents"
      when: "after"
      query: "UPDATE ir_attachment SET store_fname = '6a/6aba4dd7477ebcee9d86feaad8e3cf4828d5321c' WHERE COALESCE(mimetype, '') = 'application/pdf'"
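
The UPDATE only works if the dummy file it points to was actually packed, that is, if its path appears in the keep list. Otherwise every PDF attachment would reference a file missing from the restored filestore. A minimal Python sketch of that check, with the hypothetical helper name `build_replace_query`, might look like the following. It is an illustration, not code from this PR.

```python
import re
from typing import Iterable

_STORE_FNAME = re.compile(r"^[0-9a-f]{2}/[0-9a-f]{40}$")
_MIMETYPE = re.compile(r"^[a-z0-9.+-]+/[a-z0-9.+-]+$")


def build_replace_query(
    placeholder: str,
    keep_entries: Iterable[str],
    mimetype: str = "application/pdf",
) -> str:
    """Build the post-data UPDATE, refusing placeholders that were not kept."""
    # Both values are interpolated into SQL, so restrict them to safe shapes.
    if not _STORE_FNAME.match(placeholder):
        raise ValueError(f"unexpected store_fname: {placeholder!r}")
    if not _MIMETYPE.match(mimetype):
        raise ValueError(f"unexpected mimetype: {mimetype!r}")
    if placeholder not in set(keep_entries):
        raise ValueError(f"{placeholder} is not in the keep list")
    return (
        "UPDATE ir_attachment "
        f"SET store_fname = '{placeholder}' "
        f"WHERE COALESCE(mimetype, '') = '{mimetype}'"
    )
```

In the example above, the placeholder `6a/6aba4dd7477ebcee9d86feaad8e3cf4828d5321c` does appear in the keep list, so the check passes and the function returns the same statement as the `query` value.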

@wwoytenko wwoytenko self-requested a review November 24, 2025 17:28
@wwoytenko wwoytenko self-assigned this Nov 24, 2025
@wwoytenko
Member

wwoytenko commented Nov 28, 2025

@odoo-service

Thanks for the contribution!

I’m not sure this is the right direction for Greenmask. I understand the core problem, but in this case it feels like we’re bolting a heterogeneous data source onto the side. From my point of view, this kind of integration is better handled through external automation or scripts than built directly into Greenmask, unless we implement a dedicated engine inside Greenmask that is capable of pulling such data.

I also don’t fully see how we can support this approach in the upcoming platform architecture and in the V1 branch. Before going further, it would really help to understand the broader use cases. If you could describe the business scenarios and explain why this functionality is needed beyond your specific environment, I’d appreciate it.

Member

@wwoytenko wwoytenko left a comment

Waiting for the author’s clarification.

@odoo-service
Author

Hello @wwoytenko, thanks for the thorough review!
We ran into this need while rolling out Greenmask for Odoo-based tenants. In those projects the anonymized database alone isn’t sufficient: downstream QA, automated testing, and integration suites also require the filestore (theme assets, product images, attachments, reports). Without the filestore the restored system is functionally incomplete, even though the DB data itself is anonymized.

Why this belongs inside Greenmask rather than in ad‑hoc scripts:

  • Uniform pipeline: storage backends (local/S3) and streaming compression are already abstracted in Greenmask. Leveraging those guarantees the filestore dump inherits retries, encryption, metadata, and lifecycle management without duplicating infra in separate scripts.
  • Operational simplicity: customers can schedule "one button" dumps (DB + filestore) with the same config, credentials, and observability. When we tried an external script, ops teams struggled with ordering, error handling, and retention because it lived outside Greenmask’s orchestration.
  • Safety and compliance: the include-list/query work we added ensures fine-grained control over which files end up in the archive -- a real requirement when tenants co-locate confidential assets. Embedding that logic keeps the policy alongside the dump definition rather than scattering it into bespoke tooling.
  • Forward compatibility: the implementation plugs into the existing storage interface and gating (filestore dump is optional, feature-flagged, and shares the same metadata format). That keeps it compatible with both the current CLI and the upcoming platform/V1 pipeline -- no cross-cutting hooks or new background services needed.
  • Business case: every software vendor we support (Odoo, ERP clones, e‑commerce suites) carries a filestore or media bucket next to the DB. Being able to reproduce the entire tenant state with a single Greenmask job is what lets them adopt Greenmask instead of sticking to in-house scripts. Making it first-class in Greenmask avoids dozens of slightly different, poorly maintained side integrations and gives the community one tested approach that already honors Greenmask’s architecture.

Happy to walk through more concrete workflows or anonymization requirements if that helps -- would really like to see this land upstream so we don’t fragment tooling in the ecosystem.
