Rework mime type white list #2198

Open
Tschuppi81 wants to merge 61 commits into master from feature/ogc-2738-pentest-arbitrary-file-upload

Conversation

@Tschuppi81
Contributor

@Tschuppi81 Tschuppi81 commented Nov 10, 2025

Org: Ensure mime type validation on file upload fields in form code

TYPE: Feature
LINK: ogc-2738

@linear

linear bot commented Nov 10, 2025

@codecov

codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 84.88372% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.23%. Comparing base (6574614) to head (d7db252).
⚠️ Report is 10 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/onegov/form/validators.py | 58.06% | 13 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (6574614) and HEAD (d7db252).

HEAD has 1 upload less than BASE

| Flag | BASE (6574614) | HEAD (d7db252) |
|---|---|---|
| | 8 | 7 |

Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| src/onegov/agency/forms/agency.py | 97.24% <100.00%> (-0.02%) ⬇️ |
| src/onegov/election_day/forms/election.py | 98.13% <100.00%> (-0.01%) ⬇️ |
| src/onegov/election_day/forms/election_compound.py | 97.00% <100.00%> (-0.76%) ⬇️ |
| src/onegov/election_day/forms/subscription.py | 86.11% <ø> (-13.89%) ⬇️ |
| src/onegov/election_day/forms/upload/common.py | 100.00% <ø> (ø) |
| src/onegov/election_day/forms/upload/election.py | 90.90% <ø> (-9.10%) ⬇️ |
| ...gov/election_day/forms/upload/election_compound.py | 100.00% <ø> (ø) |
| .../onegov/election_day/forms/upload/party_results.py | 100.00% <ø> (ø) |
| src/onegov/election_day/forms/upload/rest.py | 100.00% <ø> (ø) |
| src/onegov/election_day/forms/upload/vote.py | 85.00% <ø> (-15.00%) ⬇️ |
| ... and 29 more | |

... and 150 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Tschuppi81 Tschuppi81 requested a review from Daverball December 4, 2025 12:03
@Tschuppi81
Contributor Author

I saw that file types are handled differently for view_upload_file_by_json in handle_file_upload: basically all file types are allowed. Shall we keep this?

@Tschuppi81
Contributor Author

Should I completely remove the type application/octet-stream? It is mostly used in conjunction with application/zip.

@Daverball
Member

> I saw that file types are handled differently for view_upload_file_by_json in handle_file_upload: basically all file types are allowed. Shall we keep this?

We can make sure to set supported_content_types on GeneralFileCollection. That's the only one that would allow anything to be uploaded currently through those views.
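
A minimal sketch of that idea, using hypothetical stand-in classes rather than the real onegov collections. It assumes that a collection without `supported_content_types` accepts everything, and that setting the attribute restricts uploads; the attribute's actual type and the `allows` helper are assumptions made for illustration.

```python
from __future__ import annotations


class FileCollection:
    """Hypothetical stand-in for a file collection, not the onegov class."""

    # Assumption: None means "no restriction", i.e. anything may be uploaded
    supported_content_types: frozenset[str] | None = None

    def allows(self, content_type: str) -> bool:
        allowed = self.supported_content_types
        return allowed is None or content_type in allowed


class GeneralFileCollection(FileCollection):
    # Illustrative whitelist only; the real list is decided in the PR
    supported_content_types = frozenset({
        'application/pdf',
        'image/png',
        'text/plain',
    })
```

With the attribute set, `GeneralFileCollection().allows('application/x-msdownload')` is false, while the unrestricted base class still accepts any content type.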

Member

@Daverball Daverball left a comment

Looks good overall, but there are a couple of details we should iron out.

@Daverball
Member

Daverball commented Dec 4, 2025

> Should I completely remove the type application/octet-stream? It is mostly used in conjunction with application/zip.

It's probably fine to remove it for now. There may, however, be the rare false positive for files that libmagic cannot identify correctly. Generally, PDFs, ZIPs and any other binary file formats can end up as application/octet-stream; it's a generic catch-all content type for binary data that couldn't be detected as anything else.
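
To illustrate the fallback, here is a toy content sniffer written with the standard library only. It is not libmagic and covers just three signatures; the function names and the whitelist are made up for this sketch. The point is only that anything unrecognized lands on application/octet-stream, so dropping that type from the whitelist also rejects legitimate files that fail detection.

```python
# Leading byte signatures for a few formats (toy subset, not libmagic's database)
SIGNATURES = (
    (b'%PDF-', 'application/pdf'),
    (b'PK\x03\x04', 'application/zip'),
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
)

# Hypothetical whitelist without application/octet-stream
WHITELIST = frozenset({'application/pdf', 'application/zip'})


def sniff_content_type(data: bytes) -> str:
    """Guess a content type from leading bytes, else use the catch-all."""
    for magic, content_type in SIGNATURES:
        if data.startswith(magic):
            return content_type
    return 'application/octet-stream'


def is_allowed(data: bytes) -> bool:
    return sniff_content_type(data) in WHITELIST
```

A PDF with stray bytes in front of its header, for example, would be reported as application/octet-stream by this sniffer and rejected by `is_allowed`.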

@Tschuppi81 Tschuppi81 requested a review from Daverball January 13, 2026 14:41
Member

@Daverball Daverball left a comment

We're in a pretty good spot now. But we should clean up and refactor things a bit.

If mimetypes is an argument to upload fields, we no longer need to pass around instances of WhitelistedMimeType everywhere. We should use the new parameter and avoid inserting a validator object; instead, let's rely on the post_validate method of Field, which we can override in UploadField. We don't need to override it in UploadMultipleField, but we should add an explicit mimetypes parameter there, so we can easily access the mimetypes from UploadMultipleField.mimetypes in addition to each subfield. This means the parameter should both be assigned to self.mimetypes and passed to the self.upload_field_class call.
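
A rough sketch of that shape, assuming stand-in classes so it runs without WTForms. `Field` and `ValidationError` here only mimic the relevant WTForms behaviour (the real `validate` takes a form and extra validators), and the `append_entry` helper, the error message, and the `'mimetype'` key in the field data are assumptions, not the PR's actual code.

```python
from __future__ import annotations

from collections.abc import Collection


class ValidationError(ValueError):
    """Stand-in for wtforms.validators.ValidationError."""


class Field:
    """Minimal stand-in for wtforms.Field; only the hook used here."""

    def __init__(self) -> None:
        self.data: dict[str, str] | None = None
        self.errors: list[str] = []

    def validate(self) -> bool:
        self.errors = []
        try:
            # WTForms calls this hook after the validator chain has run
            self.post_validate(None, False)
        except ValueError as error:
            self.errors.append(str(error))
        return not self.errors

    def post_validate(self, form: object, validation_stopped: bool) -> None:
        pass


class UploadField(Field):
    def __init__(self, mimetypes: Collection[str]) -> None:
        super().__init__()
        self.mimetypes = frozenset(mimetypes)

    def post_validate(self, form: object, validation_stopped: bool) -> None:
        # The whitelist check lives in the field, not in the validator chain
        if validation_stopped or not self.data:
            return
        if self.data.get('mimetype') not in self.mimetypes:
            raise ValidationError('Files of this type are not supported.')


class UploadMultipleField(Field):
    upload_field_class = UploadField

    def __init__(self, mimetypes: Collection[str]) -> None:
        super().__init__()
        # Exposed on the list itself and forwarded to every subfield
        self.mimetypes = frozenset(mimetypes)
        self.entries: list[UploadField] = []

    def append_entry(self, data: dict[str, str]) -> UploadField:
        entry = self.upload_field_class(mimetypes=self.mimetypes)
        entry.data = data
        self.entries.append(entry)
        return entry

    def validate(self) -> bool:
        # Each file is checked exactly once, by its own subfield
        results = [entry.validate() for entry in self.entries]
        return all(results)
```

Because the check is part of the field, callers only pass `mimetypes=...` and never have to assemble or inspect a validator list.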

if not any(isinstance(validator, WhitelistedMimeType)
           for validator in validators
):
    validators.append(validator)

Member

I think it would still be a bit cleaner to move the validator into post_validate, like in my code suggestion above, instead of trying to modify the given validator chain. If someone passes in a WhitelistedMimeType validator, we can emit a warning or raise an exception telling them to use the mimetypes parameter instead.
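
One possible way to flag the legacy usage is sketched below. `WhitelistedMimeType` is a stand-in with an assumed constructor, and `check_validators` is a hypothetical helper; whether a warning or an exception is preferable is left open, as in the comment above.

```python
import warnings


class WhitelistedMimeType:
    """Stand-in for the validator discussed above (assumed constructor)."""

    def __init__(self, whitelist):
        self.whitelist = frozenset(whitelist)


def check_validators(validators):
    """Return the validators as a list, warning about legacy mime type ones."""
    validators = list(validators or ())
    if any(isinstance(v, WhitelistedMimeType) for v in validators):
        warnings.warn(
            'Pass mimetypes=... to the upload field instead of adding a '
            'WhitelistedMimeType validator.',
            DeprecationWarning,
            stacklevel=2,
        )
    return validators
```

Swapping `warnings.warn` for `raise TypeError(...)` would turn the hint into a hard failure at field construction time.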

id=id,
default=default,
widget=widget, # type:ignore[arg-type]
validators=[*(validators or ())],

Member

You're just looking for the validator in the wrong place in that test. The validator is on each field in the field list, not on the field list itself. So I would get rid of this again and simplify the validators; otherwise the validator will run twice for each file. Once you move the validation from a validator in validators to post_validate, you will no longer have to check for the presence of that validator anyway, since it's baked into the field itself.

@Daverball
Member

I would suggest reverting 01b0af2, where you changed some things into sets. It should no longer be necessary now that the argument type on the field is correct, and we definitely don't want some of these things to appear as mutable.
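
As a small illustration of the mutability concern, the sketch below keeps the shared default immutable and accepts any collection as input. The names `DEFAULT_MIMETYPES` and `resolve_mimetypes` are hypothetical, and the `Collection[str]` argument type is an assumption about what the corrected field signature looks like.

```python
from __future__ import annotations

from collections.abc import Collection

# Shared default: a frozenset cannot be modified by accident at runtime
DEFAULT_MIMETYPES: frozenset[str] = frozenset({
    'application/pdf',
    'text/plain',
})


def resolve_mimetypes(
    mimetypes: Collection[str] | None = None,
) -> frozenset[str]:
    """Accept tuples, lists or sets, but always hand back a frozenset."""
    if mimetypes is None:
        return DEFAULT_MIMETYPES
    return frozenset(mimetypes)
```

Callers can keep passing tuples or other immutable constants, and nothing downstream can add a content type to the shared default.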

@Tschuppi81
Contributor Author

I believe I am ready for a final review
