-
Notifications
You must be signed in to change notification settings - Fork 317
Malware scanning #1267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Malware scanning #1267
Conversation
✅ Deploy Preview for activist-org canceled.
|
Thank you for the pull request! ❤️The activist team will do our best to address your contribution as soon as we can. If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the |
Maintainer ChecklistThe following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)
|
Hello @Joshua-Tihabwangye ! Thanks for all of the work getting this together; I'm looking forward to taking a look at the code! Before we go further, there are a few things we should do, to get the PR in good shape. Could you please:
I noticed that there are some conflicts that need to be resolved. If you can fix these, please go ahead to do so. If not, I'll take a look and see what we can do to fix them. Overall, you built some code and created a PR! That is fantastic, and we really appreciate your efforts here. Our next step is to fix the items above, then we can get to the fun part of digging through the code! |
Am sorry for not checking them, i got interrupted by other things and forgot but let me work on them right now |
I need help, from where i have reached. If i run docker in the logs it will show that the database has been scanned and there is no problem in it, but when i access the admin dashboard and i upload the image it cant show i the filescan container any logs of showing that the image has been scanned or nay file has been scanned. so am stack there. am here that we work together as i learn from you. |
@@ -1,26 +1,28 @@ | |||
FRONTEND_PORT="3000" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in the formatting of the values (the "" ) is not necessary.
Usually, the only changes to a .env file should be adding / removing variables. It is best to leave the formatting as-is.
@Joshua-Tihabwangye , could you please revert this file? Thanks!
@@ -5,7 +5,10 @@ WORKDIR /app | |||
COPY requirements.txt . | |||
RUN pip install -r requirements.txt | |||
|
|||
COPY . /app/backend/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add the clamav parts to the requirements. txt file? If so, we should do that.
uploaded_file = data["file_object"] | ||
file_content = uploaded_file.read() | ||
logger.info("About to scan uploaded file in memory.") # Add this log | ||
scan_result = scan_file_in_memory(file_content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start, but we should use the file scan like a separate service, instead of in-line like we are doing here.
There are some questions I still have about this. @andrewtavis , your thoughts about how to best architect this? I was thinking in terms of treating the filescan like a microservice, separate from the rest of the backend code. Your comments on this are appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd agree that having it as it's own service would make the most sense, @mattburnett-repo 😊 Loading it into serializers sounds good to me :)
@@ -5,7 +5,10 @@ WORKDIR /app | |||
COPY requirements.txt . | |||
RUN pip install -r requirements.txt | |||
|
|||
COPY . /app/backend/. | |||
RUN apt-get update && apt-get install -y clamav clamav-daemon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During Docker build, getting this error:
RUN apt-get update && apt-get install -y clamav clamav-daemon
/bin/sh: apt-get: not found
See comment above, about including clamav modules in the requirements.txt file.
@@ -190,3 +190,6 @@ wheel==0.45.1 | |||
# The following packages are considered to be unsafe in a requirements file: | |||
# pip | |||
# setuptools | |||
pyclamd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. I see here that the clamav modules are included in requirements text file.
Just wondering why we added 'factory-boy' here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these were added directly into this file as well, @Joshua-Tihabwangye :) Totally fine that this is a bit confusing. Was for me when we started doing it 😊 See the directions on how to update dependencies at the top of backend/requirements.in, and please let us know if you have any questions! This uses pip-tools :)
@@ -16,3 +16,5 @@ gunicorn | |||
pillow | |||
psycopg2-binary | |||
python-dotenv | |||
pyclamd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. I see here that the clamav modules are included in requirements text file.
@@ -94,3 +94,5 @@ urllib3==2.3.0 | |||
# via | |||
# requests | |||
# types-requests | |||
pyclamd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not understanding why this is duplicated here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking for clarity: Won't we need the file scan for production, so it needs to be in both requirement files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. Thanks for pointing that out. #derp
@@ -0,0 +1,52 @@ | |||
import pyclamd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start! Is it possible to move this code out of the main backend codebase, and instead treat it like a microservice? Waiting to hear from @andrewtavis to get his thoughts about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with whatever the architecture decision is, but I guess we're not talking a microservice in the true sense of it, but rather another sub-directory or app within the backend like communities, events, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CCing also @to-sta here for this part of the discussion of where the malware scanning should live and how it should be ran 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was thinking about it as something separate from the backend code, and not a sub-dir as you mentioned.
The idea behind treating it as a microservice is simply to isolate the scan functionality from the rest of the backend codebase. The filescan stuff isn't technically part of the overall core functionality for the backend. Also, in the event that there is a directed attack against the filescanner, isolating the filescanner from the rest of the backend code reduces the attack surface.
These are just initial thoughts. I can see advantages to either solution, although I tend toward treating it as something external to the core backend codebase. However, my opinion on this isn't strong. I'm interested in @to-sta 's thoughts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CCing also @to-sta here for this part of the discussion of where the malware scanning should live and how it should be ran 😊
I decided to integrate it in the serializers that handle the file uploads, in order to capture any file that will be saved to the database
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start! Is it possible to move this code out of the main backend codebase, and instead treat it like a microservice? Waiting to hear from @andrewtavis to get his thoughts about this.
I don't know how we can do it outside the serializers so I think there we need help one another. @andrewtavis some idea about the architecture we should use besides using it in the API endpoint of uploading the files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought would be to use a seperate background worker that checks the file after it's uploaded:
- Upload the file to a temporary quarantine folder.
- The worker scans the file for malicious content.
- Based on the scan results, the file is either released or removed.
--events-per-org 1 \ | ||
--resources-per-entity 1 \ | ||
--faq-entries-per-entity 1 && | ||
python manage.py populate_db --users 10 --orgs-per-user 1 --groups-per-org 1 --events-per-org 1 --resources-per-entity 1 --faq-entries-per-entity 1 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this was changed, to put all of the populate_db options on one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this would be nice to revert :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Joshua-Tihabwangye , could you please revert these changes? Thanks!
|
||
# volumes: | ||
# clamav_db: | ||
filescan: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very, very, very good start!
If we are pulling the clamav images here, do we need to also add clamav modules to requirement files?
Hi @Joshua-Tihabwangye ! You will certainly get help from us! We try our best to be helpful, and to promote a positive environment here! I've made some initial comments about the code. DON'T STRESS!!! :) You're doing fine! We will go through all of the issues and fix them, one-by-one. Right now, I have a question for @andrewtavis about how to architect this solution. It's actually a very basic question about where we want the code to go, and how we want to access it. You've taken on a big task, and have made good progress towards solving it. We need to hear from Andrew, and then we can take next steps. Thanks for all of your work so far! |
I duplicated it after getting problems with including the clamav modules
into the requirements.txt file so, when I added in one file I thought of
duplicating it to cover all the requirements.txt files
…On Sun, May 18, 2025, 4:47 PM Matt Burnett ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In backend/utils/malware_scan.py
<#1267 (comment)>
:
> @@ -0,0 +1,52 @@
+import pyclamd
This is a good start! Is it possible to move this code out of the main
backend codebase, and instead treat it like a microservice? Waiting to hear
from @andrewtavis <https://github.com/andrewtavis> to get his thoughts
about this.
—
Reply to this email directly, view it on GitHub
<#1267 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BCHYFTDO6TLQ6XPH34Z6FYL27CFPXAVCNFSM6AAAAAB5JE2SZCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDQNBYHE4TENZYGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
We can use the small micro service to implement the malware scanning functionality. I have made some research on both options of integrating the functionality into the API end points especially in serializers as it is right now and then I have also the one of creating a small microservice and I've found out that it is a good idea to make the functionality as a small microservice. |
No stress, @Joshua-Tihabwangye! I can run the scripts to generate the dependencies in the final review here. We're good for now 😊
Let's do it then! 🚀 I think this will be a very nice first microservice to add into the codebase :) |
I have made some research on how the microservice would look like inside our main repo or project. In the microservice we shall have the following directories which will have different files.
Then we need to have an API to help is integrate it or connect it to the main project. The API wil have the; |
@mattburnett-repo , @to-sta @andrewtavis I need your idea and help on what my findings are, is there other things needed to fit in the structure or architecture of the microservice? |
This is a good overview and a good first step towards building the filescan functionality! I would suggest that we do the following:
We don't need to focus on the actual filescanning right now. What is important at this step is to create the 'home' for the filescan functionality, meaning the folders/http service/Dockerfile/tests. Once we have this 'home' created and working well, we can add the real filescan code. @andrewtavis @to-sta , as always we're interested in any thoughts or comments you might have! |
I think the general plan sounds really good, @mattburnett-repo :) Thanks for outlining it in such detail 😊 |
@Joshua-Tihabwangye , I've outlined some steps, above, to take in order to move forward with this task. Please take a look; they should be helpful. |
Thank you @mattburnett-repo for the highlights given to us. It will really help me and the development team in this task |
Contributor checklist
Description
I have made the malware scanning functionality where each file is to be scanned before being stored in the database. I used a python library which is pyClamd. I created the malware_scan.py file inside the utils folder, i also had to set the pyclamd and its clamav deamon in settings.py, then other files that were edited are, serializers, docker file of backend, the docker compose file and the requirements file, and the requirements.txt.
Related issue