Backend implementation for exporting all data #652

ElanaC · 2025-09-06T09:12:40Z

Backend implementation for a frontend "Export" button.
Makes a zip file of a patients csv, steps csv, and a patients directory that includes step files that aren't able to be stored in a csv (also makes the folers and csvs separately outside of zip)

You can choose if you want hidden and deleted data
doppler run -- pnpm tsx scripts/dataextraction.ts --include-deleted --include-hidden

… projects main Step1 is completed ONLY, the other two backend steps are in progress

changeset-bot · 2025-09-06T09:12:43Z

⚠️ No Changeset found

Latest commit: 34fc37d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

mattwalo32

Nice job, super impressive work with this

apps/backend/src/test.ts

mattwalo32 · 2025-09-08T13:04:26Z

apps/backend/src/index.ts

+import { main } from './test';
+
+initDB(main)
+


Could you remove this? Otherwise we'll be running the test every time the server starts

Did you forget to push this? It still seems to be there.

now removed

mattwalo32 · 2025-09-08T13:06:25Z

apps/backend/scripts/dataextraction.ts

+import { downloadFile } from '../src/utils/aws/awsS3Helpers';
+import archiver from 'archiver';
+
+mongoose.set('strictQuery', false);


Is this needed?

the two imports are important, first for step 3 media files and second for everything to be zipped properly. Line 23 isn't as important, just gets rid of a deprecation warning that pops up in the terminal telling me to add it in.

Could you remove the strictQuery line for now?

You're right that we should probably add that in eventually due to the deprecation, but I'd like to do some reading up on the implications of this. Not sure if it will cause breaking change for us.

apps/backend/package.json

apps/backend/scripts/dataextraction.ts

mattwalo32 · 2025-09-09T00:49:47Z

apps/backend/scripts/dataextraction.ts

+      for (const field of stepDef.fields) {
+        // Check if field should be included based on options
+        if (!includeHidden && field.isHidden) continue;
+


Need to check isDeleted as well

Looks like this is still missing

apps/backend/scripts/dataextraction.ts

mattwalo32 · 2025-09-09T00:55:24Z

apps/backend/scripts/dataextraction.ts

+                    if (!fs.existsSync(patientDir)) fs.mkdirSync(patientDir, { recursive: true });
+                    if (!fs.existsSync(stepDir)) fs.mkdirSync(stepDir, { recursive: true });
+                    hasDownloadedFiles = true;
+                  }


I don't think it's necessary to do this logic with checking hasDownloadedFiles and checking fs.existsSync. You should be able to just call fs.mkdirSync every single time before downloading the file. If it already exists, since recursive: true, it won't throw an error.

In my opinion this will make it a bit more readable by reducing nesting and long-lived variables in the loop.

This is unchanged, forgot to push maybe?

apps/backend/scripts/dataextraction.ts

mattwalo32 · 2025-09-09T01:05:44Z

Be sure to run pnpm run lint from the root of the repo

…test

…navbar)

mattwalo32 · 2025-09-13T14:06:57Z

apps/backend/src/index.ts

+import { main } from './test';
+
+initDB(main)
+


Did you forget to push this? It still seems to be there.

mattwalo32 · 2025-09-13T14:08:27Z

apps/backend/scripts/dataextraction.ts

+import { downloadFile } from '../src/utils/aws/awsS3Helpers';
+import archiver from 'archiver';
+
+mongoose.set('strictQuery', false);


Could you remove the strictQuery line for now?

You're right that we should probably add that in eventually due to the deprecation, but I'd like to do some reading up on the implications of this. Not sure if it will cause breaking change for us.

mattwalo32 · 2025-09-13T14:12:02Z

apps/backend/scripts/dataextraction.ts

+      for (const field of stepDef.fields) {
+        // Check if field should be included based on options
+        if (!includeHidden && field.isHidden) continue;
+


Looks like this is still missing

mattwalo32 · 2025-09-13T14:12:42Z

apps/backend/scripts/dataextraction.ts

+                    if (!fs.existsSync(patientDir)) fs.mkdirSync(patientDir, { recursive: true });
+                    if (!fs.existsSync(stepDir)) fs.mkdirSync(stepDir, { recursive: true });
+                    hasDownloadedFiles = true;
+                  }


This is unchanged, forgot to push maybe?

mattwalo32 · 2025-09-13T14:25:03Z

apps/backend/src/routes/api/export.ts

+export const router = express.Router();
+
+router.get(
+    '/download',


Right now this route is unauthenticated! This means that if this were deployed to production, anyone in the world would be able to download all the patient data from this endpoint.

You can add auth around this by using the requireAdmin middleware. We already have a few routes that are admin only. See how the requireAdmin middleware is used throughout the code base, you should just need to add a line right below /download

requireAdmin added

mattwalo32 · 2025-09-13T14:40:48Z

apps/backend/src/routes/api/export.ts

+        // Validate ZIP file exists and has content
+        await fs.promises.access(zipPath).catch(() => {
+            return res.status(500).send('ZIP file not found');
+        });


Some overarching comments on the implementation here:

Regarding the use of fs.promises, I would generally advise against that. All of these functions have non-promise synchronous versions. For example, you could use fs.accessSync insteaad of fs.promises.access. I dislike promsies for the following reasons:

Less readable in my opinion. Introduces more nesting and indirection

Unexpected behavior with control flow. For instance, you have return res.status(500) inside of the promise catch. block. The issue with this is that the return is scoped to the error handler, not the route handler. So it will actually continue executing the route in the background after the returning 500.

However, ideally, we shouldn't need to do any FS operations here at all. Could you rewrite the runCombinedExport to return a fs.ReadStream? This way the result can be streamed directly instead of read from disk first.

It would then look something like this

const zipStream = await runCombinedExport({ includeDeleted, includeHidden }) zipStream.pipe(res)

createZipStream() no longer uses disk inputs/outputs
runCombinedExport() returns a ReadStream but still uses disk temporarily
export.ts no longer uses fs.promises

mattwalo32 · 2025-09-13T14:44:04Z

apps/frontend/src/components/ExportButton/ExportButton.tsx

+            disabled={loading}
+            startIcon={loading ? <CircularProgress size={20} /> : null}
+          >
+            {loading ? 'Exporting...' : 'Export as ZIP'}


As a side note, it would be pretty cool if we could get those logs on the backend that show all the files being download + progress streamed to the frontend. As a user, since this takes a bit, it's hard to tell if it's exporting or if something went wrong.

But don't worry about that in this PR, that's more of a reach goal.

mattwalo32 · 2025-09-13T14:46:03Z

apps/backend/src/utils/aws/awsS3Helpers.ts

+// Step 2: Determine file type from saved file on disk
+const detectFileTypeFromFile = async (filePath: string): Promise<string | null> => {
+  try {
+    const buffer = fs.readFileSync(filePath, { encoding: null }); // Read first 4KB for type detection


Why specify encoding here? I believe the default is null if unspecified. However, we can skip this call to readFileSync entirely by using the fileTypeFromFile function in file-type.

This does all the file reading behind the scenes. The benefit to this is that we don't have to read the entire file into memory. It probably only needs to read the first chunk into memory to get the "magic number" in the file header, so this should be more efficient.

mattwalo32 · 2025-09-13T14:53:12Z

apps/frontend/src/api/api.ts

    return res.data
 }
+
+export const triggerExportDownload = async (includeDeleted: boolean, includeHidden: boolean) => {


Naming is a little ambiguous IMO because "triggering an export download" could mean a few things (I.e. Does it mean we're kicking off a download that happens in the background, or kicking off a download and receiving the result now?).

Could you rename to something like downloadAllPatientData or something similar?

mattwalo32 · 2025-09-13T15:05:06Z

apps/frontend/src/components/ExportButton/ExportButton.tsx

+}
+
+const ExportButton: React.FC<ExportButtonProps> = ({
+  buttonText = 'Export Data',


I think it's fine to not make this a prop and just have the button component decide what text it says.

Also can you add a translation for this. It consists of the following steps:

Open up apps/frontend/src/translation.json

Add a new field under the EN key with the english text

Add a corresponding one with the arabic text under AR

Use the useTranslations hook in the component

For example:

{ "EN": { "exportButtonText": "Export Patient Data", .... }, "AR": { "exportButtonText": "ARABIC HERE. I TRIED PASTING ARABIC HERE AND IT CRASHED MY BROWSER :(", .... }

Then

const translations = useTranslations()[0] translations.exportButtonText

mattwalo32 · 2025-09-14T13:42:55Z

apps/backend/scripts/dataextraction.ts

+  return stepDefinitions;
+}
+
+async function generateStepCSVs(options: ExportOptions = {}) {


The field groups seem to not be working. I added some test data in a table under the interview step for patient C00004. The data does not show up in the interview.csv file.

I'm thinking that we might be better off if each patient gets their own CSV for each step. The field groups don't play nicely with the columns. For example, say we have a field group, "appointments", and it has two subfields, "date", and "notes". If a patient has 3 appointments, we would have 6 columns:

Appointment 1 - Date

Appointment 1 - Notes

Appointment 2 - Date

Appointment 2 - Notes

Appointment 3 - Date

Appointment 3 - Notes

You can see the pattern...

If we have all the patients in a single CSV, that means that for each field group we need to find the max amount of entries that any given patient has and add that as columns. For all patients that have less than that max amount, we have to fill in empty values for their rows.

To me this seems a bit messy and will likely result in very wide tables for the CSV. Wondering if we should instead have individual CSV files per step per patient and write it under their folder (Ex. patients/C00004/interview.csv). The main benefit is that each field is a row instead of a column. What do you think?

Took a hybrid approach:

shared csv file for all patients for a certain step that is not a field group

one csv per patient for a singular field group step for anyone who has that step

mattwalo32 · 2025-09-14T13:43:36Z

apps/backend/scripts/dataextraction.ts

+  if (type === FieldType.DATE) return new Date(value).toISOString();
+  if (type === FieldType.MULTI_SELECT || type === FieldType.TAGS) {
+    return Array.isArray(value) ? value.join(', ') : value;
+  }


The MULTI_SELECT, TAGS, and RADIO_BUTTON fields are not being formatted properly.

For example, take a look at the __root_patient_storage.csv. Notice how it's all OIDs stored there, not readable values. This is because the value stored at these types of fields isn't a humnan-readable string, it's an ID. The reason why we do this is because:

The display text options might change. For example, if we have a tag field with two options "Jordan" or "gaza", maybe someone might go in and change one of the options from "gaza" to "Gaza". Using an ID to reference this will update all of the values on the frontend at once.

We support both english and arabic so we can't just store english text at these values. Instead the ID is used to lookup the object that contains both the english and arabic translations.

The text values for the options are stored on the field itself, so you can do the following:

Change the second param of formatField to be Field instead of FieldType.

If the field is one of these types, iterate through the field.options and find the options where options._id === value.

For all of the matching options (since multi select and tag can have multiple), return the option.Question.EN and join with a comma

fixed it for the most part but for survey.csv only some OIDs converted to values while others stayed

Convert to ESM

ElanaC · 2025-09-17T05:50:40Z

Be sure to run pnpm run lint from the root of the repo

Throwing an error when it hits backend. Backend turned to ESM but the .eslintrc.js is still using commonJS syntax
"
module.exports = {
root: true,
extends: ["3dp4me/web"]
}
"
Should this file be changed to continue to be commonJS syntax or changed to ESM?

mattwalo32 · 2025-09-17T21:28:58Z

Be sure to run pnpm run lint from the root of the repo

Throwing an error when it hits backend. Backend turned to ESM but the .eslintrc.js is still using commonJS syntax " module.exports = { root: true, extends: ["3dp4me/web"] } " Should this file be changed to continue to be commonJS syntax or changed to ESM?

You were right in standup, you can just rename to eslintrc.cjs

ElanaC added 3 commits August 20, 2025 17:16

this is all the updates for having the patients.csv completed, so the…

8ddea7f

… projects main Step1 is completed ONLY, the other two backend steps are in progress

backend work for exporting complete

411c79e

backend data export implementations completed

9646dff

mattwalo32 reviewed Sep 9, 2025

View reviewed changes

ElanaC and others added 7 commits September 10, 2025 04:51

fixed up backend and tried out frontend, putting button on navbar to …

436a18e

…test

fixed up backend and tried out frontend, putting button on navbar to …

96ae173

…test

fixed backend with comments, tried frontend (temporarily threw it in …

3c7953a

…navbar)

Add frontend export functionality

2abbbfa

export button (works?)

4fe722b

convert to using webpack

c4ac34a

add separate prod and dev configs

45ce81e

mattwalo32 reviewed Sep 13, 2025

View reviewed changes

mattwalo32 reviewed Sep 14, 2025

View reviewed changes

Merge pull request #1 from ElanaC/convert-to-esm

34fc37d

Convert to ESM

mattwalo32 mentioned this pull request Sep 21, 2025

export button #653

Closed

Backend implementation for exporting all data #652

Are you sure you want to change the base?

Backend implementation for exporting all data #652

Uh oh!

Conversation

ElanaC commented Sep 6, 2025

Uh oh!

changeset-bot bot commented Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

mattwalo32 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mattwalo32 commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

changeset-bot bot commented Sep 6, 2025 •

edited

Loading

mattwalo32 commented Sep 9, 2025 •

edited

Loading