Conversation

ElanaC
Contributor

@ElanaC ElanaC commented Sep 6, 2025

Backend implementation for a frontend "Export" button.
Creates a zip file containing a patients CSV, step CSVs, and a patients directory holding step files that can't be stored in a CSV (the folders and CSVs are also written separately outside the zip).

  • You can choose whether to include hidden and deleted data:
    doppler run -- pnpm tsx scripts/dataextraction.ts --include-deleted --include-hidden
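
For reference, a minimal sketch of how those flags could be read inside the script (the flag names come from the command above; the parsing shown here is illustrative, not necessarily how scripts/dataextraction.ts actually does it):

// Illustrative sketch: read the two CLI flags shown above from process.argv
const args = process.argv.slice(2);
const exportOptions = {
    includeDeleted: args.includes('--include-deleted'),
    includeHidden: args.includes('--include-hidden'),
};
// exportOptions is then passed into the export logic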


changeset-bot bot commented Sep 6, 2025

⚠️ No Changeset found

Latest commit: 34fc37d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Collaborator

@mattwalo32 mattwalo32 left a comment

Nice job, super impressive work with this

import { main } from './test';

initDB(main)

Collaborator

Could you remove this? Otherwise we'll be running the test every time the server starts

Contributor Author

removed

Collaborator

Did you forget to push this? It still seems to be there.

Contributor Author

now removed

import { downloadFile } from '../src/utils/aws/awsS3Helpers';
import archiver from 'archiver';

mongoose.set('strictQuery', false);
Collaborator

Is this needed?

Contributor Author

The two imports are important: the first for step 3 media files and the second for zipping everything properly. Line 23 isn't as important; it just gets rid of a deprecation warning that pops up in the terminal telling me to add it.

Collaborator

Could you remove the strictQuery line for now?

You're right that we should probably add that in eventually due to the deprecation, but I'd like to do some reading up on the implications of this first. Not sure if it will cause a breaking change for us.

for (const field of stepDef.fields) {
// Check if field should be included based on options
if (!includeHidden && field.isHidden) continue;

Collaborator

Need to check isDeleted as well

Contributor Author

yes

Collaborator

Looks like this is still missing

if (!fs.existsSync(patientDir)) fs.mkdirSync(patientDir, { recursive: true });
if (!fs.existsSync(stepDir)) fs.mkdirSync(stepDir, { recursive: true });
hasDownloadedFiles = true;
}
Collaborator

I don't think it's necessary to do this logic of checking hasDownloadedFiles and fs.existsSync. You should be able to just call fs.mkdirSync every time before downloading the file. Since recursive: true is set, it won't throw an error if the directory already exists.

In my opinion this will make it a bit more readable by reducing nesting and long-lived variables in the loop.
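
A minimal sketch of the simplified loop body (variable names are taken from the snippet above; the download step is a placeholder):

// Sketch: mkdirSync with recursive: true does not throw if the directory already exists,
// so no fs.existsSync check or hasDownloadedFiles flag is needed.
fs.mkdirSync(patientDir, { recursive: true });
fs.mkdirSync(stepDir, { recursive: true });
// ...then download the file into stepDir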

Contributor Author

fixed

Collaborator

This is unchanged, forgot to push maybe?

@mattwalo32
Collaborator

mattwalo32 commented Sep 9, 2025

Be sure to run pnpm run lint from the root of the repo

import { main } from './test';

initDB(main)

Collaborator

Did you forget to push this? It still seems to be there.

import { downloadFile } from '../src/utils/aws/awsS3Helpers';
import archiver from 'archiver';

mongoose.set('strictQuery', false);
Collaborator

Could you remove the strictQuery line for now?

You're right that we should probably add that in eventually due to the deprecation, but I'd like to do some reading up on the implications of this. Not sure if it will cause breaking change for us.

for (const field of stepDef.fields) {
// Check if field should be included based on options
if (!includeHidden && field.isHidden) continue;

Collaborator

Looks like this is still missing

if (!fs.existsSync(patientDir)) fs.mkdirSync(patientDir, { recursive: true });
if (!fs.existsSync(stepDir)) fs.mkdirSync(stepDir, { recursive: true });
hasDownloadedFiles = true;
}
Collaborator

This is unchanged, forgot to push maybe?

export const router = express.Router();

router.get(
'/download',
Collaborator

Right now this route is unauthenticated! This means that if this were deployed to production, anyone in the world would be able to download all the patient data from this endpoint.

You can add auth around this by using the requireAdmin middleware. We already have a few routes that are admin only. See how the requireAdmin middleware is used throughout the codebase; you should just need to add a line right below /download.
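
A sketch of what that would look like (assuming requireAdmin is the existing Express middleware mentioned above, imported from wherever the other admin-only routes get it):

router.get(
    '/download',
    requireAdmin,
    async (req, res) => {
        // ...existing export handler body
    },
);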

Contributor Author

requireAdmin added

// Validate ZIP file exists and has content
await fs.promises.access(zipPath).catch(() => {
return res.status(500).send('ZIP file not found');
});
Collaborator

Some overarching comments on the implementation here:

Regarding the use of fs.promises, I would generally advise against that. All of these functions have non-promise synchronous versions. For example, you could use fs.accessSync instead of fs.promises.access. I dislike promises for the following reasons:

  1. Less readable in my opinion. Introduces more nesting and indirection
  2. Unexpected behavior with control flow. For instance, you have return res.status(500) inside of the promise catch block. The issue with this is that the return is scoped to the error handler, not the route handler, so it will actually continue executing the route in the background after returning the 500.

However, ideally, we shouldn't need to do any FS operations here at all. Could you rewrite the runCombinedExport to return a fs.ReadStream? This way the result can be streamed directly instead of read from disk first.

It would then look something like this

const zipStream = await runCombinedExport({
    includeDeleted,
    includeHidden
})

zipStream.pipe(res)
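
For reference, a sketch of one way runCombinedExport could produce a pipeable zip stream without touching disk, using the archiver package the script already imports. This yields a generic readable stream rather than an fs.ReadStream, and the ExportOptions shape and the appended contents here are placeholders, not the actual implementation:

import archiver from 'archiver';
import type { Readable } from 'stream';

type ExportOptions = { includeDeleted?: boolean; includeHidden?: boolean };

// Sketch: build the zip as an in-memory stream instead of writing it to disk first.
export const runCombinedExport = async (options: ExportOptions): Promise<Readable> => {
    const archive = archiver('zip');

    // The real implementation would use `options` and append the generated CSVs
    // and patient files here; plain strings stand in as placeholders for the sketch.
    archive.append('patients csv contents', { name: 'patients.csv' });
    archive.append('steps csv contents', { name: 'steps.csv' });

    // finalize() marks the archive as complete. Don't await it here: the data is
    // only consumed once the caller pipes the stream, e.g. zipStream.pipe(res) above.
    void archive.finalize();
    return archive;
};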

Contributor Author

createZipStream() no longer uses disk inputs/outputs
runCombinedExport() returns a ReadStream but still uses disk temporarily
export.ts no longer uses fs.promises

Contributor Author

works now

disabled={loading}
startIcon={loading ? <CircularProgress size={20} /> : null}
>
{loading ? 'Exporting...' : 'Export as ZIP'}
Collaborator

As a side note, it would be pretty cool if we could get those backend logs that show all the files being downloaded + progress streamed to the frontend. As a user, since this takes a bit, it's hard to tell if it's exporting or if something went wrong.

But don't worry about that in this PR, that's more of a reach goal.

// Step 2: Determine file type from saved file on disk
const detectFileTypeFromFile = async (filePath: string): Promise<string | null> => {
try {
const buffer = fs.readFileSync(filePath, { encoding: null }); // Read first 4KB for type detection
Collaborator

Why specify encoding here? I believe the default is null if unspecified. However, we can skip this call to readFileSync entirely by using the fileTypeFromFile function in file-type.

This does all the file reading behind the scenes. The benefit to this is that we don't have to read the entire file into memory. It probably only needs to read the first chunk into memory to get the "magic number" in the file header, so this should be more efficient.
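
A sketch of the rewrite (assuming a file-type version that exports fileTypeFromFile from the package root; the exact export path can differ between versions):

import { fileTypeFromFile } from 'file-type';

// Sketch: file-type reads only as much of the file as it needs to find the magic number,
// rather than loading the whole file with readFileSync.
const detectFileTypeFromFile = async (filePath: string): Promise<string | null> => {
    const result = await fileTypeFromFile(filePath);
    // Returning the extension here; swap to result?.mime if the original returned MIME types.
    return result?.ext ?? null;
};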

return res.data
}

export const triggerExportDownload = async (includeDeleted: boolean, includeHidden: boolean) => {
Collaborator

Naming is a little ambiguous IMO because "triggering an export download" could mean a few things (i.e., does it mean we're kicking off a download that happens in the background, or kicking off a download and receiving the result now?).

Could you rename to something like downloadAllPatientData or something similar?

}

const ExportButton: React.FC<ExportButtonProps> = ({
buttonText = 'Export Data',
Collaborator

I think it's fine to not make this a prop and just have the button component decide what text it says.

Also, can you add a translation for this? It consists of the following steps:

  1. Open up apps/frontend/src/translation.json
  2. Add a new field under the EN key with the english text
  3. Add a corresponding one with the arabic text under AR
  4. Use the useTranslations hook in the component

For example:

{
    "EN": {
        "exportButtonText": "Export Patient Data",
        ...
    },
    "AR": {
        "exportButtonText": "ARABIC HERE. I TRIED PASTING ARABIC HERE AND IT CRASHED MY BROWSER :(",
        ...
    }
}

Then

 const translations = useTranslations()[0]
 translations.exportButtonText

return stepDefinitions;
}

async function generateStepCSVs(options: ExportOptions = {}) {
Collaborator

The field groups seem to not be working. I added some test data in a table under the interview step for patient C00004. The data does not show up in the interview.csv file.

I'm thinking that we might be better off if each patient gets their own CSV for each step. The field groups don't play nicely with the columns. For example, say we have a field group, "appointments", and it has two subfields, "date", and "notes". If a patient has 3 appointments, we would have 6 columns:

  • Appointment 1 - Date
  • Appointment 1 - Notes
  • Appointment 2 - Date
  • Appointment 2 - Notes
  • Appointment 3 - Date
  • Appointment 3 - Notes

You can see the pattern...

If we have all the patients in a single CSV, that means that for each field group we need to find the max number of entries that any given patient has and add that many columns. For all patients with fewer than that max, we have to fill in empty values in their rows.

To me this seems a bit messy and will likely result in very wide tables for the CSV. Wondering if we should instead have individual CSV files per step per patient and write them under each patient's folder (e.g. patients/C00004/interview.csv). The main benefit is that each field is a row instead of a column. What do you think?

Contributor Author

Took a hybrid approach:

  • a shared CSV file across all patients for each step that is not a field group
  • one CSV per patient for each field-group step, for every patient who has that step

if (type === FieldType.DATE) return new Date(value).toISOString();
if (type === FieldType.MULTI_SELECT || type === FieldType.TAGS) {
return Array.isArray(value) ? value.join(', ') : value;
}
Collaborator

The MULTI_SELECT, TAGS, and RADIO_BUTTON fields are not being formatted properly.

For example, take a look at the __root_patient_storage.csv. Notice how it's all OIDs stored there, not readable values. This is because the value stored at these types of fields isn't a human-readable string; it's an ID. We do this because:

  1. The display text options might change. For example, if we have a tag field with two options "Jordan" or "gaza", maybe someone might go in and change one of the options from "gaza" to "Gaza". Using an ID to reference this will update all of the values on the frontend at once.
  2. We support both english and arabic so we can't just store english text at these values. Instead the ID is used to lookup the object that contains both the english and arabic translations.

The text values for the options are stored on the field itself, so you can do the following:

  1. Change the second param of formatField to be Field instead of FieldType.
  2. If the field is one of these types, iterate through the field.options and find the options where options._id === value.
  3. For all of the matching options (since multi select and tag can have multiple), return the option.Question.EN and join with a comma
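
A sketch of steps 1-3 (the interfaces are minimal stand-ins for the real Field type; the option shape { _id, Question: { EN, AR } } is taken from the description above):

// Minimal shapes assumed for this sketch
interface FieldOption { _id: string; Question: { EN: string; AR: string } }
interface ChoiceField { options: FieldOption[] }

// Map the stored option ID(s) back to their English display text.
const formatChoiceValue = (field: ChoiceField, value: unknown): string => {
    // MULTI_SELECT / TAGS store an array of IDs, RADIO_BUTTON stores a single ID
    const ids = (Array.isArray(value) ? value : [value]).map(String);
    return field.options
        .filter((option) => ids.includes(String(option._id)))
        .map((option) => option.Question.EN)
        .join(', ');
};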

Contributor Author

Fixed it for the most part, but in survey.csv only some OIDs converted to values while others stayed as IDs.
(screenshot of survey.csv attached)

@ElanaC
Contributor Author

ElanaC commented Sep 17, 2025

Be sure to run pnpm run lint from the root of the repo

Throwing an error when it hits the backend. The backend was converted to ESM, but the .eslintrc.js still uses CommonJS syntax:

module.exports = {
  root: true,
  extends: ["3dp4me/web"]
}

Should this file stay in CommonJS syntax or be changed to ESM?

@mattwalo32
Collaborator

Be sure to run pnpm run lint from the root of the repo

Throwing an error when it hits backend. Backend turned to ESM but the .eslintrc.js is still using commonJS syntax " module.exports = { root: true, extends: ["3dp4me/web"] } " Should this file be changed to continue to be commonJS syntax or changed to ESM?

You were right in standup; you can just rename it to eslintrc.cjs.

@mattwalo32 mattwalo32 mentioned this pull request Sep 21, 2025