-
Notifications
You must be signed in to change notification settings - Fork 1
Backend implementation for exporting all data #652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
… projects main Step1 is completed ONLY, the other two backend steps are in progress
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job, super impressive work with this
import { main } from './test'; | ||
|
||
initDB(main) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove this? Otherwise we'll be running the test every time the server starts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you forget to push this? It still seems to be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now removed
import { downloadFile } from '../src/utils/aws/awsS3Helpers'; | ||
import archiver from 'archiver'; | ||
|
||
mongoose.set('strictQuery', false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the two imports are important, first for step 3 media files and second for everything to be zipped properly. Line 23 isn't as important, just gets rid of a deprecation warning that pops up in the terminal telling me to add it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove the strictQuery line for now?
You're right that we should probably add that in eventually due to the deprecation, but I'd like to do some reading up on the implications of this. Not sure if it will cause breaking change for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
for (const field of stepDef.fields) { | ||
// Check if field should be included based on options | ||
if (!includeHidden && field.isHidden) continue; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check isDeleted
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is still missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
if (!fs.existsSync(patientDir)) fs.mkdirSync(patientDir, { recursive: true }); | ||
if (!fs.existsSync(stepDir)) fs.mkdirSync(stepDir, { recursive: true }); | ||
hasDownloadedFiles = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to do this logic with checking hasDownloadedFiles
and checking fs.existsSync
. You should be able to just call fs.mkdirSync
every single time before downloading the file. If it already exists, since recursive: true
, it won't throw an error.
In my opinion this will make it a bit more readable by reducing nesting and long-lived variables in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unchanged, forgot to push maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
Be sure to run |
import { main } from './test'; | ||
|
||
initDB(main) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you forget to push this? It still seems to be there.
import { downloadFile } from '../src/utils/aws/awsS3Helpers'; | ||
import archiver from 'archiver'; | ||
|
||
mongoose.set('strictQuery', false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove the strictQuery line for now?
You're right that we should probably add that in eventually due to the deprecation, but I'd like to do some reading up on the implications of this. Not sure if it will cause breaking change for us.
for (const field of stepDef.fields) { | ||
// Check if field should be included based on options | ||
if (!includeHidden && field.isHidden) continue; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is still missing
if (!fs.existsSync(patientDir)) fs.mkdirSync(patientDir, { recursive: true }); | ||
if (!fs.existsSync(stepDir)) fs.mkdirSync(stepDir, { recursive: true }); | ||
hasDownloadedFiles = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unchanged, forgot to push maybe?
export const router = express.Router(); | ||
|
||
router.get( | ||
'/download', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now this route is unauthenticated! This means that if this were deployed to production, anyone in the world would be able to download all the patient data from this endpoint.
You can add auth around this by using the requireAdmin
middleware. We already have a few routes that are admin only. See how the requireAdmin
middleware is used throughout the code base, you should just need to add a line right below /download
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requireAdmin added
// Validate ZIP file exists and has content | ||
await fs.promises.access(zipPath).catch(() => { | ||
return res.status(500).send('ZIP file not found'); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some overarching comments on the implementation here:
Regarding the use of fs.promises
, I would generally advise against that. All of these functions have non-promise synchronous versions. For example, you could use fs.accessSync
insteaad of fs.promises.access
. I dislike promsies for the following reasons:
- Less readable in my opinion. Introduces more nesting and indirection
- Unexpected behavior with control flow. For instance, you have
return res.status(500)
inside of the promise catch. block. The issue with this is that thereturn
is scoped to the error handler, not the route handler. So it will actually continue executing the route in the background after the returning 500.
However, ideally, we shouldn't need to do any FS operations here at all. Could you rewrite the runCombinedExport
to return a fs.ReadStream
? This way the result can be streamed directly instead of read from disk first.
It would then look something like this
const zipStream = await runCombinedExport({
includeDeleted,
includeHidden
})
zipStream.pipe(res)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createZipStream() no longer uses disk inputs/outputs
runCombinedExport() returns a ReadStream but still uses disk temporarily
export.ts no longer uses fs.promises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works now
disabled={loading} | ||
startIcon={loading ? <CircularProgress size={20} /> : null} | ||
> | ||
{loading ? 'Exporting...' : 'Export as ZIP'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side note, it would be pretty cool if we could get those logs on the backend that show all the files being download + progress streamed to the frontend. As a user, since this takes a bit, it's hard to tell if it's exporting or if something went wrong.
But don't worry about that in this PR, that's more of a reach goal.
// Step 2: Determine file type from saved file on disk | ||
const detectFileTypeFromFile = async (filePath: string): Promise<string | null> => { | ||
try { | ||
const buffer = fs.readFileSync(filePath, { encoding: null }); // Read first 4KB for type detection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why specify encoding
here? I believe the default is null
if unspecified. However, we can skip this call to readFileSync
entirely by using the fileTypeFromFile
function in file-type
.
This does all the file reading behind the scenes. The benefit to this is that we don't have to read the entire file into memory. It probably only needs to read the first chunk into memory to get the "magic number" in the file header, so this should be more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
return res.data | ||
} | ||
|
||
export const triggerExportDownload = async (includeDeleted: boolean, includeHidden: boolean) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming is a little ambiguous IMO because "triggering an export download" could mean a few things (I.e. Does it mean we're kicking off a download that happens in the background, or kicking off a download and receiving the result now?).
Could you rename to something like downloadAllPatientData
or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
} | ||
|
||
const ExportButton: React.FC<ExportButtonProps> = ({ | ||
buttonText = 'Export Data', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to not make this a prop and just have the button component decide what text it says.
Also can you add a translation for this. It consists of the following steps:
- Open up
apps/frontend/src/translation.json
- Add a new field under the
EN
key with the english text - Add a corresponding one with the arabic text under
AR
- Use the
useTranslations
hook in the component
For example:
{
"EN": {
"exportButtonText": "Export Patient Data",
....
},
"AR": {
"exportButtonText": "ARABIC HERE. I TRIED PASTING ARABIC HERE AND IT CRASHED MY BROWSER :(",
....
}
Then
const translations = useTranslations()[0]
translations.exportButtonText
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
return stepDefinitions; | ||
} | ||
|
||
async function generateStepCSVs(options: ExportOptions = {}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field groups seem to not be working. I added some test data in a table under the interview
step for patient C00004
. The data does not show up in the interview.csv
file.
I'm thinking that we might be better off if each patient gets their own CSV for each step. The field groups don't play nicely with the columns. For example, say we have a field group, "appointments", and it has two subfields, "date", and "notes". If a patient has 3 appointments, we would have 6 columns:
- Appointment 1 - Date
- Appointment 1 - Notes
- Appointment 2 - Date
- Appointment 2 - Notes
- Appointment 3 - Date
- Appointment 3 - Notes
You can see the pattern...
If we have all the patients in a single CSV, that means that for each field group we need to find the max amount of entries that any given patient has and add that as columns. For all patients that have less than that max amount, we have to fill in empty values for their rows.
To me this seems a bit messy and will likely result in very wide tables for the CSV. Wondering if we should instead have individual CSV files per step per patient and write it under their folder (Ex. patients/C00004/interview.csv). The main benefit is that each field is a row instead of a column. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a hybrid approach:
- shared csv file for all patients for a certain step that is not a field group
- one csv per patient for a singular field group step for anyone who has that step
if (type === FieldType.DATE) return new Date(value).toISOString(); | ||
if (type === FieldType.MULTI_SELECT || type === FieldType.TAGS) { | ||
return Array.isArray(value) ? value.join(', ') : value; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MULTI_SELECT
, TAGS
, and RADIO_BUTTON
fields are not being formatted properly.
For example, take a look at the __root_patient_storage.csv
. Notice how it's all OIDs stored there, not readable values. This is because the value stored at these types of fields isn't a humnan-readable string, it's an ID. The reason why we do this is because:
- The display text options might change. For example, if we have a tag field with two options "Jordan" or "gaza", maybe someone might go in and change one of the options from "gaza" to "Gaza". Using an ID to reference this will update all of the values on the frontend at once.
- We support both english and arabic so we can't just store english text at these values. Instead the ID is used to lookup the object that contains both the english and arabic translations.
The text values for the options are stored on the field itself, so you can do the following:
- Change the second param of
formatField
to beField
instead ofFieldType
. - If the field is one of these types, iterate through the
field.options
and find the options whereoptions._id === value
. - For all of the matching options (since multi select and tag can have multiple), return the
option.Question.EN
and join with a comma
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert to ESM
Throwing an error when it hits backend. Backend turned to ESM but the .eslintrc.js is still using commonJS syntax |
You were right in standup, you can just rename to |
Backend implementation for a frontend "Export" button.
Makes a zip file of a patients csv, steps csv, and a patients directory that includes step files that aren't able to be stored in a csv (also makes the folers and csvs separately outside of zip)
doppler run -- pnpm tsx scripts/dataextraction.ts --include-deleted --include-hidden