Skip to content

Conversation

@ascibisz
Copy link
Contributor

@ascibisz ascibisz commented Mar 14, 2024

Worked with Hassan Sohail, a Johns Hopkins University student working with Margaret Johnson's lab.

Ticket

Ticket is here

Details

This PR builds a converter that will take in NERDSS trajectory data and build a .simularium file. The input data is represented as a series bunch of PDB files, each representing a different timestep, and the timestep is reflected in the file name.

For preliminary data to use and details, see this document.

The jupyter notebook file (examples/Tutorial_nerdss.ipynb) contains an example usage of the converter with the 3 test files I pulled in from the above google doc and shortened. The unit tests also use these files.

Intra-molecular bonds are represented as fibers. After discussion, it was decided that inter-molecular bonds would not be included, for efficiency.

@ascibisz ascibisz force-pushed the feature/nerdss-converter branch from e16e856 to dfcbdcb Compare March 14, 2024 21:05
@ascibisz ascibisz force-pushed the feature/nerdss-converter branch from 1da9620 to 9503d5b Compare February 12, 2025 00:34
@ascibisz ascibisz changed the title WIP: NERDSS Converter NERDSS Converter Feb 12, 2025
@ascibisz ascibisz marked this pull request as ready for review February 12, 2025 21:41
@ascibisz ascibisz requested a review from a team as a code owner February 12, 2025 21:41
@ascibisz ascibisz requested review from blairlyons and toloudis and removed request for a team February 12, 2025 21:41
@ascibisz ascibisz linked an issue Feb 12, 2025 that may be closed by this pull request
@ascibisz ascibisz requested a review from interim17 February 26, 2025 19:51
Copy link

@interim17 interim17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks good to me, just left a couple non blocking questions!

Comment on lines +54 to +62
# we are assuming the time is the file name
universe = Universe(
os.path.join(
input_data.path_to_pdb_files, time_steps[time_index] + ".pdb"
)
)
atoms = universe.atoms
temp_n_agents = len(atoms)
agent_data.positions[time_index][0:temp_n_agents] = universe.atoms.positions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How reliable is this assumption that the pdb files have the time stamps as their name? I get from the documentation you provided that we are making this assumption, but I still wonder if it's at all unreliable and whether there should be some validation/error handling if we encounter file structures or naming different than we expect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I should make sure that all of the file names are numbers at least!

spatial_units=input_data.spatial_units,
plots=input_data.plots,
)
# add the fiber agents, representing bonds, into the trajectory

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this comment line here? It seems like the fiber agents are added during _read_pdb_files on ln153?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops! good catch, I shouldve deleted that

Copy link
Contributor

@toloudis toloudis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions about the choice of representation but the code LGTM

time_steps = []
for file in file_list:
u = Universe(os.path.join(input_data.path_to_pdb_files, file))
n_agents = len(u.atoms.positions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one agent per atom position? just curious, why not treat them as standalone pdb agents? is this just to be able to draw the "fiber" bonds?

It looks to me like the pdbs are going to be actually different on every time step (is that true?). If so, it's going to result in a whole lot of agents either way. Maybe then it's actually better to use spheres like this, rather than cache entire independent pdbs in the browser.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ascibisz can correct me, but I think these PDB files are using the format differently than standard molecular PDBs, since they're representing data coming out of NERDSS and each particle represents a large part of a molecule

timestep
][agent_index]
agent_data.n_subpoints[timestep][agent_index + n_atoms] = (
VALUES_PER_3D_POINT * 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

knowing that all the fibers have only 2 points, it would be interesting if we could try to instance them even more in the viewer. I'm curious to know if these nerdss trajectories really stress the viewer's ability to render them. Also I would consider using a cylinder mesh here to get better instancing. (basically all fibers use the same cylinder as mesh instead of fiber, no subpoints, and you just have to give it the right position and rotation angle...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could even add cylinder as a primitive in the viewer

"metadata": {},
"source": [
"# _Note:_\n",
"To install simulariumio with all depencies needed for PDB, use `pip install simulariumio`"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't seem like there are additional dependencies for NERDSS, so no need for this

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh it needs MDAnalysis. Even though could use [md], probably better to add a nerdss dependency group with the same dependency, for clarity to users

time_steps = []
for file in file_list:
u = Universe(os.path.join(input_data.path_to_pdb_files, file))
n_agents = len(u.atoms.positions)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ascibisz can correct me, but I think these PDB files are using the format differently than standard molecular PDBs, since they're representing data coming out of NERDSS and each particle represents a large part of a molecule

@ascibisz ascibisz merged commit ff85180 into main Mar 12, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Convert NERDSS data

5 participants