Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

examples: kymo micron to bp conversion #691

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/examples/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,4 @@ For all of the examples, it is assumed that the following lines precede any othe
cas9_kymotracking/cas9_kymotracking
hairpin_fitting/hairpin_unfolding
bead_coupling/coupling
kymo_markers/kymo_markers
3 changes: 3 additions & 0 deletions docs/examples/kymo_markers/green_profile.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
3 changes: 3 additions & 0 deletions docs/examples/kymo_markers/kymo_complete.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
206 changes: 206 additions & 0 deletions docs/examples/kymo_markers/kymo_markers.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,206 @@

Kymograph with markers
======================

.. only:: html

:nbexport:`Download this page as a Jupyter notebook <self>`

.. _kymo_markers:

Convert kymograph coordinates to basepairs using markers
--------------------------------------------------------

In this notebook we use two red markers with known coordinates on a kymograph to convert coordinates from micrometers to base pair.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In this notebook we use two red markers with known coordinates on a kymograph to convert coordinates from micrometers to base pair.
In this notebook we use two red markers with known coordinates on a kymograph to convert coordinates from micrometers to base pairs.

The workflow is as follows:

- Determine the location of the red markers using a peak detection algorithm on the red channel
- Use the known coordinates of the markers to convert coordinates to basepairs

We then compare the binding profile, with base pair coordinates, to the expected target sites for the protein.

This Notebook requires the `peakutils` package. Run the following cell to make sure it is installed:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Should probably give instructions for pip vs conda. Ideally, we should offer peakfinding in Pylake itself. The algorithms are already there, just no public API.

I think for now, this is fine, but it'd be great if people wouldn't have to install stuff to use the notebook. What do you think?


Download the data file
----------------------

The kymograph is stored on zenodo.org.
We can download the data directly from Zenodo using the function :func:`~lumicks.pylake.download_from_doi`.
The data will be stored in the folder called `"test_data"`::

filenames = lk.download_from_doi("10.5281/zenodo.7729525", "test_data")

Plot the kymograph
------------------

Before starting the analysis on the high frequency data, let's look at the complete kymograph::

file = lk.File("test_data/kymo.h5")
kymo = file.kymos["16"]
plt.figure()
kymo.plot("rgb", adjustment=lk.ColorAdjustment(0, 98, mode="percentile"))

.. image:: kymo_complete.png

The above image shows the full kymograph.

Markers and target sites
------------------------

Provide the binding coordinates of the red markers and the two expected target sites for the gren protein, on lambda DNA::

marker1 = 33.786 # The location of the markers in kbp
marker2 = 44.826
Comment on lines +52 to +53
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
marker1 = 33.786 # The location of the markers in kbp
marker2 = 44.826
marker1_kbp = 33.786 # The location of the markers in kbp
marker2_kbp = 44.826

dna_kbp = 48.502 # length of DNA in kbp
target_sites = [18.084, 30.533] # Target binding sites
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, why'd you choose to put the sites in an array and the markers in separate floats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I had a file with 10 target sites. It does make sense to add the markers to an array too.



Detect markers on kymograph
---------------------------

First, indicate the location of the bead edges by looking at the image of the kymograph. These coordinates are used to crop the kymograph::

top_bead_edge = 9 # Rough location of bead edges in um
bottom_bead_edge = 24.56

The following analysis requires the force to be constant, therefore we only choose the time interval from 10 seconds to 35 seconds.
Compute the average red profile for the selected area of the kymograph::

kymo_calibration = kymo["10s":"35s"].crop_by_distance(top_bead_edge, bottom_bead_edge)
profile_red = np.mean(kymo_calibration.get_image("red"), axis=1)/np.max(np.mean(kymo_calibration.get_image('red'), axis=1))
x = np.arange(len(profile_red))*kymo_calibration.pixelsize_um
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that this is all about coordinate transformations, I would really consider adding units (of course if you accept this change you would have to make other changes too), but it makes things less error prone.

Suggested change
x = np.arange(len(profile_red))*kymo_calibration.pixelsize_um
x_um = np.arange(len(profile_red)) * kymo_calibration.pixelsize_um


plt.figure()
plt.plot(x,profile_red, 'r')
plt.title("Red profile")
plt.xlabel("Position (um)")
plt.ylabel("Normalized intensity (a.u.)")
plt.savefig("profile_red.png")

.. image:: profile_red.png

The two highest peaks correspond to the locations of the markers. Use a peak finding algorithm to determined the precise location of these peaks::

indexes = peakutils.indexes(profile_red, thres=0.4, min_dist=30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, this is fine, but maybe we should fast track having peakfinding in Pylake natively. Especially considering that we already have a very similar peakfinder in the codebase (just not public).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I was not aware of that. Indeed makes sense to use that. Does that peak finder also fit a Gaussian?

print(indexes)
peaks_x = peakutils.interpolate(x, profile_red, ind=indexes, width = 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by default this uses an uncorrected centroid algorithm, which is probably not the best for subpixel optimization since it can be biased with higher backgrounds.

print(peaks_x)

plt.figure()
plt.plot(x,profile_red, 'r')
plt.vlines(peaks_x, ymin=0, ymax = 1)
plt.title("Identified peaks")
plt.xlabel("Position (um)")
plt.ylabel("Normalized intensity (a.u.)")

.. image:: profile_red_peaks.png

If the above analysis results in less or more than two identified peaks, the values of `thres` or the `min_dist` may have to be adjusted.

Functions for conversion to basepairs and vice versa
----------------------------------------------------

The following functions use the peak coordinates to convert from micron to base pairs, or vice versa.
The functions also check whether kymograph has to be flipped::

def um_to_kbp(coord, maxx, peak1, peak2, coord1 = marker1, coord2 = marker2):
"""Convert coordinates along the kymo in micron to kbp

Parameters
-----------
coord: coordinate in um to be converted
maxx: Max x-coordinate assuming that the coordinates are from one bead ege to the next.
This value is used to determine whether the coordinates have to be flipped
peak1: coordinate of first peak um
peak2: coordinate of second peak in um
coord1: coordinate of first reference dye in kbp
coord2: coordinate of second reference dye in kbp

Typical use: um_to_kbp(coord, maxx = np.max(x), peak1 = peaks_x[0], peak2 = peaks_x[1], coord1 = marker1, coord2 = marker2)
Comment on lines +109 to +119
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting of parameters is a little different. Considering that every parameter is a coord here, I would consider suffixing the unit.

Usually it's name : type and then description.

Suggested change
Parameters
-----------
coord: coordinate in um to be converted
maxx: Max x-coordinate assuming that the coordinates are from one bead ege to the next.
This value is used to determine whether the coordinates have to be flipped
peak1: coordinate of first peak um
peak2: coordinate of second peak in um
coord1: coordinate of first reference dye in kbp
coord2: coordinate of second reference dye in kbp
Typical use: um_to_kbp(coord, maxx = np.max(x), peak1 = peaks_x[0], peak2 = peaks_x[1], coord1 = marker1, coord2 = marker2)
Parameters
----------
coord_um: np.ndarray
Coordinate in um to be converted.
max_coord_um: int
Maximum x-coordinate in um assuming that the coordinates are from one bead edge to the next.
This value is used to determine whether the coordinates have to be flipped.
peak1_um: float
Coordinate of first peak in um
peak2_um: float
Coordinate of second peak in um
coord1_kbp: float
Coordinate of first reference dye in kbp
coord2_kbp: float
Coordinate of second reference dye in kbp
Returns
-------
np.ndarray
x-coordinate converted to kbp
Examples
--------
::
um_to_kbp(...)


returns:
coordinate x converted to kbp
"""
if maxx - peak1 - peak2 < 0:
a = (coord2 - coord1)/(peak2 - peak1)
b = coord1 - a*peak1
c = 0
else: # Flip coordinates if peaks are in the top half of the kymo
a = -(coord2 - coord1)/(peak2 - peak1)
b = coord1 - a*peak1
c = coord2 - coord1
return a*coord + b + c
Comment on lines +124 to +132
Copy link
Member

@JoepVanlier JoepVanlier Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it easier to just flip the coordinates? The resulting code seems a lot simpler to me.

Suggested change
if maxx - peak1 - peak2 < 0:
a = (coord2 - coord1)/(peak2 - peak1)
b = coord1 - a*peak1
c = 0
else: # Flip coordinates if peaks are in the top half of the kymo
a = -(coord2 - coord1)/(peak2 - peak1)
b = coord1 - a*peak1
c = coord2 - coord1
return a*coord + b + c
# Flip coordinates if peaks are in the top half of the kymo
if max_coord_um - peak1_um - peak2_um >= 0:
coord2_kbp, coord1_kbp = coord1_kbp, coord2_kbp
kbp_per_um = (coord2_kbp - coord1_kbp) / (peak2_um - peak1_um)
offset_kbp = coord1_kbp - kbp_per_um * peak1_um
return kbp_per_um * coord_um + offset_kbp

Second question, the criterion tests whether the average of the peaks is over half. What if the peaks end up on either side of the middle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a specific use case where the markers are asymmetrically positioned. The script should indeed be able to handle the symmetric case.



def kbp_to_um(coord_kbp, maxx, peak1, peak2, coord1 = marker1, coord2 = marker2 ):
"""Conver coordinates along the kymo in micron to kbp

Parameters
-----------
coord_kbp: coordinate in kbp to be converted
maxx: Max x-coordinate assuming that the coordinates are from one bead ege to the next.
This value is used to determine whether the coordinates have to be flipped
peak1: coordinate of first peak um
peak2: coordinate of second peak in um
coord1: coordinate of first reference dye in kbp
coord2: coordinate of second reference dye in kbp

returns:
coordinate x converted to kbp
"""
if maxx - peak1 - peak2 < 0:
a = (coord2 - coord1)/(peak2 - peak1)
b = coord1 - a*peak1
c = 0
else: # Flip coordinates if peaks are in the top half of the kymo
a = -(coord2 - coord1)/(peak2 - peak1)
b = coord1 - a*peak1
c = peak2 - peak1
Comment on lines +151 to +158
Copy link
Member

@JoepVanlier JoepVanlier Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is the same as before. Have you considered making a separate function for calculating calibration factors kbp_per_um and offset_kbp and reusing those? Note that c is not needed anymore after the previous comment.

return (coord_kbp - b)/a + c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (coord_kbp - b)/a + c
return (coord_kbp - offset_kbp) / kbp_per_um


Green profile with base pair coordinates
----------------------------------------

Using the identified peaks and the functions above, we can now convert the coordinates on the kymograph to basepairs::

profile = np.mean(kymo_calibration.get_image('green'),axis=1)/np.max(np.mean(kymo_calibration.get_image("green"),axis=1))
um_coords = np.arange(len(profile))*kymo_calibration.pixelsize_um
kbp_coords = um_to_kbp(um_coords, maxx = np.max(x), peak1 = peaks_x[0], peak2 = peaks_x[1], coord1 = marker1, coord2 = marker2)

plt.figure()
plt.plot(kbp_coords, profile, "lightgreen")
for i in target_sites:
plt.vlines(i,ymin=np.min(profile), ymax=1, color = "k")
plt.xlabel("Position (kbp)")
plt.ylabel("Normalized intensity (a.u.)")
plt.title("Profile with target coordinates in kbp")

.. image:: green_profile.png

Kymograph with target sites overlaid
------------------------------------
Comment on lines +180 to +181
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not calibrate the kymograph as well? We do have Kymo.calibrate_to_kbp(length_kbp).

This does make me wonder whether we need an API for two point calibration for kymographs (taking into account the required offset). What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely makes sense. It is a common workflow. Sometimes markers are used, other times the blue fluorescence from the beads.


We can also use the markers to convert the coordinates of the target sites from base pairs to micrometers, and overlay them with the kymograph.
Below, the coordinates of the markers are added as well, as a control::

maxt=kymo_calibration.duration-1
plt.figure(figsize=(8,8))
kymo_calibration.plot(channel='rgb', aspect = 1, adjustment=lk.ColorAdjustment([0], [8]))
for i in target_sites:
plt.hlines(kbp_to_um(i, maxx = np.max(x), peak1 = peaks_x[0], peak2 = peaks_x[1]), xmin = 0, xmax=maxt, color = "yellow", linestyle = "dashed", linewidth = 0.5)
plt.hlines(kbp_to_um(marker1, maxx = np.max(x), peak1 = peaks_x[0], peak2 = peaks_x[1]), xmin = 0, xmax=maxt, color = "white", linestyle = "dashed", linewidth = 0.5)
plt.hlines(kbp_to_um(marker2, maxx = np.max(x), peak1 = peaks_x[0], peak2 = peaks_x[1]), xmin = 0, xmax=maxt, color = "white", linestyle = "dashed", linewidth = 0.5)

.. image:: kymo_target_sites.png

Quantify target binding
-----------------------

The profiles above can reveal overall enrichment on a target binding site.
Uncertainty? How to choose the width around the target site?

Further quantify target binding [1]_
- Track binding events using the Kymotracker, bin them, and determine which % is on/off target
- Compare the duration of binding on- vs off-target

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the target accuracy? Is chromatic aberration a worry here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The chromatic aberration has to be determined by the user though. Do we have official specs for the accuracy?

.. [1] M. D. Newton, DNA stretching induces Cas9 off-target activity, NSMB, 7016-7018 (2019)
3 changes: 3 additions & 0 deletions docs/examples/kymo_markers/kymo_target_sites.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
3 changes: 3 additions & 0 deletions docs/examples/kymo_markers/profile_red.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
3 changes: 3 additions & 0 deletions docs/examples/kymo_markers/profile_red_peaks.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.