Adding QuPath, ASAP annotation support #1

Open
jlevy44 opened this issue May 5, 2020 · 27 comments

Labels
enhancement New feature or request

Comments

@jlevy44
Collaborator

jlevy44 commented May 5, 2020

It may also be nice to consider other annotation formats to convert from for viewing, perhaps by converting these other annotation formats to a common dictionary format, such as what I've outlined here: https://github.com/jlevy44/PathFlowAI/wiki/5.-Additional-Tips-and-Tricks. Though maybe this issue is better suited for https://github.com/DHMC-EDIT/PathFlow-ImageUtils .
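For concreteness, one possible shape for such a common dictionary is sketched below (purely illustrative; this is not the exact format described on the linked PathFlowAI wiki page):

```python
# Illustrative only: one way a "common dictionary" for annotations could look.
# The keys, labels, and file name are assumptions, not the PathFlowAI format.
annotation = {
    "image": "slide_001.ndpi",  # source whole-slide image
    "regions": [
        {
            "label": "tumor",  # class name for the region
            "vertices": [(10.0, 12.5), (40.0, 15.0), (30.0, 44.0)],  # (x, y) pairs
        },
    ],
}
```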

@asmagen

@sumanthratna
Owner

It looks like QuPath stores annotation objects to a .qpdata file format, and there doesn't seem to be an easy way to read that data directly in a Python script. I think this means we'll need to use the QuPath java library (https://github.com/qupath/qupath/tree/master/qupath-core/src/main/java/qupath/lib/io), and then use Java to Python bindings. The only problem is that I don't think I can return a NumPy array from Java.

Hopefully, I'm incorrect and we can actually just parse the data file as a GeoJSON file. Regardless, @jlevy44, can you send me an example .qpdata file?

It also seems that ASAP is much easier to work with. Both of the following should have example XMLs I can use:

@jlevy44
Collaborator Author

jlevy44 commented May 5, 2020

I can also send you some example scripts for ASAP if need be. We should figure out how to parse hierarchical annotation structures, though, and how to account for them.

Regarding GeoJSON, it's worth a shot; I can dig around for annotations. We'd then need to be able to go from shapely or some other annotation storage format to a numpy mask, as sketched below. Maybe another possibility: https://stackoverflow.com/questions/39095994/fast-conversion-of-java-array-to-numpy-array-py4j
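A minimal sketch of that shapely-to-mask step, assuming shapely and OpenCV are installed (the helper name and mask dimensions are made up for illustration):

```python
# Sketch: rasterize a shapely polygon's exterior into a binary numpy mask
# with cv2.fillPoly. Names and dimensions here are placeholders.
import numpy as np
import cv2
from shapely.geometry import Polygon

def polygon_to_mask(polygon, height, width):
    mask = np.zeros((height, width), dtype=np.uint8)
    pts = np.array(polygon.exterior.coords, dtype=np.int32)  # (x, y) vertex pairs
    cv2.fillPoly(mask, [pts], 1)  # fill the polygon interior with 1s
    return mask

mask = polygon_to_mask(Polygon([(10, 10), (80, 20), (60, 90)]), 100, 100)
```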

This could definitely be something more for PFAI v2.

@sumanthratna
Owner

sumanthratna commented May 5, 2020

The entirety of the annotations parsing code should be in this method:

viewmask/viewmask/utils.py

Lines 97 to 142 in 4219905

```python
def xml_to_contours(xml_tree, contour_drawer):
    """Extract contours from a TCGA XML annotations file.

    Parameters
    ----------
    xml_tree : xml.etree.ElementTree
        The XML tree of the TCGA annotations file.
    contour_drawer : {'napari', 'cv2'}
        If `contour_drawer` is `'napari'` then the contours will be transposed
        over its main diagonal. When `contour_drawer` is `'cv2'`, no changes
        will be made to the returned contours.

    Returns
    -------
    contours : list of numpy.ndarray
        A list of contours, where each contour is a list of coordinates,
        where each coordinate is a list with exactly 2 integers, representing
        the X and Y coordinates, respectively.

    Notes
    -----
    The main diagonal is defined as the line that connects the top-left corner
    and the bottom-right corner.
    """
    if contour_drawer not in ['napari', 'cv2']:
        raise ValueError("contour_drawer must be 'cv2' or 'napari'")
    root = xml_tree.getroot()
    contours = []
    for region in root.iter("Vertices"):
        coords = []
        for vertex in region:
            if contour_drawer == 'napari':
                # convert each coordinate to (y, x) to transpose
                coords.append([
                    float(vertex.get("Y")),
                    float(vertex.get("X"))
                ])
            else:  # contour_drawer == 'cv2'
                coords.append([
                    float(vertex.get("X")),
                    float(vertex.get("Y"))
                ])
        # np.int32 is necessary for cv2.drawContours
        contour = np.array(coords, dtype=np.int32)
        contours.append(contour)
    return contours
```

ASAP support is definitely something I can work on—it'd be cool to autodetect the XML annotation format to use, but I can't say if that's feasible without seeing an example ASAP file.

I'll also probably get rid of the contour_drawer argument and create a boolean option called transpose so that it's more useful for people who use viewmask as a library.
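If that refactor happens, the signature might end up looking roughly like the sketch below (the transpose flag and its default value are assumptions about a possible future API, not the current viewmask one):

```python
# Hypothetical refactor: replace contour_drawer with a boolean transpose flag.
import numpy as np

def xml_to_contours(xml_tree, transpose=False):
    """Extract contours; if transpose is True, return (y, x) instead of (x, y)."""
    contours = []
    for region in xml_tree.getroot().iter("Vertices"):
        coords = [
            [float(v.get("Y")), float(v.get("X"))] if transpose
            else [float(v.get("X")), float(v.get("Y"))]
            for v in region
        ]
        contours.append(np.array(coords, dtype=np.int32))  # int32 for cv2.drawContours
    return contours
```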

@asmagen

asmagen commented May 6, 2020

I'm not sure how to set up and use ASAP, and the pathologists I'm working with are experienced with QuPath, so it would be great if we found a Python-based QuPath solution, or alternatively a QuPath script to run within the program, although that's less ideal.

The advantage of QuPath is the wand tool which makes the annotation much faster and more accurate (you just drag along the tissue parts of interest and it recognizes the most similar pixel groups to expand the annotation in a smarter way).

How do you see the Python-based segmentation mask extraction going? I'll be working with a pathologist on annotating these images starting tomorrow, so it's important that we use the right tool and format.

Thanks

@sumanthratna
Owner

sumanthratna commented May 6, 2020

Thanks for your comment! At the very least, I can write a QuPath script, but as you mentioned, that's less than ideal.

If you already have QuPath installed, would it be possible to load in a random image, draw some annotations, and then send me the .qpdata file? Hopefully, I can take a look at how to parse it and let you know if it's usable before tomorrow.

Can you also let me know what sort of annotations you're working with? If you're using bounding boxes and you need classification labels, you should be able to use the scripts here (make sure to save to a file):

Since I have AP tests next week, you might just need to use a QuPath script for the first week of annotating; after that, I can work on a more convenient solution. Alternatively, you could try using ASAP—adding support for ASAP annotations should really only take ~15 min on my end.

If you ultimately need masks of your annotations, this functionality should be present—the only problem is with parsing QuPath annotations. For an example with TCGA MoNuSeg XML annotations, see:

viewmask/viewmask/cli.py

Lines 157 to 158 in 4219905

```python
tree = ET.parse(annotations)
rendered_annotations = xml_to_image(tree)
```

It should be as easy as just using Python's XML library to create an XML tree from the filename, and then passing this tree to viewmask.utils.xml_to_image.
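For example, a minimal sketch of that flow (the XML path below is just a placeholder for a TCGA-style annotations file):

```python
# Sketch: parse a TCGA-style XML annotations file and render it with viewmask.
import xml.etree.ElementTree as ET
from viewmask.utils import xml_to_image

tree = ET.parse("annotations.xml")         # placeholder path to the XML file
rendered_annotations = xml_to_image(tree)  # rendered mask of the annotations
```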

And for my own reference, this contains a simple ASAP annotations file: computationalpathologygroup/ASAP#167 (comment). It looks like I'll be able to add annotation-source autodetection.

@sumanthratna sumanthratna added the enhancement New feature or request label May 6, 2020
@asmagen

asmagen commented May 6, 2020

Here's a link to download one annotation file generated by QuPath.

Can we clarify the options and the implications of each so I can tease out the best strategy? As I understand it, we can either:

  1. Use QuPath wand-tool annotations, which enable more accurate and faster annotation than the typical boxes etc. (unless we decide that annotation accuracy isn't a major problem); then either use a Python script to generate the segmentation input required for this package, or run QuPath scripts to generate the output and modify it as needed in a Python script to match the desired format.
  2. Use ASAP, but then we're back to the question of annotation accuracy, and of how to deploy it so that collaborators without computational expertise can open and annotate from their machines. If it's complicated to deploy or load, then I would of course prefer not to go there.

Another element to consider is the QC step, where I typically use HistoQC and several simple steps to identify tissue regions and crop the high-resolution NDPI RGB image to a rectangle that fits the tissue, so that the model developed here won't take all the background tiles into account. I can also use the tissue-region mask to set the background pixels to black if needed. I typically save the output of these steps as a PNG image, but I can use any other output format. We then need to consider whether this is done before or after the manual annotation by the pathologist, and whether the PNG format, for example, will work with QuPath or ASAP. It's quite possible that using a PNG instead of a pyramidal NDPI file (or similar) in these annotation programs will be too slow, if it is accepted as an input at all.

I'm not sure what you meant by "If you ultimately need masks of your annotations". I'm trying to prepare the input for this package for a multi-class segmentation task, so I don't mind what format as long as we start from NDPI or PNG.

Here's an example of the annotation types, which, as you can see, aren't rectangles, so I don't think the XML representation will be sufficient. The ideal output would be a mask for each, and potentially the package would process them into a single matrix(?)

Thanks for your help

@sumanthratna
Owner

Thanks for the sample qpdata file. It looks like the python-javaobj library is having some trouble deserializing the file, so I'll probably need to use a Java -> Python binding. I think conversion of qpdata to a readable format is possible, though.

  1. I don't know how important annotation accuracy is for your work, so it seems like this is the best option, although running a QuPath script may be an inconvenience. QuPath seems to be the way to go if you need high accuracy. If you don't want to use a QuPath script because it'll slow down your workflow, you could just not export yet, but still save the QuPath projects. Once I have a qpdata conversion script ready, all you'll need to do is export to qpdata, and then use viewmask to view your annotations!
  2. That's a good point, and I did notice that they don't have macOS builds available for release, which means I can't easily test.
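For reference, the kind of direct deserialization that fails might look roughly like the sketch below (assuming the javaobj-py3 package; the file name is a placeholder):

```python
# Sketch: attempt to deserialize a QuPath .qpdata file directly with python-javaobj.
# In practice this reportedly has trouble with QuPath's serialized object graph.
import javaobj

with open("annotations.qpdata", "rb") as f:  # placeholder path
    qpdata = javaobj.load(f)                 # deserialization may fail here
print(qpdata)
```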

It's quite possible that using a PNG instead of a pyramidal NDPI file (or similar) in these annotation programs will be too slow, if it is accepted as an input at all.

This might not be a problem. When dask/dask-image#136 gets implemented, I'm planning on automatically converting images to pyramidal dask arrays.

I'm not sure what you meant by "If you ultimately need masks of your annotations". I'm trying to prepare the input for this package for a multi-class segmentation task, so I don't mind what format as long as we start from NDPI or PNG.

One of the main functionalities of viewmask is to be able to convert from annotations to a PNG mask. I think the main problem we need to overcome is to be able to parse QuPath data files—once we can do this, I think viewmask will be able to do what you need.

...the annotation types...aren't rectangles so I don't think the xml representation will be sufficient.

I think XML will still work—if enough coordinates are stored, then the annotations will appear rounded. Here's a non-interactive rendering of TCGA-50-5931-01Z-00-DX1, where the annotations are stored in XML:
[Screenshot: non-interactive rendering of TCGA-50-5931-01Z-00-DX1 annotations]

@asmagen

asmagen commented May 6, 2020

Got it, thanks. So I'll proceed with annotating several images using QuPath, and we can communicate about integrating the output with PFAI for the distributed multi-class segmentation U-Net.

@jlevy44
Collaborator Author

jlevy44 commented May 7, 2020

Sounds good, @asmagen. XML to npy should be relatively trivial, as @sumanthratna pointed out. I agree with these steps; let me know how the Singularity install for PFAI goes! :)

@asmagen

asmagen commented May 7, 2020

Perhaps we can ask about the export options/strategy in this QuPath image analysis forum, if you think it's relevant.

@asmagen

asmagen commented May 13, 2020

So what's the status of our issue with QuPath annotations? I'm starting to receive these annotations, so I'd like to run PFAI as soon as possible, given a multi-class export strategy.

Thanks

@sumanthratna
Owner

Unfortunately, I can't work on this until May 19. I think your best option is to continue saving as qpdata, and store all of your qpdata files in the same directory. You should be able to write a Groovy script to do what you want, but it'll take some trial and error. Then, depending on your OS, you may be able to use a glob in your terminal to batch-run a script on all your qpdata files:

I'm not sure if Groovy can easily write XML—you might see better results if you write to a CSV and then modify viewmask a little to do what you want. Your CSV might look something like this:

color,x1,y1,x2,y2,x3,y3...

This certainly isn't the best format to store the data in (a list of coordinates in a CSV sounds like a bad idea, but CSV files are simple and easy to work with).
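If it helps, reading that kind of CSV back into contours could look roughly like this (a sketch only; the file name and the exact row layout are assumptions):

```python
# Sketch: parse rows of the form "color,x1,y1,x2,y2,..." into numpy contours.
import csv
import numpy as np

colors, contours = [], []
with open("annotations.csv", newline="") as f:   # placeholder file name
    for row in csv.reader(f):
        colors.append(row[0])                    # first column: annotation color
        coords = np.array(row[1:], dtype=float)  # remaining columns: x/y pairs
        contours.append(coords.reshape(-1, 2).astype(np.int32))
```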

@jlevy44
Collaborator Author

jlevy44 commented May 14, 2020

We’re happy to help with feature requests, though these will take longer to roll out, as our main priority right now is to quickly patch bugs as we balance our own set of projects and updates.

@jlevy44
Collaborator Author

jlevy44 commented May 14, 2020

We’re also happy to give advice where needed.

@asmagen

asmagen commented May 19, 2020

Great. Since it's part of your plans to integrate with annotation software, I'll wait for your input. Happy to discuss whenever needed!

@asmagen

asmagen commented May 22, 2020

Is it still part of the immediate goals? @sumanthratna

@sumanthratna
Owner

Unfortunately, this has become a long-term goal. I'm not using QuPath in any of my current projects, so this has slowly lost importance on my to-do list.

We definitely realize this is useful functionality to have, but for now, this isn't something that'll really be worked on until PFAI 2. If you come up with a workaround or solution, please let me know!

@asmagen

asmagen commented May 22, 2020

What can I use to annotate WSIs for segmentation tasks that works directly with PFAI? I thought the recommendation above was to perform annotations in QuPath, because you would soon have one or another of the approaches above working, and because QuPath allows for integration with pathologists (who typically work in QuPath) as well as higher annotation accuracy via the smart wand tool. I have already created plenty of annotations with my pathologists, so it's unfortunate.

@sumanthratna
Owner

You'll still be able to use QuPath annotations on your end—you'll just need to use a Groovy script to export the data to a readable format. I believe that if you're using a QuPath project, then you can apply a single script to all your images by using either the QuPath UI or CLI: https://groups.google.com/d/msg/qupath-users/7cdsBsdy4HQ/faFXwPN3BgAJ.

I don't know the specifics of your use-case, but it sounds like all you need is the color and coordinates of annotations, which is possible in QuPath—take a look at the links I've sent in this comment and earlier.

@asmagen

asmagen commented May 22, 2020

Can you clarify what annotation software you're using that works with PFAI directly?

@jlevy44
Collaborator Author

jlevy44 commented May 22, 2020

ASAP should work, as should anything that can be exported to XML.

I would highly recommend searching QuPath's issues for additional ways to export masks; the repository is currently undergoing continual updates:
https://github.com/qupath/qupath/search?q=annotation&type=Issues

@jlevy44
Collaborator Author

jlevy44 commented May 22, 2020

For discussions on PFAI, I would recommend creating an issue in that repository.

@jlevy44
Collaborator Author

jlevy44 commented May 30, 2020

@asmagen What mechanism did you end up using for the QuPath export?

@asmagen

asmagen commented May 30, 2020

This script here

@jlevy44
Collaborator Author

jlevy44 commented May 30, 2020

Thanks for letting us know. Glad that worked.

@petebankhead

I just saw this... regarding QuPath + Python, you might be interested in paquo: https://forum.image.sc/t/paquo-read-write-qupath-projects-from-python/41892 (the forum is also the best place for any QuPath questions / tips that aren't in the latest documentation).

Exporting annotations to GeoJSON (+ reading with Shapely) is how I'd try it in QuPath... GeoJSON export from a menu might be added to QuPath in the future, but there are some unresolved ambiguities regarding how that should be done (where the origin should be, the units of export), so I'd like to find out more about how other software handles this to maximize compatibility. Script export is rather convenient, and it's the only way to do batch export... it also allows export in alternative formats (including binary, labelled images, etc.).

GeoJSON is, however, a nice format with an open standard used by a lot of other software, and it supports complicated regions (including holes and disconnected pieces). As far as I know, ASAP's XML is specific to ASAP (although it looks similar to Aperio's XML for ImageScope) and doesn't support as many shapes.
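A minimal sketch of that GeoJSON-plus-Shapely route, assuming a standard FeatureCollection export (the file name is a placeholder):

```python
# Sketch: load a GeoJSON export and turn each feature into a shapely geometry.
import json
from shapely.geometry import shape

with open("annotations.geojson") as f:   # placeholder path to the exported file
    collection = json.load(f)

geometries = [shape(feature["geometry"]) for feature in collection["features"]]
for geom in geometries:
    print(geom.geom_type, geom.area)     # e.g. polygons with holes are supported
```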

PS. There was a link to the QuPath Google Group above, but it hasn't been active for a long time... command line docs are here.

@hutaohutc

It looks like QuPath stores annotation objects to a .qpdata file format, and there doesn't seem to be an easy way to read that data directly in a Python script. I think this means we'll need to use the QuPath java library (https://github.com/qupath/qupath/tree/master/qupath-core/src/main/java/qupath/lib/io), and then use Java to Python bindings. The only problem is that I don't think I can return a NumPy array from Java.

Hopefully, I'm incorrect and we can actually just parse the data file as a GeoJSON file. Regardless, @jlevy44, can you send me an example .qpdata file?

It also seems that ASAP is much easier to work with. Both of the following should have example XMLs I can use:

Could you please tell me how to convert a mask to an XML file so that I can open it in ASAP? @sumanthratna
