Description
The crossmatching products can be stored as association catalogs: https://github.com/astronomy-commons/lsdb/blob/ba4dbe6e017633d52b5369911a0da2cf8733e64b/src/lsdb/io/to_association.py#L49-L64
They will be partitioned according to the left catalog of the crossmatch.
Potential issue
LSDB does not repartition association catalogs before writing them to disk, so we can end up with many very small files that could be aggregated. Even if it did, the implementation of joins via AssociationCatalog appears to be bound by nature to the partitioning of the left catalog.
We should monitor the performance of rubin.join(other, through=..., ...) as the survey progresses and the data volume increases. If reading many small, high-order files for association catalogs adds too much overhead, we might need to revisit this and implement repartitioning.
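As a starting point for that monitoring, here is a minimal sketch that summarizes the file sizes of an association catalog on disk. It assumes the catalog has been written to a local directory as Parquet files; the helper name `summarize_parquet_sizes` and the 1 MB "small file" threshold are hypothetical choices for illustration, not part of LSDB.

```python
from pathlib import Path
from statistics import median


def summarize_parquet_sizes(catalog_dir, small_threshold_bytes=1_000_000):
    """Summarize the sizes of all .parquet files under catalog_dir.

    Hypothetical helper: the threshold is an arbitrary cutoff for what
    counts as a "small" file and should be tuned to the storage backend.
    """
    # Recursively collect the size in bytes of every Parquet file.
    sizes = [p.stat().st_size for p in Path(catalog_dir).rglob("*.parquet")]
    if not sizes:
        return {"n_files": 0, "n_small": 0, "total_bytes": 0, "median_bytes": 0}
    return {
        "n_files": len(sizes),
        # Number of files strictly below the threshold.
        "n_small": sum(size < small_threshold_bytes for size in sizes),
        "total_bytes": sum(sizes),
        "median_bytes": median(sizes),
    }
```

Tracking `n_small` and `median_bytes` across data releases would show whether the small-file problem is growing, which could be compared against the measured join times.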