Repartition association catalogs

The crossmatching products can be stored as association catalogs: https://github.com/astronomy-commons/lsdb/blob/ba4dbe6e017633d52b5369911a0da2cf8733e64b/src/lsdb/io/to_association.py#L49-L64

They will be partitioned according to the left catalog of the crossmatch. 

#### Potential issue

LSDB does not repartition association catalogs before writing them to disk, so we can end up with many very small files that could be aggregated. Even if we did repartition them, the implementation of joins via `AssociationCatalog` appears to be inherently bound to the partitioning of the left catalog:

https://github.com/astronomy-commons/lsdb/blob/ba4dbe6e017633d52b5369911a0da2cf8733e64b/src/lsdb/dask/join_catalog_data.py#L389-L398
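To make "aggregating" concrete, here is a minimal sketch of what coarsening small partitions could look like. It is not LSDB code: `coarsen_partitions` and `max_rows` are hypothetical names, the input is assumed to be a non-overlapping set of `(order, pixel)` partitions with row counts, and pixels are assumed to use HEALPix NESTED numbering, where the parent of pixel `p` is `p >> 2`.

```python
from collections import defaultdict


def coarsen_partitions(row_counts, max_rows):
    """Greedily merge sibling pixels into their parent pixel.

    row_counts: dict mapping (order, pixel) -> number of rows (hypothetical input).
    max_rows: largest row count allowed for a merged partition (hypothetical knob).
    Assumes NESTED numbering and non-overlapping input partitions.
    """
    counts = dict(row_counts)
    max_order = max((order for order, _ in counts), default=0)
    # Work from the deepest order upward so merged pixels can be merged again.
    for order in range(max_order, 0, -1):
        # Parents that still contain unmerged deeper pixels must not be created,
        # otherwise the parent would overlap those deeper partitions.
        blocked = {
            pix >> (2 * (o - order + 1)) for (o, pix) in counts if o > order
        }
        groups = defaultdict(list)
        for o, pix in counts:
            if o == order:
                groups[pix >> 2].append(pix)
        for parent, children in groups.items():
            total = sum(counts[(order, child)] for child in children)
            if parent in blocked or total > max_rows:
                continue
            for child in children:
                del counts[(order, child)]
            counts[(order - 1, parent)] = total
    return counts
```

Note that this only describes a target pixel layout. As discussed above, the join through an association catalog seems to follow the left catalog's partitioning, so a coarser layout would only help if the join implementation were adapted as well.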

We should monitor the performance of `rubin.join(other, through=..., ...)` as the survey progresses and the data volume increases. If reading many small, high-order files for association catalogs adds too much overhead, we might need to revisit this and implement repartitioning.
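As a starting point for that monitoring, a rough sketch like the following could report how many small files an association catalog contains on disk. It uses only the standard library and does not rely on any LSDB API. The function name, the 1 MB default threshold, and the assumption that partitions are stored as `*.parquet` files under the catalog directory are illustrative choices, not something taken from LSDB.

```python
from pathlib import Path
from statistics import median


def summarize_parquet_sizes(catalog_dir, small_threshold_bytes=1_000_000):
    """Summarize the sizes of all *.parquet files under catalog_dir.

    small_threshold_bytes is an arbitrary cutoff chosen for illustration.
    """
    sizes = [p.stat().st_size for p in Path(catalog_dir).rglob("*.parquet")]
    if not sizes:
        return {
            "n_files": 0,
            "total_bytes": 0,
            "median_bytes": 0,
            "n_small": 0,
            "small_fraction": 0.0,
        }
    n_small = sum(1 for size in sizes if size < small_threshold_bytes)
    return {
        "n_files": len(sizes),
        "total_bytes": sum(sizes),
        "median_bytes": median(sizes),
        "n_small": n_small,
        "small_fraction": n_small / len(sizes),
    }
```

Tracking `small_fraction` and `median_bytes` across data releases, alongside the wall-clock time of the join, would give some indication of whether small files are becoming a bottleneck.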

Repartition association catalogs #587

Potential issue

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Repartition association catalogs #587

Description

Potential issue

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions