Re-design support for IDs #53

Currently, we support datasets with privacy IDs through two metrics. For an individual dataframe, IDs are expressed as `IfGroupedBy(id_col, SymmetricDifference())`. However, sometimes we have multiple dataframes with the same ID space. This can happen either at program initialization (if the user provides multiple such dataframes) or over the course of the program (a very common execution pattern is to start with a dictionary of dataframes, apply a transformation to one of them, and add the result back to the dictionary). Multiple dataframes with the same ID space are represented through the `AddRemoveKeys` metric, which pairs with a dictionary of dataframes and identifies which column in each dataframe contains the ID.
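To make the two representations concrete, here is a minimal sketch using toy stand-in classes. These are not the library's real classes: the field names (`column`, `inner_metric`, `df_to_key_column`) and the example dataframe and column names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SymmetricDifference:
    """Toy stand-in for the inner metric; carries no parameters."""


@dataclass(frozen=True)
class IfGroupedBy:
    """Toy stand-in: metric for a single dataframe with an ID column."""
    column: str                       # hypothetical field name
    inner_metric: SymmetricDifference


@dataclass
class AddRemoveKeys:
    """Toy stand-in: metric for a dictionary of dataframes sharing one ID space."""
    df_to_key_column: Dict[str, str]  # hypothetical field name


# One dataframe whose IDs live in the "user_id" column.
single_metric = IfGroupedBy("user_id", SymmetricDifference())

# Two dataframes in one ID space; the ID column may be named differently in each.
shared_metric = AddRemoveKeys({"purchases": "user_id", "visits": "visitor"})
```

Note that only `AddRemoveKeys` says anything about which dataframes belong together; the single-dataframe metric records just the column name.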

This approach has some shortcomings. See [this doc] for a long explanation. In brief: if we have a dictionary of dataframes whose metric is `AddRemoveKeys`, and we want to perform a transformation on one dataframe, we have to pull it out of the dictionary. But when we do, the dataframe's own metric is `IfGroupedBy`. After the transformation, even if the output metric is still `IfGroupedBy`, we can't tell whether it uses the same ID space we started with.

This means that we can only use specially vetted transformations with `AddRemoveKeys`: ones that are guaranteed to preserve the ID space. In practice, this takes the form of wrapping every relevant transformation in a `TransformValue`.
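The loss of information described above can be sketched with the same kind of toy stand-ins. Everything here is hypothetical: `metric_after_extraction` is an illustrative helper, not a library function, and the dataframe names are made up. The point is only that two dataframes pulled from unrelated ID spaces can end up with metrics that compare equal.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SymmetricDifference:
    """Toy stand-in for the inner metric."""


@dataclass(frozen=True)
class IfGroupedBy:
    """Toy stand-in: per-dataframe metric, knows only the ID column name."""
    column: str
    inner_metric: SymmetricDifference


@dataclass
class AddRemoveKeys:
    """Toy stand-in: dictionary-level metric for one shared ID space."""
    df_to_key_column: Dict[str, str]


def metric_after_extraction(metric: AddRemoveKeys, key: str) -> IfGroupedBy:
    """Hypothetical helper: the metric a dataframe has once pulled out of the dictionary."""
    return IfGroupedBy(metric.df_to_key_column[key], SymmetricDifference())


# Two unrelated ID spaces that happen to use the same column name.
patients = AddRemoveKeys({"visits": "id", "prescriptions": "id"})
customers = AddRemoveKeys({"orders": "id"})

from_patients = metric_after_extraction(patients, "visits")
from_customers = metric_after_extraction(customers, "orders")

# The extracted metrics carry no trace of which ID space they came from.
indistinguishable = from_patients == from_customers
```

Because the extracted metrics are indistinguishable, nothing at the metric level can justify putting a transformed dataframe back into its original dictionary, which is why each transformation has to be vetted separately.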

Some problems that I would like to see fixed in a re-design:
- We shouldn't need an entirely separate set of transformations to work with `AddRemoveKeys`.
- We should be able to tell from the output metric whether the result of a transformation shares an ID space with other dataframes.
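As one possible reading of the second goal (not a proposed design), the sketch below attaches a hypothetical `id_space` tag to the per-dataframe metric. The class `TaggedIfGroupedBy`, the tag itself, and the two example transformation helpers are all assumptions made for illustration; none of them exist in the current code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaggedIfGroupedBy:
    """Hypothetical per-dataframe metric that also records its ID space."""
    column: str
    id_space: str  # hypothetical tag identifying the ID space


def shares_id_space(a: TaggedIfGroupedBy, b: TaggedIfGroupedBy) -> bool:
    """Two dataframes can be recombined only if their metrics name the same ID space."""
    return a.id_space == b.id_space


def filter_output_metric(input_metric: TaggedIfGroupedBy) -> TaggedIfGroupedBy:
    """Hypothetical: a row filter keeps both the ID column and the ID space."""
    return input_metric


def rename_output_metric(input_metric: TaggedIfGroupedBy, new_column: str) -> TaggedIfGroupedBy:
    """Hypothetical: renaming the ID column changes the column but not the ID space."""
    return replace(input_metric, column=new_column)


visits = TaggedIfGroupedBy("id", id_space="patients")
orders = TaggedIfGroupedBy("id", id_space="customers")

filtered_visits = filter_output_metric(visits)
renamed_visits = rename_output_metric(visits, "patient_id")
```

Under this assumption, an ordinary transformation would report in its output metric whether the ID space was preserved, so no separate vetted set of transformations would be needed to answer that question.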
