diff --git a/docs/website/docs/general-usage/incremental-loading.md b/docs/website/docs/general-usage/incremental-loading.md index 293a8a03c8..d2ed9976e2 100644 --- a/docs/website/docs/general-usage/incremental-loading.md +++ b/docs/website/docs/general-usage/incremental-loading.md @@ -120,6 +120,45 @@ If you use the `merge` write disposition, but do not specify merge or primary ke The appended data will be inserted from a staging table in one transaction for most destinations in this case. ::: +Example: Deduplication with Timestamp based sorting + +```py +# Sample data +data = [ + {"id": 1, "metadata_modified": "2024-01-01", "value": "A"}, + {"id": 1, "metadata_modified": "2024-01-02", "value": "B"}, + {"id": 2, "metadata_modified": "2024-01-01", "value": "C"}, + {"id": 2, "metadata_modified": "2024-01-01", "value": "D"}, # Same metadata_modified as above +] + +# Define the resource with dedup_sort configuration +@dlt.resource( + primary_key='id', + write_disposition='merge', + columns={ + "metadata_modified": {"dedup_sort": "desc"} + } +) +def sample_data(): + for item in data: + yield item +``` +Output: +| id | metadata_modified | value | +|-----|------------------|-------| +| 1 | 2024-01-02 | B | +| 2 | 2024-01-01 | C | + +When this resource is executed, the following deduplication rules are applied: + +1. For records with different values in the `dedup_sort` column: + - The record with the highest value is kept when using `desc`. + - For example, between records with id=1, the one with `"metadata_modified"="2024-01-02"` is kept. + +2. For records with identical values in the `dedup_sort` column: + - The first occurrence encountered is kept. + - For example, between records with id=2 and identical `"metadata_modified"="2024-01-01"`, the first record (value="C") is kept. + #### Delete records The `hard_delete` column hint can be used to delete records from the destination dataset. The behavior of the delete mechanism depends on the data type of the column marked with the hint: 1) `bool` type: only `True` leads to a delete—`None` and `False` values are disregarded.