Feature Request: Enhance dyff (and yq integration) for comparing a list of documents #515

@lucasfcnunes

Description

Currently, when working with Kubernetes manifests or any YAML data structured as a list of records, there is no straightforward one-line command to compare corresponding fields across all records. A common use case is comparing the desired state (.spec.forProvider) with the observed state (.status.atProvider) for multiple resources, which are typically returned together in a single manifest of kind List.

This issue asks for a more ergonomic solution that would allow a simple, non-scripted comparison of document pairs within a stream. That could mean new features in yq and dyff, or a new pattern for combining them.

Motivation

In a GitOps or a declarative configuration workflow, it is crucial to quickly identify and report configuration drift. A controller might be tasked with reconciling a desired state (e.g., a database's requested version) with the actual state of the external resource (e.g., the provider's assigned version).

Comparing these values across multiple resources currently requires a multi-line shell script with a loop. That is awkward for ad-hoc diagnostics and for short CI/CD steps. A one-line command would make drift checks much quicker to write and run for SREs and platform engineers.

General Case: Comparing lists of "Before" and "After" records

Consider a general YAML structure where you have a list of records, and each record contains a "before" and an "after" state.

apiVersion: v1
kind: List
items:
- before:
    key1: valueA
    key2: valueB
  after:
    key1: valueA
    key2: valueC # This key is different
- before:
    key1: valueX
    key2: valueY
  after:
    key1: valueX
    key2: valueY # Identical to the before record

The goal is to produce a diff for each pair of before and after records, one after the other.

Example with Kubernetes Custom Resources

Let's use a more concrete example with Kubernetes Custom Resources, where we want to compare .spec.forProvider against .status.atProvider.

Input Manifest (kubectl get databaseinstance -o yaml):

apiVersion: v1
kind: List
items:
- apiVersion: database.example.org/v1alpha1
  kind: DatabaseInstance
  metadata:
    name: my-database-1-diff
  spec:
    forProvider:
      engineVersion: "14"
      storageGB: 20
  status:
    atProvider:
      engineVersion: "14.7" # Differs from spec.forProvider
      storageGB: 20
- apiVersion: database.example.org/v1alpha1
  kind: DatabaseInstance
  metadata:
    name: my-database-2-no-diff
  spec:
    forProvider:
      engineVersion: "15"
      storageGB: 50
  status:
    atProvider:
      engineVersion: "15"
      storageGB: 50

Current Solution: Functional, but Multi-line

The most reliable approach today is a bash loop. It relies on process substitution, <(...), which plain POSIX sh does not provide.

Command:

kubectl get databaseinstance -o yaml \
  | yq -I=0 -o=json '.items[] | {"kind": .kind, "namespace": .metadata.namespace, "name": .metadata.name, "spec": .spec.forProvider, "status": .status.atProvider}' \
  | while IFS= read -r item; do
    # Each line is one compact JSON object summarising a single resource.
    kind=$(echo "$item" | yq '.kind')
    namespace=$(echo "$item" | yq '.namespace')
    name=$(echo "$item" | yq '.name')
    spec=$(echo "$item" | yq '.spec')
    status=$(echo "$item" | yq '.status')
    echo "--- Comparing '-n=${namespace} ${kind} ${name}' ---"
    # Process substitution hands dyff the two file paths it expects.
    dyff between <(echo "$spec") <(echo "$status")
done

Resulting Diff Output (the namespace prints as null because the example resources have no metadata.namespace):

--- Comparing '-n=null DatabaseInstance my-database-1-diff' ---
     _        __  __
   _| |_   _ / _|/ _|  between /tmp/sh-interp-15d00a1f8faa0072
 / _' | | | | |_| |_       and /tmp/sh-interp-10cf1222f3464435
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned one difference
        |___/

engineVersion
  ± value change
    - 14
    + 14.7

--- Comparing '-n=null DatabaseInstance my-database-2-no-diff' ---
     _        __  __
   _| |_   _ / _|/ _|  between /tmp/sh-interp-529cdc4cd471de1
 / _' | | | | |_| |_       and /tmp/sh-interp-8c59a373ac43b121
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/

The Challenge: A True One-liner

The core problem is that dyff between compares exactly two inputs. yq can emit multiple documents, but it is not a pair-wise stream processor. A command like kubectl get ... | yq '.items[] | .spec.forProvider, .status.atProvider' | dyff between - does not work for two reasons. First, dyff between still needs a second input. Second, even as one of the two inputs, the stream would be a single source holding 4 documents (spec1, status1, spec2, status2) that dyff has no way to split into pairs.

A potential solution would be a new dyff mode, or a clever yq trick, that processes the input stream pair-wise. This would enable a one-line command that is both efficient and readable.
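Until such a mode exists, the pair-wise step can be approximated in the shell. This is a sketch under stated assumptions, not a yq or dyff feature. The helper pairwise is hypothetical, and it assumes the input has one document per line, like the compact JSON that yq -I=0 -o=json emits. It reads the lines two at a time and writes each one to a temporary file. It then runs the given two-argument command, such as dyff between, on the pair.

```shell
# Hypothetical helper (not part of yq or dyff): run a two-argument command
# on each consecutive pair of lines read from stdin.
# Assumes one document per line, e.g. compact JSON from `yq -I=0 -o=json`.
pairwise() {
  while IFS= read -r first && IFS= read -r second; do
    # Write each document of the pair to its own temporary file.
    from=$(mktemp) && to=$(mktemp) || return 1
    printf '%s\n' "$first" > "$from"
    printf '%s\n' "$second" > "$to"
    # Redirect stdin so the command cannot consume the rest of the stream.
    "$@" "$from" "$to" < /dev/null
    rm -f "$from" "$to"
  done
}

# Example usage (bash, for process substitution; list.yaml is a hypothetical
# saved copy of the kubectl output):
# paste -d '\n' \
#   <(yq -I=0 -o=json '.items[].spec.forProvider' list.yaml) \
#   <(yq -I=0 -o=json '.items[].status.atProvider' list.yaml) \
#   | pairwise dyff between
```

In the usage example, paste -d '\n' interleaves the two one-document-per-line streams into spec1, status1, spec2, status2. This assumes both yq queries return the items in the same order. Unlike the loop above, this prints no per-resource header, and an unpaired trailing line is silently dropped.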

This could be a valuable enhancement for both yq and dyff that addresses a common pain point in the Kubernetes and broader YAML ecosystem.
