SNOW-1668981: Update is wrong when source is anti-join #2305

@tvdboom

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]

  2. What operating system and processor architecture are you using?

    Windows-10-10.0.22631-SP0

  3. What are the component versions in the environment (pip freeze)?

    pandas==2.2.2
    snowflake-snowpark-python==1.22.1

  4. What did you do?

import pandas as pd
from snowflake.snowpark import Session
from snowflake.snowpark.functions import lit

mock_session = Session.builder.config("local_testing", True).create()
test_data = mock_session.create_dataframe(pd.DataFrame({"a": [1, 2]}))

df1 = mock_session.create_dataframe(pd.DataFrame({"A": [0, 1], "B": ['a', 'b']}))
df2 = mock_session.create_dataframe(pd.DataFrame({"A": [0, 1], "B": ['a', 'c']}))

anti = df1.join(df2, on=["A", "B"], how="anti")
anti.show()  # Has only 1 row

# This update should only update 1 row in df1
result = df1.update(
    assignments={"B": lit("f")},
    condition=(df1["A"] == anti["A"]) & (df1["B"] == anti["B"]),
    source=anti,
)

print(result)  # UpdateResult(rows_updated=2, multi_joined_rows_updated=0)
mock_session.table(df1.table_name).show()  # Two rows are updated

  5. What did you expect to see?

    The updated table should have only 1 updated row: (1, 'b'), the only row of df1 left by the anti-join. Instead, both rows are updated.
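
For reference, the expected result can be reproduced with plain pandas, without Snowpark. This is only a sketch of the intended semantics, not of how Snowpark or the local testing framework implements `update`. The names `merged`, `matched`, `expected` and `rows_updated` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0, 1], "B": ["a", "b"]})
df2 = pd.DataFrame({"A": [0, 1], "B": ["a", "c"]})

# Anti-join: keep the rows of df1 that have no match in df2 on (A, B).
merged = df1.merge(df2, on=["A", "B"], how="left", indicator=True)
anti = merged.loc[merged["_merge"] == "left_only", ["A", "B"]]

# Update: set B to "f" only for the rows of df1 that match a row of `anti`.
# The (A, B) keys are unique here, so the left merge keeps df1's row order.
matched = df1.merge(anti, on=["A", "B"], how="left", indicator=True)["_merge"] == "both"
expected = df1.copy()
expected.loc[matched.to_numpy(), "B"] = "f"
rows_updated = int(matched.sum())

print(anti)
print(expected)
print(rows_updated)
```

Under these semantics only the row with A == 1 changes, so `rows_updated` is 1, while the mock session reports 2.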

Labels

bug (Something isn't working), status-triage_done (Initial triage done, will be further handled by the driver team)
