Keyspace: Buggy program behavior if rows have been removed between discovery and replication phase #180

@johannes-gehrs

Describe the bug

Looking at this logic:

        def processNonCounterRow(row: Row, whereClause: String): Unit = {
          if (ttlColumn.equals("None")) {
            val rs = getSourceRow(selectStmtWithTs, whereClause, cassandraConnPerPar, customFormat)
            if (rs.nonEmpty) {
              processRowWithTimestamp(row, whereClause, rs)
            } else {
              val rs = getSourceRow(selectStmtWithTTL, whereClause, cassandraConnPerPar, customFormat)
              if (rs.nonEmpty) {
                processRowWithTTL(row, whereClause, rs)
              }
            }
          }
        }

I think that in the normal case, val rs = getSourceRow(selectStmtWithTs... returns a result.

But if the row was deleted after the row sets to be processed were created in dataReplicationProcess, then rs is empty.

The code then falls back to getSourceRow(selectStmtWithTTL..., i.e. the TTL variant!

However, in our case we do not pass a TTL column.

But look at how selectStmtWithTTL is constructed:

      case s if s.equals("None") => ""...

Because the TTL column is not set, this yields an empty string.
getSourceRow is then called with cls being an empty string.

The check that raises "Column list (cls) cannot be null or empty" then triggers and stops the whole replication process.
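To make the failure path easier to follow, here is a minimal, self-contained model of it. It is only a sketch: ttlColumnList, the simplified getSourceRow signature, the in-memory source map and the returned status strings are all hypothetical stand-ins, not the project's actual code. Only the control flow and the error message are taken from the snippets above.

```scala
object FallbackFailureSketch {
  // Hypothetical stand-in for the TTL column list construction.
  // Only the "None" case is taken from the issue; the other case is a placeholder.
  def ttlColumnList(ttlColumn: String): String = ttlColumn match {
    case s if s.equals("None") => ""
    case s                     => s // placeholder, the real construction is not shown here
  }

  // Hypothetical, simplified getSourceRow: rejects an empty column list,
  // otherwise returns whatever the in-memory "source table" holds for the key.
  def getSourceRow(cls: String, whereClause: String, source: Map[String, String]): Seq[String] = {
    require(cls != null && cls.nonEmpty, "Column list (cls) cannot be null or empty")
    source.get(whereClause).toSeq
  }

  // Mirrors the quoted control flow: timestamp lookup first, TTL lookup as fallback.
  def processNonCounterRow(ttlColumn: String, whereClause: String, source: Map[String, String]): String =
    if (ttlColumn.equals("None")) {
      val rs = getSourceRow("writetime(col)", whereClause, source) // assumed column list
      if (rs.nonEmpty) "processed with timestamp"
      else {
        // The row is gone, so the fallback runs with an empty column list and throws.
        val rsTtl = getSourceRow(ttlColumnList(ttlColumn), whereClause, source)
        if (rsTtl.nonEmpty) "processed with TTL" else "nothing to do"
      }
    } else "not covered by this sketch"
}
```

In this model, a key that is still present in the source map is processed normally, while a key that has disappeared in the meantime makes the call throw instead of being skipped.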

To Reproduce

Not trivial: rows would have to be deleted at just the right time, after discovery and before replication.

Expected behavior

Not very sure, tbh. As far as I can tell, this row could be skipped, because it would show up in the next newDeletesDF.
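One possible shape for that skip is sketched below. It is not a proposed patch: the Outcome type, the lookup function parameter and the "writetime(col)" column list are hypothetical names introduced purely for illustration. The idea is to attempt the TTL lookup only when a TTL column list actually exists, and otherwise to treat the missing row as skipped.

```scala
object SkipDeletedRowSketch {
  // Hypothetical result type, used only to make the outcome observable.
  sealed trait Outcome
  case object WithTimestamp extends Outcome
  case object WithTTL extends Outcome
  case object Skipped extends Outcome

  // lookup is a hypothetical stand-in for getSourceRow: column list in, rows out.
  def processNonCounterRow(ttlColumnList: String, lookup: String => Seq[String]): Outcome =
    if (lookup("writetime(col)").nonEmpty) WithTimestamp // assumed timestamp column list
    else if (ttlColumnList.isEmpty) Skipped              // no TTL column configured: skip the row
    else if (lookup(ttlColumnList).nonEmpty) WithTTL
    else Skipped                                         // row is gone under both lookups
}
```

With this guard, a row deleted between discovery and replication yields Skipped rather than an exception, on the assumption that the deletion is picked up later via newDeletesDF.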

Screenshots
n/a

Additional context
n/a

Labels

bug, duplicate, question
