perf(spark): Resolve drop-partition-columns projection once per writer instead of per row (#18972)
BulkInsertDataInternalWriterHelper#write redid constant work for every
row when hoodie.datasource.write.drop.partition.columns is enabled:
resolving the config flag, instantiating a key generator via
constructor reflection through getPartitionPathCols, recomputing the
partition-column ordinals into a fresh HashSet, and round-tripping the
whole row through toSeq/fromSeq (boxing every column).

The flag is now resolved once in the constructor, and the retained
(non-partition) field ordinals and types are computed once on the
first write(). The lazy initialization keeps the partition-column
resolution unreachable for the bucket-index subclasses, which override
write() and never drop columns, and for tasks that write no rows,
matching the previous reachability exactly. write() copies the
retained fields into a fresh GenericInternalRow, which is
value-identical to the previous toSeq/filter/fromSeq output.
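
The pattern described above can be illustrated with a minimal, self-contained sketch. It is not the Hudi implementation: the class `DropPartitionColumnsSketch`, its nested `Writer`, and the `resolutions` counter are hypothetical names, and rows are modeled as plain `Object[]` arrays rather than Spark `InternalRow` instances. It only shows the shape of the change: the flag is read once in the constructor, the retained ordinals are computed lazily on the first write, and each row is copied into a fresh array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class DropPartitionColumnsSketch {

    /** Hypothetical stand-in for the writer helper; rows are Object[] instead of InternalRow. */
    static final class Writer {
        private final boolean dropPartitionColumns; // resolved once, in the constructor
        private final String[] fieldNames;
        private final Set<String> partitionColumns;
        private int[] retainedOrdinals;             // computed lazily on the first write()
        int resolutions = 0;                        // how many times the ordinals were resolved

        Writer(boolean dropPartitionColumns, String[] fieldNames, Set<String> partitionColumns) {
            this.dropPartitionColumns = dropPartitionColumns;
            this.fieldNames = fieldNames;
            this.partitionColumns = partitionColumns;
        }

        Object[] write(Object[] row) {
            if (!dropPartitionColumns) {
                return row; // nothing to drop; partition columns are never resolved
            }
            if (retainedOrdinals == null) {
                retainedOrdinals = resolveRetainedOrdinals(); // runs at most once per writer
            }
            // Copy only the retained fields into a fresh row.
            Object[] projected = new Object[retainedOrdinals.length];
            for (int i = 0; i < retainedOrdinals.length; i++) {
                projected[i] = row[retainedOrdinals[i]];
            }
            return projected;
        }

        private int[] resolveRetainedOrdinals() {
            resolutions++;
            List<Integer> kept = new ArrayList<>();
            for (int i = 0; i < fieldNames.length; i++) {
                if (!partitionColumns.contains(fieldNames[i])) {
                    kept.add(i);
                }
            }
            return kept.stream().mapToInt(Integer::intValue).toArray();
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer(true, new String[] {"id", "region", "amount"}, Set.of("region"));
        System.out.println(Arrays.toString(writer.write(new Object[] {1, "us", 9.5}))); // [1, 9.5]
        System.out.println(Arrays.toString(writer.write(new Object[] {2, "eu", 3.0}))); // [2, 3.0]
        System.out.println(writer.resolutions);                                         // 1
    }
}
```

Because the ordinals are resolved inside `write()` rather than in the constructor, a writer that never receives a row, or a subclass that overrides `write()`, never triggers the resolution, which mirrors the reachability argument in the message above.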
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BulkInsertDataInternalWriterHelper.java