Skip to content

Commit 98620f5

Browse files
authored
GH-48134: [C++] Make StructArray::field() thread-safe (#48128)
### Rationale for this change Before this change it was possible for two threads calling `field()` with the same index at the same time to cause a race on the stored entry in `boxed_fields_`. I.e. if a second thread goes into the path that calls `MakeArray` before the first thread stored its own new array, the second thread would also write to the same `shared_ptr` and invalidate the `shared_ptr` from the first thread, thereby also invalidating the returned reference. ### What changes are included in this PR? This PR changes the return type of `StructArray::field()` from `shared_ptr<Array>&` to `shared_ptr<Array>` giving the caller co-ownership of the data and safeguarding against any potential concurrent writes to the underlying `boxed_fields_` vector. It also changes the body to use the CAS pattern to avoid multiple concurrent writes to the same address. ### Are these changes tested? I don't know how to write a deterministic test that triggers the issue before the fix. Even a non-deterministic test needs to run with address sanitizer or valgrind or something similar. I can however confirm that this change fixes an issue that I've been debugging in https://github.com/tenzir/tenzir. ### Are there any user-facing changes? While changing `StructArray::field()` to return by value is an API change, I believe this should be compatible with regular uses of that function. * GitHub Issue: #48134 Authored-by: Tobias Mayer <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
1 parent 83a5a61 commit 98620f5

File tree

2 files changed

+18
-13
lines changed

2 files changed

+18
-13
lines changed

cpp/src/arrow/array/array_nested.cc

Lines changed: 17 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1076,20 +1076,25 @@ const ArrayVector& StructArray::fields() const {
10761076
return boxed_fields_;
10771077
}
10781078

1079-
const std::shared_ptr<Array>& StructArray::field(int i) const {
1079+
std::shared_ptr<Array> StructArray::field(int i) const {
10801080
std::shared_ptr<Array> result = std::atomic_load(&boxed_fields_[i]);
1081-
if (!result) {
1082-
std::shared_ptr<ArrayData> field_data;
1083-
if (data_->offset != 0 || data_->child_data[i]->length != data_->length) {
1084-
field_data = data_->child_data[i]->Slice(data_->offset, data_->length);
1085-
} else {
1086-
field_data = data_->child_data[i];
1087-
}
1088-
result = MakeArray(field_data);
1089-
std::atomic_store(&boxed_fields_[i], std::move(result));
1090-
return boxed_fields_[i];
1081+
if (result) {
1082+
return result;
10911083
}
1092-
return boxed_fields_[i];
1084+
std::shared_ptr<ArrayData> field_data;
1085+
if (data_->offset != 0 || data_->child_data[i]->length != data_->length) {
1086+
field_data = data_->child_data[i]->Slice(data_->offset, data_->length);
1087+
} else {
1088+
field_data = data_->child_data[i];
1089+
}
1090+
result = MakeArray(field_data);
1091+
// Check if some other thread inserted the array in the meantime and return
1092+
// that in that case.
1093+
std::shared_ptr<Array> expected = nullptr;
1094+
if (!std::atomic_compare_exchange_strong(&boxed_fields_[i], &expected, result)) {
1095+
result = std::move(expected);
1096+
}
1097+
return result;
10931098
}
10941099

10951100
std::shared_ptr<Array> StructArray::GetFieldByName(const std::string& name) const {

cpp/src/arrow/array/array_nested.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -692,7 +692,7 @@ class ARROW_EXPORT StructArray : public Array {
692692
// Return a shared pointer in case the requestor desires to share ownership
693693
// with this array. The returned array has its offset, length and null
694694
// count adjusted.
695-
const std::shared_ptr<Array>& field(int pos) const;
695+
std::shared_ptr<Array> field(int pos) const;
696696

697697
const ArrayVector& fields() const;
698698

0 commit comments

Comments
 (0)