-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-47532: [C++] Fixes for Arrow STL iterator for custom types #47531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
See also: |
83d99d5
to
ed494cd
Compare
|
Hi @kou, do you know of someone who can review this? |
@pitrou @zanmato1984 @bkietz Do you want to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this. It seems like a solid improvement. I think it needs some minor tweaks to maximize usability
|
||
// Use custom accessor to iterate over decoded values across chunks | ||
auto it = | ||
Begin<DictionaryType, DictionaryArray, TestDictionaryValueAccessor>(*chunked_array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rather than requiring users to provide all of these template arguments we should add ChunkedArray::range<ArrowType, ValueAccessor = DefaultEtc>()
. Then we can write
for (int i : chunked_array->range<Int32Type>()) {}
Or
for (int i : chunked_array->range<DictionaryType, TestDictionaryValueAccessor>()) {}
The same member function could be added to Array
, in which case, the above could also be used on a dictionary array.
... Actually, using the argument_type
trait in util/functional.h it would be possible to infer all template arguments from a lambda value accessor:
auto accessor = [](const DictionaryArray& array, int64_t index) {
int64_t dict_index = array.GetValueIndex(index);
const auto& dict = checked_cast<const StringArray&>(*array.dictionary());
return dict->GetView(dict_index);
};
for (int i : chunked_array->range(accessor)) {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at how to do this for a while, it seems to me that this will require adding the include to stl_iterator.h
in chunked_array.h
. I think for now we can first fix the existing accessor, and I can add the range
function in a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, but even if you'd prefer not to modify chunked_array.h
... I don't think that the current PR is very usable since it requires so many redundant template arguments. A non member function could still sidestep all the template arguments:
auto accessor = [](const DictionaryArray& array, int64_t index) {
int64_t dict_index = array.GetValueIndex(index);
const auto& dict = checked_cast<const StringArray&>(*array.dictionary());
return dict->GetView(dict_index);
};
for (int i : Iterate(chunked_array, accessor)) {}
I think it'd be sufficient to write:
template <typename ValueAccessor>
auto Iterate(const ChunkedArray& chunked_array, ValueAccessor value_accessor) {
using arrow::internal::call::traits;
static_assert(!call_traits::is_overloaded<ValueAccessor>::value,
"Cannot infer template arguments from overloaded ValueAccessor");
using ArrayType = call_traits::argument_type<ValueAccessor, 0>;
return stl::ChunkedArrayRange<ArrayType, ValueAccessor>{&chunked_array, value_accessor};
}
ed494cd
to
57e1af9
Compare
Hi @bkietz can I get another review? I modified the test so that it has an example with |
57e1af9
to
24e266c
Compare
value_type operator*() const { | ||
assert(array_); | ||
return array_->IsNull(index_) ? value_type{} : array_->GetView(index_); | ||
return array_->IsNull(index_) ? value_type{} : ValueAccessor{}(*array_, index_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep an instance of ValueAccessor as a data member here and elsewhere; otherwise it must be default constructible and empty. It wouldn't (for example) be able to reference a lookup table by reference unless that table was global.
|
||
// Use custom accessor to iterate over decoded values across chunks | ||
auto it = | ||
Begin<DictionaryType, DictionaryArray, TestDictionaryValueAccessor>(*chunked_array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, but even if you'd prefer not to modify chunked_array.h
... I don't think that the current PR is very usable since it requires so many redundant template arguments. A non member function could still sidestep all the template arguments:
auto accessor = [](const DictionaryArray& array, int64_t index) {
int64_t dict_index = array.GetValueIndex(index);
const auto& dict = checked_cast<const StringArray&>(*array.dictionary());
return dict->GetView(dict_index);
};
for (int i : Iterate(chunked_array, accessor)) {}
I think it'd be sufficient to write:
template <typename ValueAccessor>
auto Iterate(const ChunkedArray& chunked_array, ValueAccessor value_accessor) {
using arrow::internal::call::traits;
static_assert(!call_traits::is_overloaded<ValueAccessor>::value,
"Cannot infer template arguments from overloaded ValueAccessor");
using ArrayType = call_traits::argument_type<ValueAccessor, 0>;
return stl::ChunkedArrayRange<ArrayType, ValueAccessor>{&chunked_array, value_accessor};
}
Rationale for this change
Allow STL iterator for non-builtin types that does not have the
GetView()
method.What changes are included in this PR?
Fix for the type signatures, tests
Are these changes tested?
Yes
Are there any user-facing changes?
No