Skip to content

Conversation

corpoverlords
Copy link
Contributor

@corpoverlords corpoverlords commented Sep 8, 2025

Rationale for this change

Allow STL iterator for non-builtin types that does not have the GetView() method.

What changes are included in this PR?

Fix for the type signatures, tests

Are these changes tested?

Yes

Are there any user-facing changes?

No

Copy link

github-actions bot commented Sep 8, 2025

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@corpoverlords corpoverlords changed the title Fixes for Arrow STL iterator for custom types GH-47532: [C++] Fixes for Arrow STL iterator for custom types Sep 8, 2025
Copy link

github-actions bot commented Sep 8, 2025

⚠️ GitHub issue #47532 has been automatically assigned in GitHub to PR creator.

@corpoverlords
Copy link
Contributor Author

Hi @kou, do you know of someone who can review this?

@kou
Copy link
Member

kou commented Sep 11, 2025

@pitrou @zanmato1984 @bkietz Do you want to review this?

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this. It seems like a solid improvement. I think it needs some minor tweaks to maximize usability


// Use custom accessor to iterate over decoded values across chunks
auto it =
Begin<DictionaryType, DictionaryArray, TestDictionaryValueAccessor>(*chunked_array);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rather than requiring users to provide all of these template arguments we should add ChunkedArray::range<ArrowType, ValueAccessor = DefaultEtc>(). Then we can write

for (int i : chunked_array->range<Int32Type>()) {}

Or

for (int i : chunked_array->range<DictionaryType, TestDictionaryValueAccessor>()) {}

The same member function could be added to Array, in which case, the above could also be used on a dictionary array.

... Actually, using the argument_type trait in util/functional.h it would be possible to infer all template arguments from a lambda value accessor:

auto accessor = [](const DictionaryArray& array, int64_t index) {
  int64_t dict_index = array.GetValueIndex(index);
  const auto& dict = checked_cast<const StringArray&>(*array.dictionary());
  return dict->GetView(dict_index);
};
for (int i : chunked_array->range(accessor)) {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at how to do this for a while, it seems to me that this will require adding the include to stl_iterator.h in chunked_array.h. I think for now we can first fix the existing accessor, and I can add the range function in a later PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, but even if you'd prefer not to modify chunked_array.h... I don't think that the current PR is very usable since it requires so many redundant template arguments. A non member function could still sidestep all the template arguments:

auto accessor = [](const DictionaryArray& array, int64_t index) {
  int64_t dict_index = array.GetValueIndex(index);
  const auto& dict = checked_cast<const StringArray&>(*array.dictionary());
  return dict->GetView(dict_index);
};
for (int i : Iterate(chunked_array, accessor)) {}

I think it'd be sufficient to write:

template <typename ValueAccessor>
auto Iterate(const ChunkedArray& chunked_array, ValueAccessor value_accessor) {
  using arrow::internal::call::traits;
  static_assert(!call_traits::is_overloaded<ValueAccessor>::value,
                "Cannot infer template arguments from overloaded ValueAccessor");

  using ArrayType = call_traits::argument_type<ValueAccessor, 0>;
  return stl::ChunkedArrayRange<ArrayType, ValueAccessor>{&chunked_array, value_accessor};
}

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 15, 2025
@corpoverlords
Copy link
Contributor Author

Hi @bkietz can I get another review? I modified the test so that it has an example with Iterate<> and a range loop.

value_type operator*() const {
assert(array_);
return array_->IsNull(index_) ? value_type{} : array_->GetView(index_);
return array_->IsNull(index_) ? value_type{} : ValueAccessor{}(*array_, index_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep an instance of ValueAccessor as a data member here and elsewhere; otherwise it must be default constructible and empty. It wouldn't (for example) be able to reference a lookup table by reference unless that table was global.


// Use custom accessor to iterate over decoded values across chunks
auto it =
Begin<DictionaryType, DictionaryArray, TestDictionaryValueAccessor>(*chunked_array);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, but even if you'd prefer not to modify chunked_array.h... I don't think that the current PR is very usable since it requires so many redundant template arguments. A non member function could still sidestep all the template arguments:

auto accessor = [](const DictionaryArray& array, int64_t index) {
  int64_t dict_index = array.GetValueIndex(index);
  const auto& dict = checked_cast<const StringArray&>(*array.dictionary());
  return dict->GetView(dict_index);
};
for (int i : Iterate(chunked_array, accessor)) {}

I think it'd be sufficient to write:

template <typename ValueAccessor>
auto Iterate(const ChunkedArray& chunked_array, ValueAccessor value_accessor) {
  using arrow::internal::call::traits;
  static_assert(!call_traits::is_overloaded<ValueAccessor>::value,
                "Cannot infer template arguments from overloaded ValueAccessor");

  using ArrayType = call_traits::argument_type<ValueAccessor, 0>;
  return stl::ChunkedArrayRange<ArrayType, ValueAccessor>{&chunked_array, value_accessor};
}

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Sep 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants