ProcessingStage input validation fails when the input task is DocumentBatch and the data type is pyarrow #2151

Description

@ayushdg

Describe the bug

The check in validate_input per stage:

        for attr in required_data_attrs:
            if not hasattr(task.data, attr):
                missing_data_attrs.append(attr)

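This check for required attributes on the data object fails for pyarrow tables: their columns are not exposed as attributes, so hasattr returns False for every column name and each one is reported as missing.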
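One possible direction, shown here only as a minimal sketch and not as the project's actual fix, is to compare the required names against the container's column names instead of using hasattr. The helper name find_missing_columns and the TableLike stand-in are hypothetical. The sketch relies on pyarrow.Table exposing `column_names` and pandas.DataFrame exposing `columns`; the stand-in class mimics only the `column_names` property so the example runs without pyarrow installed.

```python
from typing import Any, Iterable, List


def find_missing_columns(data: Any, required: Iterable[str]) -> List[str]:
    """Return the required names that `data` does not provide (hypothetical helper)."""
    # pyarrow.Table lists its column names in `column_names`.
    names = getattr(data, "column_names", None)
    if names is None:
        # pandas.DataFrame lists them in `columns`.
        names = getattr(data, "columns", None)
    if names is None:
        # Unknown container: fall back to the attribute check quoted in the issue.
        return [attr for attr in required if not hasattr(data, attr)]
    available = set(names)
    return [attr for attr in required if attr not in available]


class TableLike:
    """Stand-in that mimics only pyarrow.Table's `column_names` property."""

    def __init__(self, column_names):
        self.column_names = list(column_names)


table = TableLike(["id", "text"])
print(hasattr(table, "text"))  # False: the column is not an attribute
print(find_missing_columns(table, ["text", "language"]))  # ['language']
```

With a real pyarrow.Table the same call should take the `column_names` branch, but that has not been verified against the stage's validate_input code.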

Steps/Code to reproduce bug

Please list minimal steps or code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.

Labels: bug (Something isn't working)