[python-package] lgb.Dataset(table, feature_name="auto") does not auto-detect feature names of pyarrow table #6780

Closed
mlondschien opened this issue Jan 9, 2025 · 4 comments

@mlondschien
Contributor

Description

According to the docs (https://lightgbm.readthedocs.io/en/v4.5.0/pythonapi/lightgbm.Dataset.html), when feature_name="auto", a lightgbm.Dataset infers feature names from the column names of a pyarrow table. In practice the names do not appear to be detected, so a column cannot be referenced by name in categorical_feature.
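As a rough illustration of the documented behavior, the inference step for feature_name="auto" amounts to reading the table's column names. The sketch below assumes duck typing on a `column_names` attribute (which `pyarrow.Table` provides); the helper `infer_feature_names` is hypothetical and is not LightGBM's own code.

```python
from typing import Any, List, Optional, Sequence, Union


def infer_feature_names(
    data: Any, feature_name: Union[str, Sequence[str]]
) -> Optional[List[str]]:
    """Hypothetical sketch: resolve feature_name="auto" from a table's columns.

    Returns None when nothing can be inferred, in which case LightGBM would
    fall back to its default generated names.
    """
    if feature_name != "auto":
        # Explicitly passed names always take precedence.
        return list(feature_name)
    # pyarrow.Table exposes its column names via the `column_names` attribute.
    column_names = getattr(data, "column_names", None)
    if column_names is not None:
        return list(column_names)
    return None
```

For the table built in the example below, `infer_feature_names(df.to_arrow(), "auto")` would be expected to return `["a", "b", "c"]`, which is the list that categorical_feature=["c"] needs to be matched against.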

Reproducible example

import lightgbm as lgb
import polars as pl
import numpy as np

rng = np.random.default_rng()
df = pl.DataFrame(
    {"a": np.arange(100), "b": rng.normal(size=100), "c": rng.choice([0, 1], size=100)}
)

y = rng.random(100)

# This works. LGBM can detect feature names from a pandas dataframe.
data = lgb.Dataset(
    data=df.to_pandas(),
    label=y,
    feature_name="auto",
    categorical_feature=["c"]
)
model = lgb.train(params={}, train_set=data, num_boost_round=10)

# This works, as I manually specify the feature names.
data = lgb.Dataset(
    data=df.to_arrow(),
    label=y,
    feature_name=df.columns,
    categorical_feature=["c"]
)
model = lgb.train(params={}, train_set=data, num_boost_round=10)

# This fails: the name "c" in categorical_feature is not recognized.
data = lgb.Dataset(
    data=df.to_arrow(),
    label=y,
    feature_name="auto",
    categorical_feature=["c"]
)
model = lgb.train(params={}, train_set=data, num_boost_round=10)

Environment info

LightGBM version or commit hash:

# Name                    Version                   Build  Channel
liblightgbm               4.5.0            cpu_h7ba702d_3    conda-forge
lightgbm                  4.5.0                  cpu_py_3    conda-forge
pyarrow                   18.1.0          py313h39782a4_0    conda-forge
pyarrow-core              18.1.0          py313hf9431ad_0_cpu    conda-forge

Command(s) you used to install LightGBM

conda install lightgbm
@jameslamb
Collaborator

Thanks for using LightGBM and for the report.

Can you be more specific about what "fails" means? Error message, unexpected behavior, etc.?

@jameslamb jameslamb changed the title lgb.Dataset(table, feature_name="auto") does not auto-detect feature names of pyarrow table [python-package] lgb.Dataset(table, feature_name="auto") does not auto-detect feature names of pyarrow table Jan 9, 2025
@mlondschien
Contributor Author

It raises an error:

Traceback (most recent call last):
  File "/Users/mlondschien/code/lgbm.py", line 38, in <module>
    model = lgb.train(params={}, train_set=data, num_boost_round=10)
  File "/Users/mlondschien/mambaforge/envs/lgbm/lib/python3.13/site-packages/lightgbm/engine.py", line 282, in train
    booster = Booster(params=params, train_set=train_set)
  File "/Users/mlondschien/mambaforge/envs/lgbm/lib/python3.13/site-packages/lightgbm/basic.py", line 3637, in __init__
    train_set.construct()
    ~~~~~~~~~~~~~~~~~~~^^
  File "/Users/mlondschien/mambaforge/envs/lgbm/lib/python3.13/site-packages/lightgbm/basic.py", line 2576, in construct
    self._lazy_init(
    ~~~~~~~~~~~~~~~^
        data=self.data,
        ^^^^^^^^^^^^^^^
    ...<9 lines>...
        position=self.position,
        ^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/mlondschien/mambaforge/envs/lgbm/lib/python3.13/site-packages/lightgbm/basic.py", line 2134, in _lazy_init
    raise TypeError(f"Wrong type({type(name).__name__}) or unknown name({name}) in categorical_feature")
TypeError: Wrong type(str) or unknown name(c) in categorical_feature
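The message indicates that the string "c" could not be matched to any known feature name. The following simplified, hypothetical model of such a check (not LightGBM's actual implementation) shows how this error arises when categorical names are resolved while feature_name is still the string "auto" instead of the table's column names:

```python
from typing import Sequence, Set, Union


def resolve_categorical(
    feature_name: Union[str, Sequence[str]],
    categorical_feature: Sequence[Union[int, str]],
) -> Set[int]:
    """Hypothetical sketch: map categorical_feature entries to column indices."""
    # Names can only be looked up once a concrete list of feature names exists;
    # the unresolved string "auto" yields an empty lookup table.
    feature_dict = {}
    if not isinstance(feature_name, str):
        feature_dict = {name: i for i, name in enumerate(feature_name)}
    indices: Set[int] = set()
    for name in categorical_feature:
        if isinstance(name, str) and name in feature_dict:
            indices.add(feature_dict[name])
        elif isinstance(name, int):
            # Integer positions need no name lookup.
            indices.add(name)
        else:
            raise TypeError(
                f"Wrong type({type(name).__name__}) or unknown name({name}) "
                "in categorical_feature"
            )
    return indices


# With explicit names, "c" resolves to index 2.
print(resolve_categorical(["a", "b", "c"], ["c"]))  # {2}

# With feature_name still "auto", the same lookup fails.
try:
    resolve_categorical("auto", ["c"])
except TypeError as err:
    print(err)  # Wrong type(str) or unknown name(c) in categorical_feature
```

Under this model, passing explicit names resolves the lookup, which matches the observation in the report that feature_name=df.columns works.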

@jameslamb jameslamb added the bug label Jan 11, 2025
@jameslamb
Collaborator

Thank you!

That's helpful for us while debugging, and for anyone else who runs into this, since the error message makes the issue findable through search engines.

@mlondschien
Contributor Author

Closed by #6781
