I'm encountering an issue when using XGBClassifier for a lane number classification task. My target labels are lane numbers with values [3, 4, 5], but despite setting num_class=6 (to potentially cover six classes), the classifier fails during training with the following error:
ValueError: Invalid classes inferred from unique values of `y`. Expected: [0 1 2], got [3 4 5]
It appears that the classifier is inferring only three classes but then expects them to be numbered [0, 1, 2] rather than [3, 4, 5].
Steps to Reproduce:
import xgboost as xgb
import pandas as pd
from sklearn.model_selection import train_test_split

# Assume df_original_xgboost is a DataFrame with our data,
# where 'target_column' (e.g., "lane_number_smoothed") contains lane numbers [3, 4, 5],
# and numerical_features and categorical_features are defined appropriately.

# Example definitions:
numerical_features = ['speed', 'Distance_vehicle_front', 'Distance_vehicle_front_left']
categorical_features = []  # or your actual categorical features list
target_column = 'lane_number_smoothed'

# Example DataFrame creation for demonstration (replace with your actual data):
data = {
    'speed': [50, 60, 55, 70, 65, 80],
    'Distance_vehicle_front': [10, 12, 11, 13, 12, 14],
    'Distance_vehicle_front_left': [8, 9, 8, 10, 9, 10],
    target_column: [3, 4, 5, 3, 4, 5]
}
df_original_xgboost = pd.DataFrame(data)

# Convert categorical columns to 'category' type, if any
for feature in categorical_features:
    df_original_xgboost[feature] = df_original_xgboost[feature].astype('category')

# Prepare features and target variable
X = df_original_xgboost[numerical_features + categorical_features]
y = df_original_xgboost[target_column]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Print unique labels in y_train (should be [3, 4, 5])
print("Unique labels in training set:", sorted(y_train.unique()))

# Initialize XGBClassifier with num_class=6
model = xgb.XGBClassifier(random_state=42, num_class=6, enable_categorical=True, device="cuda")

# Attempt to train the model (this raises the ValueError)
model.fit(X_train, y_train)
Observed Behavior:
When training the model, I get the following error:
ValueError: Invalid classes inferred from unique values of `y`. Expected: [0 1 2], got [3 4 5]
Expected Behavior:
I would expect one of the following:
XGBClassifier automatically remaps non-zero-based labels (like lane numbers [3, 4, 5]) to a contiguous range starting at 0, or
The classifier provides a clear configuration or parameter to allow the use of non-zero-based labels.
Currently, the workaround is to manually remap the target labels (e.g., mapping {3: 0, 4: 1, 5: 2}) before training, but I would like to know if this behavior is intended or if a fix is planned to support non-zero-based labels directly.
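For reference, a minimal sketch of that manual remapping (the variable names here are illustrative, not from my actual pipeline). It builds a forward map to zero-based codes for training and an inverse map to translate predictions back to lane numbers:

```python
# Sketch of the manual workaround: remap labels to a zero-based contiguous range.
labels = [3, 4, 5, 3, 4, 5]                      # lane numbers, as in the report

classes = sorted(set(labels))                    # [3, 4, 5]
to_code = {c: i for i, c in enumerate(classes)}  # {3: 0, 4: 1, 5: 2}
from_code = {i: c for c, i in to_code.items()}   # {0: 3, 1: 4, 2: 5}

encoded = [to_code[v] for v in labels]           # [0, 1, 2, 0, 1, 2] -> valid for XGBClassifier
decoded = [from_code[v] for v in encoded]        # round-trips to the original lane numbers

print(encoded)
print(decoded == labels)
```

Training on `encoded` instead of the raw lane numbers avoids the ValueError, and `from_code` restores the original labels after `model.predict(...)`.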
Environment:
XGBoost version: 2.1.4 (or later)
Python version: 3.11
Device: CUDA enabled
Hi, the behavior is expected. XGBoost requires encoded (zero-based, contiguous) labels as input. There's an ongoing discussion about whether the classifier should automatically pass labels through sklearn's LabelEncoder before training. However, we haven't decided yet: XGBoost has many interfaces, including distributed ones, and we think the user should decide how to encode the labels. For instance, there are multiple label-encoder implementations, including the ones from sklearn and from cuml for GPU, and they need to handle various input types, including dataframes and arrays.
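As a concrete illustration of the LabelEncoder approach mentioned above (a sketch, not an official recommendation; the commented `model` calls assume an already-constructed XGBClassifier):

```python
from sklearn.preprocessing import LabelEncoder
import numpy as np

lanes = np.array([3, 4, 5, 3, 4, 5])   # raw lane-number labels

le = LabelEncoder()
y_encoded = le.fit_transform(lanes)    # zero-based codes: [0, 1, 2, 0, 1, 2]

# Train on the encoded labels, e.g.:
#   model.fit(X_train, y_encoded)
# and map predictions back to lane numbers afterwards:
#   pred_lanes = le.inverse_transform(model.predict(X_test))

print(y_encoded.tolist())
print(le.inverse_transform(y_encoded).tolist())
```

`inverse_transform` recovers the original lane numbers, so downstream code never sees the internal zero-based codes.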