
[R-package] Examples to tune lightGBM using grid search #4642

Closed
adithirgis opened this issue Oct 2, 2021 · 6 comments

Comments

@adithirgis

Not sure where I could ask this. Are there tutorials / resources for tuning lightGBM using grid search or any other methods in R? I want to tune the hyper parameters in LightGBM using the original package lightGBM in R without using tidymodels. I use this resource for now - https://www.kaggle.com/andrewmvd/lightgbm-in-r. Thank you!

@jameslamb
Collaborator

Thanks for using LightGBM!

We don't have any example documentation of performing grid search specifically in the R package, but you could consult the following:

@adithirgis
Author

adithirgis commented Oct 2, 2021

Thank you for the prompt response. I tried something similar, though I'm not sure if it is elegant.

library(lightgbm)
library(Matrix)
library(MLmetrics)  # for NormalizedGini()
library(ggplot2)    # used for plotting the predictions below

file_shared <- data
train_ind <- sample(seq_len(nrow(file_shared)), size = (nrow(file_shared) * 0.75))
train_x <- as.matrix(file_shared[train_ind, c("hour", "PA_RH", "PA_Temp", "PA_CF_ATM") ])
train_y <- as.matrix(file_shared[train_ind, "BAM" ])
test_x <- as.matrix(file_shared[-train_ind, c("hour", "PA_RH", "PA_Temp", "PA_CF_ATM") ])
test_y <- as.matrix(file_shared[-train_ind, "BAM" ])
dtrain <- lgb.Dataset(train_x, label = train_y)

lgb_grid <- list(objective = "regression",
                metric = "l2", 
                min_sum_hessian_in_leaf = 1,
                feature_fraction = 0.7,
                bagging_fraction = 0.7, # cannot write c(0, 0.5, 0.7)
                bagging_freq = 5,
                min_data = 100,
                max_bin = 50,
                lambda_l1 = 8,
                lambda_l2 = 1.3,
                min_data_in_bin = 100,
                min_gain_to_split = 10,
                min_data_in_leaf = 30,
                is_unbalance = TRUE)


lgb_normalizedgini <- function(preds, dtrain){
  actual <- getinfo(dtrain, "label")
  score  <- NormalizedGini(preds, actual)
  return(list(name = "gini", value = score, higher_better = TRUE))
}

lgb_model_cv <- lgb.cv(params = lgb_grid, data = dtrain, learning_rate = 0.02, num_leaves = 25,
                       num_threads = 2, nrounds = 7000, early_stopping_rounds = 50,
                       eval_freq = 20, eval = lgb_normalizedgini, nfold = 5,
                       stratified = FALSE)  # stratified folds only apply to classification
best_iter <- lgb_model_cv$best_iter

lgb_model <- lgb.train(params = lgb_grid, data = dtrain, learning_rate = 0.02,
                       num_leaves = 25, num_threads = 2, nrounds = best_iter,
                       eval_freq = 20, eval = lgb_normalizedgini)
# note: eval metrics are only reported during training when a validation set
# is supplied via the `valids` argument

# test_x is a matrix, so collect the observed values and predictions in a data frame
results <- data.frame(BAM = as.numeric(test_y))
results$pred_lightgbm <- predict(lgb_model, test_x)

ggplot(results, aes(BAM, pred_lightgbm)) + geom_point() + geom_smooth(method = "lm")
summary(lm(BAM ~ pred_lightgbm, data = results))
mean(abs((results$BAM - results$pred_lightgbm) / results$BAM)) * 100  # MAPE (%)

@jameslamb
Collaborator

Looks like a fine approach to me! By trying different combinations of parameters in the object you've called lgb_grid, you can use the results from lgb.cv() to estimate the expected performance of your model under each set of parameter values.

Anything else we can help with?
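A minimal sketch of that loop, assuming the `dtrain` object from the code above and purely hypothetical candidate values for three parameters, might look like this (it relies on `lgb.cv()` returning a CVBooster with `best_iter` and `best_score` fields):

```r
library(lightgbm)

# Hypothetical grid: every combination of these candidate values is tried
param_grid <- expand.grid(
  num_leaves       = c(15, 25, 50),
  feature_fraction = c(0.5, 0.7, 0.9),
  lambda_l1        = c(0, 8)
)

results <- data.frame()
for (i in seq_len(nrow(param_grid))) {
  params <- list(
    objective        = "regression",
    metric           = "l2",
    num_leaves       = param_grid$num_leaves[i],
    feature_fraction = param_grid$feature_fraction[i],
    lambda_l1        = param_grid$lambda_l1[i]
  )
  cv <- lgb.cv(params = params, data = dtrain, learning_rate = 0.02,
               nrounds = 1000, nfold = 5, early_stopping_rounds = 50,
               verbose = -1)
  # record the CV score and best iteration for this combination
  results <- rbind(results, data.frame(
    param_grid[i, ],
    best_iter  = cv$best_iter,
    best_score = cv$best_score
  ))
}

# parameter combination with the lowest cross-validated l2
results[which.min(results$best_score), ]
```

The winning row's parameters (and its `best_iter` as `nrounds`) can then be passed to `lgb.train()` for the final fit.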

@adithirgis
Author

Thanks again! No, I'll try something out. I will watch this issue in case someone comes up with a method.

@jameslamb
Collaborator

Ok sounds good! We actually try to keep the list of open issues as small as possible (to focus maintainers' attention), so I'm going to close this for now.

If you'd be interested in contributing a vignette on hyperparameter tuning with the {lightgbm} R package in the future, I'd be happy to help with any questions you have on contributing!

Once the 3.3.0 release (#4310) makes it to CRAN, we'll focus on converting the existing R package demos to vignettes (@mayer79 has already started this in #3946), and I think a hyperparameter tuning one would be very valuable!

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023