
WIP: [R-package] Add support for specifying training indices in lgb.cv() #3989

Closed
wants to merge 10 commits

Conversation

@julioasotodv (Contributor) commented Feb 16, 2021

fixes #3924

@StrikerRUS changed the title from "Add support for specifying training indices in lgb.cv()" to "[R-package] Add support for specifying training indices in lgb.cv()" Feb 16, 2021
@ghost commented Feb 16, 2021

CLA assistant check
All CLA requirements met.

@julioasotodv (Contributor, Author)

No idea why Azure pipelines are unhappy. It looks like it was an issue with the instances while building the containers... Shall we force rebuild?

@StrikerRUS (Collaborator)

@julioasotodv

It looks like it was an issue with the instances while building the containers... Shall we force rebuild?

Thanks, done!

@jameslamb (Collaborator) left a comment


Thanks for taking the time to contribute! This is a really nice addition.

Could you please add a unit test in the lgb.cv() section, checking that this behavior works as expected? It would be great if you can set up a small time-series cross validation example as the test. https://github.com/microsoft/LightGBM/blob/master/R-package/tests/testthat/test_basic.R#L262

Review comments on R-package/R/lgb.cv.R (outdated, resolved)
@jameslamb (Collaborator)

I changed the PR's description to say "fixes" instead of "as seen in". That way when we merge this, the issue will be closed automatically.

@julioasotodv changed the title from "[R-package] Add support for specifying training indices in lgb.cv()" to "wip: [R-package] Add support for specifying training indices in lgb.cv()" Feb 16, 2021
@julioasotodv changed the title from "wip: [R-package] Add support for specifying training indices in lgb.cv()" to "WIP: [R-package] Add support for specifying training indices in lgb.cv()" Feb 16, 2021
@julioasotodv (Contributor, Author)

Thanks for taking the time to contribute! This is a really nice addition.

Could you please add a unit test in the lgb.cv() section, checking that this behavior works as expected? It would be great if you can set up a small time-series cross validation example as the test. https://github.com/microsoft/LightGBM/blob/master/R-package/tests/testthat/test_basic.R#L262

Sure! I will try to come up with a meaningful test.

@jameslamb (Collaborator)

@julioasotodv are you still interested in contributing this? I'm available to help if you have any questions.

@julioasotodv (Contributor, Author) commented Mar 12, 2021

@jameslamb Indeed! Sorry, I had a couple of busy weeks...

Yes, I was thinking about how to write a test for this, and it doesn't seem straightforward... Since lgb.cv() basically returns the results list, the only approach I could think of is to seed everything (the model through params, and nfold) and check that the result changes with and without train_folds.

But perhaps there is a better way I did not think of...
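For reference, the seeded comparison described above could look roughly like this (a hypothetical sketch: `train_folds` is the argument proposed in this PR, not an existing `lgb.cv()` parameter, and the `record_evals` access pattern is an assumption about the returned booster):

```r
# Hypothetical sketch of the seeded comparison described above.
# `train_folds` is the argument proposed in this PR; everything else is
# seeded/fixed so only the custom training indices can change the result.
library(lightgbm)
library(testthat)

set.seed(42L)
X <- matrix(rnorm(1000L), ncol = 10L)
y <- rnorm(100L)
dtrain <- lgb.Dataset(data = X, label = y)

params <- list(objective = "regression", metric = "l2", seed = 42L)
folds <- list(1:33, 34:66, 67:100)  # fixed validation folds

cv_default <- lgb.cv(params, dtrain, nrounds = 5L, folds = folds)
cv_custom <- lgb.cv(
    params, dtrain, nrounds = 5L, folds = folds
    # proposed argument: train on only part of each fold's complement
  , train_folds = list(34:66, 67:100, 1:33)
)

# with everything else fixed, restricting the training rows
# should change the evaluation results
expect_false(identical(
    cv_default$record_evals$valid$l2$eval
  , cv_custom$record_evals$valid$l2$eval
))
```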

@jameslamb (Collaborator)

No problem, and no rush at all!

Let me know if it's too simplistic, but what about this?

  1. Construct a dataset for a regression task where the relationship between the features and target should be easy for LightGBM to figure out. Like maybe where you have a few features that are all some_constant * y + some_noise()
  2. Construct a second, much larger dataset with the same number of features that is all random noise
  3. Set folds to only slices of the "real" data (so you aren't evaluating on any of the noise data)
  4. Run lgb.cv() twice, once without train_folds and once with train_folds set to only contain indices from the "real data". Make num_iterations and num_leaves fairly small (like num_iterations = 5L, num_leaves = 5L).
  5. Compare model performance between the two cases, evaluated only against the "real" data. The model trained on only the "real data" (where you specified train_folds) should have much better performance than the one trained on mostly noise.
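Following the steps above, a sketch of that test might look something like this (hypothetical: `train_folds` is the argument proposed in this PR, and the `record_evals` access pattern is an assumption about the returned `lgb.CVBooster`):

```r
# Hypothetical sketch of the suggested unit test, in the style of
# R-package/tests/testthat/test_basic.R. `train_folds` is the argument
# proposed in this PR, not an existing lgb.cv() parameter.
library(lightgbm)
library(testthat)

set.seed(708L)
n_real <- 500L
n_noise <- 5000L

# 1. "real" data: each feature is some_constant * y + some_noise()
y_real <- runif(n_real)
X_real <- cbind(
    2.0 * y_real + rnorm(n_real, sd = 0.1)
  , -1.5 * y_real + rnorm(n_real, sd = 0.1)
)

# 2. a much larger block of pure noise with the same number of features
X_noise <- matrix(rnorm(n_noise * 2L), ncol = 2L)
y_noise <- runif(n_noise)

dtrain <- lgb.Dataset(
    data = rbind(X_real, X_noise)
  , label = c(y_real, y_noise)
)

# 3. validation folds drawn only from the "real" rows
folds <- split(seq_len(n_real), rep(1L:5L, length.out = n_real))

params <- list(
    objective = "regression"
  , metric = "l2"
  , num_leaves = 5L
)

# 4. run twice: training on everything vs. only on "real" rows
cv_all <- lgb.cv(params, dtrain, nrounds = 5L, folds = folds)
cv_real <- lgb.cv(
    params, dtrain, nrounds = 5L, folds = folds
  , train_folds = lapply(folds, function(f) setdiff(seq_len(n_real), f))
)

# 5. the model trained only on real data should score much better
best_all <- min(unlist(cv_all$record_evals$valid$l2$eval))
best_real <- min(unlist(cv_real$record_evals$valid$l2$eval))
expect_lt(best_real, best_all)
```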

@jameslamb (Collaborator)

Gently pinging @julioasotodv: is there anything I can do to help you with this?

@jameslamb removed the request for review from Laurae2 May 4, 2021 16:20
@jameslamb (Collaborator) commented Sep 1, 2021

For now, I'm going to close this PR due to lack of response. @julioasotodv, thank you again for your interest in LightGBM and a great issue write-up in #3924! If you have the time and interest to contribute this in the future, we'd welcome a new pull request.

I think providing better support for customization of the cross-validation process (like for cases mentioned in #3924 where you want to do time-based splits) is a valuable contribution to the R package, but not one that will be done by maintainers before the next release (#4310).

I'm going to lock discussion on this PR for now to focus all discussion of this feature back on #3924. Anyone who is interested in contributing this feature (or wants to ask maintainers to give it higher priority and implement it) is welcome to comment on #3924.

@jameslamb closed this Sep 1, 2021
@microsoft locked and limited conversation to collaborators Sep 1, 2021