Implementation of XE_NDCG_MART for the ranking task #2620

sbruch · 2019-12-07T01:41:00Z

This Pull Request implements the recently proposed ranking loss (paper at https://arxiv.org/abs/1911.09798), a variant of the cross-entropy loss that is a convex bound on (shifted and scaled) NDCG. Models trained with this loss perform as well as or better than LambdaMART and, more importantly, exhibit higher robustness to noise in the training data.
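
For readers who have not seen the loss, a minimal sketch of a cross-entropy ranking loss over the documents of one query may help. The shape of the target distribution (gain 2^label minus a per-document offset gamma in [0, 1], normalized to sum to one) is my reading of the linked paper and should be treated as an assumption; `XeNdcgLoss` is a hypothetical name, the three input vectors are assumed to have equal length, and this is not the code in src/objective/rank_xendcg_objective.hpp.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Cross-entropy between a label-derived target distribution and the softmax
// of the model scores, for the documents of a single query.
// Assumption: target_i is proportional to 2^label_i - gamma_i, gamma_i in [0, 1].
double XeNdcgLoss(const std::vector<double>& scores,
                  const std::vector<int>& labels,
                  const std::vector<double>& gammas) {
  const std::size_t n = scores.size();
  if (n == 0) return 0.0;

  // Log-sum-exp of the scores, shifted by the max for numerical stability.
  const double max_score = *std::max_element(scores.begin(), scores.end());
  double sum_exp = 0.0;
  for (double s : scores) sum_exp += std::exp(s - max_score);
  const double log_z = max_score + std::log(sum_exp);

  // Unnormalized target mass per document and its total.
  std::vector<double> phi(n);
  double phi_sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    phi[i] = std::pow(2.0, static_cast<double>(labels[i])) - gammas[i];
    phi_sum += phi[i];
  }
  // Degenerate query, e.g. all labels 0 and all gammas 1: nothing to learn.
  if (phi_sum <= 0.0) return 0.0;

  // loss = -sum_i (phi_i / phi_sum) * log softmax(scores)_i
  double loss = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    loss -= (phi[i] / phi_sum) * (scores[i] - log_z);
  }
  return loss;
}
```

With this form, scoring the more relevant document higher lowers the loss, and the gammas are the only source of randomness, which is why seeding comes up later in the review.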

sbruch · 2019-12-07T02:18:32Z

I don't believe the one test case that failed is because of a change in this PR (https://travis-ci.org/microsoft/LightGBM/jobs/621876769?utm_medium=notification&utm_source=github_status)

StrikerRUS · 2019-12-07T21:10:42Z

@sbruch

> I don't believe the one test case that failed is because of a change in this PR

Thank you very much for your PR and sorry about that! It was failing due to temporary server problems. I've re-run it, and now everything is OK.

sbruch · 2019-12-07T22:05:11Z

@StrikerRUS - Thank you!

sbruch · 2019-12-13T00:22:48Z

Hi - I was just wondering if I could get initial feedback on whether you accept PRs (that propose a new algorithm) like this one at all?

StrikerRUS · 2019-12-13T02:02:48Z

@sbruch We are so sorry for the delay! We are short on resources and busy with some bug fixes (bug fixes have higher priority than enhancements). We'll definitely review your PR! And of course we welcome this kind of PR (you may take a look at our feature requests hub or the recently merged PR with the AUC-mu implementation if you have any doubts).

sbruch · 2019-12-13T02:12:34Z

@StrikerRUS Oh no worries at all. Understood. I just wanted to make sure that this sort of PR would be welcome. Please take your time with the review.

StrikerRUS

I'm not a C++ reviewer, but I have noticed some things related to general design:

include/LightGBM/config.h

src/objective/rank_xendcg_objective.hpp

sbruch · 2019-12-24T14:27:53Z

> I'm not a C++ reviewer, but I have noticed some things related to general design:

Thanks for the feedback! I have addressed the issues you raised in a new commit.

sbruch · 2019-12-24T14:47:50Z

@StrikerRUS I'm not sure I understand the continuous-integration failure above. It does not seem to be related to this PR. What do you think?

StrikerRUS

@sbruch Thank you very much for your fast updates!

> I'm not sure I understand the continuous-integration failure above. It does not seem to be related to this PR. What do you think?

Oh, please do not worry! That failure is not related to this PR. It's a random failure that we sometimes observe, exclusively with MinGW, and we have no idea what causes it. Possibly it's related to #1818.

include/LightGBM/config.h

src/objective/rank_xendcg_objective.hpp

sbruch · 2019-12-26T14:30:00Z

> Oh, please do not worry! That failure is not related to this PR. It's a random failure that we sometimes observe, exclusively with MinGW, and we have no idea what causes it. Possibly it's related to #1818.

Thanks! That's good to know.

src/objective/rank_xendcg_objective.hpp

examples/xendcg/README.md

examples/xendcg/predict.conf

include/LightGBM/config.h

tests/python_package_test/test_sklearn.py

src/objective/rank_xendcg_objective.hpp

sbruch · 2020-01-02T16:41:31Z

@StrikerRUS Thanks for the review! Please let me know if you have any further concerns/questions.

sbruch · 2020-01-03T17:56:16Z

@StrikerRUS I was wondering if there are other changes I need to make and/or approvals I need to obtain to merge this PR?

StrikerRUS

@sbruch Awesome job! Thank you very much!

src/objective/rank_xendcg_objective.hpp

StrikerRUS · 2020-01-09T15:28:58Z

src/objective/rank_xendcg_objective.hpp

+    if (config.seed != 0) {
+      rand_ = new Random(config.seed);
+    } else {
+      rand_ = new Random();
+    }


@guolinke Does LightGBM require special treatment for seed == 0? I thought a seed is always required for reproducibility.

sbruch · 2020-01-09

If 'seed' is not set, the expected behavior should be that a seed is selected at random, right? Ideally I'd like to check here whether a seed has been set, though I believe the only way to do that is to compare with 0. Is there a different, preferred way of checking whether seed is set?

StrikerRUS · 2020-01-10

@sbruch

> If 'seed' is not set the expected behavior should be that a seed is selected at random, right?

I don't think so.

When seed is set, it's used to generate other seeds:

LightGBM/src/io/config.cpp

Lines 187 to 195 in f6b8ecf

  // generate seeds by seed.
  if (GetInt(params, "seed", &seed)) {
    Random rand(seed);
    int int_max = std::numeric_limits<int16_t>::max();
    data_random_seed = static_cast<int>(rand.NextShort(0, int_max));
    bagging_seed = static_cast<int>(rand.NextShort(0, int_max));
    drop_seed = static_cast<int>(rand.NextShort(0, int_max));
    feature_fraction_seed = static_cast<int>(rand.NextShort(0, int_max));
  }

Otherwise, the default values of the individual seeds are used. For instance:

LightGBM/include/LightGBM/config.h

Line 270 in f6b8ecf

int bagging_seed = 3;

LightGBM/include/LightGBM/config.h

Line 290 in f6b8ecf

int feature_fraction_seed = 2;

LightGBM/include/LightGBM/config.h

Line 349 in f6b8ecf

int drop_seed = 4;

I guess we need a new param, let's say objective_seed, to be consistent with the existing codebase. I can't find any place where seed is used directly.

cc @guolinke
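
To make the pattern described above concrete, here is a small self-contained sketch: when a master seed is given, each component seed is drawn from a generator seeded with it; otherwise each component keeps its fixed default. It uses `std::mt19937` as a stand-in for LightGBM's `Random` class, so the derived values will not match LightGBM's. `SeedConfig` and `DeriveSeeds` are hypothetical names, and the `data_random_seed` default of 1 is an assumption not shown in the snippets quoted here.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical stand-in for the seed-related fields of LightGBM's Config.
struct SeedConfig {
  int data_random_seed = 1;  // assumed default, not quoted above
  int feature_fraction_seed = 2;
  int bagging_seed = 3;
  int drop_seed = 4;
};

// If a master seed was supplied, overwrite every component seed with a value
// drawn from a generator seeded by it; otherwise keep the fixed defaults.
// `seed_given` plays the role of GetInt(...) returning true.
SeedConfig DeriveSeeds(bool seed_given, int seed) {
  SeedConfig cfg;
  if (seed_given) {
    std::mt19937 gen(static_cast<std::uint32_t>(seed));
    std::uniform_int_distribution<int> dist(
        0, std::numeric_limits<std::int16_t>::max());
    cfg.data_random_seed = dist(gen);
    cfg.bagging_seed = dist(gen);
    cfg.drop_seed = dist(gen);
    cfg.feature_fraction_seed = dist(gen);
  }
  return cfg;
}
```

Note that "seed given" is tracked separately from the seed's value, so a master seed of 0 is honored like any other; in both branches the result is deterministic.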

sbruch · 2020-01-10

I'm fine doing whatever you and @guolinke decide is appropriate. As far as I understand, what you have suggested so far is either:

1. Reduce the highlighted code to: rand_ = new Random(config.seed); or,
2. Introduce an objective_seed.

As an aside, note that "new Random()" does not factor config.seed into the initialization at all and instead sets the seed to a randomly generated number. That's what gave me the impression that, if seed is unset, the behavior is expected to be non-deterministic.
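
The distinction drawn here can be illustrated without LightGBM. The sketch below uses `std::mt19937` and `std::random_device` in place of LightGBM's `Random`; `MakeSeeded` and `MakeUnseeded` are hypothetical helpers. A generator built from an explicit seed replays the same sequence, and 0 is as valid a seed as any other, which is why comparing with 0 cannot distinguish "unset" from "set to zero".

```cpp
#include <cstdint>
#include <random>

// Deterministic: the same seed always yields the same sequence, and 0 is a
// perfectly valid seed.
std::mt19937 MakeSeeded(int seed) {
  return std::mt19937(static_cast<std::uint32_t>(seed));
}

// Non-deterministic: seeded from the platform's entropy source, similar in
// spirit to what this thread says a default-constructed Random does.
std::mt19937 MakeUnseeded() {
  std::random_device rd;
  return std::mt19937(rd());
}
```

Two generators from `MakeSeeded(0)` produce identical draws, whereas two from `MakeUnseeded()` generally will not, so a reproducible run has to go through the seeded path.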

StrikerRUS · 2020-01-10

I personally vote for option 2, "Introduce an objective_seed". Let's wait for other opinions.

sbruch · 2020-01-16

Hi, I was wondering if you all have made a decision?

StrikerRUS · 2020-01-17

ping @guolinke @chivee

guolinke · 2020-01-24

I think an objective_seed is good.

guolinke · 2020-01-24T13:30:20Z

@sbruch did you compare the time cost of the ranking objective?
Refer to #2701

sbruch · 2020-01-28T13:10:38Z

> @sbruch did you compare the time cost of the ranking objective?
> Refer to #2701

I do not have precise measurements of the training time, but in repeated experiments it appears to be on par with or faster than lambdarank.

StrikerRUS

@sbruch Many thanks for the prompt changes! Only one minor remark below:

StrikerRUS · 2020-01-28T23:02:01Z

include/LightGBM/config.h

@@ -746,6 +748,10 @@ struct Config {
  // desc = separate by ``,``
  std::vector<double> label_gain;

+  // desc = random seed for objectives
+  // desc = used only in the ``rank_xendcg`` objective
+  int objective_seed = 1;


I believe it should be 5: data_random_seed=1, feature_fraction_seed=2, bagging_seed=3, drop_seed=4, ...

sbruch · 2020-01-28

Ah, right. Sorry about that. Fixed now.

StrikerRUS

@sbruch Thanks a lot for your efforts and patience!

@guolinke Can we merge this?

guolinke · 2020-01-30T03:13:27Z

LGTM

Sebastian Bruch added 5 commits December 6, 2019 18:46

Implementation of XE_NDCG loss function for ranking.

95005c6

Add citation

4a93880

Check in example usage for xe_ndcg loss.

7deec06

Seed the generator when a seed is provided in the config. Add unit-te…

c0876fc

…sts for xe_ndcg

Update documentation

c08921d

sbruch requested review from chivee, guolinke, jameslamb, Laurae2 and StrikerRUS as code owners December 7, 2019 01:41

Fix indentation

fe95e62

StrikerRUS reviewed Dec 24, 2019

include/LightGBM/config.h Outdated

src/objective/rank_xendcg_objective.hpp Outdated

src/objective/rank_xendcg_objective.hpp

src/objective/rank_xendcg_objective.hpp Outdated

Address issues raised by reviewers.

ad176e0

StrikerRUS reviewed Dec 25, 2019

include/LightGBM/config.h Outdated

src/objective/rank_xendcg_objective.hpp Outdated

Clean up include statements.

51fa03a

StrikerRUS added the awaiting review label Dec 28, 2019

guolinke approved these changes Jan 1, 2020

StrikerRUS reviewed Jan 2, 2020

StrikerRUS removed the awaiting review label Jan 2, 2020

Fix issues raised by reviewers.

f5f5d72

Sebastian Bruch added 2 commits January 2, 2020 17:58

Regenerate parameters.rst

763a289

Add a note to explain that reproducing xe_ndcg results requires num_t…

81d4b92

…hreads to be one.

StrikerRUS approved these changes Jan 5, 2020

src/objective/rank_xendcg_objective.hpp

StrikerRUS reviewed Jan 9, 2020

Introduce objective_seed and use that in rank_xendcg instead of direc…

a7d9fb8

…tly using seed

StrikerRUS reviewed Jan 28, 2020

Change default value of objective_seed

a6584fb

StrikerRUS approved these changes Jan 29, 2020

guolinke merged commit 8653098 into microsoft:master Jan 30, 2020

This was referenced Feb 1, 2020

[python][R-package][docs] fix support of XE_NDCG_MART obj in language wrappers and docs #2726

Merged

Extremely randomized trees #2671

Merged

guolinke added the feature label Mar 1, 2020

lock bot locked as resolved and limited conversation to collaborators Apr 2, 2020


