-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of XE_NDCG_MART for the ranking task #2620
Conversation
I don't believe the one test case that failed is because of a change in this PR (https://travis-ci.org/microsoft/LightGBM/jobs/621876769?utm_medium=notification&utm_source=github_status) |
Thank you very much for your PR and sorry about that! It was failing due to server temporary problems. I've re-run it - now everything is OK. |
@StrikerRUS - Thank you! |
Hi - I was just wondering if I could get initial feedback on whether you accept PRs (that propose a new algorithm) like this one at all? |
@sbruch We are so sorry for the delay! But we are lacking of resources and busy with some bug fixes (bug fixes have higher priority than enhancements). We'll definitely review your PR! And of course we are welcome such kinds of PRs (you may take a look at our feature requests hub or recently merged PR with AUC-mu implementation if you have any doubts). |
@StrikerRUS Oh no worries at all. Understood. I just wanted to make sure that this sort of PR would be welcome. Please take your time with the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a cpp review, but I have noticed some things related to general design:
Thanks for the feedback! I have addressed the issues you raised in a new commit. |
@StrikerRUS I'm not sure I understand the continuous-integration failure above. It does not seem to be related to this PR. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbruch Thank you very much for your fast updates!
I'm not sure I understand the continuous-integration failure above. It does not seem to be related to this PR. What do you think?
Oh, please do not worry! That failure is not related to this PR. It's a random failure we observe sometimes with MinGW exclusively and have no idea what causes it. Possibly, it's related to #1818.
Thanks! That's good to know. |
@StrikerRUS Thanks for the review! Please let me know if you have any further concerns/questions. |
@StrikerRUS I was wondering if there are other changes I need to make and/or approvals I need to obtain to merge this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbruch Awesome job! Thank you very much!
if (config.seed != 0) { | ||
rand_ = new Random(config.seed); | ||
} else { | ||
rand_ = new Random(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guolinke Does LightGBM require special treating for seed == 0
? I thought seed is always required for reproducibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 'seed' is not set the expected behavior should be that a seed is selected at random, right? Here ideally I'd like to check if a seed has been set, though the only way to do that I believe is comparing with 0. Is there a different, preferred way of checking if seed is set/not-set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 'seed' is not set the expected behavior should be that a seed is selected at random, right?
I don't think so.
When seed
is set, it's used to generate other seeds:
Lines 187 to 195 in f6b8ecf
// generate seeds by seed. | |
if (GetInt(params, "seed", &seed)) { | |
Random rand(seed); | |
int int_max = std::numeric_limits<int16_t>::max(); | |
data_random_seed = static_cast<int>(rand.NextShort(0, int_max)); | |
bagging_seed = static_cast<int>(rand.NextShort(0, int_max)); | |
drop_seed = static_cast<int>(rand.NextShort(0, int_max)); | |
feature_fraction_seed = static_cast<int>(rand.NextShort(0, int_max)); | |
} |
Otherwise, default values of concrete seeds are used. For instance,
LightGBM/include/LightGBM/config.h
Line 270 in f6b8ecf
int bagging_seed = 3; |
LightGBM/include/LightGBM/config.h
Line 290 in f6b8ecf
int feature_fraction_seed = 2; |
LightGBM/include/LightGBM/config.h
Line 349 in f6b8ecf
int drop_seed = 4; |
I guess, we need a new param, let say, objective_seed
to be consistent with the existing codebase. I can't find any references where seed
is used directly.
cc @guolinke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine doing whatever you and @guolinke decide is appropriate. As far as I understand, either:
- Reduce the highlighted code to: rand_ = new Random(config.seed); or,
- Introduce an objective_seed
are what you have suggested so far.
As an aside, note that "new Random()" does not factor config.seed in the initialization at all and sets the seed to a randomly generated number. That's what gave me the impression that if seed is unset the behavior is expected to be non-deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally vote for #2
"Introduce an objective_seed". Let's wait for other opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I was wondering if you all have made a decision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an objective_seed
is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbruch Many thanks for prompt changes! Only one minor remark below:
include/LightGBM/config.h
Outdated
@@ -746,6 +748,10 @@ struct Config { | |||
// desc = separate by ``,`` | |||
std::vector<double> label_gain; | |||
|
|||
// desc = random seed for objectives | |||
// desc = used only in the ``rank_xendcg`` objective | |||
int objective_seed = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it should be 5
: data_random_seed=1
, feature_fraction_seed=2
, bagging_seed=3
, drop_seed=4
, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Sorry about that. Fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM |
This Pull Request implements the recently proposed ranking loss (paper at https://arxiv.org/abs/1911.09798), a variant of the cross-entropy loss that is a convex bound on (shifted and scaled) NDCG. Models trained with this loss perform as well as or better than LambdaMART and, more importantly, exhibit higher robustness to noise in the training data.