
Multiple quantile regression with preserving monotonicity #5727

Open
RektPunk opened this issue Feb 17, 2023 · 13 comments

@RektPunk
Contributor

RektPunk commented Feb 17, 2023

Summary

Hello, while using LightGBM to estimate multiple quantiles, I have encountered several cases where monotonicity between the quantiles is not guaranteed, just as in #3447, #4201, and #5296. To address this, I implemented a custom loss together with monotone constraints so that the estimated quantiles satisfy a non-crossing condition, inspired by the references below. I created this issue (even though it may not be the most appropriate use of the issues tab ;) ) because I wanted to share my solution.

Description

Basically, I used the features already provided by LightGBM.
Conceptually, the input training data (explanatory and response variables) is duplicated as many times as there are input alphas (the target quantile levels), and each alpha is written into an additional column, following Cannon's approach. Next, the gradient of the composite quantile loss is computed for the stacked data as in the references, hess is set to 1, and a first-order approximation is used. Finally, if an increasing monotone constraint (value 1) is placed on the alpha column, monotonicity across alphas is preserved for any new data, even when the prediction alphas differ from the training alphas.
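To make the stacking concrete, here is a minimal standalone sketch (my illustration, not part of the implementation below, using made-up toy values) of what the design matrix looks like after duplication, with the quantile level appended as a "_tau" column; an increasing monotone constraint on that column then forces predictions to be non-decreasing in the quantile level.

import pandas as pd

# toy data: 3 rows, 1 feature, 2 quantile levels
x = pd.DataFrame({"feature": [1.0, 2.0, 3.0]})
alphas = [0.1, 0.9]

# stack the rows once per alpha and record the level in a "_tau" column
stacked = pd.concat(
    [x.assign(_tau=alpha) for alpha in alphas], axis=0, ignore_index=True
)
print(stacked)
#    feature  _tau
# 0      1.0   0.1
# 1      2.0   0.1
# 2      3.0   0.1
# 3      1.0   0.9
# 4      2.0   0.9
# 5      3.0   0.9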

from typing import List, Union, Dict, Any, Tuple
from functools import partial
from itertools import repeat, chain

import numpy as np
import lightgbm as lgb
import pandas as pd


def _grad_rho(u: np.ndarray, alpha: float) -> np.ndarray:
    # gradient of the check (pinball) loss with respect to the prediction, where u = y - pred
    return -(alpha - (u < 0).astype(float))


def check_loss_grad_hess(
    y_pred: np.ndarray, dtrain: lgb.basic.Dataset, alphas: List[float]
) -> Tuple[np.ndarray, np.ndarray]:
    _len_alpha = len(alphas)
    _y_train = dtrain.get_label()
    _y_pred_reshaped = y_pred.reshape(_len_alpha, -1)
    _y_train_reshaped = _y_train.reshape(_len_alpha, -1)

    grads = []
    for alpha_inx in range(_len_alpha):
        _err_for_alpha = _y_train_reshaped[alpha_inx] - _y_pred_reshaped[alpha_inx]
        grad = _grad_rho(_err_for_alpha, alphas[alpha_inx])
        grads.append(grad)

    grad = np.concatenate(grads)
    hess = np.ones(_y_train.shape)

    return grad, hess


def _alpha_validate(
    alphas: Union[List[float], float],
) -> List[float]:
    if isinstance(alphas, float):
        alphas = [alphas]
    return alphas


def _prepare_x(
    x: Union[pd.DataFrame, pd.Series, np.ndarray],
    alphas: List[float],
) -> pd.DataFrame:
    if isinstance(x, np.ndarray) or isinstance(x, pd.Series):
        x = pd.DataFrame(x)
    assert "_tau" not in x.columns, "Column name '_tau' is not allowed."
    _alpha_repeat_count_list = [list(repeat(alpha, len(x))) for alpha in alphas]
    _alpha_repeat_list = list(chain.from_iterable(_alpha_repeat_count_list))
    _repeated_x = pd.concat([x] * len(alphas), axis=0)

    _repeated_x = _repeated_x.assign(
        _tau=_alpha_repeat_list,
    )
    return _repeated_x


def _prepare_train(
    x: Union[pd.DataFrame, pd.Series, np.ndarray],
    y: Union[pd.Series, np.ndarray],
    alphas: List[float],
) -> Tuple[pd.DataFrame, np.ndarray]:
    _train_df = _prepare_x(x, alphas)
    _repeated_y = np.concatenate(list(repeat(y, len(alphas))))
    return (_train_df, _repeated_y)


class MonotonicQuantileRegressor:
    def __init__(
        self,
        x: Union[pd.DataFrame, pd.Series, np.ndarray],
        y: Union[pd.Series, np.ndarray],
        alphas: Union[List[float], float],
    ):
        alphas = _alpha_validate(alphas)
        self.x_train, self.y_train = _prepare_train(x, y, alphas)
        self.dataset = lgb.Dataset(data=self.x_train, label=self.y_train)
        self.fobj = partial(check_loss_grad_hess, alphas=alphas)

    def train(self, params: Dict[str, Any]) -> lgb.basic.Booster:
        self._params = params.copy()
        # put an increasing monotone constraint (1) on the appended "_tau" column
        if "monotone_constraints" in self._params:
            self._params["monotone_constraints"].append(1)
        else:
            self._params.update(
                {
                    "monotone_constraints": [
                        1 if "_tau" == col else 0 for col in self.x_train.columns
                    ]
                }
            )
        self.model = lgb.train(
            train_set=self.dataset,
            verbose_eval=False,
            params=self._params,
            fobj=self.fobj,
        )
        return self.model

    def predict(
        self,
        x: Union[pd.DataFrame, pd.Series, np.ndarray],
        alphas: Union[List[float], float],
    ) -> np.ndarray:
        alphas = _alpha_validate(alphas)
        _x = _prepare_x(x, alphas)
        _pred = self.model.predict(_x)
        _pred = _pred.reshape(len(alphas), len(x))
        return _pred

I created a simple example to test the non-crossing condition.

sample_size = 500
params = {
    "max_depth": 4,
    "num_leaves": 15,
    "learning_rate": 0.1,
    "n_estimators": 100,
    "boosting_type": "gbdt",
}
alphas = [0.3, 0.4, 0.5, 0.6, 0.7]
x = np.linspace(-10, 10, sample_size)
x_test = np.linspace(-10, 10, sample_size)
y_noise = np.sin(x) + np.random.uniform(-0.4, 0.4, sample_size)
y_test = np.sin(x_test) + np.random.uniform(-0.4, 0.4, sample_size)

monotonic_quantile_regressor = MonotonicQuantileRegressor(x=x, y=y_noise, alphas=alphas)
model = monotonic_quantile_regressor.train(params=params)
preds = monotonic_quantile_regressor.predict(x=x_test, alphas=alphas)
preds_df = pd.DataFrame(preds).T
# count quantile-crossing violations between consecutive quantile columns; expected to be 0
(preds_df.diff(axis=1) < 0).sum(axis=1).sum()


When visualized, the result was as shown in the figure below.
[figure: predicted quantile curves for the test data]
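The description above claims that monotonicity is preserved even for quantile levels that were not used during training. A small follow-up sketch (my addition; the new levels 0.35, 0.45, 0.55, 0.65 are chosen arbitrarily) that reuses the fitted regressor to check this:

# predict at quantile levels that were not in the training alphas
new_alphas = [0.35, 0.45, 0.55, 0.65]
new_preds = monotonic_quantile_regressor.predict(x=x_test, alphas=new_alphas)
new_preds_df = pd.DataFrame(new_preds).T

# count crossings between consecutive (unseen) quantile levels; expected to be 0
print((new_preds_df.diff(axis=1) < 0).sum(axis=1).sum())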

I am very curious about how others have approached this problem and would love to hear any new ideas, insights, or feedback.

References

Cannon, A. J. (2018). Non-crossing nonlinear regression quantiles by monotone composite quantile regression neural network, with application to rainfall extremes. Stochastic Environmental Research and Risk Assessment.

@chulhongsung

Awesome!

@juandados

Thanks for sharing this here @RektPunk. Could you please elaborate a little more on:

"..., hess is set to 1, and a first-order approximation is used".

In particular, I am interested in understanding two things:

  • The pinball loss is piecewise linear with respect to the model output, so the second derivative is zero and the Hessian is null. Why set it to one?
  • Where is the first-order approximation used here?

@RektPunk
Contributor Author

RektPunk commented Dec 1, 2023

Hey, @juandados. Thanks for asking :)
As you said, the Hessian should be zero, but for computation hess is set to 1. With a constant Hessian, only the gradient drives the boosting updates, so the fit effectively uses a first-order approximation of the loss; this is how the parameters are estimated in lgb.train (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html).
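For intuition, here is a small standalone sketch (my illustration, not code from this thread) of the check-loss gradient for a single leaf, showing that with hess fixed to 1 LightGBM's second-order leaf value reduces to a plain averaged-gradient (first-order) step:

import numpy as np

def grad_pinball(u: np.ndarray, alpha: float) -> np.ndarray:
    # d/dpred of the check loss rho_alpha(y - pred), with u = y - pred: -(alpha - 1{u < 0})
    return -(alpha - (u < 0).astype(float))

# toy residuals u = y - pred for the samples falling in one leaf
u = np.array([0.5, -0.2, 1.3, -0.7])
alpha = 0.9
g = grad_pinball(u, alpha)
h = np.ones_like(g)  # Hessian forced to 1 (the true second derivative is 0 almost everywhere)

lambda_l2 = 0.0
# LightGBM-style leaf value: -sum(grad) / (sum(hess) + lambda_l2)
leaf_value = -g.sum() / (h.sum() + lambda_l2)
# with hess = 1 this equals the negative mean gradient, i.e. a pure first-order update
print(leaf_value, -g.mean())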

@StatMixedML
Contributor

StatMixedML commented Jan 3, 2024

Very interesting discussions.

Even though it does not offer quantile regression yet, you might find my repo LightGBMLSS interesting: it models and predicts the full conditional distribution of a univariate target as a function of covariates. Choosing from a wide range of continuous, discrete, and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of LightGBM, as it allows creating probabilistic outputs from which prediction intervals and quantiles of interest can be derived. Since the distribution function is strictly monotonically increasing, quantile crossing cannot happen.
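As a purely illustrative aside (this is not LightGBMLSS code), once a full conditional distribution has been predicted, quantiles come from its inverse CDF, which is non-decreasing in the level, so they are ordered by construction:

import numpy as np
from scipy.stats import norm

# hypothetical predicted conditional parameters for a single observation
mu, sigma = 1.5, 0.8

alphas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
quantiles = norm.ppf(alphas, loc=mu, scale=sigma)

# the inverse CDF is monotone in alpha, so the quantiles cannot cross
assert np.all(np.diff(quantiles) >= 0)
print(quantiles)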

Currently, I am also working on a quantile regression extension, with strictly non-crossing quantiles. I'll post again once it is available.

@mat-ej

mat-ej commented Feb 3, 2024

@StatMixedML

Have you managed to add non-crossing quantile regression to your package?

@StatMixedML
Contributor

@mat-ej Sorry for the late reply.

Yes, I have a working version of non-crossing quantiles, where all quantiles are jointly estimated in one model. I haven't released it in the package yet, since I need to evaluate it in more detail.

@StatMixedML
Contributor

StatMixedML commented Feb 21, 2024

@RektPunk The approach of Cannon (2018) is extremely nice, and using the monotonicity constraint on the quantile column "guarantees" that the quantiles are non-crossing. Extremely happy you brought this to our attention!

I have some questions for the community, especially when it comes to the creation of the dataset and its implications for estimation:

  • The original dataset is replicated/stacked across the different quantile levels, with the only difference being the quantile column. Is there any experience with how this affects training? The reason I am asking is that there is redundancy in the dataset when it is stacked multiple times, as each row from the original dataset is repeated for each quantile level.
  • @RektPunk In the paper, there are two tau columns: the first and the last. In your implementation, there is only one column, which is used for the monotonicity constraint. Any reason for not using the first as well?

Many thanks for the great suggestions!

@RektPunk
Contributor Author

Hey @StatMixedML, thanks for your comment. The first point is exactly what I was concerned about; I also wonder whether there is any way to remove the redundancy :).
As for the second point, I don't think there are two tau columns in Cannon's paper. Even so, one tau column is enough to guarantee the non-crossing property.

@StatMixedML
Contributor

StatMixedML commented Feb 27, 2024

I used the above implementation and compared it with the base quantile regression available in LightGBM. Both models have the same set of hyper-parameters and number of iterations, and the dashed lines show the actual quantiles.

The constrained estimation does not properly capture the shape of the quantiles, especially for the higher and lower quantiles. This might be an effect of the monotonicity constraint or the lack of proper HPO. Any comments on this?

@RektPunk
Contributor Author

Hey, thanks for conducting the experiment. I think the hyper-parameters should be different, since the constrained model is more complex and also has the constraint, as you mentioned.

@StatMixedML
Contributor

The thing is that, due to the multiple stacking of the data, the approach becomes infeasible if the original data is already big...

@RektPunk
Contributor Author

RektPunk commented Mar 1, 2024

Yeah, I totally agree with you. For that case, another trick should be considered.

@RektPunk
Contributor Author

RektPunk commented Jun 28, 2024

The LightGBM version has changed and some details have been modified. The latest code can be found here. I also added code for XGBoost.
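For readers wondering how the same trick carries over to XGBoost, here is a rough sketch (my own illustration, not RektPunk's actual code; x_stacked, y_stacked, and alphas are assumed to be built exactly as in the LightGBM version above, with "_tau" as the last feature column):

import numpy as np
import xgboost as xgb

def xgb_check_loss_objective(alphas):
    # custom objective over the stacked data, mirroring check_loss_grad_hess above
    def objective(y_pred: np.ndarray, dtrain: xgb.DMatrix):
        y_true = dtrain.get_label()
        n_alpha = len(alphas)
        err = y_true.reshape(n_alpha, -1) - y_pred.reshape(n_alpha, -1)
        grad = np.concatenate(
            [-(alpha - (err[i] < 0).astype(float)) for i, alpha in enumerate(alphas)]
        )
        hess = np.ones_like(grad)  # Hessian forced to 1, as in the LightGBM version
        return grad, hess
    return objective

# dtrain = xgb.DMatrix(x_stacked, label=y_stacked)
# params = {
#     "max_depth": 4,
#     "eta": 0.1,
#     # increasing constraint only on the last ("_tau") column
#     "monotone_constraints": "(" + ",".join(["0"] * (x_stacked.shape[1] - 1) + ["1"]) + ")",
# }
# booster = xgb.train(params, dtrain, num_boost_round=100, obj=xgb_check_loss_objective(alphas))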
