Squared log error objective function produces NaN values during training #11210

heptaflar opened this issue Feb 6, 2025 · 1 comment

heptaflar commented Feb 6, 2025

The squared log error objective function produces NaN values during training if the predicted value x is less than or equal to -1.0, as you correctly state in your documentation. However, this is not a mathematical necessity but a consequence of your implementation. You may exploit the identity

log(1 + x) == 0.5 * log((1 + x) * (1 + x))

or as you may prefer

log1p(x) == 0.5 * log1p(x * (2.0 + x))

which yields NaN if and only if x == -1.0, but a meaningful value otherwise. The gradient and the Hessian (and hence the training) also become stable for values x < -1.0. I have included my custom implementation of the squared log error objective (and its associated metric) in the code below. It works well in my case and I would like to share it with you. My implementation also uses the identity

log(a) - log(b) == log(a / b)

But this is not essential. Take it or leave it. Best wishes, Ralf.
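
For a quick numerical check of the stabilized form (the sample values below are illustrative only):

```python
import numpy as np

x = np.array([-1.5, -0.5, 2.0])

# Naive form: undefined (NaN) for x < -1; NumPy also emits a RuntimeWarning.
naive = np.log1p(x)

# Stabilized form: finite for every x except x == -1.
stable = 0.5 * np.log1p(x * (2.0 + x))

print(naive)   # approx. [   nan -0.693  1.099]
print(stable)  # approx. [-0.693 -0.693  1.099]
```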

"""
This module defines custom objectives.
"""

from abc import ABC
from abc import abstractmethod

import numpy as np
import xgboost as xgb


class Objective(ABC):
    """
    The interface for a custom objective and its associated metric.
    """

    @abstractmethod
    def gradient(self, pred: np.ndarray, data: xgb.DMatrix) -> np.ndarray:
        """
        Returns the gradient of the objective.

        :param pred: The predicted values.
        :param data: The predictor values.
        :return: The gradient.
        """

    @abstractmethod
    def hessian(self, pred: np.ndarray, data: xgb.DMatrix) -> np.ndarray:
        """
        Returns the Hessian of the objective.

        :param pred: The predicted values.
        :param data: The predictor values.
        :return: The Hessian.
        """

    @abstractmethod
    def metric(
        self, pred: np.ndarray, data: xgb.DMatrix
    ) -> tuple[str, float]:
        """
        Returns the metric associated with the objective.

        :param pred: The predicted values.
        :param data: The predictor values.
        :return: The name and the value of the metric.
        """

    def obj(
        self, pred: np.ndarray, data: xgb.DMatrix
    ) -> tuple[np.ndarray, np.ndarray]:
        """
        The objective function.

        :param pred: The predicted values.
        :param data: The predictor values.
        :return: The gradient and the Hessian of the objective.
        """
        return self.gradient(pred, data), self.hessian(pred, data)


def le(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Returns the logarithmic error terms."""
    return 0.5 * np.log(np.square((1.0 + x) / (1.0 + y)))


def rms(e: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Returns the root (weighted) mean squared error."""
    return np.sqrt(
        np.average(np.square(e), weights=w if w.shape == e.shape else None)
    )


class SLE(Objective):
    """
    The squared logarithmic error objective.

    This objective shall replace the internal XGB squared logarithmic
    error objective.
    """

    def gradient(self, pred: np.ndarray, data: xgb.DMatrix) -> np.ndarray:
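        # First derivative of 0.5 * le(x, y)**2 with respect to x.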
        return le(pred, data.get_label()) / (1.0 + pred)

    def hessian(self, pred: np.ndarray, data: xgb.DMatrix) -> np.ndarray:
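        # Second derivative of 0.5 * le(x, y)**2 with respect to x.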
        return (1.0 - le(pred, data.get_label())) / np.square(1.0 + pred)

    def metric(
        self, pred: np.ndarray, data: xgb.DMatrix
    ) -> tuple[str, float]:
        return (
            "rmsle",
            rms(le(pred, data.get_label()), data.get_weight()).item(),
        )
```
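
A minimal usage sketch (assuming the SLE class above is in scope; the synthetic data and training parameters are illustrative only):

```python
import numpy as np
import xgboost as xgb

# Synthetic regression data with positive targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
y = np.exp(X @ rng.normal(size=4))

dtrain = xgb.DMatrix(X, label=y)

sle = SLE()
booster = xgb.train(
    {"max_depth": 3, "eta": 0.1},
    dtrain,
    num_boost_round=50,
    obj=sle.obj,                # custom gradient/Hessian
    custom_metric=sle.metric,   # reported as "rmsle"
    evals=[(dtrain, "train")],
)
```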

@trivialfis (Member) commented

Thank you for sharing, will look into this.
