Density Ratio Estimation
Tags: #Statistics #MachineLearning #UnsupervisedLearning
Density Ratio Estimation is a statistical method for estimating the ratio of two probability density functions, $r(x) = p(x)/q(x)$, directly from samples drawn from each.
This is a powerful technique in unsupervised machine learning because it allows us to compare and relate two distributions without ever needing to explicitly model or estimate the individual densities.
The fundamental insight, often called the "density ratio trick," is that many important statistical divergences and measures can be calculated or optimized by using the density ratio, bypassing the need for direct density estimation.
Instead of a two-step process (1. estimate $p(x)$ and $q(x)$ separately, 2. compute their ratio), the ratio $r(x)$ is estimated directly in a single step, which avoids compounding the errors of two separate density estimates.
Density ratio estimation is deeply connected to the Kullback-Leibler (KL) Divergence. The KL divergence from $q(x)$ to $p(x)$ can be written as an expectation of the log density ratio:

$$D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_{p(x)}\!\left[\log \frac{p(x)}{q(x)}\right].$$

This equation shows that if we can accurately estimate the density ratio $r(x) = p(x)/q(x)$, the KL divergence can be approximated directly from samples drawn from $p(x)$, without estimating either density on its own.
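Concretely, given an estimate $\hat r(x)$ of the ratio and samples $x_1, \dots, x_n$ drawn from $p(x)$ (the sample notation here is illustrative, not from the original note), a plug-in Monte Carlo estimate of the divergence is

$$\hat D_{\mathrm{KL}}(p \,\|\, q) \approx \frac{1}{n} \sum_{i=1}^{n} \log \hat r(x_i).$$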
Several algorithms have been developed to perform density ratio estimation, including:
- KLIEP (Kullback-Leibler Importance Estimation Procedure): This method fits a model of the density ratio by directly minimizing the KL divergence between $p(x)$ and the ratio model multiplied by $q(x)$.
- Logistic Regression: A probabilistic classifier such as logistic regression can be trained to distinguish between samples from $p(x)$ (labeled as 1) and $q(x)$ (labeled as 0). The learned odds from this classifier are directly related to the density ratio (a minimal sketch follows this list).
- uLSIF (unconstrained Least-Squares Importance Fitting): This method uses a squared-loss objective to match the density ratio, which often has a convenient closed-form solution.
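As a rough sketch of the classifier-based approach above, the snippet below trains scikit-learn's `LogisticRegression` to separate samples from $p(x)$ and $q(x)$ and converts its posterior odds into a ratio estimate. The Gaussian toy distributions and the `density_ratio` helper are illustrative assumptions, not taken from the original note.

```python
# Minimal sketch of the classifier-based density ratio estimator.
# The two Gaussian toy distributions and all names here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Samples from p(x) (labeled 1) and q(x) (labeled 0).
x_p = rng.normal(loc=0.0, scale=1.0, size=(1000, 1))
x_q = rng.normal(loc=0.5, scale=1.2, size=(1000, 1))

X = np.vstack([x_p, x_q])
y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])
clf = LogisticRegression().fit(X, y)

def density_ratio(x):
    """Estimate r(x) = p(x)/q(x) from the classifier's posterior odds:
    r(x) ≈ (n_q / n_p) * P(y=1 | x) / P(y=0 | x)."""
    proba = clf.predict_proba(x)                # columns: [P(y=0|x), P(y=1|x)]
    return (len(x_q) / len(x_p)) * proba[:, 1] / proba[:, 0]

print(density_ratio(np.array([[0.0], [2.0]])))
```

If the classifier is well calibrated, $\frac{P(y=1 \mid x)}{P(y=0 \mid x)} = \frac{n_p\, p(x)}{n_q\, q(x)}$, which is why the posterior odds recover the ratio up to the sample-size factor.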
Density ratio estimation is a versatile tool with many applications:
- Covariate Shift Adaptation: When the training data distribution differs from the test data distribution, density ratios can be used as importance weights to correct the model's learning process (a minimal sketch follows this list).
- Anomaly/Outlier Detection: If $p(x)$ is the distribution of normal data and $q(x)$ is the distribution of newly observed data, an unusually high or low density ratio at a point can indicate that it is an outlier.
- Mutual Information Estimation: The mutual information between two variables can be expressed in terms of the ratio between their joint density and the product of their marginal densities.
- Two-Sample Tests: It can be used to test whether two sets of samples are drawn from the same distribution.
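To make the covariate shift item concrete, here is a minimal sketch that reuses the classifier-based ratio estimate as an importance weight via scikit-learn's `sample_weight` argument. The `estimate_density_ratio` helper, the toy data, and the choice of `Ridge` as the downstream model are assumptions of this sketch; any of the estimators above (KLIEP, uLSIF, or the classifier trick) could supply the weights.

```python
# Minimal sketch of covariate shift adaptation via importance weighting.
# w(x) = p_test(x) / p_train(x) is estimated with the classifier trick and
# passed to the downstream learner through sample_weight.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def estimate_density_ratio(x_train, x_test):
    """Classifier-based estimate of r(x) = p_test(x) / p_train(x) at x_train."""
    X = np.vstack([x_test, x_train])
    y = np.concatenate([np.ones(len(x_test)), np.zeros(len(x_train))])
    clf = LogisticRegression().fit(X, y)
    proba = clf.predict_proba(x_train)          # columns: [P(y=0|x), P(y=1|x)]
    return (len(x_train) / len(x_test)) * proba[:, 1] / proba[:, 0]

rng = np.random.default_rng(0)

# Toy 1-D regression data: training inputs and shifted test inputs.
x_train = rng.normal(0.0, 1.0, size=(500, 1))
x_test = rng.normal(1.0, 1.0, size=(500, 1))
y_train = np.sin(x_train).ravel() + 0.1 * rng.standard_normal(500)

# Weight each training example so the fit targets the test distribution.
weights = estimate_density_ratio(x_train, x_test)
model = Ridge().fit(x_train, y_train, sample_weight=weights)
```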