
Commit 52b6a66

aryamccarthy authored and GaelVaroquaux committed
Add averaging option to AMI and NMI (scikit-learn#11124)
* Add averaging option to AMI and NMI (leave current behavior unchanged)
* Flake8 fixes
* Incorporate tests of means for AMI and NMI
* Add note about `average_method` in NMI
* Update docs from AMI, NMI changes (#1)
* Correct the NMI and AMI descriptions in docs
* Update docstrings due to averaging changes:

  - V-measure
  - Homogeneity
  - Completeness
  - NMI
  - AMI

* Update documentation and remove nose tests (#2)
* Update v0.20.rst
* Update test_supervised.py
* Update clustering.rst
* Fix multiple spaces after operator
* Rename all arguments
* No more arbitrary values!
* Improve handling of floating-point imprecision
* Clearly state when the change occurs
* Update AMI/NMI docs
* Update v0.20.rst
* Catch FutureWarnings in AMI and NMI
1 parent 671fe79 commit 52b6a66

File tree

4 files changed (+179, -29 lines)

doc/modules/clustering.rst

Lines changed: 32 additions & 19 deletions
@@ -1158,8 +1158,8 @@ Given the knowledge of the ground truth class assignments ``labels_true`` and
 our clustering algorithm assignments of the same samples ``labels_pred``, the
 **Mutual Information** is a function that measures the **agreement** of the two
 assignments, ignoring permutations. Two different normalized versions of this
-measure are available, **Normalized Mutual Information(NMI)** and **Adjusted
-Mutual Information(AMI)**. NMI is often used in the literature while AMI was
+measure are available, **Normalized Mutual Information (NMI)** and **Adjusted
+Mutual Information (AMI)**. NMI is often used in the literature, while AMI was
 proposed more recently and is **normalized against chance**::

   >>> from sklearn import metrics
@@ -1212,17 +1212,11 @@ Advantages
   for any value of ``n_clusters`` and ``n_samples`` (which is not the
   case for raw Mutual Information or the V-measure for instance).

-- **Bounded range [0, 1]**: Values close to zero indicate two label
+- **Upper bound of 1**: Values close to zero indicate two label
   assignments that are largely independent, while values close to one
-  indicate significant agreement. Further, values of exactly 0 indicate
-  **purely** independent label assignments and a AMI of exactly 1 indicates
+  indicate significant agreement. Further, an AMI of exactly 1 indicates
   that the two label assignments are equal (with or without permutation).

-- **No assumption is made on the cluster structure**: can be used
-  to compare clustering algorithms such as k-means which assumes isotropic
-  blob shapes with results of spectral clustering algorithms which can
-  find cluster with "folded" shapes.
-

 Drawbacks
 ~~~~~~~~~
@@ -1274,15 +1268,15 @@ It also can be expressed in set cardinality formulation:

 The normalized mutual information is defined as

-.. math:: \text{NMI}(U, V) = \frac{\text{MI}(U, V)}{\sqrt{H(U)H(V)}}
+.. math:: \text{NMI}(U, V) = \frac{\text{MI}(U, V)}{\text{mean}(H(U), H(V))}

 This value of the mutual information and also the normalized variant is not
 adjusted for chance and will tend to increase as the number of different labels
 (clusters) increases, regardless of the actual amount of "mutual information"
 between the label assignments.

 The expected value for the mutual information can be calculated using the
-following equation, from Vinh, Epps, and Bailey, (2009). In this equation,
+following equation [VEB2009]_. In this equation,
 :math:`a_i = |U_i|` (the number of elements in :math:`U_i`) and
 :math:`b_j = |V_j|` (the number of elements in :math:`V_j`).

@@ -1295,7 +1289,19 @@ following equation, from Vinh, Epps, and Bailey, (2009). In this equation,
 Using the expected value, the adjusted mutual information can then be
 calculated using a similar form to that of the adjusted Rand index:

-.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\max(H(U), H(V)) - E[\text{MI}]}
+.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}
+
+For normalized mutual information and adjusted mutual information, the
+normalizing value is typically some *generalized* mean of the entropies of
+each clustering. Various generalized means exist, and no firm rules exist
+for preferring one over the others. The decision is largely made on a
+field-by-field basis; for instance, in community detection, the arithmetic
+mean is most common. Each normalizing method provides "qualitatively similar
+behaviours" [YAT2016]_. In our implementation, this is controlled by the
+``average_method`` parameter.
+
+Vinh et al. (2010) named variants of NMI and AMI by their averaging method
+[VEB2010]_. Their 'sqrt' and 'sum' averages are the geometric and arithmetic
+means; we use these more broadly common names.

 .. topic:: References

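The net effect of the new ``average_method`` parameter can be illustrated directly. A minimal sketch (assuming a scikit-learn build that includes this commit; the labels are arbitrary toy data):

    from sklearn.metrics import (adjusted_mutual_info_score,
                                 normalized_mutual_info_score)

    labels_true = [0, 0, 0, 1, 1, 1]
    labels_pred = [0, 0, 1, 1, 2, 2]

    # Passing average_method explicitly also avoids the FutureWarning
    # about the 0.22 default change. Because the normalizers satisfy
    # min <= geometric <= arithmetic <= max, the NMI scores come out
    # ordered the opposite way.
    for method in ('min', 'geometric', 'arithmetic', 'max'):
        print(method,
              normalized_mutual_info_score(labels_true, labels_pred,
                                           average_method=method),
              adjusted_mutual_info_score(labels_true, labels_pred,
                                         average_method=method))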
@@ -1304,22 +1310,29 @@ calculated using a similar form to that of the adjusted Rand index:
    Machine Learning Research 3: 583–617.
    `doi:10.1162/153244303321897735 <http://strehl.com/download/strehl-jmlr02.pdf>`_.

-  * Vinh, Epps, and Bailey, (2009). "Information theoretic measures
+  * [VEB2009] Vinh, Epps, and Bailey, (2009). "Information theoretic measures
     for clusterings comparison". Proceedings of the 26th Annual International
     Conference on Machine Learning - ICML '09.
     `doi:10.1145/1553374.1553511 <https://dl.acm.org/citation.cfm?doid=1553374.1553511>`_.
     ISBN 9781605585161.

-  * Vinh, Epps, and Bailey, (2010). Information Theoretic Measures for
+  * [VEB2010] Vinh, Epps, and Bailey, (2010). "Information Theoretic Measures for
     Clusterings Comparison: Variants, Properties, Normalization and
-    Correction for Chance, JMLR
-    http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf
+    Correction for Chance". JMLR
+    <http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf>

   * `Wikipedia entry for the (normalized) Mutual Information
     <https://en.wikipedia.org/wiki/Mutual_Information>`_

   * `Wikipedia entry for the Adjusted Mutual Information
     <https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_
+
+  * [YAT2016] Yang, Algesheimer, and Tessone, (2016). "A comparative analysis
+    of community detection algorithms on artificial networks". Scientific
+    Reports 6: 30750.
+    `doi:10.1038/srep30750 <https://www.nature.com/articles/srep30750>`_.
+

 .. _homogeneity_completeness:

@@ -1359,7 +1372,7 @@ Their harmonic mean called **V-measure** is computed by
    0.51...

 The V-measure is actually equivalent to the mutual information (NMI)
-discussed above normalized by the sum of the label entropies [B2011]_.
+discussed above, with the aggregation function being the arithmetic mean [B2011]_.

 Homogeneity, completeness and V-measure can be computed at once using
 :func:`homogeneity_completeness_v_measure` as follows::
@@ -1534,7 +1547,7 @@ Advantages
   for any value of ``n_clusters`` and ``n_samples`` (which is not the
   case for raw Mutual Information or the V-measure for instance).

-- **Bounded range [0, 1]**: Values close to zero indicate two label
+- **Upper-bounded at 1**: Values close to zero indicate two label
   assignments that are largely independent, while values close to one
   indicate significant agreement. Further, values of exactly 0 indicate
   **purely** independent label assignments and a AMI of exactly 1 indicates
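The V-measure equivalence stated in the clustering.rst changes above is easy to verify numerically. A small sketch (toy labelings; assumes the ``average_method`` parameter introduced by this commit):

    from sklearn.metrics import normalized_mutual_info_score, v_measure_score

    a = [0, 0, 1, 1, 2, 2]
    b = [0, 0, 1, 2, 2, 2]
    # The harmonic mean of homogeneity and completeness equals MI divided
    # by the arithmetic mean of the two label entropies.
    assert abs(v_measure_score(a, b) -
               normalized_mutual_info_score(a, b,
                                            average_method='arithmetic')) < 1e-10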

doc/whats_new/v0.20.rst

Lines changed: 17 additions & 0 deletions
@@ -206,6 +206,12 @@ Metrics
 :func:`metrics.roc_auc_score`. :issue:`3273` by
 :user:`Alexander Niederbühl <Alexander-N>`.

+- Added control over the normalization in
+  :func:`metrics.normalized_mutual_info_score` and
+  :func:`metrics.adjusted_mutual_info_score` via the ``average_method``
+  parameter. In version 0.22, the default normalizer for each will become
+  the *arithmetic* mean of the entropies of each clustering. :issue:`11124` by
+  :user:`Arya McCarthy <aryamccarthy>`.
 - Added ``output_dict`` parameter in :func:`metrics.classification_report`
   to return classification statistics as dictionary.
   :issue:`11160` by :user:`Dan Barkhorn <danielbarkhorn>`.
@@ -792,6 +798,17 @@ Metrics
 due to floating point error in the input.
 :issue:`9851` by :user:`Hanmin Qin <qinhanmin2014>`.

+- In :func:`metrics.normalized_mutual_info_score` and
+  :func:`metrics.adjusted_mutual_info_score`, warn that ``average_method``
+  will have a new default value. In version 0.22, the default normalizer for
+  each will become the *arithmetic* mean of the entropies of each clustering.
+  Currently, :func:`metrics.normalized_mutual_info_score` uses the default of
+  ``average_method='geometric'``, and :func:`metrics.adjusted_mutual_info_score`
+  uses the default of ``average_method='max'`` to match their behaviors in
+  version 0.19.
+  :issue:`11124` by :user:`Arya McCarthy <aryamccarthy>`.
 - The ``batch_size`` parameter to :func:`metrics.pairwise_distances_argmin_min`
   and :func:`metrics.pairwise_distances_argmin` is deprecated to be removed in
   v0.22. It no longer has any effect, as batch size is determined by global
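Callers who want to keep today's scores and silence the transition warning can pass the current defaults explicitly. A minimal sketch:

    from sklearn.metrics import (adjusted_mutual_info_score,
                                 normalized_mutual_info_score)

    labels_true = [0, 0, 1, 1]
    labels_pred = [0, 0, 1, 2]

    # Explicit average_method reproduces the 0.19-era behavior with no
    # FutureWarning: 'max' for AMI, 'geometric' for NMI.
    ami = adjusted_mutual_info_score(labels_true, labels_pred,
                                     average_method='max')
    nmi = normalized_mutual_info_score(labels_true, labels_pred,
                                       average_method='geometric')
    print(ami, nmi)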

sklearn/metrics/cluster/supervised.py

Lines changed: 76 additions & 9 deletions
@@ -11,11 +11,13 @@
 # Thierry Guillemot <[email protected]>
 # Gregory Stupp <[email protected]>
 # Joel Nothman <[email protected]>
+# Arya McCarthy <[email protected]>
 # License: BSD 3 clause

 from __future__ import division

 from math import log
+import warnings

 import numpy as np
 from scipy import sparse as sp
@@ -59,6 +61,21 @@ def check_clusterings(labels_true, labels_pred):
     return labels_true, labels_pred


+def _generalized_average(U, V, average_method):
+    """Return a particular mean of two numbers."""
+    if average_method == "min":
+        return min(U, V)
+    elif average_method == "geometric":
+        return np.sqrt(U * V)
+    elif average_method == "arithmetic":
+        return np.mean([U, V])
+    elif average_method == "max":
+        return max(U, V)
+    else:
+        raise ValueError("'average_method' must be 'min', 'geometric', "
+                         "'arithmetic', or 'max'")
+
+
 def contingency_matrix(labels_true, labels_pred, eps=None, sparse=False):
     """Build a contingency matrix describing the relationship between labels.
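A quick sanity check of the helper's semantics (a standalone re-implementation for illustration, since ``_generalized_average`` is private): for non-negative entropies the four options are totally ordered.

    import numpy as np

    def generalized_average(u, v, average_method):
        # Mirrors the private helper's dispatch, for illustration only.
        return {"min": min(u, v),
                "geometric": np.sqrt(u * v),
                "arithmetic": np.mean([u, v]),
                "max": max(u, v)}[average_method]

    h_true, h_pred = 0.5, 2.0
    means = [generalized_average(h_true, h_pred, m)
             for m in ("min", "geometric", "arithmetic", "max")]
    # min <= geometric <= arithmetic <= max for non-negative inputs.
    assert means == sorted(means)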
@@ -245,7 +262,9 @@ def homogeneity_completeness_v_measure(labels_true, labels_pred):

     V-Measure is furthermore symmetric: swapping ``labels_true`` and
     ``label_pred`` will give the same score. This does not hold for
-    homogeneity and completeness.
+    homogeneity and completeness. V-Measure is identical to
+    :func:`normalized_mutual_info_score` with the arithmetic averaging
+    method.

     Read more in the :ref:`User Guide <homogeneity_completeness>`.

@@ -444,7 +463,8 @@ def completeness_score(labels_true, labels_pred):
 def v_measure_score(labels_true, labels_pred):
     """V-measure cluster labeling given a ground truth.

-    This score is identical to :func:`normalized_mutual_info_score`.
+    This score is identical to :func:`normalized_mutual_info_score` with
+    the ``'arithmetic'`` option for averaging.

     The V-measure is the harmonic mean between homogeneity and completeness::

@@ -459,6 +479,7 @@ def v_measure_score(labels_true, labels_pred):
     measure the agreement of two independent label assignments strategies
     on the same dataset when the real ground truth is not known.

+
     Read more in the :ref:`User Guide <homogeneity_completeness>`.

     Parameters
@@ -485,6 +506,7 @@ def v_measure_score(labels_true, labels_pred):
     --------
     homogeneity_score
     completeness_score
+    normalized_mutual_info_score

     Examples
     --------
@@ -617,7 +639,8 @@ def mutual_info_score(labels_true, labels_pred, contingency=None):
     return mi.sum()


-def adjusted_mutual_info_score(labels_true, labels_pred):
+def adjusted_mutual_info_score(labels_true, labels_pred,
+                               average_method='warn'):
     """Adjusted Mutual Information between two clusterings.

     Adjusted Mutual Information (AMI) is an adjustment of the Mutual
@@ -626,7 +649,7 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
     clusters, regardless of whether there is actually more information shared.
     For two clusterings :math:`U` and :math:`V`, the AMI is given as::

-        AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [max(H(U), H(V)) - E(MI(U, V))]
+        AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [avg(H(U), H(V)) - E(MI(U, V))]

     This metric is independent of the absolute values of the labels:
     a permutation of the class or cluster label values won't change the
@@ -650,9 +673,17 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
     labels_pred : array, shape = [n_samples]
         A clustering of the data into disjoint subsets.

+    average_method : string, optional (default: 'warn')
+        How to compute the normalizer in the denominator. Possible options
+        are 'min', 'geometric', 'arithmetic', and 'max'.
+        If 'warn', 'max' will be used. The default will change to
+        'arithmetic' in version 0.22.
+
+        .. versionadded:: 0.20
+
     Returns
     -------
-    ami: float(upperlimited by 1.0)
+    ami: float (upperlimited by 1.0)
        The AMI returns a value of 1 when the two partitions are identical
        (ie perfectly matched). Random partitions (independent labellings) have
        an expected AMI around 0 on average hence can be negative.
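An unrecognized option surfaces as the ``ValueError`` raised by the shared helper; a short sketch of the failure mode (the 'median' option is hypothetical, used only to trigger the error):

    from sklearn.metrics import adjusted_mutual_info_score

    try:
        adjusted_mutual_info_score([0, 1], [0, 1], average_method='median')
    except ValueError as exc:
        # 'average_method' must be 'min', 'geometric', 'arithmetic', or 'max'
        print(exc)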
@@ -691,6 +722,12 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
     <https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_

     """
+    if average_method == 'warn':
+        warnings.warn("The behavior of AMI will change in version 0.22. "
+                      "To match the behavior of 'v_measure_score', AMI will "
+                      "use average_method='arithmetic' by default.",
+                      FutureWarning)
+        average_method = 'max'
     labels_true, labels_pred = check_clusterings(labels_true, labels_pred)
     n_samples = labels_true.shape[0]
     classes = np.unique(labels_true)
@@ -709,17 +746,29 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
     emi = expected_mutual_information(contingency, n_samples)
     # Calculate entropy for each labeling
     h_true, h_pred = entropy(labels_true), entropy(labels_pred)
-    ami = (mi - emi) / (max(h_true, h_pred) - emi)
+    normalizer = _generalized_average(h_true, h_pred, average_method)
+    denominator = normalizer - emi
+    # Avoid 0.0 / 0.0 when expectation equals maximum, i.e. a perfect match.
+    # The normalizer should always be >= emi, but because of floating-point
+    # representation, sometimes emi is slightly larger. Correct this
+    # by preserving the sign.
+    if denominator < 0:
+        denominator = min(denominator, -np.finfo('float64').eps)
+    else:
+        denominator = max(denominator, np.finfo('float64').eps)
+    ami = (mi - emi) / denominator
     return ami


-def normalized_mutual_info_score(labels_true, labels_pred):
+def normalized_mutual_info_score(labels_true, labels_pred,
+                                 average_method='warn'):
     """Normalized Mutual Information between two clusterings.

     Normalized Mutual Information (NMI) is a normalization of the Mutual
     Information (MI) score to scale the results between 0 (no mutual
     information) and 1 (perfect correlation). In this function, mutual
-    information is normalized by ``sqrt(H(labels_true) * H(labels_pred))``.
+    information is normalized by some generalized mean of ``H(labels_true)``
+    and ``H(labels_pred)``, defined by the ``average_method``.

     This measure is not adjusted for chance. Therefore
     :func:`adjusted_mutual_info_score` might be preferred.
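The sign-preserving clamp above keeps boundary cases stable. A sketch of the perfect-match case (toy labels; assumes this commit's behavior):

    import numpy as np
    from sklearn.metrics import adjusted_mutual_info_score

    labels = np.array([0, 0, 1, 1, 2, 2])
    # For identical labelings MI equals the normalizer, so numerator and
    # denominator agree; the clamp guards against E[MI] drifting past the
    # normalizer by a floating-point ulp.
    score = adjusted_mutual_info_score(labels, labels,
                                       average_method='arithmetic')
    assert abs(score - 1.0) < 1e-10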
@@ -743,13 +792,22 @@ def normalized_mutual_info_score(labels_true, labels_pred):
     labels_pred : array, shape = [n_samples]
         A clustering of the data into disjoint subsets.

+    average_method : string, optional (default: 'warn')
+        How to compute the normalizer in the denominator. Possible options
+        are 'min', 'geometric', 'arithmetic', and 'max'.
+        If 'warn', 'geometric' will be used. The default will change to
+        'arithmetic' in version 0.22.
+
+        .. versionadded:: 0.20
+
     Returns
     -------
     nmi : float
        score between 0.0 and 1.0. 1.0 stands for perfectly complete labeling

     See also
     --------
+    v_measure_score: V-Measure (NMI with arithmetic mean option)
     adjusted_rand_score: Adjusted Rand Index
     adjusted_mutual_info_score: Adjusted Mutual Information (adjusted
         against chance)
against chance)
@@ -773,6 +831,12 @@ def normalized_mutual_info_score(labels_true, labels_pred):
773831
0.0
774832
775833
"""
834+
if average_method == 'warn':
835+
warnings.warn("The behavior of NMI will change in version 0.22. "
836+
"To match the behavior of 'v_measure_score', NMI will "
837+
"use average_method='arithmetic' by default.",
838+
FutureWarning)
839+
average_method = 'geometric'
776840
labels_true, labels_pred = check_clusterings(labels_true, labels_pred)
777841
classes = np.unique(labels_true)
778842
clusters = np.unique(labels_pred)
@@ -789,7 +853,10 @@ def normalized_mutual_info_score(labels_true, labels_pred):
     # Calculate the expected value for the mutual information
     # Calculate entropy for each labeling
     h_true, h_pred = entropy(labels_true), entropy(labels_pred)
-    nmi = mi / max(np.sqrt(h_true * h_pred), 1e-10)
+    normalizer = _generalized_average(h_true, h_pred, average_method)
+    # Avoid 0.0 / 0.0 when either entropy is zero.
+    normalizer = max(normalizer, np.finfo('float64').eps)
+    nmi = mi / normalizer
    return nmi

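A degenerate case shows why the ``eps`` clamp exists (a sketch): when one labeling has zero entropy, the geometric-mean normalizer would be zero, and the clamp turns the would-be 0/0 into a plain 0.0.

    from sklearn.metrics import normalized_mutual_info_score

    # All samples in one cluster: H(labels_true) == 0, so MI == 0 and
    # sqrt(H(labels_true) * H(labels_pred)) == 0 as well.
    score = normalized_mutual_info_score([0, 0, 0, 0], [0, 1, 2, 3],
                                         average_method='geometric')
    assert score == 0.0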