Commit a2b7c83

Merge branch 'BoyanH-20-pandas-support' into dev

2 parents: fab3e96 + 143067c

22 files changed: +541 −194 lines

README.md

Lines changed: 2 additions & 3 deletions

@@ -100,12 +100,11 @@ import numpy as np
 X = np.random.choice(np.linspace(0, 20, 10000), size=200, replace=False).reshape(-1, 1)
 y = np.sin(X) + np.random.normal(scale=0.3, size=X.shape)
 ```
-For active learning, we shall define a custom query strategy tailored to Gaussian processes. In a nutshell, a *query strategy* in modAL is a function taking (at least) two arguments (an estimator object and a pool of examples), outputting the index of the queried instance and the instance itself. In our case, the arguments are ```regressor``` and ```X```.
+For active learning, we shall define a custom query strategy tailored to Gaussian processes. In a nutshell, a *query strategy* in modAL is a function taking (at least) two arguments (an estimator object and a pool of examples), outputting the index of the queried instance. In our case, the arguments are ```regressor``` and ```X```.
 ```python
 def GP_regression_std(regressor, X):
     _, std = regressor.predict(X, return_std=True)
-    query_idx = np.argmax(std)
-    return query_idx, X[query_idx]
+    return np.argmax(std)
 ```
 After setting up the query strategy and the data, the active learner can be initialized.
 ```python
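The same contract, end to end: a minimal sketch of how the new single-index strategy plugs into a learner, assuming the usual modAL workflow in which `query()` pairs the returned index with the matching pool row (the initial training slice and model settings below are illustrative, not part of the commit):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from modAL.models import ActiveLearner


def GP_regression_std(regressor, X):
    # new-style query strategy: return only the index of the most uncertain point
    _, std = regressor.predict(X, return_std=True)
    return np.argmax(std)


X = np.random.choice(np.linspace(0, 20, 10000), size=200, replace=False).reshape(-1, 1)
y = np.sin(X) + np.random.normal(scale=0.3, size=X.shape)

regressor = ActiveLearner(
    estimator=GaussianProcessRegressor(),
    query_strategy=GP_regression_std,
    X_training=X[:10], y_training=y[:10],
)
query_idx, query_instance = regressor.query(X)  # the learner pairs index and instance
regressor.teach(X[query_idx].reshape(1, -1), y[query_idx].reshape(1, -1))
```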

docs/source/content/examples/active_regression.ipynb

Lines changed: 3 additions & 4 deletions

@@ -70,7 +70,7 @@
    "metadata": {},
    "source": [
     "## Uncertainty measure and query strategy for Gaussian processes\n",
-    "For active learning, we shall define a custom query strategy tailored to Gaussian processes. More information on how to write your custom query strategies can be found at the page [Extending modAL](https://cosmic-cortex.github.io/modAL/Extending-modAL). In a nutshell, a *query strategy* in modAL is a function taking (at least) two arguments (an estimator object and a pool of examples), outputting the index of the queried instance and the instance itself. In our case, the arguments are ```regressor``` and ```X```."
+    "For active learning, we shall define a custom query strategy tailored to Gaussian processes. More information on how to write your custom query strategies can be found at the page [Extending modAL](https://cosmic-cortex.github.io/modAL/Extending-modAL). In a nutshell, a *query strategy* in modAL is a function taking (at least) two arguments (an estimator object and a pool of examples), outputting the index of the queried instance. In our case, the arguments are ```regressor``` and ```X```."
    ]
   },
   {
@@ -81,8 +81,7 @@
    "source": [
     "def GP_regression_std(regressor, X):\n",
     "    _, std = regressor.predict(X, return_std=True)\n",
-    "    query_idx = np.argmax(std)\n",
-    "    return query_idx, X[query_idx]"
+    "    return np.argmax(std)"
    ]
   },
   {
@@ -234,4 +233,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 2
-}
+}

docs/source/content/overview/Extending-modAL.ipynb

Lines changed: 4 additions & 8 deletions

@@ -27,11 +27,8 @@
     "    # measure the utility of each instance in the pool\n",
     "    utility = utility_measure(classifier, X)\n",
     "\n",
-    "    # select the indices of the instances to be queried\n",
-    "    query_idx = select_instances(utility)\n",
-    "\n",
-    "    # return the indices and the instances\n",
-    "    return query_idx, X[query_idx]"
+    "    # select and return the indices of the instances to be queried\n",
+    "    return select_instances(utility)"
    ]
   },
   {
@@ -213,8 +210,7 @@
     "# classifier uncertainty and classifier margin\n",
     "def custom_query_strategy(classifier, X, n_instances=1):\n",
     "    utility = linear_combination(classifier, X)\n",
-    "    query_idx = multi_argmax(utility, n_instances=n_instances)\n",
-    "    return query_idx, X[query_idx]\n",
+    "    return multi_argmax(utility, n_instances=n_instances)\n",
     "\n",
     "custom_query_learner = ActiveLearner(\n",
     "    estimator=GaussianProcessClassifier(1.0 * RBF(1.0)),\n",
@@ -299,4 +295,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 2
-}
+}

docs/source/content/overview/modAL-in-a-nutshell.rst

Lines changed: 2 additions & 4 deletions

@@ -118,15 +118,13 @@ the *noisy sine* function:
 For active learning, we shall define a custom query strategy tailored to
 Gaussian processes. In a nutshell, a *query strategy* in modAL is a
 function taking (at least) two arguments (an estimator object and a pool
-of examples), outputting the index of the queried instance and the
-instance itself. In our case, the arguments are ``regressor`` and ``X``.
+of examples), outputting the index of the queried instance. In our case, the arguments are ``regressor`` and ``X``.
 
 .. code:: python
 
     def GP_regression_std(regressor, X):
        _, std = regressor.predict(X, return_std=True)
-       query_idx = np.argmax(std)
-       return query_idx, X[query_idx]
+       return np.argmax(std)
 
 After setting up the query strategy and the data, the active learner can
 be initialized.

examples/active_regression.py

Lines changed: 1 addition & 2 deletions

@@ -12,8 +12,7 @@
 # query strategy for regression
 def GP_regression_std(regressor, X):
     _, std = regressor.predict(X, return_std=True)
-    query_idx = np.argmax(std)
-    return query_idx, X[query_idx]
+    return np.argmax(std)
 
 
 # generating the data

examples/custom_query_strategies.py

Lines changed: 5 additions & 8 deletions

@@ -5,18 +5,16 @@
 
 The first two arguments of a query strategy function are always the estimator and the pool
 of instances to be queried from. Additional arguments are accepted as keyword arguments.
-A valid query strategy function always returns a tuple of the indices of the queried
-instances and the instances themselves.
+A valid query strategy function always returns the indices of the queried
+instances.
 
     def custom_query_strategy(classifier, X, a_keyword_argument=42):
         # measure the utility of each instance in the pool
         utility = utility_measure(classifier, X)
 
-        # select the indices of the instances to be queried
-        query_idx = select_instances(utility)
+        # select and return the indices of the instances to be queried
+        return select_instances(utility)
 
-        # return the indices and the instances
-        return query_idx, X[query_idx]
 
 This function can be used in the active learning workflow.
 
@@ -97,8 +95,7 @@ def custom_query_strategy(classifier, X, a_keyword_argument=42):
 # classifier uncertainty and classifier margin
 def custom_query_strategy(classifier, X, n_instances=1):
     utility = linear_combination(classifier, X)
-    query_idx = multi_argmax(utility, n_instances=n_instances)
-    return query_idx, X[query_idx]
+    return multi_argmax(utility, n_instances=n_instances)
 
 custom_query_learner = ActiveLearner(
     estimator=GaussianProcessClassifier(1.0 * RBF(1.0)),
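As the docstring notes, extra arguments reach a strategy as keyword arguments forwarded through `query()`; a hedged sketch of that plumbing under the new single-return convention (the utility function and data are toy stand-ins invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from modAL.models import ActiveLearner


def custom_query_strategy(classifier, X, a_keyword_argument=42):
    # toy utility: higher where the positive-class probability is nearest the threshold
    utility = -np.abs(classifier.predict_proba(X)[:, 1] - a_keyword_argument / 100)
    return np.argmax(utility)


rng = np.random.default_rng(0)
X_pool = rng.normal(size=(50, 2))
learner = ActiveLearner(
    estimator=LogisticRegression(),
    query_strategy=custom_query_strategy,
    X_training=rng.normal(size=(10, 2)),
    y_training=np.array([0, 1] * 5),
)
# keyword arguments passed to query() are forwarded to the strategy
query_idx, query_instance = learner.query(X_pool, a_keyword_argument=60)
```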

examples/deep_bayesian_active_learning.py

Lines changed: 2 additions & 4 deletions

@@ -62,12 +62,10 @@ def max_entropy(learner, X, n_instances=1, T=100):
     expected_p = np.mean(MC_samples, axis=0)
     acquisition = - np.sum(expected_p * np.log(expected_p + 1e-10), axis=-1)  # [batch size]
     idx = (-acquisition).argsort()[:n_instances]
-    query_idx = random_subset[idx]
-    return query_idx, X[query_idx]
+    return random_subset[idx]
 
 def uniform(learner, X, n_instances=1):
-    query_idx = np.random.choice(range(len(X)), size=n_instances, replace=False)
-    return query_idx, X[query_idx]
+    return np.random.choice(range(len(X)), size=n_instances, replace=False)
 
 """
 Training the ActiveLearner
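The acquisition in `max_entropy` averages T stochastic forward passes and ranks instances by predictive entropy. A self-contained NumPy sketch of just that computation, with dummy probabilities standing in for MC-dropout outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
T, batch, n_classes = 100, 8, 10
# dummy stand-in for T MC-dropout forward passes: [T x batch x #classes]
MC_samples = rng.dirichlet(np.ones(n_classes), size=(T, batch))

expected_p = np.mean(MC_samples, axis=0)                                 # [batch x #classes]
acquisition = -np.sum(expected_p * np.log(expected_p + 1e-10), axis=-1)  # predictive entropy
n_instances = 2
print((-acquisition).argsort()[:n_instances])  # indices of the highest-entropy items
```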

examples/shape_learning.py

Lines changed: 1 addition & 2 deletions

@@ -57,8 +57,7 @@
 
 
 def random_sampling(classsifier, X):
-    query_idx = np.random.randint(len(X))
-    return query_idx, X[query_idx]
+    return np.random.randint(len(X))
 
 
 X_pool = deepcopy(X_full)

modAL/acquisition.py

Lines changed: 6 additions & 12 deletions

@@ -104,7 +104,7 @@ def optimizer_UCB(optimizer: BaseLearner, X: modALinput, beta: float = 1) -> np.ndarray:
 
 
 def max_PI(optimizer: BaseLearner, X: modALinput, tradeoff: float = 0,
-           n_instances: int = 1) -> Tuple[np.ndarray, modALinput]:
+           n_instances: int = 1) -> np.ndarray:
     """
     Maximum PI query strategy. Selects the instance with highest probability of improvement.
 
@@ -118,13 +118,11 @@ def max_PI(optimizer: BaseLearner, X: modALinput, tradeoff: float = 0,
         The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.
     """
     pi = optimizer_PI(optimizer, X, tradeoff=tradeoff)
-    query_idx = multi_argmax(pi, n_instances=n_instances)
-
-    return query_idx, X[query_idx]
+    return multi_argmax(pi, n_instances=n_instances)
 
 
 def max_EI(optimizer: BaseLearner, X: modALinput, tradeoff: float = 0,
-           n_instances: int = 1) -> Tuple[np.ndarray, modALinput]:
+           n_instances: int = 1) -> np.ndarray:
     """
     Maximum EI query strategy. Selects the instance with highest expected improvement.
 
@@ -138,13 +136,11 @@ def max_EI(optimizer: BaseLearner, X: modALinput, tradeoff: float = 0,
         The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.
     """
     ei = optimizer_EI(optimizer, X, tradeoff=tradeoff)
-    query_idx = multi_argmax(ei, n_instances=n_instances)
-
-    return query_idx, X[query_idx]
+    return multi_argmax(ei, n_instances=n_instances)
 
 
 def max_UCB(optimizer: BaseLearner, X: modALinput, beta: float = 1,
-            n_instances: int = 1) -> Tuple[np.ndarray, modALinput]:
+            n_instances: int = 1) -> np.ndarray:
     """
     Maximum UCB query strategy. Selects the instance with highest upper confidence bound.
 
@@ -158,6 +154,4 @@ def max_UCB(optimizer: BaseLearner, X: modALinput, beta: float = 1,
         The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.
     """
     ucb = optimizer_UCB(optimizer, X, beta=beta)
-    query_idx = multi_argmax(ucb, n_instances=n_instances)
-
-    return query_idx, X[query_idx]
+    return multi_argmax(ucb, n_instances=n_instances)
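Since each strategy now ends in a bare `multi_argmax` call, the return value is just an index array. A NumPy-only sketch approximating that helper's behavior (`multi_argmax_like` is a stand-in written for illustration, not modAL's implementation):

```python
import numpy as np


def multi_argmax_like(values, n_instances=1):
    # indices of the n_instances largest utility scores, best first
    return np.argsort(-values)[:n_instances]


ucb = np.array([0.2, 0.9, 0.5, 0.7])
print(multi_argmax_like(ucb, n_instances=2))  # [1 3]
```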

modAL/batch.py

Lines changed: 10 additions & 5 deletions

@@ -114,7 +114,7 @@ def select_instance(
     unlabeled_indices = [i for i in range(n_pool) if mask[i]]
     best_instance_index = unlabeled_indices[best_instance_index_in_unlabeled]
     mask[best_instance_index] = 0
-    return best_instance_index, np.expand_dims(X_pool[best_instance_index], axis=0), mask
+    return best_instance_index, X_pool[[best_instance_index]], mask
 
 
 def ranked_batch(classifier: Union[BaseLearner, BaseCommittee],
@@ -142,11 +142,16 @@ def ranked_batch(classifier: Union[BaseLearner, BaseCommittee],
     """
     # Make a local copy of our classifier's training data.
     # Define our record container and record the best cold start instance in the case of cold start.
+
+    # transform unlabeled data if needed
+    if classifier.on_transformed:
+        unlabeled = classifier.transform_without_estimating(unlabeled)
+
     if classifier.X_training is None:
         best_coldstart_instance_index, labeled = select_cold_start_instance(X=unlabeled, metric=metric, n_jobs=n_jobs)
         instance_index_ranking = [best_coldstart_instance_index]
     elif classifier.X_training.shape[0] > 0:
-        labeled = classifier.X_training[:]
+        labeled = classifier.Xt_training[:] if classifier.on_transformed else classifier.X_training[:]
         instance_index_ranking = []
 
     # The maximum number of records to sample.
@@ -180,7 +185,7 @@ def uncertainty_batch_sampling(classifier: Union[BaseLearner, BaseCommittee],
                                metric: Union[str, Callable] = 'euclidean',
                                n_jobs: Optional[int] = None,
                                **uncertainty_measure_kwargs
-                               ) -> Tuple[np.ndarray, Union[np.ndarray, sp.csr_matrix]]:
+                               ) -> np.ndarray:
     """
     Batch sampling query strategy. Selects the least sure instances for labelling.
 
@@ -206,6 +211,6 @@ def uncertainty_batch_sampling(classifier: Union[BaseLearner, BaseCommittee],
         Indices of the instances from `X` chosen to be labelled; records from `X` chosen to be labelled.
     """
     uncertainty = classifier_uncertainty(classifier, X, **uncertainty_measure_kwargs)
-    query_indices = ranked_batch(classifier, unlabeled=X, uncertainty_scores=uncertainty,
+    return ranked_batch(classifier, unlabeled=X, uncertainty_scores=uncertainty,
                          n_instances=n_instances, metric=metric, n_jobs=n_jobs)
-    return query_indices, X[query_indices]
+
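Downstream, nothing changes for users who go through the learner: a hedged end-to-end sketch of driving `uncertainty_batch_sampling` after this change, assuming `query()` still pairs the returned indices with the pool rows (dataset and model are toy stand-ins):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from modAL.models import ActiveLearner
from modAL.batch import uncertainty_batch_sampling

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(100, 2))
X_init = np.array([[-1.0, 0.0], [1.0, 0.0], [-2.0, 1.0], [2.0, -1.0]])
y_init = np.array([0, 1, 0, 1])

learner = ActiveLearner(
    estimator=LogisticRegression(),
    query_strategy=uncertainty_batch_sampling,
    X_training=X_init, y_training=y_init,
)
# the strategy returns indices only; query() matches them with pool rows
query_idx, query_instances = learner.query(X_pool, n_instances=5)
print(query_idx.shape, query_instances.shape)  # expected: (5,) (5, 2)
```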
