Add first implementation of scaling functions #1197 (Draft)
franflame wants to merge 14 commits into CCSI-Toolset:master from franflame:scaling
Changes from all commits (14 commits)
bcf7593 Add first implementation of scaling functions (franflame)
3e4ff22 Implementing scaling.py functions in plugins (franflame)
bc3a020 Formatting (franflame)
90500dc Remove commented code and add documentation for BaseScaler (franflame)
490d9f2 Passing hypothesis tests for scaling.py (franflame)
730046e Parametrize and revise scaler object test function (franflame)
ff571ed Handle error case for very small values in scale_log (franflame)
193d457 Try adding GUI test for scaling option select (franflame)
859ac94 Add support for text hints in table row search (lbianchi-lbl)
7193751 Add quotes to search hint since they are present in the widget (lbianchi-lbl)
45ca80f Format with Black (lbianchi-lbl)
21f2b42 Add scaling variants to surrogate GUI test (franflame)
a9867fd Merge branch 'master' into scaling (sotorrio1)
3d7b9f8 Merge branch 'master' into scaling (sotorrio1)
New file (@@ -0,0 +1,202 @@):

import copy
import json
import logging
import math
from collections import OrderedDict

import numpy as np
import pandas as pd
from typing import Tuple
def validate_for_scaling(array_in, lo, hi) -> None:
    if not np.all(np.isfinite(array_in)):
        raise ValueError("Input data cannot contain NaN or inf values")
    if array_in.ndim != 1:
        raise ValueError("Only 1D arrays supported")
    if array_in.size < 2:
        raise ValueError("Array must have at least 2 values")
    if np.allclose(lo, hi):
        raise ValueError("Array must contain non-identical values")
    if not check_under_or_overflow(array_in):
        raise ValueError("Array contains under/overflow values for dtype")

def check_under_or_overflow(arr):
    if np.issubdtype(arr.dtype, np.integer):
        info = np.iinfo(arr.dtype)
    elif np.issubdtype(arr.dtype, np.floating):
        info = np.finfo(arr.dtype)
    else:
        raise ValueError("Unsupported data type")
    max_value = info.max
    min_value = info.min
    return np.all(arr < max_value) & np.all(arr > min_value)

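As an illustration (not part of the PR), the validation helper rejects NaNs, non-1D arrays, and constant ranges. The body below is copied from the diff, with the dtype under/overflow check omitted so the snippet stands alone:

```python
import numpy as np

def validate_for_scaling(array_in, lo, hi) -> None:
    # Copied from the diff above; check_under_or_overflow() omitted for brevity.
    if not np.all(np.isfinite(array_in)):
        raise ValueError("Input data cannot contain NaN or inf values")
    if array_in.ndim != 1:
        raise ValueError("Only 1D arrays supported")
    if array_in.size < 2:
        raise ValueError("Array must have at least 2 values")
    if np.allclose(lo, hi):
        raise ValueError("Array must contain non-identical values")

# A finite, 1D, non-constant array passes silently.
validate_for_scaling(np.array([1.0, 2.0, 3.0]), 1.0, 3.0)

# NaN values, a constant range (lo == hi), and 2D input are all rejected.
for bad_arr, lo, hi in [
    (np.array([1.0, np.nan]), 1.0, 1.0),  # NaN
    (np.array([5.0, 5.0]), 5.0, 5.0),     # lo == hi
    (np.ones((2, 2)), 0.0, 1.0),          # not 1D
]:
    try:
        validate_for_scaling(bad_arr, lo, hi)
    except ValueError as e:
        print(e)
```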
def scale_linear(array_in, lo=None, hi=None):
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    validate_for_scaling(array_in, lo, hi)
    if (hi - lo) == 0:
        result = 0
    else:
        result = (array_in - lo) / (hi - lo)
    return result

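A quick sketch of scale_linear in action (illustrative only; the function is copied from the diff, with validation omitted so the snippet is self-contained):

```python
import numpy as np

def scale_linear(array_in, lo=None, hi=None):
    # Copied from the diff above; validate_for_scaling() omitted for brevity.
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    return (array_in - lo) / (hi - lo)

arr = np.array([10.0, 15.0, 20.0])
print(scale_linear(arr))         # maps [min, max] onto [0, 1]: 0, 0.5, 1
print(scale_linear(arr, 0, 40))  # explicit bounds: 0.25, 0.375, 0.5
```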
def scale_log(array_in, lo=None, hi=None):
    # need to account for log domain
    epsilon = 1e-8
    if np.any(array_in < epsilon):
        raise ValueError(f"All values must be greater than {epsilon}")
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    validate_for_scaling(array_in, lo, hi)
    result = (np.log10(array_in) - np.log10(lo)) / (np.log10(hi) - np.log10(lo))
    return result

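scale_log maps decades evenly onto [0, 1] and rejects values at or below its epsilon cutoff. An illustrative sketch (function copied from the diff, validation omitted):

```python
import numpy as np

def scale_log(array_in, lo=None, hi=None):
    # Copied from the diff above; validate_for_scaling() omitted for brevity.
    epsilon = 1e-8
    if np.any(array_in < epsilon):
        raise ValueError(f"All values must be greater than {epsilon}")
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    return (np.log10(array_in) - np.log10(lo)) / (np.log10(hi) - np.log10(lo))

arr = np.array([1.0, 10.0, 100.0])
print(scale_log(arr))  # one decade per step: 0, 0.5, 1

try:
    scale_log(np.array([0.0, 1.0]))  # zero is below epsilon
except ValueError as e:
    print(e)
```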
def scale_log2(array_in, lo=None, hi=None):
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    validate_for_scaling(array_in, lo, hi)
    result = np.log10(9 * (array_in - lo) / (hi - lo) + 1)
    return result

def scale_power(array_in, lo=None, hi=None):
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    validate_for_scaling(array_in, lo, hi)
    result = (np.power(10, array_in) - np.power(10, lo)) / (
        np.power(10, hi) - np.power(10, lo)
    )
    return result

def scale_power2(array_in, lo=None, hi=None):
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    validate_for_scaling(array_in, lo, hi)
    result = 1 / 9 * (np.power(10, (array_in - lo) / (hi - lo)) - 1)
    return result

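To contrast the two power variants (illustrative; functions copied from the diff, validation omitted): scale_power exponentiates the raw values before normalizing, while scale_power2 exponentiates the already-normalized fraction, so the two are not interchangeable:

```python
import numpy as np

def scale_power(array_in, lo=None, hi=None):
    # Copied from the diff above; validation omitted for brevity.
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    return (np.power(10, array_in) - np.power(10, lo)) / (
        np.power(10, hi) - np.power(10, lo)
    )

def scale_power2(array_in, lo=None, hi=None):
    # Copied from the diff above; validation omitted for brevity.
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    return 1 / 9 * (np.power(10, (array_in - lo) / (hi - lo)) - 1)

arr = np.array([0.0, 1.0, 2.0])
print(scale_power(arr))   # (10**x - 1) / 99: 0, ~0.0909, 1
print(scale_power2(arr))  # (10**(x/2) - 1) / 9: 0, ~0.2403, 1
```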
def unscale_linear(array_in, lo, hi):
    result = array_in * (hi - lo) / 1.0 + lo
    return result


def unscale_log(array_in, lo, hi):
    result = lo * np.power(hi / lo, array_in)
    return result


def unscale_log2(array_in, lo, hi):
    # lo and hi are required here, matching the other unscale_* functions
    result = (np.power(10, array_in / 1.0) - 1) * (hi - lo) / 9.0 + lo
    return result


def unscale_power(array_in, lo, hi):
    result = np.log10(
        (array_in / 1.0) * (np.power(10, hi) - np.power(10, lo)) + np.power(10, lo)
    )
    return result


def unscale_power2(array_in, lo, hi):
    result = np.log10(9.0 * array_in / 1.0 + 1) * (hi - lo) + lo
    return result

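Each unscale_* function inverts its scale_* counterpart over the fitted range. A round-trip sketch for the log pair (copied from the diff, validation omitted):

```python
import numpy as np

def scale_log(array_in, lo, hi):
    # Abridged from the diff above (epsilon check and validation omitted).
    return (np.log10(array_in) - np.log10(lo)) / (np.log10(hi) - np.log10(lo))

def unscale_log(array_in, lo, hi):
    # Copied from the diff above.
    return lo * np.power(hi / lo, array_in)

arr = np.array([2.0, 5.0, 50.0])
lo, hi = arr.min(), arr.max()
# unscale_log(s) = lo * (hi/lo)**s = lo * 10**(s * (log10(hi) - log10(lo))),
# which cancels the scaling exactly.
roundtrip = unscale_log(scale_log(arr, lo, hi), lo, hi)
print(np.allclose(roundtrip, arr))  # True
```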
class BaseScaler:
    """BaseScaler is the base class for the scaler classes defined
    below. It exposes the transformer interface from scikit-learn,
    and is not supposed to be instantiated directly."""

    def fit(self, X: np.ndarray):
        self.lo_ = np.min(X)
        self.hi_ = np.max(X)
        return self

    def fit_transform(self, X: np.ndarray) -> np.ndarray:
        return self.fit(X).transform(X)

    def transform(self, X: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    def inverse_transform(self, X: np.ndarray) -> np.ndarray:
        raise NotImplementedError

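A sketch of how the fitted state flows through the base class (IdentityScaler is a hypothetical subclass used only for illustration, not part of the PR):

```python
import numpy as np

class BaseScaler:
    # Abridged from the diff above.
    def fit(self, X):
        self.lo_ = np.min(X)
        self.hi_ = np.max(X)
        return self

    def fit_transform(self, X):
        return self.fit(X).transform(X)

    def transform(self, X):
        raise NotImplementedError

class IdentityScaler(BaseScaler):
    # Hypothetical subclass: overrides transform() but reuses fit().
    def transform(self, X):
        return X

s = IdentityScaler()
out = s.fit_transform(np.array([3.0, 1.0, 4.0]))
print(s.lo_, s.hi_)  # fit() recorded the data bounds: 1.0 4.0
```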
class LinearScaler(BaseScaler):
    def transform(self, X: np.ndarray) -> np.ndarray:
        return scale_linear(X, self.lo_, self.hi_)

    def inverse_transform(self, X: np.ndarray) -> np.ndarray:
        return unscale_linear(X, self.lo_, self.hi_)


class LogScaler(BaseScaler):
    def transform(self, X: np.ndarray) -> np.ndarray:
        return scale_log(X, self.lo_, self.hi_)

    def inverse_transform(self, X: np.ndarray) -> np.ndarray:
        return unscale_log(X, self.lo_, self.hi_)


class LogScaler2(BaseScaler):
    def transform(self, X: np.ndarray) -> np.ndarray:
        return scale_log2(X, self.lo_, self.hi_)

    def inverse_transform(self, X: np.ndarray) -> np.ndarray:
        return unscale_log2(X, self.lo_, self.hi_)


class PowerScaler(BaseScaler):
    def transform(self, X: np.ndarray) -> np.ndarray:
        return scale_power(X, self.lo_, self.hi_)

    def inverse_transform(self, X: np.ndarray) -> np.ndarray:
        return unscale_power(X, self.lo_, self.hi_)


class PowerScaler2(BaseScaler):
    def transform(self, X: np.ndarray) -> np.ndarray:
        return scale_power2(X, self.lo_, self.hi_)

    def inverse_transform(self, X: np.ndarray) -> np.ndarray:
        return unscale_power2(X, self.lo_, self.hi_)

map_name_to_scaler = {
    "Linear": LinearScaler(),
    "Log": LogScaler(),
    "Log2": LogScaler2(),
    "Power": PowerScaler(),
    "Power2": PowerScaler2(),
}

def scale_dataframe(df: pd.DataFrame, scaler: BaseScaler) -> Tuple[pd.DataFrame, dict]:
    scaled_df = pd.DataFrame(np.nan, columns=df.columns, index=df.index)
    bounds = {}

    for col_name in df:
        unscaled_col_data = df[col_name]
        scaled_col_data = scaler.fit_transform(unscaled_col_data)
        bounds[col_name] = scaler.lo_, scaler.hi_
        scaled_df.loc[:, col_name] = scaled_col_data

    return scaled_df, bounds
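A usage sketch for scale_dataframe: each column is fitted and scaled independently, and the per-column (lo, hi) bounds come back alongside the scaled frame. Illustrative only; the pieces are copied from the diff, with validation omitted so the snippet stands alone:

```python
import numpy as np
import pandas as pd

def scale_linear(array_in, lo=None, hi=None):
    # Copied from the diff above; validation omitted for brevity.
    if lo is None:
        lo = np.min(array_in)
    if hi is None:
        hi = np.max(array_in)
    return (array_in - lo) / (hi - lo)

class LinearScaler:
    # Abridged from the diff above (inverse_transform omitted).
    def fit(self, X):
        self.lo_ = np.min(X)
        self.hi_ = np.max(X)
        return self

    def fit_transform(self, X):
        return self.fit(X).transform(X)

    def transform(self, X):
        return scale_linear(X, self.lo_, self.hi_)

def scale_dataframe(df, scaler):
    # Copied from the diff above.
    scaled_df = pd.DataFrame(np.nan, columns=df.columns, index=df.index)
    bounds = {}
    for col_name in df:
        scaled_col_data = scaler.fit_transform(df[col_name])
        bounds[col_name] = scaler.lo_, scaler.hi_
        scaled_df.loc[:, col_name] = scaled_col_data
    return scaled_df, bounds

df = pd.DataFrame({"x": [0.0, 5.0, 10.0], "y": [100.0, 200.0, 300.0]})
scaled, bounds = scale_dataframe(df, LinearScaler())
print(scaled["x"].tolist())  # [0.0, 0.5, 1.0]
print(bounds["y"])           # (100.0, 300.0)
```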
Review comment: I may be misunderstanding the usage of the annotations here, but what is the outcome of this class? It seems that arrays that are transformed will raise exceptions for any input.
Reply: I'll add a comment to explain the purpose of the BaseScaler class; transform() and inverse_transform() should be implemented by the derived classes, so calling either one on the base class raises an error.
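A minimal check of the behavior described in that reply (abridged from the diff; illustrative only):

```python
import numpy as np

class BaseScaler:
    # Abridged from the diff above: transform() is intentionally left
    # unimplemented in the base class.
    def fit(self, X):
        self.lo_ = np.min(X)
        self.hi_ = np.max(X)
        return self

    def transform(self, X):
        raise NotImplementedError

class LinearScaler(BaseScaler):
    # Derived class supplies the actual transform.
    def transform(self, X):
        return (X - self.lo_) / (self.hi_ - self.lo_)

arr = np.array([0.0, 2.0, 4.0])

try:
    BaseScaler().fit(arr).transform(arr)  # base class: always raises
except NotImplementedError:
    print("BaseScaler.transform raises NotImplementedError")

print(LinearScaler().fit(arr).transform(arr))  # derived class works: 0, 0.5, 1
```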