Feature/gloo spark lightfm wrapper 0.2 #63
base: main
Conversation
features_columns = features_np[:, 1:]
number_of_features = features_columns.shape[1]
# Scale down dense features
scaler_name = f"{entity}_feat_scaler"
Explicit is better than implicit:

def __init__(...):
    self.__item_feat_scaler: Optional[Scaler] = None
    self.__user_feat_scaler: Optional[Scaler] = None

def __get_scaler(self, entity: str) -> Scaler:
    # create or choose from __item_feat_scaler or __user_feat_scaler
    ...
    if self.__feat_scaler is None:
        pass
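A minimal, runnable sketch of that layout, assuming the scalers stay `MinMaxScaler` as in the current diff; the class name and the single-underscore attribute names are illustrative, not the PR's actual code:

from typing import Optional

import numpy as np
from sklearn.preprocessing import MinMaxScaler


class LightFMWrapSketch:
    """Illustrative fragment showing explicit per-entity scaler fields."""

    def __init__(self) -> None:
        # Explicit instance fields instead of dynamically named attributes.
        self._item_feat_scaler: Optional[MinMaxScaler] = None
        self._user_feat_scaler: Optional[MinMaxScaler] = None

    def _get_scaler(self, entity: str, features_columns: np.ndarray) -> MinMaxScaler:
        """Create on first use, or return, the scaler for ``entity`` ("user" or "item")."""
        if entity == "item":
            if self._item_feat_scaler is None:
                self._item_feat_scaler = MinMaxScaler().fit(features_columns)
            return self._item_feat_scaler
        if self._user_feat_scaler is None:
            self._user_feat_scaler = MinMaxScaler().fit(features_columns)
        return self._user_feat_scaler

This keeps the whole attribute set visible in `__init__` and removes the `f"{entity}_feat_scaler"` string indirection.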
Let's leave it as it is for now; the logic behind why you do what you do is clear. But add extensive comments to justify why you do it this way.
self.__setattr__(scaler_name, MinMaxScaler().fit(features_columns))

if features_columns.size:
    features_dense = self.__getattribute__(scaler_name).transform(
Replace `__getattribute__` with explicit attribute access, as sketched below.
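A hedged sketch of this call site using the hypothetical `_get_scaler` helper from the sketch above, instead of `__setattr__`/`__getattribute__` with a dynamically built attribute name (the method name is illustrative):

def _scale_dense_features(self, entity: str, features_columns: np.ndarray) -> np.ndarray:
    """Scale the dense feature columns with the per-entity scaler."""
    scaler = self._get_scaler(entity, features_columns)
    if features_columns.size:
        return scaler.transform(features_columns)
    return features_columns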
def _convert_features_to_csr(
    self, entity_ids: DataFrame, features: Optional[DataFrame] = None
) -> Optional[sp.csr_matrix]:
    """
add inputs to the docstring
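A possible shape for that docstring, derived only from the signature above; the parameter descriptions are assumptions and the Sphinx-style `:param:` format is a guess at the project's convention:

def _convert_features_to_csr(
    self, entity_ids: DataFrame, features: Optional[DataFrame] = None
) -> Optional[sp.csr_matrix]:
    """
    Convert entity features into a sparse CSR matrix for LightFM.

    :param entity_ids: DataFrame with ids of the entities (users or items)
        whose feature rows should be built
    :param features: optional DataFrame with features for those entities;
        if ``None``, no feature matrix is built
    :return: CSR feature matrix, or ``None`` when ``features`` is ``None``
    """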
"user_bias_momentum", | ||
) | ||
losses = ["warp", "bpr", "warp-kos", "logistic"] | ||
user_feat_scaler: Optional[MinMaxScaler] = None |
Remove from here and move to `__init__` (field of the instance, not field of the class).
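A small before/after sketch of the difference (class names are illustrative, not the PR's):

from typing import Optional

from sklearn.preprocessing import MinMaxScaler


class WrapWithClassAttribute:
    # Class-level attribute: a single default shared by all instances.
    user_feat_scaler: Optional[MinMaxScaler] = None


class WrapWithInstanceAttribute:
    def __init__(self) -> None:
        # Instance attributes: created per object, as suggested above.
        self.user_feat_scaler: Optional[MinMaxScaler] = None
        self.item_feat_scaler: Optional[MinMaxScaler] = None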
losses = ["warp", "bpr", "warp-kos", "logistic"] | ||
user_feat_scaler: Optional[MinMaxScaler] = None | ||
item_feat_scaler: Optional[MinMaxScaler] = None | ||
logger = logging.getLogger("replay") |
Remove from here. A better option is to put it right in the module file:

logger = logging.getLogger(__name__)
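For instance, at module level (a generic Python logging pattern, not code from this PR):

import logging

# Module-level logger named after the module instead of a hard-coded "replay".
logger = logging.getLogger(__name__)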
self._initialize_model()
self._initialize_world_size_and_threads()

if user_features is not None:
if user_features is not None:
    self.can_predict_cold_users = True

->

self.can_predict_cold_users = user_features is not None

if user_features is not None:
    self.can_predict_cold_users = True
if item_features is not None:
Same as above
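Applied to the item branch this collapses to a single assignment; the `can_predict_cold_items` attribute name is an assumption, since the body of that `if` is not shown in this excerpt:

self.can_predict_cold_items = item_features is not None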
_num_epochs = self._num_epochs
_pygloo_timeout = self._pygloo_timeout_sec

def udf_to_map_on_interactions_with_index(
Not a comment, but rather a TODO for the future: this whole function needs to be reconsidered.
"""Trainer used in distributed LightFM.""" | ||
|
||
def __init__(self, model: LightFM, world_size: int, num_threads: int): | ||
self.model = model |
add docstring
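A possible docstring for that `__init__`, with parameter descriptions inferred from the argument names, so treat them as assumptions:

def __init__(self, model: LightFM, world_size: int, num_threads: int):
    """
    :param model: LightFM model instance that will be trained
    :param world_size: number of distributed workers taking part in training
    :param num_threads: number of threads each worker uses for local fitting
    """
    self.model = model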

# pylint: disable=too-many-instance-attributes, too-many-arguments
class LightFMTraining:
    """Trainer used in distributed LightFM."""
add more important details to this docstring
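A sketch of a more detailed class docstring; the specifics (per-executor partial fits, pygloo-based synchronization) are inferred from the PR title and the surrounding code, and should be checked against the actual implementation:

# pylint: disable=too-many-instance-attributes, too-many-arguments
class LightFMTraining:
    """
    Trainer used in distributed LightFM.

    Runs LightFM partial fits on Spark executors and synchronizes model
    state across ``world_size`` workers (e.g. over a pygloo collective
    group), using ``num_threads`` threads per worker.
    """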