Plans to support export PMML model in R package? #296
Comments
I guess you can use https://github.com/vruusmann/jpmml-export. @vruusmann |
The current working codebase is located at https://github.com/jpmml/jpmml-r. I took a few steps in this direction but got stuck on the problem that LightGBM's R wrapper object is not a "pure" R object (e.g. a list), but an environment-type R object:
model = lgbm(...)
saveRDS(model, "model.rds") # THIS!
Such environment-type R objects are very difficult to (re-)use on other platforms. On the Java platform, the LightGBM-to-PMML conversion logic is readily available in the form of the JPMML-LightGBM library. If there were an easy way to "parse" this environment-type R object and get hold of the enclosed LightGBM model text file, then the rest would be easy. Let's keep this issue open - I'll share my thoughts here about how to make the RDS serialization of LightGBM wrapper objects a bit more intuitive. |
@vruusmann Can you write to a temp file, as in the demo below? You are right that it is an environment-type object and not a list, and that its internals are difficult to get to (unlike xgboost). That's mainly due to the OO style. Demo for dumping to a temp file and reading it back in R right away:
library(lightgbm)
data(agaricus.train, package='lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label=train$label)
data(agaricus.test, package='lightgbm')
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label=test$label)
params <- list(objective="regression", metric="l2")
valids <- list(test=dtest)
model <- lgb.train(params, dtrain, 100, valids, min_data=1, learning_rate=1, early_stopping_rounds=10)
temp <- tempfile()
lgb.save(model, temp)
readChar(temp, file.info(temp)$size) # could be what you want
Hand-edited the file extension to add .txt so it can be uploaded: file18b4140b1d3f.txt (1.199 KB) |
@Laurae2 Any plans for making Currently, the RDS serialized form of Why not embed the LightGBM text file directly into the RDS serialized form of |
@guolinke perhaps a rework of the R6 booster class might be needed (e.g. like xgboost, for better flexibility/extensibility). What is your opinion on this? It would mean removing the environment style (object oriented) and passing the booster as an argument to all the public/private functions that need it (these functions would become booster-independent). It would also allow exploring the content of the booster model directly in the environment pane in RStudio, and it could make it more obvious how to extend the class. Refers also to:
Environment pane in RStudio:
Structure access comparison:
> str(model_lgb)
Classes 'lgb.Booster', 'R6' <lgb.Booster>
Public:
add_valid: function (data, name)
best_iter: -1
current_iter: function ()
dump_model: function (num_iteration = NULL)
eval: function (data, name, feval = NULL)
eval_train: function (feval = NULL)
eval_valid: function (feval = NULL)
finalize: function ()
initialize: function (params = list(), train_set = NULL, modelfile = NULL,
predict: function (data, num_iteration = NULL, rawscore = FALSE, predleaf = FALSE,
record_evals: list
reset_parameter: function (params, ...)
rollback_one_iter: function ()
save_model: function (filename, num_iteration = NULL)
set_train_data_name: function (name)
to_predictor: function ()
update: function (train_set = NULL, fobj = NULL)
Private:
eval_names: l2
get_eval_info: function ()
handle: 1.29778246886006e-315
higher_better_inner_eval: FALSE
init_predictor: NULL
inner_eval: function (data_name, data_idx, feval = NULL)
inner_predict: function (idx)
is_predicted_cur_iter: list
name_train_set: training
name_valid_sets: list
num_class: 1
num_dataset: 2
predict_buffer: list
train_set: lgb.Dataset, R6
valid_sets: list
> str(model_xgb)
List of 8
$ handle :Class 'xgb.Booster.handle' <externalptr>
$ raw : raw [1:1099] 00 00 00 80 ...
$ niter : num 2
$ evaluation_log:Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
..$ iter : num [1:2] 1 2
..$ train_auc: num [1:2] 0.958 0.981
..$ eval_auc : num [1:2] 0.96 0.98
..- attr(*, ".internal.selfref")=<externalptr>
$ call : language xgb.train(params = param, data = dtrain, nrounds = 2, watchlist = watchlist)
$ params :List of 7
..$ max_depth : num 2
..$ eta : num 1
..$ silent : num 1
..$ nthread : num 2
..$ objective : chr "binary:logistic"
..$ eval_metric: chr "auc"
..$ silent : num 1
$ callbacks :List of 2
..$ cb.print.evaluation:function (env = parent.frame())
.. ..- attr(*, "call")= language cb.print.evaluation(period = print_every_n)
.. ..- attr(*, "name")= chr "cb.print.evaluation"
..$ cb.evaluation.log :function (env = parent.frame(), finalize = FALSE)
.. ..- attr(*, "call")= language cb.evaluation.log()
.. ..- attr(*, "name")= chr "cb.evaluation.log"
$ feature_names : chr [1:126] "cap-shape=bell" "cap-shape=conical" "cap-shape=convex" "cap-shape=flat" ...
- attr(*, "class")= chr "xgb.Booster" |
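The list-style design discussed above can be illustrated with a minimal base R sketch. Everything here is hypothetical (the class name `toy.Booster`, the constructor `new_toy_booster`, the accessor `toy_model_text`, and the placeholder model text); it is not LightGBM's or xgboost's API. It only shows that an object built as a plain list carrying its model in a `raw` field survives saveRDS/readRDS unchanged.

```r
# Hypothetical list-based booster: the model text travels with the object
# as a raw vector, so ordinary R serialization captures the full state.
new_toy_booster <- function(model_text) {
  structure(
    list(raw = charToRaw(model_text), best_iter = -1L),
    class = "toy.Booster"
  )
}

# Functions take the booster as an argument instead of being methods on it.
toy_model_text <- function(booster) {
  rawToChar(booster$raw)
}

booster <- new_toy_booster("placeholder model text")

# Round trip through RDS: nothing is lost, because there is no
# session-specific handle that the object depends on.
path <- tempfile(fileext = ".rds")
saveRDS(booster, path)
restored <- readRDS(path)
str(restored)
```

Because the object is a list, `str()` and the RStudio environment pane can show its fields directly, which is the inspection benefit mentioned in the comment above.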
My request for more "statefulness" was based on my previous experience with converting XGBoost model objects. The state of an
The current state of an
In the future, the |
Thanks @vruusmann @Laurae2 @wxchan @guolinke . |
I think downgrading to "S4" is not so easy for LightGBM. @Laurae2 Can we just add something like lgb.Booster$raw to achieve statefulness? |
@guolinke yes, if an S4 class is not a solution, I'm thinking about this:
xgboost has a similar approach for the last point: it allows resetting pointers from a loaded model so that the model is usable. Adding a $raw would also help with using saveRDS/readRDS to save/load objects from R (instead of using the provided lgb.save/lgb.load functions; in R we serialize to save objects, but we cannot do that with environments, as @vruusmann described). I think it would need a way to convert from $raw to lgb.Booster though (like a new lgb.init function to create an lgb.Booster from $raw?). This could be something like this:
@vruusmann @kevinzhangguangjin @guolinke opinions about this scheme?
Conditions:
Fixes:
|
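A rough sketch of the $raw idea from the comment above, assuming only the lgb.save and lgb.load functions already mentioned in this thread. The helper names `booster_to_raw` and `booster_from_raw` are hypothetical and are not part of the lightgbm package; the sketch goes through a temp file because that is the route demonstrated earlier in the thread.

```r
library(lightgbm)

# Hypothetical helper: capture the LightGBM model text file as a raw vector.
booster_to_raw <- function(booster) {
  path <- tempfile(fileext = ".txt")
  on.exit(unlink(path))
  lgb.save(booster, path)
  readBin(path, what = "raw", n = file.info(path)$size)
}

# Hypothetical helper: rebuild a usable booster from the raw vector.
booster_from_raw <- function(raw_model) {
  path <- tempfile(fileext = ".txt")
  on.exit(unlink(path))
  writeBin(raw_model, path)
  lgb.load(path)
}
```

With these helpers, `saveRDS(list(raw = booster_to_raw(model)), "model.rds")` writes a plain list, and `booster_from_raw(readRDS("model.rds")$raw)` restores a booster in a new session. The raw vector holds the same model text file that a converter on another platform would need.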
Asking user to Actually, my first complaint (#296 (comment)) is not about the environment-type R objects per se. R6 classes appear to be fully interoperable with the This complaint should be reinterpreted as: "it would be nice if one could use the So, for starters, introducing the |
@Laurae2 can you help to implement this? I can help after you open a PR. |
@guolinke sorry I was sick, I will try working on it now. |
It would be great to do this directly in R.