
Apply map_batches and forward in lazy mode with sequence and numeric. #249

Open
linjing-lab opened this issue Jul 26, 2024 · 0 comments

linjing-lab commented Jul 26, 2024

The map_batches call should benefit from LazyFrame query optimization and streaming computation in the forward module.

df.with_columns([
    polars.col("features").map_batches(lambda seq: NeuralNetwork.forward(seq.to_numpy())).alias("activations")
])

The pattern above runs in eager mode. If the df namespace is switched to lazy mode instead, the whole program runs both the query optimization and the forward computation when the query is collected.

df.lazy().with_columns([
    polars.col("features").map_batches(lambda seq: NeuralNetwork.forward(seq.to_numpy())).alias("activations")
]).collect()

I would prefer that NeuralNetwork work at the same numeric level as numpy.ndarray, so that the numerical forward pass stays extensible and compatible with lazy mode in streaming computation, because the lambda function only receives its buffer after query optimization has completed. The recommended solution can be described like this: col and map_batches are expressions of the lazy query, so execution happens in the collect function, when the query plan is made and memory is streamed through the expression (without occupying new memory).

The NeuralNetwork can be a perceptron or any sequence-friendly model that turns high-dimensional sequences into activations, which are then used to predict a downstream task.
