Merged
67 changes: 43 additions & 24 deletions python/README.md
@@ -4,60 +4,74 @@

<!-- badges: start -->

[![PyPi](https://img.shields.io/pypi/v/mlverse-mall.png)](https://pypi.org/project/mlverse-mall/)
[![Python
tests](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml)
[![Code
coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)
\| [![Package
coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)
[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)

<!-- badges: end -->

Run multiple LLM predictions against a data frame. The predictions are
processed row-wise over a specified column. It works using a
pre-determined one-shot prompt, along with the current row’s content.
`mall` has been implemented for both R and Python. The prompt that is
use will depend of the type of analysis needed.
Use Large Language Models (LLMs) to run Natural Language Processing (NLP)
operations against your data. It takes advantage of the LLMs' general
language training to get the predictions, removing the need to train a
new NLP model. `mall` is available for R and Python.

Currently, the included prompts perform the following:
It works by running multiple LLM predictions against your data. The
predictions are processed row-wise over a specified column. It relies on
the “one-shot” prompt technique to instruct the LLM on a particular NLP
operation to perform. The package includes prompts to perform the
following specific NLP operations:

- [Sentiment analysis](#sentiment)
- [Text summarizing](#summarize)
- [Classify text](#classify)
- [Extract one, or several](#extract), specific pieces of information
  from the text
- [Translate text](#translate)
- [Verify that something it true](#verify) about the text (binary)
- [Custom prompt](#custom-prompt)
- [Verify that something is true](#verify) about the text (binary)

This package is inspired by the SQL AI functions now offered by vendors
such as
[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html)
and Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact
with LLMs installed locally.
For other NLP operations, `mall` lets you [write your own
prompt](#custom-prompt).

For **Python**, `mall` is a library extension to
[Polars](https://pola.rs/). To interact with Ollama, it uses the
official [Python library](https://github.com/ollama/ollama-python).
`mall` is a library extension to [Polars](https://pola.rs/). To interact
with Ollama, it uses the official [Python
library](https://github.com/ollama/ollama-python).

``` python
reviews.llm.sentiment("review")
```

## Motivation

We want to new find ways to help data scientists use LLMs in their daily
work. Unlike the familiar interfaces, such as chatting and code
We want to find new ways to help data scientists use LLMs in their
daily work. Unlike the familiar interfaces, such as chatting and code
completion, this interface runs your text data directly against the LLM.
This package is inspired by the SQL AI functions now offered by vendors
such as
[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html)
and Snowflake.

The LLM’s flexibility allows it to adapt to the subject of your data
and provide surprisingly accurate predictions. This saves the data
scientist from having to write and tune an NLP model.

In recent times, the capabilities of LLMs that can run locally on your
computer have increased dramatically. This means that this sort of
analysis can run in your machine with good accuracy. Additionally, it
makes it possible to take advantage of LLM’s at your institution, since
the data will not leave the corporate network.
analysis can run on your machine with good accuracy. It also makes it
possible to take advantage of LLMs at your institution, since the data
will not leave the corporate network. Additionally, LLM management and
integration platforms, such as [Ollama](https://ollama.com/), are now
very easy to set up and use. `mall` uses Ollama to interact with local
LLMs.

The development version of `mall` lets you **use external LLMs such as
[OpenAI](https://openai.com/), [Gemini](https://gemini.google.com/) and
[Anthropic](https://www.anthropic.com/)**. In R, `mall` uses the
[`ellmer`](https://ellmer.tidyverse.org/index.html) package to integrate
with the external LLM, and the
[`chatlas`](https://posit-dev.github.io/chatlas/) package to integrate
in Python.

## Get started

@@ -99,6 +113,11 @@ reviews = data.reviews
reviews
```

/Users/edgar/Projects/mall/python/.venv/lib/python3.12/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_format" in ContentToolResult has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(

| review |
|----|
| "This has been the best TV I've ever used. Great screen, and sound." |
72 changes: 40 additions & 32 deletions python/README.qmd
@@ -7,56 +7,64 @@ execute:
<img src="https://mlverse.github.io/mall/site/images/favicon/apple-touch-icon-180x180.png" style="float:right" />

<!-- badges: start -->
[![Python tests](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml)
[![Code coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->

[![PyPi](https://img.shields.io/pypi/v/mlverse-mall)](https://pypi.org/project/mlverse-mall/) [![Python tests](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml) \| [![Package coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)

<!-- badges: end -->


Run multiple LLM predictions against a data frame. The predictions are processed
row-wise over a specified column. It works using a pre-determined one-shot prompt,
along with the current row's content. `mall` has been implemented for both R
and Python. The prompt that is use will depend of the type of analysis needed.
Use Large Language Models (LLMs) to run Natural Language Processing (NLP)
operations against your data. It takes advantage of the LLMs' general language
training to get the predictions, removing the need to train a new
NLP model. `mall` is available for R and Python.

Currently, the included prompts perform the following:
It works by running multiple LLM predictions against your data. The predictions
are processed row-wise over a specified column. It relies on the "one-shot"
prompt technique to instruct the LLM on a particular NLP operation to perform.
The package includes prompts to perform the following specific NLP operations:

- [Sentiment analysis](#sentiment)
- [Text summarizing](#summarize)
- [Classify text](#classify)
- [Extract one, or several](#extract), specific pieces information from the text
- [Translate text](#translate)
- [Verify that something it true](#verify) about the text (binary)
- [Custom prompt](#custom-prompt)
- [Sentiment analysis](#sentiment)
- [Text summarizing](#summarize)
- [Classify text](#classify)
- [Extract one, or several](#extract), specific pieces of information from the text
- [Translate text](#translate)
- [Verify that something is true](#verify) about the text (binary)

This package is inspired by the SQL AI functions now offered by vendors such as
[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html)
and Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact with LLMs
installed locally.
For other NLP operations, `mall` lets you [write your own prompt](#custom-prompt).

For **Python**, `mall` is a library extension to [Polars](https://pola.rs/). To
`mall` is a library extension to [Polars](https://pola.rs/). To
interact with Ollama, it uses the official
[Python library](https://github.com/ollama/ollama-python).

```python
reviews.llm.sentiment("review")
```

## Motivation

We want to new find ways to help data scientists use LLMs in their daily work.
Unlike the familiar interfaces, such as chatting and code completion, this interface
runs your text data directly against the LLM.
We want to find new ways to help data scientists use LLMs in their daily work.
Unlike the familiar interfaces, such as chatting and code completion, this
interface runs your text data directly against the LLM. This package is inspired
by the SQL AI functions now offered by vendors such as [Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html)
and Snowflake.

The LLM's flexibility, allows for it to adapt to the subject of your data, and
provide surprisingly accurate predictions. This saves the data scientist the
need to write and tune an NLP model.
The LLM's flexibility allows it to adapt to the subject of your data and
provide surprisingly accurate predictions. This saves the data scientist
from having to write and tune an NLP model.

In recent times, the capabilities of LLMs that can run locally on your computer
have increased dramatically. This means that these sort of analysis can run
in your machine with good accuracy. Additionally, it makes it possible to take
advantage of LLM's at your institution, since the data will not leave the
corporate network.
have increased dramatically. This means that this sort of analysis can run on
your machine with good accuracy. It also makes it possible to take
advantage of LLMs at your institution, since the data will not leave the
corporate network. Additionally, LLM management and integration platforms, such
as [Ollama](https://ollama.com/), are now very easy to set up and use. `mall`
uses Ollama to interact with local LLMs.

The development version of `mall` lets you **use external LLMs such as
[OpenAI](https://openai.com/), [Gemini](https://gemini.google.com/) and
[Anthropic](https://www.anthropic.com/)**. In R, `mall` uses the
[`ellmer`](https://ellmer.tidyverse.org/index.html)
package to integrate with the external LLM, and the
[`chatlas`](https://posit-dev.github.io/chatlas/) package to integrate in Python.

## Get started

10 changes: 7 additions & 3 deletions python/mall/llm.py
@@ -54,15 +54,19 @@

hash_call = build_hash(call)
cache = cache_check(hash_call, use)

if cache == "":
if backend == "chatlas":
chat = use.get("chat")
ch = chat.chat(msg[0].get("content") + x, echo="none")
out = ch.get_content()
chat.set_turns(list())
if backend == "ollama":
resp = ollama.chat(
if backend == "ollama" or backend == "ollama-client":
if backend == "ollama":
chat_fun = ollama.chat

[Codecov: added lines #L64–L65 in python/mall/llm.py not covered by tests]
else:
client = use.get("client")
chat_fun = client.chat
resp = chat_fun(

[Codecov: added lines #L67–L69 in python/mall/llm.py not covered by tests]
model=use.get("model"),
messages=build_msg(x, msg),
options=use.get("options"),
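The change above collapses the two Ollama paths into a single `chat_fun` callable: the module-level `ollama.chat` for the `"ollama"` backend, or the bound `chat` method of a user-supplied `Client` for `"ollama-client"`, invoked with the same keyword arguments either way. A minimal, self-contained sketch of that dispatch, using stand-in names rather than the real `ollama` objects:

``` python
def pick_chat_fun(use: dict, module_chat):
    """Choose the chat callable based on the configured backend (sketch)."""
    backend = use.get("backend")
    if backend == "ollama":
        return module_chat               # module-level ollama.chat
    if backend == "ollama-client":
        return use.get("client").chat    # bound method of the user's Client
    raise ValueError(f"unsupported backend: {backend}")

# Stand-ins for ollama.chat and ollama.Client, for illustration only.
def fake_module_chat(**kwargs):
    return "module-level chat"

class FakeClient:
    def chat(self, **kwargs):
        return "client-bound chat"

a = pick_chat_fun({"backend": "ollama"}, fake_module_chat)()
b = pick_chat_fun({"backend": "ollama-client", "client": FakeClient()},
                  fake_module_chat)()
```

Because both callables accept identical keyword arguments (`model=`, `messages=`, `options=`, and so on), the single `resp = chat_fun(...)` call site stays unchanged regardless of which backend was configured.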
14 changes: 10 additions & 4 deletions python/mall/polars.py
@@ -1,3 +1,4 @@
from ollama import Client
from chatlas import Chat
import polars as pl

@@ -45,9 +46,10 @@

Parameters
------
backend : str | Chat
The name of the backend to use, or a `chatlas` chat object.
At the beginning of the session it defaults to "ollama".
backend : str | Chat | Client
The name of the backend to use, or an Ollama Client object,
or a `chatlas` Chat object.
At the beginning of the session it defaults to "ollama".
If passing `""`, it will remain unchanged.
model : str
The name of the model that the backend should use. At the beginning
@@ -87,7 +89,7 @@
```

```{python}
# Use a `chatlas` object
# Use a `chatlas` object
from chatlas import ChatOpenAI
chat = ChatOpenAI()
reviews.llm.use(chat)
@@ -98,6 +100,10 @@
self._use.update(dict(chat=backend))
backend = ""
model = ""
if isinstance(backend, Client):
self._use.update(dict(backend="ollama-client"))
self._use.update(dict(client=backend))
backend = ""

[Codecov: added lines #L104–L106 in python/mall/polars.py not covered by tests]
if backend != "":
self._use.update(dict(backend=backend))
if model != "":
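The new `isinstance(backend, Client)` branch mirrors the existing `Chat` handling: a rich object is stashed in the session state under its own key, the backend name is recorded, and the string arguments then fall through the generic updates unchanged. A self-contained sketch of that normalization, with a stand-in class instead of the real `ollama.Client`:

``` python
class FakeClient:
    """Stand-in for ollama.Client, for illustration only."""

def normalize_backend(backend, state: dict) -> dict:
    # A Client object selects the "ollama-client" backend and is kept in
    # the session state; after that, `backend` is blanked so the generic
    # string handling below does not overwrite the recorded name.
    if isinstance(backend, FakeClient):
        state.update(backend="ollama-client", client=backend)
        backend = ""
    if backend != "":
        state.update(backend=backend)
    return state

with_client = normalize_backend(FakeClient(), {})
with_string = normalize_backend("ollama", {})
```

Blanking `backend` (and, in the real method, `model`) after consuming the object is what lets one `use()` entry point accept strings, `Chat` objects, and `Client` objects without separate code paths downstream.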
2 changes: 1 addition & 1 deletion python/pyproject.toml
@@ -3,7 +3,7 @@ packages = ["mall"]

[project]
name = "mlverse-mall"
version = "0.1.0.9000"
version = "0.1.0.9001"
description = "Run multiple 'Large Language Model' predictions against a table. The predictions run row-wise over a specified column."
readme = "README.md"
authors = [
1 change: 0 additions & 1 deletion r/man/llm_use.Rd


4 changes: 2 additions & 2 deletions r/tests/testthat/_snaps/llm-use.md
@@ -26,7 +26,7 @@

-- mall session object
Backend: ellmer
LLM session: model:gpt-4o
LLM session: model:gpt-4.1

# Ensures empty llm_use works with Chat

@@ -36,5 +36,5 @@

-- mall session object
Backend: ellmer
LLM session: model:gpt-4o
LLM session: model:gpt-4.1