Commit 05aabe0

Merge pull request #54 from mlverse/updates
Updates
2 parents 9734180 + b7cb43b commit 05aabe0

7 files changed: 103 additions & 67 deletions

python/README.md

Lines changed: 43 additions & 24 deletions

@@ -4,60 +4,74 @@
 
 <!-- badges: start -->
 
+[![PyPi](https://img.shields.io/pypi/v/mlverse-mall.png)](https://pypi.org/project/mlverse-mall/)
 [![Python
 tests](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml)
-[![Code
+\| [![Package
 coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)
-[![Lifecycle:
-experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
+
 <!-- badges: end -->
 
-Run multiple LLM predictions against a data frame. The predictions are
-processed row-wise over a specified column. It works using a
-pre-determined one-shot prompt, along with the current row’s content.
-`mall` has been implemented for both R and Python. The prompt that is
-use will depend of the type of analysis needed.
+Use Large Language Models (LLM) to run Natural Language Processing (NLP)
+operations against your data. It takes advantage of the LLMs general
+language training in order to get the predictions, thus removing the
+need to train a new NLP model. `mall` is available for R and Python.
 
-Currently, the included prompts perform the following:
+It works by running multiple LLM predictions against your data. The
+predictions are processed row-wise over a specified column. It relies on
+the “one-shot” prompt technique to instruct the LLM on a particular NLP
+operation to perform. The package includes prompts to perform the
+following specific NLP operations:
 
 - [Sentiment analysis](#sentiment)
 - [Text summarizing](#summarize)
 - [Classify text](#classify)
 - [Extract one, or several](#extract), specific pieces information from
   the text
 - [Translate text](#translate)
-- [Verify that something it true](#verify) about the text (binary)
-- [Custom prompt](#custom-prompt)
+- [Verify that something is true](#verify) about the text (binary)
 
-This package is inspired by the SQL AI functions now offered by vendors
-such as
-[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html)
-and Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact
-with LLMs installed locally.
+For other NLP operations, `mall` offers the ability for you to [write
+your own prompt](#custom-prompt).
 
-For **Python**, `mall` is a library extension to
-[Polars](https://pola.rs/). To interact with Ollama, it uses the
-official [Python library](https://github.com/ollama/ollama-python).
+`mall` is a library extension to [Polars](https://pola.rs/). To interact
+with Ollama, it uses the official [Python
+library](https://github.com/ollama/ollama-python).
 
 ``` python
 reviews.llm.sentiment("review")
 ```
 
 ## Motivation
 
-We want to new find ways to help data scientists use LLMs in their daily
-work. Unlike the familiar interfaces, such as chatting and code
+We want to new find new ways to help data scientists use LLMs in their
+daily work. Unlike the familiar interfaces, such as chatting and code
 completion, this interface runs your text data directly against the LLM.
+This package is inspired by the SQL AI functions now offered by vendors
+such as
+[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html)
+and Snowflake.
 
 The LLM’s flexibility, allows for it to adapt to the subject of your
 data, and provide surprisingly accurate predictions. This saves the data
 scientist the need to write and tune an NLP model.
 
 In recent times, the capabilities of LLMs that can run locally in your
 computer have increased dramatically. This means that these sort of
-analysis can run in your machine with good accuracy. Additionally, it
-makes it possible to take advantage of LLM’s at your institution, since
-the data will not leave the corporate network.
+analysis can run in your machine with good accuracy. It also makes it
+possible to take advantage of LLMs at your institution, since the data
+will not leave the corporate network. Additionally, LLM management and
+integration platforms, such as [Ollama](https://ollama.com/), are now
+very easy to setup and use. `mall` uses Ollama as to interact with local
+LLMs.
+
+The development version of `mall` lets you **use external LLMs such as
+[OpenAI](https://openai.com/), [Gemini](https://gemini.google.com/) and
+[Anthropic](https://www.anthropic.com/)**. In R, `mall` uses the
+[`ellmer`](https://ellmer.tidyverse.org/index.html) package to integrate
+with the external LLM, and the
+[`chatlas`](https://posit-dev.github.io/chatlas/) package to integrate
+in Python.
 
 ## Get started
 

@@ -99,6 +113,11 @@ reviews = data.reviews
 reviews
 ```
 
+    /Users/edgar/Projects/mall/python/.venv/lib/python3.12/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_format" in ContentToolResult has conflict with protected namespace "model_".
+
+    You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
+      warnings.warn(
+
 | review |
 |----|
 | "This has been the best TV I've ever used. Great screen, and sound." |
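The row-wise, "one-shot" flow the README text describes can be sketched in plain Python. This is purely illustrative: the helper name and the prompt wording below are hypothetical, not mall's internal prompt.

```python
def one_shot_prompt(instruction: str, row_text: str) -> str:
    # A fixed, pre-written instruction is combined with the current
    # row's content; the LLM receives one such prompt per row.
    return f"{instruction}\n\nText: {row_text}"

reviews = [
    "This has been the best TV I've ever used. Great screen, and sound.",
    "The remote stopped working after a week.",
]
instruction = "Classify the sentiment of the text as positive or negative."
prompts = [one_shot_prompt(instruction, r) for r in reviews]
print(len(prompts))
```

Each prompt is independent, which is why the predictions can be processed over a column one row at a time.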

python/README.qmd

Lines changed: 40 additions & 32 deletions

@@ -7,56 +7,64 @@ execute:
 <img src="https://mlverse.github.io/mall/site/images/favicon/apple-touch-icon-180x180.png" style="float:right" />
 
 <!-- badges: start -->
-[![Python tests](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml)
-[![Code coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)
-[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
-<!-- badges: end -->
 
+[![PyPi](https://img.shields.io/pypi/v/mlverse-mall)](https://pypi.org/project/mlverse-mall/) [![Python tests](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml) \| [![Package coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)
+
+<!-- badges: end -->
 
 
-Run multiple LLM predictions against a data frame. The predictions are processed
-row-wise over a specified column. It works using a pre-determined one-shot prompt,
-along with the current row's content. `mall` has been implemented for both R
-and Python. The prompt that is use will depend of the type of analysis needed.
+Use Large Language Models (LLM) to run Natural Language Processing (NLP)
+operations against your data. It takes advantage of the LLMs general language
+training in order to get the predictions, thus removing the need to train a new
+NLP model. `mall` is available for R and Python.
 
-Currently, the included prompts perform the following:
+It works by running multiple LLM predictions against your data. The predictions
+are processed row-wise over a specified column. It relies on the "one-shot"
+prompt technique to instruct the LLM on a particular NLP operation to perform.
+The package includes prompts to perform the following specific NLP operations:
 
-- [Sentiment analysis](#sentiment)
-- [Text summarizing](#summarize)
-- [Classify text](#classify)
-- [Extract one, or several](#extract), specific pieces information from the text
-- [Translate text](#translate)
-- [Verify that something it true](#verify) about the text (binary)
-- [Custom prompt](#custom-prompt)
+- [Sentiment analysis](#sentiment)
+- [Text summarizing](#summarize)
+- [Classify text](#classify)
+- [Extract one, or several](#extract), specific pieces information from the text
+- [Translate text](#translate)
+- [Verify that something is true](#verify) about the text (binary)
 
-This package is inspired by the SQL AI functions now offered by vendors such as
-[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html)
-and Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact with LLMs
-installed locally.
+For other NLP operations, `mall` offers the ability for you to [write your own prompt](#custom-prompt).
 
-For **Python**, `mall` is a library extension to [Polars](https://pola.rs/). To
+`mall` is a library extension to [Polars](https://pola.rs/). To
 interact with Ollama, it uses the official
 [Python library](https://github.com/ollama/ollama-python).
 
 ```python
 reviews.llm.sentiment("review")
 ```
-
 ## Motivation
 
-We want to new find ways to help data scientists use LLMs in their daily work.
-Unlike the familiar interfaces, such as chatting and code completion, this interface
-runs your text data directly against the LLM.
+We want to new find new ways to help data scientists use LLMs in their daily work.
+Unlike the familiar interfaces, such as chatting and code completion, this
+interface runs your text data directly against the LLM. This package is inspired
+by the SQL AI functions now offered by vendors such as [Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html)
+and Snowflake.
 
-The LLM's flexibility, allows for it to adapt to the subject of your data, and
-provide surprisingly accurate predictions. This saves the data scientist the
-need to write and tune an NLP model.
+The LLM's flexibility, allows for it to adapt to the subject of your data, and
+provide surprisingly accurate predictions. This saves the data scientist the
+need to write and tune an NLP model.
 
 In recent times, the capabilities of LLMs that can run locally in your computer
-have increased dramatically. This means that these sort of analysis can run
-in your machine with good accuracy. Additionally, it makes it possible to take
-advantage of LLM's at your institution, since the data will not leave the
-corporate network.
+have increased dramatically. This means that these sort of analysis can run in
+your machine with good accuracy. It also makes it possible to take
+advantage of LLMs at your institution, since the data will not leave the
+corporate network. Additionally, LLM management and integration platforms, such
+as [Ollama](https://ollama.com/), are now very easy to setup and use. `mall`
+uses Ollama as to interact with local LLMs.
+
+The development version of `mall` lets you **use external LLMs such as
+[OpenAI](https://openai.com/), [Gemini](https://gemini.google.com/) and
+[Anthropic](https://www.anthropic.com/)**. In R, `mall` uses the
+[`ellmer`](https://ellmer.tidyverse.org/index.html)
+package to integrate with the external LLM, and the
+[`chatlas`](https://posit-dev.github.io/chatlas/) package to integrate in Python.
 
 ## Get started

python/mall/llm.py

Lines changed: 7 additions & 3 deletions

@@ -54,15 +54,19 @@ def llm_call(x, msg, use, valid_resps="", convert=None, data_type=None):
 
     hash_call = build_hash(call)
     cache = cache_check(hash_call, use)
-
    if cache == "":
        if backend == "chatlas":
            chat = use.get("chat")
            ch = chat.chat(msg[0].get("content") + x, echo="none")
            out = ch.get_content()
            chat.set_turns(list())
-        if backend == "ollama":
-            resp = ollama.chat(
+        if backend == "ollama" or backend == "ollama-client":
+            if backend == "ollama":
+                chat_fun = ollama.chat
+            else:
+                client = use.get("client")
+                chat_fun = client.chat
+            resp = chat_fun(
                model=use.get("model"),
                messages=build_msg(x, msg),
                options=use.get("options"),
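The branch added in this hunk keeps a single call site for both Ollama entry points: the module-level `ollama.chat` function and the bound `chat` method of a user-supplied `Client`. A stand-alone sketch of that dispatch, with fake stand-ins replacing the real `ollama` objects:

```python
class FakeClient:
    """Stand-in for ollama.Client; only the chat method matters here."""

    def chat(self, model, messages, options=None):
        return {"route": "client", "model": model}


def module_chat(model, messages, options=None):
    # Stand-in for the module-level ollama.chat function.
    return {"route": "module", "model": model}


def resolve_chat_fun(backend, use):
    # Mirrors the new llm.py logic: both callables share a signature, so
    # the resp = chat_fun(...) call after the branch does not change.
    if backend == "ollama":
        return module_chat
    if backend == "ollama-client":
        return use["client"].chat
    raise ValueError(f"unsupported backend: {backend}")


resp = resolve_chat_fun("ollama-client", {"client": FakeClient()})(
    model="llama3.2", messages=[]
)
print(resp["route"])  # → client
```

Because a bound method and a plain function are interchangeable callables in Python, the downstream `resp = chat_fun(...)` line stays identical for both backends.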

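The unchanged context lines (`build_hash`, `cache_check`) suggest a hash-keyed response cache, so a repeated identical call can skip the LLM. A plausible, dict-backed sketch of that pair under stated assumptions; mall's actual cache lives on disk and these helper bodies are guesses, not the package's code:

```python
import hashlib
import json

_store = {}  # stands in for an on-disk cache folder such as "_mall_cache"


def build_hash(call: dict) -> str:
    # Hash the whole call (model, messages, options) so that changing
    # any part of it produces a different cache key.
    payload = json.dumps(call, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def cache_check(hash_call: str) -> str:
    # Empty string means "miss", matching the `if cache == "":` guard above.
    return _store.get(hash_call, "")


def cache_save(hash_call: str, response: str) -> None:
    _store[hash_call] = response


call = {"model": "llama3.2", "messages": [{"role": "user", "content": "hi"}]}
h = build_hash(call)
if cache_check(h) == "":
    cache_save(h, "positive")  # pretend this response came from the LLM
print(cache_check(h))  # → positive
```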
python/mall/polars.py

Lines changed: 10 additions & 4 deletions

@@ -1,3 +1,4 @@
+from ollama import Client
 from chatlas import Chat
 import polars as pl
 

@@ -45,9 +46,10 @@ def use(self, backend="", model="", _cache="_mall_cache", **kwargs):
 
        Parameters
        ------
-        backend : str | Chat
-            The name of the backend to use, or a `chatlas` chat object.
-            At the beginning of the session it defaults to "ollama".
+        backend : str | Chat | Client
+            The name of the backend to use, or an Ollama Client object,
+            or a `chatlas` Chat object.
+            At the beginning of the session it defaults to "ollama".
            If passing `""`, it will remain unchanged
        model : str
            The name of the model tha the backend should use. At the beginning

@@ -87,7 +89,7 @@ def use(self, backend="", model="", _cache="_mall_cache", **kwargs):
        ```
 
        ```{python}
-        # Use a `chatlas` object
+        # Use a `chatlas` object
        from chatlas import ChatOpenAI
        chat = ChatOpenAI()
        reviews.llm.use(chat)

@@ -98,6 +100,10 @@ def use(self, backend="", model="", _cache="_mall_cache", **kwargs):
            self._use.update(dict(chat=backend))
            backend = ""
            model = ""
+        if isinstance(backend, Client):
+            self._use.update(dict(backend="ollama-client"))
+            self._use.update(dict(client=backend))
+            backend = ""
        if backend != "":
            self._use.update(dict(backend=backend))
        if model != "":
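With this change, `llm.use()` accepts three backend shapes: a string, a `chatlas` Chat, or an Ollama Client. A condensed sketch of that detection order, using stand-in classes for the two object types and a plain dict for the session state (the real method stores it on `self._use`):

```python
class Chat:
    """Stand-in for chatlas.Chat."""


class Client:
    """Stand-in for ollama.Client."""


def use(session: dict, backend="", model="") -> dict:
    # Object backends are recorded under a dedicated backend name, the
    # object itself is stashed, and `backend` is blanked so the string
    # branches below do not overwrite the choice.
    if isinstance(backend, Chat):
        session.update(backend="chatlas", chat=backend)
        backend = ""
        model = ""
    if isinstance(backend, Client):
        session.update(backend="ollama-client", client=backend)
        backend = ""
    if backend != "":
        session.update(backend=backend)
    if model != "":
        session.update(model=model)
    return session


print(use({}, Client())["backend"])  # → ollama-client
```

Blanking `backend` after each object branch is what keeps the later string checks from firing, the same trick the diff applies inside `use()`.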

python/pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ packages = ["mall"]
 
 [project]
 name = "mlverse-mall"
-version = "0.1.0.9000"
+version = "0.1.0.9001"
 description = "Run multiple 'Large Language Model' predictions against a table. The predictions run row-wise over a specified column."
 readme = "README.md"
 authors = [

r/man/llm_use.Rd

Lines changed: 0 additions & 1 deletion
Some generated files are not rendered by default.

r/tests/testthat/_snaps/llm-use.md

Lines changed: 2 additions & 2 deletions

@@ -26,7 +26,7 @@
 
 -- mall session object
 Backend: ellmer
-LLM session: model:gpt-4o
+LLM session: model:gpt-4.1
 
 # Ensures empty llm_use works with Chat
 

@@ -36,5 +36,5 @@
 
 -- mall session object
 Backend: ellmer
-LLM session: model:gpt-4o
+LLM session: model:gpt-4.1

0 commit comments