━━━━━━━━━━━━━━━━━━━━━━━
LLM PACKAGE FOR EMACS
━━━━━━━━━━━━━━━━━━━━━━━
1 Introduction
══════════════
This library provides an interface for interacting with Large Language
Models (LLMs). It allows elisp code to use LLMs while also giving
end-users the choice to select their preferred LLM. This is
particularly beneficial when working with LLMs since various
high-quality models exist, some of which have paid API access, while
others are locally installed and free but offer medium
quality. Applications using LLMs can utilize this library to ensure
compatibility regardless of whether the user has a local LLM or is
paying for API access.
This library abstracts several kinds of features:
• Chat functionality: the ability to query the LLM and get a response,
and continue to take turns writing to the LLM and receiving
responses. The library supports synchronous, asynchronous, and
streaming responses.
• Chat with images and other kinds of media inputs is also supported,
so that the user can input images and discuss them with the LLM.
• Tool use is supported, for having the LLM call elisp functions that
it chooses, with arguments it provides.
• Embeddings: Send text and receive a vector that encodes the semantic
meaning of the underlying text. Can be used in a search system to
find similar passages.
• Prompt construction: Create a prompt to give to an LLM from one or
more sources of data.
Certain functionalities might not be available in some LLMs. Any such
unsupported functionality will raise a `'not-implemented' signal, or
it may fail in some other way. Clients are recommended to check
`llm-capabilities' when trying to do something beyond basic text chat.
2 Setting up providers
══════════════════════
Users of an application that uses this package should not need to
install it themselves. The llm package should be installed as a
dependency when you install the package that uses it. However, you do
need to require the llm module and set up the provider you will be
using. Typically, applications will have a variable you can set. For
example, let's say there's a package called "llm-refactoring", which
has a variable `llm-refactoring-provider'. You would set it up like
so:
┌────
│ (use-package llm-refactoring
│   :init
│   (require 'llm-openai)
│   (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)))
└────
Here `my-openai-key' would be a variable you set up before with your
OpenAI key. Or, just substitute the key itself as a string. It's
important to remember never to check your key into a public repository
such as GitHub, because your key must be kept private. Anyone with
your key can use the API, and you will be charged.
You can also use a function as a key, so you can store your key in a
secure place and retrieve it via a function. For example, you could
add a line to `~/.authinfo.gpg':
┌────
│ machine llm.openai password <key>
└────
And then set up your provider like:
┌────
│ (setq llm-refactoring-provider (make-llm-openai :key (plist-get (car (auth-source-search :host "llm.openai")) :secret)))
└────
All of the providers (except for `llm-fake') can also take default
parameters that will be used if they are not specified in the prompt.
These are the same parameters that appear in the prompt, but prefixed
with `default-chat-'. So, for example, if you would like Ollama to be
less creative than the default, you can create your provider like:
┌────
│ (make-llm-ollama :embedding-model "mistral:latest" :chat-model "mistral:latest" :default-chat-temperature 0.1)
└────
For embedding users: if you store the embeddings, you *must* set the
embedding model. The llm package has no way to tell whether you are
storing embeddings, and if the default model changes, you may find
yourself storing incompatible embeddings.
2.1 Open AI
───────────
You can set up with `make-llm-openai', with the following parameters:
• `:key', the Open AI key that you get when you sign up to use Open
AI's APIs. Remember to keep this private. This is non-optional.
• `:chat-model': A model name from the [list of Open AI's model
names.] Keep in mind some of these are not available to everyone.
This is optional, and will default to a reasonable model.
• `:embedding-model': A model name from [list of Open AI's embedding
model names.] This is optional, and will default to a reasonable
model.
[list of Open AI's model names.]
<https://platform.openai.com/docs/models/gpt-4>
[list of Open AI's embedding model names.]
<https://platform.openai.com/docs/guides/embeddings/embedding-models>
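For example, a minimal sketch of an Open AI provider; the model names
here are only illustrations (the defaults are fine for most users),
and `my-openai-key' is assumed to hold your key as described above:
┌────
│ (setq my-provider
│       (make-llm-openai :key my-openai-key
│                        :chat-model "gpt-4o"
│                        :embedding-model "text-embedding-3-small"))
└────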
2.2 Open AI Compatible
──────────────────────
There are many Open AI compatible APIs and proxies of Open AI. You
can set up one with `make-llm-openai-compatible', with the following
parameter:
1) `:url', the URL leading up to the command ("embeddings" or
"chat/completions"). So, for example,
"<https://api.openai.com/v1/>" is the URL to use Open AI (although
if you wanted to do that, just use `make-llm-openai' instead).
2) `:chat-model': The chat model that is supported by the provider.
Some providers don't need a model to be set, but still require it
in the API, so we default to "unset".
3) `:embedding-model': An embedding model name that is supported by
the provider. This also defaults to "unset".
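As a sketch, a provider pointing at a local Open AI compatible server
might look like the following; the URL and model name are
placeholders for whatever your server exposes:
┌────
│ (setq my-provider
│       (make-llm-openai-compatible
│        :url "http://localhost:8000/v1/"
│        :chat-model "my-served-model"))
└────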
2.3 Azure's Open AI
───────────────────
Microsoft Azure has an Open AI integration, although it doesn't
support everything Open AI does, such as tool use. You can set it up
with `make-llm-azure', with the following parameters:
• `:url', the endpoint URL, such as
"<https://docs-test-001.openai.azure.com/>".
• `:key', the Azure key for Azure OpenAI service.
• `:chat-model', the chat model, which must be deployed in Azure.
• `:embedding-model', the embedding model, which must be deployed in
Azure.
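For example, using the endpoint URL above; the key variable and the
deployment names are placeholders for your own Azure deployments:
┌────
│ (setq my-provider
│       (make-llm-azure :url "https://docs-test-001.openai.azure.com/"
│                       :key my-azure-key
│                       :chat-model "my-chat-deployment"
│                       :embedding-model "my-embedding-deployment"))
└────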
2.4 GitHub Models
─────────────────
GitHub now has its own platform for interacting with AI models. For a
list of models check the [marketplace]. You can set it up with
`make-llm-github', with the following parameters:
• `:key', a GitHub token or an Azure AI production key.
• `:chat-model', the chat model, which can be any of the ones you have
access to (currently o1 is restricted).
• `:embedding-model', the embedding model, which is most easily found
[through a filter].
[marketplace] <https://github.com/marketplace/models>
[through a filter]
<https://github.com/marketplace?type=models&task=Embeddings>
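For example, a minimal sketch; the token variable is a placeholder,
and the model name is just an illustration of a marketplace model you
might have access to:
┌────
│ (setq my-provider
│       (make-llm-github :key my-github-token
│                        :chat-model "gpt-4o"))
└────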
2.5 Gemini (not via Google Cloud)
─────────────────────────────────
This is Google's AI model. You can get an API key via their [page on
Google AI Studio]. Set this up with `make-llm-gemini', with the
following parameters:
• `:key', the Google AI key that you get from Google AI Studio.
• `:chat-model', the model name, from the [list] of models. This is
optional and will default to the text Gemini model.
• `:embedding-model': the model name, currently must be
"embedding-001". This is optional and will default to
"embedding-001".
[page on Google AI Studio] <https://makersuite.google.com/app/apikey>
[list] <https://ai.google.dev/models>
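For example, a minimal sketch; the key variable is a placeholder, and
the chat model is just an illustration (omit it to use the default):
┌────
│ (setq my-provider
│       (make-llm-gemini :key my-gemini-key
│                        :chat-model "gemini-1.5-flash"))
└────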
2.6 Vertex (Gemini via Google Cloud)
────────────────────────────────────
This is mostly for those who want to use Google Cloud specifically;
most users should use Gemini instead, which is easier to set up.
You can set up with `make-llm-vertex', with the following parameters:
• `:project': Your project number from Google Cloud that has Vertex
API enabled.
• `:chat-model': A model name from the [list of Vertex's model names.]
This is optional, and will default to a reasonable model.
• `:embedding-model': A model name from the [list of Vertex's
embedding model names.] This is optional, and will default to a
reasonable model.
In addition to the provider, which you may want multiple of (for
example, to charge against different projects), there are customizable
variables:
• `llm-vertex-gcloud-binary': The binary to use for generating the API
key.
• `llm-vertex-gcloud-region': The gcloud region to use. It's good to
set this to a region near where you are for best latency. Defaults
to "us-central1".
If you haven't already, you must run the following command before
using this:
┌────
│ gcloud beta services identity create --service=aiplatform.googleapis.com --project=PROJECT_ID
└────
[list of Vertex's model names.]
<https://cloud.google.com/vertex-ai/docs/generative-ai/chat/chat-prompts#supported_model>
[list of Vertex's embedding model names.]
<https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings#supported_models>
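For example, a minimal sketch; the project value is a placeholder for
your own Google Cloud project, and the region is just an illustration:
┌────
│ (setq llm-vertex-gcloud-region "us-east1")  ;; optional, pick a nearby region
│ (setq my-provider (make-llm-vertex :project "my-project"))
└────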
2.7 Claude
──────────
[Claude] is Anthropic's large language model. It does not support
embeddings. You can set it up with the following parameters:
• `:key': The API key you get from [Claude's settings page]. This is
required.
• `:chat-model': One of the [Claude models]. Defaults to
"claude-3-opus-20240229", the most powerful model.
[Claude] <https://docs.anthropic.com/claude/docs/intro-to-claude>
[Claude's settings page] <https://console.anthropic.com/settings/keys>
[Claude models] <https://docs.anthropic.com/claude/docs/models-overview>
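For example, a minimal sketch, assuming the constructor follows the
same naming pattern as the other providers (`make-llm-claude'); the
key variable is a placeholder:
┌────
│ (setq my-provider
│       (make-llm-claude :key my-claude-key
│                        :chat-model "claude-3-opus-20240229"))
└────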
2.8 Ollama
──────────
[Ollama] is a way to run large language models locally. There are
[many different models] you can use with it, and some of them [support
tool use]. You set it up with the following parameters:
• `:scheme': The scheme (http/https) for the connection to ollama.
This defaults to "http".
• `:host': The host that ollama is run on. This is optional and will
default to localhost.
• `:port': The port that ollama is run on. This is optional and will
default to the default ollama port.
• `:chat-model': The model name to use for chat. This is not optional
for chat use, since there is no default.
• `:embedding-model': The model name to use for embeddings. Only
[some models] can be used for embeddings. This is not optional for
embedding use, since there is no default.
[Ollama] <https://ollama.ai/>
[many different models] <https://ollama.ai/library>
[support tool use] <https://ollama.com/search?c=tools>
[some models] <https://ollama.com/search?q=&c=embedding>
2.9 Deepseek
────────────
[Deepseek] is a company that offers both high-quality reasoning and
chat models. This provider connects to their service. It is also
possible to run their models locally for free via Ollama. To use the
service, you can set it up with the following parameters:
• `:key': The API key you get from DeepSeek's [API key page]. This is
required.
• `:chat-model': One of the models from their [model list.]
[Deepseek] <https://deepseek.com>
[API key page] <https://platform.deepseek.com/api_keys>
[model list.] <https://api-docs.deepseek.com/quick_start/pricing>
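For example, a minimal sketch, assuming the constructor follows the
package's usual naming pattern (`make-llm-deepseek'); the key
variable is a placeholder and the model name is just an illustration
from their list:
┌────
│ (setq my-provider
│       (make-llm-deepseek :key my-deepseek-key
│                          :chat-model "deepseek-chat"))
└────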
2.10 GPT4All
────────────
[GPT4All] is a way to run large language models locally. To use it
with `llm' package, you must click "Enable API Server" in the
settings. It does not offer embeddings or streaming functionality,
though, so Ollama might be a better fit for users who are not already
set up with local models. You can set it up with the following
parameters:
• `:host': The host that GPT4All is run on. This is optional and will
default to localhost.
• `:port': The port that GPT4All is run on. This is optional and will
default to the default GPT4All API server port.
• `:chat-model': The model name to use for chat. This is not optional
for chat use, since there is no default.
[GPT4All] <https://gpt4all.io/index.html>
2.11 llama.cpp
──────────────
[llama.cpp] is a way to run large language models locally. To use it
with the `llm' package, you need to start the server (with the
"–embedding" flag if you plan on using embeddings). The server must
be started with a model, so it is not possible to switch models until
the server is restarted to use the new model. As such, model is not a
parameter to the provider, since the model choice is already set once
the server starts.
There is a deprecated llama.cpp provider, but it is no longer needed.
Instead, llama.cpp is Open AI compatible, so the Open AI Compatible
provider should work, as shown in the sketch below.
[llama.cpp] <https://github.com/ggerganov/llama.cpp>
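A sketch of using the Open AI Compatible provider against a local
llama.cpp server; the host and port are assumptions (use whatever you
started the server with):
┌────
│ ;; Assuming llama.cpp's server is listening on localhost at port 8080.
│ (setq my-provider
│       (make-llm-openai-compatible :url "http://localhost:8080/v1/"))
└────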
2.12 Fake
─────────
This is a client that makes no calls; it is just there for testing and
debugging. Mostly this is of use to programmatic clients of the llm
package, but end users can also use it to understand what will be sent
to the LLMs. It has the following parameters:
• `:output-to-buffer': if non-nil, the buffer or buffer name to append
the request sent to the LLM to.
• `:chat-action-func': a function that will be called to provide
either a string or a cons of a symbol and message, which is used to
raise an error.
• `:embedding-action-func': a function that will be called to provide
either a vector or a cons of a symbol and message, which is used to
raise an error.
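For example, a minimal fake provider that records requests to a
buffer so you can inspect what would be sent, assuming the
constructor follows the usual naming pattern (`make-llm-fake'); the
buffer name is arbitrary:
┌────
│ (require 'llm-fake)
│ (setq my-test-provider (make-llm-fake :output-to-buffer "*llm trace*"))
└────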
3 Models
════════
When picking a chat or embedding model, anything can be used, as long
as the service thinks it is valid. However, models vary on context
size and capabilities. The `llm-prompt' module, and any client, can
depend on the context size of the model via `llm-chat-token-limit'.
Similarly, some models have different capabilities, exposed in
`llm-capabilities'. The `llm-models' module defines a list of popular
models, but this isn't a comprehensive list. If you want to add a
model, it is fairly easy to do, for example here is adding the Mistral
model (which is already included, though):
┌────
│ (require 'llm-models)
│ (llm-models-add
│  :name "Mistral" :symbol 'mistral
│  :capabilities '(generation tool-use free-software)
│  :context-length 8192
│  :regex "mistral")
└────
The `:regex' needs to uniquely identify the model passed in from a
provider's chat or embedding model.
Once this is done, the model will be recognized to have the given
context length and capabilities.
4 `llm' and the use of non-free LLMs
════════════════════════════════════
The `llm' package is part of GNU Emacs by being part of GNU ELPA.
Unfortunately, the most popular LLMs in use are non-free, which is not
what GNU software should be promoting by inclusion. On the other
hand, by use of the `llm' package, the user can make sure that any
client that codes against it will work with free models that come
along. It's likely that sophisticated free LLMs will emerge,
although it's unclear right now what free software means with respect
to LLMs. Because of this tradeoff, we have decided to warn the user
when using non-free LLMs (which is every LLM supported right now
except the fake one). You can turn this off the same way you turn off
any other warning, by clicking on the left arrow next to the warning
when it comes up. Alternatively, you can set `llm-warn-on-nonfree' to
`nil'. This can be set via customization as well.
To build upon the example from before:
┌────
│ (use-package llm-refactoring
│   :init
│   (require 'llm-openai)
│   (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)
│         llm-warn-on-nonfree nil))
└────
5 Programmatic use
══════════════════
Client applications should require the `llm' package, and code against
it. Most functions are generic, and take a struct representing a
provider as the first argument. The client code, or the user
themselves can then require the specific module, such as `llm-openai',
and create a provider with a function such as `(make-llm-openai :key
user-api-key)'. The client application will use this provider to call
all the generic functions.
For all callbacks, the callback will be executed in the buffer the
function was first called from. If the buffer has been killed, it
will be executed in a temporary buffer instead.
5.1 Main functions
──────────────────
• `llm-chat provider prompt multi-output': With the user-chosen
`provider' and an `llm-chat-prompt' structure (created by
`llm-make-chat-prompt'), send that prompt to the LLM and wait for
the string output.
• `llm-chat-async provider prompt response-callback error-callback
multi-output': Same as `llm-chat', but executes in the background.
Takes a `response-callback' which will be called with the text
response. The `error-callback' will be called in case of error,
with the error symbol and an error message.
• `llm-chat-streaming provider prompt partial-callback
response-callback error-callback multi-output': Similar to
`llm-chat-async', but requests a streaming response. As the response
is built up, `partial-callback' is called with all the text
retrieved up to the current point. Finally, `response-callback' is
called with the complete text.
• `llm-embedding provider string': With the user-chosen `provider',
send a string and get an embedding, which is a large vector of
floating point values. The embedding represents the semantic
meaning of the string, and the vector can be compared against other
vectors, where smaller distances between the vectors represent
greater semantic similarity.
• `llm-embedding-async provider string vector-callback
error-callback': Same as `llm-embedding' but this is processed
asynchronously. `vector-callback' is called with the vector
embedding, and, in case of error, `error-callback' is called with
the same arguments as in `llm-chat-async'.
• `llm-batch-embedding provider strings': same as `llm-embedding', but
takes in a list of strings, and returns a list of vectors whose
order corresponds to the ordering of the strings.
• `llm-batch-embedding-async provider strings vectors-callback
error-callback': same as `llm-embedding-async', but takes in a list
of strings, and returns a list of vectors whose order corresponds to
the ordering of the strings.
• `llm-count-tokens provider string': Count how many tokens are in
`string'. This may vary by `provider', because some providers
implement an API for this, but the count is typically about the
same. If the provider has no API support, this gives an estimate.
• `llm-cancel-request request': Cancels the given request, if possible.
The `request' object is the return value of async and streaming
functions.
• `llm-name provider'. Provides a short name of the model or
provider, suitable for showing to users.
• `llm-models provider'. Return a list of all the available model
names for the provider. This could be either embedding or chat
models. You can use `llm-models-match' to filter on models that
have a certain capability (as long as they are in `llm-models').
• `llm-chat-token-limit'. Gets the token limit for the chat model.
This isn't possible for some backends like `llama.cpp', in which the
model isn't selected or known by this library.
And the following helper functions:
• `llm-make-chat-prompt text &keys context examples tools
temperature max-tokens response-format non-standard-params': This
is how you make prompts. `text' can be a string (the user input
to the llm chatbot), or a list representing a series of
back-and-forth exchanges, of odd number, with the last element of
the list representing the user's latest input. This supports
inputting context (also commonly called a system prompt, although
it isn't guaranteed to replace the actual system prompt),
examples, and other important elements, all detailed in the
docstring for this function. `response-format' can be `'json', to
force JSON output, or a JSON schema (see below) but the prompt
also needs to mention and ideally go into detail about what kind
of JSON response is desired. Providers with the `json-response'
capability support JSON output, and it will be ignored if
unsupported. The `non-standard-params' let you specify other
options that might vary per-provider, and for this, the
correctness is up to the client.
• `llm-chat-prompt-to-text prompt': From a prompt, return a string
representation. This is not usually suitable for passing to LLMs,
but for debugging purposes.
• `llm-chat-streaming-to-point provider prompt buffer point
finish-callback': Same basic arguments as `llm-chat-streaming',
but will stream to `point' in `buffer'.
• `llm-chat-prompt-append-response prompt response role': Append a
new response (from the user, usually) to the prompt. The `role'
is optional, and defaults to `'user'.
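To tie the main functions and helpers together, here is a minimal
sketch. It assumes `my-provider' has been set up as in section 2; the
prompt text and messages are arbitrary:
┌────
│ ;; Build a prompt, ask synchronously, then continue the conversation.
│ (let ((prompt (llm-make-chat-prompt
│                "Name a classic text editor."
│                :temperature 0.2)))
│   (message "First reply: %s" (llm-chat my-provider prompt))
│   ;; Append the next user turn and ask again with the same prompt object.
│   (llm-chat-prompt-append-response prompt "Why is it considered classic?")
│   (message "Second reply: %s" (llm-chat my-provider prompt)))
│
│ ;; Asynchronous chat: callbacks run in the buffer this was called from.
│ (llm-chat-async my-provider
│                 (llm-make-chat-prompt "Say hello in French.")
│                 (lambda (response) (message "Got: %s" response))
│                 (lambda (err msg) (message "Error %s: %s" err msg)))
│
│ ;; Embedding: a vector encoding the meaning of the string.
│ (llm-embedding my-provider "GNU Emacs is an extensible text editor.")
└────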
5.1.1 Return and multi-output
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
The default return value is text except for when tools are called, in
which case it is a record of the return values of the tools called.
Models can potentially return many types of information, though, so
the `multi-output' option was added to the `llm-chat' calls so that
the single return value can instead be a plist that represents the
various possible values. In the case of `llm-chat', this plist is
returned; in `llm-chat-async', it is passed to the success function.
In `llm-chat-streaming', it is passed to the success function, and
each partial update will be a plist, with no guarantee that the same
keys will always be present.
The possible plist keys are:
• `:text', for the main textual output.
• `:reasoning', for reasoning output, when the model separates it.
• `:tool-uses', the tools that the llm identified to be called, as a
list of plists, with `:name' and `:args' values.
• `:tool-results', the results of calling the tools.
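A short sketch of reading the multi-output plist, again assuming
`my-provider' is set up as in section 2:
┌────
│ ;; Pass a non-nil MULTI-OUTPUT argument to get a plist back.
│ (let ((result (llm-chat my-provider
│                         (llm-make-chat-prompt "What is the capital of France?")
│                         t)))
│   ;; Not every key is guaranteed to be present; check what you need.
│   (list (plist-get result :text)
│         (plist-get result :reasoning)))
└────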
5.1.2 JSON schema
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
By using the `response-format' argument to `llm-make-chat-prompt', you
can ask the LLM to return items according to a specified JSON schema,
based on the [JSON Schema Spec]. Not everything is supported, but the
most commonly used parts are. To specify the JSON schema, we use a
plist-based approach. JSON objects are defined with `(:type object
:properties (:<var1> <schema1> :<var2> <schema2> ... :<varn>
<scheman>) :required (<req var1> ... <req varn>))'. Arrays are
defined with `(:type array :items <schema>)'. Enums are defined with
`(:enum [<val1> <val2> <val3>])'. You can also request integers,
strings, and other types defined by the JSON Schema Spec, by just
having `(:type <type>)'. LLMs often require the top-level schema to
be an object, and often require that all properties on the top-level
object be marked required.
Some examples:
┌────
│ (llm-chat my-provider (llm-make-chat-prompt
│ "How many countries are there? Return the result as JSON."
│ :response-format
│ '(:type object :properties (:num (:type "integer")) :required ["num"])))
└────
┌────
│ (llm-chat my-provider (llm-make-chat-prompt
│ "Which editor is hard to quit? Return the result as JSON."
│ :response-format
│ '(:type object :properties (:editor (:enum ["emacs" "vi" "vscode"])
│ :authors (:type "array" :items (:type "string")))
│ :required ["editor" "authors"])))
└────
[JSON Schema Spec] <https://json-schema.org>
5.2 Logging
───────────
Interactions with the `llm' package can be logged by setting `llm-log'
to a non-nil value. This should be done only when developing. The
log can be found in the `*llm log*' buffer.
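For example, while developing:
┌────
│ (setq llm-log t)
│ ;; ...make some llm calls, then inspect the log:
│ (pop-to-buffer "*llm log*")
└────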
5.3 How to handle conversations
───────────────────────────────
Conversations can take place by repeatedly calling `llm-chat' and its
variants. The prompt should be constructed with
`llm-make-chat-prompt'. For a conversation, the entire prompt must be
kept as a variable, because the `llm-chat-prompt-interactions' slot
will be getting changed by the chat functions to store the
conversation. For some providers, this will store the history
directly in `llm-chat-prompt-interactions', but other LLMs have an
opaque conversation history. For that reason, the correct way to
handle a conversation is to repeatedly call `llm-chat' or variants
with the same prompt structure, kept in a variable, and after each
time, add the new user text with `llm-chat-prompt-append-response'.
The following is an example:
┌────
│ (defvar-local llm-chat-streaming-prompt nil)
│ (defun start-or-continue-conversation (text)
│   "Called when the user has input TEXT as the next input."
│   ;; Keep one prompt per buffer: append to it if it exists, create it otherwise.
│   (if llm-chat-streaming-prompt
│       (llm-chat-prompt-append-response llm-chat-streaming-prompt text)
│     (setq llm-chat-streaming-prompt (llm-make-chat-prompt text)))
│   ;; Stream the response into the buffer in either case.
│   (llm-chat-streaming-to-point provider llm-chat-streaming-prompt
│                                (current-buffer) (point-max) (lambda ())))
└────
5.4 Caution about `llm-chat-prompt-interactions'
────────────────────────────────────────────────
The interactions in a prompt may be modified by conversation or by the
conversion of the context and examples to what the LLM understands.
Different providers require different things from the interactions.
Some can handle system prompts, some cannot. Some require alternating
user and assistant chat interactions, others can handle anything.
It's important that clients keep to behaviors that work on all
providers. Do not attempt to read or manipulate
`llm-chat-prompt-interactions' after initially setting it up for the
first time, because you are likely to make changes that only work for
some providers. Similarly, don't directly create a prompt with
`make-llm-chat-prompt', because it is easy to create something that
wouldn't work for all providers.
5.5 Tool use
────────────
*Note: tool use is currently beta quality. If you want to use tool
use, please watch the `llm' [discussions] for any announcements about
changes.*
Tool use is a way to give the LLM a list of functions it can call, and
have it call the functions for you. The standard interaction has the
following steps:
1. The client sends the LLM a prompt with tools it can use.
2. The LLM may return which tools to use, and with what arguments, or
text as normal.
3. If the LLM has decided to use one or more tools, those tool's
functions should be called, and their results sent back to the LLM.
This could be the final step depending on if any follow-on is
needed.
4. The LLM will return with a text response based on the initial
prompt and the results of the tool use.
5. The client can now can continue the conversation.
This basic structure is useful because it can guarantee a
well-structured output (if the LLM does decide to use the tool). *Not
every LLM can handle tool use, and those that do not will ignore the
tools entirely*. The function `llm-capabilities' will return a list
with `tool-use' in it if the LLM supports tool use. Because not all
providers support tool use when streaming, `streaming-tool-use'
indicates the ability to use tool uses in `llm-chat-streaming'. Right
now only Gemini, Vertex, Claude, and Open AI support tool use.
However, even for LLMs that handle tool use, there is sometimes a
difference in the capabilities. Right now, it is possible to write
tools that succeed in Open AI but cause errors in Gemini, because
Gemini does not appear to handle tools that have types that contain
other types. So, for now, client programs are advised to keep tool
arguments to simple types.
The way to call functions is to attach a list of tools to the
`tools' slot in the prompt. This is a list of `llm-tool' structs,
each of which wraps an elisp function with a name, a description,
and a list of arguments. The docstrings give an explanation of the
format. An example is:
┌────
│ (llm-chat-async
│ my-llm-provider
│ (llm-make-chat-prompt
│ "What is the capital of France?"
│ :tools
│ (list (llm-make-tool
│ :function
│ (lambda (callback result)
│ ;; In this example function the assumption is that the
│ ;; callback will be called after processing the result is
│ ;; complete.
│ (notify-user-of-capital result callback))
│ :name "capital_of_country"
│ :description "Get the capital of a country."
│ :args '((:name "country"
│ :description "The country whose capital to look up."
│ :type string))
│ :async t)))
│ #'identity ;; No need to process the result in this example.
│ (lambda (_ err)
│ (error "Error on getting capital: %s" err)))
└────
Note that tools have the same arguments and structure as the tool
definitions in [GPTel].
The various chat APIs will execute the functions defined in the `tools'
slot with the arguments supplied by the LLM. Instead of returning (or
passing to a callback) a string, the chat functions will return a
list of tool names and return values. This is not technically an
alist, because the same tool might be used several times, so several
entries can have an equal `car'.
After the tool is called, the client could use the result, but if you
want to proceed with the conversation, or get a textual response that
accompanies the tool use, you should just send the prompt back with no
modifications. This is because the LLM specifies the tool use to
perform, and then expects to get back the results of that tool use.
The tools were already executed at the end of the call that returned
the tools used, and the results of that execution are stored in the
prompt. This is why it should be sent back without further
modifications.
Be aware that there is no guarantee that the tool will be called
correctly. While the LLMs mostly get this right, they are trained on
Javascript functions, so imitating Javascript names is
recommended. So, "write_email" is a better name for a function than
"write-email".
Examples can be found in `llm-tester'. There is also a utility to
generate tool definitions from existing elisp functions in
`utilities/elisp-to-tool.el'.
[discussions] <https://github.com/ahyatt/llm/discussions>
[GPTel] <https://github.com/karthink/gptel>
5.6 Media input
───────────────
*Note: media input functionality is currently alpha quality. If you
want to use it, please watch the `llm' [discussions] for any
announcements about changes.*
Media can be used in `llm-chat' and related functions. To use media,
you can use `llm-multipart' in `llm-make-chat-prompt', and pass it an
Emacs image or an `llm-media' object for other kinds of media.
Besides images, some models support video and audio. Not all
providers or models support these, with images being the most
frequently supported media type, and video and audio more rare.
[discussions] <https://github.com/ahyatt/llm/discussions>
5.7 Advanced prompt creation
────────────────────────────
The `llm-prompt' module provides helper functions to create prompts
that can incorporate data from your application. In particular, this
should be very useful for applications that need a lot of context.
A prompt defined with `llm-prompt' is a template, with placeholders
that the module will fill in. Here's an example of a prompt
definition, from the [ekg] package:
┌────
│ (llm-defprompt ekg-llm-fill-prompt
│ "The user has written a note, and would like you to append to it,
│ to make it more useful. This is important: only output your
│ additions, and do not repeat anything in the user's note. Write
│ as a third party adding information to a note, so do not use the
│ first person.
│
│ First, I'll give you information about the note, then similar
│ other notes that user has written, in JSON. Finally, I'll give
│ you instructions. The user's note will be your input, all the
│ rest, including this, is just context for it. The notes given
│ are to be used as background material, which can be referenced in
│ your answer.
│
│ The user's note uses tags: {{tags}}. The notes with the same
│ tags, listed here in reverse date order: {{tag-notes:10}}
│
│ These are similar notes in general, which may have duplicates
│ from the ones above: {{similar-notes:1}}
│
│ This ends the section on useful notes as a background for the
│ note in question.
│
│ Your instructions on what content to add to the note:
│
│ {{instructions}}
│ ")
└────
When this is filled, it is done in the context of a provider, which
has a known context size (via `llm-chat-token-limit'). Care is taken
to not overfill the context, which is checked as it is filled via
`llm-count-tokens'. We usually want to not fill the whole context,
but instead leave room for the chat and subsequent terms. The
variable `llm-prompt-default-max-pct' controls how much of the context
window we want to fill. The way we estimate the number of tokens used
is quick but inaccurate, so limiting to less than the maximum context
size is useful for guarding against a miscount leading to an error
calling the LLM due to too many tokens. If you want to have a hard
limit as well that doesn't depend on the context window size, you can
use `llm-prompt-default-max-tokens'. We will use the minimum of
either value.
Variables are enclosed in double curly braces, like this:
`{{instructions}}'. They can just be the variable, or they can also
denote a number of tickets, like so: `{{tag-notes:10}}'. Tickets
should be thought of like lottery tickets, where the prize is a single
round of context filling for the variable. So the variable
`tag-notes' gets 10 tickets for a drawing. Anything else where
tickets are unspecified (unless it is just a single variable, which
will be explained below) will get a number of tickets equal to the
total number of specified tickets. So if you have two variables, one
with 1 ticket, one with 10 tickets, one will be filled 10 times more
than the other. If you have two variables, one with 1 ticket, one
unspecified, the unspecified one will get 1 ticket, so each will have
an even chance to get filled. If no variable has tickets specified,
each will get an equal chance. If you have one variable, it could
have any number of tickets, but the result would be the same, since it
would win every round. This algorithm is the contribution of David
Petrou.
The above is true of variables that are to be filled with a sequence
of possible values. A lot of LLM context filling is like this. In
the above example, `{{similar-notes}}' is a retrieval based on a
similarity score. It will continue to fill items from most similar to
least similar, which is going to return almost everything the ekg app
stores. We want to retrieve only as needed. Because of this, the
`llm-prompt' module takes in /generators/ to supply each variable.
However, a plain list is also acceptable, as is a single value. Any
single value will not enter into the ticket system, but rather be
prefilled before any tickets are used.
Values supplied in either the list or generators can be the values
themselves, or conses. If a cons, the variable to fill is the `car'
of the cons, and the `cdr' is the place to fill the new value, `front'
or `back'. The `front' is the default: new values will be appended to
the end. `back' will add new values to the start of the filled text
for the variable instead.
So, to illustrate with this example, here's how the prompt will be
filled:
1. First, the `{{tags}}' and `{{instructions}}' variables will be
filled. This happens before we check the context size, so the
module assumes that these will be small and not blow up the
context.
2. Check the context size we want to use (`llm-prompt-default-max-pct'
multiplied by `llm-chat-token-limit') and exit if exceeded.
3. Run a lottery with all tickets and choose one of the remaining
variables to fill.
4. If the variable won't make the text too large, fill the variable
with one entry retrieved from a supplied generator, otherwise
ignore it. These values are not conses, so each new value is
appended to the end of the generated text for its variable (so a
new value generated for tags will appear after other generated
tags but before the text that follows the variable).
5. Go to step 2.
The prompt can be filled two ways, one using predefined prompt
template (`llm-defprompt' and `llm-prompt-fill'), the other using a
prompt template that is passed in (`llm-prompt-fill-text').
┌────
│ (llm-defprompt my-prompt "My name is {{name}} and I'm here to say {{messages}}")
│
│ (llm-prompt-fill 'my-prompt my-llm-provider :name "Pat" :messages #'my-message-retriever)
│
│ (iter-defun my-message-retriever ()
│ "Return the messages I like to say."
│ (my-message-reset-messages)
│ (while (my-has-next-message)
│ (iter-yield (my-get-next-message))))
└────
Alternatively, you can just fill it directly:
┌────
│ (llm-prompt-fill-text "Hi, I'm {{name}} and I'm here to say {{messages}}"
│ :name "John" :messages #'my-message-retriever)
└────
As you can see in the examples, the variable values are passed in with
matching keys.
[ekg] <https://github.com/ahyatt/ekg>
6 Contributions
═══════════════
If you are interested in creating a provider, please send a pull
request, or open a bug. This library is part of GNU ELPA, so any
major provider that we include in this module needs to be written by
someone with FSF papers. However, you can always write a module and
put it on a different package archive, such as MELPA.