79 changes: 69 additions & 10 deletions docs/ai-chat.md
@@ -25,9 +25,9 @@ Alternatively, you can run AI models locally so that your data never leaves your

### Hardware for Local AI Models

Local models are also fairly accessible. They can run on most PCs and some high-end smartphones. It's possible to run smaller models at lower speeds on as little as 8 GB of RAM. Using more powerful hardware such as a dedicated GPU with sufficient VRAM or a modern system with fast LPDDR5X memory offers the best experience.

LLMs can usually be differentiated by their number of parameters, which ranges from 1.3B to 405B for open-source models available to end users. For example, models below 3B parameters are only reliably good for simple linguistic tasks such as summarization, while models above 4B start to understand context and have good knowledge of the world. From 8B onward, they can perform basic reasoning. Models with advanced reasoning capabilities start at around 30B parameters.

For consumer-grade hardware, it is generally recommended to use [quantized models](https://huggingface.co/docs/optimum/en/concept_guides/quantization) for the best balance between model quality and performance. Check out the table below for more precise information about the typical requirements for different sizes of quantized models.
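As a rough illustration of why quantization matters on consumer hardware, the memory taken by the weights alone scales with the number of bits stored per parameter. The sketch below is a back-of-the-envelope calculation only: it assumes 1 GB = 10^9 bytes, ignores the context window (KV cache) and runtime overhead, and the function name `weight_memory_gb` is hypothetical. Real quantization formats also store some extra per-block data, so actual model files are somewhat larger.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Each parameter takes bits_per_weight / 8 bytes, so billions of
    parameters multiplied by bytes per parameter gives gigabytes.
    Context (KV cache) and runtime overhead are NOT included.
    """
    return params_billions * bits_per_weight / 8


# Compare an 8B model stored at different precisions.
for bits, label in [(16, "FP16"), (8, "8-bit"), (4, "4-bit")]:
    print(f"8B model at {label}: ~{weight_memory_gb(8, bits):.1f} GB")
```

Under these assumptions, moving from FP16 to 4-bit weights cuts the weight footprint of an 8B model from about 16 GB to about 4 GB, which is what makes such models practical on machines with 8 to 16 GB of RAM.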

@@ -45,16 +45,75 @@ There are many permissively licensed models available to download. [Hugging Face

To help you choose a model that fits your needs, you can look at leaderboards and benchmarks. The most widely-used leaderboard is the community-driven [LM Arena](https://lmarena.ai). Additionally, the [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) focuses on the performance of open-weights models on common benchmarks like [MMLU-Pro](https://arxiv.org/abs/2406.01574). There are also specialized benchmarks which measure factors like [emotional intelligence](https://eqbench.com), ["uncensored general intelligence"](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard), and [many others](https://nebuly.com/blog/llm-leaderboards).

#### Recommended models

Below is a table of good models to start with.

| Developer | Model name | Available sizes (billions of parameters) | Strengths | Weaknesses | Censorship |
|---|---|---|---|---|---|
| Google | Gemma 3 | 1, 4, 12, 27 | Multimodal, efficient | Many hallucinations | Sexuality, drugs |
| Google | Gemma 3n | >2, >4 | Vision capabilities, efficient, mobile-friendly | Many hallucinations | Sexuality, drugs |
| Alibaba | Qwen 3 | 0.6, 1.7, 4, 8, 14, 32, 235 | Multilingual, efficient, intelligent | Not multimodal | CCP-sensitive topics |

## AI Chat Clients

| Feature | [Kobold.cpp](#koboldcpp) | [Ollama](#ollama-cli) | [Llamafile](#llamafile) | [Edge Gallery](#edge-gallery-android) | [PocketPal](#pocketpal-ai-mobile) |
|---|---|---|---|---|---|
| GPU Support | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-check:{ .pg-green } | / | / |
| Image Generation | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-close:{ .pg-red } | :material-close:{ .pg-red } | :material-close:{ .pg-red } |
| Vision Capabilities | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-alert-outline:{ .pg-orange } Limited quality |
| Speech-to-Text | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-close:{ .pg-red } | :material-close:{ .pg-red } | :material-close:{ .pg-red } |
| Auto-download Models | :material-close:{ .pg-red } | :material-check:{ .pg-green } | :material-alert-outline:{ .pg-orange } Few models available | :material-alert-outline:{ .pg-orange } Few models available | :material-check:{ .pg-green } |
| Custom Parameters | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-check:{ .pg-green } | :material-alert-outline:{ .pg-orange } Model prompt cannot be changed | :material-check:{ .pg-green } |
| Platform | :simple-linux: Linux \ :fontawesome-brands-windows: Windows \ :simple-apple: macOS | :simple-gnometerminal: CLI on :simple-linux: Linux \ :fontawesome-brands-windows: Windows \ :simple-apple: macOS | :simple-linux: Linux \ :fontawesome-brands-windows: Windows (size limitations) \ :simple-apple: macOS | :simple-android: Android | :simple-android: Android \ :simple-apple: iOS |

### Edge Gallery (Android)

<div class="admonition recommendation" markdown>

![Edge Gallery Logo](assets/img/ai-chat/edge_gallery_white.png){align=right}

**Edge Gallery** is an AI client developed by Google that features Google's latest open-weights models. It runs fully locally, but downloading the preloaded models requires a Hugging Face account, as you need to [accept Google's Acceptable Use Policy](https://github.com/google-ai-edge/gallery/wiki/5.-Model-Management#hugging-face-login--license-acknowledgement) to access the model files. You can import models you have downloaded yourself, but the app only accepts the [LiteRT](https://github.com/google-ai-edge/litert) format, which isn't widely used.

The preloaded models are performant and can analyse images. However, you might not be able to have long conversations or analyse long documents, as the app restricts the maximum context length to an amount [multiple times smaller](https://github.com/google-ai-edge/gallery/issues/183) than what the AI model normally supports.

[:octicons-repo-16: Repository](https://github.com/google-ai-edge/gallery?tab=readme-ov-file#google-ai-edge-gallery-){ .md-button .md-button--primary }
[:octicons-info-16:](https://github.com/google-ai-edge/gallery/wiki){ .card-link title="Documentation" }
[:octicons-code-16:](https://github.com/google-ai-edge/gallery){ .card-link title="Source Code" }

<details class="downloads" markdown>
<summary>Downloads</summary>

- [:simple-github: GitHub](https://github.com/google-ai-edge/gallery/releases)
- [:simple-googleplay: Google Play](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)

</details>
</div>

### PocketPal AI (Mobile)

**@GorujoCY** (Sep 6, 2025):


Specify how to calculate which parameter count to use on your phone so that local models don't feel sluggish in PocketPal, since PocketPal can run local models from Hugging Face and larger models are listed there.
For example:
Smartphones with 12 GB of RAM (now common on Android devices) could adequately run a local model of around 3B parameters, including overhead. The more RAM, the more parameters a model can have. This is commonly calculated as parameters (in billions) × 2 (bytes per FP16 weight) × 2 (overhead).
The 8 GB of RAM commonly found in newer iPhones can run models of at least 2B parameters.

For more detail, see https://developer.nvidia.com/blog/gpu-memory-essentials-for-ai-performance/

**Contributor (PR author):**


Nowadays most local models run with quantization. I can run an 8B model on my Pixel 9. So I would say that, roughly, RAM needed (GB) ≈ parameter count (in billions) + 3.

The 3 is the amount of RAM (in GB) likely used by the OS and other apps.

Hugging Face has a RAM calculator BTW, so we could point that out.
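The two rules of thumb proposed in this thread can be compared with a short sketch. Both are the commenters' heuristics rather than measured requirements; the function names and the `RESERVED_GB` constant are illustrative assumptions taken from the comments above, and real usage also depends on context length and the quantization format.

```python
# Author's estimate of RAM (GB) already used by the OS and other apps.
RESERVED_GB = 3.0


def ram_needed_fp16(params_billions: float) -> float:
    # Reviewer's rule: params (B) * 2 bytes per FP16 weight * 2 for overhead.
    return params_billions * 2 * 2


def ram_needed_quantized(params_billions: float) -> float:
    # Author's rule for quantized models: parameter size + reserved RAM.
    return params_billions + RESERVED_GB


def max_params_fp16(ram_gb: float) -> float:
    # Invert the FP16 rule: largest model (in billions) whose estimate fits.
    return ram_gb / 4


def max_params_quantized(ram_gb: float) -> float:
    # Invert the quantized rule; clamp at zero for very small devices.
    return max(ram_gb - RESERVED_GB, 0.0)


for ram in (8, 12):
    print(f"{ram} GB RAM: FP16 rule ~{max_params_fp16(ram):.1f}B, "
          f"quantized rule ~{max_params_quantized(ram):.1f}B")
```

For a 12 GB phone, the FP16 rule gives about 3B parameters, matching the reviewer's example, while the quantized rule gives about 9B, consistent with running an 8B model (8 + 3 = 11 GB).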


<div class="admonition recommendation" markdown>

![PocketPal Logo](assets/img/ai-chat/pocketpal_dark.png){align=right}

**PocketPal** is an AI client available on both iOS and Android. It focuses on customisability, ease of use, and above all on small language models (SLMs), allowing you to run AI models on almost any Android phone. Keep in mind, however, that SLMs are of limited use in many domains.

Other models can easily be imported, either from a Hugging Face repo link or directly from a GGUF file. Models can also be customised, and the client offers "Pals", which are described as "personalized AI assistants with different personalities" but are essentially presets of a model and a system prompt. You can choose between an Assistant Pal and a Roleplay Pal.

[:octicons-repo-16: Repository](https://github.com/a-ghorbani/pocketpal-ai?tab=readme-ov-file#pocketpal-ai-){ .md-button .md-button--primary }
[:octicons-info-16:](https://github.com/a-ghorbani/pocketpal-ai?tab=readme-ov-file#table-of-contents){ .card-link title="Documentation" }
[:octicons-code-16:](https://github.com/a-ghorbani/pocketpal-ai/){ .card-link title="Source Code" }

<details class="downloads" markdown>
<summary>Downloads</summary>

- [:simple-github: GitHub](https://github.com/a-ghorbani/pocketpal-ai/releases)
- [:simple-googleplay: Google Play](https://play.google.com/store/apps/details?id=com.pocketpalai)
- [:simple-appstore: App Store](https://apps.apple.com/us/app/pocketpal-ai/id6502579498)

</details>

</div>

### Kobold.cpp

12 changes: 12 additions & 0 deletions docs/tools.md
@@ -349,6 +349,8 @@ We [recommend](dns.md#recommended-providers) a number of encrypted DNS servers b

### AI Chat

#### Desktop clients

<div class="grid cards" markdown>

- ![Kobold logo](assets/img/ai-chat/kobold.png){ .twemoji loading=lazy } [Kobold.cpp](ai-chat.md#koboldcpp)
@@ -357,6 +359,16 @@ We [recommend](dns.md#recommended-providers) a number of encrypted DNS servers b

</div>

#### Mobile clients

<div class="grid cards" markdown>


- ![Edge Gallery logo](assets/img/ai-chat/edge_gallery_white.png){ .twemoji loading=lazy } [Edge Gallery](ai-chat.md#edge-gallery-android)
- ![PocketPal logo](assets/img/ai-chat/pocketpal_dark.png){ .twemoji loading=lazy } [PocketPal AI](ai-chat.md#pocketpal-ai-mobile)

</div>

[Learn more :material-arrow-right-drop-circle:](ai-chat.md)

### Calendar Sync
Binary file added theme/assets/img/ai-chat/edge_gallery_white.png
Binary file added theme/assets/img/ai-chat/ollama_transparent.png
Binary file added theme/assets/img/ai-chat/pocketpal_dark.png