Commit 0f8ca2d

Merge pull request #2197 from juliensimon/arcee-foundation-model-on-gcp
New learning path: Deploy Arcee AFM-4.5B on Google Axion - final version
2 parents 599a160 + 119e059 commit 0f8ca2d

11 files changed: +894 -0 lines
Lines changed: 37 additions & 0 deletions
---
title: Overview
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## The AFM-4.5B model

[AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) is a 4.5-billion-parameter foundation model designed to balance accuracy, efficiency, and broad language coverage. Trained on nearly 8 trillion tokens of carefully filtered data, it performs well across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.
In this Learning Path, you'll deploy [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) using [Llama.cpp](https://github.com/ggerganov/llama.cpp) on an Arm-based Google Cloud Axion instance. You'll walk through the full workflow, from setting up your environment and compiling the runtime, to downloading, quantizing, and running inference on the model. You'll also evaluate model quality using perplexity, a common metric for measuring how well a language model predicts text.

This hands-on guide helps developers build cost-efficient, high-performance LLM applications on modern Arm server infrastructure using open-source tools and real-world deployment practices.

### LLM deployment workflow on Google Axion

- **Provision compute**: launch a Google Cloud instance using an Axion-based instance type (for example, `c4a-standard-16`)

- **Set up your environment**: install the required build tools and dependencies (such as CMake, Python, and Git)

- **Build the inference engine**: clone the [Llama.cpp](https://github.com/ggerganov/llama.cpp) repository and compile the project for your Arm-based environment

- **Prepare the model**: download the **AFM-4.5B** model files from Hugging Face and use Llama.cpp's quantization tools to reduce model size and optimize performance

- **Run inference**: load the quantized model and run sample prompts using Llama.cpp (a preview command is shown after the note below)

- **Evaluate model quality**: calculate **perplexity** or use other metrics to assess model performance

{{< notice Note >}}
You can reuse this deployment flow with other models supported by Llama.cpp by swapping out the model file and adjusting quantization settings.
{{< /notice >}}
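As a preview of where this Learning Path ends up, the inference and evaluation steps come down to commands like the ones below. This is a sketch only: the model filename is illustrative and depends on the quantization settings you choose later.

```bash
# Run a sample prompt through the quantized model (filename is illustrative)
./bin/llama-cli -m afm-4.5b-q4_k_m.gguf -p "Explain quantization in one paragraph."

# Estimate perplexity on a text file to gauge model quality
./bin/llama-perplexity -m afm-4.5b-q4_k_m.gguf -f test-corpus.txt
```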
Lines changed: 62 additions & 0 deletions
---
title: Provision your Axion environment
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Requirements

Before you begin, make sure you have the following:

- A Google Cloud account
- Permission to launch a Compute Engine Axion instance of type `c4a-standard-16` (or larger)
- At least 128 GB of available storage

If you're new to Google Cloud, check out the Learning Path [Getting Started with Google Cloud](/learning-paths/servers-and-cloud-computing/csp/google/).

## Launch and configure the Compute Engine instance

In the left sidebar of the [Compute Engine dashboard](https://console.cloud.google.com/compute), select **VM instances**, and then **Create instance**.

Use the following settings to configure your instance:

- **Name**: `arcee-axion-instance`
- **Region** and **Zone**: the region and zone where you have access to C4A instances
- Select **General purpose**, then click **C4A**
- **Machine type**: `c4a-standard-16` or larger

## Configure OS and storage

In the left sidebar, select **OS and storage**.

Under **Operating system and storage**, click on **Change**.

Set the size of the disk to 128 GB, then click on **Select**.

## Review and launch the instance

Leave the other settings as they are.

When you're ready, click on **Create** to create your Compute Engine instance.
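If you prefer to work from the command line, you can create an equivalent instance with the gcloud CLI. This is a minimal sketch, assuming gcloud is installed and authenticated; the zone and Ubuntu Arm64 image family shown are assumptions to adjust for your project:

```bash
# Hypothetical CLI equivalent of the console steps above; adjust zone and image to your setup
gcloud compute instances create arcee-axion-instance \
  --machine-type=c4a-standard-16 \
  --zone=us-central1-a \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=128GB
```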
## Monitor the instance launch

After a few seconds, you should see that your instance is ready.

If the launch fails, double-check your settings and permissions, and try again.

## Connect to your instance

Open the **SSH** dropdown list, and select **Open in browser window**.

Your browser may ask you to authenticate. Once you've done that, a terminal window will open.

You are now connected to your Ubuntu instance running on Axion.
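If you created the instance with the gcloud CLI, you can also connect from your own terminal instead of the browser window. A minimal sketch, assuming the instance name and zone used earlier:

```bash
# SSH into the instance using the gcloud CLI (adjust the zone to match your instance)
gcloud compute ssh arcee-axion-instance --zone=us-central1-a
```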

{{% notice Note %}}
**Region**: make sure you're launching in your preferred Google Cloud region.
**Storage**: 128 GB is sufficient for the AFM-4.5B model and dependencies.
{{% /notice %}}
Lines changed: 58 additions & 0 deletions
---
title: Configure your Axion environment
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, you'll set up the Axion instance with the tools and dependencies required to build and run the Arcee Foundation Model. This includes installing system packages and a Python environment.

## Update the package list

Run the following command to update your local APT package index:

```bash
sudo apt-get update
```

This step ensures you have the most recent metadata about available packages, including versions and dependencies. It helps prevent conflicts when installing new packages.

## Install system dependencies

Install the build tools and Python environment:

```bash
sudo apt-get install cmake gcc g++ git python3 python3-pip python3-virtualenv libcurl4-openssl-dev unzip -y
```

This command installs the following tools and dependencies:

- **CMake**: cross-platform build system generator used to compile and build Llama.cpp

- **GCC and G++**: GNU C and C++ compilers for compiling native code

- **Git**: version control system for cloning repositories

- **Python 3**: Python interpreter for running Python-based tools and scripts

- **Pip**: Python package manager

- **Virtualenv**: tool for creating isolated Python environments

- **libcurl4-openssl-dev**: development files for the curl HTTP library

- **Unzip**: tool to extract `.zip` files (used in some model downloads)

The `-y` flag automatically approves the installation of all packages without prompting.
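Before moving on, you can confirm the tools are on your `PATH` and check their versions:

```bash
# Verify the build tools installed correctly
cmake --version
gcc --version
git --version
python3 --version
```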
## Ready for build and deployment

After completing the setup, your instance includes the following tools and environments:

- A complete C/C++ development environment for building Llama.cpp
- Python 3, pip, and virtualenv for managing Python tools and environments
- Git for cloning repositories
- All required dependencies for compiling optimized Arm64 binaries

You're now ready to build Llama.cpp and download the Arcee Foundation Model.
Lines changed: 90 additions & 0 deletions
---
title: Build Llama.cpp
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---
## Build the Llama.cpp inference engine

In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model family, optimized for inference on a range of hardware platforms, including Arm-based processors like Google Axion.

Even though AFM-4.5B uses a custom model architecture, you can still use the standard Llama.cpp repository: Arcee AI has contributed the necessary modeling code upstream.

## Clone the repository

```bash
git clone https://github.com/ggerganov/llama.cpp
```

This command clones the Llama.cpp repository from GitHub to your local machine. The repository contains the source code, build scripts, and documentation needed to compile the inference engine.

## Navigate to the project directory

```bash
cd llama.cpp
```

Change into the llama.cpp directory to run the build process. This directory contains the `CMakeLists.txt` file and all source code.

## Configure the build with CMake

```bash
cmake -B .
```

This command configures the build system using CMake:

- `-B .` tells CMake to generate build files in the current directory
- CMake detects your system's compiler, libraries, and hardware capabilities
- It produces Makefiles (on Linux) or platform-specific build scripts for compiling the project

If you're running on Axion, the CMake output should include hardware-specific optimizations targeting the Neoverse V2 architecture. These optimizations are crucial for achieving high performance on Axion:

```output
-- ARM feature DOTPROD enabled
-- ARM feature SVE enabled
-- ARM feature MATMUL_INT8 enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+dotprod+i8mm+sve
```

These features enable advanced CPU instructions that accelerate inference performance on Arm64:

- **DOTPROD (dot product)**: hardware-accelerated dot product operations for neural network workloads

- **SVE (Scalable Vector Extension)**: advanced vector processing capabilities that can handle variable-length vectors up to 2048 bits, providing significant performance improvements for matrix operations

- **MATMUL_INT8**: integer matrix multiplication units optimized for transformers

- **FMA**: fused multiply-add operations to speed up floating-point math

- **FP16 vector arithmetic**: 16-bit floating-point vector operations to reduce memory use without compromising precision
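You can check which of these features your instance exposes by inspecting the CPU flags. This is a quick sanity check; exact flag names vary slightly by kernel version (for example, DOTPROD appears as `asimddp`):

```bash
# List CPU feature flags; look for entries such as asimddp, sve, and i8mm
lscpu | grep -i flags
```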
## Compile the project

```bash
cmake --build . --config Release -j16
```

This command compiles the Llama.cpp source code:

- `--build .` tells CMake to build the project in the current directory
- `--config Release` enables optimizations and strips debug symbols
- `-j16` runs the build with 16 parallel jobs, which speeds up compilation on multi-core systems like Axion
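If you picked a larger machine type, you can match the job count to the number of available cores instead of hard-coding 16:

```bash
# Use one build job per available vCPU
cmake --build . --config Release -j$(nproc)
```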
The build process compiles the C++ source code into executable binaries optimized for the Arm64 architecture. Compilation typically takes under a minute.

## Key binaries after compilation

After compilation, you'll find several key command-line tools in the `bin` directory:
- `llama-cli`: the main inference executable for running LLaMA models
- `llama-server`: a web server for serving model inference over HTTP
- `llama-quantize`: a tool for model quantization to reduce memory usage
- Additional utilities for model conversion and optimization

You can find more tools and usage details in the llama.cpp [GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools).

These binaries are specifically optimized for the Arm architecture and will provide excellent performance on your Axion instance.
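As a quick smoke test, you can invoke the binaries directly. The `--version` flag is a standard llama.cpp option, and the quantization invocation below is only a preview of a later step, with illustrative GGUF filenames:

```bash
# Print build information to confirm the binary runs
./bin/llama-cli --version

# Preview of a typical quantization call (filenames are illustrative)
./bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```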
Lines changed: 80 additions & 0 deletions
---
title: Install Python dependencies
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---
## Overview

In this step, you'll set up a Python virtual environment and install the required dependencies for working with Llama.cpp. This ensures you have a clean, isolated Python environment with all the necessary packages for model optimization.

## Create a Python virtual environment

```bash
virtualenv env-llama-cpp
```

This command creates a new Python virtual environment named `env-llama-cpp`, which has the following benefits:
- Provides an isolated Python environment to prevent package conflicts between projects
- Creates a local directory containing its own Python interpreter and installation space
- Ensures Llama.cpp dependencies don't interfere with your global Python setup
- Supports reproducible and portable development environments

## Activate the virtual environment

Run the following command to activate the virtual environment:

```bash
source env-llama-cpp/bin/activate
```
This command does the following:

- Runs the activation script, which modifies your shell environment
- Updates your shell prompt to show `env-llama-cpp`, indicating the environment is active
- Updates `PATH` so that the environment's Python interpreter is used first
- Ensures all `pip` commands install packages into the isolated environment
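A quick way to confirm the environment is active is to check which interpreter your shell now resolves:

```bash
# Should print a path inside env-llama-cpp rather than /usr/bin
which python
```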
## Upgrade pip to the latest version

Before installing dependencies, it's a good idea to upgrade pip:

```bash
pip install --upgrade pip
```
This command:

- Ensures you have the latest version of pip
- Helps avoid compatibility issues with modern packages
- Applies the `--upgrade` flag to fetch and install the newest release
- Brings in security patches and better dependency resolution logic

## Install project dependencies

Use the following command to install all required Python packages:

```bash
pip install -r requirements.txt
```

This command does the following:

- Uses the `-r` flag to read the list of dependencies from `requirements.txt`
- Installs the exact package versions required for the project
- Ensures consistency across development environments and contributors
- Includes packages for model loading, inference, and Python bindings for `llama.cpp`

This step sets up everything you need to run AFM-4.5B in your Python environment.
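You can spot-check the installation afterward; NumPy is among the installed packages (see the list below), so these commands should succeed if everything installed cleanly:

```bash
# Confirm packages landed in the virtual environment
pip list | grep -i numpy
python3 -c "import numpy; print(numpy.__version__)"
```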
## What the environment includes

After the installation completes, your virtual environment includes:
- **NumPy**: for numerical computations and array operations
- **Requests**: for HTTP operations and API calls
- **Other dependencies**: additional packages required by llama.cpp's Python bindings and utilities

Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries.

{{< notice Tip >}}
Before running any Python commands, make sure your virtual environment is activated.
{{< /notice >}}