Commit 5cc83cc

Merge pull request #71 from mutablelogic/djt-0604-api

Update whisper command

2 parents: 678cd8e + fcbecdf

56 files changed: +2630 −766 lines

Makefile

Lines changed: 2 additions & 1 deletion
@@ -48,7 +48,7 @@ ifeq ($(GGML_VULKAN),1)
 endif
 
 # Targets
-all: whisper api
+all: whisper
 
 # Generate the pkg-config files
 generate: mkdir go-tidy libwhisper
@@ -99,6 +99,7 @@ docker: docker-dep submodule
 	--build-arg SOURCE=${BUILD_MODULE} \
 	--build-arg VERSION=${VERSION} \
 	--build-arg GGML_CUDA=${GGML_CUDA} \
+	--build-arg GGML_VULKAN=${GGML_VULKAN} \
 	-f ${DOCKER_FILE} .
 
 # Push docker container
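The new `GGML_VULKAN` build argument mirrors the existing `GGML_CUDA` one. As a rough sketch of how the `docker` target above threads these flags into the image build (the variable values below are illustrative only; in the real Makefile they come from the build environment), the command can be assembled and printed for inspection:

```shell
#!/bin/sh
# Illustrative values only -- in the Makefile these come from the environment.
# GGML_VULKAN is the build argument added by this commit.
BUILD_MODULE="github.com/mutablelogic/go-whisper"
VERSION="v0.0.0-dev"
GGML_CUDA=0
GGML_VULKAN=1

# Assemble the --build-arg list that the docker target passes through.
ARGS="--build-arg SOURCE=${BUILD_MODULE} \
  --build-arg VERSION=${VERSION} \
  --build-arg GGML_CUDA=${GGML_CUDA} \
  --build-arg GGML_VULKAN=${GGML_VULKAN}"

# Print the command rather than running it, so it can be inspected:
echo docker build ${ARGS} -f Dockerfile .
```

Setting `GGML_VULKAN=1 make docker` would then bake Vulkan support into the image the same way `GGML_CUDA=1` does for CUDA.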

README.md

Lines changed: 120 additions & 53 deletions
@@ -1,23 +1,41 @@
 # go-whisper
 
-Speech-to-Text in golang. This is an early development version.
+[![Go Reference](https://pkg.go.dev/badge/github.com/mutablelogic/go-whisper.svg)](https://pkg.go.dev/github.com/mutablelogic/go-whisper)
+[![License](https://img.shields.io/badge/license-Apache-blue.svg)](LICENSE)
 
-* `cmd` contains an OpenAI-API compatible service
-* `pkg` contains the `whisper` service and client
-* `sys` contains the `whisper` bindings to the `whisper.cpp` library
-* `third_party` is a submodule for the whisper.cpp source
+Speech-to-Text in golang using [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
 
-## Running
+## Features
 
-You can either run the whisper service as a CLI command or in a docker container.
-There are docker images for arm64 and amd64 (Intel). The arm64 image is built for
-Jetson GPU support specifically, but it will also run on Raspberry Pi's.
+- **Transcription & Translation**: Easily transcribe audio files and translate them to English
+- **Providers**: Use models from OpenAI, ElevenLabs, and HuggingFace
+- **Command Line Interface**: Simple CLI for transcription and managing models
+- **HTTP API Server**: OpenAPI-compatible server with streaming support
+- **Model Management**: Download, list, and delete models
+- **GPU Acceleration**: Support for CUDA, Vulkan, and Metal (macOS) acceleration
+- **Docker Support**: Pre-built images for amd64 and arm64 architectures
 
-In order to utilize a NVIDIA GPU, you'll need to install the
+## Project Structure
+
+- `cmd` contains the command-line tool, which can also be run as an OpenAPI-compatible HTTP server
+- `pkg` contains the `whisper` service and client
+- `sys` contains the `whisper` bindings to the `whisper.cpp` library
+- `third_party` is a submodule for the whisper.cpp source, and ffmpeg bindings
+
+The following sections describe how to use whisper on the command-line, run it as a service,
+download a model, run the server, and build the project.
+
+## Using Docker
+
+You can run whisper as a CLI command or in a Docker container.
+There are Docker images for arm64 and amd64 (Intel). There is support for CUDA and Vulkan, but
+some features are still under development.
+
+In order to utilize an NVIDIA GPU, you'll need to install the
 [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) first.
 
-A docker volume should be created called "whisper" can be used for storing the Whisper language
-models. You can see which models are available to download locally [here](https://huggingface.co/ggerganov/whisper.cpp).
+A Docker volume called "whisper" can be used for storing the Whisper language
+models. You can see which models are available to download from the [HuggingFace whisper.cpp repository](https://huggingface.co/ggerganov/whisper.cpp).
 
 The following command will run the server on port 8080 for an NVIDIA GPU:
 
@@ -26,98 +44,147 @@ docker run \
   --name whisper-server --rm \
   --runtime nvidia --gpus all \ # When using a NVIDIA GPU
   -v whisper:/data -p 8080:80 \
-  ghcr.io/mutablelogic/go-whisper:latest
+  ghcr.io/mutablelogic/go-whisper:latest-cuda
 ```
 
-The API is then
-available at `http://localhost:8080/v1` and it generally conforms to the
-[OpenAI API](https://platform.openai.com/docs/api-reference/audio) spec.
+The API is then available at `http://localhost:8080/api/v1` and it generally conforms to the [OpenAI API](https://platform.openai.com/docs/api-reference/audio) spec.
+
+## API Examples
 
-### Sample Usage
+The API is available through the server and conforms generally to the OpenAI API spec. Here are some common usage examples:
 
-In order to download a model, you can use the following command (for example):
+### Download a model
 
 ```bash
-curl -X POST -H "Content-Type: application/json" -d '{"Path" : "ggml-medium-q5_0.bin" }' localhost:8080/v1/models\?stream=true
+curl -X POST -H "Content-Type: application/json" \
+  -d '{"path": "ggml-medium-q5_0.bin"}' \
+  localhost:8080/v1/models?stream=true
 ```
 
-To list the models available, you can use the following command:
+### List available models
 
 ```bash
 curl -X GET localhost:8080/v1/models
 ```
 
-To delete a model, you can use the following command:
+### Delete a model
 
 ```bash
 curl -X DELETE localhost:8080/v1/models/ggml-medium-q5_0
 ```
 
-To transcribe a media file into it's original language, you can use the following command:
+### Transcribe an audio file
 
 ```bash
-curl -F model=ggml-medium-q5_0 -F file=@samples/jfk.wav localhost:8080/v1/audio/transcriptions\?stream=true
+curl -F model=ggml-medium-q5_0 \
+  -F file=@samples/jfk.wav \
+  localhost:8080/v1/audio/transcriptions?stream=true
 ```
 
-To translate a media file into a different language, you can use the following command:
+### Translate an audio file to English
 
 ```bash
-curl -F model=ggml-medium-q5_0 -F file=@samples/de-podcast.wav -F language=en localhost:8080/v1/audio/translations\?stream=true
+curl -F model=ggml-medium-q5_0 \
+  -F file=@samples/de-podcast.wav \
+  -F language=en \
+  localhost:8080/v1/audio/translations?stream=true
 ```
 
-There's more information on the API [here](doc/API.md).
+For more detailed API documentation, see the [API Reference](doc/API.md).
 
 ## Building
 
-If you are building a docker image, you just need make and docker installed:
+### Docker Images
+
+If you are building a Docker image, you just need make and Docker installed:
 
-* `DOCKER_REGISTRY=docker.io/user make docker` - builds a docker container with the
-  server binary for CUDA, tagged to a specific registry
-* `OS=linux GGML_CUDA=0 DOCKER_REGISTRY=docker.io/user make docker` - builds a docker container
-  for Linux, with the server binary without CUDA, tagged to a specific registry
+- `GGML_CUDA=1 DOCKER_REGISTRY=docker.io/user make docker` - builds a Docker container with the server binary for CUDA, tagged to a specific registry
+- `OS=linux GGML_CUDA=0 DOCKER_REGISTRY=docker.io/user make docker` - builds a Docker container for Linux, with the server binary without CUDA, tagged to a specific registry
 
-If you want to build the server without docker, you can use the `Makefile` in the root
+### From Source
+
+If you want to build the server without Docker, you can use the `Makefile` in the root
 directory and have the following dependencies met:
 
-* Recent version of Go (ie, 1.22+)
-* C++ compiler and cmake
-* FFmpeg 6.1 libraries (see [here](doc/build.md) for more information)
-* For CUDA, you'll need the CUDA toolkit installed including the `nvcc` compiler
+- Recent version of Go (ie, 1.22+)
+- C++ compiler and cmake
+- For CUDA, you'll need the CUDA toolkit installed including the `nvcc` compiler
 
 The following `Makefile` targets can be used:
 
-* `make server` - creates the server binary, and places it in the `build` directory. Should
-  link to Metal on macOS
-* `GGML_CUDA=1 make server` - creates the server binary linked to CUDA, and places it
-  in the `build` directory. Should work for amd64 and arm64 (Jetson) platforms
+- `OS=linux make whisper` - creates the server binary, and places it in the `build` directory. Should link to Metal on macOS
+- `OS=linux GGML_CUDA=1 make whisper` - creates the server binary linked to CUDA, and places it in the `build` directory. Should work for amd64 and arm64 (Jetson) platforms
 
 See all the other targets in the `Makefile` for more information.
 
-## Developing
+## Command Line Usage
 
-TODO
+The `whisper` command-line tool can be built with `make whisper` and provides various functionalities.
 
-## Status
+```bash
+# List available models
+whisper models
 
-Still in development. See this [issue](https://github.com/mutablelogic/go-whisper/issues/1) for
-remaining tasks to be completed.
+# Download a model
+whisper download ggml-medium-q5_0.bin
+
+# Delete a model
+whisper delete ggml-medium-q5_0
+
+# Transcribe an audio file
+whisper transcribe ggml-medium-q5_0 samples/jfk.wav
+
+# Translate an audio file to English
+whisper translate ggml-medium-q5_0 samples/de-podcast.wav
 
-## Contributing & Distribution
+# Run the whisper server
+whisper server --listen localhost:8080
+```
+
+You can also access transcription and translation functionalities from OpenAI-compatible HTTP endpoints, and ElevenLabs-compatible endpoints:
 
-__This module is currently in development and subject to change.__
+- Set `OPENAI_API_KEY` environment variable to your OpenAI API key to use the OpenAI-compatible endpoints.
+- Set `ELEVENLABS_API_KEY` environment variable to your ElevenLabs API key
+- Set `WHISPER_URL` environment variable to the URL of the whisper server to use the OpenAI-compatible endpoints.
+
+```bash
+# List available remote models (including OpenAI and ElevenLabs models)
+whisper models --remote
 
-Please do file feature requests and bugs [here](https://github.com/mutablelogic/go-whisper/issues).
+# Download a model
+whisper download ggml-medium-q5_0.bin --remote
+
+# Transcribe an audio file for subtitles (ElevenLabs)
+whisper transcribe scribe_v1 samples/jfk.wav --format srt --diarize --remote
+
+# Translate an audio file to English (OpenAI)
+whisper translate whisper-1 samples/de-podcast.wav --remote
+```
+
+## Development Status
+
+This project is currently in development and subject to change. See this [GitHub issue](https://github.com/mutablelogic/go-whisper/issues/1) for
+remaining tasks to be completed.
+
+## Contributing & License
+
+Please file feature requests and bugs in the [GitHub issues](https://github.com/mutablelogic/go-whisper/issues).
 The license is Apache 2 so feel free to redistribute. Redistributions in either source
 code or binary form must reproduce the copyright notice, and please link back to this
 repository for more information:
 
-> __go-whisper__\
+> **go-whisper**\
 > [https://github.com/mutablelogic/go-whisper/](https://github.com/mutablelogic/go-whisper/)\
-> Copyright (c) 2023-2024 David Thorpe, All rights reserved.
+> Copyright (c) David Thorpe, All rights reserved.
 >
-> __whisper.cpp__\
+> **whisper.cpp**\
 > [https://github.com/ggerganov/whisper.cpp](https://github.com/ggerganov/whisper.cpp)\
-> Copyright (c) 2023-2024 The ggml authors
+> Copyright (c) The ggml authors
+>
+> **ffmpeg**\
+> [https://ffmpeg.org/](https://ffmpeg.org/)\
+> Copyright (c) the FFmpeg developers
 
 This software links to static libraries of [whisper.cpp](https://github.com/ggerganov/whisper.cpp) licensed under
-the [MIT License](http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html).
+the [MIT License](http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html). This software links to static libraries of ffmpeg licensed under the
+[LGPL 2.1 License](http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html).

cmd/api/delete.go

Lines changed: 0 additions & 12 deletions
This file was deleted.

cmd/api/download.go

Lines changed: 0 additions & 23 deletions
This file was deleted.
