post release tidy up
baxtree committed Mar 14, 2023
1 parent 197495f commit f05f4d7
Showing 16 changed files with 256 additions and 251 deletions.
1 change: 0 additions & 1 deletion CITATION.cff
@@ -5,7 +5,6 @@ authors:
given-names: Xi
orcid: https://orcid.org/0000-0002-2177-8458
title: "Subaligner: Towards Automated Subtitle Alignment"
version: 0.2.1
doi: 10.5281/zenodo.5603083
date-released: 2021-10-28
url: "https://github.com/baxtree/subaligner"
58 changes: 16 additions & 42 deletions README.md
@@ -31,6 +31,12 @@ $ brew install ffmpeg
$ pip install -U pip
$ pip install subaligner
```
or install from source:
```
$ git clone [email protected]:baxtree/subaligner.git
$ cd subaligner
$ python setup.py install
```

## Installation with Optional Packages Supporting Additional Features
```
@@ -61,31 +67,10 @@ To install all supported features:
$ pip install 'subaligner[harmony]'
```

## Alternative Installations
```
# Install via pipx
$ pip install -U pip pipx
$ pipx install subaligner
```
or
```
# Install from GitHub via Pipenv
$ pipenv install subaligner
$ pipenv install 'subaligner[stretch]'
$ pipenv install 'subaligner[dev]'
```
or
```
# Install from source
## Container Support
If you prefer using a containerised environment over installing everything locally, run:

$ git clone [email protected]:baxtree/subaligner.git
$ cd subaligner
$ python setup.py install
```
or
```
# Use dockerised installation
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner bash
```
For users on Windows 10: [Docker Desktop](https://docs.docker.com/docker-for-windows/install/) is the only option at present.
@@ -99,22 +84,13 @@ docker run -v "/d/media":/media -w "/media" -it baxtree/subaligner bash
```
# Single-stage alignment (high-level shift with lower latency)
$ subaligner_1pass -v video.mp4 -s subtitle.srt
$ subaligner_1pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ subaligner -m single -v video.mp4 -s subtitle.srt
$ subaligner -m single -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
```
```
# Dual-stage alignment (low-level shift with higher latency)
$ subaligner_2pass -v video.mp4 -s subtitle.srt
$ subaligner_2pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
```
or
```
# Pass in single-stage or dual-stage as the alignment mode
$ subaligner -m single -v video.mp4 -s subtitle.srt
$ subaligner -m dual -v video.mp4 -s subtitle.srt
$ subaligner -m single -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ subaligner -m dual -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
```
```
@@ -142,6 +118,7 @@ $ subaligner -m dual -v video.mkv -s embedded:stream_index=0 -o subtitle_aligned
```
```
# Translative alignment with the ISO 639-3 language code pair (src,tgt)
$ subaligner --languages
$ subaligner -m single -v video.mp4 -s subtitle.srt -t src,tgt
$ subaligner -m dual -v video.mp4 -s subtitle.srt -t src,tgt
@@ -171,20 +148,17 @@ $ pipx run subaligner -m dual -v video.mp4 -s subtitle.srt
# Run the module as a script
$ python -m subaligner -m single -v video.mp4 -s subtitle.srt
$ python -m subaligner -m dual -v video.mp4 -s subtitle.srt
$ python -m subaligner.subaligner_1pass -v video.mp4 -s subtitle.srt
$ python -m subaligner.subaligner_2pass -v video.mp4 -s subtitle.srt
```
```
# Run alignments with the docker image
$ docker pull baxtree/subaligner
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner_1pass -v video.mp4 -s subtitle.srt
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner_2pass -v video.mp4 -s subtitle.srt
$ docker run -it baxtree/subaligner subaligner_1pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ docker run -it baxtree/subaligner subaligner_2pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner -m single -v video.mp4 -s subtitle.srt
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner -m dual -v video.mp4 -s subtitle.srt
$ docker run -it baxtree/subaligner subaligner -m single -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ docker run -it baxtree/subaligner subaligner -m dual -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
```
The aligned subtitle will be saved at `subtitle_aligned.srt`. For details on CLI, run `subaligner_1pass -h`, `subaligner_2pass -h` or `subaligner -h`.
Additional utilities can be used after consulting `subaligner_batch -h`, `subaligner_convert -h`, `subaligner_train -h` and `subaligner_tune -h`.
The aligned subtitle will be saved at `subtitle_aligned.srt`. For details on the CLI, run `subaligner -h`; for the additional utilities, run `subaligner_batch -h`, `subaligner_convert -h`, `subaligner_train -h` or `subaligner_tune -h`. `subaligner_1pass` and `subaligner_2pass` are shortcuts for running `subaligner` with the `-m single` and `-m dual` options, respectively.

![](figures/screencast.gif)

2 changes: 0 additions & 2 deletions site/source/advanced_usage.rst
@@ -64,8 +64,6 @@ is present, make sure the folder passed in is empty.

(.venv) $ subaligner -m single -v video.mp4 -s subtitle.srt -tod training_output_directory
(.venv) $ subaligner -m dual -v video.mp4 -s subtitle.srt -tod training_output_directory
(.venv) $ subaligner_1pass -v video.mp4 -s subtitle.srt -tod training_output_directory
(.venv) $ subaligner_2pass -v video.mp4 -s subtitle.srt -tod training_output_directory

To apply your trained model to subtitle alignment, pass in the training_output_directory containing training results as
shown above with `-tod` or `--training_output_directory`.
8 changes: 4 additions & 4 deletions site/source/installation.rst
@@ -53,11 +53,11 @@ Installation
$ pipenv install 'subaligner[stretch]'
$ pipenv install 'subaligner[dev]'

**Use dockerised installation**::
**Container Support**::

$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner bash

The following builds are available on dockerhub for several Linux distributions: CentOS 7 (latest and VERSION.el7), CentOS 8 (VERSION.el8), Ubuntu 18 (VERSION.u18), Ubuntu 20 (VERSION.u20), Debian 10 (VERSION.deb10), Fedora 31 (VERSION.fed31) and ArchLinux (VERSION.arch).
Users may prefer using a containerised environment over installing everything locally. The following builds are available on dockerhub for several Linux distributions: CentOS 7 (latest and VERSION.el7), CentOS 8 (VERSION.el8), Ubuntu 18 (VERSION.u18), Ubuntu 20 (VERSION.u20), Debian 10 (VERSION.deb10), Fedora 31 (VERSION.fed31) and ArchLinux (VERSION.arch).

You can also download the latest
release on `GitHub <https://github.com/baxtree/subaligner>`_ and follow the steps down below
@@ -72,8 +72,8 @@ to create a virtual environment and set up all the dependencies:
**Subaligner CLI should be on your PATH now**::

(.venv) $ subaligner --help
(.venv) $ subaligner_1pass --help
(.venv) $ subaligner_2pass --help
(.venv) $ subaligner_1pass --help # shortcut for "subaligner -m single"
(.venv) $ subaligner_2pass --help # shortcut for "subaligner -m dual"
(.venv) $ subaligner_batch --help
(.venv) $ subaligner_convert --help
(.venv) $ subaligner_train --help
25 changes: 6 additions & 19 deletions site/source/usage.rst
@@ -14,18 +14,11 @@ Make sure you have got the virtual environment activated upfront.

**Single-stage alignment (high-level shift with lower latency)**::

(.venv) $ subaligner_1pass -v video.mp4 -s subtitle.srt
(.venv) $ subaligner_1pass -v https://example.org/video.mp4 -s https://example.org/subtitle.srt -o subtitle_aligned.srt
(.venv) $ subaligner -m single -v video.mp4 -s subtitle.srt
(.venv) $ subaligner -m single -v https://example.org/video.mp4 -s https://example.org/subtitle.srt -o subtitle_aligned.srt

**Dual-stage alignment (low-level shift with higher latency)**::

(.venv) $ subaligner_2pass -v video.mp4 -s subtitle.srt
(.venv) $ subaligner_2pass -v https://example.org/video.mp4 -s https://example.org/subtitle.srt -o subtitle_aligned.srt

**Pass in single-stage or dual-stage as the alignment mode**::

(.venv) $ subaligner -m single -v video.mp4 -s subtitle.srt
(.venv) $ subaligner -m single -v https://example.org/video.mp4 -s https://example.org/subtitle.srt -o subtitle_aligned.srt
(.venv) $ subaligner -m dual -v video.mp4 -s subtitle.srt
(.venv) $ subaligner -m dual -v https://example.org/video.mp4 -s https://example.org/subtitle.srt -o subtitle_aligned.srt

@@ -72,10 +65,10 @@ Make sure you have got the virtual environment activated upfront.
**Run alignments with the docker image**::

$ docker pull baxtree/subaligner
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner_1pass -v video.mp4 -s subtitle.srt
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner_2pass -v video.mp4 -s subtitle.srt
$ docker run -it baxtree/subaligner subaligner_1pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ docker run -it baxtree/subaligner subaligner_2pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner -m single -v video.mp4 -s subtitle.srt
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner -m dual -v video.mp4 -s subtitle.srt
$ docker run -it baxtree/subaligner subaligner -m single -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ docker run -it baxtree/subaligner subaligner -m dual -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt

**Run alignments with pipx**::

@@ -86,22 +79,16 @@ Make sure you have got the virtual environment activated upfront.

$ python -m subaligner -m single -v video.mp4 -s subtitle.srt
$ python -m subaligner -m dual -v video.mp4 -s subtitle.srt
$ python -m subaligner.subaligner_1pass -v video.mp4 -s subtitle.srt
$ python -m subaligner.subaligner_2pass -v video.mp4 -s subtitle.srt

Stretching is currently experimental. Make sure subaligner[stretch] is installed before switching it on with `-so`
or `--stretch_on` as shown below.

**Switch on stretching when aligning subtitles**::

(.venv) $ subaligner_2pass -v video.mp4 -s subtitle.srt -so
or
(.venv) $ subaligner -m dual -v video.mp4 -s subtitle.srt -so

**Save the aligned subtitle to a specific location**::

(.venv) $ subaligner_2pass -v video.mp4 -s subtitle.srt -o /path/to/the/output/subtitle.srt
or
(.venv) $ subaligner -m dual -v video.mp4 -s subtitle.srt -o /path/to/the/output/subtitle.srt

**On Windows**::
5 changes: 5 additions & 0 deletions subaligner/__init__.py
@@ -1,7 +1,12 @@
import os
import warnings
import multiprocessing as mp
from ._version import __version__

__all__ = ["__version__"]

warnings.filterwarnings("ignore")
warnings.simplefilter("ignore")

mp.set_start_method("spawn", force=True)
os.environ["KMP_WARNINGS"] = "0"
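
Review note: the initialiser above forces the `spawn` start method for `multiprocessing` at import time. The following is a minimal sketch of what `force=True` changes, using only the standard library; the helper names `use_spawn` and `second_call_without_force_fails` are hypothetical and not part of subaligner.

```python
import multiprocessing as mp


def use_spawn() -> str:
    # force=True replaces any start method already chosen in this process,
    # which is why the call is safe to run at import time.
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


def second_call_without_force_fails() -> bool:
    # Once a start method is fixed, setting it again without force=True
    # raises RuntimeError ("context has already been set").
    mp.set_start_method("spawn", force=True)
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print(use_spawn())
    print(second_call_without_force_fails())
```

Because the start method is process-wide, importing a package that forces it also affects any other `multiprocessing` use in the same interpreter.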
44 changes: 34 additions & 10 deletions subaligner/__main__.py
@@ -4,14 +4,15 @@
[-sil {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}]
[-fos] [-tod TRAINING_OUTPUT_DIRECTORY] [-o OUTPUT] [-t TRANSLATE] [-os OFFSET_SECONDS]
[-ml {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}]
[-mr {whisper}] [-mf {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}] [-lgs] [-d] [-q] [-ver]
[-mr {whisper}] [-mf {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}] [-tr {helsinki-nlp,whisper}] [-tf TRANSLATION_FLAVOUR] [-lgs]
[-d] [-q] [-ver]
Subaligner command line interface
optional arguments:
-h, --help show this help message and exit
-s SUBTITLE_PATH [SUBTITLE_PATH ...], --subtitle_path SUBTITLE_PATH [SUBTITLE_PATH ...]
File path or URL to the subtitle file (Extensions of supported subtitles: .ssa, .vtt, .srt, .txt, .smi, .ytt, .sub, .xml, .sbv, .ass, .sami, .scc, .tmp, .stl, .ttml, .dfxp) or selector for the embedded subtitle (e.g., embedded:page_num=888 or embedded:stream_index=0)
File path or URL to the subtitle file (Extensions of supported subtitles: .ttml, .sub, .ytt, .smi, .sami, .tmp, .txt, .ssa, .vtt, .stl, .xml, .ass, .scc, .dfxp, .sbv, .srt) or selector for the embedded subtitle (e.g., embedded:page_num=888 or embedded:stream_index=0)
-l MAX_LOGLOSS, --max_logloss MAX_LOGLOSS
Max global log loss for alignment
-so, --stretch_on Switch on stretch on subtitles)
@@ -32,7 +33,11 @@
-mr {whisper}, --llm_recipe {whisper}
LLM recipe used for transcribing video files
-mf {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}, --llm_flavour {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}
Flavour variation for a specific LLM recipe
Flavour variation for a specific LLM recipe supporting transcription
-tr {helsinki-nlp,whisper}, --translation_recipe {helsinki-nlp,whisper}
LLM recipe used for translating subtitles
-tf TRANSLATION_FLAVOUR, --translation_flavour TRANSLATION_FLAVOUR
Flavour variation for a specific LLM recipe supporting translation
-lgs, --languages Print out language codes used for stretch and translation
-d, --debug Print out debugging information
-q, --quiet Switch off logging information
@@ -152,21 +157,40 @@ def main():
         choices=Utils.get_stretch_language_codes(),
         help="Target video's main language as an ISO 639-3 language code [https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes]",
     )
+    from subaligner.llm import TranscriptionRecipe
+    from subaligner.llm import WhisperFlavour
     parser.add_argument(
         "-mr",
         "--llm_recipe",
         type=str.lower,
-        default="whisper",
-        choices=["whisper"],
+        default=TranscriptionRecipe.WHISPER.value,
+        choices=[r.value for r in TranscriptionRecipe],
         help="LLM recipe used for transcribing video files"
     )
     parser.add_argument(
         "-mf",
         "--llm_flavour",
         type=str.lower,
-        default="small",
-        choices=["tiny", "tiny.en", "small", "medium", "medium.en", "base", "base.en", "large-v1", "large-v2", "large"],
-        help="Flavour variation for a specific LLM recipe"
+        default=WhisperFlavour.SMALL.value,
+        choices=[wf.value for wf in WhisperFlavour],
+        help="Flavour variation for a specific LLM recipe supporting transcription"
     )
+    from subaligner.llm import TranslationRecipe
+    from subaligner.llm import HelsinkiNLPFlavour
+    parser.add_argument(
+        "-tr",
+        "--translation_recipe",
+        type=str.lower,
+        default=TranslationRecipe.HELSINKI_NLP.value,
+        choices=[r.value for r in TranslationRecipe],
+        help="LLM recipe used for translating subtitles"
+    )
+    parser.add_argument(
+        "-tf",
+        "--translation_flavour",
+        type=str.lower,
+        default=None,
+        help="Flavour variation for a specific LLM recipe supporting translation"
+    )
     parser.add_argument("-lgs", "--languages", action="store_true",
                         help="Print out language codes used for stretch and translation")
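
Review note: the hunk above replaces hard-coded `choices` lists with values derived from enums. The following self-contained sketch shows how that pattern behaves with `argparse`; the `TranslationRecipe` enum is copied from `subaligner/llm.py` in this commit, while `build_parser` and the `demo` program name are hypothetical.

```python
import argparse
from enum import Enum


class TranslationRecipe(Enum):
    HELSINKI_NLP = "helsinki-nlp"
    WHISPER = "whisper"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument(
        "-tr",
        "--translation_recipe",
        type=str.lower,  # applied before the value is checked against choices
        default=TranslationRecipe.HELSINKI_NLP.value,
        choices=[r.value for r in TranslationRecipe],
    )
    parser.add_argument("-tf", "--translation_flavour", type=str.lower, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-tr", "Whisper"])
    print(args.translation_recipe)
```

Deriving `choices` from the enum keeps the CLI and the library in step: adding a member to the enum extends the accepted values without touching the parser.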
@@ -312,8 +336,8 @@ def main():
     if FLAGS.translate is not None:
         from subaligner.translator import Translator
         source, target = FLAGS.translate.split(",")
-        translator = Translator(source, target)
-        aligned_subs = translator.translate(aligned_subs)
+        translator = Translator(src_language=source, tgt_language=target, recipe=FLAGS.translation_recipe, flavour=FLAGS.translation_flavour)
+        aligned_subs = translator.translate(aligned_subs, local_video_path)
     Subtitle.save_subs_as_target_format(aligned_subs, local_subtitle_path, aligned_subtitle_path,
                                         frame_rate, "utf-8")
 elif FLAGS.mode == "transcribe":
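
Review note: the hunk above unpacks `FLAGS.translate.split(",")` directly into `source, target`, which fails with an unpacking error if the value is not exactly one comma-separated pair. A defensive variant might look like the sketch below; `parse_language_pair` is a hypothetical helper, not something this commit adds, and the three-letter check is an assumption based on the README's mention of ISO 639-3 codes.

```python
from typing import Tuple


def parse_language_pair(pair: str) -> Tuple[str, str]:
    # Split "src,tgt", trim whitespace and normalise case.
    parts = [p.strip().lower() for p in pair.split(",")]
    # Assumption: both codes are three alphabetic characters (ISO 639-3 style).
    if len(parts) != 2 or not all(len(p) == 3 and p.isalpha() for p in parts):
        raise ValueError(f"Expected 'src,tgt' with ISO 639-3 codes, got: {pair!r}")
    return parts[0], parts[1]


if __name__ == "__main__":
    print(parse_language_pair("eng,zho"))
```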
2 changes: 1 addition & 1 deletion subaligner/_version.py
@@ -1,2 +1,2 @@
"""The semver for the current release."""
-__version__ = "0.3.0"
+__version__ = "0.3.1"
4 changes: 4 additions & 0 deletions subaligner/exception.py
@@ -10,5 +10,9 @@ class NoFrameRateException(Exception):
""" An exception raised due to frame rate not found."""


class TranslationException(Exception):
""" An exception raised due to translation failures."""


class TranscriptionException(Exception):
""" An exception raised due to transcription failures."""
29 changes: 29 additions & 0 deletions subaligner/llm.py
@@ -0,0 +1,29 @@
from enum import Enum


class TranscriptionRecipe(Enum):
WHISPER = "whisper"


class TranslationRecipe(Enum):
HELSINKI_NLP = "helsinki-nlp"
WHISPER = "whisper"


class WhisperFlavour(Enum):
TINY = "tiny"
TINY_EN = "tiny.en"
SMALL = "small"
MEDIUM = "medium"
MEDIUM_EN = "medium.en"
BASE = "base"
BASE_EN = "base.en"
LARGE_V1 = "large-v1"
LARGE_V2 = "large-v2"
LARGE = "large"


class HelsinkiNLPFlavour(Enum):
OPUS_MT = "Helsinki-NLP/opus-mt-{}-{}"
OPUS_MT_TC_BIG = "Helsinki-NLP/opus-mt-tc-big-{}-{}"
OPUS_TATOEBA = "Helsinki-NLP/opus-tatoeba-{}-{}"
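
Review note: the `HelsinkiNLPFlavour` values are format templates with two `{}` placeholders. The sketch below shows one way such a template could be expanded into a model name. How `Translator` actually fills the placeholders is not visible in this part of the diff, so `model_name` is a hypothetical helper and the two-letter codes are an assumption.

```python
from enum import Enum


class HelsinkiNLPFlavour(Enum):
    OPUS_MT = "Helsinki-NLP/opus-mt-{}-{}"
    OPUS_MT_TC_BIG = "Helsinki-NLP/opus-mt-tc-big-{}-{}"
    OPUS_TATOEBA = "Helsinki-NLP/opus-tatoeba-{}-{}"


def model_name(flavour: HelsinkiNLPFlavour, src: str, tgt: str) -> str:
    # Fill the two positional placeholders with the source and target codes.
    return flavour.value.format(src, tgt)


if __name__ == "__main__":
    print(model_name(HelsinkiNLPFlavour.OPUS_MT, "en", "de"))
```

Keeping the templates in an enum means a `--translation_flavour` value can be validated against a closed set before any model download is attempted.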