Update .cnb.yml to fetch en-zh model URLs from new models.json API #14

Merged
Aalivexy merged 2 commits into main from copilot/update-cnb-ci-file
Jan 30, 2026

Copilot AI commented Jan 30, 2026

The hardcoded model download URLs in .cnb.yml are no longer valid. Mozilla moved the translation models to a new location with a dynamic manifest at https://storage.googleapis.com/moz-fx-translations-data--303e-prod-translations-data/db/models.json.

Changes

  • Dynamic URL resolution: Fetch models.json and extract baseUrl + file paths for the en-zh model using jq
  • Error handling: Added set -e, curl -fsSL, and validation for null/empty JSON values
  • Preserved output structure: Files still extracted to models-enzh/enzh/ with same filenames
# Extract paths from models.json
model_data=$(jq -r '.models."en-zh"[0]' models.json)
lex_path=$(echo "$model_data" | jq -r '.files.lexicalShortlist.path')
model_path=$(echo "$model_data" | jq -r '.files.model.path')
src_vocab_path=$(echo "$model_data" | jq -r '.files.srcVocab.path')
trg_vocab_path=$(echo "$model_data" | jq -r '.files.trgVocab.path')
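
The snippet above only yields relative paths; each one still has to be joined with the manifest's baseUrl before it can be downloaded. A minimal sketch of that step, assuming baseUrl is a top-level string in models.json (the description mentions baseUrl but does not show where it sits). The helper names join_url and model_url are hypothetical and are not taken from the merged .cnb.yml:

```shell
# Join a base URL and a relative path with exactly one slash between them.
join_url() {
  local base="${1%/}" path="${2#/}"
  printf '%s/%s\n' "$base" "$path"
}

# Print the download URL for one file key of the first en-zh entry.
# Assumption: "baseUrl" is a top-level key of the manifest.
# jq -e exits non-zero when the selected value is null, so a missing key
# becomes a failure instead of the literal string "null" inside a URL.
model_url() {
  local manifest="$1" key="$2" base path
  base=$(jq -e -r '.baseUrl' "$manifest") || return 1
  path=$(jq -e -r --arg k "$key" '.models."en-zh"[0].files[$k].path' "$manifest") || return 1
  join_url "$base" "$path"
}
```

With models.json already fetched, a caller could run `url=$(model_url models.json model) || exit 1` and then pass "$url" to `curl -fsSL`. Stripping the trailing and leading slash keeps the result stable whether or not baseUrl ends in a slash.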
Original prompt

On .cnb.yml: .cnb.yml is the CI file for the CNB platform. The links in it are no longer valid; the new model downloads are listed at the following link:

https://storage.googleapis.com/moz-fx-translations-data--303e-prod-translations-data/db/models.json

Download it and read through it yourself; its structure is a bit complex. Only the en-zh version is needed, and the path should be .models."en-zh"

cat models.json | jq .models.\"en-zh\"
[
  {
    "architecture": "base",
    "releaseStatus": null,
    "sourceLanguage": "en",
    "targetLanguage": "zh",
    "files": {
      "lexicalShortlist": {
        "path": "models/en-zh/cjk_split_vocab_e3B-g-FeQSyTW33DUj2Btw/exported/lex.50.50.enzh.s2t.bin.gz"
      },
      "model": {
        "path": "models/en-zh/cjk_split_vocab_e3B-g-FeQSyTW33DUj2Btw/exported/model.enzh.intgemm.alphas.bin.gz",
        "uncompressedSize": 42992955,
        "uncompressedHash": "ce4486f728641a36269a245248dcb53405e76d96d9eba68dcb4172f29521e092"
      },
      "srcVocab": {
        "path": "models/en-zh/cjk_split_vocab_e3B-g-FeQSyTW33DUj2Btw/exported/srcvocab.enzh.spm.gz"
      },
      "trgVocab": {
        "path": "models/en-zh/cjk_split_vocab_e3B-g-FeQSyTW33DUj2Btw/exported/trgvocab.enzh.spm.gz"
      }
    },
    "modelStatistics": {
      "parameters": 42686774,
      "encoderParameters": 18914340,
      "decoderParameters": 7388432
    },
    "metrics": {
      "flores200-plus": {
        "chrf": 35.0897,
        "chrfpp": 26.6246,
        "comet22": 0.8556,
        "spbleu": 34.223,
        "llm-ref": 4.266,
        "metricx24-qe": 2.1428,
        "metricx24": 2.4542
      }
    }
  },
  {
    "architecture": "base-memory",
    "releaseStatus": "Release",
    "sourceLanguage": "en",
    "targetLanguage": "zh",
    "files": {
      "lexicalShortlist": {
        "path": "models/en-zh/zh_hant_llmaat_finetune10M_qe8_f2_aQ8azdOMQOSBVjBDOVDIZQ/exported/lex.50.50.enzh.s2t.bin.gz"
      },
      "model": {
        "path": "models/en-zh/zh_hant_llmaat_finetune10M_qe8_f2_aQ8azdOMQOSBVjBDOVDIZQ/exported/model.enzh.intgemm.alphas.bin.gz",
        "uncompressedSize": 43849787,
        "uncompressedHash": "559ab90d723a58c1f1e2ab7cc12137bc667af5ba3e325e3eb30b5cdc930db520"
      },
      "srcVocab": {
        "path": "models/en-zh/zh_hant_llmaat_finetune10M_qe8_f2_aQ8azdOMQOSBVjBDOVDIZQ/exported/srcvocab.enzh.spm.gz"
      },
      "trgVocab": {
        "path": "models/en-zh/zh_hant_llmaat_finetune10M_qe8_f2_aQ8azdOMQOSBVjBDOVDIZQ/exported/trgvocab.enzh.spm.gz"
      }
    },
    "modelStatistics": {
      "parameters": 43536965,
      "encoderParameters": 22934820,
      "decoderParameters": 20602145
    }
  }
]

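That is roughly what it looks like. You need to change the CI so that it gets the links from this URL, then downloads and extracts the files.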


Co-authored-by: Aalivexy <136234776+Aalivexy@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update .cnb.yml with new model download link" to "Update .cnb.yml to fetch en-zh model URLs from new models.json API" Jan 30, 2026
Copilot AI requested a review from Aalivexy January 30, 2026 06:22
Aalivexy marked this pull request as ready for review January 30, 2026 06:27
Copilot AI review requested due to automatic review settings January 30, 2026 06:27
Aalivexy merged commit 276286c into main Jan 30, 2026
5 checks passed
Copilot AI left a comment

Pull request overview

Updates the CNB CI pipeline to stop using hardcoded/expired Firefox Translations model URLs by resolving the en-zh model artifact paths dynamically from Mozilla’s models.json manifest.

Changes:

  • Fetches models.json and derives baseUrl plus en-zh file paths via jq
  • Adds stricter download behavior (set -e, curl -fsSL) and basic validation for missing/null JSON values
  • Preserves the existing extracted output layout under models-enzh/enzh/
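
The "basic validation" mentioned above can be pictured as a small guard applied to each value extracted with `jq -r`, which prints the literal string null for a missing key. A sketch only; require_value is a hypothetical helper name, and the merged pipeline may inline these checks instead:

```shell
# Fail with a message on stderr when a value extracted by `jq -r`
# is empty or the literal string "null".
require_value() {
  local name="$1" value="$2"
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "models.json: missing value for $name" >&2
    return 1
  fi
}
```

A call such as `require_value model_path "$model_path" || exit 1` stops the job before curl is handed a malformed URL.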

Comment on lines +27 to +30
lex_path=$(echo "$model_data" | jq -r '.files.lexicalShortlist.path')
model_path=$(echo "$model_data" | jq -r '.files.model.path')
src_vocab_path=$(echo "$model_data" | jq -r '.files.srcVocab.path')
trg_vocab_path=$(echo "$model_data" | jq -r '.files.trgVocab.path')
Copilot AI Jan 30, 2026

echo "$model_data" | jq ... can be unsafe because echo behavior is shell-dependent (it may interpret backslashes or -n/-e sequences), which can corrupt JSON before it reaches jq. Prefer extracting fields directly with jq from models.json (or use printf '%s' "$model_data").

Suggested change
- lex_path=$(echo "$model_data" | jq -r '.files.lexicalShortlist.path')
- model_path=$(echo "$model_data" | jq -r '.files.model.path')
- src_vocab_path=$(echo "$model_data" | jq -r '.files.srcVocab.path')
- trg_vocab_path=$(echo "$model_data" | jq -r '.files.trgVocab.path')
+ lex_path=$(printf '%s' "$model_data" | jq -r '.files.lexicalShortlist.path')
+ model_path=$(printf '%s' "$model_data" | jq -r '.files.model.path')
+ src_vocab_path=$(printf '%s' "$model_data" | jq -r '.files.srcVocab.path')
+ trg_vocab_path=$(printf '%s' "$model_data" | jq -r '.files.trgVocab.path')
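
The other option named in this comment, reading the fields directly from models.json, avoids passing JSON through a shell variable at all. A rough sketch, assuming the same .models."en-zh"[0] selection as the PR; enzh_paths is a hypothetical helper name:

```shell
# Print the four en-zh file paths, one per line, straight from the manifest.
# Order: lexicalShortlist, model, srcVocab, trgVocab.
# jq raises an error (non-zero exit) if any of the four is not a string.
enzh_paths() {
  jq -r '.models."en-zh"[0].files
    | [.lexicalShortlist.path, .model.path, .srcVocab.path, .trgVocab.path]
    | if all(type == "string") then .[]
      else error("missing path in en-zh entry") end' "$1"
}
```

The output can then be consumed line by line, for example with `read -r`, and a single jq invocation replaces four echo pipelines.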

Comment on lines +20 to +23
# Get the first en-zh model (architecture: base)
model_data=$(jq -r '.models."en-zh"[0]' models.json)
if [ -z "$model_data" ] || [ "$model_data" = "null" ]; then
echo "No en-zh model found in models.json"
Copilot AI Jan 30, 2026

Selecting the en-zh model via .models."en-zh"[0] is order-dependent, but the manifest contains multiple en-zh entries (e.g., different architectures/release statuses). Filter explicitly (e.g., by architecture == "base" and/or releaseStatus) so the pipeline consistently downloads the intended model even if the array order changes.

Suggested change
- # Get the first en-zh model (architecture: base)
- model_data=$(jq -r '.models."en-zh"[0]' models.json)
- if [ -z "$model_data" ] || [ "$model_data" = "null" ]; then
- echo "No en-zh model found in models.json"
+ # Select the en-zh model with architecture "base" and releaseStatus "released"
+ model_data=$(jq -r '.models["en-zh"] | map(select(.architecture == "base" and .releaseStatus == "released")) | .[0]' models.json)
+ if [ -z "$model_data" ] || [ "$model_data" = "null" ]; then
+ echo "No matching en-zh model (architecture=base, releaseStatus=released) found in models.json"
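
One caveat when adapting this filter: in the manifest excerpt quoted in the original prompt, the base entry has releaseStatus null and the base-memory entry has "Release", so neither would match releaseStatus == "released". A hedged sketch that filters on architecture only and takes the value as a parameter; select_enzh_model is a hypothetical name, and which architecture the pipeline should use is left to the maintainer:

```shell
# Print the first en-zh entry whose architecture equals $2, as compact JSON.
# jq -e exits non-zero when nothing matches (the result is null),
# so the caller does not need to compare against the string "null".
select_enzh_model() {
  local manifest="$1" arch="$2"
  jq -e -c --arg arch "$arch" \
    '.models."en-zh" | map(select(.architecture == $arch)) | .[0]' "$manifest"
}
```

For example, `model_data=$(select_enzh_model models.json base) || exit 1` picks the base entry regardless of where it appears in the array.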
