Update .cnb.yml to fetch en-zh model URLs from new models.json API #14
Co-authored-by: Aalivexy <136234776+Aalivexy@users.noreply.github.com>
Pull request overview
Updates the CNB CI pipeline to stop using hardcoded/expired Firefox Translations model URLs by resolving the en-zh model artifact paths dynamically from Mozilla’s models.json manifest.
Changes:
- Fetches models.json and derives baseUrl plus en-zh file paths via jq
- Adds stricter download behavior (set -e, curl -fsSL) and basic validation for missing/null JSON values
- Preserves the existing extracted output layout under models-enzh/enzh/
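The validation and download steps listed above could be sketched roughly as follows. This is a minimal illustration, not the script from the PR: the helper names require_value, join_url, and fetch_file are hypothetical, and joining baseUrl and a file path with a single slash is an assumption about how the manifest is meant to be used.

```shell
# Fail with a message when a jq result is empty or the literal string "null"
# (jq -r prints "null" for a missing key).
require_value() {
  if [ -z "$2" ] || [ "$2" = "null" ]; then
    echo "Missing value for $1 in models.json" >&2
    return 1
  fi
}

# Assumption: a file URL is baseUrl and the file path joined by exactly one slash.
join_url() {
  printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Download one file to a destination path.
# -f fails on HTTP errors, -sS stays quiet but still reports errors, -L follows redirects.
fetch_file() {
  require_value "baseUrl" "$1" || return 1
  require_value "path" "$2" || return 1
  curl -fsSL "$(join_url "$1" "$2")" -o "$3"
}
```

A call such as fetch_file "$base_url" "$model_path" models-enzh/enzh/model.bin would then stop the step on the first missing value or failed download when the script runs under set -e; the destination filename here is only a placeholder.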
lex_path=$(echo "$model_data" | jq -r '.files.lexicalShortlist.path')
model_path=$(echo "$model_data" | jq -r '.files.model.path')
src_vocab_path=$(echo "$model_data" | jq -r '.files.srcVocab.path')
trg_vocab_path=$(echo "$model_data" | jq -r '.files.trgVocab.path')
echo "$model_data" | jq ... can be unsafe because echo behavior is shell-dependent (it may interpret backslashes or -n/-e sequences), which can corrupt JSON before it reaches jq. Prefer extracting fields directly with jq from models.json (or use printf '%s' "$model_data").
Suggested change:
- lex_path=$(echo "$model_data" | jq -r '.files.lexicalShortlist.path')
- model_path=$(echo "$model_data" | jq -r '.files.model.path')
- src_vocab_path=$(echo "$model_data" | jq -r '.files.srcVocab.path')
- trg_vocab_path=$(echo "$model_data" | jq -r '.files.trgVocab.path')
+ lex_path=$(printf '%s' "$model_data" | jq -r '.files.lexicalShortlist.path')
+ model_path=$(printf '%s' "$model_data" | jq -r '.files.model.path')
+ src_vocab_path=$(printf '%s' "$model_data" | jq -r '.files.srcVocab.path')
+ trg_vocab_path=$(printf '%s' "$model_data" | jq -r '.files.trgVocab.path')
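To make the printf suggestion concrete, here is a small sketch of a wrapper that passes JSON to jq without going through echo. The function names emit_raw and json_field are hypothetical and only illustrate the pattern; they are not part of the PR.

```shell
# Pass a string through unchanged. Unlike echo, printf '%s' never treats the
# data as an option (such as -n) and never expands backslash escapes in it.
emit_raw() {
  printf '%s' "$1"
}

# Hypothetical helper: read one field from a JSON string with jq.
# $1 is the JSON text, $2 is the jq filter.
json_field() {
  emit_raw "$1" | jq -r "$2"
}
```

With this in place, a line like model_path=$(json_field "$model_data" '.files.model.path') behaves the same under bash, dash, and other POSIX shells, which is the portability concern raised in the comment.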
# Get the first en-zh model (architecture: base)
model_data=$(jq -r '.models."en-zh"[0]' models.json)
if [ -z "$model_data" ] || [ "$model_data" = "null" ]; then
echo "No en-zh model found in models.json"
Selecting the en-zh model via .models."en-zh"[0] is order-dependent, but the manifest contains multiple en-zh entries (e.g., different architectures/release statuses). Filter explicitly (e.g., by architecture == "base" and/or releaseStatus) so the pipeline consistently downloads the intended model even if the array order changes.
Suggested change:
- # Get the first en-zh model (architecture: base)
- model_data=$(jq -r '.models."en-zh"[0]' models.json)
- if [ -z "$model_data" ] || [ "$model_data" = "null" ]; then
- echo "No en-zh model found in models.json"
+ # Select the en-zh model with architecture "base" and releaseStatus "released"
+ model_data=$(jq -r '.models["en-zh"] | map(select(.architecture == "base" and .releaseStatus == "released")) | .[0]' models.json)
+ if [ -z "$model_data" ] || [ "$model_data" = "null" ]; then
+ echo "No matching en-zh model (architecture=base, releaseStatus=released) found in models.json"
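One way to keep the selection explicit without hardcoding the filter values is to pass them to jq as arguments. The sketch below assumes the manifest has the shape used in the suggestion (a models object keyed by language pair, each entry carrying architecture and releaseStatus fields); the function name select_model is hypothetical, and the actual field values in Mozilla's manifest have not been verified here.

```shell
# Hypothetical helper: print the first model for a language pair that matches
# the requested architecture and release status, as compact JSON.
# Prints nothing when the pair is absent or no entry matches.
# $1 manifest file, $2 language pair, $3 architecture, $4 release status
select_model() {
  jq -c --arg pair "$2" --arg arch "$3" --arg status "$4" \
    'first(.models[$pair][]? | select(.architecture == $arch and .releaseStatus == $status))' \
    "$1"
}
```

A caller could then write model_data=$(select_model models.json en-zh base released) and treat an empty result as "no matching model". Because a non-match produces no output rather than the string null, the -z check alone would be enough in that case.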
The hardcoded model download URLs in .cnb.yml are no longer valid. Mozilla moved the translation models to a new location with a dynamic manifest at https://storage.googleapis.com/moz-fx-translations-data--303e-prod-translations-data/db/models.json.

Changes
- Fetch models.json and extract baseUrl + file paths for the en-zh model using jq
- Add set -e, curl -fsSL, and validation for null/empty JSON values
- Keep the output under models-enzh/enzh/ with the same filenames