|
13 | 13 | "`automatic-speech-recognition` (ASR) converts a speech signal to text, mapping a sequence of audio inputs to text outputs. Virtual assistants like Siri and Alexa use ASR models to help users every day, and there are many other useful user-facing applications like live captioning and note-taking during meetings.\n",
|
14 | 14 | "\n",
|
15 | 15 | "### Model\n",
|
16 |
| - "Models that can perform the `automatic-speech-recognition` task are tagged with `task: automatic-speech-recognition`. We will use the `openai-whisper-large` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import-model-from-huggingface.ipynb) and then use them for inference. \n", |
| 16 | + "Models that can perform the `automatic-speech-recognition` task are tagged with `task: automatic-speech-recognition`. We will use the `openai-whisper-large` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference. \n", |
17 | 17 | "\n",
|
18 | 18 | "### Inference data\n",
|
19 |
| - "We will use custom audio files that have been uploaded to the cloud. \\\n", |
20 |
| - "You can replace the links with any audio file stored on the cloud and verify inference.\n", |
| 19 | + "We will use the [Librispeech ASR](https://huggingface.co/datasets/librispeech_asr/viewer/clean/test) dataset. \\\n", |
| 20 | + "You can also use custom audio files stored on the cloud and verify inference.\n", |
21 | 21 | "- Most common audio formats (m4a, wav, flac, wma, mp3, etc.) are supported.\n",
|
22 | 22 | "- The Whisper model can process only 30 seconds of data at a time, so if the file you upload is longer than 30 seconds, only the first 30 seconds will be transcribed. You can work around this by splitting the file into 30-second chunks, as sketched below.\n",
|
23 | 23 | "\n",
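The chunking workaround mentioned above can be applied before the files are uploaded. A minimal sketch, assuming the `pydub` package (and its ffmpeg dependency) is available and using a hypothetical local file name; neither is part of this notebook:

```python
from pydub import AudioSegment

# Hypothetical input file; replace with your own long recording
audio = AudioSegment.from_file("long_recording.wav")

chunk_ms = 30 * 1000  # pydub slices by milliseconds, so 30 s = 30,000 ms
for i, start in enumerate(range(0, len(audio), chunk_ms)):
    # Export each 30-second slice as its own file, which can then be uploaded for inference
    audio[start : start + chunk_ms].export(f"long_recording_chunk_{i}.wav", format="wav")
```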
|
|
51 | 51 | "source": [
|
52 | 52 | "# Import packages used by the following code snippets\n",
|
53 | 53 | "import csv\n",
|
| 54 | + "import json\n", |
54 | 55 | "import os\n",
|
| 56 | + "import requests\n", |
55 | 57 | "import time\n",
|
56 | 58 | "\n",
|
57 | 59 | "import pandas as pd\n",
|
|
174 | 176 | "cell_type": "markdown",
|
175 | 177 | "metadata": {},
|
176 | 178 | "source": [
|
177 |
| - "### 3. Prepare data for inference.\n", |
| 179 | + "### 3. Prepare data for inference\n", |
178 | 180 | "\n",
|
179 |
| - "A copy of a small subset from the [Librispeech ASR](https://huggingface.co/datasets/librispeech_asr) is available in the [librispeech-dataset](./librispeech-dataset/) folder. The next few cells show basic data preparation:\n", |
180 |
| - "* Visualize some data rows\n", |
181 |
| - "* We want this sample to run quickly, so save a smaller dataset containing a fraction of the original." |
| 181 | + "We will test with a small subset from the [Librispeech ASR](https://huggingface.co/datasets/librispeech_asr/viewer/clean/test) dataset, saving the sample in the `librispeech-dataset` folder. The next few cells show basic data preparation:\n", |
| 182 | + "* Download the data.\n", |
| 183 | + "* Visualize some data rows.\n", |
| 184 | + "* Save the data.\n", |
| 185 | + "\n", |
| 186 | + "We want this sample to run quickly, so we are using a smaller dataset containing a fraction of the original." |
182 | 187 | ]
|
183 | 188 | },
|
184 | 189 | {
|
|
189 | 194 | "source": [
|
190 | 195 | "# Define directories and filenames as variables\n",
|
191 | 196 | "dataset_dir = \"librispeech-dataset\"\n",
|
192 |
| - "training_datafile = \"train_clean_100.csv\"\n", |
| 197 | + "test_datafile = \"test_100.csv\"\n", |
193 | 198 | "\n",
|
194 | 199 | "batch_dir = \"batch\"\n",
|
195 | 200 | "batch_inputs_dir = os.path.join(batch_dir, \"inputs\")\n",
|
196 | 201 | "batch_input_file = \"batch_input.csv\"\n",
|
| 202 | + "os.makedirs(dataset_dir, exist_ok=True)\n", |
197 | 203 | "os.makedirs(batch_dir, exist_ok=True)\n",
|
198 | 204 | "os.makedirs(batch_inputs_dir, exist_ok=True)"
|
199 | 205 | ]
|
200 | 206 | },
|
| 207 | + { |
| 208 | + "attachments": {}, |
| 209 | + "cell_type": "markdown", |
| 210 | + "metadata": {}, |
| 211 | + "source": [ |
| 212 | + "#### 3.1 Download the data\n", |
| 213 | + "We want the direct URLs of the audio files, so we pull the data down with `requests` instead of the Hugging Face `datasets` module. The MLflow model's signature specifies that the input should have a column named `\"audio\"` and a column named `\"language\"`." |
| 214 | + ] |
| 215 | + }, |
| 216 | + { |
| 217 | + "cell_type": "code", |
| 218 | + "execution_count": null, |
| 219 | + "metadata": {}, |
| 220 | + "outputs": [], |
| 221 | + "source": [ |
| 222 | + "testdata = requests.get(\n", |
| 223 | + " \"https://datasets-server.huggingface.co/first-rows?dataset=librispeech_asr&config=clean&split=test&offset=0&limit=100\"\n", |
| 224 | + ").text\n", |
| 225 | + "testdata = json.loads(testdata)" |
| 226 | + ] |
| 227 | + }, |
| 228 | + { |
| 229 | + "cell_type": "code", |
| 230 | + "execution_count": null, |
| 231 | + "metadata": {}, |
| 232 | + "outputs": [], |
| 233 | + "source": [ |
| 234 | + "audio_urls_and_text = [\n", |
| 235 | + " (row[\"row\"][\"audio\"][0][\"src\"], row[\"row\"][\"text\"]) for row in testdata[\"rows\"]\n", |
| 236 | + "]" |
| 237 | + ] |
| 238 | + }, |
| 239 | + { |
| 240 | + "cell_type": "code", |
| 241 | + "execution_count": null, |
| 242 | + "metadata": {}, |
| 243 | + "outputs": [], |
| 244 | + "source": [ |
| 245 | + "test_df = pd.DataFrame(data=audio_urls_and_text, columns=[\"audio\", \"text\"])" |
| 246 | + ] |
| 247 | + }, |
| 248 | + { |
| 249 | + "attachments": {}, |
| 250 | + "cell_type": "markdown", |
| 251 | + "metadata": {}, |
| 252 | + "source": [ |
| 253 | + "These recordings are all in English, so create a `language` column filled with `'en'`." |
| 254 | + ] |
| 255 | + }, |
| 256 | + { |
| 257 | + "cell_type": "code", |
| 258 | + "execution_count": null, |
| 259 | + "metadata": {}, |
| 260 | + "outputs": [], |
| 261 | + "source": [ |
| 262 | + "test_df[\"language\"] = \"en\"\n", |
| 263 | + "test_df.to_csv(os.path.join(\".\", dataset_dir, test_datafile), index=False)" |
| 264 | + ] |
| 265 | + }, |
| 266 | + { |
| 267 | + "attachments": {}, |
| 268 | + "cell_type": "markdown", |
| 269 | + "metadata": {}, |
| 270 | + "source": [ |
| 271 | + "#### 3.2 Visualize a few rows of data" |
| 272 | + ] |
| 273 | + }, |
201 | 274 | {
|
202 | 275 | "cell_type": "code",
|
203 | 276 | "execution_count": null,
|
204 | 277 | "metadata": {},
|
205 | 278 | "outputs": [],
|
206 | 279 | "source": [
|
207 |
| - "# Load the ./librispeech-dataset/train_clean_100.csv file into a pandas dataframe and show the first 5 rows\n", |
208 | 280 | "pd.set_option(\n",
|
209 | 281 | " \"display.max_colwidth\", 0\n",
|
210 | 282 | ") # Set the max column width to 0 to display the full text\n",
|
211 |
| - "train_df = pd.read_csv(os.path.join(\".\", dataset_dir, training_datafile))\n", |
212 |
| - "train_df.head()" |
| 283 | + "test_df.head()" |
213 | 284 | ]
|
214 | 285 | },
|
215 | 286 | {
|
216 | 287 | "attachments": {},
|
217 | 288 | "cell_type": "markdown",
|
218 | 289 | "metadata": {},
|
219 | 290 | "source": [
|
220 |
| - "Save the input data to files of smaller batches for testing. The MLflow model's signature specifies the input should be a column named `\"audio\"` and a column named `\"language\"`." |
| 291 | + "#### 3.3 Save the data\n", |
| 292 | + "Save the input data to files of smaller batches for testing." |
221 | 293 | ]
|
222 | 294 | },
|
223 | 295 | {
|
|
226 | 298 | "metadata": {},
|
227 | 299 | "outputs": [],
|
228 | 300 | "source": [
|
229 |
| - "batch_df = train_df[[\"audio\", \"language\"]]\n", |
| 301 | + "batch_df = test_df[[\"audio\", \"language\"]]\n", |
230 | 302 | "\n",
|
231 | 303 | "# Divide this into files of 10 rows each\n",
|
232 | 304 | "batch_size_per_predict = 10\n",
|
|
300 | 372 | " instance_count=1,\n",
|
301 | 373 | " logging_level=\"info\",\n",
|
302 | 374 | " max_concurrency_per_instance=1,\n",
|
303 |
| - " mini_batch_size=10,\n", |
| 375 | + " mini_batch_size=2,\n", |
304 | 376 | " output_file_name=\"predictions.csv\",\n",
|
305 |
| - " retry_settings=BatchRetrySettings(max_retries=3, timeout=300),\n", |
| 377 | + " retry_settings=BatchRetrySettings(max_retries=3, timeout=600),\n", |
306 | 378 | ")\n",
|
307 | 379 | "workspace_ml_client.begin_create_or_update(deployment).result()"
|
308 | 380 | ]
|
|
334 | 406 | "cell_type": "markdown",
|
335 | 407 | "metadata": {},
|
336 | 408 | "source": [
|
337 |
| - "### 5. Run a batch inference job.\n", |
| 409 | + "### 5. Run a batch inference job\n", |
338 | 410 | "Invoke the batch endpoint with the input parameter pointing to the folder containing the batch inference input. This creates a pipeline job using the default deployment in the endpoint. Wait for the job to complete."
|
339 | 411 | ]
|
340 | 412 | },
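For context on this step, here is a minimal sketch of the invocation pattern with the `azure-ai-ml` SDK, assuming the `workspace_ml_client`, `endpoint`, and `batch_inputs_dir` objects created in earlier cells; the notebook's own invoke cell may differ in detail:

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Point the job input at the folder of batch CSV files prepared in step 3
job = workspace_ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    input=Input(type=AssetTypes.URI_FOLDER, path=batch_inputs_dir),
)

# Stream logs until the pipeline job created by the invocation completes
workspace_ml_client.jobs.stream(job.name)
```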
|
|
358 | 430 | "cell_type": "markdown",
|
359 | 431 | "metadata": {},
|
360 | 432 | "source": [
|
361 |
| - "### 6. Review inference predictions. \n", |
| 433 | + "### 6. Review inference predictions\n", |
362 | 434 | "Download the predictions from the job output and review the predictions using a dataframe."
|
363 | 435 | ]
|
364 | 436 | },
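A sketch of how the download and review can look, assuming the `job` handle from the invoke call above and the `score` output name that batch deployments expose; the folder layout and missing column headers below are assumptions for illustration:

```python
# The invocation creates a pipeline job; the scoring output usually lives on its child job
scoring_job = list(workspace_ml_client.jobs.list(parent_job_name=job.name))[0]

# Download the "score" output, which contains the predictions.csv configured on the deployment
workspace_ml_client.jobs.download(
    name=scoring_job.name, download_path=".", output_name="score"
)

# Load the predictions for review; adjust the path and columns to match the actual output
predictions_df = pd.read_csv(
    os.path.join("named-outputs", "score", "predictions.csv"), header=None
)
predictions_df.head()
```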
|
|
390 | 462 | "cell_type": "markdown",
|
391 | 463 | "metadata": {},
|
392 | 464 | "source": [
|
393 |
| - "Record the input file name and set the original index value in the `'index'` column for each input file. Join the `train_df` with ground truth into the input dataframe." |
| 465 | + "Record the input file name and set the original index value in the `'index'` column for each input file. Join the `test_df` containing ground truth into the input dataframe." |
394 | 466 | ]
|
395 | 467 | },
|
396 | 468 | {
|
|
408 | 480 | " input_df.append(input)\n",
|
409 | 481 | "input_df = pd.concat(input_df)\n",
|
410 | 482 | "input_df.set_index(\"index\", inplace=True)\n",
|
411 |
| - "input_df = input_df.join(train_df.drop(columns=[\"audio\", \"language\"]))\n", |
| 483 | + "input_df = input_df.join(test_df.drop(columns=[\"audio\", \"language\"]))\n", |
412 | 484 | "\n",
|
413 | 485 | "input_df.head()"
|
414 | 486 | ]
|
|
471 | 543 | "name": "python",
|
472 | 544 | "nbconvert_exporter": "python",
|
473 | 545 | "pygments_lexer": "ipython3",
|
474 |
| - "version": "3.8.13" |
| 546 | + "version": "3.9.12" |
475 | 547 | },
|
476 | 548 | "vscode": {
|
477 | 549 | "interpreter": {
|
|