diff --git a/examples/ingest-from-db-example/ingest-from-db-example-notebook/Ingesting_data_from_DB_into_Database.ipynb b/examples/ingest-from-db-example/ingest-from-db-example-notebook/Ingesting_data_from_DB_into_Database.ipynb
new file mode 100644
index 0000000000..5153ae0cae
--- /dev/null
+++ b/examples/ingest-from-db-example/ingest-from-db-example-notebook/Ingesting_data_from_DB_into_Database.ipynb
@@ -0,0 +1,1123 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Ingesting data from DB into Database\n"
+ ],
+ "metadata": {
+ "id": "SiXBOvrboX9e"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
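+ "In this notebook we will use the [Versatile Data Kit (VDK)](https://github.com/vmware/versatile-data-kit) to develop an ingestion Data Job. The job reads the `employees` table from a local SQLite database and writes it into a new `backup_employees` table, creating a backup of that table. In this example the source and the target are the same file, `chinook.db`; pointing `VDK_SQLITE_FILE` at a different file would make that file the target instead."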
+ ],
+ "metadata": {
+ "id": "FpCoerAj0hau"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "## 1. Prerequisites"
+ ],
+ "metadata": {
+ "id": "f9XXCBpI1eMW"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 1.1 Good to Know Before You Start"
+ ],
+ "metadata": {
+ "id": "q_pbljVH2cFr"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "This tutorial is easier to follow if you are familiar with:\n",
+ "\n",
+ "- **Python and SQL**: Basic commands and queries\n",
+ "- **Tools**: Comfort with command line and Jupyter Notebook"
+ ],
+ "metadata": {
+ "id": "jYmcEhgE2zNn"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 1.2 Useful notebook shortcuts"
+ ],
+ "metadata": {
+ "id": "CKYPWV8-3XE8"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "* Click the **Play icon** in the left gutter of the cell;\n",
+ "* Type **Cmd/Ctrl+Enter** to run the cell in place;\n",
+ "* Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists); or\n",
+ "* Type **Alt+Enter** to run the cell and insert a new code cell immediately below it.\n",
+ "\n",
+ "There are additional options for running some or all cells in the **Runtime** menu on top."
+ ],
+ "metadata": {
+ "id": "Mh2kLwz73mk3"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 1.3 Install Versatile Data Kit and required plugins\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "5s_z56EF3v3o"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {
+ "id": "1tN44vxJoTLa",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "338e8e60-be63-4746-8a41-ac4cde583eac"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: quickstart-vdk in /usr/local/lib/python3.11/dist-packages (0.2.1643867476)\n",
+ "Requirement already satisfied: vdk-notebook in /usr/local/lib/python3.11/dist-packages (0.1.1431637373)\n",
+ "Requirement already satisfied: vdk-ipython==0.2.5 in /usr/local/lib/python3.11/dist-packages (0.2.5)\n",
+ "Requirement already satisfied: vdk-data-sources in /usr/local/lib/python3.11/dist-packages (0.1.1431637373)\n",
+ "Requirement already satisfied: vdk-singer in /usr/local/lib/python3.11/dist-packages (0.1.1431637373)\n",
+ "Requirement already satisfied: tap-rest-api-msdk in /usr/local/lib/python3.11/dist-packages (1.4.1)\n",
+ "Requirement already satisfied: vdk-core in /usr/local/lib/python3.11/dist-packages (from vdk-ipython==0.2.5) (0.3.1466514302)\n",
+ "Requirement already satisfied: iPython in /usr/local/lib/python3.11/dist-packages (from vdk-ipython==0.2.5) (7.34.0)\n",
+ "Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (from vdk-ipython==0.2.5) (2.2.2)\n",
+ "Requirement already satisfied: vdk-plugin-control-cli in /usr/local/lib/python3.11/dist-packages (from quickstart-vdk) (0.1.1431637373)\n",
+ "Requirement already satisfied: vdk-sqlite in /usr/local/lib/python3.11/dist-packages (from quickstart-vdk) (0.1.1431637373)\n",
+ "Requirement already satisfied: vdk-ingest-http in /usr/local/lib/python3.11/dist-packages (from quickstart-vdk) (0.2.1431637373)\n",
+ "Requirement already satisfied: vdk-ingest-file in /usr/local/lib/python3.11/dist-packages (from quickstart-vdk) (0.1.1431637373)\n",
+ "Requirement already satisfied: vdk-server in /usr/local/lib/python3.11/dist-packages (from quickstart-vdk) (0.1.1431637373)\n",
+ "Requirement already satisfied: vdk-structlog in /usr/local/lib/python3.11/dist-packages (from quickstart-vdk) (0.1.1431637373)\n",
+ "Requirement already satisfied: toml in /usr/local/lib/python3.11/dist-packages (from vdk-data-sources) (0.10.2)\n",
+ "Requirement already satisfied: vdk-control-cli in /usr/local/lib/python3.11/dist-packages (from vdk-data-sources) (1.3.1406932984)\n",
+ "Requirement already satisfied: simplejson in /usr/local/lib/python3.11/dist-packages (from vdk-singer) (3.19.3)\n",
+ "Requirement already satisfied: pytz in /usr/local/lib/python3.11/dist-packages (from vdk-singer) (2024.2)\n",
+ "Requirement already satisfied: atomicwrites<2.0.0,>=1.4.0 in /usr/local/lib/python3.11/dist-packages (from tap-rest-api-msdk) (1.4.1)\n",
+ "Requirement already satisfied: boto3<2.0.0,>=1.26.156 in /usr/local/lib/python3.11/dist-packages (from tap-rest-api-msdk) (1.36.7)\n",
+ "Requirement already satisfied: genson<2.0.0,>=1.2.2 in /usr/local/lib/python3.11/dist-packages (from tap-rest-api-msdk) (1.3.0)\n",
+ "Requirement already satisfied: requests<3.0.0,>=2.25.1 in /usr/local/lib/python3.11/dist-packages (from tap-rest-api-msdk) (2.32.3)\n",
+ "Requirement already satisfied: requests-aws4auth<2.0.0,>=1.2.3 in /usr/local/lib/python3.11/dist-packages (from tap-rest-api-msdk) (1.3.1)\n",
+ "Requirement already satisfied: singer-sdk<0.41.0,>=0.40.0 in /usr/local/lib/python3.11/dist-packages (from tap-rest-api-msdk) (0.40.0)\n",
+ "Requirement already satisfied: botocore<1.37.0,>=1.36.7 in /usr/local/lib/python3.11/dist-packages (from boto3<2.0.0,>=1.26.156->tap-rest-api-msdk) (1.36.7)\n",
+ "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/local/lib/python3.11/dist-packages (from boto3<2.0.0,>=1.26.156->tap-rest-api-msdk) (1.0.1)\n",
+ "Requirement already satisfied: s3transfer<0.12.0,>=0.11.0 in /usr/local/lib/python3.11/dist-packages (from boto3<2.0.0,>=1.26.156->tap-rest-api-msdk) (0.11.2)\n",
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.25.1->tap-rest-api-msdk) (3.4.1)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.25.1->tap-rest-api-msdk) (3.10)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.25.1->tap-rest-api-msdk) (1.26.20)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.25.1->tap-rest-api-msdk) (2024.12.14)\n",
+ "Requirement already satisfied: PyYAML>=6.0 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (6.0.2)\n",
+ "Requirement already satisfied: backoff>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (2.2.1)\n",
+ "Requirement already satisfied: click<9.0,>=8.0 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (8.1.8)\n",
+ "Requirement already satisfied: fs>=2.4.16 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (2.4.16)\n",
+ "Requirement already satisfied: importlib-metadata<9.0.0 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (8.6.1)\n",
+ "Requirement already satisfied: inflection>=0.5.1 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (0.5.1)\n",
+ "Requirement already satisfied: joblib>=1.3.0 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (1.4.2)\n",
+ "Requirement already satisfied: jsonpath-ng>=1.5.3 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (1.7.0)\n",
+ "Requirement already satisfied: jsonschema>=4.16.0 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (4.23.0)\n",
+ "Requirement already satisfied: packaging>=23.1 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (24.2)\n",
+ "Requirement already satisfied: python-dotenv>=0.20 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (1.0.1)\n",
+ "Requirement already satisfied: referencing>=0.30.0 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (0.36.1)\n",
+ "Requirement already satisfied: simpleeval>=0.9.13 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (1.0.3)\n",
+ "Requirement already satisfied: sqlalchemy<3.0,>=1.4 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (2.0.37)\n",
+ "Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.11/dist-packages (from singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (4.12.2)\n",
+ "Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (75.1.0)\n",
+ "Requirement already satisfied: jedi>=0.16 in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (0.19.2)\n",
+ "Requirement already satisfied: decorator in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (4.4.2)\n",
+ "Requirement already satisfied: pickleshare in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (0.7.5)\n",
+ "Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (5.7.1)\n",
+ "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (3.0.50)\n",
+ "Requirement already satisfied: pygments in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (2.18.0)\n",
+ "Requirement already satisfied: backcall in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (0.2.0)\n",
+ "Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (0.1.7)\n",
+ "Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.11/dist-packages (from iPython->vdk-ipython==0.2.5) (4.9.0)\n",
+ "Requirement already satisfied: numpy>=1.23.2 in /usr/local/lib/python3.11/dist-packages (from pandas->vdk-ipython==0.2.5) (1.26.4)\n",
+ "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas->vdk-ipython==0.2.5) (2.8.2)\n",
+ "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas->vdk-ipython==0.2.5) (2025.1)\n",
+ "Requirement already satisfied: click-log==0.* in /usr/local/lib/python3.11/dist-packages (from vdk-control-cli->vdk-data-sources) (0.4.0)\n",
+ "Requirement already satisfied: click-spinner==0.* in /usr/local/lib/python3.11/dist-packages (from vdk-control-cli->vdk-data-sources) (0.1.10)\n",
+ "Requirement already satisfied: pluggy in /usr/local/lib/python3.11/dist-packages (from vdk-control-cli->vdk-data-sources) (1.5.0)\n",
+ "Requirement already satisfied: vdk-control-service-api==1.0.13 in /usr/local/lib/python3.11/dist-packages (from vdk-control-cli->vdk-data-sources) (1.0.13)\n",
+ "Requirement already satisfied: tabulate in /usr/local/lib/python3.11/dist-packages (from vdk-control-cli->vdk-data-sources) (0.9.0)\n",
+ "Requirement already satisfied: requests-oauthlib>=1.0 in /usr/local/lib/python3.11/dist-packages (from vdk-control-cli->vdk-data-sources) (1.3.1)\n",
+ "Requirement already satisfied: vdk-control-api-auth in /usr/local/lib/python3.11/dist-packages (from vdk-control-cli->vdk-data-sources) (0.1.1431637373)\n",
+ "Requirement already satisfied: pydantic<2,>=1.10.5 in /usr/local/lib/python3.11/dist-packages (from vdk-control-service-api==1.0.13->vdk-control-cli->vdk-data-sources) (1.10.21)\n",
+ "Requirement already satisfied: aenum in /usr/local/lib/python3.11/dist-packages (from vdk-control-service-api==1.0.13->vdk-control-cli->vdk-data-sources) (3.1.15)\n",
+ "Requirement already satisfied: click-plugins in /usr/local/lib/python3.11/dist-packages (from vdk-core->vdk-ipython==0.2.5) (1.1.1)\n",
+ "Requirement already satisfied: tenacity in /usr/local/lib/python3.11/dist-packages (from vdk-core->vdk-ipython==0.2.5) (9.0.0)\n",
+ "Requirement already satisfied: docker in /usr/local/lib/python3.11/dist-packages (from vdk-server->quickstart-vdk) (7.1.0)\n",
+ "Requirement already satisfied: kubernetes in /usr/local/lib/python3.11/dist-packages (from vdk-server->quickstart-vdk) (32.0.0)\n",
+ "Requirement already satisfied: python-json-logger in /usr/local/lib/python3.11/dist-packages (from vdk-structlog->quickstart-vdk) (3.2.1)\n",
+ "Requirement already satisfied: appdirs~=1.4.3 in /usr/local/lib/python3.11/dist-packages (from fs>=2.4.16->singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (1.4.4)\n",
+ "Requirement already satisfied: six~=1.10 in /usr/local/lib/python3.11/dist-packages (from fs>=2.4.16->singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (1.17.0)\n",
+ "Requirement already satisfied: zipp>=3.20 in /usr/local/lib/python3.11/dist-packages (from importlib-metadata<9.0.0->singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (3.21.0)\n",
+ "Requirement already satisfied: parso<0.9.0,>=0.8.4 in /usr/local/lib/python3.11/dist-packages (from jedi>=0.16->iPython->vdk-ipython==0.2.5) (0.8.4)\n",
+ "Requirement already satisfied: ply in /usr/local/lib/python3.11/dist-packages (from jsonpath-ng>=1.5.3->singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (3.11)\n",
+ "Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.11/dist-packages (from jsonschema>=4.16.0->singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (24.3.0)\n",
+ "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.11/dist-packages (from jsonschema>=4.16.0->singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (2024.10.1)\n",
+ "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.11/dist-packages (from jsonschema>=4.16.0->singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (0.22.3)\n",
+ "Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.11/dist-packages (from pexpect>4.3->iPython->vdk-ipython==0.2.5) (0.7.0)\n",
+ "Requirement already satisfied: wcwidth in /usr/local/lib/python3.11/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->iPython->vdk-ipython==0.2.5) (0.2.13)\n",
+ "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.11/dist-packages (from requests-oauthlib>=1.0->vdk-control-cli->vdk-data-sources) (3.2.2)\n",
+ "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.11/dist-packages (from sqlalchemy<3.0,>=1.4->singer-sdk<0.41.0,>=0.40.0->tap-rest-api-msdk) (3.1.1)\n",
+ "Requirement already satisfied: google-auth>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from kubernetes->vdk-server->quickstart-vdk) (2.27.0)\n",
+ "Requirement already satisfied: websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 in /usr/local/lib/python3.11/dist-packages (from kubernetes->vdk-server->quickstart-vdk) (1.8.0)\n",
+ "Requirement already satisfied: durationpy>=0.7 in /usr/local/lib/python3.11/dist-packages (from kubernetes->vdk-server->quickstart-vdk) (0.9)\n",
+ "Requirement already satisfied: PyJWT in /usr/local/lib/python3.11/dist-packages (from vdk-control-api-auth->vdk-control-cli->vdk-data-sources) (2.10.1)\n",
+ "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from google-auth>=1.0.1->kubernetes->vdk-server->quickstart-vdk) (5.5.1)\n",
+ "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.11/dist-packages (from google-auth>=1.0.1->kubernetes->vdk-server->quickstart-vdk) (0.4.1)\n",
+ "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.11/dist-packages (from google-auth>=1.0.1->kubernetes->vdk-server->quickstart-vdk) (4.9)\n",
+ "Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /usr/local/lib/python3.11/dist-packages (from pyasn1-modules>=0.2.1->google-auth>=1.0.1->kubernetes->vdk-server->quickstart-vdk) (0.6.1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install quickstart-vdk vdk-notebook vdk-ipython==0.2.5 vdk-data-sources vdk-singer tap-rest-api-msdk"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The relevant Data Job code is in the upcoming cells.\n",
+ "<br>Alternatively, you can see the implementation of the Data Job in the `examples/ingest-from-db-example` directory of the Versatile Data Kit repository."
+ ],
+ "metadata": {
+ "id": "Orkqbxmq6b8V"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 2. Database"
+ ],
+ "metadata": {
+ "id": "9QrNfgmB7Xul"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "We will use the Chinook sample SQLite database, which can be downloaded with the following commands."
+ ],
+ "metadata": {
+ "id": "LU5Dl2ZS7e0t"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!curl -o chinook.zip https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip\n",
+ "!unzip -o chinook.zip\n",
+ "!rm chinook.zip"
+ ],
+ "metadata": {
+ "id": "DzZpnw3Zp7F8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "c9850585-67ea-4df7-8473-20a686ff5caa"
+ },
+ "execution_count": 36,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " % Total % Received % Xferd Average Speed Time Time Time Current\n",
+ " Dload Upload Total Spent Left Speed\n",
+ "\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 298k 100 298k 0 0 1995k 0 --:--:-- --:--:-- --:--:-- 2002k\n",
+ "Archive: chinook.zip\n",
+ " inflating: chinook.db \n"
+ ]
+ }
+ ]
+ },
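+ {
+ "cell_type": "markdown",
+ "source": [
+ "Optionally, you can confirm that the download worked by listing the tables in `chinook.db` with Python's built-in `sqlite3` module. This is a small sketch that is not part of the Data Job; the helper name `list_tables` is our own. Note that `sqlite3.connect` creates a new, empty database if the file does not exist, so an empty list most likely means `chinook.db` was not extracted into the current directory."
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import sqlite3\n",
+ "from contextlib import closing\n",
+ "\n",
+ "\n",
+ "def list_tables(db_path):\n",
+ "    # Return the names of all tables in the SQLite file at db_path, sorted.\n",
+ "    # closing() closes the connection when the block exits.\n",
+ "    with closing(sqlite3.connect(db_path)) as conn:\n",
+ "        rows = conn.execute(\n",
+ "            \"SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name\"\n",
+ "        ).fetchall()\n",
+ "    return [name for (name,) in rows]\n",
+ "\n",
+ "\n",
+ "print(list_tables(\"chinook.db\"))"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },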
+ {
+ "cell_type": "markdown",
+ "source": [
+ "`chinook.db` should now be in the current working directory, where the zip file was downloaded and extracted."
+ ],
+ "metadata": {
+ "id": "wmbZGGZp754v"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 3. Configuration"
+ ],
+ "metadata": {
+ "id": "S9YY589K8KTh"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
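+ "We have already installed Versatile Data Kit and the plugins required for this example. Next, we configure VDK through environment variables: `VDK_SQLITE_FILE` points to the database we just downloaded, `DB_DEFAULT_TYPE` and `INGEST_METHOD_DEFAULT` make SQLite the default for queries and for ingestion, and `INGESTER_WAIT_TO_FINISH_AFTER_EVERY_SEND=true` makes each ingestion call wait until its data has been sent.\n"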
+ ],
+ "metadata": {
+ "id": "isMdKlfv8RjJ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%env VDK_SQLITE_FILE=chinook.db\n",
+ "%env DB_DEFAULT_TYPE=sqlite\n",
+ "%env INGEST_METHOD_DEFAULT=sqlite\n",
+ "%env INGESTER_WAIT_TO_FINISH_AFTER_EVERY_SEND=true"
+ ],
+ "metadata": {
+ "id": "bVt_UlegyLQE",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "a2a7c3be-f547-4f66-f9e8-9b1cc459ba90"
+ },
+ "execution_count": 37,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "env: VDK_SQLITE_FILE=chinook.db\n",
+ "env: DB_DEFAULT_TYPE=sqlite\n",
+ "env: INGEST_METHOD_DEFAULT=sqlite\n",
+ "env: INGESTER_WAIT_TO_FINISH_AFTER_EVERY_SEND=true\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The `vdk.plugin.ipython` extension introduces magic commands for Jupyter.\n",
+ "\n",
+ "The `%reload_VDK` command loads VDK for the current notebook, and the `%%vdksql` cell magic runs SQL against the configured database.\n",
+ "VDK provides the `job_input` API, which has methods for:\n",
+ "* executing queries against an OLAP database;\n",
+ "* ingesting data into a database;\n",
+ "* processing data that is already in a database.\n",
+ "\n",
+ "Run `help(job_input)` to see its documentation."
+ ],
+ "metadata": {
+ "id": "e3Ty8lAITGeZ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# NOTE: This cell may fail the first time it is run. If it does, run it again and it should succeed.\n",
+ "\n",
+ "%reload_ext vdk.plugin.ipython\n",
+ "%reload_VDK\n",
+ "job_input = VDK.get_initialized_job_input()"
+ ],
+ "metadata": {
+ "id": "r9jI-n3NoMLZ",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "84c66fdc-e967-4ec3-eaed-bf98ab9a6074"
+ },
+ "execution_count": 38,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "2025-01-28 12:01:38,340 [VDK] [INFO ] vdk.plugin.control_cli_plugin. properties_plugin.py :30 initialize_job - Control Service REST API URL is not configured. Will not initialize Control Service based Properties client implementation.\n",
+ "2025-01-28 12:01:38,346 [VDK] [INFO ] vdk.plugin.control_cli_plugin. execution_skip.py :105 _skip_job_if_nec - Checking if job should be skipped:\n",
+ "2025-01-28 12:01:38,347 [VDK] [INFO ] vdk.plugin.control_cli_plugin. execution_skip.py :106 _skip_job_if_nec - Job : content, Team : None, Log config: LOCAL, execution_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 4. Data Job"
+ ],
+ "metadata": {
+ "id": "wDPpw8xe8wBP"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The structure of the Data Job in the following cells is:<br>\n",
+ "**ingest-from-db-example-job**<br>\n",
+ "├── 1-Drop Table<br>\n",
+ "├── 2-Create Table<br>\n",
+ "└── 3-Ingest to Table<br>\n",
+ "\n",
+ "The purpose of the Data Job ***ingest-from-db-example-job*** is to demonstrate how to query data from a source database and then ingest it into a target database.<br>\n",
+ "\n",
+ "The Data Job consists of three steps: two SQL steps, which we run in the notebook with the ***%%vdksql*** cell magic, and one Python step.<br>\n",
+ "\n",
+ "**Each step is a separate cell:**\n",
+ "\n",
+ "- The first step drops the backup table if it exists. This query only serves to make the Data Job repeatable;\n",
+ "- The second step creates the backup table we will insert data into;\n",
+ "- The third step connects to the source database, queries the `employees` table, and ingests the returned rows into the `backup_employees` table in the target database.\n",
+ "\n",
+ "<br>\n",
+ "Run each of the following cells in order to observe the job in action.\n"
+ ],
+ "metadata": {
+ "id": "CFrCNMbL80lX"
+ }
+ },
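+ {
+ "cell_type": "markdown",
+ "source": [
+ "For comparison, the following sketch shows roughly what the three steps amount to with Python's built-in `sqlite3` module alone, without VDK. It is our own illustration, not how VDK implements ingestion: the function `backup_table` and its parameters are hypothetical, the backup table copies the source column names but declares no column types, and table names are interpolated into the SQL text, so only trusted names should be passed. For example, `backup_table(\"chinook.db\", \"employees\", \"chinook_backup.db\", \"backup_employees\")` would copy the `employees` table into a separate file."
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import sqlite3\n",
+ "from contextlib import closing\n",
+ "\n",
+ "\n",
+ "def backup_table(source_path, source_table, target_path, target_table):\n",
+ "    # Read all source rows and close the source connection before writing;\n",
+ "    # this avoids lock conflicts when source and target are the same file.\n",
+ "    with closing(sqlite3.connect(source_path)) as source:\n",
+ "        cursor = source.execute(f\"SELECT * FROM {source_table}\")\n",
+ "        columns = [column_info[0] for column_info in cursor.description]\n",
+ "        rows = cursor.fetchall()\n",
+ "\n",
+ "    with closing(sqlite3.connect(target_path)) as target:\n",
+ "        # Step 1: drop the backup table so the copy can be repeated.\n",
+ "        target.execute(f\"DROP TABLE IF EXISTS {target_table}\")\n",
+ "        # Step 2: create the backup table with the source column names.\n",
+ "        target.execute(f\"CREATE TABLE {target_table} ({', '.join(columns)})\")\n",
+ "        # Step 3: insert every source row into the backup table.\n",
+ "        placeholders = \", \".join(\"?\" for _ in columns)\n",
+ "        target.executemany(\n",
+ "            f\"INSERT INTO {target_table} VALUES ({placeholders})\", rows\n",
+ "        )\n",
+ "        target.commit()\n",
+ "    return len(rows)"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },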
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Step 1: Drop Table"
+ ],
+ "metadata": {
+ "id": "k6DvRJegVFBB"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%%vdksql\n",
+ "DROP TABLE IF EXISTS backup_employees;"
+ ],
+ "metadata": {
+ "id": "ox6Y6rYHUna2",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 194
+ },
+ "outputId": "ba949f7c-79a7-43f7-b168-060e8f0279bf"
+ },
+ "execution_count": 39,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "2025-01-28 12:01:38,361 [VDK] [INFO ] vdk.plugin.sqlite.sqlite_conne sqlite_connection.py :29 new_connection - Creating new connection against local file database located at: chinook.db\n",
+ "2025-01-28 12:01:38,364 [VDK] [INFO ] vdk.plugin.sqlite.sqlite_conne sqlite_connection.py :29 new_connection - Creating new connection against local file database located at: chinook.db\n",
+ "2025-01-28 12:01:38,367 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :201 _execute_operati - Executing query:\n",
+ "-- job_name: content\n",
+ "-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698\n",
+ "DROP TABLE IF EXISTS backup_employees;\n",
+ "\n",
+ "2025-01-28 12:01:38,372 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :103 execute - Executing query SUCCEEDED. Query duration 00h:00m:00s\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Query statement executed successfully.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 39
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Step 2: Create Table"
+ ],
+ "metadata": {
+ "id": "4R4Hlkn9VWv7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%%vdksql\n",
+ "CREATE TABLE backup_employees (\n",
+ " EmployeeId INTEGER,\n",
+ " LastName NVARCHAR,\n",
+ " FirstName NVARCHAR,\n",
+ " Title NVARCHAR,\n",
+ " ReportsTo INTEGER,\n",
+ " BirthDate NVARCHAR,\n",
+ " HireDate NVARCHAR,\n",
+ " Address NVARCHAR,\n",
+ " City NVARCHAR,\n",
+ " State NVARCHAR,\n",
+ " Country NVARCHAR,\n",
+ " PostalCode NVARCHAR,\n",
+ " Phone NVARCHAR,\n",
+ " Fax NVARCHAR,\n",
+ " Email NVARCHAR\n",
+ ");"
+ ],
+ "metadata": {
+ "id": "IFGyO8VQU3Rp",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 524
+ },
+ "outputId": "6fd97b32-efe5-4203-be8e-d8ef40efeaa4"
+ },
+ "execution_count": 40,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "2025-01-28 12:01:38,397 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :201 _execute_operati - Executing query:\n",
+ "-- job_name: content\n",
+ "-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698\n",
+ " select 1 -- Testing if connection is alive.\n",
+ "2025-01-28 12:01:38,406 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :103 execute - Executing query SUCCEEDED. Query duration 00h:00m:00s\n",
+ "2025-01-28 12:01:38,408 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :201 _execute_operati - Executing query:\n",
+ "-- job_name: content\n",
+ "-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698\n",
+ "CREATE TABLE backup_employees (\n",
+ " EmployeeId INTEGER,\n",
+ " LastName NVARCHAR,\n",
+ " FirstName NVARCHAR,\n",
+ " Title NVARCHAR,\n",
+ " ReportsTo INTEGER,\n",
+ " BirthDate NVARCHAR,\n",
+ " HireDate NVARCHAR,\n",
+ " Address NVARCHAR,\n",
+ " City NVARCHAR,\n",
+ " State NVARCHAR,\n",
+ " Country NVARCHAR,\n",
+ " PostalCode NVARCHAR,\n",
+ " Phone NVARCHAR,\n",
+ " Fax NVARCHAR,\n",
+ " Email NVARCHAR\n",
+ ");\n",
+ "\n",
+ "2025-01-28 12:01:38,437 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :103 execute - Executing query SUCCEEDED. Query duration 00h:00m:00s\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Query statement executed successfully.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 40
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Step 3: Ingest to Table"
+ ],
+ "metadata": {
+ "id": "Eit_CLguVd71"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import sqlite3\n",
+ "\n",
+ "# Connect to the source database. If chinook.db is not in your current directory,\n",
+ "# replace \"chinook.db\" with the path to your chinook.db file.\n",
+ "db_connection = sqlite3.connect(\"chinook.db\")\n",
+ "cursor = db_connection.cursor()\n",
+ "cursor.execute(\"SELECT * FROM employees\")\n",
+ "\n",
+ "# Ingest the query results into backup_employees in the database configured\n",
+ "# through VDK_SQLITE_FILE; column names are taken from the cursor description.\n",
+ "job_input.send_tabular_data_for_ingestion(\n",
+ "    cursor,\n",
+ "    column_names=[column_info[0] for column_info in cursor.description],\n",
+ "    destination_table=\"backup_employees\",\n",
+ ")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "kfWieRris_ll",
+ "outputId": "d04a99ae-b9b1-4da6-98e3-f0318faba615"
+ },
+ "execution_count": 41,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "2025-01-28 12:01:38,457 [VDK] [INFO ] vdk.internal.builtin_plugins.i ingester_router.py :105 send_tabular_dat - Sending tabular data for ingestion with method: sqlite and target: None\n",
+ "2025-01-28 12:01:40,467 [VDK] [INFO ] vdk.plugin.sqlite.ingest_to_sq ingest_to_sqlite.py :76 ingest_payload - Ingesting payloads for target: chinook.db; collection_id: content|2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698\n",
+ "2025-01-28 12:01:40,470 [VDK] [INFO ] vdk.plugin.sqlite.sqlite_conne sqlite_connection.py :29 new_connection - Creating new connection against local file database located at: chinook.db\n"
+ ]
+ }
+ ]
+ },
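+ {
+ "cell_type": "markdown",
+ "source": [
+ "As an optional sanity check, the following sketch compares the number of rows in the source `employees` table with the number ingested into `backup_employees`, using Python's built-in `sqlite3` module rather than a VDK API. The helper `count_rows` is our own; it assumes both tables live in `chinook.db`, as configured above, and it interpolates the table name into the SQL text, so only pass trusted names."
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import sqlite3\n",
+ "from contextlib import closing\n",
+ "\n",
+ "\n",
+ "def count_rows(db_path, table):\n",
+ "    # Table names cannot be bound as query parameters, so the name is\n",
+ "    # inserted into the SQL text directly; use trusted names only.\n",
+ "    with closing(sqlite3.connect(db_path)) as conn:\n",
+ "        return conn.execute(f\"SELECT COUNT(*) FROM {table}\").fetchone()[0]\n",
+ "\n",
+ "\n",
+ "try:\n",
+ "    source_count = count_rows(\"chinook.db\", \"employees\")\n",
+ "    backup_count = count_rows(\"chinook.db\", \"backup_employees\")\n",
+ "    print(f\"employees: {source_count} rows, backup_employees: {backup_count} rows\")\n",
+ "except sqlite3.OperationalError as error:\n",
+ "    # Raised, for example, when one of the tables does not exist yet.\n",
+ "    print(f\"Could not count rows: {error}\")"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },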
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 5. Results"
+ ],
+ "metadata": {
+ "id": "MGbX1PnhAWMb"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "After running the Data Job, we can check whether the new table was populated correctly. In the notebook we do this with the ***%%vdksql*** cell magic, which runs a query against the configured SQLite database; outside a notebook, the **vdk-sqlite** plugin provides the **sqlite-query** command, which does the same without having to set up a Data Job."
+ ],
+ "metadata": {
+ "id": "S7RiHPv7AcvJ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%%vdksql\n",
+ "SELECT * FROM backup_employees"
+ ],
+ "metadata": {
+ "id": "HXIoXqknz9Lh",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "outputId": "3460d989-19d3-434b-810d-33468248d8c7"
+ },
+ "execution_count": 42,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "2025-01-28 12:01:40,561 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :201 _execute_operati - Executing query:\n",
+ "-- job_name: content\n",
+ "-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698\n",
+ " select 1 -- Testing if connection is alive.\n",
+ "2025-01-28 12:01:40,571 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :103 execute - Executing query SUCCEEDED. Query duration 00h:00m:00s\n",
+ "2025-01-28 12:01:40,577 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :201 _execute_operati - Executing query:\n",
+ "-- job_name: content\n",
+ "-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698\n",
+ "SELECT * FROM backup_employees\n",
+ "\n",
+ "2025-01-28 12:01:40,579 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :103 execute - Executing query SUCCEEDED. Query duration 00h:00m:00s\n",
+ "2025-01-28 12:01:40,585 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :239 fetchall - Fetching all results from query ...\n",
+ "2025-01-28 12:01:40,586 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :242 fetchall - Fetching all results from query SUCCEEDED.\n",
+ "2025-01-28 12:01:40,591 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :249 close - Closing DB cursor ...\n",
+ "2025-01-28 12:01:40,593 [VDK] [INFO ] vdk.internal.builtin_plugins.c managed_cursor.py :251 close - Closing DB cursor SUCCEEDED.\n",
+ "2025-01-28 12:01:40,599 [VDK] [INFO ] vdk.plugin.ipython.sql sql.py :82 prepare_result_c - ipyaggrid is not installed. If installed result would be formatted in a interactive grid.\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " EmployeeId LastName FirstName Title ReportsTo \\\n",
+ "0 1 Adams Andrew General Manager NaN \n",
+ "1 2 Edwards Nancy Sales Manager 1.0 \n",
+ "2 3 Peacock Jane Sales Support Agent 2.0 \n",
+ "3 4 Park Margaret Sales Support Agent 2.0 \n",
+ "4 5 Johnson Steve Sales Support Agent 2.0 \n",
+ "5 6 Mitchell Michael IT Manager 1.0 \n",
+ "6 7 King Robert IT Staff 6.0 \n",
+ "7 8 Callahan Laura IT Staff 6.0 \n",
+ "\n",
+ " BirthDate HireDate Address \\\n",
+ "0 1962-02-18 00:00:00 2002-08-14 00:00:00 11120 Jasper Ave NW \n",
+ "1 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SW \n",
+ "2 1973-08-29 00:00:00 2002-04-01 00:00:00 1111 6 Ave SW \n",
+ "3 1947-09-19 00:00:00 2003-05-03 00:00:00 683 10 Street SW \n",
+ "4 1965-03-03 00:00:00 2003-10-17 00:00:00 7727B 41 Ave \n",
+ "5 1973-07-01 00:00:00 2003-10-17 00:00:00 5827 Bowness Road NW \n",
+ "6 1970-05-29 00:00:00 2004-01-02 00:00:00 590 Columbia Boulevard West \n",
+ "7 1968-01-09 00:00:00 2004-03-04 00:00:00 923 7 ST NW \n",
+ "\n",
+ " City State Country PostalCode Phone Fax \\\n",
+ "0 Edmonton AB Canada T5K 2N1 +1 (780) 428-9482 +1 (780) 428-3457 \n",
+ "1 Calgary AB Canada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322 \n",
+ "2 Calgary AB Canada T2P 5M5 +1 (403) 262-3443 +1 (403) 262-6712 \n",
+ "3 Calgary AB Canada T2P 5G3 +1 (403) 263-4423 +1 (403) 263-4289 \n",
+ "4 Calgary AB Canada T3B 1Y7 1 (780) 836-9987 1 (780) 836-9543 \n",
+ "5 Calgary AB Canada T3B 0C5 +1 (403) 246-9887 +1 (403) 246-9899 \n",
+ "6 Lethbridge AB Canada T1K 5N8 +1 (403) 456-9986 +1 (403) 456-8485 \n",
+ "7 Lethbridge AB Canada T1H 1Y8 +1 (403) 467-3351 +1 (403) 467-8772 \n",
+ "\n",
+ " Email \n",
+ "0 andrew@chinookcorp.com \n",
+ "1 nancy@chinookcorp.com \n",
+ "2 jane@chinookcorp.com \n",
+ "3 margaret@chinookcorp.com \n",
+ "4 steve@chinookcorp.com \n",
+ "5 michael@chinookcorp.com \n",
+ "6 robert@chinookcorp.com \n",
+ "7 laura@chinookcorp.com "
+ ],
+ "text/html": [
+ "\n",
+ "
| \n", + " | EmployeeId | \n", + "LastName | \n", + "FirstName | \n", + "Title | \n", + "ReportsTo | \n", + "BirthDate | \n", + "HireDate | \n", + "Address | \n", + "City | \n", + "State | \n", + "Country | \n", + "PostalCode | \n", + "Phone | \n", + "Fax | \n", + "|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", + "1 | \n", + "Adams | \n", + "Andrew | \n", + "General Manager | \n", + "NaN | \n", + "1962-02-18 00:00:00 | \n", + "2002-08-14 00:00:00 | \n", + "11120 Jasper Ave NW | \n", + "Edmonton | \n", + "AB | \n", + "Canada | \n", + "T5K 2N1 | \n", + "+1 (780) 428-9482 | \n", + "+1 (780) 428-3457 | \n", + "andrew@chinookcorp.com | \n", + "
| 1 | \n", + "2 | \n", + "Edwards | \n", + "Nancy | \n", + "Sales Manager | \n", + "1.0 | \n", + "1958-12-08 00:00:00 | \n", + "2002-05-01 00:00:00 | \n", + "825 8 Ave SW | \n", + "Calgary | \n", + "AB | \n", + "Canada | \n", + "T2P 2T3 | \n", + "+1 (403) 262-3443 | \n", + "+1 (403) 262-3322 | \n", + "nancy@chinookcorp.com | \n", + "
| 2 | \n", + "3 | \n", + "Peacock | \n", + "Jane | \n", + "Sales Support Agent | \n", + "2.0 | \n", + "1973-08-29 00:00:00 | \n", + "2002-04-01 00:00:00 | \n", + "1111 6 Ave SW | \n", + "Calgary | \n", + "AB | \n", + "Canada | \n", + "T2P 5M5 | \n", + "+1 (403) 262-3443 | \n", + "+1 (403) 262-6712 | \n", + "jane@chinookcorp.com | \n", + "
| 3 | \n", + "4 | \n", + "Park | \n", + "Margaret | \n", + "Sales Support Agent | \n", + "2.0 | \n", + "1947-09-19 00:00:00 | \n", + "2003-05-03 00:00:00 | \n", + "683 10 Street SW | \n", + "Calgary | \n", + "AB | \n", + "Canada | \n", + "T2P 5G3 | \n", + "+1 (403) 263-4423 | \n", + "+1 (403) 263-4289 | \n", + "margaret@chinookcorp.com | \n", + "
| 4 | \n", + "5 | \n", + "Johnson | \n", + "Steve | \n", + "Sales Support Agent | \n", + "2.0 | \n", + "1965-03-03 00:00:00 | \n", + "2003-10-17 00:00:00 | \n", + "7727B 41 Ave | \n", + "Calgary | \n", + "AB | \n", + "Canada | \n", + "T3B 1Y7 | \n", + "1 (780) 836-9987 | \n", + "1 (780) 836-9543 | \n", + "steve@chinookcorp.com | \n", + "
| 5 | \n", + "6 | \n", + "Mitchell | \n", + "Michael | \n", + "IT Manager | \n", + "1.0 | \n", + "1973-07-01 00:00:00 | \n", + "2003-10-17 00:00:00 | \n", + "5827 Bowness Road NW | \n", + "Calgary | \n", + "AB | \n", + "Canada | \n", + "T3B 0C5 | \n", + "+1 (403) 246-9887 | \n", + "+1 (403) 246-9899 | \n", + "michael@chinookcorp.com | \n", + "
| 6 | \n", + "7 | \n", + "King | \n", + "Robert | \n", + "IT Staff | \n", + "6.0 | \n", + "1970-05-29 00:00:00 | \n", + "2004-01-02 00:00:00 | \n", + "590 Columbia Boulevard West | \n", + "Lethbridge | \n", + "AB | \n", + "Canada | \n", + "T1K 5N8 | \n", + "+1 (403) 456-9986 | \n", + "+1 (403) 456-8485 | \n", + "robert@chinookcorp.com | \n", + "
| 7 | \n", + "8 | \n", + "Callahan | \n", + "Laura | \n", + "IT Staff | \n", + "6.0 | \n", + "1968-01-09 00:00:00 | \n", + "2004-03-04 00:00:00 | \n", + "923 7 ST NW | \n", + "Lethbridge | \n", + "AB | \n", + "Canada | \n", + "T1H 1Y8 | \n", + "+1 (403) 467-3351 | \n", + "+1 (403) 467-8772 | \n", + "laura@chinookcorp.com | \n", + "