mebauer
diff --git a/1-reading-writing-files.ipynb b/1-reading-writing-files.ipynb
Lines changed: 70 additions & 48 deletions
@@ -47,7 +47,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Goal:** In this notebook, we review various ways to read (load) and write (save) data from NYC Open Data. Specifically, we will focus on reading our data into a pandas dataframe.\n",
+    "**Goal:** In this notebook, we review various ways to read (load) and write (save) data from NYC Open Data. Specifically, we focus on reading our data into a pandas dataframe.\n",
     "\n",
     "**Main Library:** [pandas](https://pandas.pydata.org/) is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language."
    ]
@@ -112,6 +112,7 @@
     }
    ],
    "source": [
+    "# watermark\n",
     "%reload_ext watermark\n",
     "%watermark -v -p numpy,pandas,geopandas,matplotlib,json,requests,sodapy"
    ]
@@ -399,7 +400,7 @@
     "\n",
     "2) We use pandas `.shape` method to print the dimensions of the dataframe (i.e. number of rows, number of columns).\n",
     "\n",
-    "We will use these two methods throughout the examples."
+    "We will use both of these extensively throughout the examples."
    ]
   },
   {
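A minimal sketch of the `.shape` step from item 2 above, run on a small hypothetical dataframe (the column names and values are made up for illustration). `.shape` is technically an attribute rather than a method, so it is read without parentheses and returns a `(rows, columns)` tuple.

```python
import pandas as pd

# hypothetical sample data standing in for an NYC Open Data table
df = pd.DataFrame({
    "borough": ["Manhattan", "Bronx", "Brooklyn"],
    "bin": [1000001, 2000001, 3000001],
})

# .shape is an attribute (no parentheses) holding (number of rows, number of columns)
n_rows, n_cols = df.shape
print(f"rows: {n_rows}, columns: {n_cols}")

# the row count also matches len(df)
assert n_rows == len(df)
```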
@@ -833,7 +834,7 @@
    "metadata": {},
    "source": [
     "## 1.4 Reading in a Shapefile\n",
-    "Information about [Shapefiles](https://desktop.arcgis.com/en/arcmap/latest/manage-data/shapefiles/what-is-a-shapefile.htm#:~:text=A%20shapefile%20is%20a%20simple,%2C%20or%20polygons%20(areas).)."
+    "Information about [Shapefiles](https://desktop.arcgis.com/en/arcmap/latest/manage-data/shapefiles/what-is-a-shapefile.htm#:~:text=A%20shapefile%20is%20a%20simple,%2C%20or%20polygons%20(areas).). Shapefiles are a popular geospatial file format."
    ]
   },
   {
@@ -1310,7 +1311,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 2.1 Unzipping and reading in data as CSV to local folder\n",
+    "## 2.1 Unzipping and Reading In Data as CSV to Local Folder\n",
     "\n",
     "We will retrieve, unzip and read data in our downloads folder. Note: I'm using a Mac.\n",
     "\n",
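A rough, offline sketch of the "unzip to a local folder, then read" workflow in section 2.1. The download step is skipped: we build a small hypothetical ZIP archive locally (the file names and contents are made up) and use a temporary directory instead of the downloads folder.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# build a small hypothetical ZIP archive so the example runs offline;
# in the notebook, this would be the ZIP retrieved from NYC Open Data
work_dir = Path(tempfile.mkdtemp())
zip_path = work_dir / "example.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("sample.csv", "bin,borough\n1000001,Manhattan\n2000001,Bronx\n")

# extract every member of the archive into a local folder (created if missing)
out_dir = work_dir / "unzipped-data"
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(out_dir)

# read the extracted CSV from disk
df = pd.read_csv(out_dir / "sample.csv")
print(df.shape)
```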
@@ -1614,7 +1615,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 2.2 Unzipping and reading in data as CSV from local folder"
+    "## 2.2 Unzipping and Reading In Data as CSV from Local Folder"
    ]
   },
   {
@@ -1648,7 +1649,7 @@
     }
    ],
    "source": [
-    "# list files in this file path\n",
+    "# list files in this folder path\n",
     "%ls data/unzipped-data/"
    ]
   },
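The `%ls` magic only works inside IPython/Jupyter. A plain-Python sketch of the same "list the folder, then read from it" step might use `pathlib`; the folder and files below are hypothetical stand-ins for `data/unzipped-data/`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# hypothetical local folder standing in for data/unzipped-data/
folder = Path(tempfile.mkdtemp()) / "unzipped-data"
folder.mkdir()
(folder / "a.csv").write_text("x,y\n1,2\n3,4\n")
(folder / "notes.txt").write_text("not a csv")

# pure-Python alternative to the %ls magic: list files in the folder
for path in sorted(folder.iterdir()):
    print(path.name)

# read every CSV in the folder into one dataframe per file
frames = {p.name: pd.read_csv(p) for p in sorted(folder.glob("*.csv"))}
print({name: frame.shape for name, frame in frames.items()})
```

Filtering with `glob("*.csv")` skips non-CSV files such as READMEs that often ship inside the same archive.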
@@ -1862,7 +1863,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 2.3 Unzipping and reading in data as a CSV in-memory"
+    "## 2.3 Unzipping and Reading In Data as a CSV In-Memory"
    ]
   },
   {
@@ -2106,7 +2107,7 @@
     }
    ],
    "source": [
-    "# read our csv data into a dataframe from our ZIP file\n",
+    "# read our CSV data into a dataframe from our ZIP file\n",
     "file = zf.open('pluto_20v1.csv')\n",
     "pluto_data = pd.read_csv(file, low_memory=False)\n",
     "\n",
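A self-contained sketch of the in-memory approach from section 2.3. We assume the archive arrives as raw bytes (for example, the body of an HTTP response), so here a small hypothetical ZIP is built in an `io.BytesIO` buffer; the member name `pluto_sample.csv` and its contents are made up. The `with` blocks close the archive and the member file once the dataframe is built.

```python
import io
import zipfile

import pandas as pd

# hypothetical in-memory ZIP standing in for a downloaded archive
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("pluto_sample.csv", "bbl,borough\n1000010001,MN\n2000010001,BX\n")

# in practice the bytes would come from the download itself
zip_bytes = buffer.getvalue()

# open the archive from bytes and read the CSV without writing anything to disk
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    print(zf.namelist())
    with zf.open("pluto_sample.csv") as file:
        pluto_sample = pd.read_csv(file, low_memory=False)

print(pluto_sample.shape)
```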
@@ -2136,14 +2137,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 3. Reading in data from NYC Open Data"
+    "# 3. Reading In Data from NYC Open Data"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 3.1 Reading in data as CSV in static form"
+    "## 3.1 Reading In Data as CSV in Static Form"
    ]
   },
   {
@@ -2365,8 +2366,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 3.2 Reading in data as JSON in static form\n",
-    "Note: I do not read data on NYC Open Data this way, but reading JSON in static form does come up, especially when you're working with JSON data from the web. Here, I demonstrate a sample workflow.\n",
+    "## 3.2 Reading In Data as JSON in Static Form\n",
+    "Note: I do not read data on NYC Open Data this way, but reading JSON in static form does come up, especially when you're working with JSON data from the web. Understanding how JSON data is structured is a useful skill. Here, I demonstrate a sample workflow.\n",
     "![building_footprints_csv](images/building-footprints-json.png)"
    ]
   },
@@ -2636,6 +2637,8 @@
     "\n",
     "# loop through columns and save to list\n",
     "for col in cols:\n",
+    "    \n",
+    "    # sanity check\n",
     "    print(col['fieldName'])\n",
     "    \n",
     "    # append column name to list\n",
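A compact sketch of the column-name loop above, assuming the Socrata `rows.json` layout: column metadata lives under `meta` → `view` → `columns` (each entry carrying a `fieldName`), and records are stored as lists under `data`. The payload below is a hypothetical, heavily trimmed example of that shape, not real NYC Open Data output.

```python
import json

import pandas as pd

# hypothetical, heavily trimmed payload in the layout assumed above
raw = """
{
  "meta": {"view": {"columns": [
    {"fieldName": "bin", "dataTypeName": "number"},
    {"fieldName": "borough", "dataTypeName": "text"}
  ]}},
  "data": [[1000001, "Manhattan"], [2000001, "Bronx"]]
}
"""
payload = json.loads(raw)

# collect column names from the metadata, mirroring the loop in the notebook
cols = payload["meta"]["view"]["columns"]
col_names = [col["fieldName"] for col in cols]

# each record is a list of values aligned with col_names
df_json = pd.DataFrame(payload["data"], columns=col_names)
print(df_json)
```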
@@ -3072,7 +3075,7 @@
     }
    ],
    "source": [
-    "# sanity check\n",
+    "# column info sanity check\n",
     "df_json.info()"
    ]
   },
@@ -3100,7 +3103,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 3.3 Reading in Shapefile data \n",
+    "## 3.3 Reading In Shapefile Data \n",
     "![borough-boundaries-shp](images/borough-boundaries-shp.png)"
    ]
   },
@@ -3236,14 +3239,32 @@
    "cell_type": "code",
    "execution_count": 38,
    "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'geopandas.geodataframe.GeoDataFrame'>\n"
+     ]
+    }
+   ],
+   "source": [
+    "# preview object type\n",
+    "print(type(gdf))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "<AxesSubplot: >"
       ]
      },
-     "execution_count": 38,
+     "execution_count": 39,
      "metadata": {},
      "output_type": "execute_result"
     },
@@ -3296,7 +3317,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 39,
+   "execution_count": 40,
    "metadata": {},
    "outputs": [
     {
@@ -3311,7 +3332,7 @@
      "output_type": "stream",
      "text": [
       "client.__dict__:\n",
-      "{'domain': 'data.cityofnewyork.us', 'session': <requests.sessions.Session object at 0x1675f24d0>, 'uri_prefix': 'https://', 'timeout': 10}\n"
+      "{'domain': 'data.cityofnewyork.us', 'session': <requests.sessions.Session object at 0x160c734d0>, 'uri_prefix': 'https://', 'timeout': 10}\n"
      ]
     }
    ],
@@ -3331,7 +3352,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 40,
+   "execution_count": 41,
    "metadata": {},
    "outputs": [
     {
@@ -3497,14 +3518,14 @@
        "4  {525F2C24-616B-4F29-98A3-8FEA5D4B1A7D}  "
       ]
      },
-     "execution_count": 40,
+     "execution_count": 41,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "# we set the limit at 1,000 rows\n",
-    "# increase limit if you'd like, but there might be a limit if you haven't signed up for an app token\n",
+    "# we manually set the limit at 1,000 rows\n",
+    "# increase the limit if you'd like, but requests made without an app token are subject to stricter throttling\n",
     "limit = 1_000\n",
     "\n",
     "# get data by passing dataset id and limit number\n",
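The `limit` above caps a single request. To page through a larger dataset, one option is to combine `limit` with `offset`, which we assume sodapy's `client.get()` forwards as the SoQL `$limit` and `$offset` parameters. The sketch below uses a hypothetical stand-in client so it runs offline; `FakeClient`, `fetch_all`, and the dataset id `"xxxx-xxxx"` are illustrative names, not part of sodapy.

```python
from typing import Any


class FakeClient:
    """Hypothetical stand-in for sodapy.Socrata so the sketch runs offline."""

    def __init__(self, n_rows: int) -> None:
        self.rows = [{"id": i} for i in range(n_rows)]

    def get(self, dataset_id: str, limit: int = 1_000, offset: int = 0, **kwargs: Any) -> list:
        # return one page of rows, like a single paged API request
        return self.rows[offset:offset + limit]


def fetch_all(client, dataset_id: str, page_size: int = 1_000) -> list:
    """Request pages of `page_size` rows until a short (or empty) page comes back."""
    results = []
    offset = 0
    while True:
        # an explicit order keeps page boundaries stable between requests
        page = client.get(dataset_id, limit=page_size, offset=offset, order=":id")
        results.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return results


rows = fetch_all(FakeClient(n_rows=2_500), "xxxx-xxxx", page_size=1_000)
print(len(rows))
```

With a real client, the resulting list of dicts could be passed to `pd.DataFrame.from_records(rows)`, as the notebook does for a single page.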
@@ -3522,7 +3543,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 41,
+   "execution_count": 42,
    "metadata": {},
    "outputs": [
     {
@@ -3542,13 +3563,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 4. Writing out data\n",
+    "# 4. Writing Out Data\n",
     "Save data to specified destination."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 43,
    "metadata": {},
    "outputs": [
     {
@@ -3720,7 +3741,7 @@
        "4  Other (Man  {BB58FD7B-CC22-4896-901D-F8BAFF4AC129}  "
       ]
      },
-     "execution_count": 42,
+     "execution_count": 43,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -3738,25 +3759,25 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 4.1 Writing to a CSV file"
+    "## 4.1 Writing to a CSV File"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 43,
+   "execution_count": 44,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "README.md                      sample-data.geojson\r\n",
-      "building-footprints-pluto.csv  sample-data.gpkg\r\n",
+      "README.md                      sample-data.csv\r\n",
+      "building-footprints-pluto.csv  sample-data.geojson\r\n",
+      "nta-shape.geojson              sample-data.gpkg\r\n",
       "output.csv                     sample-data.json\r\n",
       "output.json                    sample-data.xlsx\r\n",
       "output.xlsx                    \u001b[34mshapefile\u001b[m\u001b[m/\r\n",
-      "sample-buildings.zip           \u001b[34munzipped-data\u001b[m\u001b[m/\r\n",
-      "sample-data.csv\r\n"
+      "sample-buildings.zip           \u001b[34munzipped-data\u001b[m\u001b[m/\r\n"
      ]
     }
    ],
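A minimal round-trip sketch for section 4.1, using a small hypothetical dataframe and a temporary folder in place of the notebook's `data/` directory. Passing `index=False` keeps the row index out of the file; without it, re-reading the CSV would add an extra `Unnamed: 0` column.

```python
import tempfile
from pathlib import Path

import pandas as pd

# hypothetical dataframe standing in for the data read earlier
df = pd.DataFrame({"bin": [1000001, 2000001], "borough": ["Manhattan", "Bronx"]})

out_path = Path(tempfile.mkdtemp()) / "output.csv"

# index=False skips writing the row index as its own column
df.to_csv(out_path, index=False)

# read it back to confirm the round trip
check = pd.read_csv(out_path)
print(check)
```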
@@ -3773,25 +3794,25 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 4.2 Writing to an Excel (xlsx) file"
+    "## 4.2 Writing to an Excel (xlsx) File"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 44,
+   "execution_count": 45,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "README.md                      sample-data.geojson\r\n",
-      "building-footprints-pluto.csv  sample-data.gpkg\r\n",
+      "README.md                      sample-data.csv\r\n",
+      "building-footprints-pluto.csv  sample-data.geojson\r\n",
+      "nta-shape.geojson              sample-data.gpkg\r\n",
       "output.csv                     sample-data.json\r\n",
       "output.json                    sample-data.xlsx\r\n",
       "output.xlsx                    \u001b[34mshapefile\u001b[m\u001b[m/\r\n",
-      "sample-buildings.zip           \u001b[34munzipped-data\u001b[m\u001b[m/\r\n",
-      "sample-data.csv\r\n"
+      "sample-buildings.zip           \u001b[34munzipped-data\u001b[m\u001b[m/\r\n"
      ]
     }
    ],
@@ -3813,20 +3834,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 45,
+   "execution_count": 46,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "README.md                      sample-data.geojson\r\n",
-      "building-footprints-pluto.csv  sample-data.gpkg\r\n",
+      "README.md                      sample-data.csv\r\n",
+      "building-footprints-pluto.csv  sample-data.geojson\r\n",
+      "nta-shape.geojson              sample-data.gpkg\r\n",
       "output.csv                     sample-data.json\r\n",
       "output.json                    sample-data.xlsx\r\n",
       "output.xlsx                    \u001b[34mshapefile\u001b[m\u001b[m/\r\n",
-      "sample-buildings.zip           \u001b[34munzipped-data\u001b[m\u001b[m/\r\n",
-      "sample-data.csv\r\n"
+      "sample-buildings.zip           \u001b[34munzipped-data\u001b[m\u001b[m/\r\n"
      ]
     }
    ],
@@ -3848,7 +3869,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 46,
+   "execution_count": 47,
    "metadata": {},
    "outputs": [
     {
@@ -4033,7 +4054,7 @@
        "4  POLYGON ((-73.84769 40.87912, -73.84784 40.879...  "
       ]
      },
-     "execution_count": 46,
+     "execution_count": 47,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -4049,16 +4070,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 47,
+   "execution_count": 48,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "output.cpg       output.shp       sample-data.dbf  sample-data.shx\r\n",
-      "output.dbf       output.shx       sample-data.prj\r\n",
-      "output.prj       sample-data.cpg  sample-data.shp\r\n"
+      "nta-shape.cpg    nta-shape.shx    output.shp       sample-data.prj\r\n",
+      "nta-shape.dbf    output.cpg       output.shx       sample-data.shp\r\n",
+      "nta-shape.prj    output.dbf       sample-data.cpg  sample-data.shx\r\n",
+      "nta-shape.shp    output.prj       sample-data.dbf\r\n"
      ]
     }
    ],