From c15e6e8c75773751449dc58775ab730f6e097f6e Mon Sep 17 00:00:00 2001
From: Asabeneh <asabeneh@gmail.com>
Date: Fri, 9 Jul 2021 02:10:08 +0300
Subject: [PATCH] pandas

---
 24_Day_Statistics/24_statistics.md |   2 +-
 25_Day_Pandas/25_pandas.md         | 196 ++++++++---------------------
 2 files changed, 52 insertions(+), 146 deletions(-)
diff --git a/24_Day_Statistics/24_statistics.md b/24_Day_Statistics/24_statistics.md
index cd9f4d32..4978e3c2 100644
--- a/24_Day_Statistics/24_statistics.md
+++ b/24_Day_Statistics/24_statistics.md
@@ -1182,7 +1182,7 @@ np_arr + 2
 
 array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
 
-We use linear equation for quatities which have linear relationship. Let's see the example below:
+We use a linear equation for quantities that have a linear relationship. Let's see the example below:
 
 ```python
 temp = np.array([1,2,3,4,5])
diff --git a/25_Day_Pandas/25_pandas.md b/25_Day_Pandas/25_pandas.md
index d3acd118..4f1c8216 100644
--- a/25_Day_Pandas/25_pandas.md
+++ b/25_Day_Pandas/25_pandas.md
@@ -9,9 +9,9 @@
 
   <sub>Author:
   <a href="https://www.linkedin.com/in/asabeneh/" target="_blank">Asabeneh Yetayeh</a><br>
-  <small> First Edition: Nov 22 - Dec 22, 2019</small>
+  <small>Second Edition: July, 2021</small>
   </sub>
-</div>
+
 </div>
 
 [<< Day 24](../24_Day_Statistics/24_statistics.md) | [Day 26 >>](../26_Day_Python_web/26_python_web.md)
@@ -20,12 +20,13 @@
 
 - [📘 Day 25](#-day-25)
   - [Pandas](#pandas)
-  - [Importing Pandas](#importing-pandas)
+    - [Installing Pandas](#installing-pandas)
+    - [Importing Pandas](#importing-pandas)
     - [Creating Pandas Series with Default Index](#creating-pandas-series-with-default-index)
-    - [Creating  Pandas Series with custom index](#creating-pandas-series-with-custom-index)
+    - [Creating  Pandas Series with custom index](#creating--pandas-series-with-custom-index)
     - [Creating Pandas Series from a Dictionary](#creating-pandas-series-from-a-dictionary)
     - [Creating a Constant Pandas Series](#creating-a-constant-pandas-series)
-    - [Creating a  Pandas Series Using Linspace](#creating-a-pandas-series-using-linspace)
+    - [Creating a  Pandas Series Using Linspace](#creating-a--pandas-series-using-linspace)
   - [DataFrames](#dataframes)
     - [Creating DataFrames from List of Lists](#creating-dataframes-from-list-of-lists)
     - [Creating DataFrame Using Dictionary](#creating-dataframe-using-dictionary)
@@ -40,12 +41,24 @@
   - [Checking data types of Column values](#checking-data-types-of-column-values)
     - [Boolean Indexing](#boolean-indexing)
   - [Exercises: Day 25](#exercises-day-25)
+
 # 📘 Day 25
+
 ## Pandas
 
 Pandas is an open source, high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
-Pandas adds data structures and tools designed to work with table-like data which is Series and Data Frames.
-Pandas provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation and imputation.
+Pandas adds data structures and tools designed to work with table-like data: *Series* and *DataFrames*.
+Pandas provides tools for data manipulation:
+
+- reshaping
+- merging
+- sorting
+- slicing
+- aggregation
+- imputation
+
+### Installing Pandas
+If you are using Anaconda, you do not have to install pandas.
 
 For Mac:
 ```py
@@ -59,9 +72,10 @@ pip install conda
 pip install pandas
 ```
 
-Pandas data structure is based on *Series* and *DataFrames*
-A series is a column and a DataFrame is a multidimensional table made up of collection of series. In order to create a pandas series we should use numpy to create a one dimensional arrays or a python list.
-Let's see an example of a series:
+Pandas data structures are based on *Series* and *DataFrames*.
+
+A *Series* is a *column*, and a DataFrame is a *multidimensional table* made up of a collection of *Series*. To create a pandas Series, we can use NumPy to create a one-dimensional array, or use a Python list.
+Let us see an example of a series:
 
 Names Pandas Series
 
@@ -77,19 +91,17 @@ Cities Series
 
 As you can see, pandas series is just one column of data. If we want to have multiple columns we use data frames. The example below shows pandas DataFrames.
 
-Let's see, an example of a pandas data frame:
+Let us see an example of a pandas DataFrame:
 
 ![Pandas data frame](../images/pandas-dataframe-1.png)
 
 Data frame is a collection of rows and columns. Look at the table below; it has many more columns than the example above:
 
-
 ![Pandas data frame](../images/pandas-dataframe-2.png)
 
 Next, we will see how to import pandas and how to create Series and DataFrames using pandas
 
-## Importing Pandas
-
+### Importing Pandas
 
 ```python
 import pandas as pd # importing pandas as pd
@@ -98,14 +110,12 @@ import numpy  as np # importing numpy as np
 
 ### Creating Pandas Series with Default Index
 
-
 ```python
 nums = [1, 2, 3, 4,5]
 s = pd.Series(nums)
 print(s)
 ```
 
-
 ```sh
     0    1
     1    2
@@ -115,19 +125,14 @@ print(s)
     dtype: int64
 ```
 
-
 ### Creating  Pandas Series with custom index
 
-
 ```python
 nums = [1, 2, 3, 4, 5]
 s = pd.Series(nums, index=[1, 2, 3, 4, 5])
 print(s)
-
 ```
 
-
-
 ```sh
     1    1
     2    2
@@ -137,39 +142,30 @@ print(s)
     dtype: int64
 ```
 
-
-
 ```python
-fruits = ['Orange','Banana','Mangao']
+fruits = ['Orange','Banana','Mango']
 fruits = pd.Series(fruits, index=[1, 2, 3])
 print(fruits)
 ```
 
-
-
 ```sh
     1    Orange
     2    Banana
-    3    Mangao
+    3    Mango
     dtype: object
 ```
 
-
 ### Creating Pandas Series from a Dictionary
 
-
 ```python
 dct = {'name':'Asabeneh','country':'Finland','city':'Helsinki'}
 ```
 
-
 ```python
 s = pd.Series(dct)
 print(s)
 ```
 
-
-
 ```sh
     name       Asabeneh
     country     Finland
@@ -177,17 +173,13 @@ print(s)
     dtype: object
 ```
 
-
 ### Creating a Constant Pandas Series
 
-
 ```python
-s = pd.Series(10, index = [1, 2,3])
+s = pd.Series(10, index = [1, 2, 3])
 print(s)
 ```
 
-
-
 ```sh
     1    10
     2    10
@@ -195,17 +187,13 @@ print(s)
     dtype: int64
 ```
 
-
 ### Creating a  Pandas Series Using Linspace
 
-
 ```python
 s = pd.Series(np.linspace(5, 20, 10)) # linspace(starting, end, items)
 print(s)
 ```
 
-
-
 ```sh
     0     5.000000
     1     6.666667
@@ -226,7 +214,6 @@ Pandas data frames can be created in different ways.
 
 ### Creating DataFrames from List of Lists
 
-
 ```python
 data = [
     ['Asabeneh', 'Finland', 'Helsink'], 
@@ -270,7 +257,6 @@ print(df)
 
 ### Creating DataFrame Using Dictionary
 
-
 ```python
 data = {'Name': ['Asabeneh', 'David', 'John'], 'Country':[
     'Finland', 'UK', 'Sweden'], 'City': ['Helsiki', 'London', 'Stockholm']}
@@ -278,7 +264,6 @@ df = pd.DataFrame(data)
 print(df)
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -310,10 +295,8 @@ print(df)
   </tbody>
 </table>
 
-
 ### Creating DataFrames from a List of Dictionaries
 
-
 ```python
 data = [
     {'Name': 'Asabeneh', 'Country': 'Finland', 'City': 'Helsinki'},
@@ -323,7 +306,6 @@ df = pd.DataFrame(data)
 print(df)
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -355,17 +337,16 @@ print(df)
   </tbody>
 </table>
 
-
-
-
 ## Reading CSV File Using Pandas
 
-To download the csv file, needed in this example, console/command line is enough:
+To download the CSV file needed in this example, the console/command line is enough:
 
 ```sh
 curl -O https://raw.githubusercontent.com/Asabeneh/30-Days-Of-Python/master/data/weight-height.csv
 ```
 
+Put the downloaded file in your working directory.
+
 ```python
 import pandas as pd
 
@@ -374,8 +355,8 @@ print(df)
 ```
 
 ### Data Exploration
-Let's read only the first 5 rows using head()
 
+Let us read only the first 5 rows using head()
 
 ```python
 print(df.head()) # give five rows we can increase the number of rows by passing argument to the head() method
@@ -425,45 +406,12 @@ print(df.head()) # give five rows we can increase the number of rows by passing
   </tbody>
 </table>
 
-
-
-As you can see the csv file has three rows: Gender, Height and Weight. But we don't know the number of rows. Let's use shape meathod.
-
-
-```python
-print(df.shape) # as you can see 10000 rows and three columns
-```
-
-
-
-
-    (10000, 3)
-
-
-
-Let's get all the columns using columns.
-
-
-
-```python
-print(df.columns)
-```
-
-
-
-
-    Index(['Gender', 'Height', 'Weight'], dtype='object')
-
-
-
-Let's read only the last 5 rows using tail()
-
+Let us also explore the last records of the DataFrame using the tail() method.
 
 ```python
 print(df.tail()) # tails give the last five rows, we can increase the rows by passing argument to tail method
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -507,21 +455,31 @@ print(df.tail()) # tails give the last five rows, we can increase the rows by pa
   </tbody>
 </table>
 
+As you can see, the CSV file has three columns: Gender, Height and Weight. If a DataFrame had many columns, it would be hard to see them all at a glance, so we should use an attribute to list them. We also do not yet know the number of rows. Let's use the shape attribute.
 
-Now, lets get a specific column using the column key
+```python
+print(df.shape) # as you can see 10000 rows and three columns
+```
 
+    (10000, 3)
 
+Let us get all the column names using the columns attribute.
 
 ```python
-heights = df['Height'] # this is now a series
+print(df.columns)
 ```
 
+    Index(['Gender', 'Height', 'Weight'], dtype='object')
+
+Now, let us get a specific column using the column key
 
 ```python
-print(heights)
+heights = df['Height'] # this is now a series
 ```
 
-
+```python
+print(heights)
+```
 
 ```sh
     0       73.847017
@@ -538,18 +496,14 @@ print(heights)
     Name: Height, Length: 10000, dtype: float64
 ```
 
-
-
 ```python
 weights = df['Weight'] # this is now a series
 ```
 
-
 ```python
 print(weights)
 ```
 
-
 ```sh
     0       241.893563
     1       162.310473
@@ -565,25 +519,18 @@ print(weights)
     Name: Weight, Length: 10000, dtype: float64
 ```
 
-
-
 ```python
 print(len(heights) == len(weights))
 ```
 
-
-
-
     True
 
-
-
+The describe() method provides descriptive statistics of a dataset.
 
 ```python
 print(heights.describe()) # give statisical information about height data
 ```
 
-
 ```sh
     count    10000.000000
     mean        66.367560
@@ -596,14 +543,10 @@ print(heights.describe()) # give statisical information about height data
     Name: Height, dtype: float64
 ```
 
-
-
 ```python
 print(weights.describe())
 ```
 
-
-
 ```sh
     count    10000.000000
     mean       161.440357
@@ -616,13 +559,10 @@ print(weights.describe())
     Name: Weight, dtype: float64
 ```
 
-
-
 ```python
 print(df.describe())  # describe can also give statistical information from a dataFrame
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -675,11 +615,10 @@ print(df.describe())  # describe can also give statistical information from a da
   </tbody>
 </table>
 
+Similar to describe(), the info() method also gives information about the dataset, such as column names, non-null counts and data types.
 
 ## Modifying a DataFrame
 
-
-
 Modifying a DataFrame:
     * We can create a new DataFrame
     * We can create a new column and add it to the DataFrame, 
@@ -691,7 +630,6 @@ Modifying a DataFrame:
 
 As always, first we import the necessary packages. Now, lets import pandas and numpy, two best friends ever.
 
-
 ```python
 import pandas as pd
 import numpy as np
@@ -734,14 +672,13 @@ print(df)
   </tbody>
 </table>
 
-
 Adding a column to a DataFrame is like adding a key to a dictionary.
 
 First let's use the previous example to create a DataFrame. After we create the DataFrame, we will start modifying the columns and column values.
 
 ### Adding a New Column
-Let's add a weight column in the DataFrame
 
+Let's add a weight column to the DataFrame
 
 ```python
 weights = [74, 78, 69]
@@ -749,7 +686,6 @@ df['Weight'] = weights
 df
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -787,14 +723,12 @@ df
 
 Let's add a height column into the DataFrame aswell
 
-
 ```python
 heights = [173, 175, 169]
 df['Height'] = heights
 print(df)
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -840,7 +774,6 @@ As you can see, the height is in centimeters, so we shoud change it to meters. L
 
 ### Modifying column values
 
-
 ```python
 df['Height'] = df['Height'] * 0.01
 df
@@ -949,19 +882,15 @@ df
   </tbody>
 </table>
 
-
-
 ### Formating DataFrame columns
 
 The BMI column values of the DataFrame are float with many significant digits after decimal. Let's change it to one significant digit after point.
 
-
 ```python
 df['BMI'] = round(df['BMI'], 1)
 print(df)
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -1007,7 +936,6 @@ print(df)
 
 The information in the DataFrame seems not yet complete, let's add birth year and current year columns.
 
-
 ```python
 birth_year = ['1769', '1985', '1990']
 current_year = pd.Series(2020, index=[0, 1,2])
@@ -1016,7 +944,6 @@ df['Current Year'] = current_year
 df
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -1068,36 +995,26 @@ df
   </tbody>
 </table>
 
-
 ## Checking data types of Column values
 
-
 ```python
 print(df.Weight.dtype)
 ```
 
-
-
 ```sh
     dtype('int64')
 ```
 
-
-
 ```python
 df['Birth Year'].dtype # it gives string object , we should change this to number
 
 ```
 
-
-
 ```python
 df['Birth Year'] = df['Birth Year'].astype('int')
 print(df['Birth Year'].dtype) # let's check the data type now
 ```
 
-
-
 ```sh
     dtype('int32')
 ```
@@ -1113,32 +1030,23 @@ df['Current Year'].dtype
     dtype('int32')
 ```
 
-
 Now, the column values of birth year and current year are integers. We can calculate the age.
 
-
 ```python
 ages = df['Current Year'] - df['Birth Year']
 ages
 ```
 
-
-
-
     0    251
     1     35
     2     30
     dtype: int32
 
-
-
-
 ```python
 df['Ages'] = ages
 print(df)
 ```
 
-
 <table border="1" class="dataframe">
   <thead>
     <tr style="text-align: right;">
@@ -1198,13 +1106,11 @@ The person in the first row lived so far for 251 years. It is unlikely for someo
 
 mean = (35 + 30)/ 2
 
-
 ```python
 mean = (35 + 30)/ 2
 print('Mean: ',mean)	#it is good to add some description to the output, so we know what is what
 ```
 
-
 ```sh
    Mean:  32.5
 ```