diff --git a/numpy.md b/numpy.md index 84e317d4..97298e67 100644 --- a/numpy.md +++ b/numpy.md @@ -1,4 +1,4 @@ -- [📘 Day 24](#%f0%9f%93%98-day-24) +- [📘 Day 24](#-day-24) - [Python for Statistical Analysis](#python-for-statistical-analysis) - [Statistics](#statistics) - [Data](#data) @@ -30,7 +30,7 @@ - [Slicing Numpy array](#slicing-numpy-array) - [How to reverse the rows and the whole array?](#how-to-reverse-the-rows-and-the-whole-array) - [Reverse the row and column positions](#reverse-the-row-and-column-positions) - - [How to represent missing values ?](#how-to-represent-missing-values) + - [How to represent missing values ?](#how-to-represent-missing-values-) - [Generating Random Numbers](#generating-random-numbers) - [Generationg random numbers](#generationg-random-numbers) - [Numpy and Statistics](#numpy-and-statistics) @@ -44,20 +44,23 @@ - [Linear Algebra](#linear-algebra) - [NumPy Matrix Multiplication with np.matmul()](#numpy-matrix-multiplication-with-npmatmul) - [Summery](#summery) - - [💻 Exercises: Day 24](#%f0%9f%92%bb-exercises-day-24) + - [💻 Exercises: Day 24](#-exercises-day-24) + # 📘 Day 24 + ## Python for Statistical Analysis + ## Statistics Statistics is the discipline that studies the _collection_, _organization_, _displaying_, _analysis_, _interpretation_ and _presentation_ of data. -Statistics is a branch of mathematics that is recommended to be a prerequisite for data science and machine learning. Statistics is a very broad field but we will focus in this section only on the most relevant part. -After completing this challenge, you may go to web development, data analysis, machine learning and data science path. Whatever path you may follow, at some point in your career you will get data which you may work on. Having some statistical knowledge will help you to make decision based on data, *data tells as they say*. +Statistics is a branch of mathematics that is recommended to be a prerequisite for data science and machine learning. 
Statistics is a very broad field, but in this section we will focus only on the most relevant part. +After completing this challenge, you may follow a web development, data analysis, machine learning or data science path. Whichever path you follow, at some point in your career you will get data to work on. Having some statistical knowledge will help you make decisions based on data, _data tells, as they say_. ## Data What is data? Data is any set of characters that is gathered and translated for some purpose, usually analysis. It can be any character, including text and numbers, pictures, sound, or video. If data is not put into context, it makes no sense to a human or a computer. To make sense of data we need to work on it using different tools. -The work flow of data analysis, data science or machine learning starts from data. Data can be provided from some data source or it can be created. There are structured and and unstructure data. +The workflow of data analysis, data science or machine learning starts from data. Data can be provided by a data source or it can be created. There is structured and unstructured data. Data can be found in small or big data formats. Most of the data types we will get have been covered in the file handling section. 
@@ -76,6 +79,7 @@ So far, we have been using vscode but from now on I would recommend using Jupyte ```sh asabeneh@Asabeneh:~/Desktop/30DaysOfPython$ pip install numpy ``` + ## Importing NumPy Jupyter notebook is available if your are in favor of [jupyter notebook](https://github.com/Asabeneh/data-science-for-everyone/blob/master/numpy/numpy.ipynb) @@ -90,6 +94,7 @@ Jupyter notebook is available if your are in favor of [jupyter notebook](https:/ ``` ## Creating numpy array using + ### Creating int numpy arrays ```py @@ -98,7 +103,7 @@ Jupyter notebook is available if your are in favor of [jupyter notebook](https:/ # Checking data types print('Type:', type (python_list)) # - # + # print(python_list) # [1, 2, 3, 4, 5] two_dimensional_list = [[0,1,2], [3,4,5], [6,7,8]] @@ -107,13 +112,15 @@ Jupyter notebook is available if your are in favor of [jupyter notebook](https:/ # Creating Numpy(Numerical Python) array from python list - numpy_array_from_list = np.array(python_list) + numpy_array_from_list = np.array(python_list) print(type (numpy_array_from_list)) # print(numpy_array_from_list) # array([1, 2, 3, 4, 5]) ``` ### Creating float numpy arrays + Creating a float numpy array from list with a float data type parameter + ```py # Python list python_list = [1,2,3,4,5] @@ -123,14 +130,17 @@ Creating a float numpy array from list with a float data type parameter ``` ### Creating boolean numpy arrays -Creating a boolean a numpy array from list + +Creating a boolean a numpy array from list + ```py numpy_bool_array = np.array([0, 1, -1, 0, 0], dtype=bool) print(numpy_bool_array) # array([False, True, True, False, False]) ``` ### Creating multidimensional array using numpy -A numpy array may have one or multiple rors and columns + +A numpy array may have one or multiple rows and columns ```py two_dimensional_list = [[0,1,2], [3,4,5], [6,7,8]] @@ -138,6 +148,7 @@ A numpy array may have one or multiple rors and columns print(type (numpy_two_dimensional_list)) 
print(numpy_two_dimensional_list) ``` + ```sh [[0 1 2] @@ -154,6 +165,7 @@ print(type (np_to_list)) print('one dimensional array:', np_to_list) print('two dimensional array: ', numpy_two_dimensional_list.tolist()) ``` + ```sh one dimensional array: [1, 2, 3, 4, 5] @@ -161,6 +173,7 @@ print('two dimensional array: ', numpy_two_dimensional_list.tolist()) ``` ### Creating numpy array from tuple + ```py # Numpy array from tuple # Creating tuple in Python @@ -174,6 +187,7 @@ print('numpy_array_from_tuple: ', numpy_array_from_tuple) # numpy_array_from_tup ``` ### Shape of numpy array + The shape attribute provides the shape of the array as a tuple. The first item is the number of rows and the second is the number of columns. If the array is one dimensional, the tuple holds just the size of the array. ```py @@ -187,6 +201,7 @@ The shape method provide the shape of the array as a tuple. The first is the row [8,9,10, 11]]) print(three_by_four_array.shape) ``` + ```sh [1 2 3 4 5] shape of nums: (5,) @@ -198,7 +213,9 @@ The shape method provide the shape of the array as a tuple. The first is the row ``` ### Data type of numpy array + Data types: str, int, float, complex, bool, list, None + ```py int_lists = [-3, -2, -1, 0, 1, 2,3] int_array = np.array(int_lists) @@ -209,6 +226,7 @@ print(int_array.dtype) print(float_array) print(float_array.dtype) ``` + ```sh [-3 -2 -1 0 1 2 3] int64 @@ -217,7 +235,9 @@ print(float_array.dtype) ``` ### Size of a numpy array + In NumPy, we use the size attribute to get the number of items in an array + ```py numpy_array_from_list = np.array([1, 2, 3, 4, 5]) two_dimensional_list = np.array([[0, 1, 2], @@ -228,6 +248,7 @@ print('The size:', numpy_array_from_list.size) # 5 print('The size:', two_dimensional_list.size) # 9 ``` + ```sh The size: 5 The size: 9 @@ -237,13 +258,14 @@ print('The size:', two_dimensional_list.size) # 3 A NumPy array is not exactly like a Python list. 
To do mathematical operations on a Python list we have to loop through the items, but NumPy lets us apply any mathematical operation without looping. Mathematical operations: -* Addition (+) -* Subtraction (-) -* Multiplication (*) -* Division (/) -* Modules (%) -* Floor Division(//) -* Exponential(**) + +- Addition (+) +- Subtraction (-) +- Multiplication (\*) +- Division (/) +- Modulus (%) +- Floor Division (//) +- Exponential (\*\*) ### Addition @@ -256,6 +278,7 @@ ten_plus_original = numpy_array_from_list + 10 print(ten_plus_original) ``` + ```sh original array: [1 2 3 4 5] [11 12 13 14 15] @@ -270,6 +293,7 @@ print('original array: ', numpy_array_from_list) ten_minus_original = numpy_array_from_list - 10 print(ten_minus_original) ``` + ```sh original array: [1 2 3 4 5] [-9 -8 -7 -6 -5] @@ -284,6 +308,7 @@ print('original array: ', numpy_array_from_list) ten_times_original = numpy_array_from_list * 10 print(ten_times_original) ``` + ```sh original array: [1 2 3 4 5] [10 20 30 40 50] @@ -298,6 +323,7 @@ print('original array: ', numpy_array_from_list) ten_times_original = numpy_array_from_list / 10 print(ten_times_original) ``` + ```sh original array: [1 2 3 4 5] [0.1 0.2 0.3 0.4 0.5] @@ -312,12 +338,14 @@ print('original array: ', numpy_array_from_list) ten_times_original = numpy_array_from_list % 3 print(ten_times_original) ``` + ```sh original array: [1 2 3 4 5] [1 2 0 1 2] ``` ### Floor Division + ```py # Floor division: the division result without the remainder numpy_array_from_list = np.array([1, 2, 3, 4, 5]) @@ -327,6 +355,7 @@ print(ten_times_original) ``` ### Exponential + ```py # Exponential: raising a number to the power of another numpy_array_from_list = np.array([1, 2, 3, 4, 5]) @@ -334,6 +363,7 @@ print('original array: ', numpy_array_from_list) ten_times_original = numpy_array_from_list ** 2 print(ten_times_original) ``` + ```sh original array: [1 2 3 4 5] [ 1 4 9 16 25] @@ -351,6 +381,7 @@ print(numpy_int_arr.dtype) print(numpy_float_arr.dtype) 
print(numpy_bool_arr.dtype) ``` + ```sh int64 float64 @@ -360,12 +391,14 @@ print(numpy_bool_arr.dtype) ### Converting types We can convert the data types of numpy array + 1. Int to Float ```py numpy_int_arr = np.array([1,2,3,4], dtype = 'float') numpy_int_arr ``` + array([1., 2., 3., 4.]) 2. Float to Int @@ -374,6 +407,7 @@ numpy_int_arr numpy_int_arr = np.array([1., 2., 3., 4.], dtype = 'int') numpy_int_arr ``` + ```sh array([1, 2, 3, 4]) ``` @@ -384,6 +418,7 @@ numpy_int_arr np.array([-3, -2, 0, 1,2,3], dtype='bool') ``` + ```sh array([ True, True, False, True, True, True]) ``` @@ -393,6 +428,7 @@ np.array([-3, -2, 0, 1,2,3], dtype='bool') ```py numpy_float_list.astype('int').astype('str') ``` + ```sh array(['1', '2', '3'], dtype=' [[1 2 3] @@ -417,6 +454,7 @@ print('Data type:', two_dimension_array.dtype) Size: 9 Data type: int64 ``` + ### Getting items from a numpy array ```py @@ -429,6 +467,7 @@ print('First row:', first_row) print('Second row:', second_row) print('Third row: ', third_row) ``` + ```sh First row: [1 2 3] Second row: [4 5 6] @@ -445,6 +484,7 @@ print('Third column: ', third_column) print(two_dimension_array) ``` + ```sh First column: [1 4 7] Second column: [2 5 8] @@ -455,6 +495,7 @@ print(two_dimension_array) ``` ## Slicing Numpy array + Slicing in numpy is similar to slicing in python list ```py @@ -462,12 +503,14 @@ two_dimension_array = np.array([[1,2,3],[4,5,6], [7,8,9]]) first_two_rows_and_columns = two_dimension_array[0:2, 0:2] print(first_two_rows_and_columns) ``` + ```sh [[1 2] [4 5]] ``` ### How to reverse the rows and the whole array? 
+ ```py two_dimension_array[::] ``` @@ -484,6 +527,7 @@ two_dimension_array[::] two_dimension_array = np.array([[1,2,3],[4,5,6], [7,8,9]]) two_dimension_array[::-1,::-1] ``` + ```sh array([[9, 8, 7], [6, 5, 4], @@ -498,6 +542,7 @@ two_dimension_array[::] two_dimension_array[1,2] =44 print(two_dimension_array) ``` + ```sh [[1 2 3] [4 5 6] @@ -513,6 +558,7 @@ two_dimension_array[::] numpy_zeroes = np.zeros((3,3),dtype=int,order='C') numpy_zeroes ``` + ```sh array([[0, 0, 0], [0, 0, 0], @@ -524,6 +570,7 @@ two_dimension_array[::] numpy_ones = np.ones((3,3),dtype=int,order='C') print(numpy_ones) ``` + ```sh [[1 1 1] [1 1 1] @@ -543,6 +590,7 @@ reshaped = first_shape.reshape(3,2) print(reshaped) ``` + ```sh [[1 2 3] [4 5 6]] @@ -555,6 +603,7 @@ print(reshaped) flattened = reshaped.flatten() flattened ``` + ```sh array([1, 2, 3, 4, 5, 6]) ``` @@ -568,6 +617,7 @@ flattened print('Horizontal Append:', np.hstack((np_list_one, np_list_two))) ``` + ```sh [5 7 9] Horizontal Append: [1 2 3 4 5 6] @@ -577,6 +627,7 @@ flattened ## Vertical Stack print('Vertical Append:', np.vstack((np_list_one, np_list_two))) ``` + ```sh Vertical Append: [[1 2 3] [4 5 6]] @@ -589,6 +640,7 @@ flattened random_float = np.random.random() random_float ``` + ```sh 0.018929887384753874 ``` @@ -598,6 +650,7 @@ flattened random_floats = np.random.random(5) random_floats ``` + ```sh array([0.26392192, 0.35842215, 0.87908478, 0.41902195, 0.78926418]) ``` @@ -608,6 +661,7 @@ flattened random_int = np.random.randint(0, 11) random_int ``` + ```sh 4 ``` @@ -617,6 +671,7 @@ flattened random_int = np.random.randint(2,10, size=4) random_int ``` + ```sh array([8, 8, 8, 2]) ``` @@ -626,18 +681,22 @@ flattened random_int = np.random.randint(2,10, size=(3,3)) random_int ``` + ```sh array([[3, 5, 3], [7, 3, 6], [2, 3, 3]]) ``` + ### Generationg random numbers + ```py # np.random.normal(mu, sigma, size) normal_array = np.random.normal(79, 15, 80) normal_array ``` + ```sh array([ 89.49990595, 82.06056961, 107.21445842, 
38.69307086, 47.85259157, 93.07381061, 76.40724259, 78.55675184, @@ -669,6 +728,7 @@ import seaborn as sns sns.set() plt.hist(normal_array, color="grey", bins=50) ``` + ```sh (array([2., 0., 0., 0., 1., 2., 2., 0., 2., 0., 0., 1., 2., 2., 1., 4., 3., 4., 2., 7., 2., 2., 5., 4., 2., 4., 3., 2., 1., 5., 3., 0., 3., 2., @@ -688,14 +748,18 @@ plt.hist(normal_array, color="grey", bins=50) 119.20416383, 120.88147826, 122.5587927 ]), ) ``` + ### Matrix in numpy + ```py four_by_four_matrix = np.matrix(np.ones((4,4), dtype=float)) ``` + ```py four_by_four_matrix ``` + ```sh matrix([[1., 1., 1., 1.], [1., 1., 1., 1.], @@ -707,6 +771,7 @@ matrix([[1., 1., 1., 1.], np.asarray(four_by_four_matrix)[2] = 2 four_by_four_matrix ``` + ```sh matrix([[1., 1., 1., 1.], @@ -715,9 +780,10 @@ matrix([[1., 1., 1., 1.], [1., 1., 1., 1.]]) ``` +### Numpy numpy.arange() -### Numpy numpy.arange() #### What is Arrange? + Sometimes, you want to create values that are evenly spaced within a defined interval. For instance, you want to create values from 1 to 10; you can use numpy.arange() function ```py @@ -725,13 +791,16 @@ Sometimes, you want to create values that are evenly spaced within a defined int lst = range(0, 11, 2) lst ``` + ```python range(0, 11, 2) ``` + ```python for l in lst: print(l) ``` + ```sh 0 2 4 @@ -739,17 +808,18 @@ for l in lst: 8 10 ``` + ```py # Similar to range arange numpy.arange(start, stop, step) whole_numbers = np.arange(0, 20, 1) whole_numbers ``` + ```sh array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]) ``` - ```py natural_numbers = np.arange(1, 20, 1) natural_numbers @@ -759,6 +829,7 @@ natural_numbers odd_numbers = np.arange(1, 20, 2) odd_numbers ``` + ```sh array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]) ``` @@ -767,16 +838,20 @@ odd_numbers even_numbers = np.arange(2, 20, 2) even_numbers ``` + ```sh array([ 2, 4, 6, 8, 10, 12, 14, 16, 18]) ``` + ### Creating sequence of numbers using linspace + ```py # numpy.linspace() # 
numpy.logspace() in Python with Example # For instance, it can be used to create 10 values from 1 to 5 evenly spaced. np.linspace(1.0, 5.0, num=10) ``` + ```sh array([1. , 1.44444444, 1.88888889, 2.33333333, 2.77777778, 3.22222222, 3.66666667, 4.11111111, 4.55555556, 5. ]) @@ -786,6 +861,7 @@ np.linspace(1.0, 5.0, num=10) # not to include the last value in the interval np.linspace(1.0, 5.0, num=5, endpoint=False) ``` + ``` array([1. , 1.8, 2.6, 3.4, 4.2]) ``` @@ -814,6 +890,7 @@ x = np.array([1,2,3], dtype=np.complex128) ```py x ``` + ```sh array([1.+0.j, 2.+0.j, 3.+0.j]) ``` @@ -832,6 +909,7 @@ np_list = np.array([(1,2,3), (4,5,6)]) np_list ``` + ```sh array([[1, 2, 3], [4, 5, 6]]) @@ -842,6 +920,7 @@ print('First row: ', np_list[0]) print('Second row: ', np_list[1]) ``` + ```sh First row: [1 2 3] @@ -854,13 +933,13 @@ print('Second column: ', np_list[:,1]) print('Third column: ', np_list[:,2]) ``` + ```sh First column: [1 4] Second column: [2 5] Third column: [3 6] ``` - ### NumPy Statistical Functions with Example NumPy has quite useful statistical functions for finding minimum, maximum, mean, median, percentile,standard deviation and variance, etc from the given elements in the array. @@ -877,7 +956,6 @@ Numpy is equipped with the robust statistical function as listed below - Percentile - Standard deviation np.std() - ```python np_normal_dis = np.random.normal(5, 0.5, 100) np_normal_dis @@ -894,8 +972,6 @@ print('sd: ', two_dimension_array.std()) mean: 14.777777777777779 sd: 18.913709183069525 - - ```python min: 1 max: 55 @@ -903,7 +979,6 @@ mean: 14.777777777777779 sd: 18.913709183069525 ``` - ```python print(two_dimension_array) print('Column with minimum: ', np.amin(two_dimension_array,axis=0)) @@ -922,10 +997,8 @@ print('Row with maximum: ', np.amax(two_dimension_array,axis=1)) Row with minimum: [1 4 7] Row with maximum: [ 3 55 9] - ### How to create repeating sequences? 
- ```python a = [1,2,3] @@ -940,10 +1013,8 @@ print('Repeat: ', np.repeat(a, 2)) Tile: [1 2 3 1 2 3] Repeat: [1 1 2 2 3 3] - ### How to generate random numbers? - ```python # One random number between [0,1) one_random_num = np.random.random() @@ -953,20 +1024,12 @@ print(one_random_num) 0.6149403282678213 - - ```python 0.4763968133790438 ``` - - - 0.4763968133790438 - - - ```python # Random numbers between [0,1) of shape 2,3 r = np.random.random(size=[2,3]) @@ -976,76 +1039,48 @@ print(r) [[0.13031737 0.4429537 0.1129527 ] [0.76811539 0.88256594 0.6754075 ]] - - ```python print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10)) ``` ['u' 'o' 'o' 'i' 'e' 'e' 'u' 'o' 'u' 'a'] - - ```python ['i' 'u' 'e' 'o' 'a' 'i' 'e' 'u' 'o' 'i'] ``` - - - ['iueoaieuoi'] - - - ```python ## Random numbers between [0, 1] of shape 2, 2 rand = np.random.rand(2,2) rand ``` - - - array([[0.97992598, 0.79642484], [0.65263629, 0.55763145]]) - - - ```python rand2 = np.random.randn(2,2) rand2 ``` - - - array([[ 1.65593322, -0.52326621], [ 0.39071179, -2.03649407]]) - - - ```python # Random integers between [0, 10) of shape 2,5 rand_int = np.random.randint(0, 10, size=[5,3]) rand_int ``` - - - array([[0, 7, 5], [4, 1, 4], [3, 5, 3], [4, 3, 8], [4, 6, 7]]) - - - ```py from scipy import stats np_normal_dis = np.random.normal(5, 0.5, 1000) # mean, standard deviation, number of samples @@ -1058,6 +1093,7 @@ print('median: ', np.median(np_normal_dis)) print('mode: ', stats.mode(np_normal_dis)) print('sd: ', np.std(np_normal_dis)) ``` + ```sh min: 3.557811005458804 @@ -1074,11 +1110,8 @@ plt.hist(np_normal_dis, color="grey", bins=21) plt.show() ``` - ![png](test_files/test_121_0.png) - - ```python # numpy.dot(): Dot Product in Python using Numpy # Dot Product @@ -1089,10 +1122,9 @@ plt.show() # numpy.dot(x, y, out=None) ``` - ### Linear Algebra -1. Dot Product +1. 
Dot Product ```python ## Linear algebra @@ -1102,7 +1134,7 @@ g = np.array([4,5,3]) ### 1*4 + 2*5 + 3*3 np.dot(f, g) # 23 ``` - + ### NumPy Matrix Multiplication with np.matmul() ```python @@ -1112,11 +1144,13 @@ i = [[5,6],[7,8]] ### 1*5+2*7 = 19 np.matmul(h, i) ``` + ```sh array([[19, 22], [43, 50]]) ``` + ```py ## Determinant 2*2 matrix ### 5*8-7*6 np.linalg.det(i) @@ -1125,8 +1159,8 @@ np.matmul(h, i) ```python np.linalg.det(i) ``` - -1.999999999999999 + -1.999999999999999 ```python Z = np.zeros((8,8)) @@ -1137,6 +1171,7 @@ Z[::2,1::2] = 1 ```python Z ``` + array([[0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.], [0., 1., 0., 1., 0., 1., 0., 1.], @@ -1149,6 +1184,7 @@ Z ```python new_list = [ x + 2 for x in range(0, 11)] ``` + ```python new_list ``` @@ -1158,13 +1194,15 @@ new_list ```python [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] ``` + [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] ```python np_arr = np.array(range(0, 11)) np_arr + 2 ``` - array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) + +array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) We use a linear equation for quantities that have a linear relationship. Let's see the example below: @@ -1174,8 +1212,7 @@ pressure = temp * 2 + 5 pressure ``` -array([ 7, 9, 11, 13, 15]) - +array([ 7, 9, 11, 13, 15]) ```python plt.plot(temp,pressure) @@ -1188,10 +1225,8 @@ plt.show() ![png](test_files/test_141_0.png) - We can draw the Gaussian (normal) distribution using NumPy. As you can see below, NumPy can generate random numbers. To create a random sample, we need the mean (mu), the standard deviation (sigma) and the number of data points. - ```python mu = 28 sigma = 15 @@ -1216,4 +1251,5 @@ To summarise, the main differences with python lists are: 1. numpy arrays support boolean indexing. ## 💻 Exercises: Day 24 -1. Repeat all the examples \ No newline at end of file + +1. Repeat all the examples
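The summary above notes that numpy arrays support boolean indexing, which the examples in this diff do not demonstrate. The following is a minimal sketch of the idea, not part of the original lesson: the `scores` array and the threshold of 60 are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data (an assumption, not from the original lesson)
scores = np.array([55, 72, 90, 38, 64])

# Comparing an array with a scalar yields a boolean array (a mask)
mask = scores >= 60

# Indexing with the mask keeps only the items where the mask is True
passed = scores[mask]

print(mask)    # [False  True  True False  True]
print(passed)  # [72 90 64]
```

A plain Python list would need a loop or a list comprehension for the same filter, which is the contrast the summary draws.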