This is the second part of a three-part series of NumPy tutorials.
|PART 1:|1. What is a NumPy array?|
| |2. How to create and inspect NumPy arrays?|
|PART 2:|3. Array indexing/Slicing|
| |4. Array Operations|
|PART 3:|5. What is broadcasting?|
| |6. Speed test: Lists Vs NumPy array|
In this part (Part 2), let us see how to
- Take parts of arrays out.
- Manipulate an array.
- Perform logical, arithmetic, and mathematical operations on arrays.
Let us cut to the chase and start working!
Array slicing is similar to slicing other data structures in Python. We simply pass the index we want and get an element or group of elements out. As in regular Python, the elements of an array of length n are indexed from 0 to n-1.
1-D Array Slicing:
One-dimensional (1-D) array slicing works the same way as for Python lists, so try these syntaxes to get some practice. ‘ : ’ is used to get a range of values, just like in lists. Example: given ‘2:5’, NumPy pulls out the elements at indices 2 through 4 (5 - 1 = 4), that is, the third to fifth elements. The end index itself is excluded.
Observe the notation [2:] used in the second cell in the picture. It asks for every element from the third element onward. Try guessing the results for the remaining cells. Hints are already given in the cells themselves.
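The 1-D slicing rules described above can be tried out with a short sketch like the following. The array and variable names here are hypothetical and chosen only for illustration; they are not the ones from the original cells.

```python
import numpy as np

# Hypothetical example array holding the values 0 to 9.
array_1d = np.arange(10)

third_to_fifth = array_1d[2:5]   # indices 2, 3, 4 (end index 5 is excluded)
from_third = array_1d[2:]        # everything from index 2 onward
first_three = array_1d[:3]       # indices 0, 1, 2
last_element = array_1d[-1]      # negative indices count from the end
every_second = array_1d[::2]     # a step of 2 skips every other element

print(third_to_fifth)  # [2 3 4]
print(from_third)      # [2 3 4 5 6 7 8 9]
print(every_second)    # [0 2 4 6 8]
```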
2-D Array Slicing:
Multidimensional arrays are indexed using as many indices as the number of dimensions or axes. For instance, to index a 2-D array, you need two indices – array[x, y]. In [x, y], x is for rows and y is for columns. Each axis has an index starting at 0. The following figure shows the axes and their indices for a 2-D array.
Let us create a 2-D array and do some slicing. The first cell in the picture creates a 2-D array. Now say we want a specific element or part of the given array. This is done using square brackets and a comma, ‘[ , ]’. The comma (‘ , ’) separates row slicing from column slicing.
Now start guessing the results for the given syntaxes and compare them with the actual results. A ‘ : ’ with no range specified retrieves all the elements of a particular row or column.
Guess the outputs for the following:
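As a rough sketch of the 2-D indexing rules above, the snippet below builds a small array and pulls out an element, a row, a column, and a sub-block. The array contents and names are assumptions made for illustration, not the ones from the original cells.

```python
import numpy as np

# Hypothetical 3 x 4 array (3 rows, 4 columns).
array_2d = np.array([[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, 11, 12]])

element = array_2d[1, 2]         # row index 1, column index 2
second_row = array_2d[1, :]      # all columns of row index 1
third_column = array_2d[:, 2]    # all rows of column index 2
sub_block = array_2d[0:2, 1:3]   # rows 0-1 and columns 1-2

print(element)       # 7
print(second_row)    # [5 6 7 8]
print(third_column)  # [ 3  7 11]
print(sub_block)     # [[2 3]
                     #  [6 7]]
```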
Operations on NumPy Arrays:
On NumPy arrays, we can perform almost all the mathematical and logical operations that one can perform on data structures like lists and tuples in Python.
On top of that, we can perform a wide range of linear algebra and trigonometry calculations on array objects. In fact, the purpose of NumPy is to provide scientific computing ability to Python.
The learning objectives of this part of the article are broadly classified as
- Manipulating arrays
- Mathematical and Logical operations on arrays
Reshaping arrays: The function np.reshape() was already mentioned when creating an ‘arange()’ array, but it is more appropriate to discuss it here. So let us look at ‘reshape()’ at length. By definition, reshape() transforms an array from one shape to another.
The only limitation is that the product of the dimensions of the given array must equal the product of the dimensions of the transformed array. For example, if we have a ‘(5,4)’ array, we can transform it into shapes such as ‘(2,10)’, ‘(10,2)’, ‘(1,20)’, ‘(20,1)’ or ‘(4,5)’, because 5*4 equals 2*10 and so on. Any shape whose dimensions multiply to 20 works.
Let us see a few syntaxes to understand.
In this example, an array holding the values 0 to 11 is taken and reshaped in 3 different ways. The last one is the most interesting. If we know only the number of rows we want, we can simply write reshape(4,-1); NumPy understands that we want 4 rows and automatically calculates the number of columns using the product rule discussed earlier. Try the syntax in the third cell with the rows and columns swapped.
What would be the easiest way to turn columns into rows and vice versa? Connect the dots with matrices.
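The reshape() behaviour described above can be sketched as follows. The variable names are hypothetical; the array of 0 to 11 follows the description in the text.

```python
import numpy as np

array_1d = np.arange(12)                  # values 0 to 11, shape (12,)

two_by_six = array_1d.reshape(2, 6)       # 2 * 6 == 12
three_by_four = array_1d.reshape(3, 4)    # 3 * 4 == 12
four_rows = array_1d.reshape(4, -1)       # -1 lets NumPy infer 12 / 4 = 3 columns

print(four_rows.shape)  # (4, 3)

# A shape whose product differs from the number of elements is rejected.
try:
    array_1d.reshape(5, 3)                # 5 * 3 == 15, not 12
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)   # True
```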
Stacking arrays: Stacking is done using the np.hstack() and np.vstack() functions. For horizontal stacking, the number of rows should be the same, while for vertical stacking, the number of columns should be the same.
Try these. While vstack() places array_2 below array_1, hstack() places the array in the second argument to the right of the first. Note: The arrays should be passed as a list or tuple of arrays.
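A minimal sketch of both stacking functions, assuming two hypothetical 2 x 3 arrays named array_1 and array_2 (the contents of the original cells are not available):

```python
import numpy as np

array_1 = np.arange(6).reshape(2, 3)       # [[0 1 2], [3 4 5]]
array_2 = np.arange(6, 12).reshape(2, 3)   # [[6 7 8], [9 10 11]]

# Both functions take a single tuple (or list) of arrays.
vertical = np.vstack((array_1, array_2))   # array_2 goes below array_1
horizontal = np.hstack((array_1, array_2)) # array_2 goes to the right of array_1

print(vertical.shape)    # (4, 3)
print(horizontal.shape)  # (2, 6)
print(horizontal[0])     # [0 1 2 6 7 8]
```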
Logical operations on arrays: We can also perform conditional and logical selections on arrays, using the <, > and == operators to compare the values in the array with a given value, and & (AND) and | (OR) to combine conditions.
In the first cell, we have taken values ranging from 5 to 14 in array_logical. When we ask whether array_logical > 10, the result is a boolean array that records, for each element, whether it is strictly greater than 10. Try the syntaxes in the second cell. But before that, guess what the result will be to check your understanding.
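The comparison above can be reproduced with a sketch like this one. The name array_logical and its 5 to 14 range come from the text; the combined conditions are illustrative assumptions. Note that each condition needs its own parentheses when combined with & or |.

```python
import numpy as np

array_logical = np.arange(5, 15)           # values 5 to 14

greater_than_10 = array_logical > 10       # boolean array, True only for 11..14
selected = array_logical[greater_than_10]  # keep the elements where the mask is True

# Combine conditions element-wise; parentheses are required around each one.
between = array_logical[(array_logical > 7) & (array_logical < 12)]
either = array_logical[(array_logical < 7) | (array_logical == 14)]

print(selected)  # [11 12 13 14]
print(between)   # [ 8  9 10 11]
print(either)    # [ 5  6 14]
```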
Mathematical operations on arrays:
Basic Arithmetic Operations: Arithmetic operations need no introduction. They are simple addition (+), subtraction (-), multiplication (*) and division (/). When two arrays have the same size, each operation is applied to the corresponding pairs of elements. Try the following.
Note: Do not mistake these for matrix operations. These are element-wise operations. Also note that this differs from Python lists, where + concatenates rather than adds.
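A short sketch of element-wise arithmetic on two same-sized arrays. The arrays a and b are hypothetical examples, and the last line shows the contrast with plain Python lists.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

added = a + b        # element-wise sum
subtracted = b - a   # element-wise difference
multiplied = a * b   # element-wise product, NOT a matrix product
divided = b / a      # element-wise true division, gives floats

print(added)       # [11 22 33 44]
print(multiplied)  # [ 10  40  90 160]
print(divided)     # [10. 10. 10. 10.]

# With plain lists, + concatenates instead of adding element by element.
list_result = [1, 2] + [3, 4]
print(list_result)  # [1, 2, 3, 4]
```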
Linear Algebraic Operations: NumPy provides the np.linalg package to apply common linear algebra operations, such as:
- np.linalg.inv: Inverse of a matrix
- np.linalg.det: Determinant of a matrix
- np.linalg.eig: Eigenvalues and eigenvectors of a matrix
These linear algebra functions are the basis for many machine learning algorithms. Access to these kinds of functions is one reason why packages like scikit-learn are built on top of NumPy. ‘np.dot(a,b)’ computes the matrix product of two 2-D arrays a and b.
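The functions listed above can be tried on a small matrix. The 2 x 2 matrix below is an arbitrary example chosen for illustration; the checks at the end only confirm the defining properties of the inverse and of eigenpairs.

```python
import numpy as np

# Hypothetical symmetric, invertible 2 x 2 matrix.
matrix = np.array([[2.0, 1.0],
                   [1.0, 3.0]])

inverse = np.linalg.inv(matrix)                     # inverse of the matrix
determinant = np.linalg.det(matrix)                 # 2*3 - 1*1 = 5
eigenvalues, eigenvectors = np.linalg.eig(matrix)   # eigenvectors are the columns

# Matrix product of a matrix and its inverse is (numerically) the identity.
identity_check = np.dot(matrix, inverse)

print(np.allclose(identity_check, np.eye(2)))  # True
# Each eigenpair satisfies matrix @ v = lambda * v.
print(np.allclose(np.dot(matrix, eigenvectors), eigenvectors * eigenvalues))  # True
```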
Let me give you an example. Linear regression is one of the most commonly used examples to explain machine learning. It says that, given the data [X, Y], we can express the relationship between X and Y as ‘A’ in the equation below:
While ML algorithms use statistical techniques to give more precise results, we can simply compute ‘A’ by
Functions like linalg.inv() come in handy here. Beyond this, np.linalg.eig(), which computes eigenvalues and eigenvectors, is repeatedly used by algorithms that perform principal component analysis (PCA) or in support vector machines (SVM).
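The equations referred to above are not reproduced in this text. As an assumption, the sketch below takes the model to be Y = XA and computes A with the standard least-squares normal equation, A = (X^T X)^-1 X^T Y, which is one common way to use linalg.inv() for this purpose. The data is made up so that the expected answer (intercept 1, slope 2) is known in advance.

```python
import numpy as np

# Hypothetical data that follows y = 1 + 2x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 1.0 + 2.0 * x

# Design matrix: a column of ones for the intercept, then the x values.
X = np.column_stack((np.ones_like(x), x))

# Assumed normal equation: A = (X^T X)^-1 X^T Y
A = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), Y)
print(np.round(A, 6))  # approximately [1. 2.]

# np.linalg.lstsq solves the same least-squares problem without forming an explicit inverse.
A_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
```

In practice, np.linalg.lstsq is usually preferred over an explicit inverse for numerical stability; the inverse form is shown only because it mirrors the formula.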
Apply User Defined Functions: We can also apply our own functions to arrays, for example applying the function x/(x+1) to each element of an array. One way to do that is by looping through the array, which is the non-NumPy way.
We would rather write vectorised code. The simplest way to do that is to vectorise the function you want and then apply it to the array. NumPy provides the np.vectorize() function to vectorise functions.
Let’s look at how we do it.
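A minimal sketch of both approaches, using the x/(x+1) function from the text. The function name ratio and the sample array are hypothetical.

```python
import numpy as np

def ratio(x):
    """Return x / (x + 1) for a single number."""
    return x / (x + 1)

array = np.array([1.0, 2.0, 3.0, 4.0])

# Non-NumPy way: loop over the elements one by one.
looped = np.array([ratio(value) for value in array])

# Vectorised way: wrap the function once, then call it on the whole array.
vectorised_ratio = np.vectorize(ratio)
vectorised = vectorised_ratio(array)

# For simple arithmetic like this, the expression also works on the array directly.
direct = array / (array + 1)

print(vectorised)  # approximately [0.5, 0.667, 0.75, 0.8]
```

Note that np.vectorize() is mainly a convenience: NumPy's documentation describes it as essentially a loop, so it tidies the code rather than speeding it up. When the function consists only of array-friendly arithmetic, the direct expression is usually the faster choice.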
These kinds of functions come in handy when we need to apply a new, custom calculation to an array.
NumPy also provides its own universal functions, which operate on arrays element by element, along with built-in aggregation functions.
Among these, functions like sum(), std() and np.count_nonzero() are used repeatedly while pre-processing data in data science projects. (NumPy arrays do not have a count() method.)
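A brief sketch of such built-in functions on a small hypothetical array. np.count_nonzero() is used here for counting, on the assumption that counting the elements that match a condition is what is wanted.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

total = data.sum()                        # sum of all elements
column_totals = data.sum(axis=0)          # sum down each column
row_means = data.mean(axis=1)             # mean across each row
spread = data.std()                       # population standard deviation
count_gt_3 = np.count_nonzero(data > 3)   # how many elements exceed 3
roots = np.sqrt(data)                     # universal function, applied element-wise

print(total)          # 21
print(column_totals)  # [5 7 9]
print(row_means)      # [2. 5.]
print(count_gt_3)     # 3
```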
Summary: In this part, we saw simple ways to
- Slice parts of arrays
- Perform simple to complex mathematical operations using built-in functions.
You are doing well. We are almost done.
In the next part (Part 3), we have an interesting topic specific to NumPy: broadcasting. For people who work a lot with arrays and matrices, broadcasting is a gift. As a finishing touch, we will run a speed test of lists vs. NumPy arrays.