This is the first part of a three-part series of NumPy tutorials.
|PART 1:|1. What is a NumPy array?|
||2. How to create and inspect NumPy arrays?|
|PART 2:|3. Array indexing|
||4. Array manipulations and operations|
|PART 3:|5. What is broadcasting?|
||6. Speed test: lists vs. NumPy arrays|
Here we start with the very basics: what a NumPy array is. By the end of this part, we will have an idea about:
- What are the different ways of creating a NumPy array?
- How to inspect an array object?
- Why do we inspect an array?
What is a NumPy array?
The most basic object in NumPy is the ndarray, or simply an array. “ndarray” means n-dimensional array. It is a homogeneous array, which means all the elements of the array have the same data type. Typically, the data type will be numeric (float or integer).
Before learning how to create a NumPy array, let us see what an array looks like. The most common arrays are one-dimensional (1-D) or two-dimensional (2-D). One-dimensional (1-D) arrays are nothing but vectors. Two-dimensional (2-D) arrays are what linear algebra calls matrices; arrays with three or more dimensions are usually called tensors or simply n-dimensional arrays.
A typical array looks like this. In NumPy terminology, for 2-D arrays:
- axis = 0 refers to the rows
- axis = 1 refers to the columns
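To make the two axes concrete, here is a small sketch. The array values are made up for illustration, and summing along an axis is only one convenient way to see what each axis means.

```python
import numpy as np

# A small 2-D array with 2 rows and 3 columns (illustrative values only)
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# axis=0 runs down the rows, so summing along it gives one result per column
col_sums = arr.sum(axis=0)   # array([5, 7, 9])

# axis=1 runs across the columns, so summing along it gives one result per row
row_sums = arr.sum(axis=1)   # array([ 6, 15])

print(arr.shape)   # (2, 3): 2 rows, 3 columns
```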
Let’s start by importing NumPy into our Jupyter notebooks.
import numpy as np
np is just an alias. One can use any other alias, but np is the standard convention.
Creating NumPy array:
One can create NumPy arrays in multiple ways. A NumPy array can be created:
- From other Python data structures like lists and tuples
- Using built-in functions
- Simply by giving the values directly
There are several built-in functions available to create NumPy arrays and make things easy for us. Let us have a look at them one by one.
Creating NumPy arrays from Lists and Tuples:
The most frequently used syntax for creating an array is np.array.
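A minimal sketch of such a conversion is shown here. The variable names and values are hypothetical and chosen only for illustration.

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]     # hypothetical sample values
my_tuple = (10, 20, 30)

arr_from_list = np.array(my_list)     # convert a Python list
arr_from_tuple = np.array(my_tuple)   # convert a Python tuple

print(arr_from_list)          # [1 2 3 4 5]
print(arr_from_tuple)         # [10 20 30]
print(type(arr_from_list))    # <class 'numpy.ndarray'>
print(arr_from_list.ndim)     # 1
```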
In the above example, we simply converted a Python list or tuple to a NumPy array object, and the result is a one-dimensional (1-D) array. Similarly, to create a 2-D array from a list or tuple, we need a list of lists, a tuple of tuples, or a tuple of lists.
Only then does NumPy understand that we need a 2-D array. Can we give a list of tuples as input? Please do check that.
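The nested inputs described above can be tried out as follows. This is a sketch with made-up values; the last line covers the list-of-tuples question.

```python
import numpy as np

rows_as_lists = [[1, 2, 3], [4, 5, 6]]      # list of lists
rows_as_tuples = ((1, 2, 3), (4, 5, 6))     # tuple of tuples
tuple_of_lists = ([1, 2, 3], [4, 5, 6])     # tuple of lists
list_of_tuples = [(1, 2, 3), (4, 5, 6)]     # list of tuples

a1 = np.array(rows_as_lists)
a2 = np.array(rows_as_tuples)
a3 = np.array(tuple_of_lists)
a4 = np.array(list_of_tuples)

# Every variant produces the same 2-D array with 2 rows and 3 columns
for arr in (a1, a2, a3, a4):
    print(arr.shape, arr.ndim)   # (2, 3) 2
```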
Or the values can simply be passed to np.array() directly, without storing them in a variable first, for both 1-D and 2-D arrays.
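Passing the values directly might look like this (a sketch; the numbers are arbitrary):

```python
import numpy as np

one_d = np.array([1, 2, 3, 4])               # 1-D: a single flat sequence
two_d = np.array([[1, 2, 3], [4, 5, 6]])     # 2-D: a sequence of equal-length rows

print(one_d.shape)   # (4,)
print(two_d.shape)   # (2, 3)
```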
Creating NumPy arrays using built-in functions: Ex: np.ones() and np.zeros()
There are many built-in functions available for creating NumPy arrays. The most common built-in functions used to initialize arrays are listed here. These functions require the shape of the array to be known in advance.
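A sketch of these initializations, using the names ’a’, ’b’ and ’c’ from the discussion that follows. The (5, 3) shape for ’b’ and ’c’ is an assumption made for illustration.

```python
import numpy as np

a = np.ones((5, 3))               # 5 rows, 3 columns of ones; float by default
b = np.ones((5, 3), dtype=int)    # same shape, but with integer ones
c = np.zeros((5, 3))              # 5 rows, 3 columns of zeros; float by default

print(a.shape)   # (5, 3)
print(a.dtype)   # float64
print(b.dtype)   # an integer type; the exact width depends on the platform
print(c)
```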
Try the above calls and see what each one returns. In the initialization of ’a’, the tuple (5,3) passed as a parameter says that we want to generate a matrix of all ones with 5 rows and 3 columns.
By default, np.ones() and np.zeros() return arrays with a float data type; while initializing ’b’, we explicitly passed the data type ‘int’ as a parameter. np.zeros(), used to initialize ’c’, is a similar function which gives a matrix of all zeros.
Creating NumPy arrays using built-in functions: np.arange()
np.arange() is similar to the Python built-in range() function. Let us create a NumPy array using arange(). arange(start, stop, step) typically takes 3 parameters, similar to range(). The third parameter, step, is optional; if we do not specify a step size, NumPy uses step = 1. In the present example we specified step = 5, so arange() generated numbers from 10 up to (but not including) 100 with a difference of 5.
Along with arange(), an additional function, reshape(), is used here. As the name suggests, reshape() changes the dimensions of an existing array. reshape() is an array manipulation function.
Note 1: Observe carefully that the array holds values from 10 to 95 only. As with Python's range(), the stop value (100) is excluded.
Note 2: reshape() can only convert to dimensions whose product equals the total number of elements. In the example above, ‘numbers’ contained 18 elements and hence can be reshaped to a 3 x 6 matrix, whose product is 18. Can we reshape ‘numbers’ into any other dimensions?
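The arange() and reshape() calls discussed above can be sketched as follows. The name ‘numbers’ comes from the text; the other names and the alternative shapes are only illustrative.

```python
import numpy as np

numbers = np.arange(10, 100, 5)   # start=10, stop=100 (excluded), step=5
print(numbers)        # [10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95]
print(numbers.size)   # 18

matrix = numbers.reshape(3, 6)    # 3 * 6 == 18, so this works
print(matrix.shape)   # (3, 6)

# Other shapes work too, as long as the dimensions multiply to 18
wide = numbers.reshape(2, 9)
cube = numbers.reshape(2, 3, 3)

# A shape whose product is not 18 raises a ValueError
try:
    numbers.reshape(4, 5)
except ValueError as err:
    print("reshape failed:", err)
```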
Creating NumPy arrays using built-in functions: np.linspace()
The linspace() function returns numbers evenly spaced over a specified interval. Say we want 15 evenly spaced points from 1 to 3; we can easily use:
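A minimal sketch of that call (the variable name ‘points’ is an assumption):

```python
import numpy as np

points = np.linspace(1, 3, 15)   # 15 evenly spaced values from 1 to 3, both ends included

print(points.shape)              # (15,)
print(points[0], points[-1])     # 1.0 3.0
print(points)
```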
This gives us a one-dimensional vector. Unlike the arange() function, which takes the step size as its third argument, linspace() takes the number of data points to be created as its third argument. Also unlike arange(), linspace() includes the end value by default.
Creating identity matrices using built-in functions: np.eye():
In linear algebra, the identity matrix and its properties are widely used. It is a square matrix with diagonal elements equal to one and all other elements equal to zero. np.eye() usually takes a single argument, the number of rows (and columns). Here’s how we create one.
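For example, a 3 x 3 identity matrix could be created like this (the size 3 and the matrix ‘m’ are arbitrary choices for illustration):

```python
import numpy as np

identity = np.eye(3)   # 3 x 3 identity matrix, float by default
print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Multiplying a compatible matrix by the identity leaves it unchanged
m = np.arange(9).reshape(3, 3)
print(np.array_equal(m @ identity, m))   # True
```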
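The identity matrix is used throughout linear algebra. It plays the role that the number 1 plays in ordinary multiplication: multiplying a matrix by a compatible identity matrix (np.eye()) returns the same matrix. Note that this does not change the shape of a matrix, so it cannot by itself make two dimensionally incompatible arrays multipliable.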
Creating random number arrays using built-in functions:
The random number generator is a separate module in NumPy. We have to go through np.random before asking for a particular type of random number to be generated. The syntax for different types of random numbers would be np.random.rand() or np.random.randn() or np.random.randint(). Each function draws random numbers from a particular distribution. For example, rand() draws from a uniform distribution over [0, 1), and randn() draws normally distributed random numbers with mean = 0 and standard deviation (std) = 1. In our example, randint() fills the matrix with integers from 1 to 9 drawn from a uniform distribution.
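A sketch of the three functions is given here. The array shapes are assumptions; note that the upper bound passed to randint() is excluded, so 10 is used to get integers from 1 to 9.

```python
import numpy as np

uniform = np.random.rand(2, 3)      # floats drawn uniformly from [0, 1)
normal = np.random.randn(2, 3)      # floats from a normal distribution, mean 0 and std 1
integers = np.random.randint(1, 10, size=(3, 3))   # integers from 1 to 9 (10 is excluded)

print(uniform)
print(normal)
print(integers)   # the values differ on every run
```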
Apart from the methods mentioned above, there are a few more NumPy functions that you can use to create special NumPy arrays:
- np.full(): Creates an array of a given shape filled with a constant value ‘n’
- np.tile(): Creates a new array by repeating an existing array a given number of times
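Both functions can be sketched briefly. The shapes, the fill value 7 and the array ‘base’ are made up for illustration.

```python
import numpy as np

sevens = np.full((2, 3), 7)        # 2 x 3 array where every element is 7
print(sevens)
# [[7 7 7]
#  [7 7 7]]

base = np.array([1, 2, 3])
tiled = np.tile(base, 2)           # repeat the array twice along its only axis
print(tiled)                       # [1 2 3 1 2 3]

stacked = np.tile(base, (2, 1))    # repeat it as 2 rows
print(stacked.shape)               # (2, 3)
```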
Inspecting the structure and content of arrays:
Typically, any real-world problem to which we apply ML algorithms will have thousands to hundreds of thousands of rows and hundreds of columns. So, it’s helpful to inspect the structure of arrays. We cannot make any sense of the data merely by printing it, and printing is time-consuming too. There are a few built-in attributes to quickly inspect arrays. Let’s say you are working with a moderately large array of size 1000 x 300.
Here, we cannot make sense of the data merely by displaying 1000 x 300 random numbers. Instead, we can use simple array attributes like:
- shape: Gives the size of the array along each dimension, i.e. (rows, columns) for a 2-D array.
- dtype: To get the data type of the array. (Remember, we discussed that an array will have the same data type for all its elements.)
- ndim: To get the number of dimensions of the array.
- itemsize: To get the size of a single array element in bytes.
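Applied to a 1000 x 300 array of random numbers, the inspection could look like this. The name ‘data’ is an assumption, and nbytes is an extra attribute not listed above, shown only because it gives the total memory used by the elements.

```python
import numpy as np

data = np.random.rand(1000, 300)   # moderately large array of random floats

print(data.shape)      # (1000, 300)
print(data.dtype)      # float64
print(data.ndim)       # 2
print(data.itemsize)   # 8 (bytes per element for float64)
print(data.nbytes)     # 2400000 (1000 * 300 * 8 bytes in total)
```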
These attributes are the basis for the inspection functions we use in the Pandas library, which we will get to soon.
While pre-processing data in data science projects, inspecting the data after every transformation becomes part of the process. Let me elaborate. We generally get unclean data, which means some column values are missing or duplicated. Every time we delete a duplicate row or fill an empty column, we inspect the array object as part of pre-processing.
Summary: In this part, we learnt several ways of creating an array object and how to inspect one. Knowing these ways of creating arrays will make your work simpler and faster.
Next, Part-2 of the series talks about “Array Indexing”, “Array Manipulations and Operations”.