NumPy for Beginners

Mo
Sep 10, 2020

In this tutorial, you will learn the basics and various functions of NumPy. A basic understanding of Python, or of any other programming language, is recommended.

NumPy is a Python package whose name stands for Numerical Python. It is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. These data structures are efficient for working with large arrays.

To use a library in a notebook, you simply import it with the import keyword. NumPy is no different. Here is how we import NumPy in Python:

import numpy as np

Creating Ndarray Object

First, let’s take a look at some examples of how to create arrays in NumPy. In the example below, we create a one-dimensional NumPy array a that has 3 values and a regular Python list b with the same values, followed by a two-dimensional NumPy array A and a nested Python list B.
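A minimal sketch of this step, assuming the names a, b, A and B and the values that appear in the printed output:

```python
import numpy as np

# 1D NumPy array and the equivalent plain Python list
a = np.array([1, 2, 3])
b = [1, 2, 3]

# 2D NumPy array and the equivalent nested Python list
A = np.array([[1, 2], [3, 4]])
B = [[1, 2], [3, 4]]

print("a = ", a)
print("b =", b)
print("Data type of A:", type(A))
print("Data type of B:", type(B))
```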

The code block below shows the output of the code above.

a =  [1 2 3]
b = [1, 2, 3]
A = [[1 2] [3 4]]
B = [[1, 2], [3, 4]]
Data type of A: <class 'numpy.ndarray'>
Data type of B: <class 'list'>

As you see, it is similar to how we create a list in Python.

To create a multi-dimensional array, write it the same way as a nested Python list and pass it to the np.array() function.

NumPy uses an N-dimensional array type called ndarray. Every element in an ndarray has the same data type, which is described by a data-type object called dtype.

NumPy supports a much greater variety of numerical types than Python does. NumPy numerical types are instances of dtype objects, each having unique characteristics. The dtypes are available as np.int64, np.float64, np.bool_, etc. (The older aliases np.int and np.float were removed in NumPy 1.24.)

Converting List into NumPy Array

To convert a Python list into a NumPy array, use np.asarray(). The input can be lists, lists of tuples, tuples or tuples of tuples, etc.
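As a small sketch of the conversion (the variable names here are illustrative, not taken from the original notebook):

```python
import numpy as np

B = [[1, 2], [3, 4]]               # nested Python list
A = np.asarray(B)                  # converted to a 2D ndarray
T = np.asarray(((1, 2), (3, 4)))   # a tuple of tuples converts the same way

print(type(B), ": ", B)
print(type(A), ":", A)
```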

<class 'list'> :  [[1, 2], [3, 4]]
<class 'numpy.ndarray'> : [[1 2] [3 4]]

NumPy Data Types

In Python, a list can hold mixed data types without any declaration. In NumPy, however, all elements of an array share one data type. If you do not declare the data type you want, NumPy automatically converts every value to a single common type that can represent all of them.

As you see in the example below, A gives you all string values, while B is a list that contains a string, an integer (int) and a float value.
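A sketch of that comparison, assuming the same three values as in the output; the extra array M is a hypothetical addition that shows the same conversion for numbers only:

```python
import numpy as np

A = np.array(['hello', 12, 4.0])   # NumPy picks one common type: strings
B = ['hello', 12, 4.0]             # a Python list keeps each value's own type

print("A = ", A)
print("B =", B)
print(A.dtype)                     # a Unicode string dtype

# With only an int and a float, the common type is float
M = np.array([1, 2.5])
print(M.dtype)
```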

A =  ['hello' '12' '4.0']
B = ['hello', 12, 4.0]

Let’s take a look at how you can declare a data type object in NumPy. Remember that the dtypes are available as np.int64, np.float64, np.str_, etc.

Here is an example of how we create data type objects for integer and float.
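One way this could look, assuming 64-bit types as in the output (the names int_type and float_type are illustrative):

```python
import numpy as np

# Data type objects for 64-bit integers and floats
int_type = np.dtype(np.int64)
float_type = np.dtype(np.float64)
print(int_type)
print(float_type)

# The same values stored with each dtype
A = np.array([12, 5, 9], dtype=int_type)
B = np.array([12, 5, 9], dtype=float_type)
print("A =", A)
print("B =", B)
```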

int64
float64
A = [12 5 9]
B = [12. 5. 9.]

A dtype also allows you to define a structured data type that is applied to an ndarray object. If you want to know more about structured dtypes, visit this tutorial: https://www.tutorialspoint.com/numpy/numpy_data_types.htm .

Ndarray Attributes

We have learned about dtypes. Now, let’s look at the various array attributes of NumPy.

ndarray.shape returns a tuple consisting of the array dimensions. As we see in the figure below, we have 1D, 2D and 3D arrays. For the 1D array, the shape is (4,). We call this type of array a rank-one array; we’ll see more about this later in the tutorial. For the 2D array, the shape is (2, 3), meaning that the array has 2 rows and 3 columns. For the 3D array, the shape is (4, 3, 2), meaning that its three axes have lengths 4, 3 and 2; NumPy prints such an array as 4 blocks of 3 rows and 2 columns.

In the example below, A.shape returns (2, 3) as A is a 2D array with 2 rows and 3 columns, like in the figure above. Another cool feature of ndarray.shape is that it can be used to reshape the array in place. In the example, we change the dimensions of A from (2, 3) to (3, 2), and then to (6, 1). When you print A after the first change, you’ll see that A has 3 rows and 2 columns with the same values.
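A sketch of this in-place reshaping, assuming the values shown in the output. Assigning to shape is the approach the text describes; reshape(), covered next, is the more commonly recommended route.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
print("The dimension of A is", A.shape)

# Assigning a new tuple to .shape rearranges the same 6 values in place
A.shape = (3, 2)
print("After reshaping, the shape of A is", A.shape)
print("A =", A)

A.shape = (6, 1)
print("After reshaping into 6x1, the shape of A is", A.shape)
```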

A =  [[1 2 3]  [4 5 6]]
The dimension of A is (2, 3)
After reshaping, the shape of A is (3, 2)
A = [[1 2] [3 4] [5 6]]
After reshaping into 6x1, the shape of A is (6, 1)
A = [[1] [2] [3] [4] [5] [6]]

We can also use ndarray.reshape() to change the dimensions of the array. When reshaping, we need to be careful about which dimensions we convert to. For example, we can reshape a 3x2 array into 6x1 or 2x3 but not into 4x2. The reason is that a 3x2 array has a total of 6 elements, and when we convert it to either 6x1 or 2x3 the number of elements remains the same (3 * 2 = 6, 6 * 1 = 6, and 2 * 3 = 6). In summary, we can only reshape ndarrays into dimensions that give the same number of elements.

See some examples below.
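A sketch using a hypothetical 4x3 array B filled with the numbers 0 to 11 (the original values are not shown, only the shapes):

```python
import numpy as np

B = np.arange(12).reshape(4, 3)    # 12 elements arranged as 4x3
print("Before reshaping, the shape of B is ", B.shape)

B1 = B.reshape(12, 1)              # 12 * 1 = 12 elements
B2 = B.reshape(3, 4)               # 3 * 4 = 12 elements
B3 = B.reshape(6, 2)               # 6 * 2 = 12 elements
print(B1.shape, B2.shape, B3.shape)

# A shape with a different element count is rejected
try:
    B.reshape(5, 2)                # 5 * 2 = 10, not 12
except ValueError as err:
    print("Cannot reshape:", err)
```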

Before reshaping, the shape of B is  (4, 3)
After reshaping into 12x1, the shape of B is (12, 1)
After reshaping into 3x4, the shape of B is (3, 4)
After reshaping into 6x2, the shape of B is (6, 2)

Creating an Empty Ndarray

Unlike a Python list, an ndarray has no append() method for growing it in place. One way to work around this is to create an empty array up front and fill it in later. An empty array is an uninitialized array of a specified shape and dtype. The following code shows how to create an empty array.
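A minimal sketch, assuming a 3x2 integer array like the one in the output (the name E is illustrative):

```python
import numpy as np

# Allocate a 3x2 integer array without initializing its contents
E = np.empty((3, 2), dtype=int)
print(E)   # whatever happened to be in memory; values differ between runs
```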

[[       27663120 140561394696192]
[ 0 0]
[ 0 0]]

Note that when you print the empty array, its elements show arbitrary values since they are not initialized. Another way to initialize an array is to use np.zeros or np.ones, which create a new array of a specified size filled with zeros or ones. See the next section for details.

Ndarray of All Zeros and Ones

Next, we will learn how to create a new array of a specified size, filled with zeros or ones. np.zeros() and np.ones() return arrays with all zeros and all ones.

See the example below for how we can create arrays of five zeros and five ones.
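A sketch that matches the names in the output, assuming a2 was created with an integer dtype:

```python
import numpy as np

a1 = np.zeros(5)              # floats by default
a2 = np.zeros(5, dtype=int)   # integer zeros
a3 = np.ones(5)               # floats by default

print("a1 = ", a1)
print("a2 =", a2)
print("a3 =", a3)
```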

a1 =  [0. 0. 0. 0. 0.]
a2 = [0 0 0 0 0]
a3 = [1. 1. 1. 1. 1.]

The arrays a1, a2 and a3 you see in the example above are called rank-one arrays. They are 1D arrays and, as mentioned before, a 1D array has a shape of (n,), where n is any number.

See the examples below for the difference between np.zeros(3), np.zeros((3,)), np.zeros((1, 3)) and np.zeros((3, 1)).
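A sketch of the four calls, using the names from the output. Note that a 2D shape has to be passed as a tuple:

```python
import numpy as np

a1 = np.zeros(3)         # rank-one array
a2 = np.zeros((3,))      # same thing, shape given as a one-element tuple
b1 = np.zeros((1, 3))    # 2D: one row, three columns
b2 = np.zeros((3, 1))    # 2D: three rows, one column

for name, arr in [("a1", a1), ("a2", a2), ("b1", b1), ("b2", b2)]:
    print(name, "=", arr)
    print("shape of", name, "is", arr.shape)
```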

a1 =  [0. 0. 0.]
shape of a1 is (3,)
a2 = [0. 0. 0.]
shape of a2 is (3,)
b1 = [[0. 0. 0.]]
shape of b1 is (1, 3)
b2 = [[0.]
[0.]
[0.]]
shape of b2 is (3, 1)

a1 and a2 give you the same one-dimensional array of 3 zeros. As you see, the shape of a1 and a2 is (3,), meaning the array has 3 elements and only one dimension.

b1 and b2 meanwhile return the shape of (1, 3) and (3,1).

So, what is the difference between a1 and b1?

To answer this, let’s take a look at some definitions of matrices. A matrix with only one row is called a row vector, and a matrix with only one column is called a column vector, but there is no distinction between rows and columns in a one-dimensional ndarray.

Only a two-dimensional array is used to clearly indicate that rows or columns are present. Since array b1 and b2 are two-dimensional(2D) arrays, we can say that b1 is a row vector and b2 is a column vector, but not a1 and a2.

If it is confusing for you, don’t worry about it. Just remember these two: np.zeros((1, n)) (n is any number) for a row vector and np.zeros((n, 1)) for a column vector, and you can forget about rank-one arrays.

A =  [[0. 0.]
[0. 0.]
[0. 0.]]
B = [[1. 1.]
[1. 1.]]

The example above shows how to create a multi-dimensional array of zeros and ones.
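A sketch of calls that would give arrays of those shapes, assuming the names A and B from the output:

```python
import numpy as np

A = np.zeros((3, 2))   # 3 rows, 2 columns of zeros
B = np.ones((2, 2))    # 2 rows, 2 columns of ones

print("A = ", A)
print("B =", B)
```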

Indexing and Slicing

An ndarray object can be accessed and modified by indexing or slicing, just like Python’s built-in container objects. See the examples below for how we can slice one-dimensional and multi-dimensional ndarrays.

Note that, as with Python lists, we can use a negative index to count back from the last element in the array.
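A sketch of the one-dimensional case, assuming A holds the numbers 0 to 9 as in the output:

```python
import numpy as np

A = np.arange(10)   # [0 1 2 ... 9]

print("A[0] =", A[0], ", A[1] =", A[1])
print("last element in A:", A[-1])        # negative index counts from the end
print("last 3 elements in A:", A[-3:])
print("A[2:] =", A[2:])                   # from index 2 to the end
print("A[2:5] =", A[2:5])                 # indices 2, 3 and 4 (5 is excluded)
```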

A =  [0 1 2 3 4 5 6 7 8 9]
A[0] = 0 , A[1] = 1
last element in A: 9
last 3 elements in A: [7 8 9]
A[2:] = [2 3 4 5 6 7 8 9]
A[2:5] = [2 3 4]

See the examples below for slicing a 2D array row-wise.
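A sketch of the row-wise slices, assuming the 3x3 array from the output:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(A[1:])      # rows 1 and 2
print(A[1:, :])   # same rows, with the column slice written out
print(A[:2])      # rows 0 and 1
print(A[:2, :])   # same rows, with the column slice written out
print(A[1:2])     # only row 1, still a 2D array of shape (1, 3)
```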

A = 
[[1 2 3]
[4 5 6]
[7 8 9]]
A[1:] =
[[4 5 6]
[7 8 9]]
A[1:,:] =
[[4 5 6]
[7 8 9]]
A[:2] =
[[1 2 3]
[4 5 6]]
A[:2,:] =
[[1 2 3]
[4 5 6]]
A[1:2] = [[4 5 6]]
A[1:2,:] = [[4 5 6]]

Now, see the following examples of how to slice a 2D array column-wise.
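A sketch of the column-wise slices. The name a_corner comes from the output; the slice used for it here, A[:2, :2], is an assumption that reproduces the values shown:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

first_col = A[:, 0]    # every row, column 0 -> 1D array
rest = A[:, 1:]        # every row, columns 1 and 2
a_corner = A[:2, :2]   # top-left 2x2 block

print("A[:,0] =", first_col)
print("A[:,1:] =")
print(rest)
print("a_corner =")
print(a_corner)
```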

A = 
[[1 2 3]
[4 5 6]
[7 8 9]]
A[:,0] = [1 4 7]
A[:,1:] =
[[2 3]
[5 6]
[8 9]]
a_corner =
[[1 2]
[4 5]]

Creating an Array Copy

The idea of using copy() is to clone a NumPy array. In NumPy, a plain assignment such as C = A does not create a copy. We’ll see why in the following example.

It seems alright when we print C. It has the same elements as A. If you know C++, you know how a reference variable works, and the same thing happens here when we assign C = A.

Now, let’s change some values in C to zeros.

A =  [[1 2 3]
[4 5 6]]
C = [[1 2 3]
[4 5 6]]
After making changes to C
C = [[0 2 3]
[4 5 0]]
A = [[0 2 3]
[4 5 0]]

As we see in the example above, the values of A have changed!

This is because C is acting as a reference to A. C doesn’t have its own array values in memory; it points to where A is, so when we change the values of C, the values of A also change.
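A sketch of this behavior, assuming the values from the output. The `is` check is an addition that confirms both names refer to one object:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
C = A                 # no copy: C is just another name for the same array

C[0, 0] = 0
C[1, 2] = 0

print("C =", C)
print("A =", A)       # A shows the same changes
print(C is A)         # both names refer to one array object
```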

To solve this, we can use copy(). This actually creates a new array in memory and copies the whole of array A into it.

In the example below, we see that changing the values of B doesn’t change the values of A.
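A sketch with copy(), assuming the values from the output. np.shares_memory is used here only as an extra check:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = A.copy()          # B gets its own data in memory

B[0, 0] = 0
B[1, 2] = 0

print("B =", B)
print("A =", A)                    # A is unchanged
print(np.shares_memory(A, B))      # the two arrays do not overlap in memory
```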

A =  [[1 2 3]
[4 5 6]]
B = [[1 2 3]
[4 5 6]]
After making changes to B
B = [[0 2 3]
[4 5 0]]
A = [[1 2 3]
[4 5 6]]

Iterating Over Array

Unlike Python lists, NumPy has an iterator object, numpy.nditer, which is an efficient multidimensional iterator for looping over an array.
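A sketch of both styles of iteration, assuming the 2x3 values from the output and leaving the timing code out (the names A_list and values are illustrative):

```python
import numpy as np

# Plain Python: nested for loops over a nested list
A_list = [[1, 2, 3], [4, 5, 6]]
for row in A_list:
    for value in row:
        print(value)

# NumPy: nditer visits every element without nested loops
A = np.array(A_list)
for x in np.nditer(A):
    print(x)

# Collect the visited elements to check the order
values = [int(x) for x in np.nditer(A)]
```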

# for Python array and using for loop
A = [[1, 2, 3], [4, 5, 6]]
shape of A is ( 2 , 3 )
1
2
3
4
5
6
Time taken to loop all the elements: 0.759124755859375 ms
# for NumPy array and using nditer
A = [[1 2 3]
[4 5 6]]
1
2
3
4
5
6
Time taken to loop all the elements: 0.2589225769042969 ms

As we see in the examples above, in this run iterating over the Python list with a for loop took more time than iterating over the same number of elements with the NumPy iterator object. The difference is only about 0.5 ms here, but when we are dealing with very large arrays, the time difference becomes large.

We will see more examples of how we can avoid using for loop and use vectorization (in NumPy) to speed up our code.

Ndarray Operations and Vectorization

Here we will see how vectorization works. This is the most exciting part of the tutorial since it shows why we prefer using NumPy arrays over Python lists.

In NumPy, arithmetic operations such as addition, subtraction and multiplication on arrays are done on corresponding elements. Take a look at the following example. Normally, we would need to loop over the elements of the arrays and multiply them one by one. NumPy arrays speed this up with vectorization, which applies the operation to all corresponding elements in a single step.
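A sketch of element-wise multiplication, assuming the values from the output:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# One expression multiplies each pair of corresponding elements
c = a * b

print("a = ", a)
print("b =", b)
print("c =", c)
```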

a =  [1 2 3 4]
b = [10 20 30 40]
c = [ 10 40 90 160]

Note that NumPy arrays allow you to perform operations on whole arrays in one line of code without needing a for loop. Still, some of you may ask why we need NumPy at all and why we can’t just use a for loop instead.

We saw the difference in computation time between using a for loop and NumPy’s iterator object. The same reasoning applies here: performing the operation on each element inside a Python for loop is much slower. Vectorization applies the operation to all the corresponding elements in a single call to compiled code, hence it saves a lot of time.

Say we want to compute addition of arrays a and b and save the result in c. Let’s take a look at how we normally do the operation using for loop vs. how we can speed up the code with vectorization.

We can declare an array with random values in NumPy with one line of code by using np.random.randn(). Let’s create 2 arrays (a and b) of size 1000000 initialized with random values, and an array c of the same size with all zeros.
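A sketch of the comparison. The timing approach with time.perf_counter and the name c_vec are assumptions, and the measured numbers will vary from machine to machine:

```python
import time
import numpy as np

n = 1_000_000
a = np.random.randn(n)
b = np.random.randn(n)
c = np.zeros(n)

# Element-by-element addition in a Python for loop
start = time.perf_counter()
for i in range(n):
    c[i] = a[i] + b[i]
loop_ms = (time.perf_counter() - start) * 1000

# The same addition as a single vectorized expression
start = time.perf_counter()
c_vec = a + b
vec_ms = (time.perf_counter() - start) * 1000

print("Time taken with for loop:", loop_ms, "ms")
print("Time taken with vectorization:", vec_ms, "ms")
```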

Time taken with for loop: 539.0114784240723 ms
Time taken with vectorization: 3.8535594940185547 ms

Notice that the for loop is more than 100 times slower than the vectorized code. NumPy also simplifies the code so that it requires only one line. But, of course, the shapes of the arrays have to be compatible to do any math operation on them.

We can also use NumPy’s math functions such as np.multiply(), np.dot(), np.sin(), etc.
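A sketch of these functions. The input for np.sin is assumed to be random values between 0 and 1, which is consistent with the output shown but not confirmed by the text; x and y are small illustrative arrays:

```python
import numpy as np

a = np.random.rand(1_000_000)   # assumed input: uniform values in [0, 1)
s = np.sin(a)                   # sine of every element in one call
print("sin(a) = ", s[:10], "...")

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(np.multiply(x, y))        # element-wise product, same as x * y
print(np.dot(x, y))             # dot product: 1*4 + 2*5 + 3*6
```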

sin(a) =  [0.04988483 0.80621872 0.56432645 0.0483301  0.67950184 0.60500273
0.2333741 0.8102268 0.12549575 0.41646099] ...

Instead of using a for loop to apply the sin() function to each element in the array, vectorization lets us apply it to the whole array at once, which speeds up the operation.

Broadcasting

Broadcasting is an interesting technique in NumPy. The term broadcasting refers to the ability of NumPy to handle arrays of different shapes during arithmetic operations. You saw how vectorization works in the previous examples. If we have two arrays of the same shape, then we can perform operations on them without needing a for loop.

But if the shapes of two arrays are different, a plain element-to-element operation is not directly possible. This is where broadcasting comes in handy: operations on arrays of different but compatible shapes are still possible in NumPy because of its broadcasting capability.
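A sketch of the simplest case, assuming the values from the output, where a one-element array is combined with a four-element array:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10])      # shape (1,), stretched to match a's shape (4,)

c = a * b

print("a = ", a)
print("b =", b)
print("c =", c)
```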

a =  [1 2 3 4]
b = [10]
c = [10 20 30 40]

As we see in the example above, a and b have different shapes, but b, which is smaller than a, is broadcast to the same shape as a so that the operation can continue as usual.

Let’s look at another example with multi-dimensional arrays.
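A sketch of the two-dimensional case, assuming the values from the output, where a row of 3 values is added to every row of a 4x3 array:

```python
import numpy as np

A = np.array([[0, 0, 0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])
B = np.array([0, 1, 2])   # shape (3,) matches the last axis of A

# B is treated as if it were repeated for each of the 4 rows of A
C = A + B

print("C =", C)
```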

A =  [[ 0  0  0]
[10 10 10]
[20 20 20]
[30 30 30]]
B = [0 1 2]
C = [[ 0 1 2]
[10 11 12]
[20 21 22]
[30 31 32]]

The smaller array is broadcast to the size of the larger array so that they have compatible shapes. The diagram below explains how broadcasting works.

Feel free to make a copy of the notebook provided below and try out your own code to practice. Notice that the code in the notebook is slightly different since the notebook is intended for use of NumPy in Computer Vision tutorials.

If you want to learn more about NumPy, this tutorial site is a good place to continue: https://www.tutorialspoint.com/numpy/index.htm .

Thank you for reading. I hope you gained some knowledge of NumPy from this tutorial.

If you see any mistakes or you have any questions, feel free to comment below and I’ll try to answer as soon as possible.

Mo

MS. Computer Science. Love to code and share knowledge with others.