# NumPy for Beginners

In this tutorial, you will learn the basics of **NumPy** and several of its most useful functions. A basic understanding of Python, or of any other programming language, is recommended.

**NumPy** is a Python package whose name stands for **Numerical Python**. It is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. These data structures are efficient at storing and operating on large arrays.

To use a library in a notebook, you import it with the **import** keyword, and NumPy is no exception. Here is how we import NumPy in Python:

`import numpy as np`

**Creating Ndarray Object**

First, let’s take a look at some examples of how to create arrays in NumPy. In the example below, we create a one-dimensional NumPy array **a** that has 3 values and a regular Python list **b** with the same values, followed by two-dimensional versions **A** and **B**.
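The original code for this step is not included here, so the following is only a minimal sketch that would produce output of the kind shown below. The variable names follow the printed output; the exact print statements are an assumption.

```python
import numpy as np

# One-dimensional NumPy array and a plain Python list with the same values
a = np.array([1, 2, 3])
b = [1, 2, 3]
print("a =", a)
print("b =", b)

# Two-dimensional versions: an ndarray and a nested Python list
A = np.array([[1, 2], [3, 4]])
B = [[1, 2], [3, 4]]
print("Data type of A:", type(A))
print("Data type of B:", type(B))
```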

The output of this example is shown below.

`a = [1 2 3]`

b = [1, 2, 3]

A = [[1 2] [3 4]]

B = [[1, 2], [3, 4]]

Data type of A: <class 'numpy.ndarray'>

Data type of B: <class 'list'>

As you can see, creating a NumPy array is similar to creating a list in Python.

To create a multi-dimensional array, write it as a nested Python list and pass it to the **np.array()** function.

NumPy uses an N-dimensional array type called ndarray. Every element of an ndarray has the same type, which is described by a data-type object called a dtype.

NumPy supports a much greater variety of numerical types than Python does. Each numerical type is described by a dtype object with its own characteristics. The types are available as np.int64, np.float64, np.bool_, etc. (The older aliases np.int and np.float were removed in NumPy 1.24.)

**Converting List into NumPy Array**

To convert a Python list into a NumPy array, use **np.asarray()**. The input can be a list, a list of tuples, a tuple, a tuple of tuples, etc.
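As a rough sketch of the conversion (the names `nested_list`, `arr` and `arr_from_tuples` are chosen here for illustration and are not from the original code):

```python
import numpy as np

nested_list = [[1, 2], [3, 4]]   # illustrative input
arr = np.asarray(nested_list)    # list of lists -> 2D ndarray

print(type(nested_list), ":", nested_list)
print(type(arr), ":", arr)

# Tuples of tuples are accepted in the same way
arr_from_tuples = np.asarray(((1, 2), (3, 4)))
```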

`<class 'list'> : [[1, 2], [3, 4]]`

<class 'numpy.ndarray'> : [[1 2] [3 4]]

**NumPy Data Types**

In Python, a list can hold mixed data types without any declaration. In **NumPy**, however, every element of an array has the same data type. If you do not specify one, NumPy picks a single type that can represent all the values and converts every element to it.

As you see in the example below, **A** contains only string values, while **B** is a list that still contains a string, an integer (int) and a float.
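A minimal sketch of this behaviour, assuming the same three values as in the output below. The extra array `C` is an addition for illustration: it shows that the common type is not simply the type of the first value.

```python
import numpy as np

A = np.array(['hello', 12, 4.0])   # every element becomes a string
B = ['hello', 12, 4.0]             # a Python list keeps the mixed types
print("A =", A)
print("B =", B)

# The first value is an int, but the array is float64,
# because float64 can represent both values
C = np.array([1, 2.5])
print(C.dtype)
```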

`A = ['hello' '12' '4.0']`

B = ['hello', 12, 4.0]

Let’s take a look at how you can declare a data type object in NumPy. Remember that the dtypes are available as np.int64, np.float64, np.str_, etc.

Here is an example of how we create a data type object for integers and floats.
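The code behind the output below is not shown in the original; one way to get it, assuming the dtype objects are built with `np.dtype` (the names `int_type` and `float_type` are illustrative), might look like this:

```python
import numpy as np

# Data type objects for 64-bit integers and 64-bit floats
int_type = np.dtype(np.int64)
float_type = np.dtype(np.float64)
print(int_type)     # int64
print(float_type)   # float64

# The same values stored with two different dtypes
A = np.array([12, 5, 9], dtype=int_type)
B = np.array([12, 5, 9], dtype=float_type)
print("A =", A)
print("B =", B)
```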

`int64`

float64

A = [12 5 9]

B = [12. 5. 9.]

A dtype also allows you to define a structured data type that is applied to an ndarray object. If you want to know more about structured dtypes, visit the tutorial here: https://www.tutorialspoint.com/numpy/numpy_data_types.htm .

**Ndarray Attributes**

We have learned about dtypes. Now, let’s look at the various array attributes of NumPy.

**ndarray.shape** returns a tuple consisting of the array dimensions. As we see in the figure below, we have 1D, 2D and 3D arrays. For the 1D array the shape is (4,). We call this type of array a **rank-one** array; we’ll see more about this later in the tutorial. For the 2D array, the shape is (2, 3), meaning that the array has **2 rows and 3 columns**. For the 3D array, the shape is (4, 3, 2): the lengths along axis 0, axis 1 and axis 2 are 4, 3 and 2, which the figure draws as **4 rows, 3 columns and 2 layers of depth**. When printed, NumPy displays such an array as 4 blocks of 3 rows and 2 columns.

In the example below, **A.shape** returns **(2, 3)** because **A** is a 2D array with 2 rows and 3 columns, like in the figure above. Another feature of **ndarray.shape** is that assigning a new tuple to it changes the dimensions of the array. In the example, we change the dimensions of **A** from **(2, 3)** to **(3, 2)**. When you print **A** afterwards, you’ll see that it has **3 rows and 2 columns with the same values**. We then change it once more, to **(6, 1)**.
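A minimal sketch of these steps, assuming the array values from the output below. Note that the NumPy documentation discourages assigning to `.shape` and recommends `reshape()` (covered next) for new code.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
print("The dimension of A is", A.shape)

# Assigning a new tuple to .shape changes the dimensions in place
A.shape = (3, 2)
print("After reshaping, the shape of A is", A.shape)
print(A)

A.shape = (6, 1)
print("After reshaping into 6x1, the shape of A is", A.shape)
```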

`A = [[1 2 3] [4 5 6]]`

The dimension of A is (2, 3)

After reshaping, the shape of A is (3, 2)

A = [[1 2] [3 4] [5 6]]

After reshaping into 6x1, the shape of A is (6, 1)

A = [[1] [2] [3] [4] [5] [6]]

We can also use **ndarray.reshape()** to change the dimensions of the array. When reshaping, the total number of elements must stay the same. For example, we can reshape a **3x2** array into **6x1** or **2x3** but **not 4x2**. A **3x2** array has a **total of 6 elements**, and both **6x1** and **2x3** keep that count **(3 * 2 = 6, 6 * 1 = 6, and 2 * 3 = 6)**, whereas 4x2 would need 8. In summary, we can only reshape ndarrays into dimensions that give the same number of elements.

See some examples below.
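The contents of **B** are not shown in the original, so this sketch assumes a 4x3 array built with `np.arange`. It also tries one invalid shape to show what happens when the element counts differ.

```python
import numpy as np

B = np.arange(12).reshape(4, 3)   # assumed contents: 0..11 in 4 rows, 3 columns
print("Before reshaping, the shape of B is", B.shape)

for rows, cols in [(12, 1), (3, 4), (6, 2)]:
    reshaped = B.reshape(rows, cols)
    print(f"After reshaping into {rows}x{cols}, the shape of B is", reshaped.shape)

# 12 elements cannot fill a 5x3 array (15 slots), so NumPy raises ValueError
try:
    B.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```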

`Before reshaping, the shape of B is (4, 3)`

After reshaping into 12x1, the shape of B is (12, 1)

After reshaping into 3x4, the shape of B is (3, 4)

After reshaping into 6x2, the shape of B is (6, 2)

**Creating an Empty Ndarray**

Unlike a Python list, an ndarray has no **append()** method for adding elements in place. One alternative is to create an empty array of the size you need and fill it in afterwards. An empty array is an uninitialized array of specified shape and dtype. The following code shows how to create an empty array.
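A minimal sketch, assuming a 3x2 integer array like the one in the output below (the name `E` is illustrative). The printed values will differ from run to run because the memory is not initialized.

```python
import numpy as np

# Allocate a 3x2 integer array without setting its values
E = np.empty((3, 2), dtype=int)
print(E)
print(E.shape, E.dtype)
```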

`[[ 27663120 140561394696192]`

[ 0 0]

[ 0 0]]

Note that when you print the empty array, its elements show arbitrary values since they are not initialized. Another way to initialize an array is with **np.zeros()** or **np.ones()**, which create a new array of specified size filled with zeros or ones. See the next section for details.

**Ndarray of All Zeros and Ones**

Next, we will learn how to create a new array of specified size, filled with zeros or ones. **np.zeros()** and **np.ones()** return arrays with all zeros and all ones.

See the example below for how we can create arrays of five zeros and five ones.
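The output below suggests one float array of zeros, one integer array of zeros and one float array of ones. A sketch under that assumption:

```python
import numpy as np

a1 = np.zeros(5)              # float64 by default
a2 = np.zeros(5, dtype=int)   # integer zeros
a3 = np.ones(5)               # float64 ones

print("a1 =", a1)
print("a2 =", a2)
print("a3 =", a3)
```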

`a1 = [0. 0. 0. 0. 0.]`

a2 = [0 0 0 0 0]

a3 = [1. 1. 1. 1. 1.]

The arrays **a1**, **a2** and **a3** in the example above are called **rank-one arrays**; they are 1D arrays and, as mentioned before, a 1D array has a shape of (n,), where n is the number of elements.

See the examples below for the difference between **np.zeros(3)**, **np.zeros((3,))**, **np.zeros((1, 3))** and **np.zeros((3, 1))**.
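A sketch of the four calls, inferred from the shapes in the output below. Note that the shape of a 2D array must be passed as a tuple: in `np.zeros(3, 1)` the second argument would be read as the dtype, not as a second dimension.

```python
import numpy as np

a1 = np.zeros(3)        # rank-one array
a2 = np.zeros((3,))     # same thing, shape given as a one-element tuple
b1 = np.zeros((1, 3))   # 2D: 1 row, 3 columns
b2 = np.zeros((3, 1))   # 2D: 3 rows, 1 column

for name, arr in [("a1", a1), ("a2", a2), ("b1", b1), ("b2", b2)]:
    print(name, "=", arr)
    print("shape of", name, "is", arr.shape)
```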

a1 = [0. 0. 0.]

shape of a1 is (3,)

a2 = [0. 0. 0.]

shape of a2 is (3,)

b1 = [[0. 0. 0.]]

shape of b1 is (1, 3)

b2 = [[0.]

[0.]

[0.]]

shape of b2 is (3, 1)

**a1** and **a2** give you the same **one-dimensional** array of **3 zeros**. As you see, the shape of **a1** and **a2** is **(3,)**, meaning each has 3 elements and only one dimension.

**b1** and **b2** meanwhile return the shape of **(1, 3)** and **(3,1)**.

So, what is the difference between **a1** and **b1**?

To answer this, let’s take a look at some definitions of matrices. A matrix with only one row is called a row vector, and a matrix with one column is called a column vector, but there is no distinction between rows and columns in a one-dimensional ndarray.

Only a two-dimensional array clearly indicates that rows or columns are present. Since **b1** and **b2** are **two-dimensional (2D)** arrays, we can say that **b1** is a row vector and **b2** is a column vector, but we cannot say this of **a1** and **a2**.

If this is confusing, don’t worry about it. Just remember these two: np.zeros((1, n)) (n is any number) for a row vector and np.zeros((n, 1)) for a column vector, and you can forget about rank-one arrays.
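One way to see the difference in practice is to transpose each array. This is a small sketch with illustrative names (`rank_one`, `row`, `col`); the last two arrays are assumed to match the multi-dimensional output that follows.

```python
import numpy as np

n = 3
rank_one = np.zeros(n)    # shape (3,)
row = np.zeros((1, n))    # shape (1, 3): row vector
col = np.zeros((n, 1))    # shape (3, 1): column vector

# Transposing swaps rows and columns only for the 2D arrays
print(rank_one.T.shape)   # (3,)
print(row.T.shape)        # (3, 1)
print(col.T.shape)        # (1, 3)

# Larger 2D arrays of zeros and ones are created the same way
A = np.zeros((3, 2))
B = np.ones((2, 2))
print("A =", A)
print("B =", B)
```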

A = [[0. 0.]

[0. 0.]

[0. 0.]]

B = [[1. 1.]

[1. 1.]]

The example above shows how to create a multi-dimensional array of zeros and ones.

**Indexing and Slicing**

Ndarray objects can be accessed and modified by indexing or slicing, just like Python’s built-in container objects. See the examples below for how to slice one-dimensional and multi-dimensional ndarrays.

Note that, as with Python lists, we can use negative indices to select elements counting from the end of the array.
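A minimal sketch of the 1D case, assuming **A** holds the integers 0 to 9 as in the output below:

```python
import numpy as np

A = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]
print("A =", A)
print("A[0] =", A[0], ", A[1] =", A[1])
print("last element in A:", A[-1])        # negative index counts from the end
print("last 3 elements in A:", A[-3:])
print("A[2:] =", A[2:])                   # from index 2 to the end
print("A[2:5] =", A[2:5])                 # indices 2, 3 and 4 (5 is excluded)
```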

A = [0 1 2 3 4 5 6 7 8 9]

A[0] = 0 , A[1] = 1

last element in A: 9

last 3 elements in A: [7 8 9]

A[2:] = [2 3 4 5 6 7 8 9]

A[2:5] = [2 3 4]

See the examples below for slicing a 2D array row-wise.
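A sketch of the row-wise slices, assuming the 3x3 array shown in the output below. Leaving out the column index is the same as writing `:` for it.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(A[1:])      # rows 1 and 2
print(A[1:, :])   # same rows, with the column slice written out
print(A[:2])      # rows 0 and 1
print(A[1:2])     # row 1 only, still a 2D array of shape (1, 3)
```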

A =

[[1 2 3]

[4 5 6]

[7 8 9]]

A[1:] =

[[4 5 6]

[7 8 9]]

A[1:,:] =

[[4 5 6]

[7 8 9]]

A[:2] =

[[1 2 3]

[4 5 6]]

A[:2,:] =

[[1 2 3]

[4 5 6]]

A[1:2] = [[4 5 6]]

A[1:2,:] = [[4 5 6]]

Now, see the following examples of how to slice a 2D array column-wise.
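A sketch of the column-wise slices, again assuming the same 3x3 array. The slice used for `a_corner` is inferred from the output below (the top-left 2x2 block).

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(A[:, 0])    # first column, returned as a 1D array
print(A[:, 1:])   # all rows, columns 1 and 2

a_corner = A[:2, :2]   # rows 0-1 and columns 0-1
print(a_corner)
```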

A =

[[1 2 3]

[4 5 6]

[7 8 9]]

A[:,0] = [1 4 7]

A[:,1:] =

[[2 3]

[5 6]

[8 9]]

a_corner =

[[1 2]

[4 5]]

**Creating an Array Copy**

The idea of using **copy()** is to clone a NumPy array. In NumPy, a plain assignment such as **C = A** does not create a copy. We’ll see why in the following example.

It seems all right when we print **C**: it has the same elements as **A**. If you know C++, you know how a reference variable works, and the same thing happens here when we assign **C = A**.

Now, let’s change some values in **C** to zeros.
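A minimal sketch of this experiment, assuming the values and the two modified positions shown in the output below:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
C = A                 # no copy: C is a second name for the same array
print("A =", A)
print("C =", C)

# Change two values through C
C[0, 0] = 0
C[1, 2] = 0
print("After making changes to C")
print("C =", C)
print("A =", A)       # A shows the same changes
```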

A = [[1 2 3]

[4 5 6]]

C = [[1 2 3]

[4 5 6]]

After making changes to C

C = [[0 2 3]

[4 5 0]]

A = [[0 2 3]

[4 5 0]]

As we see in the example above, the values of **A** have changed!

This is because **C** is acting as a reference to **A**. **C** doesn’t have its own array values in memory; it points to the same data as **A**, so when we change the values of **C**, the values of **A** change as well.

To solve this, we can use **copy()**. This actually creates a new array **B** in memory and copies the whole array **A** into it.

In the example below, we see that changing the values of **B** doesn’t change the values of **A**.
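The same experiment with **copy()**, sketched under the same assumptions as before:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = A.copy()          # B gets its own data in memory

B[0, 0] = 0
B[1, 2] = 0
print("After making changes to B")
print("B =", B)
print("A =", A)       # A is unchanged
```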

A = [[1 2 3]

[4 5 6]]

B = [[1 2 3]

[4 5 6]]

After making changes to B

B = [[0 2 3]

[4 5 0]]

A = [[1 2 3]

[4 5 6]]

**Iterating Over Array**

Unlike a Python list, NumPy has an iterator object, **numpy.nditer**, that is an efficient multidimensional iterator for looping over an array.
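The two loops being compared might look roughly like this. It is a sketch without the timing code, and `A_list` is an illustrative name.

```python
import numpy as np

# Plain Python: nested for loops over a list of lists
A_list = [[1, 2, 3], [4, 5, 6]]
for row in A_list:
    for value in row:
        print(value)

# NumPy: nditer visits every element of the array in a single loop
A = np.array(A_list)
for value in np.nditer(A):
    print(value)
```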

For a Python list, using a for loop:

A = [[1, 2, 3], [4, 5, 6]]

shape of A is ( 2 , 3 )

1

2

3

4

5

6

Time taken to loop all the elements: 0.759124755859375 ms

For a NumPy array, using nditer:

A = [[1 2 3]

[4 5 6]]

1

2

3

4

5

6

Time taken to loop all the elements: 0.2589225769042969 ms

As we see in the examples above, iterating over the Python list with a for loop took longer in this run than iterating over the NumPy array with the iterator object. The difference is only about 0.5 ms here, and such small timings vary from run to run, but when we are dealing with very large arrays the time difference becomes large.

We will see more examples of how we can avoid using for loop and use vectorization (in NumPy) to speed up our code.

**Ndarray Operations and Vectorization**

Here we will see how **vectorization** works. This is the most exciting part of the tutorial since it shows why we prefer using **NumPy** arrays over Python lists.

In **NumPy**, arithmetic operations such as addition, subtraction and multiplication on arrays are done on corresponding elements. Take a look at the following example. Usually, we would need to loop over the elements of the arrays and multiply them one pair at a time. However, **NumPy** arrays can speed this up by using **vectorization**: the operation is applied to all corresponding elements in a single call, and the loop runs inside NumPy's compiled code rather than in Python.
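A minimal sketch of the element-wise multiplication, using the values from the output below:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# One line multiplies each element of a by the matching element of b
c = a * b

print("a =", a)
print("b =", b)
print("c =", c)
```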

`a = [1 2 3 4]`

b = [10 20 30 40]

c = [ 10 40 90 160]

Note that **NumPy** arrays allow you to perform operations on arrays in *one line of code without needing a for loop*. Some of you may ask why we need **NumPy** at all, and why not just use a **for loop** instead.

We saw the difference in computation time between using a for loop and NumPy’s iterator object. The same reason applies here: performing the operation on each element in a Python for loop is much slower. **Vectorization** allows you to *perform the operation on all the corresponding elements in one step*, hence it saves a lot of time.

Say we want to compute addition of arrays **a** and **b** and save the result in **c**. Let’s take a look at how we normally do the operation using **for loop vs. how we can speed up the code with vectorization**.

We can create an array of random values in NumPy with one line of code by using **np.random.randn()**. Let’s create 2 arrays (**a** and **b**) of size **1000000** initialized with random values, and an array **c** of the same size with all zeros.
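A sketch of such a comparison is shown here. The timing approach (`time.perf_counter`) and the name `c_vec` are assumptions, and the numbers you get will depend on your machine, so they will not match the figures quoted below.

```python
import time
import numpy as np

n = 1_000_000
a = np.random.randn(n)
b = np.random.randn(n)
c = np.zeros(n)

# Element-by-element addition in a Python for loop
start = time.perf_counter()
for i in range(n):
    c[i] = a[i] + b[i]
loop_ms = (time.perf_counter() - start) * 1000
print("Time taken with for loop:", loop_ms, "ms")

# The same addition as a single vectorized expression
start = time.perf_counter()
c_vec = a + b
vec_ms = (time.perf_counter() - start) * 1000
print("Time taken with vectorization:", vec_ms, "ms")
```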

`Time taken with for loop: 539.0114784240723 ms`

Time taken with vectorization: 3.8535594940185547 ms

Notice that the for loop is more than **100 times** slower than the vectorized code. **NumPy** also simplifies the code so that it requires only one line. But, of course, *the dimensions have to match* to do any math operations.

We can use **NumPy’s Math functions** such as **np.multiply()**, **np.dot()**, **np.sin()**, etc.
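A short sketch of these three functions on small, made-up arrays (the values are illustrative and differ from the random values in the output below):

```python
import math
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(np.multiply(a, b))   # element-wise product, same as a * b
print(np.dot(a, b))        # dot product: 1*4 + 2*5 + 3*6
print("sin(a) =", np.sin(a))

# The vectorized sin gives the same values as calling math.sin in a loop
looped = [math.sin(value) for value in a]
print(np.allclose(np.sin(a), looped))
```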

`sin(a) = [0.04988483 0.80621872 0.56432645 0.0483301 0.67950184 0.60500273`

0.2333741 0.8102268 0.12549575 0.41646099] ...

Instead of using a for loop to apply the **sin()** function to each element in the array, **vectorization** allows us to apply **sin()** to the whole array at once, which speeds up the operation.

**Broadcasting**

**Broadcasting** is an interesting technique in **NumPy**. The term **broadcasting** refers to *the ability of NumPy to handle arrays of different shapes during arithmetic operations*. You saw how vectorization works in previous examples. If we have two arrays of the same shape, then we can perform operations on them without needing a for loop.

If the dimensions of two arrays are different, a direct element-to-element pairing is not possible. This is where broadcasting comes in handy: operations on arrays of different shapes are still possible in NumPy, as long as the shapes are compatible.
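A minimal sketch of the simplest case, using the values from the output below. The operation is assumed to be multiplication, which is consistent with the printed result.

```python
import numpy as np

a = np.array([1, 2, 3, 4])   # shape (4,)
b = np.array([10])           # shape (1,)

# b is stretched to shape (4,) before the element-wise multiplication
c = a * b

print("a =", a)
print("b =", b)
print("c =", c)
```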

`a = [1 2 3 4]`

b = [10]

c = [10 20 30 40]

As we see in the example above, **a** and **b** have different shapes, but **b**, which has fewer elements than **a**, is broadcast to the same shape as **a**, so that the operation can continue as usual.

Let’s look at another example with multi-dimensional arrays.
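A sketch of the 2D case, using the values from the output below and assuming the operation is addition:

```python
import numpy as np

A = np.array([[0, 0, 0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])   # shape (4, 3)
B = np.array([0, 1, 2])        # shape (3,)

# B is treated as if it were repeated on each of the 4 rows of A
C = A + B

print("A =", A)
print("B =", B)
print("C =", C)
```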

`A = [[ 0 0 0]`

[10 10 10]

[20 20 20]

[30 30 30]]

B = [0 1 2]

C = [[ 0 1 2]

[10 11 12]

[20 21 22]

[30 31 32]]

The smaller array is broadcast to the size of the larger array so that they have compatible shapes. The diagram below explains how broadcasting works.

Feel free to make a copy of the notebook provided below and try out your own code to practice. Notice that the code in the notebook is slightly different since the notebook is intended for use of NumPy in Computer Vision tutorials.

If you want to learn more about **NumPy**, this TutorialsPoint tutorial is a good place to continue: https://www.tutorialspoint.com/numpy/index.htm . The official NumPy documentation is at https://numpy.org/doc/ .

Thank you for reading. I hope you learned something about NumPy from this tutorial.

If you see any mistakes or you have any questions, feel free to comment below and I’ll try to answer as soon as possible.