Power of Python’s NumPy for Matrix Operations

Introduction to NumPy for Scientific Computing

Shitanshu Pandey
Towards Dev

NumPy stands as a cornerstone within the Python ecosystem for scientific computing. Its capacity to perform complex mathematical operations efficiently makes it invaluable for data scientists and researchers alike. This article covers the fundamentals of NumPy, focusing on the array features that make it practical to handle large datasets and perform numerical computations.

Why Choose NumPy?

NumPy introduces the ndarray (n-dimensional array) object, which is a fast, flexible container for large data sets in Python. Arrays in NumPy are grids of values, all of the same type, and are indexed by a tuple of nonnegative integers.

  • The number of dimensions is the rank of the array (available as the ndim attribute).
  • The shape of an array is a tuple of integers giving the size of the array along each dimension.

Here’s why NumPy arrays outshine traditional Python lists:

  • Performance: NumPy's core is implemented in C, so operations applied to a whole array typically run much faster than the equivalent Python loop over a list.
  • Functionality: NumPy arrays support vectorized operations, broadcasting, and flexible indexing, which make data manipulation more intuitive and less error-prone.
  • Memory Efficiency: Using less memory to store data, NumPy arrays mitigate the overhead of Python’s built-in high-level data types.
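The performance and memory points above can be illustrated with a small sketch. The array size, the variable names, and the explicit int64 dtype are assumptions chosen for this example, not figures from the article. Timings are left out because they vary by machine.

```python
import sys
import numpy as np

n = 1_000  # illustrative size (assumption)

py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit dtype so the byte count is predictable

# Doubling every element: a Python-level loop vs. one vectorized expression
doubled_list = [x * 2 for x in py_list]
doubled_arr = np_arr * 2

# Both approaches produce the same values
same_values = doubled_list == doubled_arr.tolist()
print(same_values)

# The array stores raw 8-byte integers in one contiguous buffer
print(np_arr.nbytes)  # 8 bytes * 1000 elements = 8000

# The list stores references to separate Python int objects, so its
# footprint is the list itself plus every int object it refers to
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(list_bytes > np_arr.nbytes)
```

The exact size of the list depends on the Python build, which is why the last line only compares the two footprints instead of printing a number.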

Getting Started with NumPy

Before diving into array creation and manipulation, ensure NumPy is accessible by importing it:

import numpy as np

Creating NumPy Arrays

NumPy offers multiple methods to create arrays tailored to your needs:

1. Using np.array():

# Create a 1-D array
a = np.array([1, 2, 3])
print(a)
# Output: [1 2 3]

2. Using np.arange():

# Create an array with a range of elements
b = np.arange(3)
print(b)
# Output: [0 1 2]

c = np.arange(1, 20, 3)
print(c)
# Output: [1 4 7 10 13 16 19]

3. Using np.linspace():

# Create an array of five evenly spaced numbers from 0 to 100
lin_spaced_arr = np.linspace(0, 100, 5)
print(lin_spaced_arr)
# Output: [ 0. 25. 50. 75. 100.]

# Specifying integer data type
lin_spaced_arr_int = np.linspace(0, 100, 5, dtype=int)
print(lin_spaced_arr_int)
# Output: [ 0 25 50 75 100]
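One way to see how np.arange() and np.linspace() differ: arange takes a step and excludes the stop value, while linspace takes a number of points and includes the stop value by default. The sketch below uses the endpoint and retstep parameters of np.linspace(); the variable names are illustrative.

```python
import numpy as np

# arange: fixed step, stop value excluded
by_step = np.arange(0, 100, 25)
print(by_step)  # [ 0 25 50 75]

# linspace: fixed number of points, stop value included by default
by_count = np.linspace(0, 100, 5)
print(by_count)

# endpoint=False excludes the stop value, similar to arange
no_endpoint = np.linspace(0, 100, 4, endpoint=False)
print(no_endpoint)

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 100, 5, retstep=True)
print(step)  # 25.0
```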

More on Array Creation

  • Using np.ones() and np.zeros():
# Arrays of ones and zeros
ones_arr = np.ones(3)
print(ones_arr)
# Output: [1. 1. 1.]

zeros_arr = np.zeros(3)
print(zeros_arr)
# Output: [0. 0. 0.]
  • Using np.empty() and np.random.rand():
# Uninitialized array
empt_arr = np.empty(3)
print(empt_arr)
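# Output: arbitrary values, for example [0. 0. 0.]
# np.empty() does not initialize the memory, so the contents are whatever was already there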

# Randomly initialized array (uniform samples from [0, 1); values differ on every run)
rand_arr = np.random.rand(3)
print(rand_arr)
# Output: [0.15318847 0.72138555 0.89898637]
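Because random output changes between runs, reproducible examples usually fix a seed. The following is a minimal sketch using NumPy's Generator API (np.random.default_rng), offered here as an alternative to the np.random.rand() call above. The seed value 42 is an arbitrary choice. np.full() is included as a way to create an array filled with any constant.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random(3)  # three floats drawn uniformly from [0, 1)
sample_b = rng_b.random(3)

print(np.array_equal(sample_a, sample_b))  # True

# Every sample lies in the half-open interval [0, 1)
in_range = bool(((sample_a >= 0) & (sample_a < 1)).all())
print(in_range)  # True

# np.full() creates an array filled with a chosen constant
sevens = np.full(3, 7.0)
print(sevens)  # [7. 7. 7.]
```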

Multidimensional Arrays

NumPy allows the creation of arrays with multiple dimensions. These are similar to data tables where each dimension corresponds to a different axis; for instance, rows and columns in a 2D array.

Creating a 2D Array

You can directly create a two-dimensional array by passing a nested list to the np.array() function. Each sub-list represents a row in the resulting array:

two_dim_arr = np.array([[1,2,3], [4,5,6]])
print(two_dim_arr)
# Output:
# [[1 2 3]
# [4 5 6]]

Another way to create a multidimensional array is to reshape an existing one-dimensional array with the np.reshape() function, which arranges the elements of the original array into the specified new shape. The new shape must hold exactly as many elements as the original array.

# 1-D array 
one_dim_arr = np.array([1, 2, 3, 4, 5, 6])

# Multidimensional array using reshape()
multi_dim_arr = np.reshape(
    one_dim_arr,  # the array to be reshaped
    (2, 3)        # dimensions of the new array
)
# Print the new 2-D array with two rows and three columns
print(multi_dim_arr)
# Output:
# [[1 2 3]
# [4 5 6]]
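Reshaping is also available as an array method, and one dimension can be given as -1 so that NumPy infers it from the total number of elements. The sketch below also shows what happens when the requested shape does not fit; the variable names inferred, reshape_failed, and flat are illustrative.

```python
import numpy as np

one_dim_arr = np.array([1, 2, 3, 4, 5, 6])

# Method form; -1 asks NumPy to infer that dimension (6 elements / 3 columns = 2 rows)
inferred = one_dim_arr.reshape(-1, 3)
print(inferred.shape)  # (2, 3)

# A shape that cannot hold exactly 6 elements raises ValueError
try:
    one_dim_arr.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True

# flatten() returns a one-dimensional copy of the data
flat = inferred.flatten()
print(flat)  # [1 2 3 4 5 6]
```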

Understanding Array Attributes

Once you have created your arrays, NumPy provides several attributes to help you understand the structure of the arrays:

# For our previously created multi-dimensional array
# multi_dim_arr = [[1 2 3] [4 5 6]]
  • ndarray.ndim - Stores the number of dimensions of the array.
# Dimension of the 2-D array multi_dim_arr
print("Dimensions:", multi_dim_arr.ndim)
# Output: Dimensions: 2
  • ndarray.shape - Stores the shape of the array. This attribute returns a tuple representing the dimensions of the array, where each element in the tuple indicates the size of the corresponding dimension.
# Shape of the 2-D array multi_dim_arr
# Returns shape of 2 rows and 3 columns
print("Shape:", multi_dim_arr.shape)
# Output: Shape: (2, 3)
  • ndarray.size - This attribute returns the total number of elements across all dimensions of the array.
# Size of the array multi_dim_arr
# Returns total number of elements
print("Total elements:", multi_dim_arr.size)
# Output: Total elements: 6

These attributes are crucial for data manipulation and analysis, providing essential information about the structure and size of your arrays.
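The same attributes can be checked on an array with more than two dimensions. The sketch below uses a hypothetical 3-D array named cube and also reads the dtype attribute, which the article does not cover, to show the element type.

```python
import math
import numpy as np

# A 3-D array: 2 blocks, each with 3 rows and 4 columns
cube = np.zeros((2, 3, 4))

print(cube.ndim)   # 3
print(cube.shape)  # (2, 3, 4)
print(cube.size)   # 24
print(cube.dtype)  # float64, the default for np.zeros()

# size is always the product of the entries in shape
size_matches_shape = cube.size == math.prod(cube.shape)
print(size_matches_shape)  # True
```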

Array math operations

Recall that adding Python lists works completely differently: the + operator concatenates them, creating one longer list. Meanwhile, trying to subtract two Python lists, or to multiply one list by another, raises a TypeError.

However, NumPy allows you to quickly perform element-wise addition, subtraction, multiplication, and division for both 1-D and multidimensional arrays. The operations are performed using the mathematical symbols ‘+’, ‘-’, ‘*’, and ‘/’.

arr_1 = np.array([2, 4, 6])
arr_2 = np.array([1, 3, 5])

# Adding two 1-D arrays
addition = arr_1 + arr_2
print(addition)
# Output: [ 3 7 11]

# Subtracting two 1-D arrays
subtraction = arr_1 - arr_2
print(subtraction)
# Output: [1 1 1]

# Multiplying two 1-D arrays elementwise
multiplication = arr_1 * arr_2
print(multiplication)
# Output: [ 2 12 30]
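For comparison, here is a short sketch of how plain Python lists behave with the same operators, followed by element-wise division on the arrays from the example above. The names list_1, list_2, and list_multiply_failed are illustrative.

```python
import numpy as np

list_1 = [2, 4, 6]
list_2 = [1, 3, 5]

# + on lists concatenates instead of adding element by element
concatenated = list_1 + list_2
print(concatenated)  # [2, 4, 6, 1, 3, 5]

# Multiplying one list by another is not defined
try:
    list_1 * list_2
    list_multiply_failed = False
except TypeError:
    list_multiply_failed = True
print(list_multiply_failed)  # True

arr_1 = np.array([2, 4, 6])
arr_2 = np.array([1, 3, 5])

# Element-wise division: 2/1, 4/3, 6/5 (the result is a float array)
division = arr_1 / arr_2
print(division)
```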

Multiplying a vector by a scalar (broadcasting)

Suppose you need to convert miles to kilometers. You can perform an operation between an array (miles) and a single number (the conversion rate, which is a scalar). Since 1 mile is roughly 1.6 km, multiplying the array by 1.6 converts every element at once.

This concept is known as broadcasting: NumPy stretches the smaller operand (here, the scalar) to match the shape of the larger one, which lets you combine arrays whose shapes differ but are compatible.

vector = np.array([1, 2])
vector * 1.6
# Output: array([1.6, 3.2])
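Broadcasting is not limited to scalars. As a sketch with made-up values, a 1-D row or a single-column array can be combined with a 2-D array as long as the shapes are compatible; the names matrix, row, and column are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
column = np.array([[100], [200]])  # shape (2, 1)

# The row is added to every row of the matrix
plus_row = matrix + row
print(plus_row)
# [[11 22 33]
#  [14 25 36]]

# The column is added to every column of the matrix
plus_column = matrix + column
print(plus_column)
# [[101 102 103]
#  [204 205 206]]

# Shapes (2, 3) and (2,) are not compatible, so this raises ValueError
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```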

Indexing and slicing

Indexing is very useful as it allows you to select specific elements from an array. It also lets you select entire rows/columns or planes for multidimensional arrays.

Indexing

# Select the third element of the array. Remember the counting starts from 0.
a = np.array([1, 2, 3, 4, 5])
print(a[2])
# Output: 3

# Select the first element of the array.
print(a[0])
# Output: 1

For an array with n dimensions, you need n indices, one for each dimension, to select a single element. For a 2-D array there are two common ways to do this: either chain two sets of brackets, or use a single set of brackets and separate the indices with a comma.

# Indexing on a 2-D array
two_dim = np.array(([1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]))

# Select element number 8 from the 2-D array using indices i, j and two sets of brackets
print(two_dim[2][1])
# Output: 8

# Select element number 8 from the 2-D array, this time using i and j indexes in a single
# set of brackets, separated by a comma
print(two_dim[2,1])
# Output: 8
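A few more indexing cases, sketched on the same 3x3 array: negative indices count from the end, a single index on a 2-D array selects a whole row, and an index outside the array raises IndexError. The variable names are illustrative.

```python
import numpy as np

two_dim = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

# Negative indices count from the end: last row, last column
last = two_dim[-1, -1]
print(last)  # 9

# A single index on a 2-D array selects an entire row
second_row = two_dim[1]
print(second_row)  # [4 5 6]

# An index beyond the array bounds raises IndexError
try:
    two_dim[3, 0]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)  # True
```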

Slicing

Slicing selects a subarray of elements from the array. The slice notation specifies a start and an end value and selects the elements from start up to, but not including, end (end-exclusive). Unlike slicing a Python list, slicing a NumPy array returns a view of the original data rather than a copy.

The syntax is: array[start:end:step]

If no value is passed for start, start = 0 is assumed; if no value is passed for end, end = length of the array is assumed, so the slice runs through the last element; and if no value is passed for step, step = 1 is assumed. These defaults apply to a positive step.

Note that you can use slice notation with multi-dimensional indexing, as in a[0:2, :5].

# Working with 1-D Array
# a = [1, 2, 3, 4, 5]

# Slice the array a to get the array [2,3,4]
sliced_arr = a[1:4]
print(sliced_arr)
# Output: [2 3 4]

# Slice the array a to get the array [1,2,3]
sliced_arr = a[:3]
print(sliced_arr)
# Output: [1 2 3]

# Slice the array a to get the array [3,4,5]
sliced_arr = a[2:]
print(sliced_arr)
# Output: [3 4 5]

# Slice the array a to get the array [1,3,5]
sliced_arr = a[::2]
print(sliced_arr)
# Output: [1 3 5]
# Working with 2-D Array
# two_dim = [[1 2 3]
# [4 5 6]
# [7 8 9]]

# Slice the two_dim array to get the first two rows
sliced_arr_1 = two_dim[0:2]
sliced_arr_1
# Output: array([[1, 2, 3],
# [4, 5, 6]])

# Similarly, slice the two_dim array to get the last two rows
sliced_two_dim_rows = two_dim[1:3]
print(sliced_two_dim_rows)
# Output: [[4 5 6]
# [7 8 9]]

# This example uses slice notation to get every row, and then pulls the second column.
# Notice how this example combines slice notation with the use of multiple indexes
sliced_two_dim_cols = two_dim[:,1]
print(sliced_two_dim_cols)
# Output: [2 5 8]
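Basic slicing in NumPy returns a view onto the original array, so writing to a slice changes the source data. The sketch below shows that behavior, how .copy() avoids it, and a slice over both axes at once. The names view, independent, grid, and block are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# A slice is a view: writing to it changes the original array
view = a[1:4]
view[0] = 99
print(a)  # [ 1 99  3  4  5]

# .copy() produces independent data
independent = a[1:4].copy()
independent[0] = -1
print(a)  # [ 1 99  3  4  5]

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

# Slicing rows and columns in a single expression
grid = np.arange(12).reshape(3, 4)
block = grid[0:2, 1:3]  # first two rows, second and third columns
print(block)
# [[1 2]
#  [5 6]]
```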

Stacking

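Stacking joins two or more arrays into a single larger array. np.vstack() joins them vertically (row-wise, along the first axis), and np.hstack() joins them horizontally (column-wise, along the second axis for 2-D arrays). Splitting is the reverse operation: it breaks one array into several smaller ones.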

# We have:
a1 = np.array([[1, 1],
               [2, 2]])
a2 = np.array([[3, 3],
               [4, 4]])

print(f'a1:\n{a1}')
# Output: a1:
# [[1 1]
# [2 2]]

print(f'a2:\n{a2}')
# Output: a2:
# [[3 3]
# [4 4]]
  • np.vstack() - stacks vertically
# Stack the arrays vertically
vert_stack = np.vstack((a1, a2))
print(vert_stack)
# Output:
# [[1 1]
# [2 2]
# [3 3]
# [4 4]]
  • np.hstack() - stacks horizontally
# Stack the arrays horizontally
horz_stack = np.hstack((a1, a2))
print(horz_stack)
# Output:
# [[1 1 3 3]
# [2 2 4 4]]
  • np.hsplit() - splits an array horizontally (column-wise) into several smaller arrays
# We have, horz_stack:
# [[1 1 3 3]
# [2 2 4 4]]

# Split the horizontally stacked array into 2 separate arrays of equal size
horz_split_two = np.hsplit(horz_stack,2)
print(horz_split_two)
# Output:
# [array([[1, 1], [2, 2]]),
# array([[3, 3], [4, 4]])]

# Split the horizontally stacked array into 4 separate arrays of equal size
horz_split_four = np.hsplit(horz_stack,4)
print(horz_split_four)
# Output:
# [array([[1], [2]]),
# array([[1], [2]]),
# array([[3], [4]]),
# array([[3], [4]])]

# Split the horizontally stacked array after the first column
horz_split_first = np.hsplit(horz_stack,[1])
print(horz_split_first)
# Output:
# [array([[1], [2]]),
# array([[1, 3, 3], [2, 4, 4]])]
  • np.vsplit() - splits an array vertically (row-wise) into several smaller arrays
# We have, vert_stack:
# [[1 1]
# [2 2]
# [3 3]
# [4 4]]

# Split the vertically stacked array into 2 separate arrays of equal size
vert_split_two = np.vsplit(vert_stack,2)
print(vert_split_two)
# Output:
# [array([[1, 1], [2, 2]]),
# array([[3, 3], [4, 4]])]

# Split the vertically stacked array into 4 separate arrays of equal size
vert_split_four = np.vsplit(vert_stack,4)
print(vert_split_four)
# Output:
# [array([[1, 1]]),
# array([[2, 2]]),
# array([[3, 3]]),
# array([[4, 4]])]

# Split the vertically stacked array after the first and third row
vert_split_first_third = np.vsplit(vert_stack,[1,3])
print(vert_split_first_third)
# Output:
# [array([[1, 1]]),
# array([[2, 2], [3, 3]]),
# array([[4, 4]])]
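For 2-D inputs, the stacking helpers above behave like np.concatenate() along an existing axis, while np.stack() (not covered in the article) joins arrays along a new axis. A brief sketch, with illustrative variable names, that also shows an uneven split being rejected:

```python
import numpy as np

a1 = np.array([[1, 1],
               [2, 2]])
a2 = np.array([[3, 3],
               [4, 4]])

# For 2-D arrays, vstack matches concatenation along axis 0, hstack along axis 1
rows_joined = np.concatenate((a1, a2), axis=0)
cols_joined = np.concatenate((a1, a2), axis=1)
print(np.array_equal(rows_joined, np.vstack((a1, a2))))  # True
print(np.array_equal(cols_joined, np.hstack((a1, a2))))  # True

# np.stack() creates a new leading axis instead of extending an existing one
stacked = np.stack((a1, a2))
print(stacked.shape)  # (2, 2, 2)

# Four columns cannot be divided into three equal parts, so this raises ValueError
try:
    np.hsplit(cols_joined, 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True
```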

Conclusion

With NumPy, handling arrays becomes an efficient and intuitive process. Operations such as slicing, indexing, reshaping, and stacking allow for detailed and specific manipulation of data stored in ndarrays, enhancing the capacity to perform high-level mathematical operations and modeling, crucial in fields like machine learning and data science.

Understanding and utilizing the basic functions of NumPy can significantly streamline the processes involved in scientific computing and data analysis within Python, bolstering both the performance and productivity of your computational tasks! Happy Coding 🎉🐍🎉🐍

