NumPy Performance Optimization (Speed Matters!)

Author(s): Nibedita (NS)

Originally published on Towards AI.

Hey, guys! Welcome back to our NumPy for DS and DA series. This is the 9th article of the series. In the previous article we discussed generating random numbers with NumPy, and we also went through code examples to understand the concepts better. So if you haven't read the previous article yet, you can check it out first.

So let's get to our topic now.

Performance optimization – why it matters

When we work with large datasets, even a slight inefficiency can slow everything down. The good news? NumPy is built for speed, and with a few smart habits we can squeeze even more performance out of it.

Let's walk through practical ways to make NumPy operations faster. I'll keep it example-driven, so you can try everything right away.

1. Use vectorized operations instead of Python loops

Golden rule: avoid Python-level loops whenever possible.

All right! Let's understand this with a practical example.

import numpy as np
from time import time

We'll take the help of the time module to compare the speeds.

arr = np.arange(1_000_000)

loop_time1 = time()
result = []
for x in arr:
    result.append(x * 2)

loop_time2 = time()
print(f"Loop Time: {loop_time2 - loop_time1} seconds")
# Loop time: 0.12406730651855469 seconds

numpy_time1 = time()
result = arr * 2
numpy_time2 = time()
print(f"Numpy Time: {numpy_time2 - numpy_time1} seconds")
# Numpy time: 0.01680159568786621 seconds

Now you can see the difference: how much faster this is compared to an ordinary Python loop.

Why?

NumPy does the math in compiled C code under the hood. One line of vectorized code can be 10-100x faster than a Python loop.

2. Select the appropriate data type

A smaller, appropriate data type means less memory. And less memory means faster operations.

arr = np.arange(1_000_000, dtype=np.int32)

This uses half the memory of int64. Likewise, if you do not need full floating-point precision, do not default to float64. For huge arrays, this small change can save hundreds of MB.
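A quick way to see the difference yourself is to compare the arrays' nbytes attribute. This is just a small sketch; the exact figure for the default dtype depends on your platform's default integer type:

import numpy as np

arr_default = np.arange(1_000_000)                # int64 on most 64-bit platforms
arr_small = np.arange(1_000_000, dtype=np.int32)

print(arr_default.nbytes / 1e6, "MB")  # ~8.0 MB
print(arr_small.nbytes / 1e6, "MB")    # ~4.0 MB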

3. Preallocate arrays instead of growing them

Repeatedly appending to a list or array forces NumPy to allocate new memory blocks. Let's compare three different ways:

Block 1: Python list + for loop

data = []
for i in range(1_000_000):
    data.append(i)

arr = np.array(data)

Block 2: preallocated NumPy array + for loop

arr = np.empty(1_000_000, dtype=np.int32)
for i in range(1_000_000):
    arr[i] = i

Block 3: no loop

Even better, if you know the pattern, completely skip the loop:

arr = np.arange(1_000_000, dtype=np.int32)

This is the fastest and most effective way to create a sequential array of integers, both in terms of speed and memory efficiency.

  • Block 1 is the slowest: appending to a Python list is not optimized for numerical work, and the subsequent conversion to NumPy creates an extra copy, which also makes it memory-hungry.
  • Block 2 eliminates the list but still uses a Python-level loop, which cannot compete with the compiled efficiency of NumPy's internal functions.
  • Block 3 uses NumPy's internal implementation, running highly optimized, compiled code for allocation and assignment, which makes it much faster and more memory-efficient than the other two (see the timing sketch below).
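If you want to verify this yourself, here is a minimal timing sketch using the same time() comparison as in section 1; exact numbers will vary by machine:

import numpy as np
from time import time

N = 1_000_000

# Block 1: grow a Python list, then convert to an array
t0 = time()
data = []
for i in range(N):
    data.append(i)
arr1 = np.array(data)
t1 = time()

# Block 2: preallocate a NumPy array, fill it with a Python loop
arr2 = np.empty(N, dtype=np.int32)
for i in range(N):
    arr2[i] = i
t2 = time()

# Block 3: let NumPy generate the sequence internally
arr3 = np.arange(N, dtype=np.int32)
t3 = time()

print(f"Block 1: {t1 - t0:.4f} s")
print(f"Block 2: {t2 - t1:.4f} s")
print(f"Block 3: {t3 - t2:.4f} s")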

4. Use in-place operations

In-place operations modify the array directly, saving time and memory.

arr = np.arange(10**7)
arr *= 2

This is highly optimized because it uses vectorized NumPy operations to create and modify a large array in memory without a Python loop, and the *= form reuses the existing buffer instead of allocating a new one.

What are the key points of in-place optimization?

  • Vectorization: the operation is applied to the entire array at once, taking full advantage of low-level optimizations and CPU features.
  • Memory efficiency: there are no intermediate Python lists or per-element assignments, so memory allocation and reuse stay optimal.
  • Speed: the combination of contiguous data storage and vectorized instructions is many times faster than the equivalent pure-Python loop.

Look for in-place methods such as arr.sort() (vs. np.sort(arr), which returns a new array), or use the augmented operators (+=, *=) to update arrays in place.
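A small sketch of the difference (the variable names here are just for illustration):

import numpy as np

arr = np.arange(10)

# In place: sorts arr's own buffer and returns None
arr.sort()

# Not in place: allocates and returns a new, sorted array
sorted_copy = np.sort(arr)

# arr *= 2 reuses arr's existing buffer,
# while arr = arr * 2 would allocate a new array and rebind the name
arr *= 2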

5. Work with views, not copies

Slicing creates a view instead of a copy, which is much faster.

A view is a new array object that shares the same data buffer as the original array, but it can have different metadata (such as shape or strides). A copy, on the other hand, creates a completely new array with its own independent data.

Changes to a view affect the original array, and changes to the original affect the view, because they refer to the same underlying data. Modifying a copy, however, does not affect the original array, and vice versa.

Views are usually created by slicing or reshaping operations, while copies require additional memory allocation and take more time because the data must be duplicated.

large_arr = np.arange(10**7)
view = large_arr[100:200]

Views are faster and more memory-efficient because no new data is allocated or copied; only a new window onto the existing data is created, saving both time and memory.

But since modifying a view changes the original array, we also need to be careful with views; otherwise, unintentional changes to the original data can happen when you modify the view. 😬
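Here is a minimal sketch of that behavior; np.shares_memory lets you check whether two arrays refer to the same buffer:

import numpy as np

large_arr = np.arange(10)
view = large_arr[2:5]           # slicing returns a view
copied = large_arr[2:5].copy()  # .copy() owns its own data

view[0] = 99
print(large_arr[2])                         # 99 -- the original changed too
print(np.shares_memory(large_arr, view))    # True
print(np.shares_memory(large_arr, copied))  # False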

6. Leverage broadcasting

Broadcasting allows NumPy to perform operations on arrays of different shapes without a loop or additional memory.

matrix = np.ones((3, 3))
vector = np.array([1, 2, 3])

result = matrix + vector # Vector is “broadcast” across rows

Let's take another example:

a = np.array([1, 2, 3])           # shape (3,)
b = np.array([[10], [20], [30]])  # shape (3, 1)

result = a + b # shape (3,3) broadcasted addition

The smaller array is effectively stretched to the shape of the larger array by reinterpreting strides and metadata, without duplicating data in memory. This avoids allocating a large array filled with repeated values, which genuinely saves memory.
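If you want to see the "stretching" explicitly, np.broadcast_to returns a read-only view whose stride along the repeated axis is zero, so no data is duplicated. A small sketch; the exact stride values depend on the dtype:

import numpy as np

a = np.array([1, 2, 3])                 # shape (3,)
stretched = np.broadcast_to(a, (3, 3))  # read-only broadcast view

print(stretched.shape)    # (3, 3)
print(stretched.strides)  # (0, 8) for int64: 0 bytes along the broadcast axis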

If you want to learn more about broadcasting, you can also check my article Broadcasting and Vectorized Operations in NumPy, or browse the entire NumPy for DS and DA list.

7. Profile before optimizing

Don't know where the slowdown is? Use simple profiling.

Profiling tools measure how long different parts of our code take, how much memory they use, and where most of the time is spent.

%timeit arr * 2

%timeit is an IPython magic command built on top of Python's timeit module. When you run %timeit arr * 2, it executes the expression arr * 2 many times (usually thousands) and records the execution times.

%timeit automatically handles setup and multiple runs to produce statistically meaningful timing results without manual intervention.

arr = np.arange(int(1e6))
%timeit arr * 2
# 3.99 ms ± 741 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  • 3.99 ms is the average time to execute arr * 2 once.
  • This average is computed over multiple runs, each of which executes the statement many times; in our case, 100 loops per run.
  • ± 741 μs (microseconds) is the standard deviation, showing how much the execution time varies between runs.
  • The measurement is based on 7 separate runs of the timing experiment.
  • Each run executes the operation 100 times (100 loops) to gather enough data for accuracy.

%timeit basically provides reliable timings by running the code multiple times and reporting the mean and variability across runs, which gives a solid estimate of typical performance.

In Jupyter or IPython, you can use it to measure execution time. Focus on the slowest parts; don't guess.
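Outside Jupyter or IPython, the standard-library timeit module (which %timeit builds on) gives similar numbers; a minimal sketch:

import timeit
import numpy as np

arr = np.arange(int(1e6))

# timeit.timeit runs the callable `number` times and returns the total time,
# so divide by `number` to get the average time per run
avg = timeit.timeit(lambda: arr * 2, number=1000) / 1000
print(f"{avg * 1e3:.3f} ms per run")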

8. Go even faster with Numba or Cython

If you absolutely need more speed, tools like Numba can compile Python code to machine code.

from numba import njit

@njit
def double(arr):
    return arr * 2

Numba is a Just-in-Time (JIT) compiler for Python that can significantly speed up numerical calculations, especially those involving loops over NumPy arrays. Well, I haven't personally worked with Numba yet. Maybe someday, if I ever find the time! 😂

That said, Numba is widely recognized for its ability to compile Python code to highly optimized machine code, often delivering performance improvements beyond standard NumPy operations.
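As I said, I haven't benchmarked Numba myself, but a minimal usage sketch would look something like this (assuming Numba is installed; the first call includes JIT compilation, so only later calls show the speed-up):

import numpy as np
from numba import njit

@njit
def double(arr):
    return arr * 2

arr = np.arange(1_000_000)

double(arr)           # first call: compiles, then runs
result = double(arr)  # subsequent calls run the compiled machine code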

But start with NumPy best practices; they are often all you need.

Key takeaways

  • Vectorize everything you can; skip raw Python loops. 😛
  • Choose the smallest appropriate data type.
  • Preallocate arrays instead of appending to them.
  • Use in-place operations and views to save memory.
  • Trust the built-in NumPy functions; they are fast for a reason.

Speed matters, especially with large datasets. A few simple habits can turn slow code into instant analyses.

NumPy already gives you speed. But with these tricks, you'll squeeze out every last drop of performance and spend more time analyzing data and less time waiting for code to run.

And that's all for today!


Thanks for reading! 😊

Published via Towards AI
