A Generator function is a function that has one or more yield statements inside it. Have you ever run into a MemoryError? Perhaps, you have tried reading rows from a super large Excel (or .csv) file. Let's look at the difference between the return and yield statements in Python.

# Example of using a regular function
import csv
def read_csv_from_regular_fn():
    with open('large_dataset.csv', 'r') as f:
        reader = csv.reader(f)
        return [row for row in reader]
result_1 = read_csv_from_regular_fn()
# Output:
# [['a','b','c', ... ], ['x','y','z', ... ] ... ]
Running this can raise a MemoryError, or at least slow things to a crawl, depending on our computers. read_csv_from_regular_fn opens our CSV file and loads everything into memory in one go.

# Example of using a Generator function
import csv
def read_csv_from_generator_fn():
    with open('large_dataset.csv', 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            yield row
# To get the same output as result_1,
# We generate a list using our newly created Generator function:
result_2 = [row for row in read_csv_from_generator_fn()]
# Output same as result_1:
# [['a','b','c', ... ], ['x','y','z', ... ] ... ]
Here we define read_csv_from_generator_fn as our Generator function. This new Generator opens our large CSV file, loops through every row, and yields each row one at a time rather than all at once. With it, we no longer hit a MemoryError, or even any slowness due to memory constraints, when reading data from our large_dataset.csv. Comparing the size of the two return values makes the difference obvious:

import sys
print(sys.getsizeof(read_csv_from_generator_fn())) # 112 bytes
print(sys.getsizeof(read_csv_from_regular_fn())) # 1624056 bytes
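The real win comes from consuming the rows one at a time instead of building a list at all. Below is a minimal self-contained sketch of that pattern; since large_dataset.csv is not available here, it writes a tiny stand-in CSV to a temporary file first:

```python
import csv
import os
import tempfile

# Create a small stand-in for large_dataset.csv (hypothetical data).
tmp = tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='')
csv.writer(tmp).writerows([['a', 'b', 'c'], ['x', 'y', 'z']])
tmp.close()

def read_csv_rows(path):
    # Generator function: yields one row at a time, keeping memory flat.
    with open(path, 'r', newline='') as f:
        for row in csv.reader(f):
            yield row

# Process rows lazily -- only one row is held in memory at any moment.
row_count = sum(1 for _ in read_csv_rows(tmp.name))
print(row_count)  # 2

os.unlink(tmp.name)
```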
# Example 1
nums_list_comprehension = [i * i for i in range(100_000_000)]
sum(nums_list_comprehension) # 333333328333333350000000
Evaluating the expression above can raise a MemoryError, or at least cost a couple of seconds of slowness. A Generator expression gives us the same laziness as a Generator function, without writing a yield statement.

# Example 2
nums_generator = (i * i for i in range(100_000_000))
# <generator object <genexpr> at 0x106ecc580>
sum(nums_generator) # 333333328333333350000000
In Example 1, i * i is evaluated for the entire range of 100_000_000 and stored in memory beforehand. It returns a full list. In Example 2, i * i is only evaluated when being iterated, one value at a time. It returns a Generator expression.

import sys
print(sys.getsizeof(nums_generator)) # 112 bytes
print(sys.getsizeof(nums_list_comprehension)) # 835128600 bytes
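To see this laziness directly, we can pull values out of a Generator expression one at a time with next(); a small illustrative example:

```python
# Each next() call computes exactly one value on demand.
nums = (i * i for i in range(5))
print(next(nums))  # 0
print(next(nums))  # 1
print(list(nums))  # [4, 9, 16] -- only the remaining values
```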
# Continuing from Example 2, with a freshly created nums_generator
sum(nums_generator) # 333333328333333350000000
sum(nums_generator) # 0, because it can only be iterated once.
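Because a Generator can only be iterated once, a common workaround is to wrap its creation in a function, so each call hands back a fresh one; a small sketch:

```python
# Calling the factory returns a brand-new generator each time.
def make_nums():
    return (i * i for i in range(5))

print(sum(make_nums()))  # 30
print(sum(make_nums()))  # 30 -- a fresh generator, not an exhausted one
```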
Here is a cProfile of summing with a List Comprehension vs. a Generator Expression:

# List Comprehension
# ------------------
import cProfile
cProfile.run('sum([i * i for i in range(100_000_000)])')
# 5 function calls in 13.956 seconds
# Ordered by: standard name
# ncalls tottime percall cumtime percall filename:lineno(function)
# 1 8.442 8.442 8.442 8.442 <string>:1(<listcomp>)
# 1 0.841 0.841 13.956 13.956 <string>:1(<module>)
# 1 0.000 0.000 13.956 13.956 {built-in method builtins.exec}
# 1 4.672 4.672 4.672 4.672 {built-in method builtins.sum}
# 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
# Generator Expression
# --------------------
cProfile.run('sum((i * i for i in range(100_000_000)))')
# 100000005 function calls in 22.996 seconds
# Ordered by: standard name
# ncalls tottime percall cumtime percall filename:lineno(function)
# 100000001 11.745 0.000 11.745 0.000 <string>:1(<genexpr>)
# 1 0.000 0.000 22.996 22.996 <string>:1(<module>)
# 1 0.000 0.000 22.996 22.996 {built-in method builtins.exec}
# 1 11.251 11.251 22.996 22.996 {built-in method builtins.sum}
# 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
From the cProfile result above, we can tell that summing with a list comprehension is a lot faster, provided we don't run into memory constraints. Generators trade some speed for a flat memory footprint.
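As a quick sanity check of that trade-off, a list's size grows with its length while a generator object's size stays constant no matter how large the range is (exact byte counts vary by Python version):

```python
import sys

small_list = [i * i for i in range(10)]
big_list = [i * i for i in range(10_000)]
small_gen = (i * i for i in range(10))
big_gen = (i * i for i in range(10_000_000))  # created instantly: nothing is evaluated yet

# The list's size grows with its length ...
print(sys.getsizeof(small_list) < sys.getsizeof(big_list))  # True
# ... while the generator object stays the same size.
print(sys.getsizeof(small_gen) == sys.getsizeof(big_gen))   # True
```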