Iterators and Generators
Before getting to generators, note that iterables and iterators are often confused. Let's clarify the difference first:
| | Iterable | Iterator |
|---|---|---|
| Definition | An object that can be traversed with `for` | An object that remembers the current position and yields elements one by one |
| Examples | `list`, `str`, `dict`, `range` | Return values of `zip()`, `map()`, `filter()` |
| Conversion | `iter(iterable)` → iterator | `next(iterator)` → next element |
| Reusable | Yes | No (exhausted after use) |
```python
nums = [1, 2, 3]   # a list is an iterable, not an iterator
it = iter(nums)    # create an iterator with iter()
print(next(it))    # 1
print(next(it))    # 2
print(next(it))    # 3
try:
    next(it)       # no elements left
except StopIteration:
    print("exhausted")  # next() raises StopIteration at the end

# A list can be iterated multiple times
for n in nums:
    print(n)
for n in nums:     # works again just fine
    print(n)

# An iterator is exhausted after use
it2 = iter(nums)
print(list(it2))   # [1, 2, 3]
print(list(it2))   # [] (already exhausted)
```
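One way to check which column of the table an object belongs to is the abstract base classes in `collections.abc`. The following is a minimal sketch; the helper name `describe` is hypothetical and only serves the illustration.

```python
from collections.abc import Iterable, Iterator

def describe(obj):
    """Return 'iterator', 'iterable', or 'neither' (hypothetical helper)."""
    if isinstance(obj, Iterator):
        return "iterator"   # provides both __iter__ and __next__
    if isinstance(obj, Iterable):
        return "iterable"   # provides __iter__ only
    return "neither"

print(describe([1, 2, 3]))         # iterable
print(describe(iter([1, 2, 3])))   # iterator
print(describe(zip("ab", "cd")))   # iterator
print(describe(42))                # neither
```

Every iterator is also an iterable, which is why the `Iterator` check comes first. Note that these checks look for `__iter__`, so objects that are iterable only through `__getitem__` are not recognized.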
How for loops really work
Behind the scenes, a for loop does this:
```python
for x in [1, 2, 3]:
    print(x)

# Equivalent to:
_iter = iter([1, 2, 3])
while True:
    try:
        x = next(_iter)
        print(x)
    except StopIteration:
        break
```
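Because a `for` loop only relies on `iter()` and `next()`, any class that implements `__iter__` and `__next__` can be looped over. The class below is a sketch written for this illustration (the name `Countdown` is an assumption, not something from the original text).

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the for loop to stop
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)  # prints 3, then 2, then 1
```

The `for` loop never sees the `StopIteration`; it catches the exception internally, just like the `while True` expansion shown above.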
Why zip(), map(), filter() need list()
These functions return iterators, not lists, so you need `list()` whenever you want to see or index all the results at once. The advantage of an iterator is lazy evaluation: elements are not generated all at once but computed on demand, which saves memory.
```python
# map() returns an iterator
result = map(lambda x: x ** 2, range(10))
print(result)        # <map object at 0x...>
print(list(result))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# The advantage is clear when handling a million records
big = map(lambda x: x ** 2, range(1_000_000))
# Does not consume large amounts of memory; computed on demand

# Can be used directly in for loops without converting to a list
for val in map(lambda x: x ** 2, range(5)):
    print(val)
```
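To make "computed on demand" visible, the sketch below records every call to the mapped function. The names `calls` and `square` are introduced only for this illustration.

```python
calls = []

def square(x):
    calls.append(x)  # record each call so we can see when it happens
    return x ** 2

lazy = map(square, [1, 2, 3])
print(calls)        # []: nothing has been computed yet
print(next(lazy))   # 1
print(calls)        # [1]: only the first element was computed
print(list(lazy))   # [4, 9]
print(calls)        # [1, 2, 3]
```

Creating the `map` object does no work at all; each element is computed only when something asks for it.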
Generators
A generator is a special kind of iterator, created by calling a function that uses the `yield` keyword. The function pauses at each `yield` and produces a value, then resumes from where it left off on the next call to `next()`:
```python
def count_up(start, end):
    current = start
    while current <= end:
        yield current   # Pause, yield current
        current += 1    # Resume here next time

gen = count_up(1, 5)
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3

# Can also use a for loop
for n in count_up(1, 5):
    print(n)
```
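Since a generator is an iterator, it follows the same rules as the iterators described earlier: it is its own iterator, and it cannot be restarted once exhausted. A short sketch, reusing `count_up` from above:

```python
def count_up(start, end):
    current = start
    while current <= end:
        yield current
        current += 1

gen = count_up(1, 2)
print(iter(gen) is gen)    # True: a generator is its own iterator
print(next(gen))           # 1
print(next(gen))           # 2
print(next(gen, "done"))   # done: the default is returned instead of raising StopIteration
print(list(gen))           # []: an exhausted generator stays exhausted
```

To iterate again, call the generator function again to get a fresh generator object.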
Comparison with regular functions
```python
import sys

# Regular function: loads all results into memory at once
def squares_list(n):
    return [x ** 2 for x in range(n)]

# Generator function: computed on demand, saves memory
def squares_gen(n):
    for x in range(n):
        yield x ** 2

# Both are used the same way
for sq in squares_gen(5):
    print(sq)  # 0 1 4 9 16

# But memory usage differs greatly (especially noticeable with millions of records).
# Exact numbers depend on the Python version and platform.
print(sys.getsizeof(squares_list(1000)))  # several KB, and it grows with n
print(sys.getsizeof(squares_gen(1000)))   # a couple hundred bytes at most (only the generator object)
```
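`sys.getsizeof` reports only the size of the container object itself, not the integers it refers to. For a rough picture of peak allocation, the standard-library `tracemalloc` module can be used instead. This is a sketch under that assumption; `peak_bytes` is a hypothetical helper, and the absolute numbers will differ between Python versions.

```python
import tracemalloc

def peak_bytes(func):
    """Run func() and return the peak traced allocation in bytes (hypothetical helper)."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

n = 100_000
list_peak = peak_bytes(lambda: sum([x ** 2 for x in range(n)]))  # builds the full list first
gen_peak = peak_bytes(lambda: sum(x ** 2 for x in range(n)))     # one value at a time

print(list_peak > gen_peak)  # True
```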
Generator expressions
A generator expression looks like a list comprehension but uses parentheses instead of square brackets, and it produces a generator directly:
```python
# List comprehension (creates the entire list immediately)
sq_list = [x ** 2 for x in range(10)]

# Generator expression (lazy evaluation)
sq_gen = (x ** 2 for x in range(10))

print(sq_list)       # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(sq_gen)        # <generator object <genexpr> at 0x...>
print(list(sq_gen))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# No need for list() when passing directly to sum(), max(), etc.
total = sum(x ** 2 for x in range(10))
print(total)  # 285
```
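Generator expressions can also be chained into a small pipeline, where each stage pulls from the previous one only when needed. The sample data and variable names below are made up for this illustration.

```python
lines = ["10", "", "x", "25", "7"]  # sample input; stands in for any iterable of strings

stripped = (line.strip() for line in lines)
numeric = (s for s in stripped if s.isdigit())  # drop empty and non-numeric entries
values = (int(s) for s in numeric)

print(sum(values))  # 42
print(sum(values))  # 0: the pipeline is exhausted after one pass
```

No intermediate list is built at any stage, but the whole pipeline can be consumed only once.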
Advanced yield usage
Multiple yield statements
```python
def weekdays():
    yield "Monday"
    yield "Tuesday"
    yield "Wednesday"
    yield "Thursday"
    yield "Friday"

for day in weekdays():
    print(day)
```
yield from: delegate to another iterable
```python
def flatten(nested):
    for sublist in nested:
        yield from sublist  # Equivalent to: for item in sublist: yield item

data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(list(flatten(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```
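`yield from` becomes more useful when the delegate is itself a generator, for example in recursion. The sketch below extends the idea to lists nested to any depth; `flatten_deep` is a hypothetical name, and it treats only `list` instances as nested containers.

```python
def flatten_deep(nested):
    """Recursively flatten lists nested to any depth (hypothetical helper)."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten_deep(item)  # delegate to the recursive generator
        else:
            yield item

data = [1, [2, [3, 4]], [[5], 6]]
print(list(flatten_deep(data)))  # [1, 2, 3, 4, 5, 6]
```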
Practical examples
Infinite sequence
Generators can produce infinite sequences because they don’t need to create all elements upfront:
```python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Get first 10 Fibonacci numbers
gen = fibonacci()
fibs = [next(gen) for _ in range(10)]
print(fibs)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```
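An infinite generator must never be passed to `list()` directly, because it would never finish. The standard-library `itertools` module offers `islice` and `takewhile` for taking a bounded portion; the sketch below applies them to the `fibonacci` generator from above.

```python
from itertools import islice, takewhile

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take a fixed number of elements
first_ten = list(islice(fibonacci(), 10))
print(first_ten)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Take elements until a condition fails
below_100 = list(takewhile(lambda n: n < 100, fibonacci()))
print(below_100)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```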
Processing large data in batches
```python
def read_in_chunks(data, chunk_size):
    """Split large data into small batches for batch processing"""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

records = list(range(1, 101))  # Simulate 100 records
for batch in read_in_chunks(records, chunk_size=10):
    print(f"Processing records {batch[0]}–{batch[-1]}")
# Processing records 1–10
# Processing records 11–20
# ...
# Processing records 91–100
```
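`read_in_chunks` relies on `len()` and slicing, so it works only on sequences such as lists. For sources that are themselves iterators (a file, a generator, a database cursor), a variant based on `itertools.islice` is one possible approach. The name `batched_iter` is hypothetical.

```python
from itertools import islice

def batched_iter(iterable, size):
    """Yield lists of up to `size` items from any iterable (hypothetical helper)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))  # pull at most `size` items
        if not batch:
            return  # source exhausted: end the generator
        yield batch

squares = (x ** 2 for x in range(7))   # a generator supports neither len() nor slicing
print(list(batched_iter(squares, 3)))  # [[0, 1, 4], [9, 16, 25], [36]]
```

On Python 3.12 and later, `itertools.batched` provides similar behavior out of the box, yielding tuples instead of lists.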
Moving average
```python
def moving_average(prices, window):
    """Calculate moving average with generator, no need to store all results at once"""
    buffer = []
    for price in prices:
        buffer.append(price)
        if len(buffer) > window:
            buffer.pop(0)
        if len(buffer) == window:
            yield sum(buffer) / window

prices = [100, 102, 98, 105, 110, 108, 112, 115]
ma3 = list(moving_average(prices, window=3))
print([round(x, 2) for x in ma3])
# [100.0, 101.67, 104.33, 107.67, 110.0, 111.67]
```
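The `buffer.pop(0)` call shifts every remaining element of the list, which gets slower as the window grows. A possible alternative is `collections.deque` with `maxlen`, which discards the oldest item automatically. This is a variant sketch under a hypothetical name, not a replacement for the version above.

```python
from collections import deque

def moving_average_deque(prices, window):
    """Moving average using a fixed-size deque (variant of the example above)."""
    buffer = deque(maxlen=window)  # the oldest item is dropped automatically when full
    for price in prices:
        buffer.append(price)
        if len(buffer) == window:
            yield sum(buffer) / window

prices = [100, 102, 98, 105, 110, 108, 112, 115]
print([round(x, 2) for x in moving_average_deque(prices, window=3)])
# [100.0, 101.67, 104.33, 107.67, 110.0, 111.67]
```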