Iterators and Generators

Iterables and iterators are often confused, so let’s clarify the difference first:

Aspect	Iterable	Iterator
Definition	An object that `iter()` can turn into an iterator, so it can be traversed with `for`	An object that remembers the current position and returns elements one at a time
Examples	`list`, `str`, `dict`, `range`	Return values of `zip()`, `map()`, `filter()`
Key operation	`iter(iterable)` → iterator	`next(iterator)` → next element
Reusable	Yes for the examples above (each `iter()` call starts a fresh pass)	No (exhausted after one pass)

nums = [1, 2, 3]       # list is an iterable, not an iterator

it = iter(nums)        # create an iterator with iter()
print(next(it))  # 1
print(next(it))  # 2
print(next(it))  # 3
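try:
    next(it)           # no elements left, so this raises StopIteration
except StopIteration:
    print("StopIteration raised: the iterator is exhausted")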

# list can be iterated multiple times
for n in nums:
    print(n)
for n in nums:         # works again just fine
    print(n)

# iterator is exhausted after use
it2 = iter(nums)
print(list(it2))       # [1, 2, 3]
print(list(it2))       # [] (already exhausted)
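
To make the table above concrete, here is a minimal sketch of a hand-written iterator. The class name `Countdown` is hypothetical and exists only for illustration; the point is that an iterator implements `__next__()` and returns itself from `__iter__()`, which is why it cannot be restarted.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so a second pass cannot start over.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that no elements are left
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(iter(countdown) is countdown)  # True: the iterator is its own iterator
print(list(countdown))               # [3, 2, 1]
print(list(countdown))               # [] (already exhausted)

nums = [1, 2, 3]
print(iter(nums) is nums)            # False: a list hands out a new iterator each time
```

A list stays reusable because every `iter(nums)` call creates a fresh iterator object, while `Countdown` keeps its position inside itself.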

How for loops really work

Behind the scenes, a for loop does this:

for x in [1, 2, 3]:
    print(x)

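# Roughly equivalent to:
_iter = iter([1, 2, 3])
while True:
    try:
        x = next(_iter)    # ask the iterator for the next element
    except StopIteration:  # raised when the iterator is exhausted
        break
    print(x)               # loop body runs outside the try block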
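
The same expansion can be wrapped in a small helper to show that it works for any iterable, not just lists. This is only a sketch: `for_each` is a hypothetical name, not a built-in, and the real `for` statement also handles `break`, `continue`, and `else`, which are left out here.

```python
def for_each(iterable, body):
    """Hypothetical helper mimicking `for item in iterable: body(item)`."""
    iterator = iter(iterable)       # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)   # step 2: fetch the next element
        except StopIteration:       # step 3: stop when nothing is left
            break
        body(item)


seen = []
for_each("abc", seen.append)        # a str yields its characters
for_each({"k": 1}, seen.append)     # a dict yields its keys

it = iter([10, 20, 30])
next(it)                            # consume 10 by hand
for_each(it, seen.append)           # continues from where the iterator stands

print(seen)  # ['a', 'b', 'c', 'k', 20, 30]
```

Because `iter()` on an iterator returns the iterator itself, looping over a partly consumed iterator picks up at its current position instead of starting over.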

Why zip(), map(), filter() need list()

These functions return iterators, not lists, so wrap the result in list() when you need to print, index, or reuse it. The advantage of an iterator is lazy evaluation: elements are not generated all at once but computed on demand, which saves memory.

# map() returns an iterator
result = map(lambda x: x ** 2, range(10))
print(result)        # <map object at 0x...>
print(list(result))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# The advantage is clear when handling a million records
big = map(lambda x: x ** 2, range(1_000_000))
# Nothing has been computed yet: each square is produced only when requested,
# so the full list of results never sits in memory

# Can directly use in for loops without converting to list
for val in map(lambda x: x ** 2, range(5)):
    print(val)
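
One way to observe lazy evaluation directly is to record each call made by the mapped function. The sketch below uses a hypothetical helper `square` and a `calls` list purely for illustration.

```python
calls = []

def square(x):
    calls.append(x)      # record every time the function actually runs
    return x ** 2

lazy = map(square, [1, 2, 3])
print(calls)             # []        (map() has not called square yet)
print(next(lazy))        # 1
print(calls)             # [1]       (only the first element was computed)
print(list(lazy))        # [4, 9]    (the remaining elements)
print(calls)             # [1, 2, 3]
print(list(lazy))        # []        (the iterator is now exhausted)

# zip() and filter() behave the same way: wrap them in list() to see the values
pairs = list(zip("ab", [1, 2]))
evens = list(filter(lambda n: n % 2 == 0, range(6)))
print(pairs)             # [('a', 1), ('b', 2)]
print(evens)             # [0, 2, 4]
```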

Generators

A generator is a special kind of iterator, created by calling a function that contains the yield keyword. The function pauses at each yield and produces a value, then resumes from where it left off on the next next() call:

def count_up(start, end):
    current = start
    while current <= end:
        yield current       # Pause, yield current
        current += 1        # Resume here next time

gen = count_up(1, 5)
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3

# Can also use a for loop
for n in count_up(1, 5):
    print(n)
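
The pause-and-resume behavior can be traced by logging what runs between the yields. This is a small sketch; the function `traced` and the `log` list are hypothetical names used only for the demonstration.

```python
log = []

def traced():
    log.append("started")
    yield "first"
    log.append("resumed after first")
    yield "second"
    log.append("finished")

gen = traced()
print(log)          # []  (calling the function runs none of its body yet)
print(next(gen))    # first
print(log)          # ['started']
print(next(gen))    # second
print(log)          # ['started', 'resumed after first']

try:
    next(gen)       # runs the rest of the body, then raises StopIteration
except StopIteration:
    print("generator exhausted")

print(log)          # ['started', 'resumed after first', 'finished']
```

Each `next()` call runs the body only up to the following yield, and local state survives between calls.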

Comparison with regular functions

# Regular function: loads all results into memory at once
def squares_list(n):
    return [x ** 2 for x in range(n)]

# Generator function: computed on demand, saves memory
def squares_gen(n):
    for x in range(n):
        yield x ** 2

# Both can be consumed the same way in a for loop
for sq in squares_gen(5):
    print(sq)   # 0 1 4 9 16

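# But memory usage differs greatly (especially noticeable with millions of records)
# Note: sys.getsizeof counts only the container itself, not the integers it references,
# and the exact numbers vary by Python version
import sys
print(sys.getsizeof(squares_list(1000)))   # roughly 8-9 KB for the list object
print(sys.getsizeof(squares_gen(1000)))    # on the order of 100-200 bytes (only the generator object)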
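
Since the exact byte counts depend on the interpreter, a more robust check is to compare how the sizes scale with n. The sketch below redefines the two functions so it runs on its own and assumes CPython, where `sys.getsizeof` reports only the size of the object itself.

```python
import sys

def squares_list(n):
    return [x ** 2 for x in range(n)]

def squares_gen(n):
    for x in range(n):
        yield x ** 2

small_gen = sys.getsizeof(squares_gen(10))
large_gen = sys.getsizeof(squares_gen(1_000_000))
small_list = sys.getsizeof(squares_list(10))
large_list = sys.getsizeof(squares_list(100_000))

print(small_gen == large_gen)    # True: generator size does not depend on n
print(large_list > small_list)   # True: the list grows with n

# Both still produce the same values
print(sum(squares_gen(1000)) == sum(squares_list(1000)))  # True
```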

Generator expressions

A generator expression looks like a list comprehension but uses () instead of [], and it produces a generator directly:

# List comprehension (creates the entire list immediately)
sq_list = [x ** 2 for x in range(10)]

# Generator expression (lazy evaluation)
sq_gen = (x ** 2 for x in range(10))

print(sq_list)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(sq_gen)   # <generator object <genexpr> at 0x...>
print(list(sq_gen))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# No need for list() when passing directly to sum(), max(), etc.
total = sum(x ** 2 for x in range(10))
print(total)  # 285
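
Lazy evaluation also pairs well with functions that can stop early, such as `any()`. In this sketch, `is_big` and the `checked` list are hypothetical helpers that record which elements were actually examined.

```python
checked = []

def is_big(x):
    checked.append(x)    # record each element that gets tested
    return x > 2

numbers = [1, 2, 3, 4, 5]

# Generator expression: any() stops at the first True result
found = any(is_big(x) for x in numbers)
print(found)             # True
print(checked)           # [1, 2, 3]

# List comprehension: every element is tested before any() even starts
checked.clear()
found_list = any([is_big(x) for x in numbers])
print(checked)           # [1, 2, 3, 4, 5]

# Like any iterator, a generator expression can be consumed only once
gen = (x for x in range(3))
print(sum(gen))          # 3
print(sum(gen))          # 0 (already exhausted)
```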

Advanced yield usage

Multiple yield statements

def weekdays():
    yield "Monday"
    yield "Tuesday"
    yield "Wednesday"
    yield "Thursday"
    yield "Friday"

for day in weekdays():
    print(day)

yield from: delegate to another iterable

def flatten(nested):
    for sublist in nested:
        yield from sublist   # Equivalent to: for item in sublist: yield item

data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(list(flatten(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
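
Because `yield from` can delegate to another generator, it also allows a recursive variant that handles arbitrary nesting depth. This is a sketch under the assumption that only `list` objects count as nested containers; `flatten_deep` is a hypothetical name.

```python
def flatten_deep(nested):
    """Hypothetical recursive variant: flattens lists nested to any depth."""
    for item in nested:
        if isinstance(item, list):
            # Delegate to a recursive generator for the inner list
            yield from flatten_deep(item)
        else:
            yield item


data = [1, [2, [3, 4]], [[5], 6], []]
print(list(flatten_deep(data)))  # [1, 2, 3, 4, 5, 6]
```

Checking for `list` only keeps strings intact; treating every iterable as nested would recurse forever on single-character strings.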

Practical examples

Infinite sequence

Generators can produce infinite sequences because they don’t need to create all elements upfront:

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Get first 10 Fibonacci numbers
gen = fibonacci()
fibs = [next(gen) for _ in range(10)]
print(fibs)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
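
As an alternative to calling `next()` in a comprehension, the standard library's `itertools` module can take a bounded slice of an infinite generator. The sketch below repeats the `fibonacci` definition so it runs on its own.

```python
from itertools import islice, takewhile

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take a fixed number of elements
first_ten = list(islice(fibonacci(), 10))
print(first_ten)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Take elements until a condition fails
below_100 = list(takewhile(lambda n: n < 100, fibonacci()))
print(below_100)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Never pass an infinite generator straight to `list()` or `sum()`; always bound it first.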

Processing large data in batches

def read_in_chunks(data, chunk_size):
    """Split large data into small batches for batch processing"""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

records = list(range(1, 101))   # Simulate 100 records

for batch in read_in_chunks(records, chunk_size=10):
    print(f"Processing records {batch[0]}–{batch[-1]}")

# Processing records 1–10
# Processing records 11–20
# ...
# Processing records 91–100
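
The version above relies on `len()` and slicing, so it needs a sequence such as a list. A possible variant that accepts any iterable, including another generator, can be sketched with `itertools.islice`; the name `iter_chunks` is hypothetical.

```python
from itertools import islice

def iter_chunks(iterable, chunk_size):
    """Hypothetical variant: yield lists of up to chunk_size items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # Pull at most chunk_size items from the shared iterator
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return          # source exhausted, end the generator
        yield chunk


stream = (n * n for n in range(1, 8))   # a generator has no len() and no slicing
print(list(iter_chunks(stream, 3)))     # [[1, 4, 9], [16, 25, 36], [49]]
```

Only one chunk is held in memory at a time, so the source itself can be lazy. Python 3.12 and later also offer `itertools.batched` for the same purpose, yielding tuples instead of lists.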

Moving average

def moving_average(prices, window):
    """Yield the average of each full window, so results are never stored all at once"""
    buffer = []
    for price in prices:
        buffer.append(price)
        if len(buffer) > window:
            buffer.pop(0)
        if len(buffer) == window:
            yield sum(buffer) / window

prices = [100, 102, 98, 105, 110, 108, 112, 115]
ma3 = list(moving_average(prices, window=3))
print([round(x, 2) for x in ma3])
# [100.0, 101.67, 104.33, 107.67, 110.0, 111.67]
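
For large windows, `buffer.pop(0)` and `sum(buffer)` each cost time proportional to the window size. One possible refinement, sketched here under the hypothetical name `moving_average_deque`, keeps a running total and uses `collections.deque` with `maxlen` so each step is constant time.

```python
from collections import deque

def moving_average_deque(prices, window):
    """Hypothetical variant using a bounded deque and a running total."""
    buffer = deque(maxlen=window)
    running_total = 0
    for price in prices:
        if len(buffer) == window:
            running_total -= buffer[0]   # oldest value is about to be evicted
        buffer.append(price)             # deque drops the oldest item automatically
        running_total += price
        if len(buffer) == window:
            yield running_total / window


prices = [100, 102, 98, 105, 110, 108, 112, 115]
print([round(x, 2) for x in moving_average_deque(prices, 3)])
# [100.0, 101.67, 104.33, 107.67, 110.0, 111.67]
```

With float inputs a running total can accumulate small rounding errors over very long streams, so re-summing the window may still be preferable when exactness matters more than speed.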