Python Functions: Generators, Extended

I. Advanced Generator Features

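def g():
    try:
        while True:
            received = yield
            print('received:', received)
    except GeneratorExit:
        print('got close')
    except ValueError as e:
        print('got throw:', e)
        yield 'exception handled'

gen = g()
next(gen)                      # prime: run to the first yield
gen.send('Hello')              # prints: received: Hello
gen.throw(ValueError('boom'))  # prints: got throw: boom; returns 'exception handled'
gen.close()                    # prints nothing: the generator is paused inside the
                               # ValueError handler, outside the try body

gen2 = g()
next(gen2)
gen2.close()                   # prints: got close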

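send(value) → makes value the result of the paused yield expression → runs the generator until the next yield or return.
close() → raises GeneratorExit at the paused yield → if the generator returns or lets the exception propagate, close() returns normally; if it yields another value instead, close() raises RuntimeError.
throw() → raises the given exception at the paused yield; otherwise it behaves like send: it returns the next yielded value, raises StopIteration if the generator finishes, or re-raises the exception if the generator does not catch it.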

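1. The generator `send(value)` method

Purpose: inject value as the result of the current yield expression and resume until the next yield
Argument: any object
Returns: the value produced by the next yield
Typical exceptions: StopIteration if the generator finishes without yielding again; TypeError if a non-None value is sent before the generator has started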

def echo():
    while True:
        received = yield
        print(received)

e = echo()
next(e)  # start the generator
e.send('Hello')  # prints Hello
e.send('World')  # prints World

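1.1 Concept

send(value) passes outside data into the generator through a yield expression and immediately resumes it. The generator runs until it reaches the next yield or finishes, in which case send() raises StopIteration.

1.2 Basic syntax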

def coro():
    received = yield 1          # ① yields 1 first; ② the value sent in is bound to received
    print("received:", received)
    yield 2

gen = coro()
first = next(gen)               # the first call must prime the generator; it pauses at ①
print(first)                    # prints: 1
second = gen.send("Hello")      # binds "Hello" to received (②); prints: received: Hello; second == 2

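1.3 Step-by-step trace

Step	Generator code	Caller	Generator state	Value returned to caller
1	`received = yield 1`	`next(gen)`	paused at yield 1	1
2	`print(received)`	`gen.send("Hello")`	running	- (call still in progress)
3	`yield 2`	same call	paused at yield 2	2
4	end of function	a further `next(gen)`	finished; StopIteration raised	-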

1.4 `send()` vs. `next()`

Call	Injects a value	Can start the generator	Typical use
`next(gen)`	no	✅	just advance
`gen.send(None)`	no	✅	same as `next`
`gen.send(v)` (v not None)	yes	❌	two-way communication

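1.5 Async scenario: the precursor of coroutines

Before async/await existed, generators plus send() were the mechanism underneath coroutines:

yield suspends the task while it waits for I/O;
the event loop calls send(result) to deliver the I/O result back into the generator, which then continues "asynchronously".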
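
The hand-off described above can be sketched with a toy scheduler. This is only an illustration of the idea, not how asyncio is implemented: `FAKE_RESPONSES`, `fetch_task` and `run` are hypothetical names, and the "I/O" is just a dictionary lookup.

```python
# Toy illustration of generator-based coroutines: the "event loop" resumes
# a task with send(result) each time the requested "I/O" completes.
# All names here (FAKE_RESPONSES, fetch_task, run) are hypothetical.

FAKE_RESPONSES = {"/a": "alpha", "/b": "beta"}

def fetch_task(log):
    # Each yield hands a request to the loop and pauses until the loop
    # sends the result back.
    first = yield "/a"
    log.append(first)
    second = yield "/b"
    log.append(second)
    return first + "+" + second

def run(task):
    """Drive one task to completion, feeding results back with send()."""
    try:
        request = next(task)                   # prime: run to the first yield
        while True:
            result = FAKE_RESPONSES[request]   # pretend the I/O finished
            request = task.send(result)        # resume the task with the result
    except StopIteration as stop:
        return stop.value                      # the task's return value

log = []
final = run(fetch_task(log))
print(log, final)
```

A real event loop would keep many such tasks and resume whichever one has a result ready; the single-task loop here only shows the send() round trip.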

1.6 Complete example: a counting coroutine with injected steps

def counter():
    count = 0
    while True:
        step = yield count      # yield the current value, wait for a step from outside
        if step is None:        # resumed by next() rather than send(): default to 1
            step = 1
        count += step

c = counter()
print(next(c), end=',')          # 0
print(c.send(5), end=',')        # 5  (0 + 5)
print(c.send(-2))                # 3  (5 - 2)

Output: 0,5,3

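1.7 Common pitfalls

Forgetting the first `next()`

Calling gen.send(123) on a fresh generator raises TypeError, because the generator has not started and no yield expression is waiting to receive the value.

`return` before the next `yield`

If the generator returns right after being resumed by send(), send() raises StopIteration(value), where value is whatever was returned.

Loop-variable closures

Generators and generator expressions created inside a loop or comprehension look up the loop variable when they run, not when they are created. This is the same late-binding behaviour as lambda closures, so pass the value in as an argument if each generator should keep its own.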
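
The first two pitfalls are easy to reproduce. The sketch below uses a hypothetical `greeter` generator; it is meant only to show which exception appears in each case.

```python
def greeter():
    name = yield "ready"
    return f"hello {name}"        # returning ends the generator

# Pitfall 1: sending a non-None value before the generator has started.
saw_type_error = False
g1 = greeter()
try:
    g1.send("Ann")
except TypeError:
    saw_type_error = True

# Pitfall 2: a return right after the resumed yield surfaces as StopIteration,
# and the returned object travels in its .value attribute.
returned = None
g2 = greeter()
next(g2)                          # prime: runs to the first yield
try:
    g2.send("Ann")
except StopIteration as stop:
    returned = stop.value

print(saw_type_error, returned)
```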

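2. The generator `close()` method

Purpose: raise GeneratorExit inside the generator at the paused yield, giving it a chance to clean up
Argument: none
Returns: None (since Python 3.13, the generator's return value if it returns one while being closed)
Typical exceptions: RuntimeError if the generator yields another value instead of exiting; any other exception the generator raises propagates to the caller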

def limited_counter(n):
    try:
        for i in range(n):
            yield i
    finally:
        print('Generator closed')

lc = limited_counter(3)
print(next(lc))  # prints 0
lc.close()  # prints 'Generator closed'
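
What close() does when a generator refuses to stop is worth seeing once. In this sketch, `well_behaved` and `stubborn` are hypothetical generators: the first cleans up in a finally block, the second yields again after catching GeneratorExit, which close() reports as RuntimeError.

```python
events = []

def well_behaved():
    try:
        yield 1
    finally:
        events.append("cleanup ran")   # runs when close() raises GeneratorExit

def stubborn():
    try:
        yield 1
    except GeneratorExit:
        yield 2                        # yielding while being closed is an error

g = well_behaved()
next(g)
g.close()                              # cleanup runs, close() returns normally
g.close()                              # closing a finished generator does nothing

s = stubborn()
next(s)
try:
    s.close()
except RuntimeError as exc:
    events.append(f"RuntimeError: {exc}")

print(events)
```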

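3. The generator `throw(type[, value[, traceback]])` method

Purpose: raise the given exception inside the generator at the paused yield; execution continues only if the generator catches it
Argument: an exception instance (the multi-argument form shown in the heading is deprecated since Python 3.12)
Returns: the value produced by the next yield
Typical exceptions: StopIteration if the generator finishes without yielding again; the thrown exception itself if the generator does not catch it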

def limited_counter(n):
    try:
        for i in range(n):
            yield i
    except ValueError as e:
        print(f'Caught exception: {e}')

lc = limited_counter(3)
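print(next(lc))  # prints 0
try:
    lc.throw(ValueError('Invalid input'))  # prints 'Caught exception: Invalid input'
except StopIteration:
    pass  # the handler does not yield again, so the generator finishes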

4. Generator expressions

# Sequence of squares
squares = (x**2 for x in range(1000000))  # lazy: no list of a million values is built

# Roughly equivalent to
def squares_gen():
    for x in range(1000000):
        yield x**2

5. Delegating with `yield from`

5.1 Basic syntax

def generator(iterable):
    yield from iterable

# iterable can be any iterable object: a generator, list, tuple, and so on.

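5.2 How it works

When execution reaches yield from iterable, Python:

Iterates over the elements of iterable.
Yields each element in turn to the caller of the outer generator.
If iterable is a generator, also forwards send(), throw() and close() to it. When that generator finishes, yield from absorbs its StopIteration, the generator's return value becomes the value of the yield from expression, and the outer generator carries on with the statement that follows.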
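
The less obvious parts of this behaviour are that send() passes through to the subgenerator and that its return value comes back as the value of the yield from expression. A minimal sketch, with hypothetical names `averager` and `delegator`, and with None used as this example's own "finish" signal:

```python
def averager():
    """Subgenerator: receives numbers via send(), returns their mean."""
    total, count = 0, 0
    while True:
        value = yield
        if value is None:          # None is this sketch's "finish" signal
            break
        total += value
        count += 1
    return total / count if count else 0.0

def delegator(results):
    # send() calls from the caller pass straight through to averager();
    # its return value becomes the value of the yield from expression.
    mean = yield from averager()
    results.append(mean)
    yield "done"                   # the outer generator keeps running

results = []
d = delegator(results)
next(d)                            # prime: runs into averager's first yield
d.send(10)
d.send(20)
status = d.send(None)              # averager returns; delegator continues
print(results, status)
```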

5.3 Examples

# Yield values from a list

def generator():
    yield from [1, 2, 3, 4, 5]

for value in generator():
    print(value, end=" ") # 1 2 3 4 5

# Yield values from another generator

def inner_generator():
    yield from range(3)
    yield from [10, 20, 30]

def outer_generator():
    yield from inner_generator()

for value in outer_generator():
    print(value, end=" ") # 0 1 2 10 20 30

5.4 Advanced usage

# Simplifying a recursive generator with yield from

def recursive_generator(n):
    if n > 0:
        yield from recursive_generator(n-1)
    yield n

for value in recursive_generator(3):
    print(value, end=" ") # 0 1 2 3

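Creating the generator:
  Calling recursive_generator(3) creates a generator object.
  At this point n is 3.

Recursive call:
  The n > 0 check passes because n is 3.
  yield from recursive_generator(n-1) runs, i.e. yield from recursive_generator(2).
  This calls the generator function again, this time with n equal to 2.

…

Base case:
  The n > 0 check fails because n is 0.
  yield n runs, i.e. yield 0, and the first value is produced.
  0 passes up through every enclosing yield from to the caller, and recursive_generator(0) finishes.

Unwinding:
  recursive_generator(1) has forwarded 0; it moves on to yield n, i.e. yield 1.
  recursive_generator(2) has forwarded 0 and 1; it moves on to yield n, i.e. yield 2.
  recursive_generator(3) has forwarded 0, 1 and 2; it moves on to yield n, i.e. yield 3.

Output:
  The outer loop for value in recursive_generator(3) receives 0, 1, 2, 3 in order and prints them.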

# Handling errors inside a generator

def inner_gen():
    try:
        yield 1
        yield 2
        raise Exception('Something went wrong')
    except Exception as e:
        yield 'Error handled'

def outer_gen():
    yield from inner_gen()

for value in outer_gen():
    print(value, end=" ") # 1 2 Error handled

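5.5 Difference from `yield`

yield: produces a single value from the generator; the next call to next() resumes from where it stopped.
yield from: forwards the output of another generator or iterable through this generator. It lets a generator delegate to a subgenerator, including send(), throw() and close() calls and the subgenerator's return value, which removes the boilerplate loop.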

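II. Practical Uses of Generators

1. Processing large files and datasets

1.1 Concept

A generator turns "read everything at once" into "stream on demand": only the current line (or chunk) is held in memory, so GB-sized files can be processed in O(1) memory.

1.2 Memory model comparison

Approach	Peak memory	Notes
`f.read()` or `list(f)`	size of the file	entire content loaded
`for line in f:`	one line	plus the built-in read buffer
yield generator	one line or chunk + the generator object	chunk size is under your control and can be tuned down further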

Plain list approach:
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│ Read all data     │──▶│ Process all data  │──▶│ Store result list │
└───────────────────┘   └───────────────────┘   └───────────────────┘
          ▲                       ▲                       ▲
          └── high memory use ────┴── high memory use ────┘

Generator pipeline approach:
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│ Read item by item │──▶│ Process each item │──▶│ Emit each result  │
└───────────────────┘   └───────────────────┘   └───────────────────┘
          ▲                       ▲                       ▲
          └── low memory use ─────┴── low memory use ─────┘

1.3 Three typical scenarios

Line-by-line text
Chunked binary
Database / network streams
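
The database case gets no example of its own in this part, so here is a small sketch of it. It assumes SQLite and a made-up `events` table; `stream_rows` is a hypothetical helper that pulls rows in small batches with `fetchmany` and yields them one at a time.

```python
import sqlite3

def stream_rows(conn, query, batch_size=2):
    """Yield rows one at a time while fetching them in small batches."""
    cur = conn.execute(query)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

# Demo on an in-memory database; the table and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events(id INTEGER, level TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "INFO"), (2, "ERROR"), (3, "INFO"), (4, "ERROR"), (5, "ERROR")],
)

rows = stream_rows(conn, "SELECT id, level FROM events ORDER BY id")
error_ids = [row_id for row_id, level in rows if level == "ERROR"]
conn.close()
print(error_ids)
```

The batch size is the knob here: larger batches mean fewer round trips, smaller ones mean less memory held at once.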

1.4 Line-by-line example: counting ERROR lines in a 1 GB log

import re
pattern = re.compile(r'ERROR', re.I)

def read_lines(path):
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            yield line

def filter_errors(lines):
    for line in lines:
        if pattern.search(line):
            yield line

def count_errors(path):
    return sum(1 for _ in filter_errors(read_lines(path)))

# Usage
print(count_errors('big.log'))   # memory stays small: only one line is held at a time

1.5 Chunked binary: SHA-256 of a large file

import hashlib

def read_chunks(path, size=1024*1024):
    with open(path, 'rb') as f:
        while chunk := f.read(size):
            yield chunk

def sha256_file(path):
    h = hashlib.sha256()
    for chunk in read_chunks(path):
        h.update(chunk)
    return h.hexdigest()

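1.6 Performance and advanced techniques

Technique	Code snippet	Effect
Buffer size	`f.read(size)`	tunes the I/O chunk size
Lazy conversion	`(int(x) for x in lines)`	converts on demand, with no intermediate list
Chaining files	`yield from file_gen`	seamless concatenation
Concurrency	`async for chunk in stream:` (`stream` is a placeholder for any async iterable)	works with coroutines via async generators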
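
The "lazy conversion" and "chaining files" rows combine naturally. The sketch below assumes a hypothetical helper `read_many` and two small temporary files created just for the demo.

```python
import os
import tempfile

def read_lines(path):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

def read_many(paths):
    """Chain several files into one stream of lines."""
    for path in paths:
        yield from read_lines(path)

# Demo with two small temporary files (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a.log", "1\n2\n"), ("b.log", "3\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)

    numbers = (int(line) for line in read_many(paths))  # lazy conversion
    total = sum(numbers)                                # files are read here

print(total)
```

Nothing is opened until `sum` starts pulling values, and each file is closed as soon as its last line has been consumed.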

1.7 End-to-end example: CSV → clean → load into a database (millions of rows)

import csv
import sqlite3

def read_csv(path):
    with open(path, newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row

def clean(rows):
    for r in rows:
        # strip whitespace and convert types
        yield {
            'id': int(r['id']),
            'name': r['name'].strip(),
            'score': float(r['score'])
        }

def insert_db(rows, db='big.db'):
    conn = sqlite3.connect(db)
    cur = conn.cursor()
    cur.execute('CREATE TABLE IF NOT EXISTS scores(id INT,name TEXT,score REAL)')
    for row in rows:
        cur.execute('INSERT INTO scores VALUES (?,?,?)',
                    (row['id'], row['name'], row['score']))
    conn.commit()
    conn.close()

# Pipeline
insert_db(clean(read_csv('scores.csv')))

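2. Infinite sequences

2.1 Concept

Put while True: and yield together and you get a data stream that is produced on demand and never runs out. The consumer takes values with next() or for x in gen: as needed, and memory stays O(1).

2.2 Minimal skeleton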

def naturals():
    n = 0
    while True:
        yield n
        n += 1

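2.3 Memory comparison

Approach	Storage	Peak memory
`list(range(10**8))`	everything materialized	roughly 3 to 4 GB (64-bit CPython)
infinite `yield`	produced on demand	a few hundred bytes at most

2.4 Four classic infinite sequences

Natural numbers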

def naturals(start=0):
    while True:
        yield start
        start += 1

Fibonacci

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

Arithmetic / geometric sequences

def arithmetic(a0, d):
    while True:
        yield a0
        a0 += d

def geometric(a0, r):
    while True:
        yield a0
        a0 *= r

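Prime stream (incremental sieve of Eratosthenes)

def primes():
    sieve = {}                     # upcoming composite -> a prime that divides it
    n = 2
    while True:
        if n not in sieve:
            yield n                # no recorded factor, so n is prime
            sieve[n * n] = n       # first composite that n must mark
        else:
            p = sieve.pop(n)       # n is composite; p is one of its prime factors
            m = n + p              # move p to its next multiple not yet claimed
            while m in sieve:
                m += p
            sieve[m] = p
        n += 1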

2.5 Slicing techniques

Need	Tool	Example
First N items	`itertools.islice`	`list(islice(gen, 10))`
Leading items while a condition holds	`itertools.takewhile`	`list(takewhile(lambda x: x < 100, gen))`
Skip leading items while a condition holds	`itertools.dropwhile`	`list(islice(dropwhile(lambda x: x < 100, gen), 10))`
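
A short sketch of the three tools on infinite streams, using `itertools.count` as a stand-in source. One caution it illustrates: `dropwhile` on an infinite stream is itself infinite, so it has to be bounded with `islice` before being turned into a list.

```python
from itertools import count, dropwhile, islice, takewhile

# First N items of an infinite stream.
first_five = list(islice(count(0), 5))

# Leading items while the condition holds; stops at the first failure.
below_20 = list(takewhile(lambda x: x < 20, count(0, 5)))

# Skip leading items while the condition holds, then take three of the rest.
from_100 = list(islice(dropwhile(lambda x: x < 100, count(0, 25)), 3))

# Skipping a fixed number of items is a job for islice with a start index.
skip_three = list(islice(count(0), 3, 6))

print(first_five, below_20, from_100, skip_three)
```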

2.6 Worked example: infinite prime stream → the first 10 primes of the form 4k+1

from itertools import islice

def primes_4k1():
    p = primes()
    for n in p:
        if n % 4 == 1:
            yield n

print(list(islice(primes_4k1(), 10)))
# [5, 13, 17, 29, 37, 41, 53, 61, 73, 89]

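3. Data pipelines

3.1 One-sentence definition

A "generator data pipeline" chains a series of generators together like a Unix pipe: each upstream stage keeps yielding, each downstream stage consumes it with for x in upstream or yield from, and data flows through stage by stage with very little memory in use.

3.2 Minimal skeleton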

# stage1: read data
def read_data(source):
    for item in source:
        yield item

# stage2: filter data
def filter_data(data):
    for item in data:
        if item % 2 == 0:
            yield item

# stage3: transform data
def process_data(data):
    for item in data:
        yield item * 2

# Compose the generator pipeline
pipeline = process_data(filter_data(read_data(range(10))))
print(list(pipeline))  # [0, 4, 8, 12, 16]

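3.3 Step-by-step diagram

read_data         filter_data        process_data
────────────      ────────────       ────────────
yield 0      ──►   yield 0     ──►   yield 0
yield 1      ──►   skipped           (nothing)
yield 2      ──►   yield 2     ──►   yield 4
...                ...               ...

Data is produced on demand: upstream makes only as much as downstream consumes.
If the consumer breaks out at any stage, no further items are requested from upstream, so nothing is computed in vain.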
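
The on-demand behaviour can be observed directly by recording what the first stage actually produces. In this sketch the stage names `source`, `evens` and `doubled` and the `produced` log are hypothetical; the source could yield a million items but is only asked for a handful.

```python
produced = []

def source():
    for i in range(1_000_000):
        produced.append(i)     # record what the upstream stage actually made
        yield i

def evens(items):
    for item in items:
        if item % 2 == 0:
            yield item

def doubled(items):
    for item in items:
        yield item * 2

results = []
for value in doubled(evens(source())):
    if value > 8:
        break                  # stop consuming: no further items are requested
    results.append(value)

print(results, len(produced))
```

Only the items needed to reach the first value above 8 were ever generated; the rest of the range was never touched.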

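3.4 Three ways to connect stages

Connection	Syntax	Characteristics
Plain for	`for x in g: yield f(x)`	most general; leaves room for extra logic
`yield from`	`yield from g`	one-line delegation; also forwards send/throw/close
Generator expression	`(f(x) for x in g)`	most compact, but limited to a single expression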

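3.5 Performance comparison

Approach	Peak memory	Time to first result
List comprehension	O(n), everything materialized	after the whole list is built
Generator pipeline	O(1)	immediate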
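
The memory column can be checked roughly with `sys.getsizeof`. This is only a sketch: `getsizeof` reports the container itself (for a list, its pointer array, not the integers it refers to), and exact figures vary between Python versions, so none are quoted here.

```python
import sys

n = 100_000
as_list = [x * 2 for x in range(n)]      # materializes every element
as_gen = (x * 2 for x in range(n))       # holds only its own paused state

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

# Both produce the same values; only the storage strategy differs.
same_total = sum(as_gen) == sum(as_list)

print(list_size, gen_size, same_total)
```

The list's size grows with n, while the generator's stays the same however many items it will eventually produce.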

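4. Implementing state machines

The execution position and local variables of a generator function are kept in its frame object, so every next() / send() resumes exactly where the last one paused. In effect it is a "goto with memory".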

def state_machine():
    state = "START"
    while True:
        if state == "START":
            event = yield          # wait for the next input
            if event == "A":
                state = "STATE_A"
        elif state == "STATE_A":
            event = yield
            if event == "B":
                state = "END"
        else:
            return                 # END: finishing raises StopIteration in the caller
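
To see such a machine run, something has to feed it events. The sketch below is a variation rather than the exact code above: it uses the same transitions (START, then A, then B to reach END) but yields the current state so the driver can observe it. `run_events` is a hypothetical helper.

```python
def state_machine():
    # Same transitions as above: START --A--> STATE_A --B--> END.
    state = "START"
    while state != "END":
        event = yield state          # report the current state, wait for input
        if state == "START" and event == "A":
            state = "STATE_A"
        elif state == "STATE_A" and event == "B":
            state = "END"

def run_events(events):
    """Feed events to the machine; return the states seen and whether it finished."""
    machine = state_machine()
    seen = [next(machine)]           # prime: the machine reports "START"
    for event in events:
        try:
            seen.append(machine.send(event))
        except StopIteration:
            return seen, True        # the machine reached END and returned
    return seen, False

finished_run = run_events(["x", "A", "B"])
unfinished_run = run_events(["A", "A"])
print(finished_run)
print(unfinished_run)
```

Unrecognized events leave the state unchanged, which is why "x" in the first run and the second "A" in the second run simply repeat the current state.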

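III. Generators and Coroutines

Generators gained send(), throw() and close() in Python 2.5 (PEP 342), which made them usable as coroutines. Python 3.5 added native coroutines with async def / await, and Python 3.6 added asynchronous generators. Generators and coroutines share much of the same machinery underneath, but their purpose and semantics differ.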

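import asyncio

async def async_counter(n):
    for i in range(n):
        yield i  # yield inside async def makes this an async generator (3.6+)

async def main():
    async for i in async_counter(3):  # async for is only valid inside async def
        print(i)  # prints 0, 1, 2 on separate lines

asyncio.run(main())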
