Don't trust an agent's self-discipline: when three duplicates quietly slipped past the REJECT rules in mining, fix it structurally with tool-layer blocking

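I ended up proving, on my own service's article mining pipeline, that an agent will break even rules written explicitly in the prompt. Three duplicates slipped through. This post covers the root cause and the fix: welding TF-IDF duplicate detection into the tool layer so that the rule structurally cannot be bypassed. An enforcement principle for agent design.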

By 清水真二 · 2026-06-10 · 2 min read

#ai #llm #agent #prompt #indiedev

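"An agent will break even rules written explicitly in the prompt." I had to prove this myself on my own service's article mining pipeline. I only noticed after three duplicates had already slipped through.

Conclusion first: do not rely on the agent's self-discipline to enforce rules; block at the tool layer. Enforcement that is not welded into the structure cannot be trusted.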

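1. The mining prompt spelled it out

Kotonia's tech blog already has 58 existing articles in each of ja/en/zh. When mining for new ideas, I don't want them to overlap with what is already published.

prompts/mine.md states the REJECT rule explicitly: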

5. CONCEPT-LEVEL DEDUPE — mandatory before every push:
   - Hit with importance_score >= 7: REJECT the candidate.
   - Hit with importance_score 4-6: only push if genuinely new angle.
   - No hits: green light.

In other words, every candidate is checked once with art-concepts-find, and anything that overlaps a flagship article (importance ≥ 7) is thrown out. The agent does read this instruction.
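The rule itself is small enough to write down as a function. The following is a minimal sketch of the decision the prompt asks the agent to make, not code from the pipeline: the `Hit` type and `dedupe_decision` name are hypothetical, and the behavior for hits scoring below 4 is an assumption, since the prompt excerpt does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    slug: str              # existing article returned by the concept search
    importance_score: int  # importance of that article


def dedupe_decision(hits: list[Hit]) -> str:
    """Map concept-search hits to 'reject', 'needs_new_angle', or 'push'."""
    if not hits:
        return "push"  # no hits: green light
    top = max(h.importance_score for h in hits)
    if top >= 7:
        return "reject"  # overlaps a flagship article
    if top >= 4:
        return "needs_new_angle"  # push only with a genuinely new angle
    # Assumption: the prompt says nothing about scores below 4; treated as a pass.
    return "push"


print(dedupe_decision([Hit("voice-first-local-llm", 9)]))
```

Written this way, the only step that needs judgment is the middle branch. The other two are mechanical, which is what makes them candidates for enforcement outside the agent.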

In the actual mining run, the agent called art-concepts-find 15 times. At first glance, dedupe was being done.

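2. Three items clearly slipped through

After the run finished I looked at the pending pool and found three items that violated the REJECT rule.

1. The "streaming granularity" idea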

title: The hidden metric behind "perceived speed" in Voice Chat: streaming granularity (tokens per chunk)
sources: memory/streaming_granularity_voice_metric.md
score: 8

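This one simply replays §3.3, "Streaming granularity: the structural difference that decides the voice experience", of the existing article voice-first-local-llm (importance=9). Tokens per chunk, Local 1.0 / Haiku 10-16 / Gemini 8-24: same numbers, same thesis. The agent called art-concepts-find "粒度" (granularity) and should have seen the importance=9 hit, yet it pushed anyway.

2. A duplicate inside the pool

title (old): Rising interest in video and real NSFW demand: a signal for the product roadmap
title (new): The NSFW demand signal in video generation: pulling a product roadmap out of the logs
source: memory/video_nsfw_demand_signal.md  ← same source on both

A leftover row from mining v1 and a new row from mining v2 pushed the same idea from the same memory file. The comparison against ideas already in the pool was missed as well.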
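In this particular case both rows cite the same source file, so even a plain overlap check on sources would have flagged it. The sketch below shows that cheap check. It is not the fix described later in this post, which uses cosine similarity against the pool, and the row layout (a `sources` list per pool row) is an assumption.

```python
def rows_sharing_sources(new_sources: list[str], pool: list[dict]) -> list[dict]:
    """Return pool rows that cite at least one of the same source files."""
    wanted = set(new_sources)
    return [row for row in pool if wanted & set(row.get("sources", []))]


# Hypothetical pool row shaped after the example above.
pool = [
    {
        "title": "Rising interest in video and real NSFW demand",
        "sources": ["memory/video_nsfw_demand_signal.md"],
    },
]

clashes = rows_sharing_sources(["memory/video_nsfw_demand_signal.md"], pool)
print(len(clashes))
```

A shared source does not always mean a duplicate idea, since one memory file can feed several angles, so this works better as a warning than as a hard block.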

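3. The "unlocking OpenWeight model capabilities" idea

title: Unlocking the "hidden capabilities" of OpenWeight models: NSFW unlocking through caption vocabulary repair
sources: memory/caption_vocab_repair_methodology.md
score: 9

The memory file caption_vocab_repair_methodology.md is the very material that the existing article hidream-caption-vocab-repair (importance=8) was written from. Same memory, same topic, same article. It is a complete duplicate, and the agent still pushed it as a "new idea".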

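3. Why it happened: the missing enforcement layer

For all three items, the logs confirm that the agent did call art-concepts-find. The tool worked and returned correct results. The agent made the decision to push after seeing those results.

So the problem is one of the following:

The agent misread the result (there was a hit, but it registered "none")
The agent decided on its own that "this is a new angle, so it counts as something different"
The agent forgot, partway through generation, that the rule existed at all

Whichever it was, the structure is the same: a prompt-level rule was left to the agent's self-discipline. The enforcer was the agent itself. That arrangement failed.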

4. The structural fix: weld blocking into the tool layer

I embedded the dedup gate directly into art-ideas-add (the tool that pushes an idea into the pool).

# Runs unconditionally inside art-ideas-add, before the append
verdict = evaluate_idea(title, angle, sources, ...)
if not verdict["allow"] and not args.force:
    sys.stderr.write(json.dumps(verdict["conflicts"]))
    sys.exit(1)  # ← the agent is stopped here

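Internally, evaluate_idea() is TF-IDF over a canonical concept vocabulary:

Vocabulary = the union of all concepts_covered_ja[] in articles_index.jsonl
IDF weighting: the rarer the concept, the higher its weight
ASCII terms must match on a word boundary (rules out false positives such as "check" matching "checkout")
Generic JP nouns (モデル / システム / アーキテクチャ / ...) are filtered out through a noise list
Cosine similarity ≥ 0.25 against an article with importance ≥ 7 → REJECT
Cosine similarity ≥ 0.35 against an existing idea → REJECT (this also covers duplicates inside the pool)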
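The bullets above can be sketched in plain Python. This is an illustration under stated assumptions, not the actual evaluate_idea(): the signature, the record fields (`slug`, `importance`, `concepts`, `title`), the smoothed IDF formula (the one scikit-learn's TfidfVectorizer uses by default), and the toy index are all made up for the example. Only the two thresholds, the word-boundary rule, and the noise-list idea come from the text.

```python
import math
import re
from collections import Counter

NOISE = {"モデル", "システム", "アーキテクチャ"}  # partial noise list from the text
ARTICLE_THRESHOLD = 0.25  # against articles with importance >= 7
IDEA_THRESHOLD = 0.35     # against ideas already in the pool


def build_idf(articles):
    """IDF over the union of all article concepts; rarer concepts weigh more."""
    n = len(articles)
    df = Counter(c for a in articles for c in set(a["concepts"]) if c not in NOISE)
    # Assumption: smoothed IDF; the source does not give the exact formula.
    return {c: math.log((1 + n) / (1 + d)) + 1.0 for c, d in df.items()}


def count_term(term, text):
    """Count a concept in text; ASCII terms must sit on word boundaries."""
    if term.isascii():
        # re.ASCII keeps \b usable when the term is glued to CJK characters.
        pattern = rf"\b{re.escape(term)}\b"
        return len(re.findall(pattern, text, flags=re.IGNORECASE | re.ASCII))
    return text.count(term)


def vectorize(text, idf):
    """Sparse TF-IDF vector of free text over the concept vocabulary."""
    counts = {term: count_term(term, text) for term in idf}
    return {term: tf * idf[term] for term, tf in counts.items() if tf}


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def evaluate_idea(text, articles, pool, idf):
    """Return {"allow": bool, "conflicts": [...]} for a candidate idea."""
    vec = vectorize(text, idf)
    conflicts = []
    for a in articles:
        if a["importance"] < 7:
            continue  # only flagship articles can block
        article_vec = {c: idf[c] for c in a["concepts"] if c in idf}
        sim = cosine(vec, article_vec)
        if sim >= ARTICLE_THRESHOLD:
            conflicts.append({"kind": "article", "id": a["slug"], "similarity": round(sim, 3)})
    for idea in pool:
        sim = cosine(vec, vectorize(idea["title"], idf))
        if sim >= IDEA_THRESHOLD:
            conflicts.append({"kind": "idea", "id": idea["title"], "similarity": round(sim, 3)})
    return {"allow": not conflicts, "conflicts": conflicts}


# Toy index, not the real articles_index.jsonl.
articles = [
    {"slug": "toy-article-a", "importance": 9,
     "concepts": ["streaming granularity", "tokens per chunk"]},
    {"slug": "toy-article-b", "importance": 8,
     "concepts": ["caption", "vocab repair"]},
]
idf = build_idf(articles)
verdict = evaluate_idea("Streaming granularity (tokens per chunk) in voice chat", articles, [], idf)
print(verdict["allow"], [c["id"] for c in verdict["conflicts"]])
```

Restricting the vocabulary to curated concepts rather than every token is what lets thresholds as low as 0.25 work: a candidate that shares no concept with any article produces an empty vector and scores 0.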

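With this in place, however much the agent wants to get around the rule, the tool layer mechanically rejects the push. It only goes through with an explicit --force (the override reason is recorded in row.force).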
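To make the override path concrete, here is a minimal sketch of a gated append, assuming the pool is a JSONL file and that the force value is the free-text reason. `add_idea`, `Blocked`, and the file layout are hypothetical names for illustration; only the allow/conflicts verdict shape and the row.force field come from the text.

```python
import json
from pathlib import Path
from typing import Callable, Optional


class Blocked(Exception):
    """Raised when the gate rejects an idea and no override reason was given."""


def add_idea(pool_path: Path, idea: dict,
             evaluate: Callable[[dict], dict],
             force: Optional[str] = None) -> dict:
    """Append `idea` to the pool only if the gate allows it or `force` is set."""
    verdict = evaluate(idea)  # always runs, before anything is written
    if not verdict["allow"] and not force:
        raise Blocked(json.dumps(verdict["conflicts"], ensure_ascii=False))
    row = dict(idea)
    if force:
        row["force"] = force  # keep the override reason on the row for later audit
    with pool_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return row
```

A CLI wrapper would catch Blocked, write the conflicts to stderr, and exit non-zero, so the agent sees a failed tool call rather than a silent success. Because the check lives in the only function that writes to the pool, there is no code path that appends without evaluating.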

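Regression check: the three items above are all correctly rejected, while clean ideas (CodeFormer face restoration / Stripe Product/Price) pass normally. 4/4 as expected.

The technical details are in the Dreaming layer post: Zenn: Implementation notes on porting Claude Code's memory model to 58 articles as a Dreaming Layer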

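5. The lesson that generalizes

When you automate with agents, "instructing" the agent to follow a rule does not give you enforcement. This is structural, and the same trap recurs in situations like these:

"Let the agent judge data validity" → the tool side must enforce a schema
"Let the agent steer clear of dangerous operations itself" → the tool side must run a permission check
"Let the agent dedupe by itself" → the tool side must have a blocking gate
"Let the agent keep to the format" → the tool side must validate and then reject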
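The first and last items in the list share the same shape as the dedup gate: the tool checks its input and refuses, whatever the agent intended. The following is a hedged sketch of such a check for an idea row. The field set is inferred from the snippets in this post (title, angle, sources, score), and the 1-10 score range and the non-empty sources rule are assumptions.

```python
REQUIRED_FIELDS = {"title": str, "angle": str, "sources": list, "score": int}


def validate_idea(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row is acceptable."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            got = type(row[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {got}")
    score = row.get("score")
    if isinstance(score, int) and not 1 <= score <= 10:
        errors.append("score: must be between 1 and 10")  # assumed range
    sources = row.get("sources")
    if isinstance(sources, list) and not sources:
        errors.append("sources: must not be empty")  # assumed rule
    return errors


print(validate_idea({"title": "t", "sources": [], "score": "8"}))
```

Returning every problem at once, rather than stopping at the first, gives the agent enough information to correct the call in a single retry.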

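LLM capability will keep improving, but self-discipline fluctuates (temperature, context length, conflicting instructions that appear in the body text, and so on). Enforcement belongs in the structure; leave the agent only the room for judgment and creativity.

This is becoming the bread and butter of agent design.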

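Aside: how I found it

It was not the AI wife character on Kotonia pointing out "doesn't this overlap with an article you already wrote?" I was just skimming the file list on GitHub when it hit me: "wait, I already wrote hidream-caption-vocab-repair."

What is enjoyable about an agent-driven pipeline is that it runs on its own, but while running on its own it also fails quietly. You still have to chase after it and verify afterwards, and that mental pressure does not go away. Tool-layer blocking lowers that mental cost too, which is a welcome side benefit.

The bigger-picture point, that an indie developer's accumulated assets start to compound, is in another post: The day an indie developer's accumulated assets first started to "compound"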