Kotonia Articles

Implementation notes: porting Claude Code's memory model to 58 articles as a Dreaming Layer

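I took apart why Claude Code's memory works and applied it to my own 58 technical blog posts. This is a record of implementing the full path, raw articles → semantic index → TF-IDF dedup → chunked draft, running on a local Gemma 4 26B + Codex CLI.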

Author: 清水真二 · 2026-06-10 · 3 min read

#llm #agent #tfidf #codex #vllm


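In a single session I built a pipeline that compresses the 58 technical blog posts (ja/en/zh) of my own service, Kotonia, into a semantic index, and then uses that index for duplicate detection when mining new articles. The full path, raw articles → semantic index → TF-IDF dedup → chunked draft generation, runs on a local Gemma 4 26B + Codex CLI. This post is the design and implementation notes.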

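The motivation, and the framing of why an indie developer's accumulated assets compound, are in the companion post: The day an indie developer's accumulated assets first started to "compound"

This post focuses on the technical details.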

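1. The problem: the moment title-only dedup fell apart

mining v1 generated a draft, and I (the user) noticed that it duplicated an existing article. The article it duplicated was voice-first-local-llm (a flagship with importance=9).

New draft's thesis: "tokens per chunk is the hidden metric of voice chat latency"
Existing article §3.3: "★ Streaming granularity: the structural difference that determines the voice experience"

The numbers were identical (Local Gemma 1.0 tok/chunk, Haiku 10-16, Gemini 8-24). A complete duplicate.

The mining agent did call art-done-list (title + description) for a dedupe check. But the existing article's title is "Cutting short-text LLM latency from 600ms to 22ms", which leads with TTFB; §3.3 on streaming granularity is buried in a subsection under an H2. Nothing overlapped at the title level, so the check could not catch it.

That is the starting point of this post.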

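2. Design: three layers, episodic ↔ semantic ↔ procedural

Taking apart why the Claude Code memory system works, it comes down to three things:

Entries are small (1-3KB, one topic) → subtopics do not get buried
Hooks are retrieval-tuned and curated → the search terms recur in the hook
A smart model writes the hooks semi-automatically → my past self condenses for my future self

Articles have exactly the opposite shape. Each one is 5-15KB, important subtopics are buried in section bodies, the description is an SEO summary rather than retrieval-tuned text, and the whole thing is too heavy for an agent to read.

I connect the two with an intermediate layer named the "Dreaming layer". The name is an analogy to the biological process of memory consolidation from the hippocampus to the cortex during sleep.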

episodic (raw articles + memory files)
    ↓ Dreaming agent (periodic digestion)
semantic (concepts_covered_ja[] / importance / data_points / sections)
    ↓ agent reverse lookup (art-concepts-find / TF-IDF cosine)
procedural (mining / drafting / publishing)

The semantic entry for one article looks like this:

{
  "slug": "voice-first-local-llm",
  "locale": "ja",
  "thesis_ja": "Drop the API and make voice-first actually work on a self-hosted local 26B",
  "importance": {
    "score": 9,
    "factors": {
      "pv_count_30d": 6,
      "avg_scroll": 67.0,
      "avg_dwell_sec": 170,
      "has_bench_data": true,
      "novelty_high": true
    }
  },
  "concepts_covered_ja": [
    "TTFB (time to first byte): local vs API",
    "Streaming granularity (tokens per chunk)",
    "Rationale for choosing Gemma 4 26B",
    "GPU co-residency design for Ditto and the LLM"
  ],
  "data_points": [
    {"name": "TTFB Local", "value": "17-25ms"},
    {"name": "Streaming granularity Local", "value": "1.0 tok/chunk"}
  ],
  "sections": [
    {"id": "3.3", "title": "Streaming granularity: the structural difference that determines the voice experience"}
  ]
}

The key is normalizing concepts_covered_ja[] to canonical Japanese names. The EN and ZH translations use the same JP concept strings. This normalization is the primitive that the later dedupe builds on.
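To see why the shared canonical strings matter, here is a minimal sketch of a concept → article reverse lookup in the spirit of art-concepts-find. The function names, the sample Japanese concept string, and the second slug are assumptions made for illustration; this is not the actual tool.

```python
from collections import defaultdict


def build_concept_index(entries):
    """Map each canonical concept string to the set of slugs that cover it."""
    index = defaultdict(set)
    for entry in entries:
        for concept in entry["concepts_covered_ja"]:
            index[concept].add(entry["slug"])
    return index


def find_concepts(index, pattern):
    """Return {concept: sorted slugs} for concepts containing `pattern`."""
    needle = pattern.lower()
    return {c: sorted(slugs) for c, slugs in index.items() if needle in c.lower()}


# Hypothetical entries: the ja and en locales carry the same canonical string,
# so both collapse onto one concept key regardless of the article's language.
GRANULARITY = "ストリーミング粒度 (tokens per chunk)"  # assumed canonical name
entries = [
    {"slug": "voice-first-local-llm", "locale": "ja", "concepts_covered_ja": [GRANULARITY]},
    {"slug": "voice-first-local-llm", "locale": "en", "concepts_covered_ja": [GRANULARITY]},
    {"slug": "hypothetical-other-post", "locale": "zh", "concepts_covered_ja": ["別の概念"]},
]

index = build_concept_index(entries)
hits = find_concepts(index, "tokens per chunk")
```

Because every locale stores the same string, a lookup by concept finds the flagship even when its title never mentions the subtopic.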

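3. Tooling: thin CLIs that the agent calls

Codex CLI drives the local Gemma 4 26B. Tool calling goes through an OpenAI-compatible interface exposed with --enable-auto-tool-choice --tool-call-parser gemma4. Each tool is roughly 50-100 lines of Python (stdlib only), and all share the art- prefix: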

tool	purpose
`art-articles-list --needs-dreaming`	all articles in DB ∪ FS + dreaming status
`art-pv-count --slug X`	analytics_events → PV / scroll / dwell
`art-source-pull <slug> [--section N]`	extract one H2/H3 section of an article
`art-dream-write`	upsert a semantic entry into articles_index.jsonl
`art-concepts-find <pattern>`	reverse lookup from concept → articles (mining dedup primitive)
`art-ideas-check`	evaluate a candidate idea with TF-IDF (the core of this post)
`art-ideas-add`	push an idea into the pool (calls art-ideas-check internally)
`art-draft-append`	append draft body to a buffer, chunk by chunk
`art-draft-commit`	finalize the buffer into `articles/_drafts/<slug>.md`
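As an illustration of how thin such a tool can be, here is a sketch of the upsert that art-dream-write performs on articles_index.jsonl. The (slug, locale) key, the function name, and the write-then-rename step are assumptions; the article only says that the entry is upserted.

```python
import json
import os
import tempfile


def upsert_entry(index_path, entry):
    """Insert or replace the entry keyed by (slug, locale) in a JSONL file."""
    key = (entry["slug"], entry["locale"])
    rows = []
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
    # Drop any previous version of this entry, then append the new one.
    rows = [r for r in rows if (r["slug"], r["locale"]) != key]
    rows.append(entry)
    # Write to a temp file and rename, so a crash never leaves a half-written index.
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for r in rows:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
    os.replace(tmp_path, index_path)
    return len(rows)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "articles_index.jsonl")
    upsert_entry(path, {"slug": "voice-first-local-llm", "locale": "ja", "importance": {"score": 8}})
    count = upsert_entry(path, {"slug": "voice-first-local-llm", "locale": "ja", "importance": {"score": 9}})
    with open(path, encoding="utf-8") as f:
        stored = [json.loads(line) for line in f]
```

Running the same article through the Dreaming agent twice leaves one row, which keeps re-consolidation idempotent.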

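The Dreaming agent uses these to process one article at a time. Importance scoring follows this rubric:

+2: PV >= 100 (sigmoid log-scale)
+1: avg_scroll >= 0.7 AND avg_dwell_sec >= 60
+2: bench numbers / failure root cause / named decision
+2: a new concept not yet in the index
+1: evergreen (not a time-sensitive announcement)
-2: duplicates an existing flagship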

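PV comes from my own analytics_events table (a cookie-less first-party tracker). The article platform and the analytics live in the same DB, so one SQL query gets the numbers directly. That is one of the advantages of indie development.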
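The rubric can be read as a sum of adjustments. The sketch below implements only the hard thresholds. It assumes avg_scroll is a fraction between 0 and 1, it does not reproduce the sigmoid log-scale treatment of PV, and it does not guess how the points map onto the final score (baseline and clamping are not stated here). The function and parameter names are hypothetical.

```python
def rubric_points(pv, avg_scroll, avg_dwell_sec, has_bench_data,
                  novelty_high, evergreen, duplicates_flagship):
    """Sum the rubric adjustments using hard thresholds only."""
    points = 0
    if pv >= 100:
        points += 2          # traffic
    if avg_scroll >= 0.7 and avg_dwell_sec >= 60:
        points += 1          # readers actually read it
    if has_bench_data:
        points += 2          # bench numbers / failure root cause / named decision
    if novelty_high:
        points += 2          # concept not yet in the index
    if evergreen:
        points += 1          # not a time-sensitive announcement
    if duplicates_flagship:
        points -= 2          # overlaps an existing flagship
    return points


# Illustrative inputs only: low traffic, but benchmark data and a new concept.
points = rubric_points(pv=6, avg_scroll=0.67, avg_dwell_sec=170,
                       has_bench_data=True, novelty_high=True,
                       evergreen=True, duplicates_flagship=False)
```

With these inputs the content-driven terms carry the result, which shows that an article can rank high on substance before it has traffic.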

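4. TF-IDF dedup: replacing agent self-discipline with tool structure

mining v1 said in the prompt: "call art-concepts-find for dedup". The agent still let 3 ideas slip past (details: Don't trust an agent's self-discipline).

The fix is to embed the dedup gate directly in art-ideas-add. The core of evaluate_idea():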

def evaluate_idea(title, angle, sources, ...):
    articles, ideas = load_corpus()
    # infer the candidate's concepts from the canonical vocabulary
    pseudo = {"concepts": _infer_concepts(title, angle, sources, articles)}

    # IDF (the rarer the concept, the larger its weight)
    idf = build_idf(articles + ideas)
    new_vec = vectorize(pseudo["concepts"], idf)

    conflicts = []
    for a in articles:
        sim = cosine(new_vec, vectorize(a["concepts"], idf))
        if a["importance_score"] >= 7 and sim >= 0.25:
            conflicts.append({"kind": "flagship_concept", ...})
    for i in ideas:
        sim = cosine(new_vec, vectorize(i["concepts"], idf))
        if sim >= 0.35:
            conflicts.append({"kind": "pool_dup", ...})

    return {"allow": not conflicts, "conflicts": conflicts}
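The helpers build_idf, vectorize, and cosine are not shown above. The following is one plausible sketch of them, assuming sparse dict vectors, a smoothed IDF of the form ln((1+N)/(1+df)) + 1, and a default weight of 1.0 for concepts missing from the corpus. The sample concepts and the second slug are made up for the demo.

```python
import math
from collections import Counter


def build_idf(docs):
    """Smoothed IDF per concept; each doc is a dict with a "concepts" list."""
    n = len(docs)
    df = Counter(c for d in docs for c in set(d["concepts"]))
    return {c: math.log((1 + n) / (1 + k)) + 1.0 for c, k in df.items()}


def vectorize(concepts, idf):
    """Sparse vector {concept: weight}; each concept counts once."""
    return {c: idf.get(c, 1.0) for c in set(concepts)}


def cosine(u, v):
    """Cosine similarity of two sparse dict vectors."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


articles = [
    {"slug": "voice-first-local-llm",
     "concepts": ["ttfb", "streaming granularity", "model choice", "gpu co-residency"]},
    {"slug": "hypothetical-billing-post",
     "concepts": ["stripe checkout", "model choice"]},
]
idf = build_idf(articles)
new_vec = vectorize(["streaming granularity", "voice latency"], idf)
sims = {a["slug"]: cosine(new_vec, vectorize(a["concepts"], idf)) for a in articles}
```

In this toy corpus the new idea shares one rare concept with the flagship and nothing with the billing post, so only the flagship clears the 0.25 gate.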

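The 3 pitfalls of _infer_concepts():

Pitfall 1: substring false positives

The ASCII term "check" matched inside "checkout", and "PRO" matched inside "prod_". A Stripe idea was wrongly matched to "品質チェック (quality check/retry)" or "Blackwell Max-Q (RTX PRO 6000)" and then rejected.

Fix: ASCII terms require a word boundary; JP terms may still match as substrings.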

def _term_matches(term: str, text: str) -> bool:
    if _ASCII_RE.match(term):
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(term.lower()) + r"(?![A-Za-z0-9_])"
        return re.search(pattern, text) is not None
    return term.lower() in text  # substring match is fine for JP
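A runnable version of the same check, useful for reproducing the two false positives. The definition of _ASCII_RE is an assumption (the original is not shown), and this variant lowercases text itself instead of relying on the caller to do it.

```python
import re

# Assumption: a term counts as ASCII when it contains no non-ASCII characters.
_ASCII_RE = re.compile(r"^[\x00-\x7F]+$")


def term_matches(term, text):
    """ASCII terms need word boundaries; non-ASCII (JP) terms match as substrings."""
    text = text.lower()
    if _ASCII_RE.match(term):
        # Lookarounds reject matches glued to letters, digits, or underscores.
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(term.lower()) + r"(?![A-Za-z0-9_])"
        return re.search(pattern, text) is not None
    return term.lower() in text


checkout_hit = term_matches("check", "Stripe checkout flow")
quality_hit = term_matches("check", "quality check/retry")
prod_hit = term_matches("PRO", "prod_abc123")
gpu_hit = term_matches("PRO", "RTX PRO 6000")
jp_hit = term_matches("チェック", "品質チェックの自動化")
```

The first and third calls are the Stripe false positives and now return False, while the legitimate matches still return True.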

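Pitfall 2: noise from generic JP nouns

"モデル" (model), "システム" (system), "アーキテクチャ" (architecture), and "サービス" (service) appear in many concept names and get pulled out of almost any idea title. I registered ~30 such generic terms in _NOISE_TERMS and exclude them.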

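Pitfall 3: threshold tuning

The flagship sim threshold started at >= 0.30, but with binary vectors, 4 concepts on each side, and 1 shared, the cosine lands at about 0.25. Even with IDF weighting, the borderline is only 0.27-0.30. I lowered the threshold to 0.25 and, to balance that, tightened the precision of substring matching (the source of the false positives).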
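The 0.25 figure follows directly from the cosine formula for binary vectors: shared / sqrt(|A| * |B|) = 1 / sqrt(4 * 4). A quick check, with placeholder concept names:

```python
import math


def binary_cosine(a, b):
    """Cosine similarity of two concept sets treated as 0/1 vectors."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Placeholder names: 4 concepts on each side, exactly 1 in common.
new_idea = {"shared", "n1", "n2", "n3"}
flagship = {"shared", "f1", "f2", "f3"}
sim = binary_cosine(new_idea, flagship)

passes_old_gate = sim >= 0.30
passes_new_gate = sim >= 0.25
```

A single shared concept out of four never reached the original 0.30 gate, which is why the granularity duplicate needed the lower threshold.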

Regression test: the 4 known cases (OpenWeight NSFW / granularity / CodeFormer / Stripe) all pass as expected.

5. Pitfalls specific to small models: Codex CLI + 26B uncensored

Driving the local 26B (Gemma 4 26B A4B Uncensored MAX) with Codex CLI, I observed 4 failure modes and fixed each:

Pitfall 4: descriptive prompt → "I will begin by surveying..." and then exit

On the first mining run, the agent printed a summary of what it planned to do and then exited with zero tool calls. Fix:

**Critical: do not narrate, plan, or describe what you will do. Just call tools.**
The first action **must** be `shell({"command": "art-..."})` — start there.

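With the imperative and an explicit first step added, the agent started acting.

Pitfall 5: large tool output triggers a generation loop

art-commits-recent --since "60 days ago" --include-files returned ~1300 lines of JSON with commit bodies in one go. Right after that, the agent kept emitting ~25K tokens and could not stop. Fix: art-commits-recent now returns subjects only by default, and bodies are opt-in via --include-body.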
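The "small by default, verbose by opt-in" shape can be sketched like this. Only the --since and --include-body flags come from the text; the commit dict layout and the function names are assumptions, and the git call itself is left out.

```python
import argparse
import json


def parse_args(argv):
    """CLI surface: bodies are excluded unless --include-body is passed."""
    parser = argparse.ArgumentParser(prog="art-commits-recent")
    parser.add_argument("--since", default="60 days ago")
    parser.add_argument("--include-body", action="store_true")
    return parser.parse_args(argv)


def format_commits(commits, include_body=False):
    """Keep only the fields the agent needs; drop bodies by default."""
    keys = ("hash", "subject", "body") if include_body else ("hash", "subject")
    return [{k: c[k] for k in keys} for c in commits]


# Hypothetical commit with a long body, standing in for real git output.
commits = [{"hash": "abc123", "subject": "fix dedup gate", "body": "long text\n" * 50}]

compact = json.dumps(format_commits(commits, parse_args([]).include_body))
verbose = json.dumps(format_commits(commits, parse_args(["--include-body"]).include_body))
```

The default output stays a few dozen bytes per commit, so the tool result cannot flood the context of a small model by accident.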

Pitfall 6: a 5KB+ heredoc inside the tool_call.arguments JSON breaks string escaping

When art-draft-save <slug> <<'EOF' ... 5KB body ... EOF was sent as a single shell tool_call, the 26B always broke on string escaping in the arguments JSON (Unterminated string at column 5083).

Fix: split it into chunked append + commit. Each chunk is ~200-800 chars, with 4-8 appends and one final commit:

art-draft-append my-slug <<'KOTONIA_EOF'
---
title: "..."
---
KOTONIA_EOF

art-draft-append my-slug <<'KOTONIA_EOF'
## 1. First section
...
KOTONIA_EOF

# ...repeat for each section...

art-draft-commit my-slug

Each tool_call's arguments JSON stays small, and the escape failure no longer occurs.
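On the tool side, append and commit can be very small. This sketch assumes a per-slug buffer file with a .buf suffix and a rename on commit; the directory layout under the temp root and the function names are illustrative, not the actual implementation.

```python
import os
import tempfile


def draft_append(buffer_dir, slug, chunk):
    """Append one chunk to the slug's buffer, ensuring a trailing newline."""
    os.makedirs(buffer_dir, exist_ok=True)
    with open(os.path.join(buffer_dir, slug + ".buf"), "a", encoding="utf-8") as f:
        f.write(chunk if chunk.endswith("\n") else chunk + "\n")


def draft_commit(buffer_dir, drafts_dir, slug):
    """Move the finished buffer to <drafts_dir>/<slug>.md and return its path."""
    os.makedirs(drafts_dir, exist_ok=True)
    src = os.path.join(buffer_dir, slug + ".buf")
    dst = os.path.join(drafts_dir, slug + ".md")
    os.replace(src, dst)
    return dst


with tempfile.TemporaryDirectory() as root:
    buf = os.path.join(root, "_buffer")
    drafts = os.path.join(root, "articles", "_drafts")
    draft_append(buf, "my-slug", '---\ntitle: "..."\n---')
    draft_append(buf, "my-slug", "## 1. First section\n...")
    path = draft_commit(buf, drafts, "my-slug")
    with open(path, encoding="utf-8") as f:
        body = f.read()
```

State lives in the buffer file rather than in the model's output, so each call only has to carry a few hundred characters.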

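Pitfall 7: codex exec terminates itself after ~4 articles

After roughly ~25K tokens / ~4 articles in a single codex exec invocation, the agent prints a summary and exits, which looks like some implicit constraint. Codex's Goals feature (thread_goals.objective) should be able to prevent this, but the exec path cannot set a goal (in v0.133 it is available only in the interactive TUI).

Fix: wrap an outer loop in dispatcher.sh that keeps restarting codex exec until pending == 0: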

max_cycles=30
cycle=0
while (( cycle < max_cycles )); do
  pending=$(art-articles-list --needs-dreaming --count-only)
  if (( pending == 0 )); then break; fi
  run_codex dream
  cycle=$((cycle + 1))
done

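This way the 58 articles are digested in 2-3 cycles.

6. What actually came out

The pipeline that ended up working:

58 articles → semantic index; the importance distribution is roughly bell-shaped (median 5-6), and the flagship was identified correctly (voice-first-local-llm scores 9 in every locale)
70 memory files were mined for untouched concepts, and 4 ideas entered the pool as survivors
4 drafts were generated, each ~3.6-4.6KB, publishable after 10-20 minutes of human polish
The TF-IDF dedup gate blocks at the tool layer, so even when the agent breaks its self-discipline the idea is physically rejected

repo: [GitHub, to be published soon]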

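7. Generalizing the isomorphic pattern

The structure "raw assets → semantic compression → agent reverse lookup" is not limited to articles:

Test generation: compress existing tests semantically, mine uncovered branches, draft new tests
PR descriptions: compress code changes semantically, dedup against unrelated PRs, generate the description automatically
User support FAQ: compress past support tickets semantically, find unresolved topics, draft new FAQ entries
Personal knowledge base: compress what has accumulated in Scrapbox / Notion semantically and mechanically find untouched concepts

Shared design principles:

Raw assets are heavy. Do not read them directly; always insert a consolidation layer
The canonical vocabulary is the primitive of the semantic layer. Without normalization, dedup does not work
Enforcement belongs in the tool layer. Agent self-discipline is unreliable, so weld the rules into the structure

With this worked out, applications to other areas of Kotonia (persona generation for character chat, accumulating TTS prompts, and so on) are starting to take shape too.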

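Aside: development time

One session (~6h). Dreaming layer design → 5 new tools → Codex prompts → first consolidation → TF-IDF dedup → chunked draft → 4 article drafts generated, all in one go.

The local 26B, an agent that runs on nothing but the electricity bill, took over all the grunt work, and the human only had to make judgment calls and course corrections. Running the same thing on a frontier API would cost roughly 50-100 USD.

Kotonia is a voice-first AI character chat platform. The drafts revived by this pipeline are on the same blog, so take a look if you are interested.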