Kotonia


Implementing Claude Code's Memory Model as a Dreaming Layer on 58 Articles

I broke down why Claude Code's memory works and applied it to my own 58-article tech blog. Implementation notes on the full path (raw articles → semantic index → TF-IDF dedup → chunked draft), running on a local Gemma 4 26B driven through Codex CLI.

6 min read
#llm #agent #tfidf #codex

In a single session I built a pipeline that consolidates the 58 tech-blog articles of my service Kotonia (ja/en/zh) into a semantic index, then uses that index to detect duplicates when mining ideas for new articles. The full path (raw articles → semantic index → TF-IDF dedup → chunked draft generation) runs on a local Gemma 4 26B driven by Codex CLI. Design and implementation notes follow.

The motivation, and the framing of how a solo developer's accumulated assets compound, is in the companion piece: The Day a Solo Developer's Accumulated Assets Finally Started to Compound

This piece covers the technical notes.


1. The Problem — When Title-Only Dedup Broke

Mining v1 produced a draft, and on review I (the user) noticed that it overlapped with an existing article: voice-first-local-llm, a flagship with importance=9.

  • New draft thesis: "tokens per chunk is a hidden voice-chat latency driver"
  • Existing article §3.3: "★ Streaming granularity — the structural difference that decides voice experience"

Same numbers (Local Gemma 1.0 tok/chunk, Haiku 10-16, Gemini 8-24). A perfect duplicate.

The mining agent had called art-done-list (title + description) for the dedup check. But the existing article's title is "Cutting short-form LLM latency from 600ms to 22ms," with TTFB as the headline sales pitch; §3.3 streaming granularity is buried in an H2 subsection. At title level, nothing overlapped, so the check came back clean.

That's the starting point for this article.

2. The Design — Three Layers: episodic ↔ semantic ↔ procedural

Breaking down why Claude Code's memory system works:

  1. Entries are small (1-3KB, one topic each) → subtopics don't get buried
  2. Hooks are retrieval-tuned and curated → search terms re-appear in the hook
  3. A smart model writes hooks semi-autonomously → past me distills for future me

Articles have the opposite shape. Each is 5-15KB, important subtopics are buried in subsection bodies, and descriptions are SEO summaries rather than retrieval-tuned hooks. That is too heavy for an agent to load directly.

I bridged the two with an intermediate layer named the Dreaming layer. The name borrows the biological metaphor of memory consolidation during sleep, from hippocampus to cortex.

episodic (raw articles + memory files)
    ↓ Dreaming agent (periodic digestion)
semantic (concepts_covered_ja[] / importance / data_points / sections)
    ↓ agent reverse-lookup (art-concepts-find / TF-IDF cosine)
procedural (mining / drafting / publishing)

A semantic entry for an article looks like:

{
  "slug": "voice-first-local-llm",
  "locale": "ja",
  "thesis_ja": "Ditching API, building voice-first with self-hosted local 26B",
  "importance": {
    "score": 9,
    "factors": {
      "pv_count_30d": 6,
      "avg_scroll": 67.0,
      "avg_dwell_sec": 170,
      "has_bench_data": true,
      "novelty_high": true
    }
  },
  "concepts_covered_ja": [
    "TTFB (time-to-first-byte): local vs API",
    "Streaming granularity (tokens per chunk)",
    "Gemma 4 26B model selection rationale",
    "Ditto + LLM co-residency GPU design"
  ],
  "data_points": [
    {"name": "TTFB Local", "value": "17-25ms"},
    {"name": "Streaming granularity Local", "value": "1.0 tok/chunk"}
  ],
  "sections": [
    {"id": "3.3", "title": "Streaming granularity — the structural difference that decides voice experience"}
  ]
}

The key point: concepts_covered_ja[] must be normalized to Japanese canonical names. Translated EN/ZH articles use the same JP concept strings. That single normalization becomes the dedup primitive downstream.
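To illustrate why a shared canonical vocabulary makes reverse lookup trivial, here is a minimal sketch of a concept → article index built from JSONL entries shaped like the one above. The function names `build_concept_index` and `find_concepts`, and the case-insensitive substring matching rule, are assumptions for illustration; the real art-concepts-find may work differently.

```python
import json
from collections import defaultdict


def build_concept_index(jsonl_text: str) -> dict[str, set[str]]:
    """Map each canonical concept string to the slugs of articles covering it."""
    index: dict[str, set[str]] = defaultdict(set)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        for concept in entry.get("concepts_covered_ja", []):
            index[concept].add(entry["slug"])
    return index


def find_concepts(index: dict[str, set[str]], pattern: str) -> dict[str, list[str]]:
    """Case-insensitive substring lookup over concept names (assumed matching rule)."""
    needle = pattern.lower()
    return {c: sorted(slugs) for c, slugs in index.items() if needle in c.lower()}


# Two locales of the same article share one canonical concept string.
sample = "\n".join([
    json.dumps({"slug": "voice-first-local-llm", "locale": "ja",
                "concepts_covered_ja": ["Streaming granularity (tokens per chunk)"]}),
    json.dumps({"slug": "voice-first-local-llm", "locale": "en",
                "concepts_covered_ja": ["Streaming granularity (tokens per chunk)"]}),
])
index = build_concept_index(sample)
hits = find_concepts(index, "streaming granularity")
print(hits)
```

Because both locales carry the identical concept string, they collapse onto one index key instead of appearing as two unrelated concepts.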

3. Tools — Thin CLIs the Agent Calls

Codex CLI drives Gemma 4 26B locally. Tool calling via --enable-auto-tool-choice --tool-call-parser gemma4 gives an OpenAI-compatible surface. Each tool is ~50-100 lines of stdlib-only Python and carries the art- prefix:

tool                                   role
art-articles-list --needs-dreaming     DB ∪ FS articles + dreaming state
art-pv-count --slug X                  analytics_events → PV / scroll / dwell
art-source-pull <slug> [--section N]   pull just one H2/H3 section of an article
art-dream-write                        upsert a semantic entry into articles_index.jsonl
art-concepts-find <pattern>            concept → article reverse-lookup (the mining dedup primitive)
art-ideas-check                        evaluate a candidate idea via TF-IDF (the core of this article)
art-ideas-add                          push an idea to the pool (calls art-ideas-check internally)
art-draft-append                       append a chunk of draft body to a buffer
art-draft-commit                       finalize buffer → articles/_drafts/<slug>.md

The Dreaming agent semantically encodes one article at a time using these. Importance scoring uses this rubric:

+2: PV >= 100 (sigmoid log-scale)
+1: avg_scroll >= 0.7 AND avg_dwell_sec >= 60
+2: bench numbers / failure root cause / named decision
+2: novel concept not yet in index
+1: evergreen value (not time-sensitive)
-2: redundant with an already-indexed flagship

PV comes from a homegrown analytics_events table (cookie-less first-party tracker). The fact that the article platform and analytics co-reside in one DB you can hit directly is a solo-dev win.
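The rubric can be read as a sum of independent adjustments. The sketch below is an assumption-laden illustration: the `Signals` dataclass and `rubric_delta` are hypothetical names, the PV rule is reduced to a hard threshold (the sigmoid log-scale is not modeled), and `avg_scroll` is taken as a 0..1 fraction to match the 0.7 threshold. The rubric lists only adjustments and no starting value, so the function returns the delta rather than a final score.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    pv_count_30d: int
    avg_scroll: float              # assumed to be a 0..1 fraction
    avg_dwell_sec: float
    has_bench_data: bool           # bench numbers / failure root cause / named decision
    novelty_high: bool             # novel concept not yet in the index
    evergreen: bool                # not time-sensitive
    redundant_with_flagship: bool  # overlaps an already-indexed flagship


def rubric_delta(s: Signals) -> int:
    """Sum the rubric adjustments; the base score is not specified in the article."""
    delta = 0
    if s.pv_count_30d >= 100:
        delta += 2
    if s.avg_scroll >= 0.7 and s.avg_dwell_sec >= 60:
        delta += 1
    if s.has_bench_data:
        delta += 2
    if s.novelty_high:
        delta += 2
    if s.evergreen:
        delta += 1
    if s.redundant_with_flagship:
        delta -= 2
    return delta


# Illustrative values only, not taken from the real index.
example = Signals(pv_count_30d=120, avg_scroll=0.8, avg_dwell_sec=90,
                  has_bench_data=True, novelty_high=False,
                  evergreen=True, redundant_with_flagship=False)
print(rubric_delta(example))  # 2 + 1 + 2 + 1 = 6
```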

4. TF-IDF Dedup — Substituting Tool Structure for Agent Self-Discipline

In mining v1 the prompt instructed the agent to call art-concepts-find for dedup. The agent still let three duplicates slip through (details: Don't Trust an Agent's Self-Discipline).

The fix: embed a dedup gate directly inside art-ideas-add. The guts of evaluate_idea():

def evaluate_idea(title, angle, sources, ...):
    articles, ideas = load_corpus()
    # infer the candidate's concepts from the canonical vocab
    pseudo = {"concepts": _infer_concepts(title, angle, sources, articles)}

    # IDF (rare concepts weighted more)
    idf = build_idf(articles + ideas)
    new_vec = vectorize(pseudo["concepts"], idf)

    conflicts = []
    for a in articles:
        sim = cosine(new_vec, vectorize(a["concepts"], idf))
        if a["importance_score"] >= 7 and sim >= 0.25:
            conflicts.append({"kind": "flagship_concept", ...})
    for i in ideas:
        sim = cosine(new_vec, vectorize(i["concepts"], idf))
        if sim >= 0.35:
            conflicts.append({"kind": "pool_dup", ...})

    return {"allow": not conflicts, "conflicts": conflicts}
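The excerpt above calls build_idf(), vectorize(), and cosine() without showing them. A minimal sketch of what such helpers could look like follows; the function names come from the excerpt, but the bodies, the smoothed IDF formula, and the sample corpus are assumptions, not the author's code.

```python
import math
from collections import Counter


def build_idf(docs: list[dict]) -> dict[str, float]:
    """Smoothed IDF over concept sets: rarer concepts get larger weights."""
    n = len(docs)
    df = Counter(c for d in docs for c in set(d["concepts"]))
    return {c: math.log((1 + n) / (1 + k)) + 1.0 for c, k in df.items()}


def vectorize(concepts: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse vector: each present concept weighted by its IDF (1.0 if unseen)."""
    return {c: idf.get(c, 1.0) for c in set(concepts)}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


# Hypothetical corpus: "ttfb" is common, the other concepts are rare.
articles = [
    {"concepts": ["ttfb", "streaming granularity"]},
    {"concepts": ["ttfb", "gpu co-residency"]},
    {"concepts": ["ttfb", "stripe checkout"]},
]
idf = build_idf(articles)
target = vectorize(articles[0]["concepts"], idf)
sim_rare = cosine(vectorize(["streaming granularity"], idf), target)
sim_common = cosine(vectorize(["ttfb"], idf), target)
print(round(sim_rare, 3), round(sim_common, 3))
```

Sharing a rare concept scores higher than sharing a common one, which is the property the gate relies on when it compares the similarity against a fixed threshold.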

Three traps along the way in _infer_concepts():

Trap 1: substring-match false positives

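The ASCII term "check" matches inside "checkout"; "PRO" matches inside "prod_". A Stripe-related idea was falsely matched to "品質チェック (quality check/retry)" and "Blackwell Max-Q (RTX PRO 6000)" and rejected.

Fix: ASCII terms must match on a word boundary; JP terms can stay as substring matches.

import re

_ASCII_RE = re.compile(r"^[\x00-\x7f]+$")  # assumed definition (not shown in the original): pure-ASCII term

def _term_matches(term: str, text: str) -> bool:
    text = text.lower()  # no-op if the caller already lowercased
    if _ASCII_RE.match(term):  # ASCII term: require a word boundary on both sides
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(term.lower()) + r"(?![A-Za-z0-9_])"
        return re.search(pattern, text) is not None
    return term.lower() in text  # JP substring is fine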

Trap 2: generic JP noun noise

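"モデル" (model), "システム" (system), "アーキテクチャ" (architecture), and "サービス" (service) appear in many concept names, so they get picked up from arbitrary idea titles. I registered ~30 such generic words in _NOISE_TERMS so that they are ignored during matching.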
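A rough sketch of how a noise-term filter could sit in front of concept inference. It assumes each canonical concept already comes with a list of trigger terms (how the real tool derives terms from concept names is not shown), lists only the four noise words named above, and omits word-boundary handling for ASCII terms. `infer_concepts` and the sample vocabulary are hypothetical.

```python
# Only the four generic words named in the text; the real set holds ~30.
_NOISE_TERMS = {"モデル", "システム", "アーキテクチャ", "サービス"}


def infer_concepts(text: str, vocab: dict[str, list[str]]) -> list[str]:
    """Return concepts that have at least one non-noise trigger term in `text`."""
    matched = []
    for concept, terms in vocab.items():
        useful = [t for t in terms if t not in _NOISE_TERMS]
        if any(t in text for t in useful):
            matched.append(concept)
    return matched


# Hypothetical vocabulary: concept name -> trigger terms.
vocab = {
    "音声モデル選定": ["音声", "モデル"],
    "決済システム設計": ["決済", "システム"],
}

# "モデル" alone no longer drags in the voice-model concept.
print(infer_concepts("画像生成モデルの比較", vocab))
# A specific term still matches.
print(infer_concepts("音声チャットの遅延", vocab))
```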

Trap 3: threshold tuning

I started with a flagship threshold of sim >= 0.30, but a binary vector with 4 concepts and 1 shared concept tops out around cosine 0.25. Even with IDF weighting, 0.27-0.30 was borderline. I dropped the threshold to 0.25 and compensated by tightening the precision of the substring matcher, which was the real source of false positives.

Regression test: all 4 known cases pass (OpenWeight NSFW / streaming-granularity / CodeFormer / Stripe).
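The 0.25 figure can be checked by hand. For binary (0/1) vectors, cosine similarity reduces to the number of shared concepts divided by the square root of the product of the two set sizes. The sketch below assumes both the candidate and the article carry 4 concepts; the concept labels are placeholders.

```python
import math


def binary_cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary vectors, given as sets of concepts."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


candidate = {"c1", "c2", "c3", "c4"}
article = {"c1", "c5", "c6", "c7"}
sim = binary_cosine(candidate, article)  # 1 / sqrt(4 * 4)
print(sim)  # 0.25
```

With one shared concept out of four on each side, an unweighted pair can never reach 0.30, so a threshold above 0.25 only fires when IDF weighting lifts the shared concept.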

5. Small-Model Specific Traps — Codex CLI + 26B Uncensored

Driving a local 26B model (Gemma 4 26B A4B Uncensored MAX) through Codex CLI, I hit 4 failure modes. Each is listed below with its fix:

Trap 4: descriptive prompt → "I will begin by surveying..." then exit

The first mining run had the agent summarize "what I'll do next" and exit with zero tool calls. Fix:

**Critical: do not narrate, plan, or describe what you will do. Just call tools.**
The first action **must** be `shell({"command": "art-..."})` — start there.

With an imperative tone and an explicit first action, the agent starts calling tools.

Trap 5: huge tool output triggers a generation loop

art-commits-recent --since "60 days ago" --include-files returned ~1300 lines of JSON including bodies; the agent then emitted ~25K tokens of output continuously, never stopping. Fix: art-commits-recent defaults to subject-only; body via --include-body opt-in.
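A small sketch of the "small by default, large by opt-in" shape for such a tool. The --since and --include-body flags come from the text; the `render` function, the commit dict layout, and the extra `max_chars` cap are assumptions added for illustration and are not described in the article.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="art-commits-recent")
    p.add_argument("--since", default="60 days ago")
    # Bodies are opt-in so the default output stays short.
    p.add_argument("--include-body", action="store_true")
    return p


def render(commits: list[dict], include_body: bool, max_chars: int = 4000) -> str:
    """Format commits as text; `max_chars` is a hypothetical hard cap."""
    lines = []
    for c in commits:
        lines.append(f'{c["hash"][:8]} {c["subject"]}')
        if include_body and c.get("body"):
            lines.append(c["body"])
    out = "\n".join(lines)
    if len(out) > max_chars:
        dropped = len(out) - max_chars
        out = out[:max_chars] + f"\n[truncated {dropped} chars]"
    return out


args = build_parser().parse_args(["--since", "60 days ago"])
commits = [{"hash": "abcdef1234567890", "subject": "fix tts", "body": "long body"}]
print(render(commits, include_body=args.include_body))
```

Keeping the default output to one line per commit means the agent has to ask explicitly for the heavy payload.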

Trap 6: 5KB+ heredoc in tool_call.arguments JSON breaks the escape

Sending art-draft-save <slug> <<'EOF' ... 5KB body ... EOF as a single shell tool_call reliably breaks 26B's string escaping inside the arguments JSON (Unterminated string at column 5083).

Fix: split into chunked append + commit. ~200-800 chars per chunk, 4-8 appends, final commit:

art-draft-append my-slug <<'KOTONIA_EOF'
---
title: "..."
---
KOTONIA_EOF

art-draft-append my-slug <<'KOTONIA_EOF'
## 1. First section
...
KOTONIA_EOF

# ...repeat per section...

art-draft-commit my-slug

Each tool_call's arguments JSON stays small, and the escaping failure disappears.
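On the tool side, append and commit can be very thin. The following is a sketch under assumptions: the function names, the `_buffers` directory, and the newline handling are hypothetical; only the final articles/_drafts/<slug>.md destination comes from the tool table.

```python
import tempfile
from pathlib import Path


def draft_append(root: Path, slug: str, chunk: str) -> int:
    """Append one chunk to the slug's buffer and return the buffer size in bytes."""
    buf = root / "_buffers" / f"{slug}.md"  # hypothetical buffer location
    buf.parent.mkdir(parents=True, exist_ok=True)
    with buf.open("a", encoding="utf-8") as f:
        f.write(chunk if chunk.endswith("\n") else chunk + "\n")
    return buf.stat().st_size


def draft_commit(root: Path, slug: str) -> Path:
    """Move the finished buffer to articles/_drafts/<slug>.md."""
    buf = root / "_buffers" / f"{slug}.md"
    if not buf.exists():
        raise FileNotFoundError(f"no buffered draft for {slug}")
    dest = root / "articles" / "_drafts" / f"{slug}.md"
    dest.parent.mkdir(parents=True, exist_ok=True)
    buf.replace(dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    draft_append(root, "my-slug", '---\ntitle: "..."\n---')
    draft_append(root, "my-slug", "## 1. First section\n...")
    final = draft_commit(root, "my-slug")
    committed_text = final.read_text(encoding="utf-8")

print(committed_text)
```

Since state lives in the buffer file, each call carries only its own chunk, which is what keeps the arguments JSON small.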

Trap 7: Codex exec self-terminates after ~4 articles

There seems to be an implicit constraint where one codex exec invocation finishes with a summary message after ~25K tokens / ~4 articles. Codex's Goals feature (thread_goals.objective) could prevent that, but you can't set it via exec (only the interactive TUI as of v0.133).

Fix: wrap dispatcher.sh in an external loop. Restart codex exec until pending == 0.

max_cycles=30
cycle=0
while (( cycle < max_cycles )); do
  pending=$(art-articles-list --needs-dreaming --count-only)
  if (( pending == 0 )); then break; fi
  run_codex dream
  cycle=$((cycle + 1))
done

That gets 58 articles digested in 2-3 cycles.

6. What Landed

The working pipeline:

  • 58 articles → semantic index; the importance distribution is bell-shaped (median 5-6), and flagship recognition is correct (voice-first-local-llm scores 9 across all locales)
  • 70 memory files mined for unexplored concepts; 4 ideas survived into the pool
  • 4 drafts generated, ~3.6-4.6KB each, publish-ready after 10-20 minutes of human polish
  • The TF-IDF dedup gate at the tool layer blocks duplicates even when the agent skips its own dedup check

Repo: [github coming soon]

7. Generalization

The structure — raw assets → semantic compression → agent reverse-lookup — generalizes beyond articles:

  • Test generation: semantically compress existing tests, mine uncovered branches, draft new tests
  • PR descriptions: semantically compress the codebase delta, dedupe against unrelated PRs, draft a description
  • Support FAQs: semantically compress past support tickets, surface uncovered topics, draft new FAQs
  • Personal knowledge base: Scrapbox / Notion accumulation → semantic compression → mechanically discover unexplored concepts

Common design principles:

  1. Raw assets are heavy. Don't load them directly — insert a consolidation layer.
  2. The canonical vocabulary is the semantic-layer primitive. Without normalization, dedup doesn't work.
  3. Enforcement belongs at the tool layer. Agent self-discipline is unstable; bake the rule into the structure.

Knowing this opened up applications to other domains in Kotonia (persona generation in character chat, TTS prompt accumulation, etc.).

Aside: Development Time

One session (~6h). Dreaming layer design → 5 new tools → Codex prompts → first-time consolidation → TF-IDF dedup → chunked draft → 4 article drafts generated, all in one stretch.

Local 26B as the "runs on electricity only" agent absorbed the grinding labor; the human only had to make judgment calls and steering corrections. Doing this on frontier APIs would have cost $50-100.

Kotonia is a voice-first AI character chat platform. The drafts revived by this pipeline live on the same blog if you're curious.
