Letting the AI agent on my home GPU write code, from a phone on the train — one weekend wiring the kotonia-cli daemon to a web /agent console over WebSocket

I wired the kotonia.ai web UI to a kotonia-cli daemon living on my home PC, with WebSocket carrying tasks one way and event streams the other. Tap a prompt on the phone, the RTX 6000 Blackwell at home spins up a fresh worktree and runs ReAct, and bash output lands in the browser in real time. The technical guts: bridging a sync ApprovalHandler trait to async WebSockets, a Cookie-auth-shaped device-code login, and a SessionRegistry that auto-resumes from disk.

7 min read
#agent #websocket #rust #indie-dev #llm #device-code-flow #remote-development

You're on a train. You suddenly remember: I should fix that bug.

Out comes the phone. You open your own site in the browser. A new page, /agent, greets you. You type into the textarea — Reproduce and fix the 500 around /studio in src/router.rs — and hit Run.

At home, an RTX 6000 Blackwell wakes up. A git worktree is created in /tmp/, and the agent starts reading code and running bash. The output flows onto the phone screen in real time:

$ cargo check
[exit 0]
$ grep -rn '/studio' src/
[exit 0] ... 12 lines

Mid-stream the agent tries to fire git push --force. An "Approval Required" modal pops up. You hit Deny. The agent pivots and just cuts a PR branch instead. By the time the train pulls in you see ── done after 6 iters — ✓ ──. On the platform you check the PR URL.

That's what shipped this weekend. The code never leaves my PC, the bill is zero, and not a single byte of source went anywhere external.

Three-minute video of it actually running (the recording is on a PC browser — the UI is identical on a phone, so the operating feel transfers as-is). Throw a task at the /agent web page, the daemon at home calls bash, events stream back.


1. What I built

The kotonia.ai web UI (my personal product) and a kotonia-cli daemon running on my home PC, joined two-way over WebSocket.

[phone / desktop browser]
   /agent page
        │
        │  HTTPS / SSE (events stream back)
        ▼
[kotonia.ai backend (Rust + Axum)]
        │
        │  WS keepalive every 25s
        ▼
[home PC: kotonia-cli daemon]
        │
        │  spawn agent in a fresh git worktree
        ▼
[local LLM (Gemma 4 26B uncensored on :8899)]
        ▲
        │  ReAct loop, bash tools, web search, file fetch
        ▼
[reads/writes code in /tmp/kotonia-agent-xxx worktree]

The web side gets five new endpoints (plus one polling endpoint for the pairing flow, covered in section 3).

Method  Path                                          Role
POST    /api/agent-runtime/device-codes               CLI issues the pairing code
POST    /api/agent-runtime/device-codes/verify        Browser enters the code → approve
GET     /api/agent-runtime/ws/{device_id}             Daemon's WebSocket inlet (Bearer auth)
POST    /api/agent-runtime/devices/{id}/agent-task    Submit a task from the web
GET     /api/agent-runtime/devices/{id}/agent-stream  Stream events as SSE

From the user side: pair → start daemon → submit task from web → watch the result stream in. That's the entire ritual. The diff was four commits, ~1,600 lines.


2. Why I built it

I'd wanted "casual phone-side editing of my home repo" for ages. Existing options come in three flavors.

Existing option                     What I hated
GitHub Codespaces / Cursor Cloud    Monthly subscription. The workspace lives in their cloud = my code goes over there.
ssh + tmux + nvim                   Phone keyboards make complex bindings painful. Babysitting a long LLM inference over ssh feels awful.
Claude Code / Codex CLI over ssh    It's still a CLI. Phones can't drive it. Plus the monthly bill.

My local box already runs an RTX 6000 Blackwell Max-Q (96 GB) + an RTX PRO 4000 Blackwell (24 GB), with Gemma 4 26B uncensored, DeepSeek V4-Flash, Irodori-TTS, and Ditto avatars always resident. Cloudflare Tunnel already serves kotonia.ai straight from that machine.

Slot a thin "task submission lives on the web, execution lives at home" bridge into that, and suddenly:

  • Zero monthly cost (entirely local LLM)
  • Code never leaves the house
  • 24/7 GPU availability
  • One-handed phone operation

The wager paid off — those four properties showed up in a single weekend.


3. Three things that were technically interesting

3.1 Bridging a sync ApprovalHandler to an async WebSocket

kotonia-cli was originally a CLI. When the agent reached for a destructive command, it asked for [y/N] on stdin — a sync structure:

pub trait ApprovalHandler: Send {
    fn ask(&mut self, command: &str, reason: &str) -> ApprovalOutcome;
}

For the web path I need the same call to "send a modal request over WS, then wait for the browser button". ask is sync (it returns an ApprovalOutcome directly, not a future), yet internally it has to wait on an async WS round-trip.

The pattern I landed on: tokio::task::block_in_place + std::sync::mpsc.

impl ApprovalHandler for WsApprovalHandler {
    fn ask(&mut self, command: &str, reason: &str) -> ApprovalOutcome {
        let approval_id = Uuid::new_v4().to_string();
        let (tx, rx) = std::sync::mpsc::channel::<bool>();

        // Register BEFORE sending so the reader can't race in with the answer
        // and drop it on the floor.
        self.pending.lock().unwrap().insert(approval_id.clone(), tx);

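        // Ask the WS reader to relay this to the operator. If the outbound
        // channel is already closed, nobody can answer: drop the pending
        // entry and deny instead of blocking on recv() forever.
        let sent = self.out_tx.send(DeviceMsg::ApprovalRequest {
            approval_id: approval_id.clone(),
            task_id: self.task_id.clone(),
            command: command.to_string(),
            reason: reason.to_string(),
        });
        if sent.is_err() {
            self.pending.lock().unwrap().remove(&approval_id);
            return ApprovalOutcome::Deny;
        }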

        // block_in_place lets this worker thread block on a sync recv
        // without freezing the multi-thread runtime. Requires multi_thread.
        let approved = tokio::task::block_in_place(|| rx.recv().unwrap_or(false));
        if approved { ApprovalOutcome::Approve } else { ApprovalOutcome::Deny }
    }
}

block_in_place only works on a multi-thread runtime (it panics when called from a current-thread runtime), so the CLI's #[tokio::main(flavor = "current_thread")] had to flip to #[tokio::main] (default multi-thread) at the same time.

On the WS reader side, the ServerMsg::ApprovalResult { approval_id, approved } coming back from the browser gets routed to the matching sync sender:

ServerMsg::ApprovalResult { approval_id, approved } => {
    if let Some(tx) = pending.lock().unwrap().remove(&approval_id) {
        let _ = tx.send(approved);  // unblocks the ask() above
    }
}

I could have rewritten the trait to be async, but the requirement was that the CLI path (stdin approval) and the web path (WS approval) share the same Agent code. Keeping the sync trait and bridging only at the call site was the smaller diff — about 150 lines.
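
The register-then-wait rendezvous can be tried out without tokio or a WebSocket. Below is a minimal std-only sketch of the same idea; the names Pending, wait_for_approval, and route_result are hypothetical, and the timeout is an assumption of this sketch (the handler above uses a plain recv with no deadline). A timeout or a dropped sender is treated as Deny.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical stand-in for the handler's `pending` map:
/// approval_id -> the sync sender that unblocks the waiting `ask`.
type Pending = Arc<Mutex<HashMap<String, Sender<bool>>>>;

/// Register a waiter under `approval_id`, then block until the reader side
/// answers or `timeout` elapses. No answer counts as a denial.
fn wait_for_approval(pending: &Pending, approval_id: &str, timeout: Duration) -> bool {
    let (tx, rx) = mpsc::channel::<bool>();
    // Register BEFORE anything is sent out, so an early answer has a home.
    pending.lock().unwrap().insert(approval_id.to_string(), tx);
    let approved = rx.recv_timeout(timeout).unwrap_or(false);
    // If nobody answered, remove the stale entry so the map cannot grow forever.
    pending.lock().unwrap().remove(approval_id);
    approved
}

/// What the reader side does when an approval result comes back.
/// Returns true only if a waiter was registered and received the answer.
fn route_result(pending: &Pending, approval_id: &str, approved: bool) -> bool {
    match pending.lock().unwrap().remove(approval_id) {
        Some(tx) => tx.send(approved).is_ok(),
        None => false,
    }
}
```

Inside the real handler the blocking wait would still sit in tokio::task::block_in_place; only the recv call would change to recv_timeout. Whether a deadline is desirable is a product decision: it trades "the agent hangs until someone taps a button" for "an unattended request quietly becomes a Deny".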

3.2 A Cookie-auth-shaped device-code login

The initial pairing ritual used to be two curl calls. kotonia-cli login collapses that into one command. The shape borrows from the OAuth 2.0 Device Authorization Grant (RFC 8628), with client_id/client_secret stripped because Cookie auth on the web side already identifies the user.

$ kotonia-cli login

  Open this URL in a logged-in browser tab:
     https://kotonia.ai/agent/pair

  Then enter this code:
     ABCD-2345

Waiting for approval...... approved!
Paired as device 5b33ef2f.
Saved to /home/zhener/.kotonia/daemon.json

Run `kotonia-cli daemon` to connect.

Three endpoints carry it:

  1. POST /api/agent-runtime/device-codes (no auth) — CLI calls. Server issues device_code (CLI's secret) and user_code (human-typeable).
  2. POST /api/agent-runtime/device-codes/verify (cookie auth) — browser submits the user_code. Server attaches the current user_id and marks approved.
  3. GET /api/agent-runtime/device-codes/{device_code} (no auth) — CLI polls. On the approved → used transition the server atomically creates the agent_runtime_devices row and hands device_token back exactly once.

That last atomic transition matters: if two concurrent polls both reach approved, only one walks away with the token.

let claimed = sqlx::query(
    "UPDATE agent_device_codes SET status = 'used' \
     WHERE device_code = $1 AND status = 'approved'",
)
.bind(device_code)
.execute(&state.postgres)
.await?;

if claimed.rows_affected() == 0 {
    return Err((StatusCode::GONE, "already consumed".into()));
}
// Only past this line do we generate device_token + INSERT.
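
The same "only one caller wins the approved → used transition" property can be modelled in memory, which makes it easy to test. This is a sketch under assumptions, not the server's code: DeviceCodes, CodeStatus, and the method names are hypothetical, and a Mutex-guarded map stands in for the Postgres row and its conditional UPDATE.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodeStatus {
    Pending,
    Approved,
    Used,
}

/// Hypothetical in-memory stand-in for the agent_device_codes table.
struct DeviceCodes {
    rows: Mutex<HashMap<String, CodeStatus>>,
}

impl DeviceCodes {
    fn new() -> Self {
        Self { rows: Mutex::new(HashMap::new()) }
    }

    /// CLI asked for a code: the row starts out pending.
    fn issue(&self, device_code: &str) {
        self.rows
            .lock()
            .unwrap()
            .insert(device_code.to_string(), CodeStatus::Pending);
    }

    /// Compare-and-set: move `from` -> `to` only if the row is still in `from`.
    /// Mirrors `UPDATE ... SET status = to WHERE status = from` and its
    /// rows_affected() check.
    fn transition(&self, device_code: &str, from: CodeStatus, to: CodeStatus) -> bool {
        let mut rows = self.rows.lock().unwrap();
        match rows.get_mut(device_code) {
            Some(status) if *status == from => {
                *status = to;
                true
            }
            _ => false,
        }
    }

    /// Browser approved the user_code.
    fn approve(&self, device_code: &str) -> bool {
        self.transition(device_code, CodeStatus::Pending, CodeStatus::Approved)
    }

    /// CLI poll tries to claim the token; at most one caller ever gets true.
    fn claim(&self, device_code: &str) -> bool {
        self.transition(device_code, CodeStatus::Approved, CodeStatus::Used)
    }
}
```

The check and the write happen under one lock here, just as they happen inside one UPDATE statement in the SQL version. Splitting them into "SELECT status, then UPDATE" would reopen the race.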

device_token lands in ~/.kotonia/daemon.json (chmod 0600). After that first login, every subsequent kotonia-cli daemon reads from there — no flags needed.
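
For the human-typeable user_code, here is one possible sketch. Everything in it is an assumption for illustration: the 32-character alphabet (no I, O, 0, or 1, which happens to be consistent with the ABCD-2345 example), the function names, and the idea of normalising what the user typed before lookup. The caller is expected to supply the random bytes from a cryptographically secure source.

```rust
/// Assumed alphabet: 24 letters (no I, O) + 8 digits (no 0, 1) = 32 symbols,
/// so each symbol carries exactly 5 bits.
const ALPHABET: &[u8] = b"ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

/// Turn 8 random bytes into an XXXX-XXXX code (5 bits used per byte).
fn format_user_code(entropy: [u8; 8]) -> String {
    let mut out = String::with_capacity(9);
    for (i, byte) in entropy.iter().enumerate() {
        if i == 4 {
            out.push('-');
        }
        out.push(ALPHABET[(*byte & 31) as usize] as char);
    }
    out
}

/// Accept sloppy phone input ("abcd 2345", "abcd2345") and return the
/// canonical form, or None if it cannot be a valid code.
fn normalize_user_code(input: &str) -> Option<String> {
    let cleaned: String = input
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_uppercase())
        .collect();
    if cleaned.len() != 8 || !cleaned.bytes().all(|b| ALPHABET.contains(&b)) {
        return None;
    }
    Some(format!("{}-{}", &cleaned[..4], &cleaned[4..]))
}
```

Eight symbols at 5 bits each give 40 bits, which is only reasonable because the code is short-lived and single-use; the long device_code stays the real secret on the CLI side.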

3.3 SessionRegistry — multi-turn lives on the daemon side

The /agent web tab generates one UUID on mount and carries that session_id on every task. The daemon holds a HashMap<session_id, Arc<Mutex<SessionState>>> where:

  • One session = one worktree + one Agent (the agent keeps its conversation history)
  • Consecutive tasks call the same Agent's run_turn = prior context carries forward
  • 30-minute idle GC drops both the session and the worktree
  • WS disconnects don't kill the session; in-memory state survives as long as the daemon process does

struct SessionState {
    agent: Agent,                  // keeps message history across run_turn
    workspace: AgentWorkspace,     // /tmp/kotonia-agent-xxx
    last_active: Instant,
}

struct SessionRegistry {
    inner: RwLock<HashMap<String, Arc<AsyncMutex<SessionState>>>>,
}
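
To make the lifecycle in the bullets concrete, here is a minimal std-only sketch of a registry with get-or-create, per-session turn history, and idle GC. It is not the daemon's code: Registry, Session, run_turn, and gc_idle are hypothetical names, a Vec of prompts stands in for the Agent's message history, std locks stand in for the async ones, and the clock is passed in so the behaviour can be tested without waiting 30 minutes.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for SessionState: history plus last activity.
struct Session {
    turns: Vec<String>,
    last_active: Instant,
}

struct Registry {
    inner: RwLock<HashMap<String, Arc<Mutex<Session>>>>,
    idle_limit: Duration,
}

impl Registry {
    fn new(idle_limit: Duration) -> Self {
        Self { inner: RwLock::new(HashMap::new()), idle_limit }
    }

    /// Fast path under the read lock; fall back to the write lock to create.
    fn get_or_create(&self, id: &str, now: Instant) -> Arc<Mutex<Session>> {
        let existing = self.inner.read().unwrap().get(id).cloned();
        if let Some(found) = existing {
            return found;
        }
        let mut map = self.inner.write().unwrap();
        let entry = map.entry(id.to_string()).or_insert_with(|| {
            Arc::new(Mutex::new(Session { turns: Vec::new(), last_active: now }))
        });
        Arc::clone(entry)
    }

    /// One task = one turn on the session's history. Returns the turn count,
    /// which shows that context carries over between consecutive tasks.
    fn run_turn(&self, id: &str, prompt: &str, now: Instant) -> usize {
        let session = self.get_or_create(id, now);
        let mut state = session.lock().unwrap();
        state.turns.push(prompt.to_string());
        state.last_active = now;
        state.turns.len()
    }

    /// Drop every session idle for at least `idle_limit`; returns how many
    /// were dropped. A real daemon would also remove the worktree here.
    fn gc_idle(&self, now: Instant) -> usize {
        let mut map = self.inner.write().unwrap();
        let before = map.len();
        let limit = self.idle_limit;
        map.retain(|_, s| now.duration_since(s.lock().unwrap().last_active) < limit);
        before - map.len()
    }

    fn len(&self) -> usize {
        self.inner.read().unwrap().len()
    }
}
```

One design consequence worth noticing: the per-session mutex serialises turns, so a second task on the same session waits for the first. The outer map lock is only held briefly, so different sessions do not block each other.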

The extra win: HistoryStore::open(session_id) auto-attaches ~/.kotonia/sessions/{id}.jsonl. That's the same JSONL format the one-shot CLI uses. Even across daemon restarts, typing /resume <session_id> in the web UI seeds the prior conversation back into a fresh Agent:

let prior = load_session_messages(session_id).unwrap_or_default();
let resuming = !prior.is_empty();
if resuming {
    agent.seed_messages(prior);  // inject prior messages verbatim
}

A nice side effect: a "kotonia-cli direct" session at home and a "web from outside" session can share the same conversation space transparently. Continue at home with kotonia-cli --resume <id>. Continue outside with /resume <id>. Same machine, so the worktree state lines up physically too.


4. One more thing — Cloudflare Tunnel and WebSocket idle timeouts

WebSocket through Cloudflare Tunnel hit one trap: idle connections get killed after ~100 seconds. With nothing flowing, a TCP RST arrives and the daemon falls into a "reconnecting in 5s" loop, retrying every 5 seconds.

Adding an application-level keepalive — server pushes ServerMsg::Ping every 25 seconds, daemon answers with DeviceMsg::Pong — quieted it instantly.

let pinger = tokio::spawn(async move {
    let mut tick = tokio::time::interval(PING_INTERVAL);
    tick.tick().await;  // discard the immediate-fire first tick
    loop {
        tick.tick().await;
        if inbox_tx_for_ping.send(ServerMsg::Ping).is_err() {
            break;  // channel closed = disconnect detected, exit
        }
    }
});

On disconnect, the channel send fails → the ping task exits naturally. pinger.abort() in cleanup adds a second layer.
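
The "send fails, so the pinger exits on its own" behaviour is easy to see in isolation. The sketch below is a std-thread analogue of the tokio task above, not the server's code: spawn_pinger is a hypothetical helper, and a plain mpsc channel stands in for the per-connection inbox.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ServerMsg {
    Ping,
}

/// Start a background pinger. It sends one Ping per `interval` and stops by
/// itself once the receiving end is gone. The join handle yields the number
/// of pings that were delivered to the channel.
fn spawn_pinger(interval: Duration) -> (Receiver<ServerMsg>, JoinHandle<u32>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut sent = 0u32;
        loop {
            // Sleep first, so no ping fires at connection time.
            thread::sleep(interval);
            if tx.send(ServerMsg::Ping).is_err() {
                break; // receiver dropped = connection gone, exit quietly
            }
            sent += 1;
        }
        sent
    });
    (rx, handle)
}
```

With a 25-second interval and a roughly 100-second idle cutoff, about four pings fit into each idle window, so a single delayed ping still leaves margin.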


5. The audit-email bonus

If someone steals your session cookie, they can submit tasks of their own through /agent. To make that visible, every task submission triggers an email to the owner's account address.

Time:    2026-06-24T03:45:00Z
Device:  laptop (5b33ef2f)
Client:  133.xxx.xxx.xxx
Task:    7f77c5ca

Prompt:
----
can you find novel in this project?
----

──────── 日本語 ────────
(same trilingual disavow path follows)

It piggybacks on the existing Mailgun integration (OTP, etc.). The psychological safety it adds is bigger than the implementation cost. A naive one-email-per-task scheme is plenty for now.


6. UX detail — slash commands

The /agent textarea recognises a small / command set.

Command                         Behaviour
/help                           Open the command-list modal
/new                            Fresh session (fresh worktree, history cleared)
/clear                          Wipe the local log (session continues)
/model gemma4-26b-uncensored    Switch model + auto-/new
/resume <session_id>            Switch to an existing session; daemon auto-resumes from disk

On a phone keyboard, being able to discover everything else through /help is unexpectedly important. The "command palette safety net" that Claude Code gives me on desktop survives onto the phone.
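
The command table maps naturally onto a small parser. The sketch below is illustrative only: it is written in Rust to match the rest of the post, whereas the real parsing lives in the web UI, and the Input enum and parse_input function are hypothetical names.

```rust
/// What one line typed into the /agent textarea can mean.
#[derive(Debug, PartialEq)]
enum Input<'a> {
    Help,
    New,
    Clear,
    Model(&'a str),
    Resume(&'a str),
    /// Anything that does not start with '/' goes to the agent as a task.
    Task(&'a str),
    /// Starts with '/' but matches no known command (or lacks its argument).
    Unknown(&'a str),
}

fn parse_input(raw: &str) -> Input<'_> {
    let text = raw.trim();
    let Some(rest) = text.strip_prefix('/') else {
        return Input::Task(text);
    };
    let mut parts = rest.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some("help"), None) => Input::Help,
        (Some("new"), None) => Input::New,
        (Some("clear"), None) => Input::Clear,
        (Some("model"), Some(name)) => Input::Model(name),
        (Some("resume"), Some(id)) => Input::Resume(id),
        _ => Input::Unknown(text),
    }
}
```

One wrinkle this makes visible: a prompt that opens with a path, such as "/studio returns a 500", lands in Unknown rather than Task. A UI would have to decide whether to show /help for that case or pass the text through as a task.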


7. Open gaps

Being honest about what's not done.

  • Interrupt of an in-flight Agent — no /abort yet. Submitting a new task while one is running stalls behind the session mutex.
  • Multi-device fanout — same device_id paired on two daemons → latest-wins, one gets bumped.
  • /resume session picker UI: session_id typing only (kotonia-cli --list-sessions on the box itself is the workaround).
  • iOS Safari backgrounded SSE — Safari drops the SSE when the tab loses focus; manual reload to resubscribe.

All next-weekend material. The win this weekend was just "it works at all".


8. systemd-ifying the daemon for 24/7

This is what makes "submit a task while you're out and about" real. Drop it onto the existing hage-* systemd --user pattern.

# ~/.config/systemd/user/kotonia-cli-daemon.service
[Unit]
Description=kotonia-cli daemon (paired to kotonia.ai)
After=network-online.target
# Start-rate limiting is a [Unit] setting in current systemd.
StartLimitBurst=5
StartLimitIntervalSec=600

[Service]
ExecStart=%h/.cargo/bin/kotonia-cli daemon --model kotonia-gemma4-26b
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target

systemctl --user daemon-reload
systemctl --user enable --now kotonia-cli-daemon
loginctl enable-linger zhener  # keep user services alive across logouts

After this the daemon survives logout and machine reboot.


9. Wrap — a dev environment that fits in a pocket

Life after shipping:

  • The moment a "I should fix that" thought hits, I can act on it from the phone
  • Commute train: one task. Convenience-store stop: one task. Dinner: one task. Bite-sized progress all day.
  • The laptop stays at home
  • ¥0 / month, zero bytes of source ever leave the house
  • My home GPU starts feeling like a coworker waiting in the wings, 24/7

If you've ever wanted "a dev environment that fits in your pocket" even once in your engineering life, it actually lands in your hands for the price of Cloudflare Tunnel + WebSocket + about 200 lines of sync-to-async bridge.

The code is open source (github.com/zhener562/kotonia-cli). Create a kotonia.ai account, run kotonia-cli login, and the same setup is yours the same day. Tell me how it goes.

kotonia.ai / Making more of my stack actually mine, one weekend at a time.
