ABOUT KOTONIA

Shinji Shimizu

Solo developer building Kotonia (kotonia.ai).

Profile

I build real-time voice, image, and video AI pipelines across a full stack of Rust, Next.js, and Python.

Kotonia is the platform I am building around one vision: "Deliver an AI companion with voice, face, and hands to people taking on challenges alone." I focus on high-quality multilingual TTS, lipsync avatars, and emotionally continuous conversation, all within the economics of one GPU per user.

Rust: Axum backend

Next.js: public UI + app

Python: AI pipeline

Focus areas

Behind the public tools and engineering notes, my ongoing work is fitting low-latency conversation and heavy generation pipelines into a single product.

Multilingual TTS

Local operation and latency tuning for Qwen3-TTS / Irodori-TTS / VoiceVox.

Lipsync avatars

VRAM optimization for Ditto / MuseTalk and their integration into the conversation UX.

Audio-to-Video

Pipelines that use LTX-2.3 to direct video from voice and script inputs.

Image control

Text-to-image (T2I) generation, editing, and character-consistency tuning with HiDream-O1-Image.

Solo-dev operations

Rust (Axum), Next.js, Python, and local GPU operations as one stack.

What is public now

The technical record lives in the blog, and the hands-on entry points live in each Studio.

Tech blog: Implementation notes on GPU operations, generation models, and video pipelines.

Studio: Image generation and editing with HiDream-O1-Image.

Voice Studio: Multilingual TTS, voice cloning, and voice design.

Video Studio: Video generation with the LTX-2.3 family.