Multilingual TTS
Local operation and latency tuning for Qwen3-TTS / Irodori-TTS / VoiceVox.
ABOUT KOTONIA
Solo developer building Kotonia (kotonia.ai).
Profile
I build real-time voice, image, and video AI pipelines across a full stack of Rust, Next.js, and Python.
Kotonia is the platform I am building around one vision: "Deliver an AI companion with voice, face, and hands to those who take on challenges alone." I focus on high-quality multilingual TTS, lipsync avatars, and emotionally continuous conversation, all within one-user-one-GPU economics.
Focus areas
Behind the public tools and engineering notes, my ongoing work is fitting low-latency conversation and heavy generation pipelines into a single product.
Local operation and latency tuning for Qwen3-TTS / Irodori-TTS / VoiceVox.
VRAM optimization for Ditto / MuseTalk and integration into conversation UX.
Direction pipelines using LTX-2.3 from voice and script inputs.
T2I, editing, and character consistency tuning with HiDream-O1-Image.
Rust (Axum), Next.js, Python, and local GPU operations as one stack.
What is public now
The technical record lives in the blog, and the hands-on entry points live in each Studio.