Kotonia Articles

https://kotonia.ai/en/articles/
Latest technical articles from kotonia.ai: voice AI / multilingual TTS / local-GPU stack notes.
Language: en
Wed, 27 May 2026 00:01:59 GMT

Why I Trained a LoRA for HiDream-O1-Image — The Unfiltered Story
https://kotonia.ai/en/articles/hidream-o1-lora-why/
https://kotonia.ai/articles/_g/13c642eb-f6ef-4374-9b2e-79b0555afaa8/en
Tue, 26 May 2026 23:33:39 GMT
The unfiltered story behind the HiDream-O1-Image LoRA: why anime quality is O1's weak spot, how 191 hand-picked images trained a visual booster, NSFW controllability, and what my analytics revealed about conversion drivers.
Tags: lora hidream imagegen

Betting on the video niche the big labs walked away from — model A/B to making I2V the mainstay
https://kotonia.ai/en/articles/residual-video-domain-i2v/
https://kotonia.ai/articles/_g/f0d8ddbb-0c15-4726-8017-09161c49d570/en
Tue, 26 May 2026 06:33:36 GMT
Is there niche demand in free creation with the guardrails off? A solo dev on one local GPU: from model A/B to making high-res I2V the mainstay.
Tags: solo-dev generative-ai video-generation local-gpu i2v

HiDream Raw Output Failed → Tried Dev-2604 → VRAM Math Killed It → Won with a Prompt Enhancer Instead
https://kotonia.ai/en/articles/hidream-prompt-enhancer/
https://kotonia.ai/articles/_g/f5b3fcbd-52b3-459d-979a-dd428d4a9af8/en
Sat, 23 May 2026 06:57:54 GMT
How we discovered that HiDream-O1's raw outputs collapse on plain Japanese prompts, tried switching to Dev-2604, found the VRAM math impossible, and won by adding a Gemini Flash Lite prompt enhancer instead. Four non-obvious HiDream pitfalls documented via A/B benchmarking.
Tags: hidream diffusion promptengineering gemini imagegeneration

Turning a 1-Line Idea Into a 40-Second Short with a 10-Beat Local Video Pipeline
https://kotonia.ai/en/articles/consent-dilemma-10beat-pipeline/
https://kotonia.ai/articles/_g/bb1b2af1-a523-451d-9e4f-d031787a87d0/en
Fri, 22 May 2026 11:11:32 GMT
Full pipeline: Gemma 4 31B expands a one-liner into a 10-beat script, HiDream generates images, LTX-2 I2V renders clips, and ffmpeg assembles everything, all on one local GPU in 25–30 min.
Tags: python ai machinelearning gpu

Building a Sarcastic AI English Tutor with Persona-as-Code and Gemini Audio Input for Pronunciation Correction
https://kotonia.ai/en/articles/mesugaki-english-persona-gemini-audio/
https://kotonia.ai/articles/_g/364754ac-800f-475a-af4d-758610977a52/en
Fri, 22 May 2026 11:09:13 GMT
How I built a high-attitude AI English tutor using persona-as-code design, Qwen3-ASR for multilingual STT, and Gemini audio input for real pronunciation feedback, from a solo dev perspective.
Tags: ai webdev typescript rust

Running LTX-2.3 Alongside TTS on a Single 96GB GPU with a Cold-Start Architecture
https://kotonia.ai/en/articles/ltx2-cold-start-vram-coexistence/
https://kotonia.ai/articles/_g/96857773-b422-476e-9b24-6096f61ee7d5/en
Fri, 22 May 2026 11:07:05 GMT
How to go from 86 GiB idle VRAM (instant OOM) to 0 GiB idle / 40 GiB peak by using a cold-start design for LTX-2.3 on one RTX Pro 6000 Blackwell.
Tags: gpu python machinelearning ai

Cutting LTX-2 22B Peak VRAM by 40% with fp8_cast — and Why optimum-quanto Was a Trap
https://kotonia.ai/en/articles/ltx2-22b-fp8-cast-quantization/
https://kotonia.ai/articles/_g/8af9ff14-6784-442c-b66f-cd7985100bc5/en
Fri, 22 May 2026 11:05:39 GMT
How fp8_cast reduced LTX-2 22B peak VRAM from 40 GiB to 24 GiB in cold-start mode, and why optimum-quanto silently breaks the transformer.
Tags: ai machinelearning gpu python

HiDream Skeleton Mode: Prompt Beats OpenPose Ref — 8 Patterns Benchmarked
https://kotonia.ai/en/articles/hidream-skeleton-pose-prompt/
https://kotonia.ai/articles/_g/23352312-01ee-4167-bd15-8b00192e67d3/en
Fri, 22 May 2026 11:03:56 GMT
Benchmarking HiDream-O1-Image skeleton mode across 8 patterns reveals 3 counterintuitive findings about openpose refs, resolution drops, and shift values.
Tags: ai python machinelearning gpu

Replicating a Language-Learning Comedy Short with Claude Code — Gemini as a Multimodal Sub-Agent
https://kotonia.ai/en/articles/comedy-shorts-claude-gemini/
https://kotonia.ai/articles/_g/3e8a3850-bd49-4779-b545-92cb6fed61f4/en
Fri, 22 May 2026 11:01:06 GMT
Building a local GPU + Gemini 3.1 Pro hybrid pipeline that generates publishable comedy Shorts from a single line of text in under 60 seconds.
Tags: ai python machinelearning productivity

Five Years Later, I Finally Have 96GB VRAM — What It Actually Unlocks for Agent Loops
https://kotonia.ai/en/articles/96gb-vram-mma-and-5-years/
https://kotonia.ai/articles/_g/ee5931ea-d81e-4dcd-a26b-5e84250754a1/en
Fri, 22 May 2026 10:59:17 GMT
Not a GPU unboxing. A real look at what 96GB VRAM enables for multi-model agent pipelines, and where it still hits its limits.
Tags: gpu ai machinelearning python

HiDream-O1-Image 3–8x Faster: Benchmarking Steps, CFG, and Resolution
https://kotonia.ai/en/articles/hidream-quality-speed-bench/
https://kotonia.ai/articles/_g/b2efb8c1-a7e4-40b4-ac67-a698c1c20b31/en
Fri, 22 May 2026 10:46:34 GMT
Real-world timing benchmarks for HiDream-O1-Image Full: tuning steps, guidance scale, and resolution to speed up iteration without killing quality.
Tags: ai machinelearning gpu python