How to use the API

A REST API that generates images, audio, and video on local GPUs. One API key works for every endpoint, and request shapes are OpenAI-compatible. TTS returns the first audio bytes in about 100 ms, images take a few seconds, and video runs as an async job. Every response carries a timing block so you can verify speed yourself.

Issue or revoke keys and view usage limits on the API Management page.

How to use (4 steps)

1. Sign in / register
Sign in to your account first. Registration is free if you do not have one.
2. Issue an API key
On the API Management page, enter a project name and click Issue. The plaintext key is shown only once, at creation, so copy it somewhere safe. Only a hash of the key is stored in the database.
3. Your first request
Call an endpoint below with Authorization: Bearer <your key>.
4. Handle the response / operate
Image and audio responses return base64 data; video returns a job id to poll. Check the timing block to confirm speed. Revoke keys you no longer use, and revoke and reissue a key if it leaks.

Latency (RTX PRO Blackwell)

Endpoint	Mode	Speed
Audio `/audio/generations`	sync	first audio in ~85–120 ms; under 1 s total for short text
Image `/images/generations`	sync	~4s (1024², 20 steps)
Video `/videos/generations`	async job	~60–90s (poll the job id)
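
To check these figures against your own network path, you can compare the wall-clock time measured on the client with the timing block in the response. The sketch below assumes that `timing.total_ms` is the server-side generation time (the page does not define it precisely) and that `send` is a hypothetical callable of your own that performs the request and returns the parsed JSON body.

```python
import time
from typing import Callable


def timed_call(send: Callable[[], dict]) -> dict:
    """Run one request and compare client time with the reported timing block.

    `send` is a hypothetical callable that performs the HTTP request and
    returns the decoded JSON body as a dict.
    """
    start = time.perf_counter()
    body = send()
    client_ms = (time.perf_counter() - start) * 1000.0

    # Assumption: total_ms is server-side time, so the difference is
    # roughly network transfer plus client overhead.
    server_ms = float(body["timing"]["total_ms"])
    return {
        "client_ms": client_ms,
        "server_ms": server_ms,
        "overhead_ms": client_ms - server_ms,
    }
```

A large `overhead_ms` relative to `server_ms` would suggest the time is being spent in transfer (base64 payloads can be large) rather than in generation.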

Authentication

Send your API key as a Bearer token on every request. Issue keys from the API Management page (the plaintext is shown only once at creation).

Authorization: Bearer kotonia_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
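
For clients that do not use curl, the same header can be attached in code. This is a minimal sketch using only the Python standard library; the base URL is taken from the curl examples on this page, while the function name `build_request` is hypothetical and not part of any official SDK.

```python
import json
import urllib.request

# Base URL as it appears in the curl examples on this page.
BASE_URL = "https://kotonia.ai/api/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pass the returned object to `urllib.request.urlopen` to send it. Reading the key from an environment variable such as `KOTONIA_API_KEY`, as the curl examples do, keeps it out of source control.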

Image API

curl -X POST https://kotonia.ai/api/v1/images/generations \
  -H "Authorization: Bearer $KOTONIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a serene japanese garden at dawn, soft light",
    "size": "1024x1024",
    "steps": 20
  }'

# → { "data": [ { "b64_json": "<PNG base64>" } ], "timing": { "total_ms": 4200 } }

Optional: seed, guidance_scale, shift, ref_image (base64, edit mode). Limits: prompt ≤ 4000 chars, size 256–2048 per side (out-of-range returns 400).
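
Checking the documented limits on the client avoids a round trip that would end in a 400, and the base64 payload has to be decoded before it can be saved as a PNG. The following is a sketch under those stated limits; the helper names are hypothetical, and the response layout is assumed to match the example above.

```python
import base64
import json

# Limits as documented above.
MAX_PROMPT_CHARS = 4000
MIN_SIDE, MAX_SIDE = 256, 2048


def build_image_payload(prompt: str, width: int = 1024, height: int = 1024,
                        steps: int = 20) -> dict:
    """Validate inputs against the documented limits and build the JSON body."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1 to 4000 characters")
    for side in (width, height):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError("each side must be between 256 and 2048")
    return {"prompt": prompt, "size": f"{width}x{height}", "steps": steps}


def decode_image(response_body: str) -> bytes:
    """Extract the first image from a response body and return raw PNG bytes."""
    body = json.loads(response_body)
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(body["data"][0]["b64_json"], validate=True)
```

The returned bytes can be written directly to a file opened in binary mode, for example `open("out.png", "wb")`.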

Audio API

curl -X POST https://kotonia.ai/api/v1/audio/generations \
  -H "Authorization: Bearer $KOTONIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, what a lovely day.",
    "engine": "qwen3",
    "language": "en"
  }'

# → { "audio": { "b64": "<WAV base64>", "format": "wav", "sample_rate": 24000 },
#     "timing": { "first_byte_ms": 92, "total_ms": 480 } }

engine: qwen3 (default, multilingual) / irodori / voicevox. Optional: voice, speed, instruct. Limits: input ≤ 4000 chars.
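
The audio response nests the payload under `audio` and reports two timing values. A small parser, sketched here with a hypothetical function name and assuming the response layout shown in the example above, might look like this:

```python
import base64
import json


def parse_audio_response(response_body: str) -> tuple[bytes, int, dict]:
    """Return (wav_bytes, sample_rate, timing) from an audio response body."""
    body = json.loads(response_body)
    audio = body["audio"]
    # The example response reports "wav"; other formats are not covered here.
    if audio.get("format") != "wav":
        raise ValueError(f"unexpected audio format: {audio.get('format')!r}")
    wav_bytes = base64.b64decode(audio["b64"], validate=True)
    # timing is returned as-is so callers can read first_byte_ms and total_ms.
    return wav_bytes, int(audio["sample_rate"]), body.get("timing", {})
```

Write `wav_bytes` to a file in binary mode to play it back; `timing["first_byte_ms"]` is the value to compare against the first-audio latency quoted in the table.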

Video API (async)

# 1) submit job
curl -X POST https://kotonia.ai/api/v1/videos/generations \
  -H "Authorization: Bearer $KOTONIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "a cat walking through neon-lit streets", "width": 768, "height": 512 }'

# → { "id": "<job_id>", "status": "queued", "poll_url": "/api/v1/videos/generations/<job_id>" }

# 2) poll
curl https://kotonia.ai/api/v1/videos/generations/<job_id> \
  -H "Authorization: Bearer $KOTONIA_API_KEY"

# → { "status": "completed", "data": [ { "url": "/api/ltx/video?path=..." } ] }

Optional: image (base64, I2V), audio (base64, A2V lip-sync), num_frames. Limits: prompt ≤ 4000 chars, width/height 256–1280 per side, num_frames ≤ 200 (out-of-range returns 400).
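
The submit-then-poll flow above can be wrapped in a loop with a timeout. This sketch takes a hypothetical `fetch_status` callable that performs the GET on the poll URL and returns the parsed JSON. Only the `queued` and `completed` statuses appear on this page; treating `failed` or `error` as terminal is an assumption, and any other status is simply polled again until the timeout.

```python
import time
from typing import Callable


def wait_for_video(fetch_status: Callable[[], dict],
                   interval_s: float = 5.0,
                   timeout_s: float = 300.0,
                   sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Poll a video job until it completes; return the result URLs."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return [item["url"] for item in job["data"]]
        # Assumption: these terminal failure statuses are not documented.
        if status in ("failed", "error"):
            raise RuntimeError(f"video job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("video job did not complete in time")
        sleep(interval_s)
```

Given the 60 to 90 second generation time quoted earlier, a polling interval of a few seconds keeps request volume low without adding much delay. The returned URLs are relative, as in the example response, so they need the site origin prepended before download.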

Response codes

`200`	Success. Image and audio return the result in the body (base64); video returns a job id.
`400`	Bad request. Missing or invalid params (e.g. missing prompt / input, bad base64).
`401`	Unauthorized. API key missing or invalid (check the Authorization header).
`429`	Too Many Requests. Free-tier daily limit exceeded. Resets at JST midnight.
`503`	Service unavailable. The generation server is temporarily down or busy; retry later.
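
One way to act on these codes in a client is sketched below. The policy itself is an assumption, not something the API prescribes: retry 503 with exponential backoff, wait until midnight JST on 429, and treat 400 and 401 as errors to fix rather than retry. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time, UTC+9, no DST


def seconds_until_jst_midnight(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next midnight in JST."""
    now_jst = now.astimezone(JST)
    next_midnight = (now_jst + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now_jst).total_seconds()


def plan_next_step(status: int, attempt: int, now: datetime,
                   max_attempts: int = 5) -> tuple[str, float]:
    """Map a status code to (action, delay_seconds); attempt is 0-based."""
    if status == 200:
        return ("ok", 0.0)
    if status == 503 and attempt + 1 < max_attempts:
        # Exponential backoff: 1 s, 2 s, 4 s, ... capped at 60 s.
        return ("retry", min(60.0, 2.0 ** attempt))
    if status == 429:
        # The daily limit resets at midnight JST, so earlier retries will fail.
        return ("wait_for_reset", seconds_until_jst_midnight(now))
    # 400 and 401 need a corrected request or key; retrying will not help.
    return ("fail", 0.0)
```

For example, a 429 received at 23:00 JST yields a wait of 3600 seconds, while a first 503 yields a retry after 1 second.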