Build with audio-to-text
A versioned, token-authenticated REST API to submit audio, transcribe it with state-of-the-art models, and download the result in multiple formats. One credit per job. No SDK required.
1. Submit: POST /api/v1/transcriptions with multipart/form-data. You receive a job_id immediately.
2. Poll: GET /api/v1/transcriptions/{job_id} every 2 s. Status transitions: pending → processing → done (or error).
3. Export: GET /api/v1/transcriptions/{job_id}/export?format=txt|srt|lrc|json, or read the structured segments as JSON directly.

| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/transcriptions | Submit audio file |
| GET | /api/v1/transcriptions/{id} | Poll status + metadata |
| GET | /api/v1/transcriptions/{id}/segments | Timestamped segments (JSON) |
| GET | /api/v1/transcriptions/{id}/export | Download TXT / SRT / LRC / JSON |
| GET | /api/v1/transcriptions | List your jobs (paginated) |
| DELETE | /api/v1/transcriptions/{id} | Cancel or delete a job |
| GET | /api/v1/me | Your profile + credit balance |
Quickstart
The full lifecycle in three shell commands.
```shell
$ TFT_TOKEN="tft_pat_…"

# 1: Submit
$ curl -s -X POST "https://app.textfromtrack.com/api/v1/transcriptions" \
    -H "Authorization: Bearer $TFT_TOKEN" \
    -F "file=@track.mp3" | jq .job_id
"f0c7e2d4-1b3a-…"

# 2: Poll (repeat until status == "done")
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" \
    "https://app.textfromtrack.com/api/v1/transcriptions/f0c7e2d4-1b3a-…" | jq .status
"done"

# 3: Download LRC
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" \
    "https://app.textfromtrack.com/api/v1/transcriptions/f0c7e2d4-1b3a-…/export?format=lrc" \
    -o track.lrc
```
Authentication
All /api/v1/* endpoints require a Personal Access Token (PAT).
Send it as a Bearer token on every request:
Authorization: Bearer tft_pat_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Security model: a PAT cannot manage other PATs and cannot log into the web UI. A stolen web session cannot call the API. This separation is intentional.
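A minimal sketch of building that header in Python with only the standard library. The `auth_headers` helper is hypothetical, and its prefix check is only a local sanity test based on the tft_pat_ convention; the server remains the authority on whether a token is valid.

```python
import os
from typing import Dict, Optional

PAT_PREFIX = "tft_pat_"  # documented prefix of every Personal Access Token


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build the Authorization header for /api/v1/* requests (hypothetical helper)."""
    if token is None:
        token = os.environ.get("TFT_TOKEN", "")
    # Fail fast on obvious mistakes, such as pasting a web-session JWT instead of a PAT.
    if not token.startswith(PAT_PREFIX):
        raise ValueError(f"expected a Personal Access Token starting with {PAT_PREFIX!r}")
    return {"Authorization": f"Bearer {token}"}
```

Pass the returned dict as the `headers` argument of whichever HTTP client you use.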
Managing tokens
Create, list, and revoke tokens in the web UI under Account → API tokens. The three underlying management endpoints authenticate with the web session's JWT, not a PAT:
| Method | Path | Notes |
|---|---|---|
| POST | /api/auth/tokens | Create — plaintext returned once |
| GET | /api/auth/tokens | List metadata (no plaintext ever) |
| DELETE | /api/auth/tokens/{id} | Revoke (idempotent) |
Tokens are prefixed with tft_pat_ so that secret-scanning tools (GitHub, gitleaks) can catch accidental leaks.
You can revoke a token from your account at any time.
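The fixed prefix also lets you run the same kind of check yourself, for example over logs before sharing them. A sketch with Python's `re` module; the character class and minimum length after tft_pat_ are assumptions, since the token alphabet is not documented here.

```python
import re
from typing import List

# Assumption: characters after the documented "tft_pat_" prefix are URL-safe
# alphanumerics; the real token alphabet and length are not specified.
PAT_PATTERN = re.compile(r"tft_pat_[A-Za-z0-9_-]{8,}")


def find_leaked_tokens(text: str) -> List[str]:
    """Return every substring that looks like a TextFromTrack PAT."""
    return PAT_PATTERN.findall(text)


def redact(text: str) -> str:
    """Mask anything PAT-shaped, keeping only the prefix for debugging."""
    return PAT_PATTERN.sub("tft_pat_[REDACTED]", text)


log_line = 'curl -H "Authorization: Bearer tft_pat_AbCdEf123456" ...'
print(find_leaked_tokens(log_line))
print(redact(log_line))
```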
API Reference
Base URL: https://app.textfromtrack.com/api/v1
POST /transcriptions
Upload audio as multipart/form-data. The call returns immediately with a job_id.
Request body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | yes | none | .mp3, .wav, .flac · max 100 MB · max 10 min (see note below) |
| pinyin | bool | no | false | Pinyin annotation for Chinese text (see below) |
| vintage | bool | no | false | High-quality mode for music / old recordings (see below) |
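A server-side rejection only happens after the whole file has been uploaded, so it can pay to check the documented limits locally first. A minimal sketch; `upload_problems` is a hypothetical helper, the case-insensitive extension match is an assumption, and "100 MB" is read as 100 × 10^6 bytes because the docs don't say whether MB is decimal or binary. The 10-minute duration limit is left to the server.

```python
from pathlib import Path
from typing import List

ALLOWED_SUFFIXES = {".mp3", ".wav", ".flac"}  # from the request body table
MAX_UPLOAD_BYTES = 100 * 1000 * 1000          # assumption: decimal megabytes


def upload_problems(filename: str, size_bytes: int) -> List[str]:
    """Return reasons the server would likely reject this file (empty list = looks OK)."""
    problems = []
    suffix = Path(filename).suffix.lower()  # assumption: extensions are case-insensitive
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension {suffix or '(none)'}: expect 400 validation_error")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"{size_bytes} bytes exceeds 100 MB: expect 413")
    return problems


def check_file(path: str) -> List[str]:
    """Convenience wrapper that reads the size from disk."""
    return upload_problems(path, Path(path).stat().st_size)
```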
File size & automatic transcoding
The server accepts files up to 100 MB. The cloud transcription API (OpenAI Whisper) has its own 25 MB limit — but this is handled transparently: if your file exceeds 25 MB, the server automatically re-encodes it to MP3 mono 16 kHz 128 kbps with ffmpeg before sending it upstream. You never see this step.
The practical ceiling is the 10-minute duration limit, not the file size. A 10-minute file re-encoded at 128 kbps mono is about 9.6 MB (128,000 bit/s × 600 s ÷ 8), well under the 25 MB cloud threshold, so a re-encoded file can never overflow the upstream limit.
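The arithmetic behind that bound, as a quick check: a constant-bitrate stream takes bitrate × duration ÷ 8 bytes, ignoring container overhead. Treating the 25 MB upstream limit as 25 × 10^6 bytes is an assumption about units.

```python
def cbr_size_bytes(duration_s: float, bitrate_kbps: int) -> int:
    """Approximate size of a constant-bitrate encode (audio payload only, no container overhead)."""
    return round(duration_s * bitrate_kbps * 1000 / 8)


UPSTREAM_LIMIT = 25 * 1000 * 1000  # assumed decimal reading of "25 MB"

worst_case = cbr_size_bytes(600, 128)  # the longest allowed track, re-encoded at 128 kbps
print(worst_case, worst_case < UPSTREAM_LIMIT)
```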
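Large lossless files (FLAC/WAV): a 10-minute CD-quality stereo WAV is about 106 MB (1,411.2 kbit/s × 600 s ÷ 8), and high-resolution FLAC can exceed 100 MB as well.
In that case the upload is rejected with HTTP 413 (validation_error).
Convert to MP3, or re-export at a lower sample rate or bit depth, before uploading.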
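One way to do the conversion locally is to call ffmpeg from Python. This sketch assumes ffmpeg is installed and on PATH; the settings copy the server's own re-encode target (MP3, mono, 16 kHz, 128 kbps), though any MP3 within the documented limits is accepted. `mp3_command` and `convert_to_mp3` are hypothetical helpers.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def mp3_command(src: str, dst: Optional[str] = None) -> List[str]:
    """Build an ffmpeg argv converting `src` to mono 16 kHz 128 kbps MP3."""
    out = dst or str(Path(src).with_suffix(".mp3"))
    return [
        "ffmpeg", "-y",   # overwrite the output file without prompting
        "-i", src,
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz
        "-b:a", "128k",   # target audio bitrate
        out,
    ]


def convert_to_mp3(src: str) -> str:
    """Run ffmpeg (must be on PATH) and return the path of the MP3 it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = mp3_command(src)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```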
Option: pinyin=true
When your track contains Mandarin Chinese lyrics, enabling pinyin post-processes the transcript to annotate each Chinese character with its romanised pronunciation (pīnyīn). The pinyin follows the characters in square brackets, e.g. 你好[nǐ hǎo], so the result is readable both by Chinese readers and by readers or tools that work only with Latin script.
This option has no effect on non-Chinese audio. It adds ~1 s to the response time and does not cost extra credits.
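If you need the characters and the romanisation separately (for example, to show pinyin as a line above the lyrics), the bracketed notation is easy to split. A sketch assuming the output follows exactly the 你好[nǐ hǎo] shape shown above, with each bracket closing a run of Han characters; real output with other spacing or punctuation may need a looser pattern.

```python
import re
from typing import List, Tuple

# A run of CJK Unified Ideographs followed by bracketed pinyin, e.g. 你好[nǐ hǎo]
ANNOTATED = re.compile(r"([\u4e00-\u9fff]+)\[([^\]]+)\]")


def split_pinyin(line: str) -> List[Tuple[str, List[str]]]:
    """Return (hanzi, syllables) pairs for every annotated run in `line`."""
    return [(hanzi, pinyin.split()) for hanzi, pinyin in ANNOTATED.findall(line)]


def strip_pinyin(line: str) -> str:
    """Drop the annotations and keep only the original characters."""
    return ANNOTATED.sub(r"\1", line)


print(split_pinyin("你好[nǐ hǎo] 世界[shì jiè]"))
```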
Option: vintage=true (Music / high-quality mode)
Enables a heavier, two-stage local pipeline designed for music tracks and difficult recordings (heavy reverb, background instruments, old recordings, low bitrate).
Trade-offs: vintage mode takes significantly longer (30 s–5 min depending on duration) and cannot be used with pinyin=true simultaneously. It consumes the same 1 credit as a standard job — no extra charge.
The web UI also uses vintage mode automatically as a retry fallback when the standard model returns a poor-quality transcript (repetition loops, silence, etc.). When calling the API directly, you choose upfront.
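A client can enforce the pinyin/vintage exclusivity before submitting. This sketch builds the optional multipart text fields the way the cURL example sends them (`pinyin=true`); `form_fields` is a hypothetical helper, and the server's response when both flags are set is not documented here.

```python
from typing import Dict


def form_fields(*, pinyin: bool = False, vintage: bool = False) -> Dict[str, str]:
    """Return the optional text fields for POST /transcriptions."""
    if pinyin and vintage:
        raise ValueError("vintage=true cannot be combined with pinyin=true")
    fields: Dict[str, str] = {}
    # Only send flags that differ from the documented default (false).
    if pinyin:
        fields["pinyin"] = "true"
    if vintage:
        fields["vintage"] = "true"
    return fields
```

With httpx, passing `data=form_fields(vintage=True)` together with `files=` sends everything as a single multipart request.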
Response — 201
{
"job_id": "f0c7e2d4-1b3a-…",
"status": "pending",
"credits_quoted": 1,
"duration_seconds": 187.3,
"vintage_mode": false
}
GET /transcriptions/{id}
Poll every ~2 s. Status flow: pending → processing → done or error.
If attempts is 2 and vocal_separation_used is true, the pipeline automatically retried the job with Demucs vocal separation. The response contains the winning transcript, so you don't need to handle the retry yourself.
Response — 200 (done)
{
"job_id": "f0c7e2d4-1b3a-…",
"status": "done",
"filename": "track.mp3",
"language": "en",
"duration_seconds": 187.3,
"segment_count": 42,
"credits_charged": 1,
"model": "whisper-1",
"degraded": false,
"vocal_separation_used": false,
"attempts": 1,
"runtime_seconds": 12.4,
"created_at": "2026-05-08T10:00:00Z",
"completed_at": "2026-05-08T10:00:13Z"
}
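A sketch of a polling loop with an overall deadline. It takes an injected `fetch_status` callable (hypothetical; in practice it would wrap the status request and return the parsed JSON) so the loop logic can be exercised without network access. The 15-minute default timeout is an arbitrary choice that leaves room for slow vintage jobs, not an API limit.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    fetch_status: Callable[[], Dict],
    *,
    interval_s: float = 2.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until status is "done"; raise on any other terminal state or at the deadline."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        status = job["status"]
        if status == "done":
            if job.get("attempts", 1) > 1:
                # Retry telemetry: the server re-ran the job, possibly with Demucs.
                print(f"finished after {job['attempts']} attempts "
                      f"(vocal_separation_used={job.get('vocal_separation_used')})")
            return job
        if status not in ("pending", "processing"):
            # "error", or another terminal state such as a job cancelled via DELETE
            raise RuntimeError(f"job ended with status {status!r}: {job.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s:.0f} s")
        sleep(interval_s)
```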
GET /transcriptions/{id}/segments
Returns the transcript as structured JSON: a segments array with start and end times in seconds. It is the same data as the JSON export, without downloading a file.
{
"job_id": "…",
"language": "fr",
"duration_seconds": 187.3,
"segments": [
{ "index": 0, "start": 0.0, "end": 4.2, "text": "Première ligne…" },
{ "index": 1, "start": 4.2, "end": 8.5, "text": "Deuxième ligne…" }
]
}
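Because each segment carries its own start and end times, you can render other subtitle layouts locally. A sketch that turns this payload into SRT, mainly to show how the timestamps map; the server's own srt export may differ in details such as rounding.

```python
from typing import Any, Dict


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(payload: Dict[str, Any]) -> str:
    """Render the segments payload as SRT cues numbered from 1."""
    cues = []
    for n, seg in enumerate(payload["segments"], start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{n}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)


example = {
    "language": "fr",
    "segments": [
        {"index": 0, "start": 0.0, "end": 4.2, "text": "Première ligne…"},
        {"index": 1, "start": 4.2, "end": 8.5, "text": "Deuxième ligne…"},
    ],
}
print(segments_to_srt(example))
```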
GET /transcriptions/{id}/export
Streams the exported file. format ∈ {txt, srt, lrc, json}. The Content-Type header is set to match the format.
Retention: exports are deleted after 7 days. Requesting an expired export returns 410 Gone.
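A sketch of building the export URL and mapping the status codes from the error catalogue to client actions. `export_url` and `export_action` are hypothetical helpers, and the action names are illustrative.

```python
from urllib.parse import quote, urlencode

BASE = "https://app.textfromtrack.com/api/v1"
FORMATS = ("txt", "srt", "lrc", "json")


def export_url(job_id: str, fmt: str) -> str:
    """URL for GET /transcriptions/{id}/export?format=..."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    return f"{BASE}/transcriptions/{quote(job_id)}/export?{urlencode({'format': fmt})}"


def export_action(status_code: int) -> str:
    """Map an export response status to what the caller should do next."""
    return {
        200: "save",     # body is the file; Content-Type matches the format
        409: "wait",     # job not done yet: keep polling
        410: "expired",  # removed by the 7-day retention policy
    }.get(status_code, "fail")  # 401/403/404/500 etc.: see the error catalogue
```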
GET /transcriptions
Query params: page (default 1), per_page (default 50, max 200), status (filter: pending, done, error).
{
"page": 1,
"per_page": 50,
"total": 173,
"transcriptions": [ /* slim shape, no attempts_log */ ]
}
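To walk every job, keep requesting pages until you have seen `total` items or a page comes back empty. A sketch with an injected `fetch_page(page, per_page)` callable (hypothetical; it would wrap the list request and return the JSON shown above). It requests the documented maximum of 200 per page to minimise round-trips.

```python
from typing import Callable, Dict, Iterator


def iter_transcriptions(
    fetch_page: Callable[[int, int], Dict], per_page: int = 200
) -> Iterator[Dict]:
    """Yield every job across pages."""
    page, seen = 1, 0
    while True:
        body = fetch_page(page, per_page)
        items = body["transcriptions"]
        yield from items
        seen += len(items)
        # Stop on an empty page or once the reported total has been reached.
        if not items or seen >= body["total"]:
            return
        page += 1
```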
DELETE /transcriptions/{id}
Marks the job as cancelled and removes its export directory.
Credits are not refunded if the job is already done — you've consumed compute. Cancelling a pending job before processing starts does not charge credits.
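Cancelling needs nothing beyond the standard library. A minimal sketch that only builds the request so it can be inspected; `delete_request` is a hypothetical helper, and since the response body for this endpoint is not shown here, the sketch does not parse one.

```python
import urllib.request

BASE = "https://app.textfromtrack.com/api/v1"


def delete_request(job_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /transcriptions/{id}."""
    return urllib.request.Request(
        f"{BASE}/transcriptions/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


# To send it: urllib.request.urlopen(delete_request(job_id, token))
# urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
```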
GET /me
Returns your profile and credit balance.
{
"user_id": "…",
"email": "you@example.com",
"credit_balance": 47,
"top_up_url": "https://app.textfromtrack.com/",
"scopes_granted": ["transcriptions:read", "transcriptions:write"]
}
When the balance is low, redirect users to top_up_url. The API never accepts payment directly.
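A sketch of that check against the response above. `credits_needed` defaults to 1 because each job costs one credit; a batch client could pass the number of files it is about to submit. The helper name is hypothetical.

```python
from typing import Any, Dict, Optional


def top_up_url_if_low(me: Dict[str, Any], credits_needed: int = 1) -> Optional[str]:
    """Return top_up_url when the balance can't cover the next job(s), else None."""
    if me["credit_balance"] < credits_needed:
        return me["top_up_url"]
    return None
```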
Code examples
Python
Minimal example: only httpx is required (pip install httpx). A full reference client with error handling and a CLI is available below.
```python
import os
import time

import httpx

TOKEN = os.environ["TFT_TOKEN"]
BASE = "https://app.textfromtrack.com/api/v1"
HDR = {"Authorization": f"Bearer {TOKEN}"}

# 1: Submit (httpx defaults to a 5 s timeout, too short for large uploads)
with open("track.mp3", "rb") as f:
    r = httpx.post(f"{BASE}/transcriptions", headers=HDR, files={"file": f}, timeout=120)
r.raise_for_status()
job_id = r.json()["job_id"]
print(f"Submitted: job_id={job_id}")

# 2: Poll every 2 s until the job reaches a terminal state
while True:
    resp = httpx.get(f"{BASE}/transcriptions/{job_id}", headers=HDR)
    resp.raise_for_status()
    j = resp.json()
    print(f"  status: {j['status']}")
    if j["status"] == "done":
        break
    if j["status"] == "error":
        raise RuntimeError(j.get("error"))
    time.sleep(2)

print(f"Done: {j['language']} · {j['segment_count']} segments")

# 3: Download LRC
lrc = httpx.get(f"{BASE}/transcriptions/{job_id}/export", headers=HDR, params={"format": "lrc"})
lrc.raise_for_status()
with open("track.lrc", "wb") as out:
    out.write(lrc.content)
print("track.lrc written ✓")
```
Full reference client (CLI)
Includes error handling, polling with timeout, retry telemetry logging, and a command-line interface:
```shell
# Install deps
$ pip install httpx

# Download the reference client
$ curl -O https://raw.githubusercontent.com/r45635/audio-lyrics-extractor/main/example/api_client.py

# Run it
$ TFT_TOKEN=tft_pat_… python api_client.py track.mp3 --format lrc
$ TFT_TOKEN=tft_pat_… python api_client.py track.flac --format srt --vintage
```
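JavaScript
Works in Node.js 18+, which provides fetch, FormData and Blob as globals. In a browser, append a File from an <input type="file"> instead of reading from disk, and route requests through a CORS proxy.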
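```javascript
// Node.js 18+: fetch, FormData and Blob are built-in globals.
const { readFile } = require("node:fs/promises");
const path = require("node:path");

const TOKEN = process.env.TFT_TOKEN;
const BASE = "https://app.textfromtrack.com/api/v1";
const HDR = { Authorization: `Bearer ${TOKEN}` };
const sleep = ms => new Promise(r => setTimeout(r, ms));

async function getLyrics(filePath, format = "lrc") {
  // 1: Submit. Native FormData needs a Blob, not a stream, so read the file first.
  const fd = new FormData();
  fd.append("file", new Blob([await readFile(filePath)]), path.basename(filePath));
  const sub = await fetch(`${BASE}/transcriptions`, {
    method: "POST",
    headers: HDR,
    body: fd,
  }).then(r => r.json());
  if (!sub.job_id) throw sub; // error envelope: { error: { code, message, details } }
  console.log(`Submitted: ${sub.job_id}`);

  // 2: Poll every 2 s
  let job;
  do {
    await sleep(2000);
    job = await fetch(`${BASE}/transcriptions/${sub.job_id}`, { headers: HDR })
      .then(r => r.json());
    if (!job.status) throw job; // error envelope instead of a job
    console.log(`  → ${job.status}`);
    if (job.status === "error") throw new Error(JSON.stringify(job.error));
  } while (job.status !== "done");

  // 3: Download export
  const res = await fetch(`${BASE}/transcriptions/${sub.job_id}/export?format=${format}`, { headers: HDR });
  if (!res.ok) throw new Error(`Export failed: HTTP ${res.status}`);
  return res.text();
}

// Usage
getLyrics("track.mp3", "lrc")
  .then(lrc => console.log(lrc))
  .catch(err => { console.error(err); process.exitCode = 1; });
```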
cURL

```shell
$ TFT_TOKEN="tft_pat_…"
$ BASE="https://app.textfromtrack.com/api/v1"

# Check credit balance
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" "$BASE/me" | jq .credit_balance

# Submit (with pinyin for a Chinese track)
$ JOB=$(curl -s -X POST "$BASE/transcriptions" \
    -H "Authorization: Bearer $TFT_TOKEN" \
    -F "file=@song.mp3" \
    -F "pinyin=true" | jq -r .job_id)
$ echo "job_id: $JOB"

# Poll until done
$ while true; do
    STATUS=$(curl -s -H "Authorization: Bearer $TFT_TOKEN" \
      "$BASE/transcriptions/$JOB" | jq -r .status)
    echo "  $STATUS"
    [ "$STATUS" = "done" ] && break
    [ "$STATUS" = "error" ] && exit 1
    sleep 2
  done

# Download all 4 formats
$ for FMT in txt srt lrc json; do
    curl -s -H "Authorization: Bearer $TFT_TOKEN" \
      "$BASE/transcriptions/$JOB/export?format=$FMT" \
      -o "song.$FMT"
    echo "song.$FMT ✓"
  done
```
Error catalogue
All errors share a single envelope:
{
  "error": {
    "code": "insufficient_credits",
    "message": "Your account has 0 credits. This job needs 1.",
    "details": { "required": 1, "available": 0, "top_up_url": "…" }
  }
}
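Since every error uses this envelope, a client can turn any failed response into one exception type. A sketch; `TftApiError` and `raise_for_error` are hypothetical names, and the fallback branch covers responses that don't match the envelope (for example an HTML page from a proxy).

```python
import json
from typing import Any, Dict


class TftApiError(Exception):
    """A non-2xx response carrying the documented error envelope."""

    def __init__(self, status: int, code: str, message: str, details: Dict[str, Any]):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.details = details


def raise_for_error(status: int, body: bytes) -> None:
    """Do nothing on 2xx; otherwise raise TftApiError built from the envelope."""
    if 200 <= status < 300:
        return
    try:
        err = json.loads(body)["error"]
        code, message = err["code"], err.get("message", "")
        details = err.get("details") or {}
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the documented envelope: keep the raw text for debugging.
        raise TftApiError(status, "unknown", body.decode(errors="replace"), {}) from None
    raise TftApiError(status, code, message, details)
```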
| HTTP | Code | When |
|---|---|---|
| 400 | validation_error | Invalid field, bad body, wrong file extension |
| 400 | track_too_long | Audio exceeds 10 minutes |
| 401 | unauthorized | Missing, unknown, expired, or revoked PAT |
| 402 | insufficient_credits | Balance below credits_quoted. Check details.top_up_url |
| 403 | forbidden | PAT lacks required scope, or account suspended |
| 404 | not_found | Job doesn't exist or belongs to another user (deliberately ambiguous) |
| 409 | validation_error | Operation invalid for current job status (e.g. export a pending job) |
| 410 | gone | Export removed by retention policy (after 7 days) |
| 413 | validation_error | Upload exceeds 100 MB |
| 429 | rate_limited | Too many concurrent jobs (2 per user, 3 per IP) |
| 500 | internal_error | Unexpected server-side failure |
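Only some of these errors are worth retrying. A possible policy, offered as a sketch rather than official guidance: retry 429 (a running job will eventually free a concurrency slot) and 500 with exponential backoff and full jitter, and treat everything else as final. The base delay, cap, and attempt count are arbitrary. Be careful retrying a POST /transcriptions that failed with 500, since these docs don't say whether a job was created before the failure.

```python
import random
import time
from typing import Any, Callable, Tuple

RETRYABLE = {429, 500}  # concurrency limit hit, or unexpected server failure


def call_with_retry(
    send: Callable[[], Tuple[int, Any]],
    *,
    max_attempts: int = 5,
    base_s: float = 2.0,
    cap_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call `send()` (returning (status, body)) until it succeeds or fails permanently."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Full jitter: wait a random time in [0, min(cap, base * 2^attempt)].
        sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
    raise RuntimeError("unreachable")
```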
Limits & versioning
| Limit | Value |
|---|---|
| Max file size | 100 MB · files > 25 MB are re-encoded server-side before cloud processing (see File size & automatic transcoding) |
| Max audio duration | 10 minutes (600 s) |
| Concurrent jobs per user | 2 |
| Concurrent jobs per IP | 3 |
| Export retention | 7 days — after that 410 Gone |
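To stay under the two-jobs-per-user limit when transcribing a batch, the simplest approach is a thread pool with two workers. A sketch; `transcribe` stands for a hypothetical function that submits one file and polls until it finishes (like the code examples), so that a worker slot is only freed once the server-side job is no longer running.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

MAX_CONCURRENT_JOBS = 2  # documented per-user limit (429 rate_limited beyond it)


def run_batch(files: Iterable[str], transcribe: Callable[[str], str]) -> List[str]:
    """Transcribe files with at most MAX_CONCURRENT_JOBS in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
        return list(pool.map(transcribe, files))
```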
Versioning policy
The /api/v1/* namespace is contract-stable. Breaking changes move to /api/v2/. Additive changes (new fields, optional params) can land in v1 without notice.
A v1 endpoint will be supported for at least 6 months after v2 is announced.
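Because v1 responses can gain fields without notice, clients should ignore keys they don't recognise rather than fail on them. A sketch using a dataclass that keeps only known fields; `JobStatus` and its field selection are illustrative, taken from the status response shown earlier.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class JobStatus:
    job_id: str
    status: str
    language: Optional[str] = None
    segment_count: Optional[int] = None
    credits_charged: Optional[int] = None

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "JobStatus":
        # Keep only the keys this client knows; silently drop additive v1 fields.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```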
Credits
1 credit = 1 transcription, regardless of duration (within limits). Check your balance with GET /api/v1/me. The API never accepts payment — point users at top_up_url.