TextFromTrack API v1

Build with audio-to-text

A versioned, token-authenticated REST API to submit audio, transcribe it with OpenAI Whisper models (with optional Demucs vocal separation), and download the result as TXT, SRT, LRC, or JSON. One credit per job. No SDK required.

1. Create a Personal Access Token

In the app: click your avatar → Account → API tokens section → Generate. Save the token immediately; it is shown only once.

2. Submit an audio file

POST /api/v1/transcriptions with multipart/form-data. Receive a job_id immediately.

3. Poll until done

GET /api/v1/transcriptions/{job_id} every 2 s. Status transitions: pending → processing → done (or error).

4. Download your transcript

GET /api/v1/transcriptions/{job_id}/export?format=txt|srt|lrc|json. Alternatively, read the timestamped segments as JSON from GET /api/v1/transcriptions/{job_id}/segments.

Method	Path	Purpose
POST	`/api/v1/transcriptions`	Submit audio file
GET	`/api/v1/transcriptions/{id}`	Poll status + metadata
GET	`/api/v1/transcriptions/{id}/segments`	Timestamped segments (JSON)
GET	`/api/v1/transcriptions/{id}/export`	Download TXT / SRT / LRC / JSON
GET	`/api/v1/transcriptions`	List your jobs (paginated)
DELETE	`/api/v1/transcriptions/{id}`	Cancel or delete a job
GET	`/api/v1/me`	Your profile + credit balance

Quickstart

The full lifecycle in three shell commands.

cURL

$ TFT_TOKEN="tft_pat_…"

# 1 — Submit
$ curl -s -X POST "https://app.textfromtrack.com/api/v1/transcriptions" \
    -H "Authorization: Bearer $TFT_TOKEN" \
    -F "file=@track.mp3" | jq .job_id
"f0c7e2d4-1b3a-…"

# 2 — Poll (repeat until status == "done")
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" \
    "https://app.textfromtrack.com/api/v1/transcriptions/f0c7e2d4-1b3a-…" | jq .status
"done"

# 3 — Download LRC
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" \
    "https://app.textfromtrack.com/api/v1/transcriptions/f0c7e2d4-1b3a-…/export?format=lrc" \
    -o track.lrc

Authentication

All /api/v1/* endpoints require a Personal Access Token (PAT). Send it as a Bearer token on every request:

HTTP Header

Authorization: Bearer tft_pat_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Security model: a PAT cannot manage other PATs and cannot log into the web UI. A stolen web session cannot call the API. This separation is intentional.
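
As a minimal sketch, the header can be built in one place so that an obviously wrong credential fails before any request is sent. `auth_headers` is a hypothetical helper, not part of an official client, and the only thing it checks is the `tft_pat_` prefix shown in the header example above.

```python
TOKEN_PREFIX = "tft_pat_"  # prefix shown in the documented header example


def auth_headers(token: str) -> dict[str, str]:
    """Return the Authorization header for a PAT (hypothetical helper)."""
    token = token.strip()
    if not token.startswith(TOKEN_PREFIX) or len(token) == len(TOKEN_PREFIX):
        # Anything else (for example a web-session JWT) is not a PAT,
        # so fail locally with a clearer message than a 401.
        raise ValueError("expected a Personal Access Token starting with 'tft_pat_'")
    return {"Authorization": f"Bearer {token}"}
```

The returned dict can be passed as the `headers` argument of whichever HTTP client you use.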

Managing tokens

Create, list, and revoke tokens from the web UI via Account → API tokens. These three management endpoints use web JWT auth, not a PAT:

Method	Path	Notes
POST	`/api/auth/tokens`	Create — plaintext returned once
GET	`/api/auth/tokens`	List metadata (no plaintext ever)
DELETE	`/api/auth/tokens/{id}`	Revoke (idempotent)

Tokens are prefixed tft_pat_ so secret-scanning tools (GitHub, gitleaks) catch accidental leaks. Revoke from your account at any time.
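
The prefix also makes a quick local check possible before committing. The sketch below is an illustration only: the character set and minimum length in the pattern are assumptions (the documentation guarantees just the `tft_pat_` prefix), and `find_leaked_tokens` is a hypothetical name.

```python
import re

# Assumption: the secret part is at least 16 URL-safe characters.
# Only the "tft_pat_" prefix itself is documented.
TOKEN_RE = re.compile(r"tft_pat_[A-Za-z0-9_-]{16,}")


def find_leaked_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_token) for each suspected token."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            # Keep the prefix plus four characters so the report
            # itself does not leak the secret.
            hits.append((lineno, match.group(0)[:12] + "***"))
    return hits
```

Placeholders such as `tft_pat_…` are too short to match, so documentation snippets should not trigger false positives under these assumptions.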

API Reference

Base URL: https://app.textfromtrack.com/api/v1

POST /transcriptions Submit an audio file

Upload audio with multipart/form-data. Returns immediately with a job_id.

Request body

Field	Type	Required	Default	Description
`file`	File	yes	—	.mp3, .wav, .flac · max 100 MB · max 10 min — see note below
`pinyin`	bool	no	`false`	See below — pinyin annotation for Chinese text
`vintage`	bool	no	`false`	See below — high-quality mode for music / old recordings

File size & automatic transcoding

The server accepts files up to 100 MB. The cloud transcription API (OpenAI Whisper) has its own 25 MB limit — but this is handled transparently: if your file exceeds 25 MB, the server automatically re-encodes it to MP3 mono 16 kHz 128 kbps with ffmpeg before sending it upstream. You never see this step.

The practical ceiling is the 10-minute duration limit, not the file size. A 10-minute file re-encoded at 128 kbps is ≈ 9.6 MB (128,000 bit/s × 600 s ÷ 8), well under the 25 MB cloud threshold, so a re-encoded file can never overflow the upstream limit.
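
The size estimate can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 kbps = 1,000 bit/s, 1 MB = 10^6 bytes) and ignores container overhead; other unit conventions shift the figure by a few percent without changing the conclusion.

```python
UPSTREAM_LIMIT_BYTES = 25 * 1_000_000  # assumption: "25 MB" means 25 * 10^6 bytes
REENCODE_BITRATE_BPS = 128_000         # 128 kbps, taken as 128,000 bit/s
MAX_DURATION_S = 600                   # documented 10-minute limit


def encoded_size_bytes(bitrate_bps: int, duration_s: float) -> float:
    """Size of a constant-bitrate stream, ignoring container overhead."""
    return bitrate_bps * duration_s / 8


worst_case = encoded_size_bytes(REENCODE_BITRATE_BPS, MAX_DURATION_S)
print(f"{worst_case / 1_000_000:.1f} MB")                    # 9.6 MB
print(f"{worst_case / UPSTREAM_LIMIT_BYTES:.0%} of limit")   # 38% of limit
```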

Large HD files (FLAC/WAV/AIFF): a lossless 10-minute stereo file can exceed 100 MB. In that case the upload is rejected with HTTP 413 (Payload Too Large). Re-export at a lower bitrate or convert to MP3 before uploading.
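
A local pre-flight check avoids uploading a file the server would reject. The sketch below is hypothetical client-side code: `check_upload` and `pcm_size_bytes` are illustrative names, the extension list is copied from the request-body table, and "100 MB" is assumed to mean 10^8 bytes.

```python
from pathlib import PurePath

MAX_UPLOAD_BYTES = 100 * 1_000_000              # assumption: decimal megabytes
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac"}  # from the request-body table


def check_upload(filename: str, size_bytes: int,
                 max_bytes: int = MAX_UPLOAD_BYTES) -> list[str]:
    """Return the problems found locally; an empty list means 'looks OK'."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {suffix!r}")
    if size_bytes > max_bytes:
        problems.append(f"{size_bytes} bytes is over the upload limit (expect HTTP 413)")
    return problems


def pcm_size_bytes(sample_rate: int, bit_depth: int, channels: int,
                   duration_s: float) -> float:
    """Uncompressed PCM size, e.g. the audio payload of a WAV file."""
    return sample_rate * bit_depth * channels * duration_s / 8


# 10 minutes of CD-quality stereo WAV already exceeds the limit:
print(pcm_size_bytes(44_100, 16, 2, 600))  # 105840000.0
```

For a real file, pass `path.name` and `path.stat().st_size`. The duration limit cannot be checked from the size alone, so the server may still answer `track_too_long`.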

Option: `pinyin=true`

When your track contains Mandarin Chinese lyrics, enabling pinyin post-processes the transcript to annotate each Chinese character with its romanised pronunciation (pīnyīn). Characters are wrapped in a bracketed notation — e.g. 你好[nǐ hǎo] — so the result is readable by both Chinese readers and non-Chinese tools that only handle Latin characters.

This option has no effect on non-Chinese audio. It adds ~1 s to the response time and does not cost extra credits.
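
If you need the characters and the romanisation separately, the bracketed notation can be split client-side. This is a sketch under one assumption: each annotated run is a sequence of CJK characters immediately followed by `[pinyin]`, as in the 你好[nǐ hǎo] example. The function names are illustrative.

```python
import re

# Assumption: hanzi run + "[pinyin]", matching the documented example.
ANNOTATED = re.compile(r"([\u4e00-\u9fff]+)\[([^\]]*)\]")


def split_pinyin(line: str) -> list[tuple[str, str]]:
    """Return (hanzi, pinyin) pairs found in an annotated transcript line."""
    return ANNOTATED.findall(line)


def hanzi_only(line: str) -> str:
    """Drop the bracketed annotations, keeping the original characters."""
    return ANNOTATED.sub(r"\1", line)


def pinyin_only(line: str) -> str:
    """Keep only the romanised text, for tools limited to Latin characters."""
    return " ".join(pinyin for _, pinyin in split_pinyin(line))


line = "你好[nǐ hǎo]世界[shì jiè]"
print(split_pinyin(line))  # [('你好', 'nǐ hǎo'), ('世界', 'shì jiè')]
```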

Option: `vintage=true` (Music / high-quality mode)

Enables a heavier, two-stage local pipeline designed for music tracks and difficult recordings (heavy reverb, background instruments, old recordings, low bitrate):

1. Demucs vocal separation

The track is split into stems using Meta's Demucs htdemucs model. The vocals stem is extracted and passed to the transcription step. Removing the instrumental bleed dramatically improves accuracy on music.

2. Whisper large-v3 (local)

The cleaned vocals are transcribed with OpenAI Whisper large-v3 running locally (not the cloud API). This model has better recall on sung, mumbled, or non-standard speech than the faster default model.

Trade-offs: vintage mode takes significantly longer (30 s–5 min depending on duration) and cannot be combined with pinyin=true. It costs the same 1 credit as a standard job.

The web UI also uses vintage mode automatically as a retry fallback when the standard model returns a poor-quality transcript (repetition loops, silence, etc.). When calling the API directly, you choose upfront.
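
Because the two options are mutually exclusive, it can help to validate them before uploading. A minimal sketch, assuming the booleans are sent as the strings `true`/`false` (as in the `-F "pinyin=true"` cURL form); `build_form_fields` is a hypothetical helper.

```python
def build_form_fields(pinyin: bool = False, vintage: bool = False) -> dict[str, str]:
    """Build the optional multipart fields for POST /transcriptions."""
    if pinyin and vintage:
        # Documented trade-off: vintage mode cannot be used with pinyin.
        raise ValueError("vintage=true cannot be combined with pinyin=true")
    fields: dict[str, str] = {}
    if pinyin:
        fields["pinyin"] = "true"
    if vintage:
        fields["vintage"] = "true"
    return fields  # both options default to false, so omitted keys are fine
```

With httpx, the result would go in the `data` argument next to `files`, for example `httpx.post(url, headers=hdr, files={"file": f}, data=build_form_fields(vintage=True))`.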

Response — 201

{
  "job_id":            "f0c7e2d4-1b3a-…",
  "status":            "pending",
  "credits_quoted":    1,
  "duration_seconds":  187.3,
  "vintage_mode":      false
}

GET /transcriptions/{id} Poll status

Poll every ~2 s. Status flow: pending → processing → done or error

When attempts is 2 and vocal_separation_used is true, the pipeline automatically retried with Demucs vocal separation. The response is the winning transcript — you don't need to handle the retry yourself.
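
The status flow above can be polled with an overall deadline so that a stuck job does not block forever. This is a sketch, not an official client: `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON, and the 600 s default timeout is an assumption chosen to exceed the 5-minute upper bound quoted for vintage mode.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"done", "error"}  # end states of the documented status flow


def wait_for_job(fetch_status: Callable[[], dict],
                 interval_s: float = 2.0,
                 timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the job reaches a terminal status or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATUSES:
            return job  # caller decides what to do with "error"
        if clock() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout_s} s")
        sleep(interval_s)
```

`sleep` and `clock` are injectable only so the loop can be exercised in tests without waiting; in normal use the defaults are fine.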

Response — 200 (done)

{
  "job_id":                  "f0c7e2d4-1b3a-…",
  "status":                  "done",
  "filename":                "track.mp3",
  "language":                "en",
  "duration_seconds":        187.3,
  "segment_count":           42,
  "credits_charged":         1,
  "model":                   "whisper-1",
  "degraded":                false,
  "vocal_separation_used":   false,
  "attempts":                1,
  "runtime_seconds":         12.4,
  "created_at":              "2026-05-08T10:00:00Z",
  "completed_at":            "2026-05-08T10:00:13Z"
}
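
A client usually needs only a few of these fields. The sketch below reads them defensively (unknown keys are ignored and missing ones get defaults), which fits a contract where new fields can appear in v1. `JobSummary` and `summarise` are illustrative names, not part of the API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class JobSummary:
    job_id: str
    status: str
    attempts: int
    vocal_separation_used: bool
    wall_clock_seconds: Optional[float]

    @property
    def auto_retried_with_demucs(self) -> bool:
        # Documented signal: attempts == 2 and vocal_separation_used == true.
        return self.attempts == 2 and self.vocal_separation_used


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarise(job: dict) -> JobSummary:
    """Extract a typed summary from a GET /transcriptions/{id} response."""
    created, completed = job.get("created_at"), job.get("completed_at")
    wall = None
    if created and completed:
        wall = (_parse_ts(completed) - _parse_ts(created)).total_seconds()
    return JobSummary(
        job_id=job["job_id"],
        status=job["status"],
        attempts=job.get("attempts", 1),
        vocal_separation_used=job.get("vocal_separation_used", False),
        wall_clock_seconds=wall,
    )
```

For the sample response above, `wall_clock_seconds` comes out as 13.0, slightly more than `runtime_seconds` (12.4).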

GET /transcriptions/{id}/segments Structured timestamped transcript

Returns the transcript as a structured JSON array — same data as the JSON export, without downloading a file.

{
  "job_id":           "…",
  "language":         "fr",
  "duration_seconds": 187.3,
  "segments": [
    { "index": 0, "start": 0.0,  "end": 4.2,  "text": "Première ligne…" },
    { "index": 1, "start": 4.2,  "end": 8.5,  "text": "Deuxième ligne…" }
  ]
}
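
Since the segments carry start and end times, subtitle or lyric files can also be rendered client-side. The sketch below uses the common LRC (`[mm:ss.xx]`) and SubRip (`HH:MM:SS,mmm`) timestamp conventions; it illustrates the data shape and is not guaranteed to match the server's own export byte for byte.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format seconds as an LRC tag, e.g. 4.2 -> [00:04.20]."""
    centis = round(seconds * 100)  # integer maths avoids ":60.00" rounding
    minutes, rem = divmod(centis, 6000)
    secs, hundredths = divmod(rem, 100)
    return f"[{minutes:02d}:{secs:02d}.{hundredths:02d}]"


def srt_timestamp(seconds: float) -> str:
    """Format seconds as a SubRip timestamp, e.g. 8.5 -> 00:00:08,500."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_lrc(segments: list[dict]) -> str:
    """One LRC line per segment, tagged with the segment start time."""
    return "\n".join(f"{lrc_timestamp(s['start'])}{s['text']}" for s in segments)


def to_srt(segments: list[dict]) -> str:
    """Numbered SubRip cues separated by blank lines."""
    cues = [
        f"{n}\n{srt_timestamp(s['start'])} --> {srt_timestamp(s['end'])}\n{s['text']}"
        for n, s in enumerate(segments, start=1)
    ]
    return "\n\n".join(cues) + "\n"
```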

GET /transcriptions/{id}/export?format=… Download a file export

Streams the file. format ∈ {txt, srt, lrc, json}. Content-Type is set per format.

Retention: exports are deleted after 7 days. Requesting an expired export returns 410 Gone.

.txt

Plain text, one line per segment

.srt

SubRip subtitles — for video players

.lrc

Synced lyrics — for music players & karaoke

.json

Structured segments array with timestamps
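
A client can validate the format and estimate whether an export is still available before requesting it. In this sketch the 7-day window is assumed to start at `completed_at`, which the documentation does not state; both function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

EXPORT_FORMATS = ("txt", "srt", "lrc", "json")
RETENTION = timedelta(days=7)


def export_path(job_id: str, fmt: str) -> str:
    """Path (relative to the base URL) of an export in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"format must be one of {EXPORT_FORMATS}, got {fmt!r}")
    return f"/transcriptions/{job_id}/export?{urlencode({'format': fmt})}"


def export_probably_expired(completed_at: str, now: datetime) -> bool:
    """Guess whether a request would return 410 Gone.

    Assumption: retention is counted from completed_at.
    """
    done = datetime.fromisoformat(completed_at.replace("Z", "+00:00"))
    return now - done > RETENTION


now = datetime.now(timezone.utc)
print(export_path("JOB_ID", "lrc"))  # /transcriptions/JOB_ID/export?format=lrc
```

The server's 410 response stays authoritative; the local check only saves a round trip.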

GET /transcriptions List your jobs (paginated)

Query params: page (default 1), per_page (default 50, max 200), status (filter: pending, done, error).

{
  "page":             1,
  "per_page":         50,
  "total":            173,
  "transcriptions":   [ /* slim shape, no attempts_log */ ]
}
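
Walking every page can be sketched as a generator that stops once `page × per_page` reaches `total`. `fetch_page` is a hypothetical callable that performs the GET with the `page` and `per_page` query parameters and returns the decoded JSON body.

```python
from typing import Callable, Iterator

MAX_PER_PAGE = 200  # documented maximum


def iter_jobs(fetch_page: Callable[[int, int], dict],
              per_page: int = 50) -> Iterator[dict]:
    """Yield every job across all pages, in the order the API returns them."""
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per_page must be between 1 and {MAX_PER_PAGE}")
    page = 1
    while True:
        body = fetch_page(page, per_page)
        items = body["transcriptions"]
        yield from items
        # Stop on an empty page too, in case "total" changes while iterating.
        if not items or page * per_page >= body["total"]:
            return
        page += 1
```

With the sample response above (`total` 173, `per_page` 50), this makes four requests.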

DELETE /transcriptions/{id} Cancel or delete a job

Marks the job as cancelled and removes its export directory.

Credits are not refunded if the job is already done — you've consumed compute. Cancelling a pending job before processing starts does not charge credits.

GET /me Profile + credit balance

{
  "user_id":          "…",
  "email":            "you@example.com",
  "credit_balance":   47,
  "top_up_url":       "https://app.textfromtrack.com/",
  "scopes_granted":   ["transcriptions:read", "transcriptions:write"]
}

When the balance is low, redirect users to top_up_url. The API never accepts payment directly.
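
A balance check before submitting can be as small as the sketch below. `top_up_url_if_short` is a hypothetical helper; it takes the decoded GET /me body and the number of credits the pending work needs (1 per job).

```python
from typing import Optional


def top_up_url_if_short(me: dict, credits_needed: int = 1) -> Optional[str]:
    """Return the top-up URL when the balance is too low, else None."""
    if me["credit_balance"] >= credits_needed:
        return None
    return me["top_up_url"]
```

A batch of N files would pass `credits_needed=N`, since each job costs one credit.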

Code examples

Minimal example — only httpx required (pip install httpx). A full reference client with error handling and a CLI is available below.

Python (httpx)

import os
import time

import httpx

TOKEN = os.environ["TFT_TOKEN"]
BASE  = "https://app.textfromtrack.com/api/v1"
HDR   = {"Authorization": f"Bearer {TOKEN}"}

# One client for all calls; uploads can outlast httpx's 5 s default timeout
with httpx.Client(headers=HDR, timeout=60.0) as client:
    # 1. Submit
    with open("track.mp3", "rb") as f:
        r = client.post(f"{BASE}/transcriptions", files={"file": f})
    r.raise_for_status()
    job_id = r.json()["job_id"]
    print(f"Submitted: job_id={job_id}")

    # 2. Poll until done
    while True:
        r = client.get(f"{BASE}/transcriptions/{job_id}")
        r.raise_for_status()
        j = r.json()
        print(f"  status: {j['status']}")
        if j["status"] == "done":
            break
        if j["status"] == "error":
            raise RuntimeError(j.get("error"))
        time.sleep(2)

    print(f"Done: {j['language']} · {j['segment_count']} segments")

    # 3. Download LRC
    lrc = client.get(f"{BASE}/transcriptions/{job_id}/export", params={"format": "lrc"})
    lrc.raise_for_status()
    with open("track.lrc", "wb") as out:
        out.write(lrc.content)
    print("track.lrc written ✓")

Full reference client (CLI)

Includes error handling, polling with timeout, retry telemetry logging, and a command-line interface:

shell

# Install deps
$ pip install httpx

# Download the reference client
$ curl -O https://raw.githubusercontent.com/r45635/audio-lyrics-extractor/main/example/api_client.py

# Run it
$ TFT_TOKEN=tft_pat_… python api_client.py track.mp3 --format lrc
$ TFT_TOKEN=tft_pat_… python api_client.py track.flac --format srt --vintage

Works in Node.js 18+. In a browser, append a File object (for example from a file input) instead of reading from disk, and route requests through a CORS proxy.

JavaScript (fetch)

const { readFile } = require("node:fs/promises");
const { basename } = require("node:path");

const TOKEN = process.env.TFT_TOKEN;
const BASE  = "https://app.textfromtrack.com/api/v1";
const HDR   = { Authorization: `Bearer ${TOKEN}` };
const sleep = ms => new Promise(r => setTimeout(r, ms));

async function getLyrics(filePath, format = "lrc") {
  // 1. Submit (Node's built-in FormData needs a Blob, not a stream)
  const fd = new FormData();
  fd.append("file", new Blob([await readFile(filePath)]), basename(filePath));
  const sub = await fetch(`${BASE}/transcriptions`, {
    method: "POST", headers: HDR, body: fd,
  }).then(r => r.json());
  if (!sub.job_id) throw new Error(JSON.stringify(sub));
  console.log(`Submitted: ${sub.job_id}`);

  // 2. Poll
  let job;
  do {
    await sleep(2000);
    job = await fetch(`${BASE}/transcriptions/${sub.job_id}`, { headers: HDR })
            .then(r => r.json());
    console.log(` → ${job.status}`);
    if (job.status === "error") throw new Error(JSON.stringify(job.error));
  } while (job.status !== "done");

  // 3. Download export
  const res = await fetch(`${BASE}/transcriptions/${sub.job_id}/export?format=${format}`, { headers: HDR });
  if (!res.ok) throw new Error(`export failed: HTTP ${res.status}`);
  return res.text();
}

// Usage
getLyrics("track.mp3", "lrc").then(lrc => console.log(lrc));

cURL — full lifecycle

$ TFT_TOKEN="tft_pat_…"
$ BASE="https://app.textfromtrack.com/api/v1"

# Check credit balance
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" "$BASE/me" | jq .credit_balance

# Submit (with pinyin for a Chinese track)
$ JOB=$(curl -s -X POST "$BASE/transcriptions" \
    -H "Authorization: Bearer $TFT_TOKEN" \
    -F "file=@song.mp3" \
    -F "pinyin=true" | jq -r .job_id)
$ echo "job_id: $JOB"

# Poll until done
$ while true; do
    STATUS=$(curl -s -H "Authorization: Bearer $TFT_TOKEN" \
      "$BASE/transcriptions/$JOB" | jq -r .status)
    echo "  $STATUS"
    [ "$STATUS" = "done" ] && break
    [ "$STATUS" = "error" ] && exit 1
    sleep 2
  done

# Download all 4 formats
$ for FMT in txt srt lrc json; do
    curl -s -H "Authorization: Bearer $TFT_TOKEN" \
      "$BASE/transcriptions/$JOB/export?format=$FMT" \
      -o "song.$FMT"
    echo "song.$FMT ✓"
  done

Error catalogue

All errors share a single envelope:

{
  "error": {
    "code":    "insufficient_credits",
    "message": "Your account has 0 credits. This job needs 1.",
    "details": { "required": 1, "available": 0, "top_up_url": "…" }
  }
}

HTTP	Code	When
400	`validation_error`	Invalid field, bad body, wrong file extension
400	`track_too_long`	Audio exceeds 10 minutes
401	`unauthorized`	Missing, unknown, expired, or revoked PAT
402	`insufficient_credits`	Balance below `credits_quoted`. Check `details.top_up_url`
403	`forbidden`	PAT lacks required scope, or account suspended
404	`not_found`	Job doesn't exist or belongs to another user (deliberately ambiguous)
409	`validation_error`	Operation invalid for current job status (e.g. export a pending job)
410	`gone`	Export removed by retention policy (after 7 days)
413	`validation_error`	Upload exceeds 100 MB
429	`rate_limited`	Too many concurrent jobs (2 per user, 3 per IP)
500	`internal_error`	Unexpected server-side failure
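
Because `validation_error` is shared by 400, 409, and 413, a client needs both the HTTP status and the envelope code to tell failures apart. The sketch below wraps the envelope in an exception; `ApiError`, `parse_error`, and the choice to treat only 429 as retryable are assumptions made for illustration, not documented behaviour.

```python
from typing import Any, Optional


class ApiError(Exception):
    """Typed view of the documented error envelope."""

    def __init__(self, http_status: int, code: str, message: str,
                 details: Optional[dict[str, Any]] = None) -> None:
        super().__init__(f"{http_status} {code}: {message}")
        self.http_status = http_status
        self.code = code
        self.message = message
        self.details = details or {}

    @property
    def worth_retrying(self) -> bool:
        # Assumption: a 429 clears once a running job finishes; other
        # errors need a change on the caller's side first.
        return self.http_status == 429


def parse_error(http_status: int, body: dict[str, Any]) -> ApiError:
    """Build an ApiError from a decoded error response body."""
    err = body.get("error") or {}
    return ApiError(
        http_status,
        err.get("code", "unknown"),
        err.get("message", ""),
        err.get("details"),
    )
```

A caller would typically raise `parse_error(response.status_code, response.json())` for any response with status 400 or above, then branch on `http_status` and `code`.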

Limits & versioning

Limit	Value
Max file size	100 MB · files > 25 MB are re-encoded server-side before cloud processing (see "File size & automatic transcoding" above)
Max audio duration	10 minutes (600 s)
Concurrent jobs per user	2
Concurrent jobs per IP	3
Export retention	7 days — after that `410 Gone`
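
To stay under the per-user limit when transcribing a batch, submissions can be capped client-side. A sketch, assuming a hypothetical `transcribe_one` callable that submits one file and blocks until that job is done or has failed:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_CONCURRENT_JOBS = 2  # documented per-user limit


def run_capped(paths: Iterable[str],
               transcribe_one: Callable[[str], dict]) -> list[dict]:
    """Process files with at most two jobs in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
        # map() preserves input order and re-raises the first worker exception.
        return list(pool.map(transcribe_one, paths))
```

The cap only covers this process. Other clients using the same account or IP still count toward the limits, so a 429 remains possible and should be handled.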

Versioning policy

The /api/v1/* namespace is contract-stable. Breaking changes move to /api/v2/. Additive changes (new fields, optional params) can land in v1 without notice.

A v1 endpoint will be supported for at least 6 months after v2 is announced.

Credits

1 credit = 1 transcription, regardless of duration (within limits). Check your balance with GET /api/v1/me. The API never accepts payment — point users at top_up_url.