TextFromTrack Developers
TextFromTrack API v1

Build with audio-to-text

A versioned, token-authenticated REST API to submit audio, transcribe it with state-of-the-art models, and download the result in multiple formats. One credit per job. No SDK required.

Start in 2 minutes

Generate an API token from your account, then run the curl commands below.

1. Create a Personal Access Token
   In the app: click your avatar → Account → API tokens section → Generate. Save the token immediately; it is shown only once.
2. Submit an audio file
   POST /api/v1/transcriptions with multipart/form-data. Receive a job_id immediately.
3. Poll until done
   GET /api/v1/transcriptions/{job_id} every 2 s. Status transitions: pending → processing → done
4. Download your transcript
   GET /api/v1/transcriptions/{job_id}/export?format=txt|srt|lrc|json. Or read structured segments as JSON directly.

Method  Path                                  Purpose
POST    /api/v1/transcriptions                Submit audio file
GET     /api/v1/transcriptions/{id}           Poll status + metadata
GET     /api/v1/transcriptions/{id}/segments  Timestamped segments (JSON)
GET     /api/v1/transcriptions/{id}/export    Download TXT / SRT / LRC / JSON
GET     /api/v1/transcriptions                List your jobs (paginated)
DELETE  /api/v1/transcriptions/{id}           Cancel or delete a job
GET     /api/v1/me                            Your profile + credit balance

Quickstart

The full lifecycle in three shell commands.

cURL
$ TFT_TOKEN="tft_pat_…"

# 1 — Submit
$ curl -s -X POST "https://app.textfromtrack.com/api/v1/transcriptions" \
    -H "Authorization: Bearer $TFT_TOKEN" \
    -F "file=@track.mp3" | jq .job_id
"f0c7e2d4-1b3a-…"

# 2 — Poll (repeat until status == "done")
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" \
    "https://app.textfromtrack.com/api/v1/transcriptions/f0c7e2d4-1b3a-…" | jq .status
"done"

# 3 — Download LRC
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" \
    "https://app.textfromtrack.com/api/v1/transcriptions/f0c7e2d4-1b3a-…/export?format=lrc" \
    -o track.lrc

Authentication

All /api/v1/* endpoints require a Personal Access Token (PAT). Send it as a Bearer token on every request:

HTTP Header
Authorization: Bearer tft_pat_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Security model: a PAT cannot manage other PATs and cannot log into the web UI. A stolen web session cannot call the API. This separation is intentional.

Managing tokens

Create, list, and revoke tokens from the web UI via Account → API tokens. These three management endpoints use web JWT auth, not a PAT:

Method  Path                   Notes
POST    /api/auth/tokens       Create; plaintext returned once
GET     /api/auth/tokens       List metadata (no plaintext ever)
DELETE  /api/auth/tokens/{id}  Revoke; idempotent

Tokens are prefixed tft_pat_ so secret-scanning tools (GitHub, gitleaks) catch accidental leaks. Revoke from your account at any time.
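
The fixed prefix also makes it easy to scrub tokens from your own logs. The sketch below is a minimal illustration, not part of the API: the helper names are hypothetical, and the character class for the secret part is an assumption, since the token alphabet and length are not documented here.

```python
import re

# Assumption: the secret part uses URL-safe characters (letters, digits, "_" and "-").
# Only the "tft_pat_" prefix is documented.
PAT_PATTERN = re.compile(r"tft_pat_[A-Za-z0-9_\-]+")


def find_pats(text: str) -> list[str]:
    """Return every substring that looks like a Personal Access Token."""
    return PAT_PATTERN.findall(text)


def redact_pats(text: str) -> str:
    """Replace anything that looks like a PAT with a fixed placeholder."""
    return PAT_PATTERN.sub("tft_pat_[REDACTED]", text)


if __name__ == "__main__":
    log_line = 'curl -H "Authorization: Bearer tft_pat_abc123" https://example.invalid'
    print(redact_pats(log_line))
```

Running a check like this over log output or a diff before committing is a cheap complement to hosted secret scanning.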


API Reference

Base URL: https://app.textfromtrack.com/api/v1

POST /transcriptions Submit an audio file

Upload audio with multipart/form-data. Returns immediately with a job_id.

Request body

Field    Type  Required  Default  Description
file     File  yes       -        .mp3, .wav, .flac · max 100 MB · max 10 min (see note below)
pinyin   bool  no        false    Pinyin annotation for Chinese text (see below)
vintage  bool  no        false    High-quality mode for music / old recordings (see below)

File size & automatic transcoding

The server accepts files up to 100 MB. The cloud transcription API (OpenAI Whisper) has its own 25 MB limit — but this is handled transparently: if your file exceeds 25 MB, the server automatically re-encodes it to MP3 mono 16 kHz 128 kbps with ffmpeg before sending it upstream. You never see this step.

The practical ceiling is the 10-minute duration limit, not the file size. A 10-minute audio file at 128 kbps mono is ≈ 9.4 MB — well under the 25 MB cloud threshold — so a re-encoded file can never overflow the upstream limit.

Large HD files (FLAC/WAV/AIFF): a lossless 10-minute stereo FLAC can exceed 100 MB. In that case the upload is rejected with 413 Entity Too Large. Re-export at a lower bitrate or convert to MP3 before uploading.
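
The arithmetic behind these limits can be checked locally before uploading. The following is a rough sketch under stated assumptions: the function names are hypothetical, "MB" is read as binary megabytes (which reproduces the ≈ 9.4 MB figure above), and the caller supplies the duration, because reading it from an audio file needs a separate library.

```python
# Assumption: "MB" means 1024 * 1024 bytes; the service may count differently.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024
MAX_DURATION_S = 600  # documented 10-minute ceiling


def encoded_size_mb(duration_s: float, bitrate_kbps: int = 128) -> float:
    """Approximate size of a constant-bitrate file in MB (1 MB = 1024 KB)."""
    kilobytes = duration_s * bitrate_kbps / 8  # kilobits per second -> kilobytes
    return kilobytes / 1024


def preflight(size_bytes: int, duration_s: float) -> list[str]:
    """Return the documented limits a file would violate (empty list = OK)."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 100 MB (expect HTTP 413)")
    if duration_s > MAX_DURATION_S:
        problems.append("audio exceeds 10 minutes (expect track_too_long)")
    return problems


if __name__ == "__main__":
    print(round(encoded_size_mb(600), 1))          # 9.4
    print(preflight(120 * 1024 * 1024, 480.0))     # size problem only
```

A 10-minute track at 128 kbps comes out at 9.375 MB, which is why a re-encoded file stays far below the 25 MB upstream threshold.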

Option: pinyin=true

When your track contains Mandarin Chinese lyrics, enabling pinyin post-processes the transcript to annotate each Chinese character with its romanised pronunciation (pīnyīn). Characters are wrapped in a bracketed notation — e.g. 你好[nǐ hǎo] — so the result is readable by both Chinese readers and non-Chinese tools that only handle Latin characters.

This option has no effect on non-Chinese audio. It adds ~1 s to the response time and does not cost extra credits.
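
If a downstream tool needs only the characters or only the romanisation, the bracketed notation can be split with a regular expression. This is a small sketch with hypothetical helper names, and it assumes square brackets appear in the transcript only as pinyin annotations, as in the 你好[nǐ hǎo] example above.

```python
import re

# Assumption: "[...]" occurs only as a pinyin annotation, never as lyric text.
ANNOTATION = re.compile(r"\[([^\[\]]*)\]")


def strip_pinyin(text: str) -> str:
    """Drop the bracketed annotations and keep the original characters."""
    return ANNOTATION.sub("", text)


def pinyin_only(text: str) -> str:
    """Keep only the romanised readings, joined with single spaces."""
    return " ".join(ANNOTATION.findall(text))


if __name__ == "__main__":
    line = "你好[nǐ hǎo]世界[shì jiè]"
    print(strip_pinyin(line))
    print(pinyin_only(line))
```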

Option: vintage=true (Music / high-quality mode)

Enables a heavier, two-stage local pipeline designed for music tracks and difficult recordings (heavy reverb, background instruments, old recordings, low bitrate):

1. Demucs vocal separation
   The track is split into stems using Meta's Demucs htdemucs model. The vocals stem is extracted and passed to the transcription step. Instrumental bleed is removed, which dramatically improves accuracy on music.
2. Whisper large-v3 (local)
   The cleaned vocals are transcribed with OpenAI Whisper large-v3 running locally (not the cloud API). This model has better recall on sung, mumbled, or non-standard speech than the default faster model.

Trade-offs: vintage mode takes significantly longer (30 s–5 min depending on duration) and cannot be used with pinyin=true simultaneously. It consumes the same 1 credit as a standard job — no extra charge.

The web UI also uses vintage mode automatically as a retry fallback when the standard model returns a poor-quality transcript (repetition loops, silence, etc.). When calling the API directly, you choose upfront.
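
Because vintage and pinyin are mutually exclusive, a client can reject the combination before spending an upload. The helper below is a hypothetical sketch; it assumes booleans are sent as the strings "true" and "false", mirroring the -F "pinyin=true" form field used in the cURL examples.

```python
def build_form_fields(pinyin: bool = False, vintage: bool = False) -> dict[str, str]:
    """Build the optional multipart form fields for POST /transcriptions."""
    if pinyin and vintage:
        # Documented trade-off: the two options cannot be used together.
        raise ValueError("vintage=true cannot be combined with pinyin=true")
    fields: dict[str, str] = {}
    if pinyin:
        fields["pinyin"] = "true"
    if vintage:
        fields["vintage"] = "true"
    return fields


if __name__ == "__main__":
    print(build_form_fields(vintage=True))
```

With httpx, the returned dict can be passed as data= alongside files={"file": f}, so both travel in the same multipart body.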

Response — 201

{
  "job_id":            "f0c7e2d4-1b3a-…",
  "status":            "pending",
  "credits_quoted":    1,
  "duration_seconds":  187.3,
  "vintage_mode":      false
}
GET /transcriptions/{id} Poll status

Poll every ~2 s. Status flow: pending → processing → done or error

When attempts is 2 and vocal_separation_used is true, the pipeline automatically retried with Demucs vocal separation. The response is the winning transcript — you don't need to handle the retry yourself.

Response — 200 (done)

{
  "job_id":                  "f0c7e2d4-1b3a-…",
  "status":                  "done",
  "filename":                "track.mp3",
  "language":                "en",
  "duration_seconds":        187.3,
  "segment_count":           42,
  "credits_charged":         1,
  "model":                   "whisper-1",
  "degraded":                false,
  "vocal_separation_used":   false,
  "attempts":                1,
  "runtime_seconds":         12.4,
  "created_at":              "2026-05-08T10:00:00Z",
  "completed_at":            "2026-05-08T10:00:13Z"
}
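
A polling loop is safer with a deadline, since vintage jobs can run for minutes. The sketch below is one possible shape, not the reference client: wait_for_job and was_retried_with_separation are hypothetical names, the 600 s default timeout is an arbitrary choice, and the status fetch is injected as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], dict], interval_s: float = 2.0,
                 timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() until the job is done or in error, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job["status"] in ("done", "error"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout_s} s")
        sleep(interval_s)


def was_retried_with_separation(job: dict) -> bool:
    """True when the pipeline retried with Demucs vocal separation (documented signal)."""
    return job.get("attempts") == 2 and job.get("vocal_separation_used") is True


if __name__ == "__main__":
    # Simulated status sequence; a real fetch_status would GET /transcriptions/{id}.
    states = iter([{"status": "pending"}, {"status": "processing"}, {"status": "done"}])
    print(wait_for_job(lambda: next(states), sleep=lambda s: None)["status"])
```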
GET /transcriptions/{id}/segments Structured timestamped transcript

Returns the transcript as a structured JSON array — same data as the JSON export, without downloading a file.

{
  "job_id":           "…",
  "language":         "fr",
  "duration_seconds": 187.3,
  "segments": [
    { "index": 0, "start": 0.0,  "end": 4.2,  "text": "Première ligne…" },
    { "index": 1, "start": 4.2,  "end": 8.5,  "text": "Deuxième ligne…" }
  ]
}
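
The segments array carries enough information to build subtitle files client-side. As an illustration, this sketch converts segments to SubRip text using the standard HH:MM:SS,mmm timestamp layout; the function names are hypothetical and the output is not guaranteed to match the server's own .srt export byte for byte.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as SubRip cues; SubRip cue numbers start at 1."""
    cues = []
    for seg in segments:
        cues.append(
            f"{seg['index'] + 1}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(cues)  # blank line between cues


if __name__ == "__main__":
    demo = [
        {"index": 0, "start": 0.0, "end": 4.2, "text": "Première ligne…"},
        {"index": 1, "start": 4.2, "end": 8.5, "text": "Deuxième ligne…"},
    ]
    print(segments_to_srt(demo))
```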
GET /transcriptions/{id}/export?format=… Download a file export

Streams the file. format ∈ {txt, srt, lrc, json}. Content-Type is set per format.

Retention: exports are deleted after 7 days. Requesting an expired export returns 410 Gone.

.txt   Plain text, one line per segment
.srt   SubRip subtitles, for video players
.lrc   Synced lyrics, for music players & karaoke
.json  Structured segments array with timestamps
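
An export request can end in a few distinct ways, and a client usually wants to treat them differently. The sketch below maps the documented status codes to outcomes; the function names and outcome labels are hypothetical, chosen only for illustration.

```python
from pathlib import Path

FORMATS = ("txt", "srt", "lrc", "json")


def export_outcome(status_code: int) -> str:
    """Classify an export response using the documented status codes."""
    if status_code == 200:
        return "ok"
    if status_code == 409:
        return "not_ready"   # job is not done yet: keep polling
    if status_code == 410:
        return "expired"     # removed after the 7-day retention window
    return "failed"


def export_filename(source_name: str, fmt: str) -> str:
    """Derive a local file name such as track.lrc from track.mp3."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    return f"{Path(source_name).stem}.{fmt}"


if __name__ == "__main__":
    print(export_outcome(410), export_filename("track.mp3", "lrc"))
```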
GET /transcriptions List your jobs (paginated)

Query params: page (default 1), per_page (default 50, max 200), status (filter: pending, done, error).

{
  "page":             1,
  "per_page":         50,
  "total":            173,
  "transcriptions":   [ /* slim shape, no attempts_log */ ]
}
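
To walk the whole job list, a client can keep requesting pages until page × per_page covers total. This is a minimal sketch: iter_jobs is a hypothetical helper, and the page fetch is injected as a callable (page, per_page) -> response body so the loop stays independent of any HTTP library.

```python
import math
from typing import Callable, Iterator


def iter_jobs(fetch_page: Callable[[int, int], dict], per_page: int = 50) -> Iterator[dict]:
    """Yield every job, requesting pages until the reported total is covered."""
    page = 1
    while True:
        body = fetch_page(page, per_page)
        yield from body["transcriptions"]
        last_page = math.ceil(body["total"] / body["per_page"])
        if page >= last_page:
            break
        page += 1


if __name__ == "__main__":
    # In-memory stand-in for GET /transcriptions?page=..&per_page=..
    jobs = [{"job_id": str(i)} for i in range(173)]

    def fake_fetch(page: int, per_page: int) -> dict:
        start = (page - 1) * per_page
        return {"page": page, "per_page": per_page, "total": len(jobs),
                "transcriptions": jobs[start:start + per_page]}

    print(sum(1 for _ in iter_jobs(fake_fetch)))  # 173
```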
DELETE /transcriptions/{id} Cancel or delete a job

Marks the job as cancelled and removes its export directory.

Credits are not refunded if the job is already done — you've consumed compute. Cancelling a pending job before processing starts does not charge credits.

GET /me Profile + credit balance
{
  "user_id":          "…",
  "email":            "you@example.com",
  "credit_balance":   47,
  "top_up_url":       "https://app.textfromtrack.com/",
  "scopes_granted":   ["transcriptions:read", "transcriptions:write"]
}

When the balance is low, redirect users to top_up_url. The API never accepts payment directly.
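
A quick check of the /me response before submitting avoids a wasted upload. The sketch below uses a hypothetical helper name and assumes that submitting a job needs the transcriptions:write scope; the source lists that scope but does not spell out which endpoint requires it.

```python
def can_submit(me: dict, credits_needed: int = 1) -> tuple[bool, str]:
    """Decide from a GET /me body whether a new job is likely to be accepted."""
    # Assumption: POST /transcriptions requires the transcriptions:write scope.
    if "transcriptions:write" not in me.get("scopes_granted", []):
        return False, "token lacks the transcriptions:write scope"
    if me.get("credit_balance", 0) < credits_needed:
        return False, f"not enough credits; top up at {me.get('top_up_url')}"
    return True, "ok"


if __name__ == "__main__":
    profile = {
        "credit_balance": 47,
        "top_up_url": "https://app.textfromtrack.com/",
        "scopes_granted": ["transcriptions:read", "transcriptions:write"],
    }
    print(can_submit(profile))
```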


Code examples

Minimal example — only httpx required (pip install httpx). A full reference client with error handling and a CLI is available below.

Python (httpx)
import os
import time

import httpx

TOKEN = os.environ["TFT_TOKEN"]
BASE  = "https://app.textfromtrack.com/api/v1"
HDR   = {"Authorization": f"Bearer {TOKEN}"}

# 1. Submit (longer timeout than httpx's 5 s default, since uploads can be large)
with open("track.mp3", "rb") as f:
    r = httpx.post(f"{BASE}/transcriptions", headers=HDR, files={"file": f}, timeout=120)
r.raise_for_status()
job_id = r.json()["job_id"]
print(f"Submitted: job_id={job_id}")

# 2. Poll until done, failing fast on HTTP errors
while True:
    r = httpx.get(f"{BASE}/transcriptions/{job_id}", headers=HDR)
    r.raise_for_status()
    j = r.json()
    print(f"  status: {j['status']}")
    if j["status"] == "done":
        break
    if j["status"] == "error":
        raise RuntimeError(j.get("error"))
    time.sleep(2)

print(f"Done: {j['language']} · {j['segment_count']} segments")

# 3. Download LRC
lrc = httpx.get(
    f"{BASE}/transcriptions/{job_id}/export",
    headers=HDR, params={"format": "lrc"}
)
lrc.raise_for_status()
with open("track.lrc", "wb") as out:
    out.write(lrc.content)
print("track.lrc written ✓")

Full reference client (CLI)

Includes error handling, polling with timeout, retry telemetry logging, and a command-line interface:

shell
# Install deps
$ pip install httpx

# Download the reference client
$ curl -O https://raw.githubusercontent.com/r45635/audio-lyrics-extractor/main/example/api_client.py

# Run it
$ TFT_TOKEN=tft_pat_… python api_client.py track.mp3 --format lrc
$ TFT_TOKEN=tft_pat_… python api_client.py track.flac --format srt --vintage

Works in Node.js 18+. In a modern browser, append a File object to the FormData instead of reading from disk, and route requests through a CORS proxy.

JavaScript (fetch)
// Node.js 18+, run as an ES module (.mjs)
import { readFile } from "node:fs/promises";
import { basename } from "node:path";

const TOKEN = process.env.TFT_TOKEN;
const BASE  = "https://app.textfromtrack.com/api/v1";
const HDR   = { Authorization: `Bearer ${TOKEN}` };
const sleep = ms => new Promise(r => setTimeout(r, ms));

async function getLyrics(filePath, format = "lrc") {
  // 1. Submit: the built-in FormData expects a Blob, not a read stream
  const fd = new FormData();
  fd.append("file", new Blob([await readFile(filePath)]), basename(filePath));
  const sub = await fetch(`${BASE}/transcriptions`, {
    method: "POST", headers: HDR, body: fd,
  }).then(r => r.json());
  if (!sub.job_id) throw new Error(`Submit failed: ${JSON.stringify(sub)}`);
  console.log(`Submitted: ${sub.job_id}`);

  // 2. Poll
  let job;
  do {
    await sleep(2000);
    job = await fetch(`${BASE}/transcriptions/${sub.job_id}`, { headers: HDR })
            .then(r => r.json());
    console.log(` → ${job.status}`);
    if (job.status === "error") throw new Error(JSON.stringify(job.error));
  } while (job.status !== "done");

  // 3. Download export
  const res = await fetch(`${BASE}/transcriptions/${sub.job_id}/export?format=${format}`, { headers: HDR });
  if (!res.ok) throw new Error(`Export failed: HTTP ${res.status}`);
  return res.text();
}

// Usage
getLyrics("track.mp3", "lrc").then(lrc => console.log(lrc));
cURL — full lifecycle
$ TFT_TOKEN="tft_pat_…"
$ BASE="https://app.textfromtrack.com/api/v1"

# Check credit balance
$ curl -s -H "Authorization: Bearer $TFT_TOKEN" "$BASE/me" | jq .credit_balance

# Submit (with pinyin for a Chinese track)
$ JOB=$(curl -s -X POST "$BASE/transcriptions" \
    -H "Authorization: Bearer $TFT_TOKEN" \
    -F "file=@song.mp3" \
    -F "pinyin=true" | jq -r .job_id)
$ echo "job_id: $JOB"

# Poll until done
$ while true; do
    STATUS=$(curl -s -H "Authorization: Bearer $TFT_TOKEN" \
      "$BASE/transcriptions/$JOB" | jq -r .status)
    echo "  $STATUS"
    [ "$STATUS" = "done" ] && break
    [ "$STATUS" = "error" ] && exit 1
    sleep 2
  done

# Download all 4 formats
$ for FMT in txt srt lrc json; do
    curl -s -H "Authorization: Bearer $TFT_TOKEN" \
      "$BASE/transcriptions/$JOB/export?format=$FMT" \
      -o "song.$FMT"
    echo "song.$FMT ✓"
  done

Error catalogue

All errors share a single envelope:

{
  "error": {
    "code":    "insufficient_credits",
    "message": "Your account has 0 credits. This job needs 1.",
    "details": { "required": 1, "available": 0, "top_up_url": "…" }
  }
}
HTTP  Code                  When
400   validation_error      Invalid field, bad body, wrong file extension
400   track_too_long        Audio exceeds 10 minutes
401   unauthorized          Missing, unknown, expired, or revoked PAT
402   insufficient_credits  Balance below credits_quoted. Check details.top_up_url
403   forbidden             PAT lacks required scope, or account suspended
404   not_found             Job doesn't exist or belongs to another user (deliberately ambiguous)
409   validation_error      Operation invalid for current job status (e.g. export a pending job)
410   gone                  Export removed by retention policy (after 7 days)
413   validation_error      Upload exceeds 100 MB
429   rate_limited          Too many concurrent jobs (2 per user, 3 per IP)
500   internal_error        Unexpected server-side failure
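
Since every error shares one envelope, a client can turn it into a single exception type. The sketch below is illustrative: ApiError, error_from_response, and is_retryable are hypothetical names, and treating 429 and 500 as retryable is a client-side policy assumed here, not something the API prescribes.

```python
class ApiError(Exception):
    """An API failure decoded from the shared error envelope."""

    def __init__(self, status: int, code: str, message: str, details: dict):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details


def error_from_response(status: int, body: dict) -> ApiError:
    """Build an ApiError from an HTTP status and a decoded JSON body."""
    err = body.get("error") or {}
    return ApiError(status, err.get("code", "unknown"),
                    err.get("message", ""), err.get("details") or {})


def is_retryable(error: ApiError) -> bool:
    # Assumption: back off and retry on concurrency limits and server faults only.
    return error.status in (429, 500)


if __name__ == "__main__":
    body = {"error": {"code": "rate_limited", "message": "Too many concurrent jobs",
                      "details": {}}}
    err = error_from_response(429, body)
    print(err, is_retryable(err))
```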

Limits & versioning

Limit                     Value
Max file size             100 MB · files > 25 MB are re-encoded server-side before cloud processing (see "File size & automatic transcoding")
Max audio duration        10 minutes (600 s)
Concurrent jobs per user  2
Concurrent jobs per IP    3
Export retention          7 days; after that, 410 Gone
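
When transcribing a batch, capping client-side parallelism at the per-user limit avoids 429 responses. A minimal sketch using a thread pool follows; run_all is a hypothetical name, and transcribe stands for any caller-supplied function that submits one file, polls it, and returns the result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_CONCURRENT_JOBS = 2  # documented per-user limit


def run_all(paths: Iterable[str], transcribe: Callable[[str], object]) -> list:
    """Process every path with at most MAX_CONCURRENT_JOBS jobs in flight."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
        # map() keeps results in the same order as the input paths.
        return list(pool.map(transcribe, paths))


if __name__ == "__main__":
    # Stand-in worker; a real one would call the API for each file.
    print(run_all(["a.mp3", "b.mp3", "c.mp3"], lambda p: p.replace(".mp3", ".lrc")))
```

The per-IP limit of 3 still applies when several users share one address, so a shared server may need a process-wide cap as well.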

Versioning policy

The /api/v1/* namespace is contract-stable. Breaking changes move to /api/v2/. Additive changes (new fields, optional params) can land in v1 without notice.

A v1 endpoint will be supported for at least 6 months after v2 is announced.
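
Because new fields can appear in v1 responses without notice, clients should ignore keys they do not recognise rather than fail on them. One way to do that is sketched below; the Job class is hypothetical and models only a few of the documented fields.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Job:
    job_id: str
    status: str
    language: Optional[str] = None
    segment_count: Optional[int] = None

    @classmethod
    def from_api(cls, payload: dict) -> "Job":
        """Build a Job from a response body, silently dropping unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})


if __name__ == "__main__":
    # "some_future_field" is a made-up key standing in for a later additive change.
    body = {"job_id": "abc", "status": "done", "language": "en",
            "some_future_field": 123}
    print(Job.from_api(body))
```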

Credits

1 credit = 1 transcription, regardless of duration (within limits). Check your balance with GET /api/v1/me. The API never accepts payment — point users at top_up_url.