shaihi 40a46f9ed1 benchmark: add harness artifacts and handoff record		2026-03-09 10:13:34 +02:00
README.md	benchmark: add harness artifacts and handoff record	2026-03-09 10:13:34 +02:00
long.txt	benchmark: add harness artifacts and handoff record	2026-03-09 10:13:34 +02:00
medium.txt	benchmark: add harness artifacts and handoff record	2026-03-09 10:13:34 +02:00
short.txt	benchmark: add harness artifacts and handoff record	2026-03-09 10:13:34 +02:00

README.md

Reference Transcripts

This directory contains canonical reference transcripts used by the benchmark correctness gate.

Expected files:

short.txt
medium.txt
long.txt

How they are used:

benchmark/parse_results.py extracts the transcript text from the log of each measured run.
The extracted text is normalized (case, punctuation, spacing) before scoring.
Word error rate (WER) and character error rate (CER) are computed against these reference files.
benchmark/bench.sh enforces correctness thresholds by default:
- MAX_WER=0.02
- MAX_CER=0.02
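
The scoring steps above can be sketched as follows. This is a minimal illustration and not the actual code in benchmark/parse_results.py, which is not reproduced here. The function names (`normalize`, `wer`, `cer`, `passes_gate`), the exact normalization rules, and the use of `<=` for the threshold comparison are all assumptions; only the threshold values come from the defaults listed above.

```python
import re
import string

# Default thresholds as listed for benchmark/bench.sh.
MAX_WER = 0.02
MAX_CER = 0.02


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace.

    Assumed rules: the real normalizer may treat apostrophes, digits,
    or non-ASCII punctuation differently.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp) -> float:
    """Edit distance divided by the reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over whitespace-separated tokens."""
    return error_rate(normalize(reference).split(), normalize(hypothesis).split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over characters."""
    return error_rate(normalize(reference), normalize(hypothesis))


def passes_gate(reference: str, hypothesis: str,
                max_wer: float = MAX_WER, max_cer: float = MAX_CER) -> bool:
    """True when both error rates are within their thresholds (assumed inclusive)."""
    return wer(reference, hypothesis) <= max_wer and cer(reference, hypothesis) <= max_cer
```

With thresholds of 0.02, a single wrong word in a four-word transcript (WER 0.25) fails the gate, while differences that normalization removes, such as capitalization or trailing punctuation, do not count as errors.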

Notes:

Keep the references fixed once a baseline is established.
If the audio inputs change, regenerate the references deliberately and document why.