Reference Transcripts
This directory contains canonical reference transcripts used by the benchmark correctness gate.
Expected files:
- `short.txt`
- `medium.txt`
- `long.txt`
How they are used:
- `benchmark/parse_results.py` extracts transcript text from each measured run log.
- Text is normalized (case, punctuation, spacing).
- WER and CER are computed against these reference files.
- `benchmark/bench.sh` enforces correctness thresholds by default:
  - `MAX_WER=0.02`
  - `MAX_CER=0.02`
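
The flow above (normalize, score against a reference, compare to thresholds) can be sketched as follows. This is a minimal illustration, not the code in `benchmark/parse_results.py` or `benchmark/bench.sh`: the exact normalization rules, the function names (`normalize`, `wer`, `cer`, `passes_gate`), and the inclusive `<=` comparison are all assumptions. WER and CER are computed here in the usual way, as Levenshtein edit distance over words or characters divided by the reference length.

```python
import re
import string

# Default thresholds named in this README.
MAX_WER = 0.02
MAX_CER = 0.02


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units) -> float:
    """Edit distance divided by reference length."""
    if not ref_units:
        # Empty reference: only an empty hypothesis counts as error-free.
        return 0.0 if not hyp_units else 1.0
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate on normalized text."""
    return error_rate(normalize(reference).split(), normalize(hypothesis).split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate on normalized text (spaces included)."""
    return error_rate(list(normalize(reference)), list(normalize(hypothesis)))


def passes_gate(reference: str, hypothesis: str,
                max_wer: float = MAX_WER, max_cer: float = MAX_CER) -> bool:
    """Hypothetical gate: both rates must be at or below their thresholds."""
    return wer(reference, hypothesis) <= max_wer and cer(reference, hypothesis) <= max_cer
```

With these assumptions, a transcript that differs from its reference only in case, punctuation, or spacing scores 0.0 on both metrics, while a single wrong word in a short transcript is enough to exceed the 0.02 defaults.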
Notes:
- Keep references fixed once the baseline is established.
- If audio inputs change, regenerate references intentionally and document why.