parakeet.cpp/tests/librispeech

LibriSpeech is a standard dataset for training and evaluating automatic speech recognition systems.

This directory contains a set of tools to evaluate the recognition performance of parakeet.cpp on LibriSpeech corpus.

Quick Start

(Pre-requirement) Compile parakeet-cli and prepare the Parakeet model in ggml format.

$ # Execute the commands below in the project root dir.
$ cmake -B build
$ cmake --build build --config Release

Set up the environment to compute WER score.

$ pip install -r requirements.txt

For example, if you use virtualenv, you can set up it as follows:

$ python3 -m venv venv
$ . venv/bin/activate
$ pip install -r requirements.txt

Create eval.conf and override variables.

PARAKEET_MODEL = parakeet-tdt-0.6b-v3
PARAKEET_FLAGS = --no-prints --threads 8 --language en --output-txt

Check out eval.mk for more details.