
Nigerian Pidgin (Naija) Speech Recognition

Orinode's Pidgin ASR is one of the strongest results in Aria v1: 9.8% WER on 200 spontaneous samples (May 2026). Nigerian Pidgin — locally Naija — is the most widely spoken Nigerian language with an estimated 75+ million speakers, yet it has no dedicated language slot in any global ASR model. We made one.

What Nigerian Pidgin is

Nigerian Pidgin (ISO 639-3 pcm) is a creole that emerged from contact between English and West African languages, predominantly Yorùbá, Igbo, Hausa, Edo, and Efik. It is grammatically and phonologically distinct from any variant of English.

Why standard ASR fails on Pidgin

Whisper, Google Speech-to-Text, and Azure all classify Pidgin audio as "broken English" and transcribe it phonetically against an English lexicon — producing output like "we are going to the market" for we dey go market. Worse, they often refuse to transcribe at all when language detection mislabels the segment.

Orinode's approach

Performance (May 2026)

Metric                      Value   N
WER (normalized)            9.76%   200
WER (raw, case-sensitive)   9.90%   200
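
For readers who want to see how a normalized and a raw figure can differ, here is a minimal sketch of corpus-level word error rate: total word-level edits (substitutions, insertions, deletions) divided by total reference words. The normalization shown (lowercasing, stripping punctuation) is an assumption for illustration; the exact scheme behind the numbers above is not stated here, and the function names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep word characters and apostrophes
    return " ".join(text.split())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]], normalized: bool = True) -> float:
    """WER over (reference, hypothesis) pairs: total edits / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in pairs:
        if normalized:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words


# An English-style transcript of a Pidgin utterance scores badly:
print(corpus_wer([("we dey go market", "we are going to the market")]))

# Casing and punctuation only count as errors in the raw setting:
pair = [("We dey go market.", "we dey go market")]
print(corpus_wer(pair, normalized=False), corpus_wer(pair, normalized=True))
```

Summing edits across the whole test set before dividing, rather than averaging per-utterance rates, keeps short utterances from dominating the result.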

Use cases

Pidgin ASR matters most in three settings: customer-service call centers (BVN verification, banking complaints, telco issues), broadcast captioning for Naija-language radio and TV stations, and government services that aim to be linguistically inclusive of the majority of Nigerians who use Pidgin as their everyday working language.

Get the model

Aria v1 weights with the Pidgin (Hawaiian-slot) configuration: huggingface.co/Orinode. Production API: [email protected].