Nigerian English Speech Recognition
Orinode's Nigerian English ASR achieves a 10.9% word error rate (WER) on naturalistic Nigerian-accented English (May 2026, 200 samples), a substantial improvement over off-the-shelf US/UK-trained ASR systems on the same audio.
What "Nigerian English" actually is
Nigerian English (BCP 47 language tag en-NG; ISO 639-3 has only the single code eng for English) is the local variety of English spoken by an estimated 100+ million Nigerians as a second language. It has stable, systematic phonological and lexical features that diverge from General American or Received Pronunciation:
- Vowel mergers — many speakers neutralize /ɪ/–/iː/ ("bit" / "beat"), /ʊ/–/uː/ ("full" / "fool"), /ʌ/–/ɔ/ ("but" / "bought").
- Non-rhotic in most regions — /r/ is not pronounced post-vocalically; "car" sounds closer to "kah".
- Substrate prosody — speech rhythm is syllable-timed rather than stress-timed, shifting the temporal cues English-trained ASR depends on.
- Distinctive vocabulary — flash my phone, okada, kabukabu, NEPA, wahala — domain-specific terms that out-of-the-box ASR transcribes incorrectly or omits.
Why Whisper-large-v3 alone is not enough
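OpenAI's Whisper-large-v3 was trained on roughly 1 million hours of weakly labeled audio plus 4 million hours of pseudo-labeled audio (the ~680k-hour figure belongs to the original Whisper release), and its English data is dominated by US and European accents. With its English language token (<|en|>), it decodes Nigerian English passably, but it consistently: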
- Substitutes "beat" for "bit" and vice-versa.
- Drops or mis-inserts function words ("the", "a", "to") at higher rates.
- Hallucinates entirely on short utterances with strong Nigerian prosody.
Orinode's fine-tuned encoder corrects these systematic errors by training on Nigerian-accented English specifically.
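The error pattern described above (vowel-pair substitutions, dropped or inserted function words) can be profiled from reference/hypothesis transcript pairs. The following is a minimal sketch, not Orinode's evaluation code: the function name `error_profile`, the example sentences, and the use of `difflib` for a rough word alignment are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def error_profile(reference: str, hypothesis: str) -> dict:
    """Roughly align two transcripts word by word and tally the error types.

    difflib gives a heuristic alignment, not a guaranteed minimum-edit one,
    so treat the counts as a diagnostic rather than an exact WER breakdown.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    substitutions, dropped, inserted = Counter(), Counter(), Counter()

    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Pair words position by position; any leftover words on either
            # side are counted as dropped or inserted.
            paired = min(i2 - i1, j2 - j1)
            for r, h in zip(ref[i1:i1 + paired], hyp[j1:j1 + paired]):
                substitutions[(r, h)] += 1
            dropped.update(ref[i1 + paired:i2])
            inserted.update(hyp[j1 + paired:j2])
        elif tag == "delete":
            dropped.update(ref[i1:i2])
        elif tag == "insert":
            inserted.update(hyp[j1:j2])

    return {
        "substitutions": dict(substitutions),
        "dropped": dict(dropped),
        "inserted": dict(inserted),
    }


# Made-up example pair, not taken from any evaluation set.
profile = error_profile(
    "he bit the apple and went to the market",
    "he beat apple and went to a the market",
)
print(profile)
```

Aggregating these counters over a test set would show whether pairs such as ("bit", "beat") or function words such as "the" and "a" dominate the errors.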
Architecture
- Encoder: Whisper-large-v3 fine-tuned on Nigerian English (Common Voice + LDC Nigerian English subset + Orinode-curated radio archives).
- Decoder + adapter: shared with the rest of Aria v1 — Gemma-2-9b-it with LoRA on q/k/v/o projections.
- Language token: standard <|en|>, but the encoder has been re-aligned to Nigerian-accented English at the acoustic level.
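To give a sense of scale for LoRA on the q/k/v/o projections, the sketch below counts the adapter's trainable parameters. Each adapted projection of shape d_in × d_out adds rank × (d_in + d_out) parameters. The attention dimensions are my understanding of the published Gemma 2 9B configuration and should be checked against the model's config file, and the rank of 16 is purely hypothetical, since the page does not state the rank used in Aria v1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    hidden_size: int
    num_heads: int
    num_kv_heads: int
    head_dim: int
    num_layers: int


def lora_trainable_params(shape: AttentionShape, rank: int) -> int:
    """Count LoRA parameters when adapting the q, k, v and o projections.

    A rank-r adapter on a (d_in x d_out) weight adds two matrices,
    (d_in x r) and (r x d_out), i.e. r * (d_in + d_out) parameters.
    """
    q_out = shape.num_heads * shape.head_dim
    kv_out = shape.num_kv_heads * shape.head_dim
    projections = {
        "q": (shape.hidden_size, q_out),
        "k": (shape.hidden_size, kv_out),
        "v": (shape.hidden_size, kv_out),
        "o": (q_out, shape.hidden_size),
    }
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in projections.values())
    return per_layer * shape.num_layers


# Assumed dimensions for Gemma 2 9B; verify before relying on them.
ASSUMED_GEMMA2_9B = AttentionShape(
    hidden_size=3584, num_heads=16, num_kv_heads=8, head_dim=256, num_layers=42
)

# Hypothetical rank; the actual value is not given in this document.
print(lora_trainable_params(ASSUMED_GEMMA2_9B, rank=16))  # 17891328
```

Under these assumptions the adapter holds about 18 million trainable parameters, a small fraction of the roughly 9 billion in the base decoder.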
Performance (May 2026)
| Metric | Value | N |
|---|---|---|
| WER (normalized) | 10.92% | 200 |
| WER (raw, case-sensitive) | 12.03% | 200 |
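The gap between the two rows comes from text normalization before scoring. The sketch below shows one way such a pair of numbers can be computed. It assumes normalization means lowercasing and stripping punctuation, and that WER is pooled over the whole test set. Neither detail is specified on this page, and the function names and example sentences are illustrative only.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and replace punctuation (except apostrophes) with spaces."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def word_errors(ref_words: list, hyp_words: list) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        cur = [i]
        for j, h in enumerate(hyp_words, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def corpus_wer(pairs, normalized: bool = True) -> float:
    """Total word errors divided by total reference words across all pairs."""
    errors = words = 0
    for ref, hyp in pairs:
        if normalized:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_errors(ref_words, hyp_words)
        words += len(ref_words)
    return errors / words


# Made-up (reference, hypothesis) pairs for illustration.
pairs = [
    ("Flash my phone, abeg.", "flash my phone abeg"),
    ("NEPA took the light", "Nepa took light"),
]
print(corpus_wer(pairs, normalized=True))   # 0.125
print(corpus_wer(pairs, normalized=False))  # 0.625
```

In the raw setting, casing and punctuation differences count as substitutions, which is why a case-sensitive WER is always at least as high as the normalized one under this scheme.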
Code-switching with Nigerian English
Real Nigerian English speech routinely embeds Hausa, Yorùbá, Igbo, or Pidgin tokens within an otherwise English utterance — e.g. "My boss said sannu, but he no fit even pronounce the name properly". Aria v1 handles this via a multilingual decoder that can emit any of the six trained languages mid-sentence. See our Naija customer-call code-switch dataset on Hugging Face for representative training data.
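As a rough illustration of what "mid-sentence" switching looks like in a transcript, the sketch below tags each token with a language code and counts the switch points. It is a toy, lexicon-based heuristic and not how Aria v1's decoder works. The lexicon, the language codes ("ha" for Hausa, "pcm" for Nigerian Pidgin), and the function names are assumptions made for this example.

```python
import re

# Hypothetical toy lexicon: token -> language code.
TOY_LEXICON = {"sannu": "ha", "abeg": "pcm"}


def tag_tokens(utterance: str, lexicon=TOY_LEXICON, default: str = "en"):
    """Return (token, language) pairs; tokens not in the lexicon get `default`."""
    tokens = re.findall(r"\w+", utterance.lower())
    return [(tok, lexicon.get(tok, default)) for tok in tokens]


def count_switches(tagged) -> int:
    """Count positions where the language tag differs from the previous token's."""
    langs = [lang for _, lang in tagged]
    return sum(1 for a, b in zip(langs, langs[1:]) if a != b)


tagged = tag_tokens(
    "My boss said sannu, but he no fit even pronounce the name properly"
)
print(tagged)
print(count_switches(tagged))  # 2 (English to Hausa, then back to English)
```

A single-token lexicon misses multi-word Pidgin constructions such as "no fit", which is one reason token-level lookup is only a diagnostic and cannot replace a decoder trained on code-switched speech.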
Get the model
Open weights are available at huggingface.co/Orinode. For production API access, contact Orinode by email.