Hausa Speech Recognition
Orinode's Aria v1 model transcribes spontaneous Hausa speech with a 31.1% word error rate (WER), measured in May 2026 on a 200-sample dev set. It is the first publicly benchmarked production-grade Hausa ASR model from a Nigerian research lab.
Why Hausa needs purpose-built ASR
Hausa is spoken by approximately 70 million people across Nigeria, Niger, Cameroon, Ghana, Sudan, and the Hausa diaspora — making it one of the largest African languages by speaker count. Yet global voice AI systems treat it as an afterthought:
- Tonal grammar — Hausa is a tonal language; the same syllabic sequence carries different meaning under different tone contours. Mainstream ASR ignores this.
- Glottalized consonants — the implosives /ɓ/ and /ɗ/, the ejective /kʼ/ (written ƙ), and the glottal stop /ʔ/ have no phonemic counterpart in English, so English-trained acoustic models make systematic substitution errors on them.
- Loanword code-switching — natural Hausa speech embeds English, Arabic, and French loanwords. The accent on those embedded words is Hausa, not Standard American English.
- Geographic accent variation — Kano, Sokoto, Zaria, and Maiduguri Hausa differ significantly in vowel length and pitch realization.
Aria v1 — Hausa ASR architecture
Orinode's Hausa ASR is part of the Aria v1 multilingual Speech-LLM stack:
- Encoder: fine-tuned `openai/whisper-large-v3`, trained on Hausa Common Voice + Mozilla Common Voice 17 + Orinode-curated spontaneous speech (Kano, Lagos, Abuja sources).
- Adapter: MLP + temporal-reshape down-sampler (375 audio tokens per 30 s clip) bridging the Whisper encoder to the LLM input space.
- Decoder: `google/gemma-2-9b-it` with LoRA (r=16, α=32) on `q_proj`/`k_proj`/`v_proj`/`o_proj`.
- Total trainable: 49.4M parameters (adapter + LoRA; the encoder and base decoder weights are frozen).
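The adapter bullet gives only the output rate (375 tokens per 30 s clip). A minimal numpy sketch of one way such a temporal-reshape down-sampler can work follows. It assumes Whisper large-v3's 1500 encoder frames per 30 s window at width 1280, a stacking factor of 4 (1500 / 4 = 375), and a two-layer GELU MLP; the function names and MLP shape are hypothetical, not Orinode's code.

```python
import numpy as np

# Assumptions (not from Orinode's release): Whisper large-v3 emits 1500 encoder
# frames per 30 s window at width 1280; stacking 4 consecutive frames yields the
# 375 tokens per clip quoted above; Gemma-2-9B's embedding width is 3584.
WHISPER_FRAMES_PER_30S = 1500
WHISPER_DIM = 1280
STACK = 4
LLM_DIM = 3584


def temporal_reshape(frames: np.ndarray, stack: int = STACK) -> np.ndarray:
    """Concatenate every `stack` consecutive frames: (B, T, D) -> (B, T // stack, D * stack)."""
    batch, t, d = frames.shape
    if t % stack:
        raise ValueError(f"{t} frames is not divisible by stack={stack}")
    # Row-major reshape places frames i..i+stack-1 side by side in one vector.
    return frames.reshape(batch, t // stack, d * stack)


def gelu(x: np.ndarray) -> np.ndarray:
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def project(stacked, w1, b1, w2, b2):
    """Hypothetical two-layer MLP mapping stacked audio frames into the LLM embedding space."""
    return gelu(stacked @ w1 + b1) @ w2 + b2


if __name__ == "__main__":
    frames = np.zeros((1, WHISPER_FRAMES_PER_30S, WHISPER_DIM), dtype=np.float32)
    print(temporal_reshape(frames).shape)  # (1, 375, 5120)
    print(f"a full-size adapter would map {STACK * WHISPER_DIM} -> {LLM_DIM}")

    # Toy widths so the MLP demo runs instantly.
    rng = np.random.default_rng(0)
    toy = rng.standard_normal((2, 8, 6))
    w1, b1 = rng.standard_normal((24, 16)), np.zeros(16)
    w2, b2 = rng.standard_normal((16, 10)), np.zeros(10)
    print(project(temporal_reshape(toy), w1, b1, w2, b2).shape)  # (2, 2, 10)
```

Stacking frames rather than average-pooling them keeps every encoder frame's features and leaves the mixing to the MLP, at the cost of a 4× wider input to the first projection.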
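As a rough sanity check on the 49.4M figure, the LoRA share can be counted directly. The sketch below assumes the published Gemma 2 9B dimensions (42 layers, hidden size 3584, 16 query heads and 8 key/value heads of size 256) and the standard LoRA parameterization, in which each adapted weight gets a matrix A (r × in) and B (out × r) and the update is scaled by α/r. These dimensions are assumptions, not figures from Orinode.

```python
from typing import Dict

# Assumed Gemma 2 9B configuration (not stated in this page).
HIDDEN = 3584
NUM_LAYERS = 42
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8  # grouped-query attention: K/V projections are narrower
HEAD_DIM = 256

RANK = 16
ALPHA = 32
SCALING = ALPHA / RANK  # LoRA forward pass: W0 x + (alpha / r) * B A x


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """Parameters in A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


def lora_params_per_layer() -> Dict[str, int]:
    q_out = NUM_Q_HEADS * HEAD_DIM    # 4096
    kv_out = NUM_KV_HEADS * HEAD_DIM  # 2048
    return {
        "q_proj": lora_params(HIDDEN, q_out),
        "k_proj": lora_params(HIDDEN, kv_out),
        "v_proj": lora_params(HIDDEN, kv_out),
        "o_proj": lora_params(q_out, HIDDEN),
    }


def total_lora_params() -> int:
    return NUM_LAYERS * sum(lora_params_per_layer().values())


if __name__ == "__main__":
    total = total_lora_params()
    print(f"LoRA scaling alpha/r = {SCALING}")
    print(f"LoRA trainable params: {total:,}")
    print(f"Left for the adapter out of 49.4M: {49.4e6 - total:,.0f}")
```

Under these assumptions the LoRA matrices hold 17,891,328 parameters (about 17.9M), which would leave roughly 31.5M of the reported total for the adapter; with r=16 and α=32 the low-rank update is scaled by 2.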
Measured performance (May 2026)
| Metric | Value | Samples (N) |
|---|---|---|
| WER (normalized) | 31.12% | 200 |
| WER (raw, case-sensitive) | 32.77% | 200 |
| Inference latency (single NVIDIA L4 GPU, fp16) | ~1.8 s for a 4 s clip | n/a |
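The table reports both normalized and raw WER without specifying the normalization. A minimal Python sketch of corpus-level WER (total word-level edit distance divided by total reference words) shows why the two numbers differ. The `normalize` rules here (NFC, lowercasing, stripping punctuation other than apostrophes) are a hypothetical choice, not Orinode's evaluation script.

```python
from __future__ import annotations

import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: NFC, lowercase, drop punctuation except apostrophes."""
    text = unicodedata.normalize("NFC", text).lower()
    # \w is Unicode-aware, so Hausa letters such as ɓ, ɗ, ƙ, ƴ survive; the
    # apostrophe is kept because Hausa spelling uses it for the glottal stop.
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (0 if words match)
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs: list[str], hyps: list[str], normalized: bool = True) -> float:
    """Total word edits divided by total reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = words = 0
    for ref, hyp in zip(refs, hyps):
        if normalized:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_errors(ref_words, hyp_words)
        words += len(ref_words)
    return errors / words


if __name__ == "__main__":
    refs = ["Ina kwana?"]
    hyps = ["ina kwana"]
    print(corpus_wer(refs, hyps, normalized=False))  # 1.0: both words differ as raw strings
    print(corpus_wer(refs, hyps))                    # 0.0 after normalization
```

Corpus-level WER weights long utterances more heavily than averaging per-utterance WER would; which aggregation the benchmark uses is not stated here.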
Known weaknesses
We are upfront about the model's limits. Hausa is the second-hardest language in the v1 benchmark, after Igbo, and the gap to English (10.9% WER) reflects:
- Sparse high-quality Hausa training audio compared to English or Mandarin.
- Limited speaker diversity in Common Voice's Hausa partition (most contributors are male, aged 25–45, with a Kano accent).
- Glottalized consonants (ɓ, ɗ, ƙ, and the glottal stop) remain a systematic substitution error class; see our public benchmarks page for examples.
Roadmap
Aria v2 (target Q3–Q4 2026) will add ~80 hours of newly recorded studio Hausa across four regional accents, with a target of under 20% WER. Code-switch coverage for Hausa↔English is in active training; see the Orinode Naija customer-call code-switch dataset on Hugging Face.
Get the model
Aria v1 research weights are published on huggingface.co/Orinode. Production API access is in pilot; contact Orinode by email to request an evaluation.