Hausa Speech Recognition

Orinode's Maraba v1 model transcribes spontaneous Hausa speech with a 31.1% word error rate (WER), measured in May 2026 on a 200-sample dev set. It is the first publicly benchmarked production-grade Hausa ASR model from a Nigerian research lab.

Looking for the deployable product? The Hausa ASR model described below powers Maraba, an AI call agent that answers Hausa business calls in production, handling code-switching and preserving ƙ/ɗ/ɓ end-to-end.

Why Hausa needs purpose-built ASR

Hausa is spoken by approximately 70 million people across Nigeria, Niger, Cameroon, Ghana, Sudan, and the Hausa diaspora, making it one of the largest African languages by speaker count. Yet global voice AI systems treat it as an afterthought, and four properties of the language make that costly:

Tonal grammar: Hausa is a tonal language; the same sequence of syllables carries different meanings under different tone contours. Mainstream ASR systems do not model this.
Implosive and ejective consonants: the implosives /ɓ/ and /ɗ/, the ejective /ƙ/, and the glottal stop /ʔ/ have no direct equivalent in English-trained acoustic models, leading to systematic substitution errors.
Loanword code-switching: natural Hausa speech embeds English, Arabic, and French loanwords. Those embedded words are pronounced with a Hausa accent, not the Standard American English accent that mainstream models expect.
Geographic accent variation: Kano, Sokoto, Zaria, and Maiduguri Hausa differ significantly in vowel length and pitch realization.

Maraba v1 — Hausa ASR architecture

Orinode's Hausa ASR is part of the Maraba v1 multilingual Speech-LLM stack:

Encoder: openai/whisper-large-v3, fine-tuned on Hausa CommonVoice, Mozilla Common Voice 17, and Orinode-curated spontaneous speech (Kano, Lagos, Abuja sources).
Adapter: MLP + temporal-reshape down-sampler (375 audio tokens per 30 s clip) bridging the Whisper encoder to the LLM input space.
Decoder: google/gemma-2-9b-it with LoRA (r=16, α=32) applied to the q_proj, k_proj, v_proj, and o_proj attention projections.
Total trainable: 49.4M parameters (adapter and LoRA weights; the encoder and base decoder are frozen).
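
As a rough sketch of the arithmetic behind the adapter and LoRA lines above: Whisper's encoder emits 1,500 frames per 30-second window, so 375 audio tokens is consistent with stacking every 4 consecutive frames into one wider frame before the MLP. The stacking factor of 4 is inferred from those two numbers, not taken from the Maraba code. The Gemma 2 9B dimensions used for the LoRA count (hidden size 3584, 42 layers, 16 query heads and 8 key/value heads of width 256) are assumptions based on the public model configuration, and all function names are illustrative.

```python
from typing import List


def temporal_reshape(frames: List[List[float]], factor: int) -> List[List[float]]:
    """Concatenate every `factor` consecutive frames into one wider frame."""
    if len(frames) % factor != 0:
        raise ValueError("frame count must be divisible by the stacking factor")
    stacked = []
    for start in range(0, len(frames), factor):
        merged: List[float] = []
        for frame in frames[start:start + factor]:
            merged.extend(frame)
        stacked.append(merged)
    return stacked


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Whisper emits 1500 encoder frames per 30 s window.
WHISPER_FRAMES = 1500
STACK_FACTOR = 4  # assumption: inferred from 1500 / 375

# A toy feature width of 8 keeps the demo light; the shape logic is the same
# at Whisper large-v3's real width of 1280.
demo_frames = [[float(i)] * 8 for i in range(WHISPER_FRAMES)]
tokens = temporal_reshape(demo_frames, STACK_FACTOR)
print(len(tokens), len(tokens[0]))  # 375 32

# Assumed Gemma 2 9B shapes (not stated in the text above).
HIDDEN, LAYERS = 3584, 42
Q_OUT = 16 * 256   # query heads x head width
KV_OUT = 8 * 256   # key/value heads x head width
RANK = 16

per_layer = (
    lora_params(HIDDEN, Q_OUT, RANK)     # q_proj
    + lora_params(HIDDEN, KV_OUT, RANK)  # k_proj
    + lora_params(HIDDEN, KV_OUT, RANK)  # v_proj
    + lora_params(Q_OUT, HIDDEN, RANK)   # o_proj
)
lora_total = per_layer * LAYERS
print(lora_total)  # 17891328
```

Under those assumed shapes, the LoRA weights come to roughly 17.9M parameters. The α value does not change the parameter count; it only scales the low-rank update by α/r, which is 2 for r=16 and α=32.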

Measured performance (May 2026)

Metric	Value	N
WER (normalized)	31.12%	200
WER (raw, case-sensitive)	32.77%	200
Inference latency (single L4 GPU, fp16)	~1.8s for 4s clip	—
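
For readers who want to see how a normalized and a raw WER can differ on the same transcript, here is a minimal sketch. The normalizer shown (lowercasing and punctuation stripping) is a hypothetical stand-in, not the normalization used for the figures above, and the example sentence is made up for illustration.

```python
import re
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, collapse spaces.

    Python's \\w is Unicode-aware, so hooked letters such as ƙ, ɗ, ɓ survive.
    """
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def wer(reference: str, hypothesis: str, normalized: bool = True) -> float:
    """Edits divided by the number of reference words."""
    if normalized:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


reference = "Ina son ƙara kuɗi."
hypothesis = "ina son kara kuɗi"  # ƙ transcribed as plain k

print(wer(reference, hypothesis, normalized=False))  # 0.75
print(wer(reference, hypothesis, normalized=True))   # 0.25
```

In the raw score, the capital letter and the trailing full stop each count as an error alongside the ƙ→k substitution; after normalization only the substitution remains. A corpus-level WER is usually computed by summing edits and reference words across all utterances rather than by averaging per-utterance scores.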

Known weaknesses

We are candid about the model's limits. Hausa is the second-hardest language in the v1 benchmark, behind Igbo, and the gap to English (10.9%) reflects:

Sparse high-quality Hausa training audio compared to English or Mandarin.
Limited speaker diversity in CommonVoice's Hausa partition (most contributors are male, aged 25–45, with a Kano accent).
Implosive and ejective consonants remain a systematic substitution error class; see our public benchmarks page for examples.

Roadmap

Maraba v2 (target Q3–Q4 2026) will add ~80 hours of newly recorded studio Hausa across four regional accents, with a target of under 20% WER. Code-switch coverage for Hausa↔English is in active training; see the Orinode Naija customer-call code-switch dataset on Hugging Face.

Get the model

Maraba v1 research weights are published at huggingface.co/Orinode. Production API access is in pilot; email Orinode to request an evaluation.

For the deployable voice agent built on this model, see Maraba — Hausa AI at maraba.ai.