Updates — Orinode | Nigerian Voice AI Progress

11 May 2026 Milestone Dataset

Maraba v1 Stage 3 complete · 23.2% avg WER · Code-switch corpus released.

Today marks two milestones on Orinode's roadmap.

1. Stage 3 LoRA training is complete. Maraba v1's multilingual speech-LLM (Whisper-large-v3 encoder + MLPReshapeAdapter + Gemma-2-9B-it decoder with LoRA r=16) finished its 20,000-step training run. We evaluated all 16 checkpoints on 200 samples for each of six language conditions (English, Hausa, Yoruba, Igbo, Nigerian Pidgin, and Yoruba–English code-switch). The best checkpoint (step 20,000) achieves 23.2% average WER:

Language	WER
Nigerian English	10.9%
Nigerian Pidgin	9.8%
Yoruba–English code-switch	18.0%
Yoruba	29.5%
Hausa	31.1%
Igbo	39.8%
Average	23.2%
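
For readers who want to check the arithmetic, here is a minimal sketch of how a per-utterance WER and the table's average can be computed. It assumes the standard definition of WER (word-level edit distance divided by reference length) and an unweighted mean over the six conditions; the post does not state which scoring tool or text normalisation was used, so the function name `word_error_rate` and the whitespace tokenisation are illustrative assumptions, not Orinode's evaluation code.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace with no further normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Per-condition WER (%) copied from the table above.
reported_wer = {
    "Nigerian English": 10.9,
    "Nigerian Pidgin": 9.8,
    "Yoruba-English code-switch": 18.0,
    "Yoruba": 29.5,
    "Hausa": 31.1,
    "Igbo": 39.8,
}

# Unweighted mean across the six conditions, rounded to one decimal place.
average_wer = round(mean(reported_wer.values()), 1)
print(f"Average WER: {average_wer}%")
```

The unweighted mean of the six rows reproduces the reported 23.2%. A pooled WER weighted by word count could differ slightly, since utterance lengths vary across languages.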

Igbo remains the bottleneck language, consistent with the scarcity of public Igbo speech data. We are actively expanding our Igbo training data to close this gap before the Maraba pilot.

2. The Naija Customer-Call Code-Switch Corpus is public. 15,000 hand-written customer-service sentences across Hausa–English, Igbo–English, and Yoruba–English, covering 30 business sectors, are now available under CC-BY 4.0 on Hugging Face:

huggingface.co/datasets/Orinode/naija-customer-call-code-switch

We are publishing the full corpus, including honest quality notes per language. The first 3,000 lines of each file are gold-quality; later sections cover broader vocabulary but show more pattern repetition. The corpus is released as-is so the community can filter for their own use case rather than wait for a sanitised subset.
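
One simple way to filter is to keep only the gold-quality portion of a file. The sketch below assumes a corpus file has already been downloaded locally as plain UTF-8 text with one sentence per line and no header row; the file layout, the function name `gold_subset`, and the example path are hypothetical, so check the dataset card for the actual format before relying on it.

```python
from itertools import islice
from pathlib import Path

# Per the quality notes above: the first 3,000 lines of each file are gold-quality.
GOLD_LINES = 3000


def gold_subset(path: Path, limit: int = GOLD_LINES) -> list[str]:
    """Return the first `limit` lines of a local text file, without newlines.

    Assumption: one sentence per line, no header row.
    """
    with path.open(encoding="utf-8") as handle:
        # islice stops reading after `limit` lines, so the rest of the file
        # is never loaded into memory.
        return [line.rstrip("\n") for line in islice(handle, limit)]


# Hypothetical usage; replace the path with a real downloaded file:
# sentences = gold_subset(Path("yoruba_english.txt"))
```

Reading lazily with `islice` keeps the filter cheap even if the later, more repetitive sections of a file are large.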

What's next. Stage 3 weights remain commercial; the eval protocol and per-checkpoint results are open. Next up: TTS layer build-out on CosyVoice 2 (Apache 2.0) with studio-quality Nigerian voice recordings beginning Q1 2026, and the Maraba pilot launch in Q3 2026. A methodology paper is targeted at Interspeech 2026.
