What is Orinode's business model?

Our initial revenue comes from selling API access and managed deployments of Maraba to enterprise clients in Nigeria — telecoms, banks, and healthcare providers. As our models mature, we will offer tailored voice AI infrastructure solutions, allowing companies to license our Nigerian speech processing layer for their own applications.

What is the word error rate for Hausa and Yoruba speech recognition?

The best public baselines put Whisper-large-v3 at roughly 38% WER on Hausa and 65% WER on Yoruba on naturalistic Nigerian speech; Google Cloud STT does not support Hausa or Yoruba as locales. Orinode v1, measured in May 2026, achieves 31.1% WER on Hausa and 29.5% on Yoruba, which is less than half the baseline error rate on Yoruba. Per-checkpoint results and eval scripts are at github.com/Orinode-ltd/orinode-lm.
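To make the comparison concrete, the figures quoted in this answer can be turned into relative error reductions. The sketch below is plain arithmetic on those numbers; the helper name `relative_reduction` is ours, and it assumes the baseline and Orinode figures are comparable enough to divide, so treat the ratios as indicative only.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error removed (0.5 means errors were halved)."""
    return (baseline_wer - new_wer) / baseline_wer

# WER percentages quoted in the answer above.
baseline = {"hausa": 38.0, "yoruba": 65.0}    # Whisper-large-v3 (approximate)
orinode_v1 = {"hausa": 31.1, "yoruba": 29.5}  # measured May 2026

for lang in baseline:
    reduction = relative_reduction(baseline[lang], orinode_v1[lang])
    ratio = baseline[lang] / orinode_v1[lang]
    print(f"{lang}: {reduction:.1%} relative reduction, {ratio:.2f}x lower WER")
```

On these numbers the Yoruba gain is a little over 2x, while the Hausa gain is more modest.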

Can Orinode's API handle code-switching between Yoruba, Hausa, and English?

Code-switching — the natural alternation between Nigerian English, Pidgin, Hausa, Yoruba, and Igbo within a single utterance — is Orinode's primary research focus. Unlike general-purpose ASR models that treat each language separately, our multi-stage training pipeline explicitly models code-switched speech patterns. The Maraba voice assistant is designed from the ground up for this use case, which is why it outperforms general models in Nigerian enterprise environments.

Is Orinode's NaijaSpeech dataset available for download?

The Naija Customer-Call Code-Switch Corpus — 15,000 hand-written customer-service sentences in Hausa–English, Igbo–English, and Yoruba–English code-switch across 30 business sectors — is publicly available under CC-BY 4.0 on Hugging Face: huggingface.co/datasets/Orinode/naija-customer-call-code-switch. Audio-paired releases and the full Maraba training corpus will follow. Researchers can request additional access at research@orinode.ai.

Nigerian Speech AI

Voice AI for the languages Nigerians speak natively.

Orinode trains automatic speech recognition (ASR) models for Hausa, Yoruba, Igbo, Nigerian Pidgin, and Nigerian English, including the code-switching that no global AI handles. Open research models and production API access for developers and enterprises.


Maraba · switching languages live Preview · Live Q3 2026


Data collected: 1,500+ hrs

Maraba pilot: Q3 2026

Orinode is a Nigerian artificial intelligence company founded in 2026 by Usman Abubakar Aliyu in Lagos, Nigeria. The company develops automatic speech recognition (ASR) and text-to-speech (TTS) models for Hausa, Yorùbá, Igbo, Nigerian Pidgin, and Nigerian English — including the natural code-switching that no global voice AI handles. Orinode is registered as Orinode Ltd (RC 9486856) and publishes its research models openly on Hugging Face.

230M+

Nigeria's 2025 population — with Hausa, Yoruba, Igbo, and Pidgin as the primary languages for the majority of citizens.

0

Publicly benchmarked production voice AI systems with published accuracy metrics on spontaneous Nigerian code-switching.

23.2%

Average WER measured on Orinode v1 across English, Hausa, Yoruba, Igbo, Pidgin and Yoruba–English code-switch (May 2026).

Our approach

Nigerian speech recognition infrastructure built from the ground up.

Global models fail on Nigerian speech. We don't just fine-tune them and hope for the best; we build each layer of the stack specifically for how Nigerians speak.

A Nigerian speech corpus at scale.

1,500+ hrs crowdsourced, consented for commercial + research

A fine-tuned Speech-LLM.

4-stage training on Whisper-large-v3 + Gemma, informed by MERaLiON and SeaLLM

A Nigerian voice.

Native-speaker studio recordings + CosyVoice 2 fine-tune for all 5 languages

Applications built on the infrastructure.

First one is Maraba, others via API

Maraba v1 architecture

How the model is built.

Maraba v1 is a parameter-efficient speech-LLM. A Whisper encoder bridges to a Gemma decoder through a learned adapter, with LoRA wrappers handling cross-lingual adaptation. Only 0.5% of total parameters are trained — the rest stays frozen, which keeps compute cost low and lets us iterate quickly on training curricula.
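The spec for Maraba v1 lists an adapter with scale factor 4 that turns 1500 encoder frames into 375 speech tokens. A minimal sketch of that length reduction follows, assuming the adapter stacks every 4 consecutive frames along the feature axis before a learned projection. The function name `stack_frames` and the toy feature size are ours; the actual layer layout of MLPReshapeAdapter is not described on this page.

```python
from typing import List

def stack_frames(frames: List[List[float]], scale: int) -> List[List[float]]:
    """Concatenate every `scale` consecutive frames into one longer vector.

    Trailing frames that do not fill a complete group are dropped.
    """
    usable = len(frames) - len(frames) % scale
    return [
        [x for frame in frames[i:i + scale] for x in frame]
        for i in range(0, usable, scale)
    ]

# Toy input: 1500 encoder frames with a small feature size (8) to keep it light.
# The real feature size is not needed to see the length reduction.
frames = [[float(t)] * 8 for t in range(1500)]
tokens = stack_frames(frames, scale=4)

print(len(tokens), len(tokens[0]))  # 375 32
```

In a real adapter, a learned MLP (omitted here) would then project each stacked vector to the decoder's embedding size, so the decoder sees a sequence four times shorter than the encoder output.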

Encoder

Whisper-large-v3 — 1.55B params, frozen after Stage 1 fine-tune

Adapter

MLPReshapeAdapter, scale factor 4 — 1500 frames → 375 speech tokens (~31M trainable)

Decoder

Gemma-2-9B-it — frozen base + LoRA on q_proj / k_proj / v_proj / o_proj

LoRA config

r=16, α=32, dropout=0.05

Trainable

49.4M parameters (~0.5% of total ~9.2B)

Training

20,000 steps · bf16 · effective batch 16 (batch 2 × grad accum 8)

Best ckpt

step 20,000 — 23.2% avg WER across 6 language conditions

Evaluation

Speaker-disjoint dev set, 200 samples × 6 languages = 1,200 generations/checkpoint

TTS layer

CosyVoice 2 (Apache 2.0) fine-tuned on Nigerian studio voices · 5 languages + native code-switching · Ships with Maraba pilot Q3 2026
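As a sanity check on the trainable-parameter figures in the spec above, the LoRA share can be estimated from the rank and the shapes of the adapted attention projections. The sketch below assumes the attention dimensions published in the public Gemma-2-9B config (hidden size 3584, 42 layers, 16 query heads, 8 key/value heads, head dimension 256); those values are not stated on this page, so treat the result as an estimate.

```python
# Assumed Gemma-2-9B attention shapes (from its public config, not from this page).
HIDDEN = 3584
LAYERS = 42
Q_HEADS, KV_HEADS, HEAD_DIM = 16, 8, 256
RANK = 16  # LoRA r from the spec above

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two low-rank matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)

q_out = Q_HEADS * HEAD_DIM    # 4096
kv_out = KV_HEADS * HEAD_DIM  # 2048

per_layer = (
    lora_params(HIDDEN, q_out, RANK)     # q_proj
    + lora_params(HIDDEN, kv_out, RANK)  # k_proj
    + lora_params(HIDDEN, kv_out, RANK)  # v_proj
    + lora_params(q_out, HIDDEN, RANK)   # o_proj
)
lora_total = per_layer * LAYERS

print(f"LoRA parameters: {lora_total / 1e6:.1f}M")
print(f"Stated trainable share: {49.4e6 / 9.2e9:.2%}")
```

Under these assumptions the LoRA weights come to about 17.9M, which together with the roughly 31M adapter lands close to the stated 49.4M trainable parameters.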

Developer preview

What a Maraba call looks like.

Maraba exposes a simple speech-to-text endpoint. Pass an audio file (or stream), get back a transcript with auto-detected language and code-switch segments. Full API access opens with the pilot in Q3 2026 — request early access.

from orinode import Maraba

maraba = Maraba(api_key="sk_orin_...")

result = maraba.transcribe(
    "customer_call.wav",
    language="auto",            # detects HA/YO/IG/PCM/EN + code-switch
    return_segments=True,
)

print(result.text)
# "Sannu, ina son in book table for two at 7pm"

print(result.detected_languages)
# ["ha", "en"]  — code-switching detected

for seg in result.segments:
    print(f"{seg.lang}: {seg.text}")
# ha: Sannu, ina son in
# en: book table for two at 7pm

SDK snippets above are illustrative of the Q3 2026 release interface. Today, evaluation scripts and the per-checkpoint WER protocol are available at github.com/Orinode-ltd/orinode-lm.
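Until the SDK ships, downstream logic can be prototyped against a local stand-in for the response. The sketch below defines a hypothetical `Segment` dataclass whose `lang` and `text` fields mirror the illustrative snippet, and computes how much of an utterance falls in each language. Nothing here calls the Maraba API, and the real response type may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Segment:
    """Hypothetical stand-in for one code-switch segment (fields mirror the snippet above)."""
    lang: str
    text: str

def language_share(segments: List[Segment]) -> Dict[str, float]:
    """Fraction of words per language, counting whitespace-separated tokens."""
    counts: Counter = Counter()
    for seg in segments:
        counts[seg.lang] += len(seg.text.split())
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()} if total else {}

# Segments copied from the illustrative output above.
segments = [
    Segment(lang="ha", text="Sannu, ina son in"),
    Segment(lang="en", text="book table for two at 7pm"),
]
print(language_share(segments))  # {'ha': 0.4, 'en': 0.6}
```

A share like this could drive routing decisions, for example sending mostly-Hausa calls to a Hausa-speaking agent.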

Performance

Why existing systems fall short on Nigerian speech.

Word Error Rate (WER) is the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript. Lower is better. Current leading systems were not trained on Nigerian accents, code-switching, or indigenous languages. These gaps are what Orinode is built to close.
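For readers new to the metric, here is a minimal sketch of how WER is typically computed: a word-level edit distance divided by the reference length. The hypothesis sentence is a made-up mis-transcription for illustration, and text normalization (casing, punctuation, tone marks) is left out; the eval scripts in the repository define the actual protocol.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)

reference = "Sannu, ina son in book table for two at 7pm"
# Invented hypothesis: one substitution (son -> so) and one insertion (a).
hypothesis = "Sannu, ina so in book a table for two at 7pm"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # 20.0%
```

Because insertions count as errors, WER can exceed 100% when a model produces far more words than the reference contains.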

System	Nigerian English	Yoruba	Hausa	Code-switching
Whisper-large-v3 ^¹	~22%	~65%	~38%	Not supported
Google Cloud STT ^²	~31%	Not supported	Not supported	Not supported
Orinode v1 (measured · May 2026)	10.9%	29.5%	31.1%	18.0% (Yoruba–EN CS)

Orinode v1 results measured on speaker-disjoint dev set, 200 samples per language. Best checkpoint (step 20,000) achieves 23.2% average WER across English, Hausa, Yoruba, Igbo, Nigerian Pidgin, and Yoruba–English code-switch. Per-checkpoint results and eval code at github.com/Orinode-ltd/orinode-lm.

¹ Olatunji et al., "AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR," TACL 2023. — aclanthology.org
² Google Cloud Speech-to-Text evaluated on Nigerian English (en-NG). Yoruba and Hausa not available as supported locales in Google Cloud STT as of 2025. — cloud.google.com
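The results note above says evaluation uses a speaker-disjoint dev set. A minimal sketch of what that means in practice: whole speakers, not individual utterances, are assigned to one side of the split. The record layout (`speaker`, `path`) and the function name are hypothetical; the actual split procedure lives in the project's eval code.

```python
import random
from typing import Dict, List, Tuple

Utterance = Dict[str, str]  # hypothetical record: {"speaker": ..., "path": ...}

def speaker_disjoint_split(
    utterances: List[Utterance], dev_fraction: float, seed: int = 0
) -> Tuple[List[Utterance], List[Utterance]]:
    """Assign whole speakers to train or dev so no speaker appears in both."""
    speakers = sorted({u["speaker"] for u in utterances})
    random.Random(seed).shuffle(speakers)
    n_dev = max(1, round(len(speakers) * dev_fraction))
    dev_speakers = set(speakers[:n_dev])
    train = [u for u in utterances if u["speaker"] not in dev_speakers]
    dev = [u for u in utterances if u["speaker"] in dev_speakers]
    return train, dev

# Toy data: 10 speakers with 5 utterances each.
data = [
    {"speaker": f"spk{s:02d}", "path": f"spk{s:02d}_{i}.wav"}
    for s in range(10)
    for i in range(5)
]
train, dev = speaker_disjoint_split(data, dev_fraction=0.2)
overlap = {u["speaker"] for u in train} & {u["speaker"] for u in dev}
print(len(train), len(dev), overlap)  # 40 10 set()
```

Splitting by speaker keeps the model from being scored on voices it has already heard, which would otherwise flatter the WER.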

Open release schedule

Models & artifacts

Artifact	Description	License	Status
Orinode-Whisper v1	Nigerian-multilingual Whisper encoder — Stage 1 fine-tune	MIT	Trained · Publishing Q3 2026
Naija Customer-Call Code-Switch Corpus	15,000 hand-written sentences (HA-EN, IG-EN, YO-EN), 30 sectors	CC-BY 4.0	Available · Hugging Face
Orinode Speech-LLM v1 (Maraba v1)	Multilingual ASR across EN/HA/YO/IG/PCM + Yoruba–EN code-switch	Eval public · Weights commercial	Trained · 23.2% avg WER
Maraba (product)	Voice assistant for Nigerian businesses	Commercial	Q3 2026 pilot
Maraba Voices (TTS)	Multilingual TTS for EN, HA, YO, IG, PCM — CosyVoice 2 base + Nigerian studio voices	Commercial · Apache 2.0 base	Studio recording Q1–Q2 2026 · Pilot Q3 2026
Research paper	Code-switching Speech-LLM methodology — Interspeech 2026 target	arXiv	Drafting
Training & filtering code	Data pipeline, training recipes	Apache 2.0	Available · GitHub

Language coverage

Five core languages — with more on the way.

ENG · English
Nigerian accent · Code-switch base
"Good morning, how may I help you?"
ASR · TTS · Phase 1 · In development

HAU · Hausa
85M+ speakers · Northern Nigeria
"Ina kwana, yaya zan taimake ka?"
ASR · TTS · Phase 1 · In development

YOR · Yoruba
50M+ speakers · Southwestern Nigeria
"Ẹ káàrọ̀, báwo ni mo ṣe lè ràn yín lọ́wọ́?"
ASR · TTS · Phase 1 · In development

IBO · Igbo
45M+ speakers · Southeastern Nigeria
"Ụtụtụ ọma, kedụ ka m ga-esi nyere gị aka?"
ASR · TTS · Phase 1 · In development

PCM · Nigerian Pidgin
100M+ speakers · Nationwide · Code-switch heavy
"How you dey? Wetin I fit help you do?"
ASR · Phase 1 · In development

Nigeria has over 500 languages across its 230M+ population. Phase 1 focuses on the five languages spoken by the largest share of Nigerians — English, Hausa, Yoruba, Igbo, and Nigerian Pidgin. Additional languages including Fulfulde, Kanuri, Tiv, Efik, and others are planned for subsequent phases.

Why these are hard

Phonetic challenges global models miss.

YORUBA

Lexical tone (high / mid / low) changes meaning — ọkọ́ (hoe), ọkọ̀ (vehicle), ọkọ (husband). General STT collapses these into one word.

HAUSA

Glottal stop, implosive consonants (ɓ, ɗ), and ejective consonants (k', ts') — phonemes absent from English-centric speech models, so they get hallucinated as near neighbors.

IGBO

Downstep tone (a third tonal level mid-utterance) plus nasalized vowels. Tonal compression is a frequent error in non-Igbo-trained models.

NIGERIAN PIDGIN

Lexicon shift (wahala, chop, na) plus substrate-language prosody. Treated as broken English by general models; transcribed nonsensically.

NIGERIAN ENGLISH

Distinct vowel space, rhoticity differences, and code-switching with all four other languages. General “English” models default to American or British speaker assumptions.

CODE-SWITCHING

Mid-utterance language transitions are the dominant pattern in Nigerian speech. No global speech AI handles this natively; cascade systems break, single-language models silently mistranscribe.
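The Yoruba example above also shows why text handling matters: any pipeline that drops tone diacritics makes three different words identical. The sketch below demonstrates that collapse with Python's standard `unicodedata` module. It illustrates the effect only and is not a description of how any particular STT system works; the helper name `strip_tone_marks` is ours.

```python
import unicodedata

TONE_MARKS = {"\u0300", "\u0301"}  # combining grave (low tone) and acute (high tone)

def strip_tone_marks(word: str) -> str:
    """Remove tone diacritics but keep other marks such as the dot below."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

# The three Yoruba words from the card above, written with explicit escapes.
words = {
    "hoe": "\u1ecdk\u1ecd\u0301",      # final vowel carries a high tone
    "vehicle": "\u1ecdk\u1ecd\u0300",  # final vowel carries a low tone
    "husband": "\u1ecdk\u1ecd",        # mid tone is unmarked
}
collapsed = {meaning: strip_tone_marks(w) for meaning, w in words.items()}
print(len(set(words.values())), "distinct spellings ->", len(set(collapsed.values())))
```

With tone marks removed, all three meanings share one spelling, so a transcript scored without tone marks cannot distinguish them.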

Product

Our first application: Maraba.

Maraba is a production-ready voice assistant built on Orinode's proprietary speech infrastructure. It is one deployment pattern that demonstrates our capability; the underlying API will be available for broader integrations.

Health clinics & telemedicine

Automate patient intake, appointment scheduling, and basic triage in the patient's preferred language.

Fintechs & financial services

Resolve balance inquiries, disputes, loan queries, and fraud reports securely via voice — including IVR automation in Nigerian languages.

E-commerce & logistics

Provide 24/7 support for order status updates, delivery rescheduling, and returns processing.

Service businesses & SMEs

Manage bookings, offer quotes, and confirm hours. Priced separately for local retailers, salons, and tutors.

Request a pilot

Principles

Open research commitments

We believe foundational infrastructure for low-resource languages must be community-aligned. Orinode pursues a tiered release strategy: our base multilingual models and acoustic features will be open-sourced via Hugging Face, while the specialized product weights powering interactive agents remain strictly commercial.

Data dignity is critical to our process. All speakers contributing to our foundational corpus provide explicit, informed consent for both research and commercial use. We also compensate contributors at above-market rates, so that the economic value created by African linguistic data benefits African contributors directly.

Aligned with the broader African NLP ecosystem, we actively participate in and learn from communities like Masakhane. We plan to release dataset samples for academic vetting and adhere to our transparent ethics framework focused on preventing harm in automated decision-making contexts.

Partners & collaborators

Building with the ecosystem, not around it.

Orinode operates at the intersection of research and industry in Nigeria. We are actively formalizing partnerships with academic institutions, pilot businesses, and supporting programs as we move toward the Maraba pilot launch.

Research

Conversations underway with Nigerian university departments. Contact [email protected] to discuss collaboration.

Pilot partners

Fintech and healthcare organizations in early evaluation. Names disclosed under NDA on request.

Supported by

Compute and tooling support program applications in progress.

Research and partnership inquiries: [email protected] · [email protected]

Building Orinode from Nigeria.

Usman Abubakar Aliyu

Founder & Principal Researcher

I'm studying Entrepreneurship at the British University in Egypt and have spent the last several years as a full-stack engineer, building backend systems and customer platforms for startups across Nigeria and international clients in the UK and Canada.

I started Orinode because there's nothing built for us — Nigerian voices, our code-switching, our accents are an afterthought for every global voice AI system. I'm self-taught in machine learning on top of formal training in product and business, and I'm building Orinode because if Africans don't build this infrastructure for our own languages, no one will.

Common questions

From investors, partners, and researchers.

How is this different from Google/Microsoft/OpenAI's voice AI?

Global players build generalized models primarily trained on vast amounts of Western data. When applied to Nigerian speech, they fail catastrophically due to heavy accent variations, distinct phonetics, and most importantly, code-switching (rapidly swapping between English, Pidgin, and local languages mid-sentence). Orinode builds targeted infrastructure resolving this specific linguistic challenge. We don't aim for global breadth; we focus on precision for the 230 million voices current infrastructure ignores.

Are you using open-source models or training from scratch?

We use a hybrid approach. We do not train from random initialization; we build on open-weight base models, using Whisper-large-v3 for feature extraction and Gemma for language reasoning. Standard fine-tuning alone is insufficient for code-switched alignment, so our multi-stage training pipeline adapts these base models on our proprietary corpus, similar to the methodology behind region-specific models such as MERaLiON in Singapore.

What's your data collection strategy, and how do you handle consent?

Our competitive advantage is our data pipeline. We collect conversational, highly code-switched audio across diverse demographic boundaries in Nigeria. Every participant signs an explicit release allowing both academic research and commercial utilization. Contributors are paid significantly above local averages. We believe ethical AI demands that the people whose voices power the models are compensated fairly and transparently.

How will you release your models and data?

Following in the footsteps of groups like AfriSpeech and standard practices shared on Hugging Face, our base acoustic models (ASR components) and smaller data subsets will be open-sourced under permissive licenses (MIT / CC-BY 4.0). The end-to-end commercial system powering enterprise apps, Maraba, along with the full proprietary corpus, remains closed to sustain our business and support continued research operations.

Who are you working with?

We are currently in early conversations with Nigerian fintech and telemedicine organizations who require robust vernacular support for their customer service deployments. We are also establishing formal research collaborations with Nigerian university departments in computer science and linguistics to validate our phonetic classifications and ensure our NLP approaches align with rigorous academic standards.

What's the timeline?

As of May 2026: Stage 1 (Whisper encoder fine-tune) and Stage 3 (multilingual LoRA adaptation) are complete. Orinode v1 achieves 23.2% average WER across six conditions — English (10.9%), Hausa (31.1%), Yoruba (29.5%), Igbo (39.8%), Nigerian Pidgin (9.8%), and Yoruba–English code-switch (18.0%). The 15,000-sentence hand-written customer-service code-switch corpus is publicly released on Hugging Face. Maraba pilot launches Q3 2026; the methodology paper targets Interspeech 2026.
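The 23.2% headline figure can be reproduced from the six per-condition numbers in this answer. The sketch below assumes it is an unweighted mean of the per-condition WERs, which matches the quoted value; a WER pooled over all words could differ.

```python
# Per-condition WER (%) as listed in the timeline answer above.
wer = {
    "english": 10.9,
    "hausa": 31.1,
    "yoruba": 29.5,
    "igbo": 39.8,
    "pidgin": 9.8,
    "yoruba_english_cs": 18.0,
}

macro_avg = sum(wer.values()) / len(wer)
hardest = max(wer, key=wer.get)

print(f"average: {macro_avg:.1f}%")               # average: 23.2%
print(f"hardest condition: {hardest} ({wer[hardest]}%)")
```

The average hides a wide spread: Pidgin and English sit near 10%, while Igbo is close to 40%.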

What's the business model?

As an applied research company, our initial revenue comes from selling API access and managed deployments of Maraba to enterprise clients in Nigeria (telecoms, banks, healthcare). As our models mature, we will offer tailored infrastructure solutions — allowing multinational companies to license our voice-processing layer to localize their own global applications for the West African market.

Will Maraba speak Nigerian languages, not just understand them?

Yes. Maraba's text-to-speech (TTS) layer is built on CosyVoice 2 (Apache 2.0) and fine-tuned with studio-quality recordings from native voice actors we commission across English, Hausa, Yoruba, Igbo, and Nigerian Pidgin. The voices are owned by Orinode under perpetual commercial license. Voice recording is scheduled for Q1–Q2 2026 and the multilingual TTS layer ships with the Maraba pilot in Q3 2026, including code-switching support for at least Yoruba–English.

How does Orinode relate to Masakhane and African NLP research?

We are deeply inspired by and continuously learn from grassroots research networks like Masakhane. While Masakhane focuses on broad community-driven open science, Orinode serves as a commercially viable, infrastructure-focused entity. We bridge the gap between open academic breakthroughs and the stringent reliability requirements of enterprise production systems, validating that African NLP can drive sustainable commercial value.


Let's talk.

Whether you're building a product, conducting research, or exploring a partnership — we'd love to hear from you.

General

[email protected]

Research & Partnerships

[email protected]
[email protected]

Location

Lagos, Nigeria · Response within 2 business days

