Principles & practices

AI Ethics Framework for Nigerian Voice AI Data Collection

Version 1.0 · April 2026 · Orinode Ltd · Lagos, Nigeria

Why this document exists

Orinode builds voice AI infrastructure for Nigerian languages. Our models are trained on the speech of real people. Our products are deployed in contexts — healthcare, finance, telecommunications — where AI errors carry real consequences.

This document states plainly how we operate: how we collect data, what contributors are entitled to, how we measure the fairness of what we build, and what we commit to doing when things go wrong. It is written for grant reviewers, research partners, pilot customers, and the communities whose languages we work with.

We do not believe that a framework document substitutes for practice. Every commitment here is either already in operation or has a named deadline. We will update this document when practices change.

1. Data collection and consent

All speakers contributing audio to our training corpus go through an explicit, informed consent process before recording begins. The consent form is available in English, Hausa, Yoruba, Igbo, and Nigerian Pidgin.

The consent form explains in plain language:

That recordings will be used to train AI models for both research and commercial purposes
That audio clips may be shared publicly in anonymized dataset releases under CC-BY or CC-BY-NC licensing
That Orinode Ltd holds the rights to use recordings for the purposes described
That contributors can withdraw consent for future recordings at any time; recordings already incorporated into a published model release cannot be retroactively removed because of technical constraints, and the form says so explicitly
What the compensation rate is, stated before recording begins, not after

We do not use deceptive framing: participants are never recruited with misleading descriptions of how their recordings will be used. We do not record in contexts where participants could reasonably feel pressured to take part (e.g., employer-organized group sessions without individual opt-out).

2. Contributor compensation

We pay contributors at above-market rates for the Nigerian gig economy. Our current target is 1.5× the prevailing hourly rate for comparable transcription or voice annotation work on platforms operating in Nigeria.

Current rate: ₦3,500–₦5,000 per hour of completed recordings, depending on language and session complexity. This will be updated as market rates change.

Payment is processed within 5 business days of session completion. We do not withhold payment pending quality review — contributors are paid for their time, not penalized post-hoc for audio that doesn't meet technical criteria.

Our position: the economic value created by African linguistic data should benefit African contributors directly and immediately, not only as an externality of a company's eventual commercial success.

3. Data retention and deletion rights

Contributors have the right to request deletion of their recordings from our corpus at any time, subject to the technical constraint noted in Section 1: recordings already incorporated into published model weights cannot be fully removed from those weights without retraining.

To make a deletion request, email [email protected] with the subject line "Data deletion request" and include the session date and language. We will acknowledge the request within 5 business days and act on it within 30 days.
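The two response windows above can be tracked with a small date calculation. The sketch below is illustrative only and rests on assumptions not stated in this document: business days are Monday to Friday with no public-holiday calendar, the 30-day window is counted in calendar days from the date the request is received, and the function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri only)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def deletion_deadlines(received: date) -> dict[str, date]:
    """Compute both deadlines for a deletion request received on `received`.

    Assumption: the 30-day action window is calendar days from receipt.
    """
    return {
        "acknowledge_by": add_business_days(received, 5),
        "action_by": received + timedelta(days=30),
    }
```

For a request received on Monday 6 April 2026, this gives an acknowledgement deadline of 13 April 2026 and an action deadline of 6 May 2026.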

We do not sell or share individual contributor data with third parties. Aggregated, anonymized dataset releases do not include personally identifiable information (names, phone numbers, location below state level).
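One way to apply the rule above when preparing release metadata is an allowlist: only fields known to be safe are copied into the public record, so a newly added internal field is excluded by default. This is a minimal sketch, not a description of our pipeline; every field name in it is hypothetical.

```python
# Hypothetical field names: the real corpus schema is not described here.
# Only these fields may appear in a public dataset release.
RELEASE_FIELDS = frozenset({"clip_id", "language", "gender", "state"})


def to_release_record(record: dict) -> dict:
    """Copy only allowlisted fields, dropping names, phone numbers,
    and any location more specific than state."""
    return {key: value for key, value in record.items() if key in RELEASE_FIELDS}
```

An allowlist is used here rather than a blocklist of known identifiers because a blocklist silently passes any identifying field that nobody remembered to list.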

Our data handling practices are aligned with the Nigeria Data Protection Regulation 2019 (NDPR) and its successor, the Nigeria Data Protection Act 2023 (NDPA). We do not currently operate in the European Economic Area and have not conducted a GDPR assessment, but we will do so before any EU expansion.

4. Bias testing and measurement

AI systems can encode and amplify bias. In voice AI for Nigerian languages, the most significant risks are:

Accent and speaker bias: higher error rates for speakers from particular regions (e.g., Northern vs. Southern Nigerian English) or for female speakers compared with male speakers
Language bias: systematically better performance on Nigerian English than on Yoruba, Hausa, Igbo, or Pidgin — reflecting data imbalance rather than true capability
Domain bias: better performance on formal speech than on conversational, code-switched speech — which is how most Nigerians actually speak

Our evaluation methodology addresses all three:

Speaker-stratified evaluation. Our held-out test sets are stratified by speaker gender, region of origin, and dominant language. We report Word Error Rate (WER) and Character Error Rate (CER) broken down by each stratum, not only as a single aggregate figure. An aggregate WER that masks 3× worse performance on female speakers is not an acceptable result. See our public benchmark table for current baseline comparisons.
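To make the stratified reporting concrete, the sketch below computes corpus-level WER per stratum (total word edits divided by total reference words) and the ratio between the worst and best strata. It is a simplified illustration under assumptions: the sample layout (`reference`, `hypothesis`, and a stratum field such as `gender`) and the function names are hypothetical, whitespace tokenization stands in for real text normalization, and it is not our evaluation code.

```python
from collections import defaultdict


def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # substitution or exact match
            ))
        previous = current
    return previous[-1]


def wer_by_stratum(samples: list[dict], key: str) -> dict[str, float]:
    """Corpus-level WER for each value of `key` (e.g. 'gender' or 'region')."""
    edits: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for sample in samples:
        ref = sample["reference"].split()
        hyp = sample["hypothesis"].split()
        edits[sample[key]] += word_edit_distance(ref, hyp)
        words[sample[key]] += len(ref)
    # Strata with no reference words are skipped to avoid division by zero.
    return {s: edits[s] / words[s] for s in edits if words[s] > 0}


def disparity_ratio(wer: dict[str, float]) -> float:
    """Worst stratum WER divided by best stratum WER."""
    best, worst = min(wer.values()), max(wer.values())
    if worst == best:
        return 1.0
    return float("inf") if best == 0 else worst / best
```

Passing a different `key` (for example a hypothetical `region` or `dominant_language` field) gives the breakdown for that stratum, and the same function can be run separately on a code-switched test set. A disparity ratio near 3 would correspond to the unacceptable case described above.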

Code-switching evaluation. We maintain a dedicated code-switched test set drawn from real conversational speech. Performance on this set is reported separately from monolingual performance.

Adversarial probing. Before any production deployment, we test the system with inputs specifically designed to elicit failure modes: fast speech, heavy noise, strong regional accents, and multi-party crosstalk. We document failure modes publicly alongside performance claims.

Publication commitment. We commit to publishing our full evaluation results — including results where our system underperforms — in our model release documentation. We will not cherry-pick metrics or restrict evaluation to favorable test conditions.

5. Healthcare and financial services deployment

Our primary pilot sectors — healthcare and financial services — carry elevated risk for AI errors. We apply additional constraints for these deployments:

Human escalation is always available. Maraba is configured to route any caller to a human agent on explicit request, at any point in the conversation, with no penalty or delay. This is not optional and cannot be disabled by business customers.
Maraba does not make clinical or financial decisions. Maraba schedules appointments, collects information, and routes calls. It does not diagnose, prescribe, approve loans, process transactions above configured limits, or provide financial advice. These boundaries are enforced at the system level, not solely by prompt engineering.
Sensitive data minimization. Maraba does not store complete call transcripts by default. Transcript retention, if enabled by business customers, is subject to their own NDPA compliance obligations. Orinode's infrastructure retains only the data necessary for system operation and debugging.
Pilot monitoring. All pilot deployments operate with Orinode technical staff monitoring call outcomes during the first 30 days. We review a random sample of calls weekly for the first 3 months of any deployment.
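As an illustration of what enforcing boundaries at the system level can mean, the sketch below places an authorization check between the language model and any action it proposes. The model's output is never trusted to police itself, and human escalation is approved before customer settings are even consulted. All action names and the function signature are hypothetical; this is not Maraba's actual implementation.

```python
# Hypothetical action names: Maraba's real action set is not described here.
ALLOWED_ACTIONS = frozenset({"schedule_appointment", "collect_information", "route_call"})
ESCALATE = "escalate_to_human"


def authorize(action: str, customer_disabled: frozenset[str] = frozenset()) -> bool:
    """Decide whether a model-proposed action may run.

    Escalation is checked first, so no customer setting can switch it off.
    Anything outside the fixed allowlist (e.g. a loan approval) is refused
    regardless of what the model or the prompt says.
    """
    if action == ESCALATE:
        return True
    return action in ALLOWED_ACTIONS and action not in customer_disabled
```

In this design a business customer can narrow the set of permitted actions but cannot widen it, and cannot remove the escalation path.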

6. Open accountability

We commit to transparency on failures, not only successes:

If we discover a systematic bias in a released model, we will disclose it publicly in our GitHub repository and on this page within 30 days of discovery, alongside a remediation timeline
If a production deployment produces a documented harmful outcome, we will report it in a public incident log on this page
This ethics framework will be updated at least annually. Material changes will be communicated to active research partners and pilot customers directly
Version history for this document is available at github.com/orinode/orinode-lm

We do not expect to get everything right. We expect to be honest when we don't.

7. Contact

For questions about this framework, data rights requests, or to report a concern:

Data rights: [email protected]
Research ethics: [email protected]
General: [email protected]

Orinode Ltd · Lagos, Nigeria