Our Datasets and AI Tools
Build more accurate, inclusive language technology with expertly curated African language datasets and enterprise-grade AI tools.
Available Datasets
| Dataset Name | Description | Curation Method | Records Curated | Languages | Data Type | Validation | Access | Action | Curation Year |
|---|---|---|---|---|---|---|---|---|---|
| XNLI | Cross-lingual natural language inference for reasoning tasks | Adapted | 22,555 | Igbo, Kinyarwanda, Kikuyu, Luo, Yoruba, Hausa, Nigerian Pidgin | Non-Parallel Text | Human QA | Public | Direct Access | 2025 |
| KKD Parallel Corpora | Kiswahili-African language parallel text for machine translation | Adapted | 25,000 | Kiswahili ↔ English, Kidaw'ida, Kalenjin and Dholuo | Parallel Text - MT | Human QA | Public | Direct Access | 2025 |
| MRL-Benchmark | Commonsense reasoning benchmarking dataset for LLMs | Collaborated | 400 | Nigerian Pidgin, Yoruba | Non-Parallel Text | Human QA | Public | Direct Access | 2025 |
AI-Powered Language Tools
Language Data Translation Validation Tool
Automatically validate translation accuracy and cultural appropriateness at scale.
Have a question or want to get started? Reach out to our team.
For technical inquiries or data access requests, please contact us at services@tonative.org.