Our Datasets and AI Tools

Build better language models with expertly curated African language datasets and enterprise-grade AI tools for more accurate, inclusive language technology.

Available Datasets

| Dataset Name | Description | Curation Method | Records Curated | Languages | Data Type | Validation | Access | Action | Curation Year |
|---|---|---|---|---|---|---|---|---|---|
| XNLI | Cross-lingual natural language inference for reasoning tasks | Adapted | 22,555 | Igbo, Kinyarwanda, Kikuyu, Luo, Yoruba, Hausa, Nigerian Pidgin | Non-Parallel Text | Human QA | Public | Direct Access | 2025 |
| KKD Parallel Corpora | Kiswahili-African language parallel text for machine translation | Adapted | 25,000 | Kiswahili ↔ English, Kidaw'ida, Kalenjin and Dholuo | Parallel Text - MT | Human QA | Public | Direct Access | 2025 |
| MRL-Benchmark | Commonsense reasoning benchmarking dataset for LLMs | Collaborated | 400 | Nigerian Pidgin, Yoruba | Non-Parallel Text | Human QA | Public | Direct Access | 2025 |

AI-Powered Language Tools

Language Data Translation Validation Tool

Automatically validate translation accuracy and cultural appropriateness at scale.

Access Custom Language Tools

Have a question or want to get started? Reach out to our team. For technical inquiries or data access requests, please contact us at services@tonative.org.