
The Language Digitization Initiative is solutions-oriented and focuses on actionable deliverables.

The goal of Translation Commons’s Language Digitization Initiative (LDI) is to provide a vast array of resources such as toolkits, pilot studies, guidelines, and training to indigenous communities in order to improve their access to information.

Zero to Digital Guidelines

High-level overview and roadmap: evaluating the digital status of your language; the remaining steps and workflow needed for digitization of your language; and a glossary of useful terms.

Zero to Digital: A Guide to Bring Your Language Online

Spanish    French     Russian     Chinese     Arabic 

Zero to Digital: Terminology Guidelines

Spanish   French      Russian    Simplified Chinese

Instructions and workflow for documenting words, abbreviations, and phrases in your language relating to the various areas of specialized activity in your community. How to coin new words in your language as needed for the process of digitization.

Zero to Digital: Language Data Gathering Guidelines

Spanish (Download)    Russian (Download)    Chinese (Download)    Arabic (Download)

Guide to the kinds of existing language materials (print, audio, video, and otherwise) to be collected and preserved in order to support the digitization of your language. Explanation of the pairing of language materials with the various digital applications that require them.

Benefits of Digitization of a Language

(English | Simplified Chinese | Arabic | Spanish | French | Russian)

Language Documentation 

Guides for documenting and recording your language. TC recommends two excellent resources:

The “Language Gathering and Collection Guide” created by Benjamin Chung of First Peoples’ Cultural Council guides you through the steps of “eliciting” or collecting words, sentences, conversations, stories, and more in your language. It provides user-friendly instructions on how to get started, some basic linguistic information to be aware of, ways to enhance accuracy, and different methods of collecting data appropriate for stories, conversations, word lists, and what he calls “Rapid Word Collection” and “Group Recordings”. This well-structured website is very easy to use and includes video clips, graphics for prompts, sample word lists, photos, audio clips, testimonials, and written step-by-step instructions. It is a very robust treatment of language documentation for the non-linguist, and will be useful to beginners and language advocates with no prior training as well as those with experience in the field.

“Basic Oral Language Documentation” written by Will Reiman and hosted by ScholarSpace of the University of Hawai’i explains an oral-based approach to documenting your language. Many languages are primarily spoken and do not yet have a written tradition. In a nutshell, this approach uses three main steps without needing to wait for writing systems and spelling conventions to be established: “compile a sample of recordings of a full range of speech event types; comment on those recordings [also as audio recordings]; and archive the complete corpus of recordings with an institution that will provide long-term access.” Reiman explains the process, the equipment needed, different set-ups you can use, and the benefits of this method. The webpage provides a PDF of Reiman’s lecture slides with notes as well as an MP3 file with a recording of his lecture explaining the method.

Indigenous Interpreting Training Manual

Indigenous Interpreting Training Workbook

TC Videos and Slide Presentations

Building Bridges for Digitization of Indigenous Languages by Jeannette Stewart (video)

An introduction to font issues for language support by Gerry Leonidas (video)

Text, Keyboard, Font, Essentials for your language on the internet by Craig Cornelius (video)

The Script Encoding Initiative and Unicode by Deborah Anderson (slides)

Accessing And Understanding Contents In Portuguese By Foreigners In Scientific Digital Libraries: Can This Methodology Be Generalized To Any Language And Script?  by Claudio Menezes (slides)

Language Documentation: A Brief Overview by Anna Belew (slides)

Terminology Management for Indigenous Languages by Sue Ellen Wright (video)

Terminology Management for Indigenous Languages by Sue Ellen Wright (slides)

IYIL 2019 Translation Commons Projects, TC Advisors (video)

Machine Translation for Indigenous Communities, by Kirti Vashee (video)

Discussion with UNESCO on IDIL (video)

Indigenous Interpreters in Mexico, by Alexandra Hernandez Leon and Hector Santaella Barrera (video)

Modern NLP changes the requirements for building automatic Translation Systems, by Antonis Anastasopoulos (video)

Software Internationalization Principles and Strategies, by Tex Texin (video)

Bringing Newly Invented Scripts of South Asia into the Digital World, by Anshuman Pandey (video)

How Unicode Characters Become Glyphs on Your Screen, by Christopher Chapman (video)

Making Optical Character Recognition Systems for Telugu, by Atul Negi (video)

Zero to Digital Series of Guidelines for Indigenous Communities, by Sue Ellen Wright (video)

Accelerating Support for Indigenous Languages in Digital Systems, by Jeannette Stewart and Craig Cornelius (video)

Digitization Solutions for Indigenous Languages, by Julie Anderson and Craig Cornelius (video)

Indigenous Languages Concerns of Identity and having Independent Script The case of India, by Siva Prasad Rambhatla (video)

Mukurtu CMS, by Michael Wynne (video)

Indigenous Community presentations

Principal Chief of Cherokee Nation Chuck Hoskin Jr. Address in UNESCO NA consultation (video)

Digitizing Koits Sunuwar, by Dev Kumar (video)

Sunuwar Digitization, Interview with Dev Kumar, by Julie Anderson (video)

Digitizing the Chakma Language, by Bivuti Chakma (video)

Chakma Digitization by Bivuti (video)

Sora Sompeng by Sony Salma (slides)

Children’s story in Mehri by Janet Watson (slides)

Selim and his shadow in Mehri by Janet Watson (slides)

Saami people and Climate Change, Klemetti Näkkäläjärvi lecture part 1 (video)

Saami people and Climate Change, Klemetti Näkkäläjärvi lecture part 2 (video)

UN – UNESCO Documents

Free, Prior and Informed Consent

The Indigenous World 2020

UN Declaration on the Rights of Indigenous Peoples

UNESCO Policy on Engaging with Indigenous Peoples

UNESCO’s Engagement with Indigenous Peoples

The Sustainable Development Goals Report 2019

UNESCO’s Internet Universality Indicators

The publication of Zero to Digital: A Guide to Bring Your Language Online was an important milestone on our journey to prevent and reverse global language extinction. But much work remains to ensure human rights and connectivity for all native languages. We call on localization and internationalization experts, language translation professionals, language community advocates, linguists, font designers, and marketing experts to join our mission of breaking down language and communication barriers!

In Progress

Machine Translation Guidelines

Repository Guidelines

Certification Guidelines

Mentoring Guidelines

Internship Guidelines

Zero to Digital: Bringing a Language Online, pilot with the Sunuwar language

Morphologizer Template and Pilot with Cherokee Nation

Future Projects

Language Technology Workshop

Database mapping the digitization status of all languages

Educational online teacher materials for sciences

Indigenous Translator Training
