The Language Digitization Initiative is solutions-oriented and focuses on actionable deliverables.
The goal of Translation Commons’s Language Digitization Initiative (LDI) is to provide a vast array of resources, such as toolkits, pilot studies, guidelines, and training, to indigenous communities in order to improve their access to information.
Zero to Digital Guidelines
Zero to Digital: A Guide to Bring Your Language Online
(Spanish | French | Russian | Chinese | Arabic)
High-level overview and roadmap: evaluating the digital status of your language; the remaining steps and workflow needed for digitization of your language; and a glossary of useful terms.
Zero to Digital: Terminology Guidelines
(Spanish | French | Russian | Simplified Chinese)
Instructions and workflow for documenting words, abbreviations, and phrases in your language that relate to the specialized areas of activity in your community, and guidance on coining new words in your language as needed for digitization.
Zero to Digital: Language Data Gathering Guidelines
(Spanish | Russian | Chinese | Arabic)
Guide to the kinds of existing language materials (print, audio, video, and otherwise) to be collected and preserved in order to support the digitization of your language, with an explanation of which materials are required by which digital applications.
Benefits of Digitization of a Language
(English | Simplified Chinese | Arabic | Spanish | French | Russian)
Language Documentation
Guides for documenting and recording your language. Translation Commons (TC) recommends two excellent resources:
The “Language Gathering and Collection Guide,” created by Benjamin Chung of the First Peoples’ Cultural Council, guides you through the steps of “eliciting,” or collecting, words, sentences, conversations, stories, and more in your language. It provides user-friendly instructions on how to get started, basic linguistic information to be aware of, ways to improve accuracy, and different data collection methods suited to stories, conversations, word lists, and what Chung calls “Rapid Word Collection” and “Group Recordings.” This well-structured website is easy to use and includes video clips, graphics for prompts, sample word lists, photos, audio clips, testimonials, and written step-by-step instructions. It offers a thorough treatment of language documentation for non-linguists and will be useful to beginners and language advocates with no prior training as well as to those with experience in the field.
“Basic Oral Language Documentation,” written by Will Reiman and hosted by ScholarSpace of the University of Hawai’i, explains an oral-based approach to documenting your language. Many languages are primarily spoken and do not yet have a written tradition. In a nutshell, this approach uses three main steps that do not require waiting for writing systems and spelling conventions to be established: “compile a sample of recordings of a full range of speech event types; comment on those recordings [also as audio recordings]; and archive the complete corpus of recordings with an institution that will provide long-term access.” Reiman explains the process, the equipment needed, the different set-ups you can use, and the benefits of this method. The webpage provides a PDF of Reiman’s lecture slides with notes, as well as an MP3 recording of his lecture explaining the method.
Indigenous Interpreting Training Manual
Indigenous Interpreting Training Workbook
TC Videos and Slide Presentations
Building Bridges for Digitization of Indigenous Languages by Jeannette Stewart (video)
An introduction to font issues for language support by Gerry Leonidas (video)
Text, Keyboard, Font, Essentials for your language on the internet by Craig Cornelius (video)
The Script Encoding Initiative and Unicode by Deborah Anderson (slides)
Accessing And Understanding Contents In Portuguese By Foreigners In Scientific Digital Libraries: Can This Methodology Be Generalized To Any Language And Script? by Claudio Menezes (slides)
Language Documentation: A Brief Overview by Anna Belew (slides)
Terminology Management for Indigenous Languages by Sue Ellen Wright (video)
Terminology Management for Indigenous Languages by Sue Ellen Wright (slides)
IYIL 2019 Translation Commons Projects, TC Advisors (video)
Machine Translation for Indigenous Communities, by Kirti Vashee (video)
Discussion with UNESCO on IDIL (video)
Indigenous Interpreters in Mexico, by Alexandra Hernandez Leon and Hector Santaella Barrera (video)
Modern NLP changes the requirements for building automatic Translation Systems, by Antonis Anastasopoulos (video)
Software Internationalization Principles and Strategies, by Tex Texin (video)
Bringing Newly Invented Scripts of South Asia into the Digital World, by Anshuman Pandey (video)
How Unicode Characters Become Glyphs on Your Screen, by Christopher Chapman (video)
Making Optical Character Recognition Systems for Telugu, by Atul Negi (video)
Zero to Digital Series of Guidelines for Indigenous Communities, by Sue Ellen Wright (video)
Accelerating Support for Indigenous Languages in Digital Systems, by Jeannette Stewart and Craig Cornelius (video)
Digitization Solutions for Indigenous Languages, by Julie Anderson and Craig Cornelius (video)
Indigenous Languages Concerns of Identity and having Independent Script: The case of India, by Siva Prasad Rambhatla (video)
Mukurtu CMS, by Michael Wynne (video)
Indigenous Community presentations
Address by Principal Chief of Cherokee Nation Chuck Hoskin Jr. at the UNESCO NA consultation (video)
Digitizing Koits Sunuwar, by Dev Kumar (video)
Sunuwar Digitization, Interview with Dev Kumar, by Julie Anderson (video)
Digitizing the Chakma Language, by Bivuti Chakma (video)
Chakma Digitization by Bivuti (video)
Sora Sompeng by Sony Salma (slides)
Children’s story in Mehri by Janet Watson (slides)
Selim and his shadow in Mehri by Janet Watson (slides)
Saami people and Climate Change, Klemetti Näkkäläjärvi lecture part 1 (video)
Saami people and Climate Change, Klemetti Näkkäläjärvi lecture part 2 (video)
UN – UNESCO Documents
Free and Prior Informed Consent
UN Declaration on the Rights of Indigenous Peoples
UNESCO Policy on Engaging with Indigenous Peoples
UNESCO’s Engagement with Indigenous Peoples
The Sustainable Development Goals Report 2019
UNESCO’s Internet Universality Indicators
The publication of Zero to Digital: A Guide to Bring Your Language Online was an important milestone on our journey to prevent and reverse global language extinction. But much work remains to ensure human rights and connectivity for all native languages. We call on localization and internationalization experts, translation professionals, language community advocates, linguists, font designers, and marketing experts to join our mission of breaking down language and communication barriers!
In Progress
Machine Translation Guidelines
Repository Guidelines
Certification Guidelines
Mentoring Guidelines
Internship Guidelines
Zero to Digital: Bringing a Language Online, pilot with the Sunuwar language
Morphologizer Template and Pilot with Cherokee Nation
Future Projects
Language Technology Workshop
Database mapping the digitization status of all languages
Online educational materials for science teachers
Indigenous Translator Training