One basic challenge is the Internet’s textiness. The language database Ethnologue estimates that 3,535 of the world’s 7,105 living languages have no writing system whatsoever. It’s precisely this category, the unwritten half of all of today’s spoken languages, about which we know next to nothing. For hundreds of languages, linguists lack even the barest documentation — a word list, a brief recording, basic grammatical information. Outsiders, even those from the same region, may be entirely unaware of the language’s existence or merely consider it a broken, backward dialect. The number of speakers of each such language is usually under 10,000 and declining fast. Many won’t survive the century. Ninety-six percent of the world’s languages are spoken by just 4 percent of the world’s people.
Acknowledging our inability to know all the languages used in emails, texts, Skype calls and so on (maybe the NSA could help), Kornai nonetheless tries to survey all publicly available textual material online, with a particular focus on the hyperglot Wikipedia, which has versions in 287 languages (with another 533 in “incubator stage,” according to him). He rightly homes in on the invisible underpinnings that enable us to use a language online, such as input methods, OS support (on a range of devices, in countless applications), transliteration and translation and spell-checking tools. Just developing a Yiddish spell-checker, for instance, has required a stable input method for the modified Hebrew alphabet that Yiddish uses, the prior standardization of that alphabet (still contested), standardized spellings of most words (sometimes contested), technical ease in handling the Yiddish alphabet and a loaded dictionary.
Needless to say, support can be extremely patchy even for very widespread languages, and most of what exists has depended on open-source solutions and dedicated volunteers. Even translated versions of the most popular tools and sites so far have only a strictly limited reach. According to Kevin Scannell of the Indigenous Tweets project, as of late last year, you could search Google in 150 languages, use the Firefox browser in 105 languages, navigate Facebook in approximately 100 languages and find tweets in 139.
In all, there may be online primary materials of some sort in up to 1,500 languages, he estimates. Even this more generous number leaves 80 percent of the world’s languages invisible in the digital realm.