LANGUAGE BIASES IN TECH: A FULL STACK PROBLEM

AN XIAO MINA
February 10, 2016
8:28 am

Take a minute to imagine you’re a newcomer to the internet. First of all, you are not alone. The web has been around for decades, yes, but on the scale of the world’s population, regular connectivity is still technically a minority experience. With an estimated 3.3 billion internet users out of a world population of 7.2 billion, and a stunning 833 percent growth rate over the past five years, we can expect diversity on the internet to increase significantly, especially as the world internet population inches toward a tipping point.

Now imagine you don’t speak English, Chinese, Arabic, Spanish or another majority language on the internet. Imagine you speak Bihari or Ilokano, minority languages in India and the Philippines, respectively. Again, your experience isn’t unique. With the so-called “next billion” coming online, we can expect a significant increase in language diversity on the internet.

For English speakers, the internet might seem like a teeming wonderland of information and games and social connections, but for those who are just coming online, the internet has a dearth of content—if any—in their native languages. The pipelines for voice and civic action that we’ve seen for much of the world are facing a significant challenge: crossing language and cultural barriers.

For one, some languages are completely invisible and unusable on browsers, operating systems, and keyboards. In the words of Tibetan blogger Dechen Pemba, who can’t access the Tibetan language on a phone:

Given that the Tibetan literary tradition goes back to the 7th century and its linguistic influence reaches far across the Himalayas encompassing areas of India, Bhutan, Mongolia, Russia and Pakistan, my pet hate is when Tibetan language is described as “obscure”. I wonder how it is possible that the language of Tibetan Buddhism and Tibetan Buddhists, comprising of as many as 60 million people, can be wilfully left behind in terms of modern technology? For instance, Google has failed to incorporate a Tibetan font into its Android software, failed to develop a Tibetan language interface and failed to include Tibetan in Google Translate, the most useful of tools. At least Apple has seen the light there.

In a recent series of lectures at UCLA hosted by the Digital Media Arts program and the Processing Foundation, I talked through some of these issues, drawing on an essay I’d written for the Digital Asia Hub, a new think tank in Hong Kong that’s grown out of the Berkman Center for Internet and Society.

Here’s a summary of the key points I think we should be paying attention to with regard to the language biases inherent to our technologies. These are pulled directly from the Digital Asia Hub essay and transcripts from the UCLA talk provided by the terrific Open Transcripts, with minor editing to contextualize the words for this piece:

Language biases create sharp divides in the global web—laying the foundation for digital ghettos of information and community.

Without improved language and writing script support, new netizens run the risk of living in digital ghettos created by their native tongues. Any online actions they engage in or media they create will be largely invisible and unappreciated by those outside their cultural-linguistic spheres. This can have significant effects, for instance, on human rights advocacy, which can depend so heavily on using social media and email to raise awareness among international news sources.

New internet users who don’t speak majority languages will likely be unable to participate in global internet culture and conversations as both readers and contributors. A number of internet researchers looking at language divides online have noted that minority language speakers, especially those from the global south, will experience substantial information inequality online. Indeed, people’s inability to speak English can significantly affect their very adoption and use of the internet, even if they are aware of its existence.

The internet has proven to be a crucial pipeline for attention for those who have traditionally been marginalized. But language barriers can prevent the broader public from understanding their voices.

I think a lot of us are familiar with the internet’s role in building social movements and the ability to amplify one’s perspective and words. Certainly the Umbrella Movement in Hong Kong and the Black Lives Matter movement here in the U.S. rely on the ability to broadcast a message, to use hashtags, and to create a pipeline from social media to mainstream media, and then hopefully to other audiences.

And certainly we can think about major hashtags and major movements that’ve been in English or a majority language: #TweetLikeAForeignJournalist in Kenya was a critique of media coverage of East Africa. And then #JeSuisCharlie, a simple enough French phrase for people to remember, understand and repeat online and offline.

But there are a number of other movements in other languages that are more difficult to understand, and get significantly less attention: There’s #sassoufit in Congo; there’s the gau wu (#鳩嗚) movement, part of the Hong Kong Umbrella Movement, but also a tangential group with different aims and strategies. As I argued at a recent panel on the topic of biased data, language is one important barrier that prevents these movements from reaching a wider audience.

Ultimately, language biases in our technologies are a full stack problem. These compound on each other, and as technologists, we have to think holistically about solutions.

In technology design we talk about the full stack, a series of layers, such as the code and the user interface, on which software is built. As we noted during the biased data panel discussion, the human-facing part of that code is in English. Admittedly, much code is constructed from simple phrases, like “if” and “then”. Yes, you can learn those phrases, but imagine trying to relearn code in a language that you don’t speak, and suddenly having to learn two languages: the programming language and then the language in which the programming language is expressed.

And then it moves up to text input. Until recently, the ability to input Arabic on a mobile phone was severely limited, and Arabic speakers developed “Arabizi”, a chat language made of Roman letters and numbers to express their language online. This was incredibly creative, but it was also a response to a lack of support for the Arabic script. This affects many other languages whose primary script is not Latin.

Then it goes up from there into content. If you want to engage with the broader internet, you have to have access, and we can include language as a form of access. As one example, Stack Overflow is a critical go-to source for the open source community and coders in general, but the majority of the knowledge on the site is only available in English and Portuguese right now. If someone who speaks neither language wants to ask a question from this rich community of more experienced practitioners, whom could they ask?

And then the stack moves all the way to the typography. We’re talking about the political decisions around typography. In languages that use Latin letters, you have a wide variety of typography and fonts that you can use, and if you have that kind of critical knowledge about the implications of all these fonts you can really make important design decisions. But if you have access to only one or two fonts, suddenly the ability for you to create a space around the very content and the sites that you’re trying to create again becomes limited and you’re inheriting someone else’s designs around your typography.

To be clear, language biases in tech are an extension of the language biases we live with in broader society. As we discuss what it means to “speak American” in this diverse, multilingual country, and as we look to a multilingual global internet, it’s important to remember how often language barriers manifest. Just recently, I wrote about U.S. candidates’ attempts at Spanish language engagement on Twitter, which sometimes fall flat for native speakers. Both Clinton and Sanders have been taken to task online for their not-always-perfect Spanish:

https://speakbridge.io/medias/embed/democratic-debates-2016/democratic-debates-2016-general/725

https://speakbridge.io/medias/embed/democratic-debates-2016/democratic-debates-2016-general/706

This is a bias of content, one that is higher up on the technology stack, but that creates a barrier between a candidate and their electorate. Whether a language is misunderstood, or, like Tibetan, completely invisible, the barrier of understanding creates a barrier to access. Solving this at all levels will take a lot of work, but it will be essential for a truly interconnected, accessible, and civically-engaged internet.

FAILURES OF OUR GLOBAL IMAGINATION

The problem with first world problems, and why we need to shift the way we talk about global tech

EILÍS O'NEILL
October 26, 2015
5:24 pm

There are #firstworldproblems, and there are #thirdworldproblems. When it comes to communications technologies and phones, there are the problems of being a human being in the 21st century. Recent articles about the role of mobile phones for migrant and refugee communities have unleashed a torrent of tweets and articles: If they are so in need of help, why do refugees have phones? How can they possibly be that desperate if they can afford a data plan?

I used to joke about #firstworldproblems myself, but after seeing misunderstandings like these, I stopped.

To be fair, these are good questions if your image of technology is that of luxury and distraction, and if your image of refugees is stuck in 20th century imagery of destitution. But travel the world over, and the role of mobile phones is clear: They are as essential as clothes, money, food and water. They help people stay in touch for business and family reasons. They have maps. They can help people take notes and share those notes across long distances. They have music. They are more affordable than clean, running water and more portable than a suitcase. In a mud hut in East Africa, a crowded bus in Southeast Asia, by the river in rural China, phones and their capabilities can improve the lives of many, both for utilitarian and emotional purposes. They are the Swiss Army Knives of the 21st century.

This misunderstanding is unfortunate but not uncommon when it comes to narratives about the global south. When President Obama boarded a plane for a state visit to Kenya, CNN described the country as a “hotbed of terror.” This was not the first time broadcast media made a sweeping generalization about the country, and Kenyans on Twitter quickly revived the hashtag #SomeoneTellCNN to draw attention to the absurdity. In previous years, such as during the 2013 elections, hashtags such as this one and #TweetLikeAForeignJournalist drew attention to outdated generalizations about life in the country. This past year, the hashtag jokes even prompted a visit from CNN’s managing editor, who publicly apologized for the mis-characterization.

A similar flurry of misguided articles erupted recently around Taylor Swift’s concert tour in China. A rumor emerged that Swift’s t-shirt line, bearing the phrase “T.S. 1989,” would be repurposed as a thinly veiled reference to the Tiananmen Square incident in 1989. Article after article in English-speaking press suggested there would be heavy censorship of the shirts online and that the tour itself might even be canceled. Nothing of the sort was happening inside the country. Writing in Vox, Max Fisher identified the source of the confusion: it’s hard to see past the story of censorship in China and imagine the daily lives of young people living under censorship and enjoying pop music from around the world.

I think of the examples above as symptomatic of a larger problem: we have what author Claire Light has called a failure of our global imagination. And by “we,” I mean those of us living and working in privileged Western contexts, far removed from the daily lives of those living in places of war, censorship, and rapid industrialization. We can imagine the general experiences and emotions that these words evoke—fear, doubt, uncertainty, excitement—but we cannot imagine the ins and outs, the everydays, the way people live under circumstances very different from our own.

This failure can have devastating consequences. Empathy is founded in our ability to see ourselves in the lives of others, to understand their pain and suffering and respond with compassion. If we cannot imagine the lives of others very different from ourselves, we cannot empathize with their joys and sorrows, and if we take as a frame of reference our own experiences, we cannot deeply engage with others’ lived experiences. If we assume that phones are frivolous, luxury devices for playing games and getting distracted at the dinner table, we cannot imagine how critical they are for helping people find their way to nearby safe points—and then we overlook the need to distribute prepaid SIM cards alongside water bottles. If we assume that transparency and openness are universal goods, we cannot imagine how that openness can be terrifying for a queer person trying to live safely and with dignity in a country with anti-LGBT legal structures—and then we enact Terms of Service and user experiences that promote the very thing (visibility) that can make their lives more dangerous.

The world has long been interconnected. If yesterday’s globalization was one of mass production and distribution of objects and one of political relations, the globalization of today is that of people to people. Thanks to increased mobility and, more broadly, global internet connectivity, we are more in touch with the images, words and narratives of people living in parts of the world we may have never heard of or heard from. A tweet in Spanish can spark surprise in the English-speaking world, and an image meme made on China’s Sina Weibo can slowly wind its way over to Egyptian Facebook. A lot of people are finding voice, but it’s just as important, if not more so, that with this interconnectedness, those of us with greater privilege and access understand our own responsibility to listen and, where appropriate, amplify.

How can we change this? How can we transform our global imaginations to understand the rich diversity of human life and living in the 21st century? While travel outside major tourist zones can help, scaling that up for all people can be difficult, if not impossible. The tourist industry makes it easy for people with means to skip over to new cities, but it also cloaks the diversity of local life and living, something that takes time, language skills, and patience to understand and experience.

More importantly, for those of us who have the privilege of accessing diverse global experiences, we need to shift the narrative. And we need to listen and reflect the stories of others’ lives more effectively. Here’s how I think we can do that.

Writers about tech need to nuance the narrative. New tech has different effects on different people, and not everyone is a middle class Westerner. Every time we trot out the tired argument that selfies are a form of narcissism, we limit our perspectives on the vast diversity of creative production enabled by new and networked technologies like smartphones. There is no doubt that some selfies for some people serve a narcissistic, self-aggrandizing purpose. But selfies can be a form of advocacy, of creating visibility for underrepresented people, of simply connecting with family and friends back home (sometimes “home” is thousands of miles away). It can seem puzzling to see refugees arriving on the coasts of Greece with selfie sticks, but what better way than a selfie to tell family and friends back home that you’ve made it safely and in good spirits?

The broader dialogue in Western media and intellectual culture must stop critiquing technology’s effects on society with a lens that seems to focus largely on middle class Americans and Western Europeans. Phones can certainly distract from in-person conversations, but they can also facilitate vital connections amongst a global diaspora. When we broadly apply critiques of technology to everywhere and every context, we overlook important discourses around justice, intent and power. We need to understand how technology is used in different cultures and for those with limited resources, and we need to remember positionality when it comes to how people use technology.

We need to tell the human stories of the next billion. Really tell them. Use photos. Use stories. Use videos. It’s not enough to talk about the “next billion” in the abstract, like an opportunity to reach teeming masses of people ripe for monetization. We need to understand their lives and their priorities with the sort of detail that can build empathy for other people living under vastly different circumstances. A common misperception I’ve heard about refugees fleeing their country is that they probably wouldn’t prioritize their phones. And yet, it’s almost certain that anyone in a natural disaster in San Francisco would grab their smartphone. How else would they call their families, access resources, alert the authorities?

Can we shift narratives about the developing world to talk about building agency through technology? To talk about connecting once-disconnected communities through technology? Can we move past the sweeping discussions of marketing and monetization opportunities for the next billion and learn more about what motivates them to use the internet and phones in the first place? It’s one thing to share photos of a solar-powered cell phone in the Sahara; it’s quite another to tell the story of the music the phone’s owner listens to and how he uses the texting feature to stay in touch with family while he travels.

It’s time to abandon the First World/Third World dichotomy. Whether or not this dichotomy was a helpful one at some point in the past, it’s no longer helpful now. The “Third World” has glittering skyscrapers and glowing smartphones, and the “First World” has decaying neighborhoods and entire swaths of the country without broadband. There are very real and important differences between rich and poor countries, and these dynamics play out at the level of international relations, all the way down to the mundane and often humiliating work of applying for visas. But this framing creates a divide that limits our capacity to understand the vast spectra of the way human beings live in the 21st century. I don’t yet have a better vocabulary for this, but I hope someone smarter than me can figure that out. For now, I do use the phrases “developing world,” “global south,” and “poor countries,” but I’d like to have a better framework. Any suggestions?

Remember the diversity of ways we use communications technology: that includes connecting with people we care about and depend on. In contrast to narratives about vanity, slacktivism, and luxury when it comes to tech in the middle-class West, so much of the conversation about technology in the global south focuses on information and practical communications, like around agricultural trends and educational material. This is good and important work. But highly pragmatic use cases are just part of the reason anyone has used communications technology. Informal markets from Asia to Africa are filled with music and movies, like a Bluetooth-powered Napster, and people are just as likely to send text messages and Facebook posts to check in with friends and loved ones as they are to access important healthcare information and market reports. These things can coexist.

Like a city, the internet and mobile phones provide for a vast diversity of human needs, which include the basic human need for companionship, support, and access to joy in the face of suffering. Fortunately, this part of the global imagination doesn’t require too much effort: Just think of how everyone you know uses technology, the number of apps, the different ways they laugh, smile, cry, and scowl at what they see behind those plates of glass.

Shifting the narrative is such a critical part of the motivation behind my work with global internet cultures, and the above are just a few ideas for how I think we can do that. But more important than trying to know everything about the world is establishing a culture of knowing that we don’t know. The assumption that we can parachute into a foreign culture with formal expertise and knowledge and make things better has never been acceptable, and it has led to a lot of unnecessary suffering, especially in colonized countries. The fact that people in marginalized parts of the world can now call out misguided attitudes and perceptions about them will go a long way, and those of us with access to media and policy can do well to amplify and extend these voices.

But it is also not possible to know every detail about other people’s lives. Attention is limited, as is time. We can learn everything we can about the day to day of rural Laos, but the conflict in Mali will seem completely opaque. Instead, it’s more important to know that we don’t know, know that we need to listen to those who have greater familiarity, and to know that there are ways to go further. Adopting an attitude of humility and curiosity can take us much farther than an attitude of assuredness and assumption. This seems to me like a good place to start—and if you have other and better ideas, I’d love to hear them.

Civicist

CIVIC TECH NEWS & ANALYSIS
