robertogreco : unicode   15

How the Appetite for Emojis Complicates the Effort to Standardize the World’s Alphabets - The New York Times
"Anshuman Pandey was intrigued. A graduate student in history at the University of Michigan, he was searching online for forgotten alphabets of South Asia when an image of a mysterious writing system popped up. In eight years of digging through British colonial archives both real and digital, he has found almost 200 alphabets across Asia that were previously undescribed in the West, but this one, which he came across in early 2011, stumped him. Its sinuous letters, connected to one another in cursive fashion and sometimes bearing dots and slashes above or below, resembled those of Arabic.

Pandey eventually identified the script as an alphabet for Rohingya, the language spoken by the stateless and persecuted Muslim people whose greatest numbers live in western Myanmar, where they’ve been the victims of brutal ethnic cleansing. Pandey wasn’t sure if the alphabet itself was in use anymore, until he lucked upon contemporary pictures of printed textbooks for children. That meant it wasn’t a historical footnote; it was alive.

An email query from Pandey bounced from expert to expert until it landed with Muhammad Noor, a Rohingya activist and television host who was living in Malaysia. He told Pandey the short history of this alphabet, which was developed in the 1980s by a group of scholars that included a man named Mohammed Hanif. It spread slowly through the 1990s in handwritten, photocopied books. After 2001, thanks to two computer fonts designed by Noor, it became possible to type the script in word-processing programs. But no email, text messages or (later) tweets could be sent or received in it, no Google searches conducted in it. The Rohingya had no digital alphabet of their own through which they could connect with one another.

Billions of people around the world no longer face this plight. Whether on computers or smartphones, they can write as they write, expressing themselves in their own linguistic culture. What makes this possible is a 26-year-old international industrial standard for text data called the Unicode standard, which prescribes the digital letters, numbers and punctuation marks of more than 100 different writing systems: Greek, Cherokee, Arabic, Latin, Devanagari — a world-spanning storehouse of languages. But the alphabet that Noor described wasn’t among them, and neither are more than 100 other scripts, just over half of them historical and the rest alphabets that could still be used by as many as 400 million people today.

Now a computational linguist and motivated by a desire to put his historical knowledge to use, Pandey knows how to get obscure alphabets into the Unicode standard. Since 2005, he has done so for 19 writing systems (and he’s currently working to add another eight). With Noor’s help, and some financial support from a research center at the University of California, Berkeley, he drew up the basic set of letters and defined how they combine, what rules govern punctuation and whether spaces exist between words, then submitted a proposal to the Unicode Consortium, the organization that maintains the standards for digital scripts. In 2018, seven years after Pandey’s discovery, what came to be called Hanifi Rohingya will be rolled out in Unicode’s 11th version. The Rohingya will be able to communicate online with one another, using their own alphabet."



"Unicode’s history is full of attacks by governments, activists and eccentrics. In the early 1990s, the Chinese government objected to the encoding of Tibetan. About five years ago, Hungarian nationalists tried to sabotage the encoding for Old Hungarian because they wanted it to be called “Szekley-Hungarian Rovas” instead. An encoding for an alphabet used to write Nepal Bhasa and Sanskrit was delayed a few years ago by ethnonationalists who mistrusted the proposal because they objected to the author’s surname. Over and over, the Unicode Consortium has protected its standard from such political attacks.

The standard’s effectiveness helped. “If standards work, they’re invisible and can be ignored by the public,” Busch says. Twenty years after its first version, Unicode had become the default text-data standard, adopted by device manufacturers and software companies all over the world. Each version of the standard ushered more users into a seamless digital world of text. “We used to ask ourselves, ‘How many years do you think the consortium will need to be in place before we can publish the last version?’ ” Whistler recalls. The end was finally in sight — at one point the consortium had barely more than 50 writing systems to add.

All that changed in October 2010, when that year’s version of the Unicode standard included its first set of emojis."



"Not everyone thinks that Unicode should be in the emoji business at all. I met several people at Emojicon promoting apps that treat emojis like pictures, not text, and I heard an idea floated for a separate standards body for emojis run by people with nontechnical backgrounds. “Normal people can have an opinion about why there isn’t a cupcake emoji,” said Jennifer 8. Lee, an entrepreneur and a film producer whose advocacy on behalf of a dumpling emoji inspired her to organize Emojicon. The issue isn’t space — Unicode has about 800,000 unused numerical identifiers — but about whose expertise and worldview shapes the standard and prioritizes its projects.

“Emoji has had a tendency to subtract attention from the other important things the consortium needs to be working on,” Ken Whistler says. He believes that Unicode was right to take responsibility for emoji, because it has the technical expertise to deal with character chaos (and has dealt with it before). But emoji is an unwanted distraction. “We can spend hours arguing for an emoji for chopsticks, and then have nobody in the room pay any attention to details for what’s required for Nepal, which the people in Nepal use to write their language. That’s my main concern: emoji eats the attention span both in the committee and for key people with other responsibilities.”

Emoji has nonetheless provided a boost to Unicode. Companies frequently used to implement partial versions of the standard, but the spread of emoji now forces them to adopt more complete versions of it. As a result, smartphones that can manage emoji will be more likely to have Hanifi Rohingya on them too. The stream of proposals also makes the standard seem alive, attracting new volunteers to Unicode’s mission. It’s not unusual for people who come to the organization through an interest in emoji to end up embracing its priorities. “Working on characters used in a small province of China, even if it’s 20,000 people who are going to use it, that’s a more important use of their time than deliberating over whether the hand of my yoga emoji is in the right position,” Mark Bramhill told me.

Since its creation was announced in 2015, the “Adopt a Character” program, through which individuals and organizations can sponsor any characters, including emojis, has raised more than $200,000. A percentage of the proceeds goes to support the Script Encoding Initiative, a research project based at Berkeley, which is headed by the linguistics researcher Deborah Anderson, who is devoted to making Unicode truly universal. One script the consortium recently accepted is called Nyiakeng Puachue Hmong, devised for the Hmong language by a minister in California whose parishioners have been using it for more than 25 years. Still in the proposal stage is Tigalari, once used to write Sanskrit and other Indian languages.

One way to read the story of Unicode in the time of emoji is to see a privileged generation of tech consumers confronting the fact that they can’t communicate in ways they want to on their devices: through emoji. They get involved in standards-making, which yields them some satisfaction but slows down the speed with which millions of others around the world get access to the most basic of online linguistic powers. “There are always winners and losers in standards,” Lawrence Busch says. “You might want to say, ultimately we’d like everyone to win and nobody to lose too much, but we’re stuck with the fact that we have to make decisions, and when we make them, those decisions are going to be less acceptable to some than to others.”"
unicode  language  languages  internet  international  standards  emoji  2017  priorities  web  online  anshumanpandey  rohingya  arabic  markbramhill  hmong  tigalari  nyiakengpuachuehmong  muhammadnoor  mohammedhanif  kenwhistler  history  1980  2011  1990s  1980s  mobile  phones  google  apple  ascii  facebook  emojicon  michaelaerard  technology  communication  tibet 
october 2017 by robertogreco
a16z Podcast: The Meaning of Emoji 💚 🍴 🗿 – Andreessen Horowitz
"This podcast is all about emoji. But it’s really about how innovation really comes about — through the tension between standards vs. proprietary moves; the politics of time and place; and the economics of creativity, from making to funding … Beginning with a project on Kickstarter to crowd-translate Moby Dick entirely into emoji to getting dumplings into emoji form and ending with the Library of Congress and an “emoji-con”. So joining us for this conversation are former VP of Data at Kickstarter Fred Benenson (and the 👨 behind ‘Emoji Dick’) and former New York Times reporter and current Unicode emoji subcommittee member Jennifer 8. Lee (one of the 👩 behind the dumpling emoji).

So yes, this podcast is all about emoji. But it’s also about where emoji fits in the taxonomy of social communication — from emoticons to stickers — and why this matters, from making emotions machine-readable to being able to add “limbic” visual expression to our world of text. If emoji is a (very limited) language, what tradeoffs do we make for fewer degrees of freedom and greater ambiguity? How exactly does one then translate emoji (let alone translate something into emoji)? How do emoji work, both technically underneath the hood and in the (committee meeting) room where it happens? And finally, what happens as emoji becomes a means of personalized expression?

This a16z Podcast is all about emoji. We only wish it could be in emoji!"
emoji  open  openstandards  proprietarystandards  communication  translation  fredbenenson  jennifer8.lee  sonalchokshi  emopjidick  mobydick  unicode  apple  google  microsoft  android  twitter  meaning  standardization  technology  ambiguity  emoticons  text  reading  images  symbols  accessibility  selfies  stickers  chat  messaging  universality  uncannyvalley  snapchat  facebook  identity  race  moby-dick 
august 2016 by robertogreco
Emojis are no longer cool in Japan.
"And so Japanese netizens moved on. The big thing in Japan now is Line, a wildly popular free messaging app that has some 58 million domestic users, though almost no profile abroad. Its key feature is users’ ability to send “stamps,” essentially mascots and cartoon characters, back and forth. Emojis differ slightly from platform to platform; Android’s don’t precisely resemble those used on Apple or Twitter or Skype, and vice versa. But since LINE users are all aboard the same service, the stamps are always the same. And many are set to automatically pop up in a window based on what you type, just like predictive text or spelling suggestions. The emojis are still there, too. But they’re relegated to a sideshow role rather than being the main event, all but upstaged by the far more flamboyant stamps.

Line stamps also have a big leg up on emojis from another standpoint: commercialization. The company’s main source of revenue is hawking an ever-changing array of new stamp sets to customers, many of which are designed in concert with big corporations merchandising their product lines. Line makes a great deal of money from paid corporate and celebrity tie-ins, which can run into the hundreds of thousands of dollars depending on the length of the campaign. On the other hand, even a company as deep-pocketed as Disney wouldn’t be able to pay Unicode to immortalize Mickey, Goofy, and the rest of the crew as emojis. The emoji system simply isn’t designed for monetization. There’s no such barrier with LINE.

The emojis have also taken another hit in Japan from an unlikely culprit: emoticons, those little pictorial representations of facial expressions constructed from punctuation marks. The West has emoticons and text art too, of course. Most English-speaking net users are familiar with the ubiquitous smiley :-) and frowny :-( marks and a handful of others. But Japanese emoticons—known as kaomoji, or face-text—come in a dizzying array of variations. They are complicated mixtures of punctuation, Japanese kana, foreign letters, and even scientific symbols, resembling something Dr. Frankenstein might have built had he majored in linguistics rather than played God. Perhaps the most common is キタ━━━━(゚∀゚)━━━━!! Pronounced kita, it’s the illustration of an excited “all right!” or “here we go!” that’s deployed endlessly on Japanese Twitter and chat rooms. The kaomoji express emotions in the way emojis do, but they’re composed of standard fonts rather than being illustrated by anyone in particular. This lets the constructions retain more ambiguity design-wise, which is a fancy-pants way of saying they’re more kawaii.

So bearing all of the above in mind: If emojis are in their twilight in their homeland, what are the implications for their popularity abroad? Are we heading for an emojipocalypse?"

[Line app: http://line.me/ ]
emoji  japan  2015  stickers  lineapp  unicode  applications  messaging  emoticons 
december 2015 by robertogreco
Registering and naming your baby | nidirect
"What you need to register a birth
You will need to provide the following to register a birth:

- a birth registration form filled in by the person registering the birth (usually the mother)
- full name of the baby - you can register your child's name in any language - providing you use any unicode character
- sex and date of birth of the baby
- district and place of birth of the baby
- full names and dates of birth of parents
- full addresses and occupations of the parents
- Registration of a birth - form GRO 4 (PDF 23 KB)
- Help with PDF files"
unicode  birth  names  naming  northernireland  identity  language 
april 2015 by robertogreco
I Can Text You A Pile of Poo, But I Can’t Write My Name by Aditya Mukerjee | Model View Culture
"We can’t ignore the composition of the Unicode Consortium’s members, directors, and officers -- the people who define the everyday writing systems of all languages across the globe."



"Determining which graphemes and glyphs are essential to a given ethno-linguistic group is a tough problem. Identifying all of these for all languages in widespread use is even more challenging. But one thing is clear: we cannot design an alphabet meant for everyday use by native speakers of a language without the primary input of native speakers of these languages.

Out of compatibility concerns, the Unicode Consortium is unlikely to modify the 224,024 characters that have already been defined in any future updates. It took half a century to replace the English-only ASCII with Unicode, and even that was only made possible with an encoding that explicitly maintains compatibility with ASCII, allowing English speakers to continue ignoring other languages.

But that still leaves 80% of the codepoints unused. As the Unicode Consortium decides which characters to allocate, there are a number of ways to ensure that Unicode accurately reflects the stated goal of representing “all characters in widespread use today”.

Membership in the Consortium is not free, or even cheap. Full membership and voting rights cost $18,000 (and tellingly, all prices are listed in USD only). Discounts are already provided at lower membership tiers for non-profit organizations, such as the Mormon church. These discounts could be expanded to full membership, and to for-profit groups from non-European countries where English is a minority language. The Consortium could establish an explicit hiring plan to guarantee that its staff represent the many languages that it seeks to standardize. The Consortium could adopt bylaws that ensure that technical committee members and officers are not dominated by native English speakers. There are other measures that the Consortium can and should take as well, but these three are very straightforward both to implement and to evaluate, so they make a good starting point.

Gayatri Chakravorty Spivak has written, ‘The subaltern cannot speak’. They are structurally prohibited from having any dialogue – even an unbalanced one – with the very powers that oppress them. Access to digital tools that respect our languages is crucial to communicating in the Internet age. The power to control the written word is the ability both to amplify voices and to silence them. Anyone with this power must wield it with caution.

Whatever path we take, it’s imperative that the writing system of the 21st century be driven by the needs of the people using it. In the end, a non-native speaker – even one who is fluent in the language – cannot truly speak on behalf of the monolingual, native speaker. For them, the language is simply a way of exploring a different part of their world, or of exploring familiar parts in a new way. For the native speaker, the language is not merely a novelty. It is the gateway to accessing life and society itself."
culture  language  unicode  technology  discrimination  internet  web  2015  inclusion  emoji  standards  universality  webstandards  bengali  adityamukerjee  history  gayatrichakravortyspivak  subaltern  diversity  inlcusivity  inclusivity 
march 2015 by robertogreco
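
A quick arithmetic check of the "80% of the codepoints unused" figure quoted in the Mukerjee entry above. This is only a sketch: the 224,024 count is taken from the quote as-is, and the size of the Unicode code space (U+0000 through U+10FFFF) is the one number added here.

```python
# Size of the Unicode code space: code points U+0000 through U+10FFFF.
CODE_SPACE = 0x10FFFF + 1  # 1,114,112

# Count of already-defined characters, as stated in the quoted article.
defined = 224_024

# Share of the code space that remains unallocated under that count.
unused_fraction = 1 - defined / CODE_SPACE
print(f"{unused_fraction:.1%}")
```

Under those two numbers the unused share comes out just under 80%, which matches the article's rounded figure.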
Diacriticism
"This is a tool for abusing combining diacritical marks made by Niel McLaren. Source code."

[via: http://roomthily.tumblr.com/post/112666276437/diacriticism

"d̸̡͎̫͚ͬ̅ͭ̚͟í̟̲͒ͩ̓̒́͡͠a̷̴̙̓̾͂̑̂̽͘c̙͚͖̰͂͒̏͒̅̿r̛͍̳͍̪̖ͦ̽̈́̕i͏͓͙̯̰̭̂͑́͞ṭ̡̣̙͋́̓͑ͫ͘i̛͕̬ͤͣ̈́̆ͮ̐͘c͈̯̯͇̙̣ͣ̍́̅i̡̡̭̻͖̙͍ͦ͐̅s̭͚̲̰̄̆ͥ̆͌͋m̘̜̑͊̂͒́̾̏ͨ - abusing diacritics in a mildly lovecraftian way. lovecraftian in the way it is expressed on stackoverflow discussions of parsing html with regex and publicly shaming repos that don’t handle unicode terribly well."]
typography  unicode  diacritics  nielmclaren  html  regex 
march 2015 by robertogreco
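
The effect in the Diacriticism entry above can be approximated in a few lines. This is a minimal sketch, not the tool's actual source: the function names `zalgo` and `strip_marks` are made up for illustration, and it assumes the marks come only from the Combining Diacritical Marks block (U+0300 to U+036F).

```python
import random
import unicodedata

# Every code point in the Combining Diacritical Marks block (U+0300..U+036F).
COMBINING = [chr(cp) for cp in range(0x0300, 0x0370)]

def zalgo(text, marks_per_char=3, seed=0):
    """Append random combining marks after each letter (hypothetical helper)."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalpha():
            out.extend(rng.choice(COMBINING) for _ in range(marks_per_char))
    return "".join(out)

def strip_marks(text):
    """Drop every nonspacing mark (general category Mn) from the text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

noisy = zalgo("diacriticism")
print(noisy)
print(strip_marks(noisy))
```

Note that `strip_marks` is blunt: it would also remove legitimate accents from text stored in decomposed form, so it is only suitable for undoing this kind of abuse on plain ASCII input.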
Home :: Emojicons
"Welcome to Emojicons, your one-stop plot of internet land for every ლ(╹◡╹ლ), ¯\_(ツ)_/¯, ಠ_ಠ, and (╯°□°)╯︵ ┻━┻ you can possibly imagine. We're here to serve your every textual need, providing a relentless number of ways to refine your chats, tweets, IMs, Facebook posts, YouTube responses, Reddit comments, forum flaming, rage quitting, trolling, and every other type of written discourse. Emoticons, kaomoji, facemarks, and smileys galore!

So indulge, coddle, and rampage through this site to find every emoticon that speaks to your whimsical soul, then fling copious amounts of it all across the internets :)."
emoji  emoticons  unicode 
february 2015 by robertogreco
Respect Myanmar Diversity: Use Unicode Fonts | ISIF Asia
"Burmese is the dominant language of Myanmar, but it’s had a long and winding journey in the digital realm, and now there is a tension between two competing systems to represent it online.

Unlike Latin script or pictograph scripts like Chinese, Burmese doesn’t use spaces between words and generally doesn’t fit into nice, tidy blocks that are easy for computers to render on a screen.

Almost all languages have fonts that adhere to the Unicode standard for the consistent encoding, representation and handling of text. In Myanmar the development of Unicode compliance had a very slow start, and until recently, there wasn’t a strong Unicode standard.

To help Myanmar enter the digital age, a group of individuals produced the Zawgyi font to represent Burmese script. Most of the tech elite learned to type using Zawgyi, and like the American Qwerty system, the network effects – from keyboards to typing classes – have made Zawgyi the most widely used font. However, its popularity doesn’t mean Zawgyi is the best font to use.

Technologically, Zawgyi is a nightmare for backend software development, as it requires extensive customization to present the font correctly. The font itself also needs to be installed on computers or mobile phones, which can be a technical hurdle for novice users.

But culturally, there is an even greater imperative to use Unicode instead of Zawgyi. Zawgyi is useless for typing other ethnic Myanmar languages that use Burmese script, like Sanskrit, Shan, and Mon. Myanmar already has a rocky history (past and present) with ethnic minorities, and we should not use any digital tool that excludes them or presents a barrier to their digital voice.

Unicode fonts support 11 languages that use the Myanmar script, including Burmese, Pali, Sanskrit, Mon, Shan, Kayah, Rumai Palaung, and four Karen languages. Unicode is now standard on Android devices, which are and will be the most popular way to get online in Myanmar, and over 30% of Myanmar government websites use Unicode.

So it is time for all of us to use Unicode fonts to communicate in Myanmar, so we can truly communicate with everyone."

[via: http://www.ictworks.org/2014/08/20/myanmar-will-be-the-first-smartphone-only-country/#comment-113453 ]
myanmar  unicode  language  languages  wayanvota  burmese  zawgyi  encoding  2014 
september 2014 by robertogreco
Erler Dingbats :: The World’s first Complete Unicode Dingbats Font
"For the first time in the entire history of the Unicode standard, the full encoding range for dingbats (U+2700–U+27BF) is now covered by a complete, contemporary quality font. Erler Dingbats is a spin-off of the distinguished FF Dingbats 2.0 family, and was designed as a special collaboration between designers Johannes Erler and Henning Skibbe."
henningskibbe  johanneserler  unicode  typography  free  dingbats  fonts 
november 2012 by robertogreco
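
To see what the dingbats range named in the Erler Dingbats entry above covers, the block can be enumerated directly. A small sketch using only Python's standard library; the constant names are illustrative, and the character names printed depend on the Unicode version bundled with the interpreter, so a fallback label is used for anything it does not know.

```python
import unicodedata

# Bounds of the Dingbats block as given in the entry: U+2700 through U+27BF.
DINGBATS_START, DINGBATS_END = 0x2700, 0x27BF

# One string per code point in the block, inclusive of both ends.
dingbats = [chr(cp) for cp in range(DINGBATS_START, DINGBATS_END + 1)]
print(len(dingbats))  # 192 code points in the range

# Show the first few code points with their names, if the interpreter has them.
for ch in dingbats[:8]:
    print(f"U+{ord(ch):04X}", ch, unicodedata.name(ch, "<no name in this Unicode version>"))
```

Whether each character actually renders still depends on an installed font covering the block, which is the gap the font in this entry is meant to fill.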
Neven Mrgan's Tumbl → Glyphboard 2.0
"Just in time for today’s release of iPhone OS 3.0 with its oh-so-handy pasteboard, I’ve updated a little project of mine, Glyphboard. It’s a sort of keyboard which lets you type glyphs not available on any of the standard iPhone keyboards. These glyphs include , ☂, ☺, ✔, and even ♫. You may find this handy for Twitter, text messaging, emails, and I’m sure I don’t know what else. A clarification: unfortunately Safari won’t let you just tap a key to copy it; you have to hold and tap. I wish you didn’t, but there. On the flip side, even though it’s a web app, once you’ve installed Glyphboard it will work even when you’re offline. How’s that!"
iphone  applications  webapp  characters  unicode  utilities  text  icons  glyphs  csiap  ios 
june 2009 by robertogreco
