It was very cool to see that Google now has transliteration for Ethiopic scripts available across a great many of its systems, including GMail. Despite what my brother-in-law thinks, given that he spent untold hours figuring out how to insert a turd icon with or without flies into a GMail message, I find this to be a vastly more useful tool, especially given how loath hardware manufacturers are to make something just for the African market and how Ethiopic keyboards are not in high supply.
The only downside is that it requires the rich text interface, which is a bit more bandwidth intensive. If your latency isn't super low, there is also a response time issue with the suggestions as well as the actual transliteration, which I didn't think was working at first. Also, you have to noodle around a bit to find out how to turn it on, although instructions are here.
I am impressed, though, with how they made it smart enough to transliterate sounds. As opposed to going from, say, Croatian Latin characters to Serbian Cyrillic characters, which is a one-to-one exchange, this required a good deal more work and thinkifying. I just question how well it works, as I did an easy transliteration up top which, as you can see from this Coca Cola photo, is considerably off, unless these are just variations of the fonts in different formats. It did take those eight characters with four syllables and produce four letters with four syllables, as it should have. Guess we'll see where this heads in time.
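To make the contrast concrete, here is a rough Python sketch of what a near one-to-one Croatian Latin to Serbian Cyrillic exchange could look like. The function name and the tables are my own illustration, not anything Google uses. The only wrinkle is that the digraphs lj, nj, and dž each become a single Cyrillic letter, and this sketch assumes every such pair is a digraph, which is not true for a handful of words.

```python
# Hypothetical illustration: Croatian Latin to Serbian Cyrillic.
# Digraphs are checked first because each maps to ONE Cyrillic letter.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def latin_to_cyrillic(text: str) -> str:
    """Transliterate by table lookup, keeping the case of the first letter."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        # Anything not in the table (spaces, digits, punctuation) passes through.
        cyr = SINGLES.get(ch.lower(), ch)
        out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


print(latin_to_cyrillic("Hrvatska"))
print(latin_to_cyrillic("ljubav"))
```

Nothing like this table works for Ethiopic, where each character stands for a whole syllable and the tool has to guess which run of Latin letters you meant as one sound.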
Normally I stick to technology here, but Gaddafi has finally hit a sour note with me as I read his latest ramblings:
...Nigeria should follow the model of Yugoslavia, after previously saying it should be divided into two - along the lines of India and Pakistan.
I assume that he's backpedaling because using the India/Pakistan split as the model for dividing Nigeria into separate countries would be suicidal. The Yugoslavia reference is just about as stupid given that that breakup cost a quarter million lives and is still not a solved problem 15 years later. Part of what made the Yugoslavia breakup "easier" than what happened in India is the fact that there were pre-established borders for the countries wishing to separate. This is not the case in Nigeria. You can't just draw these arbitrarily, as no one will ever be happy with the result. And maybe it should be mentioned that Muslim and Christian populations live side by side in most areas of Western Africa, as was much more the case in Bosnia Herzegovina than anywhere else in former Yugoslavia. Thus you end up with bloodshed and what is basically a failed unity government when you try to split things up.
It should be noted that one of the somewhat stabilizing (as well as heavily nationalizing) forces in former Yugoslavia was the concept of a singular, national language for each of the breakaway states. While Croatian, Serbian, and Bosnian are all basically the same language, people still unified behind a single national language, even though there are three very distinct dialects in a place like Croatia.
If one were to apply this type of thinking to Nigeria, take a look at this language map of the region. Those would be some pretty twisted up looking countries and it would still solve nothing.
The Kamusi Project has just tossed their hat into the ring of folks who are working to get African languages adapted to 21st century technology terminology. They made the official announcement about a site they've set up to try and further the goals of getting Swahili words adapted to computers. I mean, after all, according to Google Translate, 'computer' in Swahili is 'kompyuta' which for some reason I don't really buy as being a terribly Swahili word.
This frustration is echoed by Rebecca in a recent tweet:
Just done an interview abt @iHubNairobi with #BBCSwahili; I need to practice Swahili more....whats a domain in Swahili? More research needed
Again, according to Google Translate, 'domain' (as in web domain name) is 'miliki', which does sound like a proper Swahili word, but I'm assuming it has a completely different meaning, probably having to do with rule of land or something. And that's the issue: do you adopt some "3rd party" loan word for these purposes, or do you come up with a new word? Because let's face it, no one is probably going to call the 'web' 'mtandao', as it's just too long and everyone knows the word web now.
Let me emphasize that this doesn't just affect African languages. The problem exists everywhere, which is why a word like 'web' is just 'web' in Spanish despite there officially being no 'w' in the alphabet. It's also why speakers of Croatian will say SAD instead of Sjedinjene Američke Države for the United States of America due to the length.
I suppose that in the end there needs to be a balance of ease with authenticity when it comes to adding new words to a language. I just hope that efforts like the ones from Kamusi and ANLoc (site appears to be down?) gain some traction because it's a problem that isn't going to go away and will only get worse as time goes on. Just look at German, which currently has 8,000 loan words from English. At what point is your language (and thusly, your identity) no longer yours?
"Je suis low batt." In French, this literally means, "I am low battery." It doesn't make much sense on its own, but in the context of how Michela Wrong talked about it in her book, "In the Footsteps of Mr. Kurtz", it is a phrase that she uses to sum up the great wealth of issues which plague the Congo. It harks back to when the first mobiles arrived in the country, which had a very annoying tendency to die after some 20 minutes of talking. Thus, the speaker would typically have to tell the listener that they were out of battery and had to either switch phones or go charge. The term took on something of a life of its own and came to mean that something in general had run down.
We are at a point now here in Abengourou where the power cuts (or délestage if you will) have become regimented in that they're from 01:00 to 09:00 every day which coincides with no water during that time plus another 3-5 hours afterward as the system rebuilds pressure. Living life around this schedule is not what I would call choice, but it is doable, especially as you know that it's coming.
It just so happens that for the first time today, I heard of someone being asked how late he was out last night, to which he replied, "Oh, we were out past-cut," meaning past 01:00. I'm sure that others are saying to make sure not to flush the toilet until post-cut as well. As this way of life has become unfortunately ordinary (which is a shame, as the resources do indeed exist), we have taken to incorporating it into everyday language. No one probably even notices this, but it happens all the time, like when we say, "go Google it" when we mean to look something up online or "grab a Kleenex" when we mean a tissue.
I don't have the perception that people in Africa do this any more or any less than anywhere else in the world, but I find it more noticeable given that when it happens, it's usually a bending of pre-existing words or phrases, whereas in North America or Europe, it's the straight up adoption of a product name given the constant media and marketing blasts that permeate those societies a good deal more. Of course, many people here in Côte d'Ivoire keep insisting that the word for pen is "bic" instead of "stylo" or to grab a "Lotus" (a local brand) instead of a "mouchoir" so, I suppose the jury is out to some degree even still. We humans do enjoy our products; the power cuts, not so much.
It appears that this site has been registered since 2006, but I'm not sure how long gate2home has been in its current incarnation, which quite honestly, kicks ass. This site is an onscreen, virtual keyboard that allows you to type with the characters you need in your language and then paste them into whatever text program you need. And while it may initially look like it's one of those hunt and peck things that you can use in other systems, it's not. You can actually type from your computer keyboard and it maps the characters accordingly. The fellow who created it did so out of personal need, which is where I find a great deal of the best projects come from.
The reason that this caught my attention was the fact that when you open up the initial dropdown to choose your language, right there, bam, at the top is Akan. But, the African languages don't just stop there. There is also Bambara, Bemba, Fulfulde, Ga, Hausa, seSotho, Lingala, Yoruba, and a wonderful slew of others. And naturally there are a lot of other languages beyond the African ones, such as the developer's original Hebrew.
Having dealt with installing keyboard language packs and dealing with all the issues around the fact that American operating system manufacturers don't really care about languages in general, this is a godsend. Obviously, I probably wouldn't use it on an incredibly long-term basis (language pack, you are there for me on that for now...) but for short things, or maybe even decently long ones, this is really, really cool.
For anyone who writes and speaks in a language with characters beyond the extended Latin set, I really recommend checking this out and seeing how well it works for you. Or just keep it in mind if you, like the developer, find yourself in an internet cafe trying to use the local keyboards. I'm still traumatized by the Belgian French layout.
Following on my prognostication for 2010, I came across a page on Wikipedia for the total number of language speakers in the world. I applaud the fact that this list was created as it is interesting to see. It's just a shame that some of the figures are insanely inaccurate, which is probably why it has been proposed to delete the article.
One of my barometers on anything to deal with world languages is Croatian or Serbo-Croatian if you will. On this chart, it lists it as the 50th most spoken language in the world. That really doesn't seem correct and a great deal of the numbers are out of whack. Digging deeper, I see that some of the African language totals are worse than a stab in the dark. For the most obvious starter, take a look at Kiswahili. It's listed with 5 million native speakers and 80 million secondary speakers. Most accounts I've seen have it listed at 100-150 million speakers. Some documentation is needed there.
It's little things like this that make this list need a great deal of love and it's unfortunate to see that despite all the activity on it, so many of the figures are quite inaccurate. So, I ask of anyone out there with some language knowledge to document and contribute to this list in order to make it something a great deal more respectable, at least on the African front if nothing else.
When it comes to web technology trends, there is typically one that is the sexiest one for that year. For example, "mobile" was the one for 2009.
I'm going to go out on what I feel to be a rather thick limb and say that 2010 is going to be the year of language. We've been seeing multi-lingual efforts grow by leaps and bounds over the past years and it seems that we're getting to a point where most people I know say, "Hey, Google Translate doesn't just simply translate literally, but it's actually quite good." The web has matured in the possibilities it allows in being able to cross the borders formed by language.
Nowhere is this more the case than in Africa. I see 2010 as a pivotal year in African languages getting online. Jimmy Wales wants more African languages in Wikipedia and there has been a good deal of push by Google in this department with their Kiswahili Wikipedia Challenge that the Google Africa blog covered two weeks after it was over--how timely. But, the fact is that while all kinds of money and effort can be tossed at getting more African languages into a digital format, if it doesn't come from Africans, it's not going to take root.
While a great many African languages were alphabetized into Latin character sets a century ago by missionaries, it's unfortunate to see that despite this, so many languages, while spoken, are not able to be read or written (Kiswahili and a handful of others are indeed working to buck this trend). I would posit that while these alphabets exist, for the most part, they weren't created by those speaking the languages from birth. They were an artificial, external force that didn't stay around.
By comparison, a bit before the time that missionaries were traipsing about Africa, putting these historically oral languages to text, the Romantics in Europe were busy standardizing their languages. Pompeu Fabra, Vuk Karadžić, Ferenc Kazinczy, Alessandro Manzoni, and a slew of others were refining the languages that they had grown up with. But, instead of formalizing their languages in order to spread religion, they were doing so in order to spread the language.
It needs to be said that Amharic and other languages in Africa did indeed have established alphabets, but compatriots of these European Romantics were busily trouncing African languages through Colonialism. While enforcing English, French, Portuguese, and Spanish as lingua francas may have been practical (yet brashly inhumane) in the artificially created borders of a colony that may have had upwards of 100 or more languages and dialects, it set up a system that we still see in place today. This is especially so in Anglophone or Francophone African countries where the local languages are spoken on familiar, yet not official terms. There have been strides made to try and stem this linguistic undertow of the last century as seen in Tanzania, Mali, and others, where education in the local languages is either being proudly enforced or at least investigated.
The problem in all of this is that spreading a language in an official capacity is expensive and English has (like it or not) become the business language of the world. Dictionaries are not cheap to print and institutions are not set up overnight, let alone the fact that you need people able to read and write in these languages in the first place, and they are in constantly dwindling numbers. Taking on the creation of language institutions for an entire country to function is not easy to propose, especially if there are several languages to consider.
So enters the internet and more importantly, the point where we are at with language on the web in 2010. Wikipedia, Google, Facebook, Twitter, and others (such as this site) are all taking the fact seriously that any 21st century web business model now needs to include a multi-lingual environment to reach the maximum number of users.
Kiswahili has been the golden child in all of this, making use of many of these crowd-sourced technologies to bolster its online presence. While Google is trying to promote competitions, these linguistic efforts can be self-started and homegrown. In fact, to truly succeed, I think that they have to be, as people need to convince themselves first and everyone else second. Of course, many people will very well be asking, why bother?
Google doesn't need to destroy all the data it can't index because it's going to reach a point where if it isn't online, then it will disappear from our collective knowledge. We're at a pretty crucial tipping point where all the languages that are going to be carried forward with us need to get online now, or they will simply cease to exist due to the original speakers dying off or a language like English or French supplanting them. While a monolingual culture may seem easier for people, the fact of the matter is that your identity is tied up in your language and if you lose your language, you lose your culture. The global corporations would love for us all to have the same language and buying habits, but I'm of the opinion that in losing the languages and cultures which define us, we basically lose us.
So, let's keep the languages jumping as this new decade takes on the digital preservation of all our languages.
A couple of days ago, McAfee released the information that .cm was the most dangerous domain on the internet currently. It's not so much that Cameroon has more internet scammers, but more that the internet scammers of the world have turned to .cm domains to create malware sites when people mistype a .com address. Unfortunately I think that a lot of people are now going to associate the country of Cameroon with being full of nefarious net thugs, which is quite unfortunate, as I say it's simply not true. You can read a full breakdown of all of this in a PDF on McAfee's site. (Yeah, a PDF is kinda like a fax machine, if you were wondering...)
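The mistyping mechanism is simple enough to sketch in a few lines of Python. This is only my own illustration of the idea (the function name is made up, and it has nothing to do with how McAfee rates domains): drop the "o" from ".com" and you land on ".cm", so any .cm name has an obvious .com it can be mistaken for.

```python
from typing import Optional


def com_lookalike(domain: str) -> Optional[str]:
    """Return the .com name a .cm domain could be mistaken for, else None."""
    name = domain.strip().lower().rstrip(".")
    if name.endswith(".cm"):
        # Swap the ".cm" ending for ".com" to recover the likely intended site.
        return name[:-3] + ".com"
    return None


print(com_lookalike("Example.cm"))
print(com_lookalike("example.hr"))
```

The second call comes back empty, which is the point made further on: there is no popular extension that ".hr" is one slipped keystroke away from.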
The funny thing in this is that as I state on my about page, rarely is it the case that I am able to combine my Croatian lineage with my interests in African tech. Well, it appears that this is one instance (again) where there is actually an overlap.
As a complete opposite to .cm, it turns out that the .hr domain for Croatia is one of the safest in the world--Croatia in Croatian is Hrvatska, thus the HR. Again, this doesn't mean that there are fewer scammers in Croatia, it just means that of those sites using the .hr domain name, there are fewer that are harmful on the web. Why is this?
To start out, a Croatian domain is considerably more expensive to register than a .cm, so that does play into things to some extent. Then there is the fact that unless you have a Croatian website, in Croatia, .hr sucks as a domain extension. No one in their right mind would bother to register that for typo mistakes because really, there are none to be had. This all makes it a safer domain purely due to being less desirable.
But beyond these two points, there is something else that plays into this in that you have to be a Croatian citizen or have a Croatian company (incorporated in Croatia) to purchase one of these very expensive domains. This in effect limits the possible buyers to a maximum of about 5 million. While that sounds like a lot, think about the fact that from what I found, it seems that anyone can register a .cm domain. This creates a potential pool of billions of buyers. Obviously your chances of having a couple of bad apples in the bunch rise a great deal in this.
I think that when saying .cm is the most dangerous domain on the internet (or .cn or .hk or whatever) there needs to be a total given along with this to state how many of these sites are actually registered by the people of that country. I'd bet good money that if you did that, you'd see that nearly none of the malware idiots are Cameroonians because the overall penetration of the internet in Cameroon is around 3% currently. So, people just don't have the access to go about creating some nefarious site when there are much better things like email, Facebook, or other communication tools to use when one has limited and very expensive net time.
In all honesty, I think we screwed up (or rather the US with ICANN screwed up) in creating non-country specific domains in the first place such as .com, .net, .org, .info, .travel, .biz, etc. I think that if we only had domain extensions per country to date and you had to be a citizen of that country to get one, things would look a great deal different in internet land and I have absolutely no idea who on the net would have the "most dangerous domain" honors.
Google has really been busy on the language side of things lately. This wouldn't be news to anyone except translators and multilingual folks except for the fact that they introduced more African languages to their mix of available languages for translation and so it's suddenly become a good deal more important for Africa as it is boosting cross-communication abilities on many fronts.
First off is the new Google Translate. I use this system quite often, so I noticed right away when they made the switchover a couple of days ago. There were some bumps in the transition which I'm assuming were due to the work being done at off peak hours in the US, but very much on peak hours for those of us on UTC or UTC+1.
In general, I like the new format. It's definitely snappier overall for quick translations. What I don't like is that it's quite heavily AJAX driven (as are most things these days) and I'm curious as to how well it would perform in a low bandwidth setting. I'm hoping that someone can give that a go because, try as I might to throttle my connection, I can't seem to get it to downscale to the point that I feel is properly representative of a low bandwidth connection.
Something that's also rather new is the speaking voice for English target translations. This is really quite important as the English alphabet is complete garbage when it comes to writing how the language is spoken and I'm sure that non-English speakers will get no end of enjoyment out of wondering how on earth through, threw, and thru all sound the same. What would be nice is if, in addition to the Roman alphabet transliteration for languages like Chinese, they did this for English as well...
Of course the big news in translation land is the automatic captions for YouTube. These are huge and quite frankly, it's about time that a major video platform finally added in some proper subcaptioning abilities. Sadly, it will probably mean the death of dotSUB, which is a platform that I like to varying degrees, but it's easy to understand why people were lax about adding subtitles, as it was a great deal of work to create and then translate the text. Google takes the approach of "machine bash into shape. human refine. everyone love." and I think that it will work quite well overall. Obviously once they fully deploy the system and people start to use it more, we'll see it refined a great deal. But it's good that Google's YouTube brand has finally started making good use of the Google abilities such as machine translation.
My only wish is that Vimeo would do something similar and maybe because of this, they will. Honestly, they should just buy out dotSUB or something to that effect. Their interface, video quality, and overall ease of use are vastly superior to YouTube, with YouTube being kinda like a Spanish croissant in that it's okay overall, but once you dig into it, it kinda sucks a great deal...
One of the big chunks of news to come out of the ICANN meeting in Seoul, Korea was a final timeline and implementation guideline to have internet domains in non-Latin characters. Honestly, I wasn't even going to write about it as I am much more interested in seeing how the implementation comes about and how it shakes down. But, in poking around for news about it, I came across this Pros & Cons article. I am nearly amused by the con comments as I'd really like to know if the people making them are a) English speakers and b) monolingual. They're just not well thought-out and such incredible straw man arguments that I would laugh if it wasn't the case that comments like these could derail the whole process of creating a proper multilingual internet.
Expanding beyond Roman characters also increases potential for site rip-offs that use homoglyphs, characters with identical or indistinguishable shapes.
Pfft. Then we should just shut down the internet and resolutely solve the problem. I mean, people die in car accidents every year. Should we not create new cars because people could die in the new cars when they're currently dying just fine in the old cars? This reasoning is not logical and sounds like a veiled attempt to excuse laziness in making this switch because hey, it works now, so why change?
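For anyone who hasn't seen the homoglyph issue from the quote above in action, here is a small Python sketch. The domain label is a made-up example and the check is a rough heuristic of my own (it uses the first word of each character's Unicode name as a stand-in for its script); real registries and browsers apply far more elaborate rules.

```python
import unicodedata

latin = "example"
# Same look on screen, but the "a" is CYRILLIC SMALL LETTER A (U+0430).
mixed = "ex\u0430mple"


def scripts_used(text: str) -> set:
    """Rough heuristic: first word of each letter's Unicode name, e.g. LATIN."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


def looks_mixed(text: str) -> bool:
    """Flag labels that mix letters from more than one script."""
    return len(scripts_used(text)) > 1


print(latin == mixed)
print(scripts_used(mixed))
print(looks_mixed(latin), looks_mixed(mixed))
```

The two strings compare as different even though most fonts draw them identically, and a check along these lines is cheap, which is part of why I don't buy this as a reason to hold back non-Latin domains.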
Adding support for 100,000 international characters would make traditional keyboards insufficient input devices for accessing the entire Internet. As fellow PC World writer Jacqueline Emigh pointed out, it would be next to impossible to produce a keyboard that could support characters from every language under the sun.
Really? Are you serious? Depending on what I'm working on, I typically have up to four keyboards installed on my machine: English ISO, Spanish ISO (which also has the French characters), Croatian, and Cyrillic. I can probably type at least 1,000 different characters by easily swapping the active keyboard. I'm using Windows XP, which is old. Windows Vista and Mac OS X are even better in this department. We've had this "amazing" technology around for over a decade. It's easy to switch and it works fine. And really, if I need to go to a domain that has French characters in it, wouldn't I probably be using a keyboard that supported the French characters already? Also, the English QWERTY keyboard was designed to have you type slower, so isn't it about time to update it anyway?
I realize that people are shuddering to think that this could establish "language silos" on the internet. Only an English speaker would think this because currently, imagine how it is for a Russian typing with a Cyrillic keyboard to have to switch all the time to Latin characters just to enter a domain? The silos will develop no matter when and if they're going to develop. I think that due to all the language work that's going on these days, we are actually entering an age of far better cross-communication than ever before.
All of this doesn't affect Sub-Saharan Africa as much as other regions due to the fact that African languages (with the exception of Amharic) were alphabetized using Latin-based alphabets. But the one thing that would be great out of this is that a language such as Lingala, which was created with accented characters, doesn't get "Anglicized" as often when written on the internet and the characters actually stick around.
If you are an English speaker and don't currently have it, I recommend switching to the US International Keyboard. It doesn't ship as the default with operating systems for some insane reason, but it offers up a huge swath of other characters to access just by using one additional key.
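To show what that "Anglicizing" amounts to in practice, here is a short Python sketch of the kind of accent stripping that ASCII-only systems effectively force on a written word. The function name is my own and the sample words are just illustrations; the technique is plain Unicode decomposition followed by dropping the combining marks.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose each character (NFD), then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "lingála" loses its tone mark entirely.
print(strip_accents("ling\u00e1la"))
# Letters like ɔ (U+0254) survive, but the accents written above them do not.
print(strip_accents("m\u0254\u0301k\u0254\u0301"))
```

Even this is the gentle version. A strictly ASCII domain system couldn't keep a letter like ɔ at all, so the word would be flattened further still.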