I'm currently learning Arabic. This will be my seventh language, and while taking notes in class, I suddenly realized that writing in notebooks doesn't actually produce usable data – so I decided to make a spreadsheet of vocabulary terms to help myself study.
Then, I thought, "why not put this in a Google spreadsheet? Then I could share it with my classmates to help us all study." My daydreaming mind being what it is, this set me upon a path imagining what it would be like to create an open database of the most essential and basic vehicle of shared knowledge that humanity has developed - language.
Arabic presents a range of unique challenges to language learning. While resources for Romance languages are widely available to assist autodidacts, Arabic is very different. It has no Académie française that sets the gold standard of correct speech. Dialects, terminology and pronunciation can vary greatly by region in both the written and spoken forms of the language, with varying degrees of mutual intelligibility. "Classical" Arabic, Modern Standard Arabic, and FusHa (or Fus7a, in the contemporary Latinized written form popular on the web) are all very different from the spoken forms, which take on different characteristics in Egypt, the Levant, the Maghreb, East Africa, or the Gulf region.
Where a French language learner can translate phrases using a smartphone with relative ease, Google Translate is practically unusable for Arabic. Google Translate currently offers only FusHa, and is poor at accurately translating grammatical structure. FusHa is not used in spoken dialogue, and is basically the equivalent of Olde Shakespearean English. People on the street will laugh at you if you use it.
Most traditional textbooks and learning programs teach FusHa, and students rely on their instructors' knowledge of regional dialects to learn the colloquial Arabic used in everyday situations. In addition to the challenges of spoken Arabic, most script in books, on signs, and in everyday writing is presented without the vocalization markings. Native speakers usually see these only in primary school, but they are a godsend to new Arabic learners. Without them, even knowing the full alphabet and the correct spelling won't tell you how to pronounce a word, or even which word it is - the same letters with different vocalization markings (which aren't written) can be totally different words. Resources are limited in this field, and (quite surprisingly) even with the rich history and expanse of Arabic literature, there does not currently exist on the internet any single comprehensive dictionary that combines FusHa with all of the regional Arabic dialects and accents, with written vocalizations of the colloquial spoken forms – which are only recently becoming standardized in written form. Yep, Arabic is complicated!
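To make the vocalization problem concrete, here is a minimal Python sketch. It assumes the markings are stored as Unicode combining marks (which is how the standard Arabic short-vowel signs are encoded), and the function name `strip_vocalization` is my own invention for illustration. It shows two different vocalized words collapsing to the same three letters once the markings are dropped, which is how they usually appear in print.

```python
import unicodedata

# Two vocalized words, written with escapes so the marks are explicit:
# "kataba" (he wrote) = kaf + fatha, ta + fatha, ba + fatha
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"
# "kutub" (books) = kaf + damma, ta + damma, ba
KUTUB = "\u0643\u064F\u062A\u064F\u0628"
# The bare letters kaf, ta, ba, as they would usually be printed
SKELETON = "\u0643\u062A\u0628"


def strip_vocalization(word):
    """Drop combining marks (Unicode category Mn), keeping only the base letters."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Different words when vocalized, identical once the marks are removed.
print(KATABA != KUTUB)
print(strip_vocalization(KATABA) == strip_vocalization(KUTUB) == SKELETON)
```

Going the other way, from bare letters back to the intended word, is the hard part, and nothing in this sketch attempts it.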
I am only a beginner at learning Arabic. But I have been learning languages all my life – the most recent one being the most universal and useful language of all: code.
This is a project that will start out simple - with a modest .csv spreadsheet that I am using by myself to build and annotate my vocabulary list. It will soon grow to encompass multiple sheets, one each for nouns, adjectives, basic expressions, grammar - then verbs - then their conjugations. For each entry, there is an English infinitive, then the Arabic. Then an English transliteration. Then a FusHa translation. Then a Levantine regional translation.
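As a rough sketch of what one such sheet might look like, here is a small Python example using the standard `csv` module. The column names are my own guesses at the fields described above, the two rows are illustrative only, and because the difference between the "Arabic" and "FusHa" columns isn't pinned down yet, the sample simply reuses the same form in both.

```python
import csv
import io

# Hypothetical column names; only the kinds of fields are decided so far.
FIELDS = ["english", "arabic", "transliteration", "fusha", "levantine"]

# Illustrative rows with informal transliterations, not a vetted word list.
ROWS = [
    {"english": "what", "arabic": "ماذا", "transliteration": "maadha",
     "fusha": "ماذا", "levantine": "شو"},
    {"english": "I want", "arabic": "أريد", "transliteration": "uriid",
     "fusha": "أريد", "levantine": "بدي"},
]


def write_vocab(rows):
    """Serialize vocabulary rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_vocab(text):
    """Parse CSV text back into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = write_vocab(ROWS)
entries = read_vocab(csv_text)

# Look up the Levantine form for each English term.
for entry in entries:
    print(entry["english"], "->", entry["levantine"])
```

One column per dialect is fine for a first sheet, but every new region means another column, which is exactly the pressure that pushes this toward a database later on.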
Then there will be expressions that vary by region, sometimes even city by city, as in the case of geographically fragmented Palestine. How is this word pronounced in Yemen? Is there a colloquial form in Tunisia? What about Sudan? What phrases are unique to Egypt? Is it Cairo, or just the Upper Nile? There will be annotations and footnotes for these values. From there, entries for alternative or equivalent translations will emerge, and values will need to be linked together. At some point, this modest (yet by now, robust) .csv will need to evolve into a relational database. By then, the possibilities for data storage and query will grow exponentially, allowing for interfaces far beyond the navigation of a spreadsheet editor - and this is where the power lies.
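Here is one way that step could look, sketched with Python's built-in `sqlite3` module. The table names (`entry`, `variant`), the columns, and the helper functions are all hypothetical, and the sample words are illustrative. The idea is simply that each English term gets one row, and every regional form links back to it, so adding Yemen or Tunisia means adding rows rather than columns.

```python
import sqlite3

# In-memory database for the sketch; a real project would use a file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    id      INTEGER PRIMARY KEY,
    english TEXT NOT NULL UNIQUE
);
CREATE TABLE variant (
    id       INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    region   TEXT NOT NULL,
    arabic   TEXT NOT NULL,
    note     TEXT
);
""")


def add_variant(conn, english, region, arabic, note=None):
    """Attach a regional form to an English term, creating the term if needed."""
    conn.execute("INSERT OR IGNORE INTO entry (english) VALUES (?)", (english,))
    entry_id = conn.execute(
        "SELECT id FROM entry WHERE english = ?", (english,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO variant (entry_id, region, arabic, note) VALUES (?, ?, ?, ?)",
        (entry_id, region, arabic, note),
    )


def variants_for(conn, english):
    """Return (region, arabic) pairs for a term, sorted by region name."""
    cursor = conn.execute(
        """SELECT v.region, v.arabic
           FROM variant AS v JOIN entry AS e ON e.id = v.entry_id
           WHERE e.english = ?
           ORDER BY v.region""",
        (english,),
    )
    return cursor.fetchall()


# Illustrative data only.
add_variant(conn, "what", "FusHa", "ماذا")
add_variant(conn, "what", "Levantine", "شو")
add_variant(conn, "what", "Egyptian", "إيه", note="footnotes would go here")

for region, arabic in variants_for(conn, "what"):
    print(region, arabic)
```

City-level variation or links between equivalent expressions would need more tables than this, but the same pattern of linking rows by id carries over.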
But it doesn't stop there - why only English to Arabic? Why not other languages? This is where the project will take a metaphysical turn, applying the open source approach to maintaining languages that are diminishing from the globe, and that are also undergoing transformations due to the exigencies of the internet.
It will take on not just a philosophical dimension, but a political one as well. As dozens of languages become extinct with each passing year, why is it that Google gets to decide which languages are immortalized by its translation services? Why hasn't it expanded compatibility for Arabic, when it is one of the most widely spoken languages in the world?
Let's work on this together, and see where it goes!