viznut's amazing discoveries

[2007-09-01]

Saving the world with a metahuman language

I now feel an immense urge to finally reveal an idea related to the automatic processing of natural language. (Yeah, in case you didn't know, I also have a life-long interest in languages and linguistics.)

Sometimes, writing text in a natural human language (such as English or Finnish) feels like using a computer assembly language. Here are some reasons why:

Therefore, instead of writing directly in a natural language, I would like to use a metalanguage where I describe the thoughts I want to express, and then run an automatic compiler that translates the "source code" into various natural languages.

This metalanguage would not be like an auxiliary constructed language (such as Esperanto or Lojban), but something specifically designed for unambiguously describing "what you want to say".

So, instead of playing around with sentence structures, you would simply define objects and relationships between them, and the compiler would take care of creating a nicely-flowing output from this information. You could even describe typos and ambiguities in the metalanguage if you want to include such quirks in the final text.
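To make "objects and relationships" a bit more concrete, here is a minimal sketch in Python of what the source side might look like. Everything in it is an assumption made up for illustration: the names Entity, Relation and realize_english, the concept identifiers, and the tiny hand-written lexicon. It is not a proposal for the actual metalanguage, only a toy showing that the writer states who does what to whom and the generator picks the articles, word order and verb form.

```python
from dataclasses import dataclass

# Hypothetical data model: the "source code" is a set of objects and
# relations between them, not a sentence.

@dataclass(frozen=True)
class Entity:
    concept: str           # language-independent concept identifier
    definite: bool = True  # already known to the reader, or newly introduced


@dataclass(frozen=True)
class Relation:
    predicate: str         # concept identifier for the action or state
    agent: Entity          # who does it
    patient: Entity        # whom it is done to


# Toy English lexicon, hand-written for this example only.
ENGLISH_NOUNS = {"dog": "dog", "cat": "cat"}
ENGLISH_VERBS = {"see": "sees", "chase": "chases"}  # third person singular, present


def realize_english(rel: Relation) -> str:
    """Turn one relation into a simple English sentence."""

    def noun_phrase(entity: Entity) -> str:
        # "a" is only right because every noun in the toy lexicon starts
        # with a consonant sound; a real generator would handle "an".
        article = "the" if entity.definite else "a"
        return f"{article} {ENGLISH_NOUNS[entity.concept]}"

    sentence = (
        f"{noun_phrase(rel.agent)} "
        f"{ENGLISH_VERBS[rel.predicate]} "
        f"{noun_phrase(rel.patient)}."
    )
    return sentence[0].upper() + sentence[1:]


thought = Relation("chase", Entity("dog"), Entity("cat", definite=False))
print(realize_english(thought))  # The dog chases a cat.
```

The point of the sketch is that the writer never touches sentence structure: changing definite or swapping agent and patient changes the output text without any rewriting by hand.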

The benefits would be immense. Provided that the metalanguage has a sensible and human-friendly grammar, it would save a lot of mental resources for more obvious purposes. A very important consequence would also be an automatic and accurate translation into any natural language that happens to have a "generator back-end" in the compiler.
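The "generator back-end" idea can be sketched as a table of per-language functions that all consume the same source. The following Python toy is only that, a toy: the BACKENDS registry, the function names and the three-element Thought tuple are hypothetical, and the English and Finnish word forms are hard-coded for two nouns and one verb rather than produced by real morphology.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical source form: (predicate, agent, patient) as concept identifiers.
Thought = Tuple[str, str, str]


def capitalize(sentence: str) -> str:
    return sentence[0].upper() + sentence[1:]


def english_backend(thought: Thought) -> str:
    # English marks the roles with word order and needs articles.
    verbs = {"see": "sees"}
    nouns = {"dog": "the dog", "cat": "the cat"}
    predicate, agent, patient = thought
    return capitalize(f"{nouns[agent]} {verbs[predicate]} {nouns[patient]}.")


def finnish_backend(thought: Thought) -> str:
    # Finnish has no articles and marks the object with a case ending,
    # so the lexicon stores a subject form and an object form per noun.
    verbs = {"see": "näkee"}
    nouns = {
        "dog": {"subject": "koira", "object": "koiran"},
        "cat": {"subject": "kissa", "object": "kissan"},
    }
    predicate, agent, patient = thought
    return capitalize(
        f"{nouns[agent]['subject']} {verbs[predicate]} {nouns[patient]['object']}."
    )


# Adding a language means adding one entry here; the source is untouched.
BACKENDS: Dict[str, Callable[[Thought], str]] = {
    "en": english_backend,
    "fi": finnish_backend,
}


def compile_thought(thought: Thought, targets: List[str]) -> Dict[str, str]:
    """Run every requested back-end on the same source."""
    return {lang: BACKENDS[lang](thought) for lang in targets}


result = compile_thought(("see", "dog", "cat"), ["en", "fi"])
print(result["en"])  # The dog sees the cat.
print(result["fi"])  # Koira näkee kissan.
```

Each back-end is free to express the same relation in whatever way its language requires, which is why nothing in the source form mentions articles, cases or word order.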

Of course, there are existing projects that resemble this idea, such as the Universal Networking Language project of the United Nations. However, UNL is only designed as a "pivot language", or an intermediate representation for machine translation, not as something one would like to write directly. Besides, the UNL project is closed, patented and restricted, and I also have the impression that it is somewhat stuck and hasn't yet reached a practically usable state.

From what I understand about language technology, I suspect that the analysis phase of machine translation is far more difficult and error-prone than the synthesis phase. The analysis phase includes things such as parsing the source text, constructing a concept tree, deciding between alternative interpretations, guessing what was left out, filling the gaps, etc. -- all this would be avoided if the source text were in an unambiguous metalanguage instead of another natural language.

I feel that a software project like this would be immensely important for humankind -- far more important than all the various Linux distributions combined. Just imagine the consequences! And do not forget that, despite the illusion the Internet might give you, most of the people living on this planet do not understand English at all. Also remember all the savings of mental resources, all the possibilities to generate documents automatically, all the possible "toy" uses (random text generators, fictional languages), and so on.

I've had this idea for a long time, and I've also been planning a simple "proof-of-concept" implementation that could be gradually expanded so that it would eventually be useful for, say, publishing this weblog in two languages at a time.

However, I've always felt that the idea of a practically oriented "metahuman language" is so obvious that there must be an existing project like this among the experts of computational linguistics. So, I've left ideas of this kind for them to process, as they are probably a long way ahead of me.

Still, I'm unable to find any existing projects closely resembling my idea, and I just can't understand why. This could be just perfect for an open collaborative project that actually benefits humankind. And besides, I believe that, unlike natural language translation projects, which tend to get big money and committees before reaching any usable results, this "metahuman language" could begin as a small and simple hobbyist project.