Future Direction for i18n of Web Applications #50

@turquoiseowl

No doubt we have all come to this i18n project looking for a better way to internationalize our web applications. We can see that the old .NET resource lookup approach is backward. I expect we also see that leveraging the PO infrastructure to get messages translated is the way forward.

Unfortunately, the PO infrastructure (i.e. the GNU Portable Object file format specification and the world of tools for translating the files) is very much tethered to GetText, the latter being very backward IMO.

Whoever invented GetText had a brainwave: we can encode strings in our source code in such a way that A) they can be hooked at run-time, and B) we can find and extract those strings from the source code. A very nice duality! So he or she wrote a library of functions that can look up and swap message strings, and a tool for scanning source code files for those function calls and extracting the message strings to be translated. It therefore assumes that all your message strings are contained in source code files which it can parse, and that they can be encoded as an argument to a function call e.g. _("Translate me!");

For someone facing the problem of how to internationalize a GUI app written in C, GetText is a good approach. For a back-end server program (like a web application), I suggest it is also reasonable, but not the best. With a back-end application, we have access to the output stream, and with a web application it is very easy to get at the HTTP response and do our translations there.

Now, as soon as we drop one side of the duality, one might start to wonder about the other side.

The question is: why bother with all those _() functions when we only need them to mark the message strings (given that we can hook into the HTTP response body)? The reason, of course, is that we still need to mark the message strings so they can be extracted into the PO file. Okay, but if we were going to choose a method for marking message strings for extraction, unhindered by any consideration other than reliability, would we choose prefixing the string with _(" and suffixing it with ")?

There must be a better way to mark message strings, so that they can be easily picked up in source code and the HTTP response. The same algorithm can be used for both. Better still would be compatibility with SQL LIKE so that they can be extracted from database tables too e.g. product descriptions.
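As a rough illustration of the database case, the sketch below uses the triple-bracket delimiters quoted at the end of this post and an in-memory SQLite table. The `products` table, its columns and the `extract_from_products` helper are hypothetical. Note that some SQL dialects (SQL Server, for example) treat `[` as a LIKE wildcard, so the pattern would need escaping there; SQLite treats it literally.

```python
import re
import sqlite3

# Assumed marker: [[[message]]], as quoted at the end of this post.
NUGGET_RE = re.compile(r"\[\[\[(.+?)\]\]\]", re.DOTALL)


def extract_from_products(conn):
    """Collect marked message strings from a hypothetical products table."""
    rows = conn.execute(
        "SELECT description FROM products "
        "WHERE description LIKE ? ORDER BY id",
        ("%[[[%]]]%",),  # LIKE narrows the rows; the regex does the real parsing
    )
    msgids = []
    for (description,) in rows:
        msgids.extend(NUGGET_RE.findall(description))
    return msgids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO products (description) VALUES (?)",
    [("[[[Red bicycle]]]",), ("SKU-1234",), ("[[[Blue kettle]]] (2 L)",)],
)
print(extract_from_products(conn))
```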

The marking can be done in the string itself, so message strings can be written straight into source files without the need to call any helper functions. This is very useful for const strings such as those in C# attributes and data annotations. The markers would be entirely language-independent: C#, Razor, JavaScript, HTML. They can also be written straight into database fields. No need to think "how do I access that helper function?"

Performing the translations at the HTTP response layer has the advantage of confining message look-up and patching to a single place, hence efficiency gains. It reduces dependency on any particular web development platform; we can forget about MVC and drop down the stack to ASP.NET (or even lower).

So where are we with this? With Issue #37 I have taken a stab at defining a suitable message marking syntax, called the Nugget syntax. I have no doubt there is scope for improving the syntax (and for finding a better name). It would be great to have a discussion with you guys on this. I'm sure we can come up with a syntax that is easy to remember and use, and yet robust. Support for string formatting is essential (i.e. {0} substitution), and pluralization would be nice.

With the marking syntax defined, the only outstanding work is to swap out (or augment) the GetText-dependent post-build task with new logic for extracting the marked message strings and adding them to the PO output. My preference here would be to drop GetText altogether (along with the _() calls), but that would mean dropping backward compatibility for projects.
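A sketch of what such extraction logic might look like, again in Python for brevity and assuming the plain `[[[...]]]` delimiters quoted at the end of this post (no formatting or comment extensions). The function names and the file-name-only `#:` references are illustrative choices, not the project's actual post-build task.

```python
import re

# Assumed marker: [[[message]]], as quoted at the end of this post.
NUGGET_RE = re.compile(r"\[\[\[(.+?)\]\]\]", re.DOTALL)


def po_quote(text):
    """Escape a string for use inside a PO double-quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def extract_to_po(sources):
    """Build PO text from a mapping of file name -> file contents."""
    seen = {}  # msgid -> file names where it occurs, in first-seen order
    for name, text in sources.items():
        for msgid in NUGGET_RE.findall(text):
            refs = seen.setdefault(msgid, [])
            if name not in refs:
                refs.append(name)
    entries = []
    for msgid, refs in seen.items():
        entries.append("\n".join([
            "#: " + " ".join(refs),            # reference comment
            'msgid "%s"' % po_quote(msgid),
            'msgstr ""',                       # left empty for the translator
        ]))
    return "\n\n".join(entries) + "\n"
```

Because the scanner only looks for the marker, the same function can be pointed at C#, Razor, JavaScript or HTML files without knowing anything about their grammar.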

The v2.0 branch includes all the other support necessary for post-processing the HTTP response. At the moment it has support for processing the Nugget marking syntax, and changing that to support any new syntax would be trivial.
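The post-processing step itself can be pictured roughly as follows. This is a Python sketch under stated assumptions, not the v2.0 branch's code: it assumes the plain `[[[...]]]` delimiters quoted at the end of this post, a hypothetical `catalog` dictionary standing in for the loaded PO data, and a response body already available as a string.

```python
import re

# Assumed marker: [[[message]]], as quoted at the end of this post.
NUGGET_RE = re.compile(r"\[\[\[(.+?)\]\]\]", re.DOTALL)


def translate_body(body, catalog):
    """Replace each marked string in a response body with its translation.

    Falls back to the original message text when the catalog has no entry
    (or an empty one), so untranslated strings still render without markers.
    """
    def swap(match):
        msgid = match.group(1)
        return catalog.get(msgid) or msgid

    return NUGGET_RE.sub(swap, body)


html = "<h1>[[[Welcome]]]</h1><p>[[[Unknown]]]</p>"
print(translate_body(html, {"Welcome": "Bienvenue"}))
```

Since all the matching lives in one pattern, switching to a different marking syntax would mostly mean changing that pattern.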

We then get to keep the best bits of the GNU translation project:

  • PO message file format
  • PO editor tools including collaborative ones

For a few months now I have been developing a web app using the i18n v2.0 branch, where there is the option to encode a message string as either _("Translate Me") or "[[[Translate Me]]]". Given that the latter takes no extra thought other than including the [[[ and ]]], it wins every time.

Martin Connell
