Common internationalization misses

As the world becomes more connected and expectations grow for supporting global users and global businesses, globalization (g11n), internationalization (i18n), and localization (l10n) are increasingly treated as functional requirements rather than nonfunctional ones.  Everyone focuses on supporting RTL layouts, date-time formats, and translations.  But here are some commonly overlooked internationalization challenges that nearly every site build misses on the first try.

But first, some definitions…

Everyone has heard internationalization and localization bandied about, but if you ask what they really mean you often get blank stares.  So far the best definitions I’ve found are from the World Wide Web Consortium (W3C):

Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).

Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.

— Localization vs. Internationalization (https://www.w3.org/International/questions/qa-i18n)

Most tools and frameworks out there do a great job of localizing content once it’s been added to a system.  They also tend to offer good support for internationalization out of the box with language contexts, timezone support, and layout changes based on language.  However, like any tool, just because a solution can do something well doesn’t always mean it is being used correctly.  These common mistakes have plagued nearly every internationalized project I’ve ever worked on.

Dates and times

Yep, believe it or not these can be a real problem.  I’m not talking about simply localizing the output but about collecting the data in the first place.  If you let users type raw date and time values (is 9/10/2015 September 10 or October 9?), you introduce inconsistency into the values stored in the system.  The same goes for times: some people will enter 24-hour values and others AM/PM, and you have to account for both.  Oh, and don’t forget that many cultures don’t use the Gregorian calendar common in the Western hemisphere.

The simple solution?  Use a date and time picker widget that hands the software a consistent date and time format it can then act on.
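To make the ambiguity concrete, here is a minimal Python sketch.  It assumes the picker submits an ISO 8601 date string (YYYY-MM-DD); that is a common convention rather than something every widget guarantees, so check what yours actually sends.

```python
from datetime import date, datetime

raw = "9/10/2015"  # free-text input from the example above

# The same string yields two different dates depending on the assumed order.
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()

# Assumption: the picker submits an ISO 8601 date, which has one reading only.
picked = date.fromisoformat("2015-09-10")

print(as_month_first, as_day_first, picked)
```

The free-text value cannot be resolved without knowing the user's convention, while the picker value needs no guessing at all.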

Timezones

Closely associated with collecting date-time values, another pain point is often timezones.  Now that you’ve collected a consistently formed date and time, what timezone is it for?  What timezone does that integer timestamp in your database assume?  More often than not it is the server’s timezone, which isn’t necessarily the timezone your users are looking for, and which can change with server migrations, scaling clusters, etc.

The fix?  Store all values in GMT/UTC/Zulu (pick your flavor), which is basically no time offset, or store the full ISO 8601 encoding, which includes the offset from UTC.  This makes localization a straightforward matter in pretty much any programming language and removes any ambiguity.  It also insulates your data and software from side effects introduced by server configurations, migrations, and scaling operations.
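A rough sketch of that approach in Python, using only the standard library.  The helper names `to_utc_iso` and `render_for_offset` are made up for illustration, and fixed UTC offsets stand in for real timezone rules (which would also need daylight saving handling).

```python
from datetime import datetime, timedelta, timezone

def to_utc_iso(local_dt: datetime) -> str:
    """Normalize an offset-aware datetime to UTC and encode it as ISO 8601."""
    if local_dt.tzinfo is None:
        # A naive value would silently inherit the server's timezone.
        raise ValueError("refusing to store a datetime without an offset")
    return local_dt.astimezone(timezone.utc).isoformat()

def render_for_offset(stored: str, offset_hours: int) -> str:
    """Convert a stored UTC string to a viewer's fixed offset for display."""
    viewer_tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromisoformat(stored).astimezone(viewer_tz).isoformat()

# A user at UTC-5 enters 6:30 PM on September 10, 2015.
entered = datetime(2015, 9, 10, 18, 30, tzinfo=timezone(timedelta(hours=-5)))
stored = to_utc_iso(entered)
shown = render_for_offset(stored, 9)  # a viewer at UTC+9

print(stored)
print(shown)
```

Rejecting naive datetimes at the storage boundary is a design choice in this sketch: it forces the caller to say which offset applies instead of letting the server's configuration decide.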

Addresses

This might seem obvious, but most countries in the world do not conform to the same address structure as the United States Postal Service’s Publication 28.  I know, crazy talk, right?  That means collecting addresses through free-form textboxes or hardcoded city/state/zip fields is not only an antipattern for localization, it also makes any kind of geocoding nearly impossible, and geocoding is a requirement for many systems today that provide automated localization.

What are your options?  Base your address-collection solution on the Universal Postal Union’s (UPU) S42 standard, which should cover most of the world’s addressable locations (UPU claims 192 countries as members).  This gives you a comprehensive set of address components that can also be passed to geocoding and shipping automation systems, and a storage schema that can accommodate anything thrown at it.
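As a very rough sketch of what component-based storage can look like, here is a small Python dataclass.  The class and field names are hypothetical and are not taken from the S42 element list; the point is only that everything except the country is optional and that street-level detail is not forced into US-shaped fields.  The sample addresses are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PostalAddress:
    """Hypothetical component schema; names are illustrative, not from S42."""
    country_code: str  # ISO 3166-1 alpha-2 code, e.g. "US"
    address_lines: List[str] = field(default_factory=list)
    locality: Optional[str] = None             # city, town, village
    administrative_area: Optional[str] = None  # state, province, prefecture
    postal_code: Optional[str] = None

    def components(self) -> dict:
        """Return only the components that were actually provided."""
        parts = {
            "country_code": self.country_code,
            "address_lines": self.address_lines,
            "locality": self.locality,
            "administrative_area": self.administrative_area,
            "postal_code": self.postal_code,
        }
        return {name: value for name, value in parts.items() if value}

us = PostalAddress("US", ["123 Example St"], "Springfield", "CA", "90000")
# Hong Kong has no postal codes or states, so those fields simply stay empty.
hk = PostalAddress("HK", ["1 Example Road"], "Kowloon")

print(us.components())
print(hk.components())
```

Because each component is stored separately, the same record can be handed to a geocoder or formatted per country later without re-parsing free text.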

Age

Age?  For real?  How could that be messed up, right?  Surprisingly to those of us in the Western hemisphere, not every culture counts the day you were born as Time Zero.  Some count the time in the womb, some consider you a year old the day you are born, etc. ad nauseam.  Generally speaking this isn’t an issue unless you’re dealing with regulatory age restrictions or a marketing and/or service strategy based on actual age, but it does come up.  It can also wreak havoc in analytics, where you need to group users consistently into the proper age demographic.

Easy fix?  Ask for their birthdate (or birth year if there’s a privacy concern), not their age.  Not only can you now easily calculate their age going forward, but you remove any ambiguity regarding their time on Earth and can provide properly localized ages in user profiles.
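Here is a minimal sketch of why the birthdate is the better thing to store.  The function names are made up for illustration, and `count_from_one_age` is a deliberately simplified stand-in for conventions where a newborn is one year old and everyone gains a year on January 1; real conventions vary.

```python
from datetime import date

def age_on(birthdate: date, on: date) -> int:
    """Completed years since birth (birth counts as zero)."""
    years = on.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (on.month, on.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years

def count_from_one_age(birthdate: date, on: date) -> int:
    """Simplified assumption: 1 at birth, plus one every January 1."""
    return on.year - birthdate.year + 1

born = date(2000, 9, 10)
day_before_birthday = date(2015, 9, 9)

print(age_on(born, day_before_birthday))
print(count_from_one_age(born, day_before_birthday))
print(age_on(born, date(2015, 9, 10)))
```

The same stored birthdate produces whichever number a given locale or regulation expects, which a self-reported age cannot do.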

In conclusion…

Localization is hard.  With all the different cultures, languages, and locales out there it’s tough even when you have everything you need.  Make sure you back up the localization of your system with proper internationalization support, and you’ll find it makes things much easier and provides better data your organization can act on.  And your users will appreciate it as well!

Finally, if you’ve run into other common misses in i18n efforts you’ve worked on, chime in below!