Localization is a big topic. This is part one of a two-part series. This first part covers the fundamentals of Localization. Part 2 will discuss strategy and briefly touch on multilingual automation.
QA Localization is more than just translation
My goal in writing this post is not to review every possible Localization test you could run. In my experience, every product has its own needs, defined by the Product Owner based on the supported platforms, users, function of the application, etc.
Below are a few examples of how complicated Localization Testing can be.
Date Formats – May 30, 2020 or 5/30/2020: These are the most commonly accepted date formats in the US.
The standard short date formats for United States English are:
These 7 standard date formats are for the short format only. There is no one simple date format, and there are many more for the long format. Many people don't realize how many there actually are! I worked on one application for the food industry where dates had to follow an international standard for the Food Expiration Date printed on the food item. That format is unique, different from the above, and one I had never worked with before.
Locale settings – If you have not done testing around region and language before, go take a look at your settings. In Windows, a quick look at the Region and Language settings shows all kinds of date formats, number separators, time formats, and currency settings. There are a lot.
But, is Localization only about currency or date format? No. I have worked on Localization projects where there was no currency involved; on projects where the biggest issues were about sort order and message box size. It all depends on your product. One thing is for sure—don’t think it’s just about translation. That is very often the easiest part.
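To make the sort order point concrete, here is a minimal Python sketch. The word list and the `fold` helper are made up for illustration, and the folding is only a crude stand-in: real products rely on locale-aware collation rules, which differ from language to language. The sketch only shows why a plain code-point sort surprises users.

```python
import unicodedata

# Hypothetical sample data, chosen only to show the effect.
words = ["zebra", "Äpfel", "apple", "Zoo"]

def fold(text):
    """Crude sort key: strip accents and ignore case.

    This is NOT real locale collation, just an approximation
    for illustration.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

naive = sorted(words)             # plain code-point order
folded = sorted(words, key=fold)  # accent- and case-insensitive order

print(naive)   # ['Zoo', 'apple', 'zebra', 'Äpfel']
print(folded)  # ['Äpfel', 'apple', 'zebra', 'Zoo']
```

In the naive result, the uppercase word comes first and the accented word lands after "zebra", which is exactly the kind of bug a sort order test should catch.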
Understand the Main Terms Associated with Localization Testing
What are we talking about with this test strategy?
To get a bigger understanding of this “Localization” thing, let’s first understand the words used in this area and what they are about. There are 3 commonly associated, interrelated terms. They are Globalization, Internationalization, and Localization.
Globalization is the business issue surrounding how you will localize your software. Sometimes, people use the Globalization term as an umbrella for all these tasks. It is the least commonly used term for this work in Software Testing, and people rarely use its abbreviation, “G11N.”
For example, will you do a Simplified Chinese version for mainland China and Singapore, or will you also do a separate Traditional Chinese version for Taiwan and Hong Kong? If your product is for the European market, will you do standard testing for French, Italian, German, and Spanish, or are you going to add Portuguese to sell or use your product in Brazil? If Canada is the main market, you may ship a mainly English product, localize into Canadian French, and skip Standard French and Swiss French. These are business decisions.
Internationalization is I18N (“I” 18 characters “n”).
What day is this? 3/5/20? Depending on where you are, it’s March 5th, 2020, or May 3rd, 2020.
Internationalization is the ability for your application to be localized. An internationalized product is one that allows, for example, multiple date formats depending on your locale. It allows floating (expandable or shrinkable) text boxes, since different languages may take fewer or more words to message users (e.g. error messages, warnings, Help, menu size, screen text). The product itself is engineered to handle different name, address, and phone formats and is Unicode compliant. This is the most difficult part from a software engineering perspective.
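The 3/5/20 question can be shown in a few lines of Python. This is a minimal sketch, not how any particular product does it: the `SHORT_DATE_PATTERNS` table and the `parse_short_date` helper are hypothetical names, and a real application would take its patterns from the operating system or a locale library.

```python
from datetime import datetime

AMBIGUOUS = "3/5/20"

# Hypothetical per-locale short date patterns, for illustration only.
SHORT_DATE_PATTERNS = {
    "en_US": "%m/%d/%y",  # month first
    "en_GB": "%d/%m/%y",  # day first
}

def parse_short_date(text, locale_code):
    """Parse a short date string using the pattern assumed for the locale."""
    return datetime.strptime(text, SHORT_DATE_PATTERNS[locale_code]).date()

us = parse_short_date(AMBIGUOUS, "en_US")
gb = parse_short_date(AMBIGUOUS, "en_GB")

print(us.isoformat())  # 2020-03-05
print(gb.isoformat())  # 2020-05-03
```

The same input yields two valid but different dates, which is why date tests need to run under each supported locale rather than only the tester's own.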
Localization is L10N – “L” 10 characters “N.”
Localization primarily concerns correct translation and cultural appropriateness, among many other things. Non-tech people often think this is what L10N is all about. It’s actually the last and easiest step. But Localization needs a bunch of people with specialized language skills. You may need a Korean speaker. A Japanese speaker. Someone who knows the language, culture, and other locale differences between Canadian French, Swiss French, and Haitian French.
Localization targets one specific locale. For example, a product is localized into:
…depending on your product’s needs.
This is clearly not simply testing the French language translation. Each of these locales has specific settings that need to be tested by someone who is not only French language-aware but knows the differences for number format, date format, daylight saving, address…and cultural appropriateness for these five example locations where French operating systems are used.
It is easy to focus on language. It’s also easy to understand currency or date format. In one of the early Localization projects I worked on, there were 27 locale settings—from default paper size to color settings—that changed depending on location. Localization is a whole lot more than the size of an error message window.
The Secondary Words and Project Types: FIGS, Double-byte, Right-to-Left, and Unicode
To clarify what these projects used to be called: I still hear people refer to these terms as part of Internationalization projects. People may still refer to FIGS (French, Italian, German, and Spanish, sometimes with Portuguese) or DB (Double-Byte), but these words have lost popularity with Unicode. FIGS is rarely used today, but DB is often used, rightly or wrongly, to mean East Asian Localization and any technical issues around it. Double-byte characters still cause unique issues that single-byte characters do not.
FIGS is French, Italian, German, and Spanish, plus Portuguese if Brazil is part of your product’s market. If you did a FIGS project, all the 256-character sets would work. There would always be individual language bugs, but the product was ready for any language with a 256-character set. All the European languages fall into this, including Russian/Cyrillic and Greek. Today, if someone uses the term FIGS, it means the product market will be North America, South America, Australia, and Europe. The product will not be localized for the Asian/double-byte languages (e.g. Japanese, Chinese, Korean, or Thai). Or, they plan to use the English product in Asia. There are still locale settings to test in that unique case, but not language.
DB is double-byte character set languages. Most people think of DB as East Asian languages; the main Asian languages for Localization include Japanese, Simplified Chinese, Traditional Chinese, and Korean. In the past, double-byte meant the language needed more than 256 characters, so instead of holding all the characters (<256) in one byte, you needed 2 bytes per character. Hence, a double-byte character. I worked on “DB projects,” which for me meant the product had a Japanese or “J-version.” If you sorted out the engineering issues with a J version, you could translate into the 2 Chinese character sets more easily. From my project experience, there were more technical/functional problems to deal with on DB projects. The FIGS projects had as many language and UI issues but fewer or no functional issues.
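The byte arithmetic behind “double-byte” can be checked with Python’s built-in codecs. This is a small sketch with sample strings chosen for illustration; `fits_in_single_byte` is a hypothetical helper, and Latin-1 stands in here for any 256-character code page.

```python
text = "日本語"   # three Japanese characters
ascii_text = "abc"

def fits_in_single_byte(value):
    """Return True if every character fits a 256-character set (Latin-1 here)."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

print(len(text))                           # 3 characters
print(len(text.encode("shift_jis")))       # 6 bytes: 2 per character
print(len(text.encode("utf-8")))           # 9 bytes: 3 per character
print(len(ascii_text.encode("shift_jis")))  # 3 bytes: 1 per character
print(fits_in_single_byte(ascii_text))      # True
print(fits_in_single_byte(text))            # False
```

The character count and the byte count disagree as soon as the text leaves the single-byte range, so a field limit that is tested only with English input can silently truncate or reject Japanese input.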
R-t-L (RTL) – right-to-left read languages: Mainly Arabic and Hebrew, but more as well, such as Persian.
Arabic is the 5th biggest spoken language globally. It is used in 25 countries and by more than 274 million people. For many companies that have had products in the European markets, as well as Japan and China, widespread expansion into the Arabic market has been a more recent event. In my experience, these were always separate engineering projects.
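As a rough illustration of what makes right-to-left text different at the character level, Python’s standard `unicodedata` module exposes each character’s bidirectional class. The `first_strong_direction` helper below is a hypothetical, simplified check that I would use only to flag test data, not a full implementation of the Unicode bidirectional algorithm.

```python
import unicodedata

def first_strong_direction(text):
    """Return 'rtl', 'ltr', or 'neutral' based on the first strongly
    directional character. Simplified: ignores embedding and overrides.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew letters are R, Arabic letters are AL
            return "rtl"
        if bidi == "L":           # Latin and most other scripts
            return "ltr"
    return "neutral"              # digits, punctuation, or empty text

print(first_strong_direction("שלום"))   # rtl (Hebrew)
print(first_strong_direction("مرحبا"))  # rtl (Arabic)
print(first_strong_direction("Hello"))  # ltr
print(first_strong_direction("123"))    # neutral
```

A check like this can help confirm that test strings really exercise right-to-left handling, including mixed strings where numbers or Latin product names sit inside Arabic or Hebrew text.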
Unicode: an international standard for “all” character sets, held in one standard organizing format.
Unicode 13.0 contains 143,859 characters (143,696 graphic characters and 163 format characters) covering 154 modern and historic scripts, as well as multiple symbol sets and emoji (Wikipedia).
Unicode standardized how characters are represented, with a plan to do away with language code pages and the need for separate double-byte projects, and to apply across the entire digital world. This standard has been around a while, but adoption took years. I remember it was an event when Oracle databases became Unicode compliant. It took a long time for Windows to be Unicode compliant. Even in a “Unicode compliant” world, there are browser and encoding issues today that need unique test efforts and may be rough with “DB characters.” For example, many users of your product in Asian markets may use UC or QQ browser, which you would not test for a non-Asian market. Unicode did away with the need to have FIGS or DB projects. DB, rightly or wrongly, is still often used to mean making a J or Chinese version. Unicode compliance made things better, but it absolutely does not mean “no bugs.”
In my first few years working in software quality, I worked on a product that was meant for the global marketplace. We worked on what we called the International English version. It was “internationalized,” meaning it had been engineered to handle the various character sets we needed it to handle. Date formats, currency formats, number separators, screen size, error messages, text boxes, screen text: all the text things were designed to work from various external resource files. That was not a common design at that time. The product was the “international version,” but it was localized into American English so that the code team (English-speaking test engineers) could test the entire internationalized product in English.
Another strategy that some teams use for early testing is side-by-side testing. You run the English product with the localized product at its side, and the test engineer walks through the tests on both, without needing to understand the localized UI.
This concludes part 1 of this series on Localization. We have discussed the relationship between Globalization, Internationalization, and Localization, as well as a few types of Localization Testing. Part 2 will discuss how to plan and develop your testing strategy for QA Localization.
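The idea of pulling all text from external resource files can be sketched roughly as follows. This is not the design of the product described above: the `RESOURCES` dictionary is a hypothetical in-memory stand-in for real resource files, and the keys, locale codes, and helper names are invented for illustration.

```python
# Hypothetical stand-in for external resource files, one table per locale.
RESOURCES = {
    "en-US": {"greeting": "Hello", "file_not_found": "File not found."},
    "fr-CA": {"greeting": "Bonjour"},  # "file_not_found" not translated yet
}
DEFAULT_LOCALE = "en-US"

def get_text(key, locale_code):
    """Look up UI text for a locale, falling back to the default locale."""
    localized = RESOURCES.get(locale_code, {})
    if key in localized:
        return localized[key]
    return RESOURCES[DEFAULT_LOCALE][key]

def missing_keys(locale_code):
    """List keys present in the default locale but absent from this one."""
    localized = RESOURCES.get(locale_code, {})
    return sorted(set(RESOURCES[DEFAULT_LOCALE]) - set(localized))

print(get_text("greeting", "fr-CA"))        # Bonjour
print(get_text("file_not_found", "fr-CA"))  # File not found.
print(missing_keys("fr-CA"))                # ['file_not_found']
```

Because no text lives in the code itself, the English-speaking team can test every code path against the English resources, and a check like `missing_keys` gives an early list of strings that would show up untranslated in a localized build.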