7 April 2008
Glocalization and Jobs
This blog was founded with the loose theme of glocalization, which in the technology business implies two sequential processes: internationalization and localization. Internationalization demands a platform that is independent of any single language and not bound to any one locale. Localization is the process of taking that internationalized platform and adapting it to a particular language and locale to deliver a complete product. Presumably the model is appealing because the labor cost of localizing an internationalized product is lower than that of recreating the same product for every region and language in the world. I wonder, though, how staff at global companies who currently develop region-specific versions will be affected as businesses continue to adopt the glocal model.
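To make the split between the two processes concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `CATALOGS` table, the locale codes, and the `render_greeting` function are assumptions, not part of any real product. The function is the internationalized part (it contains no language-specific text), and each catalog entry is one localization plugged into it.

```python
from datetime import date

# Hypothetical per-locale catalogs: the only place language- and
# region-specific strings and formats live.
CATALOGS = {
    "en-US": {"greeting": "Hello, {name}!", "date_format": "%m/%d/%Y"},
    "ja-JP": {"greeting": "{name}さん、こんにちは!", "date_format": "%Y/%m/%d"},
}
DEFAULT_LOCALE = "en-US"


def render_greeting(locale: str, name: str, today: date) -> str:
    """Locale-independent logic: look up the catalog, then fill it in."""
    # Fall back to the default locale when no localization exists yet.
    catalog = CATALOGS.get(locale, CATALOGS[DEFAULT_LOCALE])
    greeting = catalog["greeting"].format(name=name)
    return f"{greeting} ({today.strftime(catalog['date_format'])})"
```

Under these assumptions, supporting another region means adding one more catalog entry rather than writing another copy of `render_greeting`, which is where the labor saving would come from.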
28 May 2007
Yahoo and Google Survey
A survey of doctype and encoding usage across several of Yahoo's and Google's international locales:
Yahoo!
- Doctype? Yes. HTML 4.01 Strict.
- Encoding specified? Yes. UTF-8.
Yahoo! Taiwan
- Doctype? Yes. HTML 4.01 Strict (but mixes in some XHTML syntax).
- Encoding specified? Yes. BIG5 (legacy traditional Chinese encoding).
Yahoo! Japan
- Doctype? No.
- Encoding specified? Yes. EUC-JP (legacy Japanese encoding).
Yahoo! China
- Doctype? Yes. XHTML 1.0 Transitional.
- Encoding specified? Yes. GB2312 (legacy simplified Chinese encoding).
Google
- Doctype? No.
- Encoding specified? Yes. UTF-8.
Google Taiwan
- Doctype? No.
- Encoding specified? Yes. UTF-8.
Google Japan
- Doctype? Yes. HTML 4.01 Strict (but mixes in some XHTML syntax).
- Encoding specified? Yes. UTF-8.
Google China
- Doctype? No.
- Encoding specified? Yes. UTF-8.
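A check like the one above could be scripted roughly as follows. This is only a sketch and not how the survey was actually carried out: the `survey` function and both regular expressions are my own assumptions, and it looks only at the markup, so an encoding declared solely in the HTTP `Content-Type` header would be reported as missing.

```python
import re

# A doctype declaration at the very start of the document (leading whitespace allowed).
DOCTYPE_RE = re.compile(r"^\s*<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)
# A charset=... value anywhere inside a <meta> tag.
CHARSET_RE = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)


def survey(html: str) -> dict:
    """Report the doctype and the declared encoding, or None when absent."""
    doctype = DOCTYPE_RE.search(html)
    charset = CHARSET_RE.search(html)
    return {
        "doctype": doctype.group(1).strip() if doctype else None,
        "encoding": charset.group(1).upper() if charset else None,
    }


# A hand-written sample page, not a copy of any site listed above.
sample = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    "</head><body></body></html>"
)
result = survey(sample)
print(result["doctype"])
print(result["encoding"])
```

To run it against a live page you would first fetch the HTML and decode it to text, which is left out here to keep the sketch self-contained.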
25 May 2007
Unicode UTF-8 Byte Order Mark
I set up this site with the intent of focusing on internationalization and localization as they relate to the web, but I have not done a whole lot of that yet. It turns out this is a subject I actually know a fair amount about, simply because I have developed in a multilingual (English, Chinese, Japanese) environment for a while. To kick this off, I thought I would put up a short blurb about the Unicode UTF-8 Byte Order Mark, otherwise known as the BOM. If you have ever seen ï»¿ prefixing the first line of a file, then you have already been introduced. My aim here is not to discuss the nature of the BOM (you can check out the links below) but to mention some potentially lesser-known facts about its use that developers may run into.
- In Notepad, saving a file with "Encoding" set to "UTF-8" writes the BOM at the beginning of the file, even though you cannot see it. Similarly, in Visual Web Developer, choosing "Unicode (UTF-8 with signature) – Codepage 65001" when saving with encoding also writes the BOM at the beginning of the file.
- Using multilingual text properly on the web requires files saved in a Unicode encoding, and there are usually several to choose from. The general best practice is to leave out the UTF-8 BOM, so I recommend choosing a Unicode encoding that both excludes the BOM and keeps the file size small. For example, in Visual Web Developer, I save files with the "Unicode (UTF-8 without signature) – Codepage 65001" encoding.
- Having said that, I twice ran into a case on my former blogging platform where I had to include the BOM at the beginning of an ASP file before the platform would recognize the script as Unicode and process Unicode text correctly. If all else seems to be failing, give it a shot and see whether it fixes the problem. Still, my recommendation is to exclude the BOM unless it proves absolutely necessary.
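If you are not sure whether a file carries the BOM, you can check its first three bytes (`EF BB BF`). Here is a minimal sketch in Python; the helper names `has_utf8_bom` and `strip_utf8_bom` are my own, while `codecs.BOM_UTF8` and the `utf-8-sig` codec come from the standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True when the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


text = "caf\u00e9"
with_bom = text.encode("utf-8-sig")  # "utf-8-sig" prepends the BOM when encoding
without_bom = text.encode("utf-8")   # plain "utf-8" never writes a BOM

print(has_utf8_bom(with_bom))
print(has_utf8_bom(without_bom))
print(strip_utf8_bom(with_bom) == without_bom)

# The three stray characters at the top of a file are the BOM bytes
# misread as Windows-1252 text.
stray = codecs.BOM_UTF8.decode("cp1252")
```

When reading, decoding with `utf-8-sig` accepts files both with and without the BOM and drops it if present, which can be handy when you do not control how a file was saved.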
Check out these links for more information: