Saturday, October 22, 2022

lang-aware Domain

Yeah yeah, I know. No word from me for a while, and then suddenly two posts in quick order.

🤷‍♂️

I finished reading The Sandman series once again. The last time I read it was nearly 10 years ago. To say that The Sandman was influential to me is an understatement---I have previously written about how Death of the Endless is my handphone wallpaper, as evidenced from 2015, and once again in 2013. That was roughly when I started to fear Death less and to think of Death not necessarily as a cute goth chick, but as an inevitable old friend who will come to visit in time.

Dream though, he's too bloody moody in his Morpheus aspect. Reading the CBRs in the full glory of my vertical monitor is a much different feel from reading it off the puny tablets and even punier phone.

But I didn't come here to write about reading The Sandman, and catching up on Komi Can't Communicate (till Chapter 376) and One-Punch Man (till Chapter 172), though they are tangentially related.

I recently updated the Unifont version from 13.0.04 to the more recent 15.0.01. While doing that, I observed that I was having a Variant Chinese character problem (this is where the tangential relatedness occurs).

First, have a look at this simulated screenshot (font used is Unifont):
Look carefully at the third CJK character after the punctuation parentheses in the second line for each (underlined in red).

Do you see a difference?

You should. The first one is the 素 glyph rendered in Chinese, and the second is the same glyph rendered in Japanese. They share the same Unicode code point---32032 (U+7D20)---but have different forms, determined solely by the language context in which they are used.
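As a quick sanity check on that code point, here is a tiny Python sketch. It only confirms the number; it says nothing about which form a given font will draw, since that depends on the language context and the fonts available.

```python
# The character itself carries no language information: Chinese and
# Japanese text both store it as the same code point.
glyph = "素"
codepoint = ord(glyph)

print(codepoint)              # decimal code point
print(f"U+{codepoint:04X}")   # conventional hexadecimal notation
```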

If you cannot see a difference in what I wrote in this entry (as opposed to the screenshot), it is likely that you are viewing this on a setup that does not have the correct fallback fonts installed, leading to some other glyph being rendered instead.

In any case, the screenshot shows the outcome of the quality of life improvement I did. In updating my hosted copy of Unifont, I also uploaded another version of Unifont that is tuned for Japanese text. I also replaced the TrueType format with the OpenType format where applicable, solely to transfer fewer bytes overall when these fallback versions of the fonts are used to handle combining character algorithms (about 4.8 MiB compared to around 15 MiB).

Now that the outcome is demonstrated, let me talk about what exactly I did.

I went through my entire website and added lang attributes to all text fragments that deviate from standard English. For each of the lang attributes used, I defined a list of fonts (most likely to appear on the Big 3 operating systems) and styles to best represent it. These included:
  • ``fonipa'' for International Phonetic Alphabet;
  • ``zh-Latn'' for the romanised pinyin (think dizi and the like);
  • ``zh'' for regular Chinese; and
  • ``ja'' for Japanese.
These language tags also allow anyone else who has different settings for rendering different languages to have those settings applied (see this old rant about CJK font sizes).
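To make the idea concrete, here is a minimal sketch of how such per-language font lists could be turned into CSS rules. This is not the actual stylesheet: the font stacks below are placeholders made up for illustration (only Unifont and its Japanese variant come from this entry), and the use of the :lang() selector is an assumption about one reasonable way to do it. The language tags are used exactly as listed above.

```python
# Hypothetical font stacks per language tag; the real lists are not
# given in this entry, so treat every font name here as a placeholder.
FONT_STACKS = {
    "fonipa": ["Charis SIL", "Doulos SIL", "serif"],
    "zh-Latn": ["Noto Serif", "Times New Roman", "serif"],
    "zh": ["Noto Sans CJK SC", "PingFang SC", "Microsoft YaHei", "Unifont"],
    "ja": ["Noto Sans CJK JP", "Hiragino Sans", "Yu Gothic", "Unifont JP"],
}

GENERIC_FAMILIES = {"serif", "sans-serif", "monospace"}


def font_family_value(fonts):
    """Quote real family names; leave generic keywords bare."""
    return ", ".join(f if f in GENERIC_FAMILIES else f'"{f}"' for f in fonts)


def build_css(stacks):
    """Emit one :lang() rule per tag, shorter tags first.

    :lang(zh) also matches zh-Latn, so the more specific rule has to
    come later in the stylesheet to win at equal specificity.
    """
    ordered = sorted(stacks, key=lambda tag: tag.count("-"))
    rules = [
        f":lang({tag}) {{ font-family: {font_family_value(stacks[tag])}; }}"
        for tag in ordered
    ]
    return "\n".join(rules)


css = build_css(FONT_STACKS)
print(css)
```

The ordering is the only subtle part: because a plain ``zh'' rule would also catch the pinyin fragments, the ``zh-Latn'' rule is written out after it.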

I know that it is a short blurb of what I did, but it involved quite a bit of digging and updating through all the files. But all of this is worth it, as it helps improve the readability of my personal domain.
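For anyone attempting a similar clean-up, the digging part can be partly automated. The sketch below is not what was used here; it is one possible approach using only Python's standard library, with a hypothetical class name and a deliberately rough notion of "CJK" (the main ideograph block plus kana). It flags text that contains such characters but is not inside an element whose effective lang starts with zh or ja.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be stacked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def has_cjk(text):
    """Rough check: CJK Unified Ideographs, Hiragana, or Katakana."""
    return any("\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff"
               for ch in text)


class UntaggedCJKFinder(HTMLParser):
    """Collect CJK text not covered by a zh* or ja* lang attribute."""

    def __init__(self):
        super().__init__()
        self.lang_stack = []  # (tag, effective lang) per open element
        self.findings = []    # (line number, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        inherited = self.lang_stack[-1][1] if self.lang_stack else None
        self.lang_stack.append((tag, dict(attrs).get("lang") or inherited))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.lang_stack) - 1, -1, -1):
            if self.lang_stack[i][0] == tag:
                del self.lang_stack[i:]
                break

    def handle_data(self, data):
        lang = (self.lang_stack[-1][1] if self.lang_stack else None) or ""
        if has_cjk(data) and not lang.lower().startswith(("zh", "ja")):
            self.findings.append((self.getpos()[0], data.strip()))


sample = (
    '<html lang="en"><body>\n'
    '<p>Plain 素 here.</p>\n'
    '<p>Tagged <span lang="ja">素</span> here.</p>\n'
    '</body></html>'
)
finder = UntaggedCJKFinder()
finder.feed(sample)
finder.close()
for line, text in finder.findings:
    print(line, text)
```

It only reports the untagged paragraph in the sample. Deciding whether a flagged fragment should be ``zh'' or ``ja'' still needs a human, which is where most of the time goes.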

And that's about all I want to talk about for this for now, I suppose.

Till the next update.
