Sunday, February 21, 2021

Chasing the Combining Diacritical Characters in Chromium Problem

I spent the better part of the morning investigating a weird problem with combining diacritical marks in the Unifont font as rendered in Chrome/Chromium.

Observe this combination of characters: 2̇ (a `2' followed by U+0307 COMBINING DOT ABOVE). It should look like a `2' with a dot on top of it, something like this: .2. Unfortunately, if you are in Chrome/Chromium, you will see something like a `2' with a ``high dot'' to the right of it instead---the COMBINING DOT ABOVE character refuses to be composited upon the Hindu-Arabic numeral `2'.
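For the record, here is a small Python sketch of what that character sequence is made of. It uses only the standard `unicodedata` module; the variable names are mine, purely for illustration. It shows why the dot has to be placed by the font and renderer: Unicode has no precomposed ``2 with dot above'', so normalisation cannot fold the pair into a single code point the way it can for a letter like `e'.

```python
import unicodedata

# The sequence under discussion: DIGIT TWO followed by COMBINING DOT ABOVE.
seq = "2\u0307"

# List each code point with its name and canonical combining class
# (0 for a base character, non-zero for a combining mark).
for ch in seq:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  ccc={unicodedata.combining(ch)}")

# NFC normalisation cannot merge the pair: no precomposed character exists,
# so the sequence stays two code points long.
composed_digit = unicodedata.normalize("NFC", seq)
print(len(composed_digit))

# For contrast, 'e' + COMBINING DOT ABOVE does have a precomposed form (U+0117).
composed_letter = unicodedata.normalize("NFC", "e\u0307")
print(len(composed_letter), f"U+{ord(composed_letter):04X}")
```

Since the pair stays as two code points, the stacking only happens if the renderer draws both glyphs and positions the mark over the base.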

This is a problem when I have an article collection that requires the high dot to be at the right location to represent the correct pitch of the note in cipher notation. Generating images to replace it was a no-go---it was quite clear to me that Chromium is doing something rather stupid in its choice of how to interpret COMBINING DOT ABOVE.

I will first attach the composite screenshot indicating the solved form (on the left), the original discovered problem (in the centre), and the non-problem (on the right) before explaining what I did to discover the workaround. The first two lines were pure paragraph elements using the default font from my basic CSS file, while the next two lines used classes that forced the use of Unifont.
To be fair, this was a problem that I had observed for a while, but I had not really made any effort to study the whys until today, when I was sufficiently annoyed.

The answer was to also provide the TrueType (TTF) font as part of the @font-face specifications in the associated CSS file to act as the fallback font. Why the full TTF is required over the split WOFF2 for Chromium when Firefox/Waterfox can do it perfectly well is something I don't understand and don't want to waste time finding out.
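A rough sketch of what such a set of rules could look like, written as a small Python generator so the pieces are easy to see. Everything specific here is an assumption for illustration and not my actual stylesheet: the file names, the two `unicode-range` values, the ``Unifont Full'' family name, and the `.unifont` class are all hypothetical. The idea is that the split WOFF2 fragments are listed first and the full TTF sits behind them in the font-family list.

```python
def font_face(family, sources, unicode_range=None):
    """Build one @font-face rule.

    `sources` is a list of (url, format) pairs in order of preference.
    """
    src = ", ".join(f'url("{url}") format("{fmt}")' for url, fmt in sources)
    body = [f'font-family: "{family}";', f"src: {src};"]
    if unicode_range is not None:
        body.append(f"unicode-range: {unicode_range};")
    return "@font-face {\n" + "".join(f"  {line}\n" for line in body) + "}"


# Hypothetical split fragments: (file name, Unicode range covered).
split_parts = [
    ("unifont-basic-latin.woff2", "U+0000-007F"),
    ("unifont-combining.woff2", "U+0300-036F"),
]

# One rule per WOFF2 fragment, all sharing the same family name.
rules = [font_face("Unifont", [(url, "woff2")], rng) for url, rng in split_parts]

# The full TTF under a separate (hypothetical) family name, used as the fallback.
rules.append(font_face("Unifont Full", [("unifont.ttf", "truetype")]))

# A class that asks for the split font first, then the full TTF, then the generic family.
rules.append('.unifont { font-family: "Unifont", "Unifont Full", sans-serif; }')

css = "\n\n".join(rules)
print(css)
```

Whether the TTF is best given its own family name like this or appended to the `src` list of the existing rules is a design choice; the sketch only shows one possible arrangement.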

Anyway, my first clue on the solution came about through inspecting the rendered font in Chromium's DevTools screen. Instead of telling me that the rendered font was ``Unifont'', it was telling me nonsense like ``Arial'', which happens to be the browser-defined fallback font for Chromium that is subsumed under the ``sans-serif'' font-family. That didn't make any sense to me---Unifont is pan-Unicode, and again, I have had no problems with it in Firefox/Waterfox, so the browser deciding to use a fallback font instead of the pan-Unicode one was patently impossible in my book.

I remembered that back in the day, when I was still using the full Unifont WOFF/WOFF2 file, the problem was already there. So, the issue had nothing to do with the splitting of the WOFF2 file into fragments to facilitate better server response---it was likely something to do with the WOFF2 format that was not well-tolerated/parsed by Chromium.

On a whim, I tossed in the TTF version as a fallback to the split WOFF2 files, and what do you know, it worked. Except that instead of a fast download of around 5 to 6 files of around 4 KiB each, it was now those files plus the massive 13 MiB TTF file.

That workaround is just stupid. Either Firefox/Waterfox has a bug in font selection/rendering that is totally in my favour, or Chromium/Chrome has a bug in trying to interpret the proper Unicode code points and dealing with constructed combining diacritics.

The larger TTF file size has been mitigated somewhat through careful use of GZip compression of the payload on the server side, but we're still talking about an extra overhead of around 15 MiB, which is about as much as all the rest of my content files combined.

But I keep telling myself, it is just a fallback, and when cached locally, should not keep downloading repeatedly.

------

In other news, I think that I can/will complete The Outer Worlds today. My next immediate goal will be to complete Deep Learning by the end of next week, and in the meantime, I will probably end up watching some TV series/anime/movies in the upcoming week just to change things up a little. Reading is fun and all, but it's probably cool to do other things as well, otherwise in what way is it considered a break/sabbatical?

Till the next update, I suppose.
