Performance Improvements in Chrome's Non-Latin Text Rendering
- We worked with the Blink team to achieve speedups in complex font layout and character fallback
- Blink traditionally uses two code paths for text measurement and layout, the simple and the complex code path; many non-Latin languages, as well as advanced typography, require the complex path
- Font fallback has a significant impact on page rendering performance
- We improved Blink's performance when displaying non-Latin pages by speeding up text measurement & font fallback
- We aim to move towards a unified font code path in Blink and are working with Google on this topic
Measuring Text during Layout: Blink's Simple and Complex Code Path
One of the most important stages in rendering a page is text layout. Each time it loads a page, Blink, the layout engine of Chrome, needs to compute how much text fits into one line before a line break. To accomplish that, it needs to know the width of a so-called run of text. Text width is measured in the Font::width function, which is heavily used during text layout. Layout performance, and the time it takes to first display the text on a page, directly depend on computing the width of text quickly.
In languages that use the Latin script as their writing system, this process is rather intuitive. Just as Gutenberg would have placed the individual pieces of lead type on a line, Blink can simply get the width of each glyph and add them up. For example, for the word "Hello", it just adds the widths of the glyphs "H", "e", "l", and so on. In Blink, this approach is called the simple path.
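As a minimal sketch of the simple path, assuming a one-to-one mapping from characters to glyphs and a made-up table of advance widths (the ADVANCES values and the simple_path_width name are hypothetical, not Blink's API), the width of a run is just a sum:

```python
# Hypothetical advance widths in pixels for one font at one size.
# Real values would come from the font's horizontal metrics.
ADVANCES = {"H": 9.0, "e": 7.0, "l": 3.0, "o": 7.5}


def simple_path_width(text: str, advances: dict[str, float]) -> float:
    """Width of a run on the simple path: the sum of each glyph's advance.

    Assumes one glyph per character and no contextual shaping,
    kerning or ligatures.
    """
    return sum(advances[ch] for ch in text)
```

With these made-up values, simple_path_width("Hello", ADVANCES) returns 29.5 (9 + 7 + 3 + 3 + 7.5). The cost is one table lookup per character, which is why this path is fast when it applies.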
Complex Code Path For Shaping
However, a large number of languages do not use the Latin script as their writing system. Let's call them non-Latin languages for the purpose of this article. When displaying such languages, the layout engine needs to look at the byte stream of characters and analyze it before displaying it. What a character looks like on the screen depends on its context: depending on a character's position in the text, whether it is at the beginning, the middle or the end of a word, it may take a different shape. This analysis is called text shaping.
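How an engine might decide which path a run needs can be sketched as follows. This is a simplified illustration under an explicit assumption, not Blink's actual rule: any character at or above U+0300 (where the combining diacritical marks begin), or a request for advanced typographic features such as ligatures, sends the run to the complex path. The names SIMPLE_PATH_LIMIT and needs_complex_path are hypothetical.

```python
# Assumption for illustration: code points below U+0300 (Basic Latin through
# Spacing Modifier Letters) can be measured glyph by glyph. Blink's real
# selection rules are more detailed than this.
SIMPLE_PATH_LIMIT = 0x0300


def needs_complex_path(text: str, typographic_features: bool = False) -> bool:
    """Return True if a run would have to be shaped rather than summed."""
    if typographic_features:
        # Ligatures and similar features need a shaper even for Latin text.
        return True
    return any(ord(ch) >= SIMPLE_PATH_LIMIT for ch in text)
```

Under this rule, "Hello" stays on the simple path, while the Arabic word discussed below, or Latin text with ligatures enabled, goes to the complex path.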
Example of Text Shaping
The image illustrates this process for the Arabic word "marħaba", which means "hello". The Arabic characters MEEM, REH, HAH, BEH, ALEF are transmitted over the network, then their order is reversed since Arabic is a right-to-left language, and finally they are shaped and joined into the glyphs shown in the image.
An analogous process is required for many other scripts, such as Khmer and the Indic scripts.
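The joining logic behind this example can be sketched as follows. The joining types (D for dual-joining, R for right-joining) come from the Unicode Character Database file ArabicShaping.txt; the positional_forms name and the restriction to these five letters are assumptions for illustration. The sketch works in logical (memory) order and only picks a positional form per letter. A real shaper such as HarfBuzz then maps those forms to glyphs through the font's OpenType features (init, medi, fina, isol), and right-to-left display order is handled by a separate bidirectional-text step.

```python
# Joining types for the five letters of "marhaba" (from ArabicShaping.txt):
# D = dual-joining, R = right-joining (joins only to the preceding letter
# in logical order, never to the following one).
JOINING_TYPE = {
    "\u0645": "D",  # MEEM
    "\u0631": "R",  # REH
    "\u062D": "D",  # HAH
    "\u0628": "D",  # BEH
    "\u0627": "R",  # ALEF
}


def positional_forms(word: str) -> list[str]:
    """Pick the isolated/initial/medial/final form of each letter."""
    forms = []
    for i, ch in enumerate(word):
        jt = JOINING_TYPE[ch]
        prev_jt = JOINING_TYPE[word[i - 1]] if i > 0 else None
        next_jt = JOINING_TYPE[word[i + 1]] if i + 1 < len(word) else None
        # Connect backwards only if this letter can join backwards (D or R)
        # and the previous letter can join forwards (only D can).
        joins_prev = jt in ("D", "R") and prev_jt == "D"
        # Connect forwards only if this letter is dual-joining and the
        # next letter can join backwards.
        joins_next = jt == "D" and next_jt in ("D", "R")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms
```

For MEEM, REH, HAH, BEH, ALEF this yields initial, final, initial, medial, final: because REH cannot join forwards, the word visibly breaks into two connected groups, which is what the shaped result in the image shows.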
A program library that performs this analysis is called a shaper. At the moment, Blink still uses two shapers: on Windows, Android and Linux, it uses HarfBuzz, an open source library maintained by Behdad Esfahbod; on Mac, it uses a system API called CoreText.
Complex Code Path for Advanced Typographic Features
While for some languages, shaping is essential and the displayed text would otherwise be perceived as incorrect or unreadable, text shaping can also be used to enhance the user experience and increase the perceived elegance and accuracy of text.
Examples of Special Ligatures in the Zapfino Font
Using text shaping on Latin text enables advanced typographic features. The image shows one example: the Zapfino font has calligraphic ligatures for frequently used abbreviations and, as a kind of easter egg, a special ligature for its own name.
Another important aspect of displaying web pages accurately and in high visual quality is font fallback. Not every font contains the glyphs needed to display every language. Some fonts only cover the Latin alphabet, some only CJK (Chinese, Japanese, Korean) glyphs, and so on. Instead of just displaying broken boxes, the browser tries to display as much of the text as possible. To do that, it needs to select a different font when the chosen font for the text does not contain the right glyphs. This process is called font fallback.
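Ligature substitution itself can be sketched as a greedy, longest-match replacement over a glyph sequence. The ligature table, the glyph names and the apply_ligatures name below are hypothetical; real fonts express such rules as OpenType GSUB ligature lookups whose matching order is defined by the font, so treat this as a simplified model only:

```python
# Hypothetical ligature table: input glyph sequence -> ligature glyph name.
LIGATURES = {
    ("Z", "a", "p", "f", "i", "n", "o"): "zapfino.logo",
    ("f", "i"): "f_i",
}


def apply_ligatures(glyphs: list[str], ligatures: dict[tuple[str, ...], str]) -> list[str]:
    """Replace glyph sequences by ligatures, preferring the longest match."""
    max_len = max((len(seq) for seq in ligatures), default=1)
    out = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate first, down to two glyphs.
        for n in range(min(max_len, len(glyphs) - i), 1, -1):
            lig = ligatures.get(tuple(glyphs[i:i + n]))
            if lig is not None:
                out.append(lig)
                i += n
                break
        else:
            # No ligature starts here: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out
```

Because the longest match wins, the full name "Zapfino" becomes the single name ligature instead of an ordinary "fi" ligature inside it. A substitution like this changes the number of glyphs in a run, which is one reason such text cannot be measured on the simple path.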
Illustration of Lacking Character Fallback in Khmer
Before displaying text, Blink passes through the text and checks whether the currently selected font has the required glyph. If the glyph is missing, it asks the system for an alternative font that contains it. On Latin web pages this is rarely necessary, since most fonts cover the range of Latin characters. On non-Latin pages, however, font fallback can have a significant impact on performance in Blink. The illustration shows an example of failed and successful font fallback when displaying the Wikipedia page for the Khmer language.
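The pass described above can be sketched as splitting the text into runs by font. The Font class, the example font names and the system_fallback callable (standing in for a platform query such as fontconfig on Linux) are all hypothetical. This sketch also works on single code points; a more careful implementation would segment by grapheme cluster so that combining marks stay with their base character.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Font:
    """Hypothetical font: a name plus the set of characters it has glyphs for."""
    name: str
    coverage: frozenset

    def has_glyph(self, ch: str) -> bool:
        return ch in self.coverage


def segment_by_font(
    text: str,
    primary: Font,
    system_fallback: Callable[[str], Optional[Font]],
) -> list[tuple[Optional[str], str]]:
    """Split text into (font name, run) pairs.

    Characters the primary font covers stay with it; for the others we ask
    the system fallback. None means no font was found, which a browser
    would render as a missing-glyph box.
    """
    runs: list[tuple[Optional[str], str]] = []
    for ch in text:
        font = primary if primary.has_glyph(ch) else system_fallback(ch)
        name = font.name if font is not None else None
        if runs and runs[-1][0] == name:
            # Same font as the previous character: extend the current run.
            runs[-1] = (name, runs[-1][1] + ch)
        else:
            runs.append((name, ch))
    return runs


# Example fonts and fallback, for illustration only.
LATIN = Font("ExampleLatin", frozenset("abcdefghijklmnopqrstuvwxyz "))
KHMER = Font("ExampleKhmer", frozenset("\u1781\u17d2\u1798\u17c2\u179a"))


def khmer_only_fallback(ch: str) -> Optional[Font]:
    return KHMER if KHMER.has_glyph(ch) else None
```

Every character the primary font lacks triggers a call to system_fallback, so on a page written mostly in a non-Latin script this query can run for almost every character.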
As previously mentioned, both stages of text processing, shaping and character fallback, are critical for the layout performance in Blink. Together with engineers on the Blink team, we identified several bottlenecks and ways to improve performance.
Better caching and fewer system calls for font fallback
In crbug.com/266214, Eric Seidel introduced an idea for caching font fallback information more effectively, resulting in significantly fewer system calls to fontconfig, the system library for managing fonts on Linux. To enable this optimization, we first had to refactor the way Blink stores font fallback information. Once this was done, we were able to apply the original idea and drastically speed up font fallback on Linux, with large improvements in page load time for Japanese and Chinese pages.
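The general idea of caching fallback results can be sketched as memoization around the expensive system query. This is not Blink's actual data structure: the CachingFallback class, the (character, locale) key and the system_query callable are assumptions. Including the locale in the key matters for CJK, where the same Han character may need a Japanese or a Chinese font depending on the page language.

```python
from typing import Callable, Optional


class CachingFallback:
    """Memoizes an expensive fallback query per (character, locale) pair."""

    def __init__(self, system_query: Callable[[str, str], Optional[str]]):
        self._query = system_query
        self._cache: dict[tuple[str, str], Optional[str]] = {}
        self.system_calls = 0  # how often we actually hit the system

    def __call__(self, ch: str, locale: str) -> Optional[str]:
        key = (ch, locale)
        if key not in self._cache:
            # Misses are cached too, including "no font found" (None),
            # so repeated lookups for unsupported characters stay cheap.
            self.system_calls += 1
            self._cache[key] = self._query(ch, locale)
        return self._cache[key]
```

Since pages in a given language reuse a limited set of characters, most lookups after the first few hit the cache, and the number of system calls grows with the number of distinct characters rather than with the length of the text.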
Removing obsolete legacy optimizations
Although Blink was forked from WebKit more than a year ago, it still carries platform-specific code from the WebKit era, which in many cases is obsolete and no longer effective.
For example, Blink used a special code path for measuring text width on Mac, which was intended as an optimization. However, removing this legacy code led to speedups and better code maintainability.
Using the WidthCache more effectively
Emil A. Eklund and the author worked together on improving the use of the WidthCache, a class that caches previously calculated width information for runs of text. Previously, the WidthCache was not used in many non-Latin text scenarios. Extending its use to those cases led to a significant performance improvement.
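A run-width cache can be sketched as a dictionary from run text to measured width. The RunWidthCache name, the max_entries limit and the clear-all eviction are hypothetical choices for illustration, not the WidthCache's actual policy. The sketch assumes one cache per font, since the same text has different widths in different fonts.

```python
from typing import Callable


class RunWidthCache:
    """Memoizes measured widths of text runs for a single font."""

    def __init__(self, measure: Callable[[str], float], max_entries: int = 1024):
        self._measure = measure
        self._entries: dict[str, float] = {}
        self._max_entries = max_entries
        self.misses = 0

    def width(self, run: str) -> float:
        cached = self._entries.get(run)
        if cached is not None:
            return cached
        self.misses += 1
        w = self._measure(run)  # the expensive part, e.g. shaping the run
        if len(self._entries) >= self._max_entries:
            # Crude bound on memory: drop everything once full.
            self._entries.clear()
        self._entries[run] = w
        return w
```

The payoff is largest on the complex path, where measure has to shape the run: a repeated word or run is shaped once and then answered from the dictionary.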
Unifying the Architecture
Having two separate code paths in Blink has pros and cons. On the one hand, it currently improves performance in cases where the simple path works reliably. On the other hand, web designers expect advanced typographic features to be generally available and used by default. Also, the number of users of non-Latin pages is growing quickly, so the complex path is used in more and more cases. Finally, having two separate paths for text measurement can be a source of bugs: it is difficult and error-prone to maintain identical behavior between the two code paths.
In the long term, we aim to converge the two code paths into one without regressing performance. This will improve the user experience through more accurate text rendering. In addition, it will make Blink's code easier to understand and maintain.
One required step is to first make sure that Blink only uses one text shaper, HarfBuzz, on all platforms.
Code Camps and Close Cooperation with the Blink Team
At Intel's Open Source Technology Center (OTC), we work closely with the Blink team to improve Blink's text rendering performance, especially for non-Latin text and fonts. Next up, in September, we will host a coding camp in OTC's Finland office with Google engineer Behdad Esfahbod to jointly continue these efforts.
Dominik Röttsches, Software Engineer in Intel OTC Finland, Twitter: @abrax5
Performance Improvements in Chrome's Non-Latin Text Rendering is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.