Optimizing ART Interpreter for YouTube* Launch Time

By 01 Staff on Sep 30, 2019

INTRODUCTION

YouTube* is a free video-sharing service that lets people upload, view, and share videos. It is a widely used app from Google*, and its performance on Intel® platforms is expected to be high. YouTube is built into most Android*-based systems, such as smartphones, tablets, and Chromebooks*. A Chromebook is a different breed of laptop: it primarily runs Chrome OS* and runs Android apps inside containers. The performance of Android apps is therefore a key differentiating factor when selecting a Chromebook.

App launch time is an important part of the user experience, so we explored ways to improve the warm launch time of the YouTube app. (YouTube is built in and available on the Chromebook by default, so the cold launch case does not arise.) In our work, we added support for the Intel® AVX / AVX2 instruction set to the Android ART interpreter. Our measurements show that this implementation improves warm launch time by 11% compared with the Intel® Streaming SIMD Extensions (Intel® SSE) implementation. The solution proposed in this article can be used on any other Intel® Core™ platform that runs Android OS.

MOTIVATION

A Chromebook is a laptop or tablet running the Linux-based Chrome OS as its operating system. The devices are primarily used to perform a variety of tasks in the Chrome browser, with most applications and data residing in the cloud rather than on the machine itself. Chromebooks became very popular when Android apps were made available on them through the Play application distribution platform. Android apps run inside the ARC++ container.

One such built-in app is YouTube, the most popular video sharing and viewing space. YouTube has become for video watching what Google became for search years ago. In fact, 13 years after its launch in 2005, YouTube is now the world's second-biggest search engine, with more than 1.8 billion registered users who check the site daily and watch 5 billion videos [2]. YouTube ranks 26th among the most used apps and has a market share of about 18% among the top video content streaming services. About 63 million viewers watch YouTube every day, and about 1 billion hours of video are watched. Every 60 seconds, more than 300 hours of HD-quality video are uploaded to YouTube, adding to an already massive collection of 1,300,000,000 videos.

APPROACH

One of the key aspects of user experience is application launch time: the time an application takes to load and make its content visible to the user. Users expect apps to be responsive and fast to load [1]. An app with a slow start time does not meet this expectation and can disappoint users. This kind of poor experience may cause a user to rate the app poorly on the Play store, or even to abandon it altogether. This article describes how we optimized the warm launch time of the YouTube app by using Intel® AVX / AVX2 instructions in the ART interpreter.

Because YouTube is a top-ranked app, we profiled several of its use cases, such as cold and warm launches, using VTune™ Amplifier software. This performance profiler is a commercial application for software performance analysis on 32-bit and 64-bit x86-based machines.

Figure 1 shows the profiling data for the YouTube app in the warm launch scenario.

Figure 1: VTune™ Profile Data for YouTube App before AVX / AVX2 Implementation – Warm Launch

When we analyzed this profile, the largest contribution came from libart.so. A further breakdown of the reports showed high CPU utilization in the ART interpreter functions. A code walkthrough showed that the ART interpreter uses Intel® SSE instructions for SIMD operations; these can be changed to Intel® AVX / AVX2 instructions for additional performance.

An interpreter directly executes instructions written in a programming or scripting language without first converting them to object code or machine code. On Android, dex files are converted to dex bytecode (Odex) by the dex2oat compiler, and the interpreter directly executes the Odex code. Figure 2 shows the app execution flow with the interpreter.

Figure 2: App Execution with ART Interpreter

In our work, binary operations that used Intel® SSE instructions were changed to use the Intel® AVX / AVX2 instruction set to improve performance. Figure 4 and Figure 5 show the code after implementing Intel® AVX / AVX2 in two files, sseBinop.S and sseBinop2Addr.S, respectively. The interpreter uses these files to match the input pattern and generate native code accordingly.

The VTune™ profiles show that the contribution of libart decreased and that the time taken by the YouTube app process dropped from 1.997 s to 1.830 s. Refer to Figure 3 for profile details.
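To make the two operand forms concrete, the following toy model sketches how an interpreter might dispatch a float binary operation in its three-address form and in its two-address (`/2addr`) form. The mnemonics follow Dalvik bytecode naming; the tuple encoding, the `BINOPS` table, and the `run` function are hypothetical and are not ART code, which implements these handlers in assembly.

```python
import operator

# Float binary operations keyed by Dalvik-style mnemonic (illustrative subset).
BINOPS = {
    "add-float": operator.add,
    "sub-float": operator.sub,
    "mul-float": operator.mul,
    "div-float": operator.truediv,
}

SUFFIX = "/2addr"


def run(program, vregs):
    """Interpret a list of instruction tuples against a virtual register list."""
    for instr in program:
        name = instr[0]
        if name.endswith(SUFFIX):
            # Two-address form: vA <- vA op vB
            _, a, b = instr
            op = BINOPS[name[: -len(SUFFIX)]]
            vregs[a] = op(vregs[a], vregs[b])
        else:
            # Three-address form: vAA <- vBB op vCC
            _, dst, b, c = instr
            vregs[dst] = BINOPS[name](vregs[b], vregs[c])
    return vregs


regs = run(
    [("mul-float", 0, 1, 2), ("add-float/2addr", 0, 3)],
    [0.0, 1.5, 4.0, 0.25],
)
print(regs[0])  # 1.5 * 4.0 + 0.25 = 6.25
```

In this model every handler has the same shape: load the source operands, apply one arithmetic operation, and store the result. That load, operate, store sequence is the part that maps to the move and math instructions discussed in this article.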

Figure 3: VTune™ Profile Data for YouTube App after AVX / AVX2 Implementation – Warm Launch

Figure 4: Intel® AVX / AVX2 Implementation in sseBinop.S

Figure 5: Intel® AVX / AVX2 Implementation in sseBinop2Addr.S

In the standard Intel® SSE implementation, movs instructions are used for move operations. In the Intel® AVX / AVX2 implementation, the vmulss, vmovss, and vdivss instructions are used, which operate on XMM / YMM registers. The vmov, vcvtsi, vsubsd, vmulsd, and vxorpd instructions have lower latency than their Intel® SSE counterparts, which in turn improves performance. More information about the instruction set is available in the Intel® instruction set manuals. The table below compares instruction latency and throughput [3] for mov and math operations in the Intel® SSE and Intel® AVX / AVX2 instruction sets.

 

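| Instructions | Intel® SSE Latency | Intel® SSE Throughput | Intel® AVX / AVX2 Latency | Intel® AVX / AVX2 Throughput |
| --- | --- | --- | --- | --- |
| Multiply / Add / Sub Operations | 7 | 2 | 5-6 | 1 |
| MOV Operations | 4 | 2 | 3 | 0.5 |
| DIV Operations | 32 | 32 | 19-35 | 16-28 |

Table 1: Instruction Latency and Throughput for Intel® SSE vs. Intel® AVX / AVX2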
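As a rough way to read the latency columns, the sketch below computes the ratio of Intel® SSE latency to Intel® AVX / AVX2 latency for each row. The numbers are copied from the table above; the `LATENCY` dictionary and the `latency_ratio_range` helper are hypothetical names introduced for this illustration. A per-instruction latency ratio is not a launch-time speedup, so the output only indicates where gains could come from.

```python
# (SSE latency, (AVX latency low, AVX latency high)), copied from the table.
LATENCY = {
    "mul/add/sub": (7, (5, 6)),
    "mov": (4, (3, 3)),
    "div": (32, (19, 35)),
}


def latency_ratio_range(sse, avx_range):
    """Return (worst, best) SSE/AVX latency ratio; above 1 means AVX is lower."""
    low, high = avx_range
    return sse / high, sse / low


for name, (sse, avx) in LATENCY.items():
    worst, best = latency_ratio_range(sse, avx)
    print(f"{name}: {worst:.2f}x to {best:.2f}x")
```

For the DIV row the range straddles 1.0 (32/35 ≈ 0.91 up to 32/19 ≈ 1.68), so under these figures the benefit for division depends on where in the reported latency range a given operation falls.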

RESULTS

Our evaluation platform is an HP Chromebook with an Intel® Core™ i5-7Y54 processor with 2 cores. The processor base frequency is 1.20 GHz, and it can reach up to 3.2 GHz in Turbo mode. The device has 8 GB of memory and runs the latest Chrome OS version, R73, with Android Pie. Before collecting data, we ran “Internet Speed Test” to confirm that the internet bandwidth was the same across test runs. YouTube is a built-in app on this device.

The overall average performance gain for YouTube, shown in Table 2, is 11%.
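The article does not state how the launch times were collected. One common approach, described in the Android launch-time documentation [1], is to start the activity with `adb shell am start -W` and read the `TotalTime` value it reports. The sketch below assumes that approach; the helper names are hypothetical, and the steps that put the app into a warm state before each run are omitted.

```python
import re
import statistics
import subprocess

# Activity component as listed in the results tables of this article.
COMPONENT = (
    "com.google.android.youtube/"
    "com.google.android.apps.youtube.app.WatchWhileActivity"
)


def parse_total_time_ms(am_output):
    """Extract the TotalTime value (milliseconds) from `am start -W` output."""
    match = re.search(r"^\s*TotalTime:\s*(\d+)", am_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no TotalTime line found")
    return int(match.group(1))


def launch_once(component=COMPONENT):
    """Start the activity through adb and return the reported TotalTime in ms."""
    result = subprocess.run(
        ["adb", "shell", "am", "start", "-W", "-n", component],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_time_ms(result.stdout)


def mean_launch_ms(samples):
    """Average several launch measurements to smooth run-to-run noise."""
    return statistics.mean(samples)
```

Calling `mean_launch_ms([launch_once() for _ in range(3)])` on a machine with a connected device would average three launches; the run count of three is an arbitrary choice for this example, not a figure from the article.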

| Scenario | Activity Name | Intel® SSE (ms) | Intel® AVX / AVX2 (ms) | % Performance Gain |
| --- | --- | --- | --- | --- |
| Warm | com.google.android.youtube/com.google.android.apps.youtube.app.WatchWhileActivity | 331.66 | 296.33 | 11% |

Table 2: YouTube Warm Launch Time Data on HP Chromebook

A similar experiment was carried out on an Intel® NUC KIT NUC715DNHE device, which has an Intel® Core™ processor and runs Android OS. It uses an i5-7260U processor with 2 cores; the processor base frequency is 2.20 GHz, and it can reach up to 3.4 GHz in Turbo mode. As before, we ran “Internet Speed Test” before collecting data to confirm that the internet bandwidth was the same across test runs. On this device, the Intel® AVX / AVX2 implementation shows about a 20% improvement over the Intel® SSE implementation.

| Scenario | Activity Name | Intel® SSE (ms) | Intel® AVX / AVX2 (ms) | % Performance Gain |
| --- | --- | --- | --- | --- |
| Warm | com.google.android.youtube/com.google.android.apps.youtube.app.WatchWhileActivity | 318 | 254.667 | 19.91% |

Table 3: YouTube Warm Launch Time Data on Intel® NUC KIT NUC715DNHE
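The performance gain columns in Tables 2 and 3 can be recomputed from the launch times, assuming the gain is defined as the launch-time reduction relative to the Intel® SSE baseline. The article does not spell out this formula, so treat the `percent_gain` helper below as a hypothetical reconstruction.

```python
def percent_gain(baseline, optimized):
    """Relative reduction in launch time, as a percentage of the baseline."""
    return (baseline - optimized) / baseline * 100.0


# Launch times in milliseconds, copied from Tables 2 and 3.
chromebook = percent_gain(331.66, 296.33)
nuc = percent_gain(318, 254.667)

print(f"HP Chromebook: {chromebook:.2f}%")
print(f"Intel NUC:     {nuc:.2f}%")
```

Under this definition both results agree with the tables after rounding: the Chromebook figure rounds to 11%, and the NUC figure matches the reported 19.91% to within a hundredth of a percentage point.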

ABOUT THE AUTHORS

This article was written by Jaishankar Rajendran, Biboshan Banerjee, and BC, Anuvarshini, who are members of the Google OS Run Times Team at Intel Technology India Pvt. Ltd.

NOTICES AND DISCLAIMERS

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

TEST CONFIGURATION

Software: Android 9.0, Kernel 4.19, OpenGL ES 3.0, Vulkan 1.0 Support, Fast Boot

Hardware: Intel® Core™ i5-7260U Processor, 2x2.2 GHz CPU, 4GB DDR RAM

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com

REFERENCES

[1] App Launch Time Documentation [Online] Available: https://developer.android.com/topic/performance/vitals/launch-time

[2] YouTube App Usage Statistics [Online] Available: http://www.businessofapps.com/data/youtube-statistics/#1

[3] Instruction Latency and Throughput [Online] Available: https://www.agner.org/optimize/instruction_tables.pdf

 

Intel Core and VTune™ are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.