WebGL™ Optimizations for Mobile
Lorenzo Dal Col
Senior Software Engineer, ARM
Agenda
1. Introduction to WebGL™ on mobile
   - Rendering Pipeline
   - Locate the bottleneck
2. Performance analysis and debugging tools for WebGL
   - Generic optimization tips
3. PlayCanvas experience
   - WebGL Inspector
4. Use case: PlayCanvas Swooop
   - ARM® DS-5 Streamline
   - ARM Mali™ Graphics Debugger
5. Q&A
Bring the Power of OpenGL® ES to Mobile Browsers
What is WebGL™?
• A cross-platform, royalty-free web standard
• Low-level 3D graphics API
• Based on OpenGL® ES 2.0
• A shader-based API using GLSL (OpenGL Shading Language)
• Some concessions made to JavaScript™ (memory management)
Why WebGL?
• It brings plug-in-free 3D to the web, implemented right into the browser
• Major browser vendors are members of the WebGL Working Group:
  - Apple (Safari® browser)
  - Mozilla (Firefox® browser)
  - Google (Chrome™ browser)
  - Opera (Opera™ browser)
Introduction to WebGL™
How does it fit in a web browser?
You use JavaScript™ to control it.
Your JavaScript is embedded in HTML5 and uses its Canvas element to draw on.
What do you need to start creating graphics?
Obtain a WebGLRenderingContext object for a given HTMLCanvasElement.
It creates a drawing buffer into which the API calls are rendered.
For example:
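var canvas = document.getElementById('canvas1');
// Older browsers expose WebGL only under the 'experimental-webgl' name
var gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
if (!gl) {
  throw new Error('WebGL is not supported');
}
// Resizing the canvas resizes its drawing buffer; update the viewport to match
canvas.width = newWidth;
canvas.height = newHeight;
gl.viewport(0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight);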
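The drawing buffer size (canvas.width/height) is independent of the size at which the canvas is displayed (its CSS size). A common pattern is to derive the buffer size from the displayed size before drawing. The sketch below uses a hypothetical helper, `resizeCanvasToDisplaySize`, which is not a WebGL API, and an assumed `scale` parameter. Values below 1 render into a smaller buffer that the browser upscales. Passing `window.devicePixelRatio` matches the screen's physical pixels instead.

```javascript
// Hypothetical helper (not a WebGL API): sizes the canvas drawing buffer from
// the canvas's displayed size (clientWidth/clientHeight, in CSS pixels).
// scale < 1 renders fewer pixels; the browser upscales the result on screen.
function resizeCanvasToDisplaySize(canvas, gl, scale) {
  if (scale === undefined) scale = 1;
  const width = Math.max(1, Math.floor(canvas.clientWidth * scale));
  const height = Math.max(1, Math.floor(canvas.clientHeight * scale));
  const changed = canvas.width !== width || canvas.height !== height;
  if (changed) {
    // Assigning width/height reallocates (and clears) the drawing buffer,
    // so only do it when the size really changes.
    canvas.width = width;
    canvas.height = height;
  }
  // The implementation may allocate a smaller buffer than requested, so the
  // viewport follows drawingBufferWidth/Height rather than canvas.width/height.
  gl.viewport(0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight);
  return changed;
}

// Browser usage, once per frame before drawing:
// resizeCanvasToDisplaySize(canvas, gl, window.devicePixelRatio || 1);
```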
WebGL™ Stack
What is happening when a WebGL page is loaded
• User enters URL
• HTTP stack requests the HTML page
• Additional requests will be necessary to get JavaScript™ code and other resources
• JavaScript code will be pre-parsed while loading other assets and the DOM tree is built
• JavaScript code will contain calls to the WebGL API
• They will go back to WebKit®, which calls the OpenGL® ES 2.0 library
• Shaders are compiled
• Textures, vertex buffers & uniforms must be loaded to the GPU
• Rendering can start
[Figure: software stack. User space: browser (WebKit, JavaScript engine, HTTP stack), ARM® Mali™ OpenGL® ES library, libc. Kernel space: Linux kernel, GPU driver. Hardware: ARM Mali GPU, ARM Cortex®-A CPU.]
See Chromium Rendering Stack:
http://www.chromium.org/developers/design-documents/gpu-accelerated-compositing-in-chrome
Locate the Bottleneck
The frame rate of a particular WebGL™ application could be limited by:
• CPU
• Vertex shader
• Fragment shader
• Memory bandwidth
[Figure: rendering pipeline diagram, CPU → vertex shader → fragment shader → pixels, with data flows labeled vertices, textures, uniforms, triangles and varyings.]
Fortunately we have tools to understand which one is the culprit.
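A common quick experiment complements the profiling tools: change one workload at a time and watch the frame time. The sketch below encodes that reasoning as a heuristic. The function name, the input fields and the 20% threshold are illustrative assumptions. It also cannot tell a CPU bottleneck apart from a frame rate already capped by the display refresh (vsync).

```javascript
// Hypothetical heuristic: compares average frame times (ms) measured in three
// runs of the same scene to guess which stage limits the frame rate.
//   baseline      - normal rendering
//   smallCanvas   - drawing buffer shrunk (for example to a quarter of the pixels)
//   fewerVertices - meshes replaced by simpler versions
function guessBottleneck(times, threshold) {
  if (threshold === undefined) threshold = 0.2; // 20% speed-up counts as significant
  const speedUp = (t) => (times.baseline - t) / times.baseline;
  if (speedUp(times.smallCanvas) >= threshold) {
    // Fewer pixels means less fragment shading and less framebuffer traffic.
    return 'fragment shader or memory bandwidth';
  }
  if (speedUp(times.fewerVertices) >= threshold) {
    return 'vertex shader';
  }
  // Neither GPU workload mattered: likely CPU-bound (or capped by vsync).
  return 'CPU';
}

// Example: shrinking the canvas cut the frame time from 40 ms to 22 ms.
// guessBottleneck({ baseline: 40, smallCanvas: 22, fewerVertices: 39 });
// → 'fragment shader or memory bandwidth'
```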
Frame Rendering Time
Deferred Rendering
// THIS DOES NOT MEASURE GPU RENDERING
// (drawElements only queues the work; the GPU renders it later)
var start = performance.now();
gl.drawElements(gl.TRIANGLES, …);
var time = performance.now() - start;
Synchronous Rendering
// THIS FORCES SYNCHRONOUS RENDERING
// (BAD PRACTICE)
var start = performance.now();
gl.drawElements(gl.TRIANGLES, …);
gl.finish(); // or gl.readPixels…
var time = performance.now() - start;
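A less intrusive alternative to both is to measure the interval between consecutive frames. One source for this is the timestamps that `requestAnimationFrame` passes to its callback. This reflects the throughput of the whole pipeline without forcing a flush. It cannot go below the display's refresh interval, though (about 16.7 ms at 60 Hz). `FrameTimer` and `drawScene` are hypothetical names used for illustration.

```javascript
// Hypothetical helper: averages the interval between consecutive frames over
// a sliding window, without forcing the GPU to finish any work.
class FrameTimer {
  constructor(windowSize = 60) {
    this.windowSize = windowSize; // number of frame intervals to average
    this.deltas = [];
    this.last = null;
  }

  // Call once per frame with a timestamp in milliseconds.
  tick(timestampMs) {
    if (this.last !== null) {
      this.deltas.push(timestampMs - this.last);
      if (this.deltas.length > this.windowSize) {
        this.deltas.shift(); // drop the oldest interval
      }
    }
    this.last = timestampMs;
  }

  averageFrameMs() {
    if (this.deltas.length === 0) return NaN;
    return this.deltas.reduce((sum, d) => sum + d, 0) / this.deltas.length;
  }

  fps() {
    return 1000 / this.averageFrameMs();
  }
}

// Browser usage (drawScene is a placeholder for the application's renderer):
// const timer = new FrameTimer();
// function frame(now) {
//   timer.tick(now);
//   drawScene();
//   requestAnimationFrame(frame);
// }
// requestAnimationFrame(frame);
```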
Performance Analysis & Debug
DS-5 Streamline
• System-wide performance analysis
• Combined ARM® Cortex® processors and Mali™ GPU visibility
• Optimize for performance & power across the system
Mali Graphics Debugger
• API Trace & Debug Tool
• Understand graphics and compute issues at the API level
• Debug and improve performance at frame level
• Support for OpenGL® ES 1.1, 2.0, 3.0 and OpenCL™ 1.1
Offline Compilers
• Understand complexity of GLSL shaders and CL kernels
• Support for Mali-4xx and Mali-T6xx GPU families
PlayCanvas
SWOOOP
• HTML5/WebGL™ game built with PlayCanvas
• Demonstration that high-quality arcade gaming is possible with HTML5+WebGL across desktop, tablets and smartphones
• Cross-platform touch, mouse and keyboard controls
• Low-poly art style
• Flat-shaded surfaces with ambient occlusion combined with diffuse color
PlayCanvas Swooop Gameplay
Running in the Chrome™ browser on a Google Nexus 10 with Android™ 4.4
http://swooop.playcanvas.com/
PlayCanvas Experience
• WebGL Inspector can be very useful to optimize the stream of commands that are submitted to WebGL™
  - It's very good at highlighting redundant calls
  - It's also an important debugging tool (e.g. for debugging draw order and render state problems)
  - http://benvanik.github.io/WebGL-Inspector/
• GLSL Optimizer has been used to check that the GLSL that PlayCanvas procedurally generates is reasonably optimal
  - https://github.com/aras-p/glsl-optimizer
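Redundant calls of the kind WebGL Inspector reports can often be removed at the source. One way is to cache the last value of each piece of state and skip calls that would not change it. The sketch below is a hypothetical wrapper; `createStateCache` is not a PlayCanvas or WebGL API. It covers only the current program, the active texture unit and texture bindings. It is correct only if every state change in the application goes through it.

```javascript
// Hypothetical wrapper (not a WebGL or PlayCanvas API) that drops calls which
// would not change the current state. It assumes every state change in the
// application goes through it; otherwise the cached values go stale.
function createStateCache(gl) {
  let currentProgram = null;    // WebGL starts with no program in use
  let activeUnit = gl.TEXTURE0; // and with TEXTURE0 as the active unit
  const bound = new Map();      // "unit:target" -> bound texture
  let skipped = 0;

  return {
    useProgram(program) {
      if (program === currentProgram) { skipped++; return; }
      currentProgram = program;
      gl.useProgram(program);
    },
    activeTexture(unit) {
      if (unit === activeUnit) { skipped++; return; }
      activeUnit = unit;
      gl.activeTexture(unit);
    },
    bindTexture(target, texture) {
      // Texture bindings are per texture unit, so the key includes the unit.
      const key = activeUnit + ':' + target;
      if (bound.get(key) === texture) { skipped++; return; }
      bound.set(key, texture);
      gl.bindTexture(target, texture);
    },
    skippedCalls() { return skipped; },
  };
}
```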
ARM® DS-5 Streamline
Performance Optimization
How to reduce the CPU and system workload
• Reduce your number of draw calls
  - Models using the same shaders can be batched to reduce draw calls
  - Even when they have different shaders, batching sometimes makes sense
  - Pre-calculate positions
• Avoid unnecessary WebGL™ calls (gl.getError, redundant state changes, etc.)
  - WebGL Inspector shows redundant calls
  - Models can be sorted to avoid state changes
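Sorting models to avoid state changes can be as simple as ordering the draw list by a state key. The sketch below assumes hypothetical draw-call records with numeric `program` and `texture` ids. It sorts by program first, on the common assumption that switching programs costs more than rebinding a texture. `countStateChanges` shows the effect of the new order.

```javascript
// Hypothetical draw-call records: { program, texture } with numeric ids.
// Returns a new array ordered by program, then texture, so that calls that
// share state end up next to each other.
function sortDrawCalls(drawCalls) {
  return drawCalls.slice().sort((a, b) =>
    a.program !== b.program ? a.program - b.program : a.texture - b.texture);
}

// Counts the program and texture switches a given draw order requires.
function countStateChanges(drawCalls) {
  let changes = 0;
  let program = null;
  let texture = null;
  for (const dc of drawCalls) {
    if (dc.program !== program) { changes++; program = dc.program; }
    if (dc.texture !== texture) { changes++; texture = dc.texture; }
  }
  return changes;
}

// Example: five calls that need 8 switches in submission order need 5 sorted.
// const calls = [{ program: 1, texture: 1 }, { program: 2, texture: 1 },
//   { program: 1, texture: 2 }, { program: 2, texture: 1 }, { program: 1, texture: 1 }];
// countStateChanges(calls) → 8; countStateChanges(sortDrawCalls(calls)) → 5
```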
• Do not force a pipeline flush by reading back data (gl.readPixels, gl.finish, etc.)
• Use typed arrays instead of JavaScript™ object arrays:
  var vertices = new Array(size);        // avoid: generic JavaScript array
  var vertices = new Float32Array(size); // prefer: typed array
  See also: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays
• Move from CPU to GPU
  - Rotation matrix computation can be moved to the vertex shader (by passing a timestamp)
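The batching and typed-array advice can be combined. Models that share a shader are copied into one `Float32Array` vertex buffer and one `Uint16Array` index buffer, so that a single `gl.drawElements` call draws them all. This sketch assumes a hypothetical mesh format whose positions are already pre-calculated in a common (e.g. world) space, because the whole batch is drawn with one model matrix. With 16-bit indices a batch is limited to 65,536 vertices unless the `OES_element_index_uint` extension is used.

```javascript
// Hypothetical mesh format: { positions: Float32Array (x, y, z per vertex),
// indices: Uint16Array }. Positions are assumed to be pre-calculated in a
// shared space, because the whole batch is drawn with one model matrix.
function batchMeshes(meshes) {
  let floatCount = 0;
  let indexCount = 0;
  for (const mesh of meshes) {
    floatCount += mesh.positions.length;
    indexCount += mesh.indices.length;
  }
  if (floatCount / 3 > 65536) {
    throw new Error('batch too large for 16-bit indices');
  }

  const positions = new Float32Array(floatCount);
  const indices = new Uint16Array(indexCount);
  let floatOffset = 0;
  let indexOffset = 0;
  for (const mesh of meshes) {
    positions.set(mesh.positions, floatOffset);
    // Indices of this mesh are shifted by the number of vertices before it.
    const baseVertex = floatOffset / 3;
    for (let i = 0; i < mesh.indices.length; i++) {
      indices[indexOffset + i] = mesh.indices[i] + baseVertex;
    }
    floatOffset += mesh.positions.length;
    indexOffset += mesh.indices.length;
  }
  return { positions, indices };
}

// After binding the buffers, upload once and draw the whole batch in one call:
// gl.bufferData(gl.ARRAY_BUFFER, batch.positions, gl.STATIC_DRAW);
// gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, batch.indices, gl.STATIC_DRAW);
// gl.drawElements(gl.TRIANGLES, batch.indices.length, gl.UNSIGNED_SHORT, 0);
```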
Fragment Bound and Bandwidth Optimizations
Reduce Bandwidth Usage
• Use texture mipmapping
• Reduce the size of the textures
• Reduce the number of vertices and varyings
• Interleave vertices, normals, texture coordinates
Reduce the Fragment Activity
• Render to a smaller framebuffer
  - The browser then upscales the rendered frame to the displayed size of the HTML canvas
• Move computation from the fragment to the vertex shader (use HW interpolation)
• Consider overdraw
Most of these optimizations will also improve cache utilization.
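Interleaving stores all the attributes of one vertex next to each other, so fetching a vertex brings in its position, normal and texture coordinates together. The sketch below builds such a buffer and computes the stride and byte offsets that `gl.vertexAttribPointer` expects. The `interleave` function and the 8-float layout are illustrative assumptions.

```javascript
// Hypothetical input: separate arrays of positions (3 floats per vertex),
// normals (3) and texture coordinates (2). Output: one Float32Array laid out
// as [px, py, pz, nx, ny, nz, u, v] for each vertex.
function interleave(positions, normals, uvs) {
  const count = positions.length / 3;
  if (normals.length !== count * 3 || uvs.length !== count * 2) {
    throw new Error('attribute arrays describe different vertex counts');
  }
  const FLOATS_PER_VERTEX = 8;
  const out = new Float32Array(count * FLOATS_PER_VERTEX);
  for (let v = 0; v < count; v++) {
    const o = v * FLOATS_PER_VERTEX;
    out[o] = positions[v * 3];
    out[o + 1] = positions[v * 3 + 1];
    out[o + 2] = positions[v * 3 + 2];
    out[o + 3] = normals[v * 3];
    out[o + 4] = normals[v * 3 + 1];
    out[o + 5] = normals[v * 3 + 2];
    out[o + 6] = uvs[v * 2];
    out[o + 7] = uvs[v * 2 + 1];
  }
  return out;
}

// Stride and byte offsets for this layout (4 bytes per float).
const STRIDE = 8 * Float32Array.BYTES_PER_ELEMENT; // 32 bytes per vertex
const OFFSETS = { position: 0, normal: 12, uv: 24 };
// e.g. gl.vertexAttribPointer(positionLoc, 3, gl.FLOAT, false, STRIDE, OFFSETS.position);
//      gl.vertexAttribPointer(normalLoc, 3, gl.FLOAT, false, STRIDE, OFFSETS.normal);
//      gl.vertexAttribPointer(uvLoc, 2, gl.FLOAT, false, STRIDE, OFFSETS.uv);
```

Here `positionLoc`, `normalLoc` and `uvLoc` stand for attribute locations obtained with `gl.getAttribLocation`.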
ARM® Mali™ Graphics Debugger
Frame Capture
Overdraw
• Overdraw is when you draw to the same pixel on the screen more than once
• Drawing your opaque objects front to back instead of back to front reduces overdraw
• Limiting the amount of transparency in the scene can also help
[Figure: overdraw visualization of the scene, shaded by overdraw level: 1x, 2x, 4x]
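The draw order described above can be sketched as a sort on distance from the camera. With depth testing enabled, drawing opaque objects near to far lets GPUs that test depth before running the fragment shader reject hidden fragments early. Transparent objects still have to be drawn last, far to near, for blending to be correct. The object fields below are hypothetical. In practice this order also has to be balanced against sorting models to avoid state changes.

```javascript
// Hypothetical object records: { name, viewDepth, transparent }, where
// viewDepth is the distance from the camera (larger means farther away).
function orderForDrawing(objects) {
  const opaque = objects
    .filter((o) => !o.transparent)
    .sort((a, b) => a.viewDepth - b.viewDepth); // near to far: less overdraw
  const transparent = objects
    .filter((o) => o.transparent)
    .sort((a, b) => b.viewDepth - a.viewDepth); // far to near: correct blending
  return opaque.concat(transparent);
}
```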
Shader Map and Fragment Count
Inspect the Tripipe Counters
GPU Cycles 450M
Tripipe Cycles 423M
Load & Store 185M
Texture 140M
Arithmetic 133M
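A quick way to read these numbers is to compare each pipe with the tripipe total. The sketch below treats each figure as the number of cycles that pipe was busy. This is an assumption, since the slide only gives the totals. The three pipes can work in parallel, which is why their sum (458M) exceeds the tripipe total (423M). Load & Store comes out busiest at roughly 44%, and arithmetic least busy at roughly 31%. That matches the next slide's suggestion to trade uniforms and varyings for computation.

```javascript
// Counter totals from the capture above, in millions.
const counters = {
  tripipe: 423,
  pipes: { 'Load & Store': 185, Texture: 140, Arithmetic: 133 },
};

// Fraction of tripipe cycles each pipe was busy (assumes the per-pipe figures
// are busy cycles). Fractions can add up to more than 1 because the
// arithmetic, load/store and texture pipes can run in parallel.
function pipeLoad(c) {
  const load = {};
  for (const [name, cycles] of Object.entries(c.pipes)) {
    load[name] = cycles / c.tripipe;
  }
  return load;
}

// Name of the pipe with the highest count.
function busiestPipe(c) {
  return Object.entries(c.pipes)
    .reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
}

// busiestPipe(counters) → 'Load & Store'
// pipeLoad(counters)    → about 0.44 Load & Store, 0.33 Texture, 0.31 Arithmetic
```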
Shader Optimization
• Since the arithmetic workload is not very big, we could reduce the number of uniforms and varyings and calculate them on the fly
• Reduce their size
• Reduce their precision: all the varyings, uniforms and local variables are highp. Is that really necessary?
• Use the ARM® Mali™ Offline Shader Compiler!
  http://malideveloper.arm.com/develop-for-mali/tools/analysis-debug/mali-gpu-offline-shader-compiler/
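When auditing precision, it helps to list every `highp` use first and then decide case by case. The hypothetical `findHighpLines` helper below does this for a GLSL ES source string. Switching to `mediump` is not always safe. GLSL ES 1.00 only guarantees mediump floats a range of about ±2^14 and a relative precision of 2^-10, so values such as world-space positions or large texture coordinates may still need `highp`. Also, vertex shaders default to highp floats even without a precision statement, so an empty result there does not mean the shader uses mediump.

```javascript
// Hypothetical helper: returns the lines of a GLSL ES source string that use
// the highp qualifier, including "precision highp float;", which makes highp
// the default for every float declared without a qualifier.
function findHighpLines(source) {
  return source
    .split('\n')
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter((entry) => /\bhighp\b/.test(entry.text));
}

// Example: lines 1, 2 and 4 are reported.
// findHighpLines('precision highp float;\nvarying highp vec2 vUv;\n' +
//                'varying mediump vec3 vNormal;\nuniform highp mat4 uModel;');
```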
References
Professional WebGL Programming, Andreas Anyuru (2012)
Debugging and Optimizing WebGL Applications, Ben Vanik and Ken Russell (2011)
Where to find more info?
http://www.khronos.org/webgl/
http://en.wikipedia.org/wiki/HTML5
http://en.wikipedia.org/wiki/Canvas_element
http://www.khronos.org/webgl/wiki/Tutorial
https://playcanvas.com/
Thank You
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU
and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners.