From ActionScript 3 to C++ 2011

During the last Flash Online conference, I had the chance to share the latest work I’ve been involved in at Aerys with the rest of the Minko team. We’ve been working a lot on the next major version because we really want it to be a game changer for 3D on mobile and the web.

You can read the original announcement for more details. But the big picture is that Minko is going to support WebGL. To introduce this new major feature we’ve created a first technical demonstration:

To do this, we are completely rewriting Minko using C++ 2011. This new version will include bindings for ActionScript 3 (and obviously JavaScript too). So if you’re an AS3 developer: do not panic! You’ll still be able to leverage your AS3 skills with Minko. Yet if you want to learn new tricks, now would be a good time and C++ is a good choice.

To understand the process of working with C++ code targeting the Flash platform and HTML5/JavaScript, you can start by reading my slides:

To help AS3 developers migrate to C++, I’ve decided to start gathering resources here on this very blog. If you are interested you can start by:

If you have suggestions about what you need to know regarding C++, and especially cross-compilation targeting the Flash platform or JavaScript, please let me know!

Stage3D Online Conference Slides

It was really awesome to be invited to talk about Minko today during the Stage3D online conference organized by Sergey Gonchar. He has done an excellent job in organizing this and I hope people enjoyed attending it as much as I enjoyed being a part of it.

You can watch the entire conference here.

As I promised, here are the slides to this presentation. They are pretty heavy because they embed some videos. Here is the outline of the content of the presentation:

  • Community SDK
    • Scripting
    • Shaders and GPU programming
    • Scene editor
  • Professional SDK
    • Physics
    • Optimizations for the web and mobile devices

At the end of the presentation, I also demonstrated how Minko can load and display Crytek’s Sponza smoothly on the iPad and the Nexus 7 in just a few minutes of work thanks to the editor and the optimizations granted by the MK format. You will soon hear more about this demonstration, with a clean video showing the app but also the publishing process. This is incredibly cool since Sponza is quite a big scene with more than 50 textures including normal maps, alpha maps and specular maps for a total of 200+MB (only 70MB when published to MK).

Don’t forget to have a look at all the online resources for Minko:

As stated in the presentation, Minko’s editor public beta should start next week. So stay tuned!

New video: normal mapping and specular maps

Normal mapping has been available in Minko for a long time: at first as part of the minko-lighting plugin, and now in the core framework (in the dev branch only for now). Support for specular maps was added recently. Combining normal mapping and specular maps gives really good results. To show how far you can get, here is a little video demonstrating both techniques in the Minko editor:

New Documentation for Collada

Minko’s Collada plugin makes it possible to easily import Collada (*.dae) files using the assets loading API. Working with the API is quite simple and there already is a detailed tutorial about that. But the Collada format has its flaws and it is sometimes very complicated to get your exported files to work properly.

That’s why I’ve compiled a new documentation article that explains how to properly export Collada files from 3D Studio Max:

Export Collada files from 3D Studio Max on Aerys Hub

The article details the export procedure itself but also provides guidelines to make sure the exported file will display properly with Minko. It also gives details about the supported features, including materials, material properties and how the engine will use them. Similar articles will be released for different editors and formats according to the community’s needs.

Of course, as soon as the Minko editor and the MK format are available, we will strongly discourage the use of Collada files in production for many reasons (mainly performance). But you will still need to import those Collada files into the editor first… So here is a little video to show how simple importing assets is with the editor:

Yes: it’s as simple as a drag’n’drop! You will also notice that the textures load automatically and that the editor will let you play the animations and make sure everything works out of the box. The editor will be available in open beta in March…

Anamorphic Lens Flare

Update: I’ve just pushed a new SWF with a much improved effect. I’ve tweaked things like the number of vertical/horizontal blur passes – which are now up to 3/6 – but also the flares’ brightness, contrast and dirt texture. I think it looks way better now!

Tonight’s experiment was focused on post-processing. My goal was to implement a simple anamorphic lens flare post-processing effect using Minko. It was actually quite simple to do. Here is the result:

The 1st pass applies a luminance threshold filter:

Then I use a multipass Gaussian blur with 4 passes: 3 horizontal passes and 1 vertical pass. The trick is to apply those 5 passes (1 luminance threshold pass + 4 blur passes) on a texture which is a lot taller than wide (32×1024 in this case). This way, everything gets stretched when the flares are composited with the rest of the backbuffer.
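The pass sequence described above can be illustrated on the CPU with a minimal C++ sketch. This is not the actual Minko effect, which runs as GPU passes: the `Image` type, the function names, the 3-tap kernel weights and the edge clamping are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side stand-in for a render target: one luminance
// value per pixel, stored row by row.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> luma;
};

// Pass 1: luminance threshold. Pixels at or below the threshold go black.
Image luminanceThreshold(const Image& src, float threshold) {
    Image dst = src;
    for (float& v : dst.luma) {
        if (v <= threshold) v = 0.0f;
    }
    return dst;
}

// One 1D blur pass with an assumed 3-tap kernel (0.25, 0.5, 0.25).
// (dx, dy) = (1, 0) blurs horizontally, (0, 1) blurs vertically.
// Samples outside the image are clamped to the nearest edge pixel.
Image blurPass(const Image& src, int dx, int dy) {
    Image dst = src;
    const long w = static_cast<long>(src.width);
    const long h = static_cast<long>(src.height);
    auto sample = [&](long x, long y) -> float {
        x = std::max(0L, std::min(x, w - 1));
        y = std::max(0L, std::min(y, h - 1));
        return src.luma[static_cast<std::size_t>(y * w + x)];
    };
    for (long y = 0; y < h; ++y) {
        for (long x = 0; x < w; ++x) {
            dst.luma[static_cast<std::size_t>(y * w + x)] =
                0.25f * sample(x - dx, y - dy) +
                0.50f * sample(x, y) +
                0.25f * sample(x + dx, y + dy);
        }
    }
    return dst;
}

// The 5 passes from the text: 1 threshold, 3 horizontal blurs, 1 vertical blur.
Image flarePasses(const Image& src, float threshold) {
    Image img = luminanceThreshold(src, threshold);
    for (int i = 0; i < 3; ++i) img = blurPass(img, 1, 0);
    return blurPass(img, 0, 1);
}
```

In the real effect the input would be the scene rendered into the tall 32×1024 texture, so the blurred result gets stretched horizontally once it is drawn back over the full-size backbuffer.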

JIT Shaders For Better Performance

The subject is really vast and complex and I’ve been trying to write an article about this for quite some time now. Recently, I made a small patch to enhance this technique and I thought it was a good occasion to try to summarize how it works and the benefits of it. In order to talk about this new enhancement, I would like to draw the big picture first.

The Problem

That might look like a complicated post title… but the subject is more complex than truly complicated. Here is how it starts: rendering a 3D object requires executing a graphics rendering program – or “shader” – on the GPU. To make it simple, let’s just say this program will compute the final color of each pixel on the screen. Thus, the operations performed by this shader will vary according to how you want your object to look. For example, rendering with a solid flat color requires different operations than rendering with per-pixel lighting.

Any programming beginner will understand that such a program will test conditions – for example whether to use lighting or not – and perform some operations according to the result of this test. Yes: that’s pretty much exactly what an “if” statement is. It might look like programming trivia to you. And it would be if this program was not meant to be executed on the GPU…

You see, the GPU does not like branching. Not one bit (literally)! For the sake of parallelization, the GPU expects the program to have a fixed number of operations. This is the only efficient way to ensure computations can be distributed over a large number of pipelines without having to care too much about their synchronization. Thus, the GPU does not know branching and each program has a fixed number of instructions that will always be executed in the same order.

Conclusion: shader programs cannot use “if” statements. And of course, loops are out of the game too since they are pretty much pimped out “if” statements. Can you imagine what such logic would imply on your daily programming tasks? If you simply try to, you will quickly understand that instead of writing one program that can handle many different situations you will have to write many different programs that will handle a single situation. And then manually choose which one should be launched according to your initial setup…

Workarounds…

Mutables

The simplest workaround is to find “some way” to make sure useless computations do not affect the actual rendering operations. For example, you can “virtually disable” lighting by setting all lights’ diffuse/specular components to 0.

As you can imagine, this is really a suboptimal option. Performance-wise, it’s actually the worst possible idea: a lot of computations happen and most of them are likely to be useless in most cases.

If/else shader intrinsic instructions

After a few years, shaders evolved and featured more and more instructions. Those instructions are now usable through higher level languages such as Cg or GLSL. Those languages feature “if” statements (and even loops too). How are they compiled into shader code that can run on a GPU? Do they overcome the challenges implied by parallelization?

No. They actually fit in a very straightforward and simple way. As a shader program must feature a single fixed list of instructions, both parts of an if/else statement will be executed. The hardware will then decide which one should be muted according to the actual result of the test performed by the conditional instructions.
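To make the cost of this behavior concrete, here is a minimal C++ sketch that mimics it on the CPU. It is only an illustration under assumptions: the function names, the counters and the mask-based selection are hypothetical and do not correspond to any real shader instruction set.

```cpp
#include <algorithm>

// Counts how often each side of the "if/else" is actually evaluated.
struct Counters {
    int litEvaluations = 0;
    int flatEvaluations = 0;
};

// Hypothetical "expensive" branch: a simple diffuse lighting term.
float litColor(float normalDotLight, Counters& c) {
    ++c.litEvaluations;
    return std::max(0.0f, normalDotLight);
}

// Hypothetical "cheap" branch: a constant flat color.
float flatColor(Counters& c) {
    ++c.flatEvaluations;
    return 0.5f;
}

// Both sides are always computed; a 0/1 mask then mutes one of them.
float shadePredicated(bool lightingEnabled, float normalDotLight, Counters& c) {
    const float lit = litColor(normalDotLight, c);  // executed in every case
    const float flat = flatColor(c);                // executed in every case
    const float mask = static_cast<float>(lightingEnabled);  // 1.0f or 0.0f
    return mask * lit + (1.0f - mask) * flat;
}
```

Calling `shadePredicated(false, ...)` still pays for the lighting computation even though its result is discarded, which is exactly why a single do-everything shader stays inefficient.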

The bright side is that you can use this technique to have a single shader program that handles multiple scenarios. The dark side is that this shader is still very inefficient and might eventually exceed the maximum number of instructions allowed for a single program. On some older hardware, the corresponding hardware instructions simply do not exist…

So even this “brand new” feature that will be introduced in Flash 11.7 and its “extended” profile is far from sufficient.

Pre-compilation

Some engines will use high level shader programming languages (like Cg or GLSL) and a pre-compilation workflow to generate all the possible outcomes. Then, the right shader is loaded at runtime according to the rendering setup. This is the case of the Source Engine, created by Valve and used in famous games like Half-Life 2, Team Fortress 2 or Portal.

This solution is efficient performance-wise: there is always a shader that will do exactly and strictly the required operations according to the rendering setup. Plus, it does not have to rely on the availability of specific hardware features. But pre-compilation implies a very heavy and inefficient asset workflow, since every possible combination has to be generated ahead of time.
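The following C++ sketch shows why that workflow gets heavy. It is a toy model, not how the Source Engine or any real toolchain works: the `RenderSetup` fields, the bit layout of the key and the variant names are all made up for the illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical rendering setup with three on/off options.
struct RenderSetup {
    bool lighting;
    bool normalMap;
    bool specularMap;
};

// Packs the options into a bit mask that identifies one shader variant.
std::uint32_t variantKey(const RenderSetup& s) {
    return (s.lighting ? 1u : 0u) |
           (s.normalMap ? 2u : 0u) |
           (s.specularMap ? 4u : 0u);
}

// "Offline" step: one variant per combination, so 3 options give 2^3 = 8.
// A string stands in for the compiled shader binary.
std::map<std::uint32_t, std::string> precompileAll() {
    std::map<std::uint32_t, std::string> variants;
    for (std::uint32_t key = 0; key < 8u; ++key) {
        std::string name = "shader";
        if (key & 1u) name += "+lighting";
        if (key & 2u) name += "+normal";
        if (key & 4u) name += "+specular";
        variants[key] = name;
    }
    return variants;
}
```

At runtime the engine only has to look up `variantKey(setup)` in the table, which is cheap. The price is paid earlier: each additional on/off option doubles the number of variants to build, store and ship.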

Minko’s Solution

We’ve seen the common workarounds and each of them has very strong cons. The most robust implementation seems to be the pre-compilation option despite the obvious workflow issues, which matter even more for web/mobile applications. But the past 10 years have seen the rise of a technique that could solve this problem: Just In Time (JIT) compilation. This technique is mostly used by Virtual Machines – such as the JVM (Java Virtual Machine), the AVM2 (ActionScript Virtual Machine) or V8 (Chrome’s JavaScript virtual machine). Its purpose is to compile the virtual machine bytecode into actual machine opcodes at runtime in order to get better performance.

How would the same principle apply to shaders? If you consider your application as the VM and your shader code as this VM execution language, then it all falls into place! Indeed, your 3D application could simply compile some higher level language shader code into actual machine shader code according to the available data. For example, some shader might compile differently according to whether lighting is enabled or not or even according to the number of lights.

With Minko, we tried to keep it as simple as possible. Therefore, we worked very hard to find a way to write shaders using AS3. As the figure above explains, the AS3 shader code you write is not executed on the GPU (because that’s simply not possible). Instead, the application acts as a Virtual Machine and as it gets executed at runtime, this AS3 shader code transparently generates what we call an Abstract Shader Graph (ASG). You can see it as an Abstract Syntax Tree for shaders (you can even ask Minko to output ASGs in the console as they get generated using a debug flag). This ASG is then optimized and compiled into actual shader code for the GPU.

For example: every time you call the add() method in your AS3 shader code, it will create a corresponding ASG node. This very node will be linked with the rest of the ASG as you use it in other operations until it is finally used as the result of the shader. This result node becomes the “entry point” of the ASG.

Here is what a very simple ASG that just handles a solid flat color rendering looks like:

Here is what a (complicated) ASG that handles multiple lights looks like:

Your AS3 shader code is executed at runtime on the CPU to generate this ASG that will be compiled into actual shader code that will run on the GPU (in the case of Flash it will actually output AGAL bytecode that will be translated into shader machine code by the Flash Player). As such, you can easily perform “if” statements that will shape the ASG. You can even use loops, functions and OOP! You just have to make sure the shader is re-evaluated anytime the output might be different (for example when the condition tested in an “if” changes). But that’s for another time…
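The general idea can be sketched in a few lines of C++. This is a toy model and not Minko’s implementation: the `Node` type, the `input()`, `add()` and `multiply()` helpers, the input names and the textual instruction format are all hypothetical, and the optimization step is left out.

```cpp
#include <memory>
#include <string>
#include <vector>

// One node of the toy shader graph: either a named input or an operation.
struct Node {
    std::string op;    // "input", "add" or "multiply"
    std::string name;  // only meaningful for inputs
    std::vector<std::shared_ptr<Node>> args;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr input(const std::string& name) {
    return std::make_shared<Node>(Node{"input", name, {}});
}
NodePtr add(NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{"add", "", {a, b}});
}
NodePtr multiply(NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{"multiply", "", {a, b}});
}

// Host-side "shader": the if and the loop run on the CPU and only shape
// the graph. None of this control flow reaches the generated program.
NodePtr buildPixelColor(bool lightingEnabled, int lightCount) {
    NodePtr color = input("diffuseColor");
    if (!lightingEnabled) return color;
    NodePtr lighting = input("ambient");
    for (int i = 0; i < lightCount; ++i) {
        lighting = add(lighting, input("light" + std::to_string(i)));
    }
    return multiply(color, lighting);
}

// Flattens the graph into a branch-free instruction list, starting from the
// entry point. Returns the name of the register (or input) holding the result.
std::string emit(const NodePtr& node, std::vector<std::string>& program) {
    if (node->op == "input") return node->name;
    const std::string a = emit(node->args[0], program);
    const std::string b = emit(node->args[1], program);
    const std::string dst = "t" + std::to_string(program.size());
    program.push_back(node->op + " " + dst + ", " + a + ", " + b);
    return dst;
}
```

With lighting disabled, `emit(buildPixelColor(false, 2), program)` produces no instructions at all and simply returns `diffuseColor`. With lighting enabled and two lights it produces exactly three instructions (two adds and one multiply), with no trace of the `if` or the loop in the output.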

Using JIT shaders, Minko can dynamically and efficiently compile shaders shaped by the actual rendering settings occurring at runtime. Thus, it combines the high performance of a pre-compilation solution with all the flexibility of JIT compilation. In my next articles, I will explain how JIT shader compilation can be efficiently automated and how multi-pass rendering can also be made more efficient thanks to this approach.

If you have questions, hit the comments or post in the Minko category on Aerys Answers!

Teasing: Simple Minko Physics Stacking Stress Test

Minko physics is probably one of the most awaited features for this new year. I will not cover it extensively right now – I’d rather post a few demos in a later post – but if you must know, it was designed to be the fastest 3D physics engine for ActionScript 3 and the Flash platform.

I will share extensive details about this new physics engine during the Stage3D online conference in February. Make sure you attend: it’s online and it’s free! 🙂

The Demo

Anyway, I just wanted to tease the community with one of the stress tests we’re using here to benchmark the engine. It’s a really simple test and it uses only a very small subset of the available features.

Here you go (just press “space” to throw some balls):

Read more about performance after the jump…