Implement geometry instancing in OpenGL

Source: Internet
Author: User

Disclaimer: This article is for personal study and exchange only and is copyrighted by the original author.

Translator: tyxxy

Email: tyxxyhm@hotmail.com

If you need to reprint please indicate the source: http://tyxxy.spaces.live.com/

Original address: http://blog.benjamin-thaut.de/?p=29

I am still learning myself. If the translation is inaccurate, please let me know; discussion with others who share this interest is welcome. To avoid misrepresenting the original meaning, please refer to the original text wherever the translation is unclear.

Introduction


To coincide with the release of DX10-class GPUs, instancing has become available in OpenGL through the EXT_draw_instanced extension.

By itself this extension, which lets you draw a vertex buffer multiple times in conjunction with an instance ID accessible in the vertex shader, is of little use.

But if you look closer at the extension string you will notice another new extension, EXT_bindable_uniform, which lets you specify a buffer object as the data source for a uniform. With it, a GLSL shader has access to much more data. On a GeForce 8 it is possible to have 12 of these buffers, each holding a maximum of 64 KB, so 768 KB can be stored in total. The most important property of these buffers is that data only has to be uploaded to the card once; it can later be accessed again without resending it. This allows you to store the world matrix transformations of the drawn objects on the graphics card, and the resulting performance increase is obvious.
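The capacity figures above follow from simple arithmetic. The sketch below assumes a 4x4 float matrix occupies 64 bytes and takes the 12-buffer and 64 KB limits quoted for the GeForce 8 as given rather than querying them from a driver; all constant and function names are illustrative and do not come from the original article.

```cpp
#include <cstddef>

// Limits quoted in the article for a GeForce 8 (assumed here, not queried from a driver).
constexpr std::size_t kMaxBufferBytes = 64 * 1024;           // 64 KB per bindable uniform buffer
constexpr std::size_t kMaxBuffers     = 12;                  // buffers available to one shader
constexpr std::size_t kMat4Bytes      = 16 * sizeof(float);  // one 4x4 float world matrix

// Number of world matrices that fit into a single buffer.
constexpr std::size_t matricesPerBuffer() {
    return kMaxBufferBytes / kMat4Bytes;
}

// Total storage across all buffers, in KB.
constexpr std::size_t totalKilobytes() {
    return kMaxBuffers * kMaxBufferBytes / 1024;
}

// Total number of world matrices that fit across all buffers.
constexpr std::size_t totalMatrices() {
    return kMaxBuffers * matricesPerBuffer();
}
```

Under these assumptions one buffer holds 1024 matrices, which is where the maximum group size of 1024 reported in the performance section comes from.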

How-to

First we have to create a buffer on the graphics card in which we store the objects' world matrix data. Since the amount of data a single buffer can hold is limited, we have to divide the data among several buffers. See List 1.

C++ code:

List 1: create world matrix buffers on the video card

mat4 *worldmats;
// number of objects we wish to draw
int inumberofinstances;
// buffer array
GLuint *uniformbuffers;
// number of matrices stored in each buffer
int *uniformbufferssize;
// total number of buffers
int anzbuffers;

void Init() {

    // ...

    inumberofinstances = 65535;
    // create the world matrices of all instances
    worldmats = new mat4[inumberofinstances];

    // ...

    #define draws 512
    int remaining = inumberofinstances;

    uniformbuffers = new GLuint[anzbuffers];
    uniformbufferssize = new int[anzbuffers];

    for (int i = 0; i < anzbuffers; i++) {
        // number of matrices in the current buffer: whatever remains, capped at draws
        uniformbufferssize[i] = remaining;
        if (uniformbufferssize[i] > draws)
            uniformbufferssize[i] = draws;
        // create and bind the buffer
        glGenBuffers(1, &uniformbuffers[i]);
        glBindBuffer(GL_UNIFORM_BUFFER_EXT, uniformbuffers[i]);
        // establish the size and type of the buffer;
        // the buffer has to be at least the same size
        // as the uniform in the shader
        glBufferData(GL_UNIFORM_BUFFER_EXT, 16 * sizeof(float) * draws, NULL, GL_STATIC_READ);
        // send the data
        glBufferSubData(GL_UNIFORM_BUFFER_EXT, 0, 16 * sizeof(float) * uniformbufferssize[i], &worldmats[i * draws]);
        // count down the remaining matrices
        remaining -= draws;
    }

    // finished, thus unbind the buffer
    glBindBuffer(GL_UNIFORM_BUFFER_EXT, 0);
}
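List 1 uses anzbuffers and fills uniformbufferssize, but the listing does not show how the number of buffers is obtained. One plausible way, shown here only as a sketch, is ceiling division of the instance count by the group size. The function name splitIntoBuffers is hypothetical and is not part of the original code.

```cpp
#include <algorithm>
#include <vector>

// Split `instanceCount` world matrices into groups of at most `groupSize`.
// Returns how many matrices each buffer would hold, in buffer order.
// (Hypothetical helper; the original listing elides this step.)
std::vector<int> splitIntoBuffers(int instanceCount, int groupSize) {
    std::vector<int> sizes;
    if (instanceCount <= 0 || groupSize <= 0) {
        return sizes;  // nothing to store
    }
    // Ceiling division: a final, partially filled buffer still counts.
    const int bufferCount = (instanceCount + groupSize - 1) / groupSize;
    sizes.reserve(bufferCount);
    int remaining = instanceCount;
    for (int i = 0; i < bufferCount; ++i) {
        const int size = std::min(remaining, groupSize);
        sizes.push_back(size);
        remaining -= size;
    }
    return sizes;
}
```

For the 65535 instances and group size of 512 used in List 1, this gives 128 buffers, with the last one holding 511 matrices.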

Now that the data is stored on the graphics card, we can turn to the actual rendering (knowledge of vertex buffers and GLSL shaders is assumed). See List 2.

List 2: rendering code

void draw() {

    // ...

    // loop through the buffers
    for (int i = 0; i < anzbuffers; i++) {
        // bind the buffer to the uniform
        instancingshader->bindbuffertouniform(0, uniformbuffers[i]);

        // bind the instancing shader
        instancingshader->Use();
        // draw
        wuerfel->drawinstanced(uniformbufferssize[i]);
        // unbind the current instancing shader
        unloadshader();
    }
    // unbind the buffer (bind to 0)
    instancingshader->bindbuffertouniform(0, 0);
}

instancingshader->bindbuffertouniform(0, uniformbuffers[i]);

Inside this function I bind the buffer to the uniform with the OpenGL function glUniformBufferEXT(program, location, buffer). Its parameters are:

1. program: the handle/ID of the shader program object

2. location: the location of the uniform

3. buffer: the buffer's ID

The location of this uniform is determined in the same way as for any other uniform in GLSL. It is very important that the buffer is bound before the shader is put into use: if the shader is currently in use, the binding attempt is simply ignored.

wuerfel->drawinstanced(uniformbufferssize[i]);

This is the actual rendering. It is the same as the standard vertex array methods, except that glDrawArraysInstancedEXT is used instead of glDrawArrays, with the last parameter containing the number of instances to be drawn. For indexed VBOs this would be glDrawElementsInstancedEXT. Objects that are not constructed from triangles or quads are more difficult to draw with instancing, since MultiDrawArraysInstanced and similar calls are not available. To draw models that are constructed from triangle strips you must use an extra instance for each triangle strip.
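The original article does not describe any workaround for strip-based models. One possible alternative is to convert the strip indices into a plain triangle list ahead of time, so that a model made of several strips could be drawn as indexed triangles. The sketch below shows that conversion only; the function name stripToTriangleList is hypothetical, and whether the larger index buffer pays off has not been measured here.

```cpp
#include <cstddef>
#include <vector>

// Convert the indices of one triangle strip into an equivalent triangle list.
// Every second triangle of a strip has its first two vertices swapped so that
// all triangles in the result keep the same winding order.
std::vector<unsigned> stripToTriangleList(const std::vector<unsigned>& strip) {
    std::vector<unsigned> triangles;
    if (strip.size() < 3) {
        return triangles;  // a strip needs at least three vertices
    }
    triangles.reserve((strip.size() - 2) * 3);
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        const unsigned a = strip[i];
        const unsigned b = strip[i + 1];
        const unsigned c = strip[i + 2];
        if (a == b || b == c || a == c) {
            continue;  // skip degenerate triangles used to stitch strips together
        }
        if (i % 2 == 0) {
            triangles.push_back(a);
            triangles.push_back(b);
            triangles.push_back(c);
        } else {
            triangles.push_back(b);
            triangles.push_back(a);
            triangles.push_back(c);
        }
    }
    return triangles;
}
```

The lists produced for several strips can be appended to one another to form a single index buffer for the whole model.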

Last but not least, the GLSL instancing shader.

List 3: GLSL shader

GLSL code:

#version 120
#extension GL_EXT_bindable_uniform : enable
#extension GL_EXT_gpu_shader4 : enable

bindable uniform mat4 worldmats[512];

void main(void) {
    vec4 position = worldmats[gl_InstanceID] * gl_Vertex;
    position = gl_ModelViewMatrix * position;
    gl_Position = gl_ProjectionMatrix * position;

    vec3 normal = mat3(worldmats[gl_InstanceID]) * gl_Normal;
    normal = mat3(gl_ModelViewMatrix) * normal;

    vec3 lightvectorview = normalize(gl_LightSource[0].position.xyz - position.xyz);

    gl_FrontColor = (gl_LightSource[0].diffuse * max(dot(normal, lightvectorview), 0.0) + gl_LightSource[0].ambient + 0.2) * gl_Color;
}

The directives at the beginning are necessary to specify that we use Shader Model 4.0 and the EXT_bindable_uniform extension. The most important part of the shader is the first three lines of the main function. There the individual world matrix of each instance is looked up with the instance ID to compute the correct position of each vertex. In this case the view matrix would be the OpenGL modelview matrix (the modelview matrix without the model matrix is, of course, the view matrix). The rest of the main function computes simple per-vertex diffuse lighting, as the fixed-function pipeline does. To avoid problems when transforming normals into world space, avoid scaling within the matrices. If you want this method to work in all cases, you have to compute a normal matrix per instance yourself and pass it to the shader too.
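A per-instance normal matrix of the kind mentioned above is conventionally the inverse transpose of the upper-left 3x3 part of the world matrix. The following is a minimal CPU-side sketch of that computation. The Mat3 alias and the normalMatrix function are illustrative names, not part of the original code, and how the result would be passed to the shader is left open.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<float, 3>, 3>;  // m[row][col]

// Writes the inverse transpose of `m` to `out`.
// Returns false (leaving `out` untouched) if `m` is not invertible.
bool normalMatrix(const Mat3& m, Mat3& out) {
    // Cofactor matrix of m. Dividing it by det(m) yields the inverse transpose,
    // because the inverse is the transposed cofactor matrix divided by det(m).
    Mat3 c{};
    c[0][0] = m[1][1] * m[2][2] - m[1][2] * m[2][1];
    c[0][1] = m[1][2] * m[2][0] - m[1][0] * m[2][2];
    c[0][2] = m[1][0] * m[2][1] - m[1][1] * m[2][0];
    c[1][0] = m[0][2] * m[2][1] - m[0][1] * m[2][2];
    c[1][1] = m[0][0] * m[2][2] - m[0][2] * m[2][0];
    c[1][2] = m[0][1] * m[2][0] - m[0][0] * m[2][1];
    c[2][0] = m[0][1] * m[1][2] - m[0][2] * m[1][1];
    c[2][1] = m[0][2] * m[1][0] - m[0][0] * m[1][2];
    c[2][2] = m[0][0] * m[1][1] - m[0][1] * m[1][0];

    // Determinant by expansion along the first row.
    const float det = m[0][0] * c[0][0] + m[0][1] * c[0][1] + m[0][2] * c[0][2];
    if (std::fabs(det) < 1e-12f) {
        return false;
    }
    for (int r = 0; r < 3; ++r) {
        for (int col = 0; col < 3; ++col) {
            out[r][col] = c[r][col] / det;
        }
    }
    return true;
}
```

For a world matrix that scales x by 2 and leaves y and z unchanged, this yields a matrix that scales x by 0.5, which keeps transformed normals perpendicular to the stretched surface.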

Performance

In the following diagram we compare the three drawing methods (the X axis is the number of instances drawn per frame, the Y axis shows the frames per second).

We can conclude that EXT_draw_instanced is about twice as fast as NVIDIA's pseudo-instancing, which in turn is about twice as fast as the standard drawing method. With instancing, a GeForce 8800 GTX is capable of drawing 131072 cubes 45 times a second.

Since the number of objects the user wishes to draw at the same time varies, I've benchmarked various group sizes. In the following diagram I've drawn 131072 cubes (the X axis is the number of cubes drawn with one call, the Y axis shows the frames per second).

When drawing 16 cubes per call, this method has no performance increase compared to pseudo-instancing. With group sizes of 256 or larger the performance increase is much smaller (about 0.5 fps with each doubling of the group size).

Due to the buffer size limitation of EXT_bindable_uniform, the maximum group size is 1024.

Conclusion

Instancing performs best if the objects to be drawn are static. If the objects are moving, which requires you to update the world matrices each frame, the benefit over pseudo-instancing is greatly reduced. Because you can update the data all at once rather than sending the world matrices one by one, as pseudo-instancing does, it would still be faster than pseudo-instancing. To sum up, the new draw call is definitely effective and, coupled with the bindable uniform extension, very useful. The downsides, though, are that at the moment only a limited number of graphics cards support the extensions, and that the current drivers are unstable when they are used: I regularly experienced driver memory access violations when I wanted to terminate my program.
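The point about updating the data all at once can be illustrated on the CPU side. The sketch below assumes the moving objects only translate, and packs their world matrices back to back in column-major order so that a whole group could be handed to a single buffer update such as the glBufferSubData call in List 1. Vec3 and packWorldMatrices are hypothetical names introduced for this example.

```cpp
#include <array>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Build one column-major 4x4 translation matrix per position and store them
// contiguously, 16 floats per matrix, ready to be uploaded in one piece.
std::vector<float> packWorldMatrices(const std::vector<Vec3>& positions) {
    std::vector<float> data;
    data.reserve(positions.size() * 16);
    for (const Vec3& p : positions) {
        const std::array<float, 16> m = {
            1.0f, 0.0f, 0.0f, 0.0f,  // first column
            0.0f, 1.0f, 0.0f, 0.0f,  // second column
            0.0f, 0.0f, 1.0f, 0.0f,  // third column
            p.x,  p.y,  p.z,  1.0f   // fourth column holds the translation
        };
        data.insert(data.end(), m.begin(), m.end());
    }
    return data;
}
```

The returned array has the same layout as the mat4 array in List 1, so one upload per buffer per frame would be enough to move every object in that group.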

 

This article is from the CSDN blog. If you reproduce it, please indicate the source: http://blog.csdn.net/swq0553/archive/2010/12/08/6063654.aspx
