Improved Particle System: GPU Implementation


By fannyfish

Blog: http://blog.csdn.net/fannyfish

Amma@zsws.org

Introduction
The performance of a real-time particle system is limited by two factors: fill rate and CPU-GPU data transfer. Fill rate is the number of pixels the GPU can draw per frame. When particles are large and many of them overlap (for example, particles simulating a large area of mist or smoke), performance drops significantly. The usual approach is to run the physics on the CPU and then send the results to the GPU for rendering. When the particle count is large (say 100,000, as when simulating rain or snow over a wide area), the CPU computation time plus the CPU-GPU transfer time is too long for real-time use.

My project uses a large number of particles everywhere: scene effects, attack effects, and even UI effects. The current particle system runs its physics on the CPU, and together with the physics engine, skeletal animation blending, and game logic, this makes the CPU the system bottleneck. In every frame the GPU sits idle for a short time waiting for the CPU, and the better the graphics card, the more pronounced the imbalance becomes. Moving the particle physics from the CPU to the GPU to balance the load is therefore the key to this optimization.

Design
Stateless vs. state-preserving
1. Stateless: a particle's data is computed only from its initial attributes, such as initial position and velocity, together with the elapsed time.

2. State-preserving: a particle's data is computed from the position, velocity, and other attributes of its previous state.

Processing a state-preserving particle system on the GPU requires storing the particle state in multiple textures, which demands a high-end graphics card.

A stateless particle system, by contrast, has low graphics card requirements and is relatively simple to implement, so we implement this kind first.
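The difference between the two kinds of system can be shown on the CPU. The following C++ sketch is illustrative only: `Vec3`, `statelessPosition`, `ParticleState`, and `stepState` are hypothetical names, not engine code. The stateless function evaluates the position directly from the initial position, velocity, and elapsed time; the state-preserving version must keep and update the previous position and velocity at every step.

```cpp
#include <cassert>

// Hypothetical minimal vector type for illustration.
struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Stateless: the position depends only on the initial attributes and time t.
// p(t) = p0 + v0*t + 1/2*a*t^2
Vec3 statelessPosition(Vec3 p0, Vec3 v0, Vec3 a, float t) {
    return p0 + v0 * t + a * (0.5f * t * t);
}

// State-preserving: each step reads and overwrites the previous state.
struct ParticleState { Vec3 pos, vel; };

void stepState(ParticleState& s, Vec3 a, float dt) {
    assert(dt > 0.0f);
    // With constant acceleration this update reproduces the closed form exactly.
    s.pos = s.pos + s.vel * dt + a * (0.5f * dt * dt);
    s.vel = s.vel + a * dt;
}
```

For constant acceleration both give the same trajectory, but only the stateless form can be evaluated per vertex without storing anything between frames, which is why it maps directly onto a vertex shader.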

Relationship with the existing particle system
Using the existing particle system involves the following steps:

1. Artists create particle systems in the editor.

2. In the editor they specify the particle's renderer, emitter, and affector. The renderer creates and deletes the rendering data for each particle (such as billboards and models) and maintains the render state. The available renderers are the billboard renderer, model renderer, billboard trail renderer, and model trail renderer.

3. The effect is checked in the game client, and feedback goes back to the artists for tuning.

Processing particles on the GPU needs different rendering data and render states, so a new renderer is derived: the shader renderer. The steps above stay the same; artists only need to learn the new renderer's parameters.
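The renderer arrangement can be sketched as a small class hierarchy. The article does not show the engine's renderer interface, so `ParticleRenderer`, `BillboardRenderer`, `ShaderRenderer`, `simulatesOnCpu`, and `makeRenderer` below are hypothetical; the point is only that the shader renderer plugs in beside the existing renderers without changing the editor workflow.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base interface for particle renderers.
class ParticleRenderer {
public:
    virtual ~ParticleRenderer() = default;
    virtual std::string name() const = 0;
    // Whether particle physics is advanced on the CPU every frame.
    virtual bool simulatesOnCpu() const = 0;
};

// Stand-in for one of the existing CPU-simulated renderers.
class BillboardRenderer : public ParticleRenderer {
public:
    std::string name() const override { return "billboard"; }
    bool simulatesOnCpu() const override { return true; }
};

// The new renderer: vertex buffers hold only the initial state,
// and the vertex shader evaluates the physics.
class ShaderRenderer : public ParticleRenderer {
public:
    std::string name() const override { return "shader"; }
    bool simulatesOnCpu() const override { return false; }
};

// The editor selects a renderer by name; the rest of the workflow is unchanged.
std::unique_ptr<ParticleRenderer> makeRenderer(const std::string& kind) {
    assert(!kind.empty());
    if (kind == "shader") return std::make_unique<ShaderRenderer>();
    return std::make_unique<BillboardRenderer>();
}
```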

Rendering data and render states
The initial attributes of each vertex are stored in the position, color, and otherwise unused texture-coordinate vertex buffers: position, color, the vertex's offset within the particle quad (upoffset, leftoffset), velocity, and lifetime.
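A sketch of how one particle's four quad vertices might be filled, assuming an interleaved C++ struct for readability (the engine code in the Implementation section uses separate vertex streams instead). `ParticleVertex` and `makeQuad` are hypothetical; all four corners share the particle's initial state and differ only in texture coordinate and corner offset.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical per-vertex initial state mirroring the attributes listed above.
struct ParticleVertex {
    float pos[3];         // emission position (same for all 4 corners)
    float uv[2];          // texture coordinate of this corner
    float offset[2];      // corner offset along the right / up vectors
    float life;           // total lifetime in seconds
    float vel[3];         // initial velocity
    std::uint32_t color;  // start color, packed ARGB like D3DCOLOR
};

// Fill the four corners of one particle quad.
std::array<ParticleVertex, 4> makeQuad(const float pos[3], const float vel[3],
                                       float life, std::uint32_t color,
                                       float width, float height) {
    assert(life > 0.0f);
    const float hw = 0.5f * width, hh = 0.5f * height;
    // Corner order: top-left, top-right, bottom-right, bottom-left.
    const float corners[4][4] = {
        // offRight, offUp, u, v
        {-hw,  hh, 0.0f, 0.0f},
        { hw,  hh, 1.0f, 0.0f},
        { hw, -hh, 1.0f, 1.0f},
        {-hw, -hh, 0.0f, 1.0f},
    };
    std::array<ParticleVertex, 4> quad{};
    for (int i = 0; i < 4; ++i) {
        ParticleVertex& v = quad[i];
        for (int k = 0; k < 3; ++k) { v.pos[k] = pos[k]; v.vel[k] = vel[k]; }
        v.offset[0] = corners[i][0];
        v.offset[1] = corners[i][1];
        v.uv[0] = corners[i][2];
        v.uv[1] = corners[i][3];
        v.life = life;
        v.color = color;
    }
    return quad;
}
```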

The shader constants set for each draw are: the world-view-projection matrix; the camera's right and up vectors (combined with upoffset and leftoffset, they determine how the quad faces the camera, so different facing modes only need different vectors); the elapsed time; the acceleration; and the rate of color change.
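The article does not say how `rightvector` and `upvector` are obtained. One common choice, assumed here, is to take them from the view matrix, which gives screen-aligned billboards; other facing modes would pass different vectors. In D3D's row-vector convention, a view matrix built like `D3DXMatrixLookAtLH` stores the camera's right and up axes in its first two columns. `Mat4`, `Vec4`, `cameraRight`, `cameraUp`, and `billboardCorner` are hypothetical helpers, not D3DX types.

```cpp
#include <cassert>

// Hypothetical row-major 4x4 matrix in D3D's row-vector convention (v * M).
struct Mat4 { float m[4][4]; };
struct Vec4 { float x, y, z, w; };

// Camera right axis: first column of the view matrix's 3x3 rotation part.
Vec4 cameraRight(const Mat4& view) {
    return {view.m[0][0], view.m[1][0], view.m[2][0], 0.0f};
}

// Camera up axis: second column of the 3x3 rotation part.
Vec4 cameraUp(const Mat4& view) {
    return {view.m[0][1], view.m[1][1], view.m[2][1], 0.0f};
}

// Quad corner as the shader builds it: center + right*offRight + up*offUp.
Vec4 billboardCorner(Vec4 center, Vec4 right, Vec4 up, float offRight, float offUp) {
    assert(right.w == 0.0f && up.w == 0.0f);  // directions, not points
    return {center.x + right.x * offRight + up.x * offUp,
            center.y + right.y * offRight + up.y * offUp,
            center.z + right.z * offRight + up.z * offUp,
            center.w};
}
```

For a camera at the origin looking along +x with +y up, the left-handed look-at matrix has right axis (0, 0, -1) and up axis (0, 1, 0), which the checks below use.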

Implementation
Engine-side rendering code:
static const D3DVERTEXELEMENT9 g_vertexelements[] =
{
    // stream, offset, type, method, usage, usage index
    { 0, 0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
    { 1, 0, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
    // Note: the shader reads TEXCOORD1 as float3 (offsets + lifetime); a FLOAT2
    // element is expanded to (x, y, 0, 1), which would leave the lifetime at 0.
    { 2, 0, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 1 },
    { 3, 0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 2 },
    { 4, 0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 3 },
    { 5, 0, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0 },
    D3DDECL_END()
};

...

// One vertex buffer per attribute stream (engine wrapper calls)
m_ppositionvb  = pdevice->createvertexbuffer(sizeof(v3dxvector3) * m_ivertexsize, m_pparent->m_dwvbusage, m_pparent->m_pool);
m_pdiffusevb   = pdevice->createvertexbuffer(sizeof(DWORD) * m_ivertexsize, m_pparent->m_dwvbusage, m_pparent->m_pool);
m_ptexcoordvb  = pdevice->createvertexbuffer(sizeof(v3uv2) * m_ivertexsize, m_pparent->m_dwvbusage, m_pparent->m_pool);
m_ptexcoordvb2 = pdevice->createvertexbuffer(sizeof(v3uv2) * m_ivertexsize, m_pparent->m_dwvbusage, m_pparent->m_pool);
m_ptexcoordvb3 = pdevice->createvertexbuffer(sizeof(v3dxvector3) * m_ivertexsize, m_pparent->m_dwvbusage, m_pparent->m_pool);
m_pindexbuffer = pdevice->createindexbuffer(sizeof(WORD) * m_iindexsize, m_pparent->m_dwibusage, false, m_pparent->m_pool);

// ... fill in the initial state data of the particles ...

...

// Set the effect constants used by the HLSL code
pEffect->SetMatrix("matworldviewproj", (D3DXMATRIX*)&mattransformation);
pEffect->SetVector("rightvector", &rightvector);
pEffect->SetVector("upvector", &upvector);
pEffect->SetVector("time_colour", &timevec);
pEffect->SetVector("acceleration", &acceleration);

...

// 4 vertices and 2 triangles per particle quad, so primitive count = vertices / 2
m_pdevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, prenderer->m_ivertexsize, 0, prenderer->m_ivertexsize / 2);
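The index buffer contents are not shown in the article. Assuming each particle is a quad of 4 vertices drawn as 2 triangles, which is what the `m_ivertexsize / 2` primitive count implies, a 16-bit index list could be built as follows (`buildQuadIndices` and `primitiveCount` are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 16-bit index list for quadCount particle quads (4 vertices each),
// two triangles per quad: (0,1,2) and (0,2,3) relative to the quad's first vertex.
std::vector<std::uint16_t> buildQuadIndices(int quadCount) {
    // 16-bit indices can address at most 65536 vertices.
    assert(quadCount >= 0 && quadCount * 4 <= 65536);
    std::vector<std::uint16_t> indices;
    indices.reserve(static_cast<std::size_t>(quadCount) * 6);
    const std::uint16_t tri[6] = {0, 1, 2, 0, 2, 3};
    for (int q = 0; q < quadCount; ++q) {
        const int base = q * 4;
        for (std::uint16_t t : tri) indices.push_back(static_cast<std::uint16_t>(base + t));
    }
    return indices;
}

// Primitive count for DrawIndexedPrimitive: 2 triangles per 4 vertices.
int primitiveCount(int vertexCount) { return vertexCount / 2; }
```

With 16-bit (`WORD`) indices a single buffer can address at most 65,536 vertices, i.e. 16,384 particles; a 100,000-particle system would need 32-bit indices or several draw calls.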

 

D3DX effect (.fx) code:
struct VS_INPUT
{
    float3 position     : POSITION;
    float2 tex0         : TEXCOORD0;
    float3 tex1         : TEXCOORD1;  // x: offset along rightvector, y: offset along upvector, z: totaltimetolife
    float3 tex2         : TEXCOORD2;  // initial velocity
    float3 startdiffuse : COLOR0;
};

struct VS_OUTPUT
{
    float4 position : POSITION;
    float3 diffuse  : COLOR0;
    float2 tex0     : TEXCOORD0;
};

matrix matworldviewproj;  // world-view-proj matrix
float4 rightvector;       // camera right vector
float4 upvector;          // camera up vector
float4 time_colour;       // x: elapsed time, yzw: colour change per second (RGB)
float4 acceleration;

VS_OUTPUT vs(const VS_INPUT input)
{
    VS_OUTPUT output = (VS_OUTPUT)0;

    // Expand the quad corner: position = pos + right + up
    float4 right = rightvector * input.tex1.x;
    float4 up    = upvector * input.tex1.y;
    float4 pos   = float4(input.position, 0) + right + up;

    // Live time = fmod(elapsed time, totaltimetolife), so each particle restarts after its lifetime
    float flivetime = fmod(time_colour.x, input.tex1.z);

    // Displacement: s = v*t + 1/2*a*t*t
    float4 delta = float4(input.tex2, 0) * flivetime
                 + 0.5 * acceleration * flivetime * flivetime;
    pos = pos + delta;
    pos.w = 1.0;
    output.position = mul(pos, matworldviewproj);

    // Colour changes linearly over the particle's life
    output.diffuse = input.startdiffuse + time_colour.yzw * flivetime;

    // Texture coordinate
    output.tex0 = input.tex0;

    return output;
}

technique tec0
{
    pass P0
    {
        VertexShader = compile vs_1_1 vs();
        PixelShader  = NULL;
    }
}
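To check the shader's math without a GPU, the per-vertex work of `vs()` can be mirrored in plain C++. This is a sketch under assumptions: the names are hypothetical, and it stops at the world-space position (the `matworldviewproj` transform is omitted). It uses the displacement s = v*t + 1/2*a*t^2 stated for the acceleration affector, with t = fmod(elapsed time, lifetime).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical CPU mirror of the vertex shader inputs and outputs.
struct Float3 { float x, y, z; };

struct ParticleInput {
    Float3 position;        // emission point
    float offRight, offUp;  // tex1.x, tex1.y
    float lifetime;         // tex1.z (totaltimetolife)
    Float3 velocity;        // tex2
    Float3 startColour;     // startdiffuse
};

struct ParticleOutput { Float3 position; Float3 colour; };

ParticleOutput evaluate(const ParticleInput& in, Float3 right, Float3 up,
                        float elapsed, Float3 colourDelta, Float3 accel) {
    assert(in.lifetime > 0.0f);
    // Particle age within its current life cycle.
    const float t = std::fmod(elapsed, in.lifetime);
    // One axis of: pos + right*offRight + up*offUp + v*t + 1/2*a*t^2
    auto axis = [&](float p, float r, float u, float v, float a) {
        return p + r * in.offRight + u * in.offUp + v * t + 0.5f * a * t * t;
    };
    ParticleOutput result;
    result.position = {axis(in.position.x, right.x, up.x, in.velocity.x, accel.x),
                       axis(in.position.y, right.y, up.y, in.velocity.y, accel.y),
                       axis(in.position.z, right.z, up.z, in.velocity.z, accel.z)};
    // Linear colour change over the particle's life.
    result.colour = {in.startColour.x + colourDelta.x * t,
                     in.startColour.y + colourDelta.y * t,
                     in.startColour.z + colourDelta.z * t};
    return result;
}
```

For example, a particle with a 2 s lifetime evaluated at 5 s elapsed is 1 s into its third cycle.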

Adjustable attributes in the editor
1. Default height

2. Default width

3. Maximum number of particles (the number of particles alive at the same time)

4. Particle orientation

5. Camera-facing mode

6. Particle up vector

7. Whether it is a 2D particle system

Emitter

8. All emitter types and their type-specific attributes (such as the inner and outer ring sizes of the ring emitter)

9. Angle

10. Start color

11. End color

12. Direction

13. Minimum lifetime

14. Maximum lifetime

15. Minimum speed

16. Maximum speed

17. Position

Affector

18. Color fading is supported.

19. Linear external force is supported: the "external force" is a constant acceleration a, so the displacement follows s = v*t + 1/2*a*t^2. The force mode setting has no effect.
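Because the system is stateless, the emitter attributes above are used only once, when each particle's initial state is written into the vertex buffer. A minimal sketch of drawing lifetime and speed from the min/max ranges, assuming a uniform distribution and a fixed, normalized direction (the spread angle and colors are left out; `EmitterParams`, `SpawnedParticle`, and `spawn` are hypothetical):

```cpp
#include <cassert>
#include <random>

// Hypothetical subset of the emitter attributes listed above.
struct EmitterParams {
    float minLife, maxLife;    // seconds (items 13, 14)
    float minSpeed, maxSpeed;  // units per second (items 15, 16)
    float dir[3];              // emission direction, assumed normalized (item 12)
};

struct SpawnedParticle { float life; float vel[3]; };

// Draw one particle's initial state; it is never updated afterwards.
SpawnedParticle spawn(const EmitterParams& e, std::mt19937& rng) {
    assert(e.minLife > 0.0f && e.minLife <= e.maxLife);
    assert(e.minSpeed <= e.maxSpeed);
    std::uniform_real_distribution<float> life(e.minLife, e.maxLife);
    std::uniform_real_distribution<float> speed(e.minSpeed, e.maxSpeed);
    SpawnedParticle p;
    p.life = life(rng);
    const float s = speed(rng);
    for (int k = 0; k < 3; ++k) p.vel[k] = e.dir[k] * s;
    return p;
}
```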


Editor
1. Point emitter with a color affector

2. Box emitter with an acceleration affector


Others
Todo
1. Support more emitters and affectors, dynamically compiling a different FX file for each combination of emitter and affector types a particle system uses, to reduce the per-vertex computation.

http://blog.csdn.net/fannyfish/archive/2006/06/22/823032.aspx

2. Support state-preserving particle systems. (finished)

http://blog.csdn.net/fannyfish/archive/2006/07/25/976753.aspx

3. Particle sorting and collision.

4. Allow choosing whether particles are positioned relative to the global (world) coordinate system or the local coordinate system.

References
1. [ShaderX3] Lutz Latta, "Massively Parallel Particle Systems on the GPU"

2. [ShaderX2] O'Dell Hicks, "Screen-aligned Particles with Minimal VertexBuffer Locking"

 

This article originally appeared on the CSDN blog; please credit the source when reposting: http://blog.csdn.net/fannyfish/archive/2006/06/14/797512.aspx
