By fannyfish
Blog: http://blog.csdn.net/fannyfish
Amma@zsws.org
Introduction
The performance of a real-time particle system is limited by two factors: fill rate and CPU-to-GPU data transfer. Fill rate is the number of pixels the GPU can render per frame. When particles are large and many of them overlap, performance drops significantly (for example, when particles simulate a large area of mist or smoke). The usual practice is to run the physics on the CPU and then send the results to the GPU for rendering. When the particle count is large (such as 100000), the CPU computation time and the CPU-to-GPU transfer time become unacceptable for real-time use (for example, when particles simulate rain or snow over a large area).
My project uses particles everywhere: scene effects, attack effects, and even interface effects. The current particle system runs its physics on the CPU. Together with the physics engine, skeletal blending, and game logic, this makes the CPU the system bottleneck: every frame, the GPU has to wait briefly for the CPU. The better the graphics card, the more pronounced the CPU bottleneck becomes. The key to optimization is moving the physics computation from the CPU to the GPU to balance the load.
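To get a feel for the transfer cost mentioned above, here is a rough back-of-the-envelope sketch. The vertex layout (`CpuParticleVertex`), the four-vertices-per-particle assumption, and the function names are all hypothetical and only serve the arithmetic; a real engine's layout will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side vertex layout: position + packed color + one UV set.
struct CpuParticleVertex {
    float         position[3]; // 12 bytes
    std::uint32_t color;       //  4 bytes
    float         uv[2];       //  8 bytes
};
static_assert(sizeof(CpuParticleVertex) == 24, "layout assumed to be 24 bytes");

// Bytes uploaded per frame if every particle is a four-vertex quad
// that the CPU rebuilds each frame (an assumption for this estimate).
inline std::size_t BytesPerFrame(std::size_t particleCount) {
    return particleCount * 4 * sizeof(CpuParticleVertex);
}

// Sustained upload rate in bytes per second at the given frame rate.
inline std::size_t BytesPerSecond(std::size_t particleCount, std::size_t fps) {
    assert(fps > 0);
    return BytesPerFrame(particleCount) * fps;
}
```

Under these assumptions, 100000 particles come to 9,600,000 bytes per frame, or 576,000,000 bytes per second at 60 frames per second, before any CPU time spent on the physics itself.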
Design
Stateless vs. state-preserving
1. Stateless: particle data is computed only from initial attributes such as the starting position and velocity, together with the elapsed time.
2. State-preserving: particle data is computed from the position, velocity, and other attributes of the previous state.
Processing a state-preserving particle system on the GPU requires storing particle state in multiple textures, which demands a high-end graphics card.
In contrast, a stateless particle system has low graphics card requirements and is relatively simple to implement, so it is the one to implement first.
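The difference between the two approaches can be sketched on the CPU as follows. This is only an illustration: `Vec3`, `StatelessPosition`, `ParticleState`, and `StepState` are hypothetical names, and the integration scheme used for the state-preserving step is an assumption.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 Add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 Scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

// Stateless: the position is a pure function of the initial attributes
// and the particle's age: p(t) = p0 + v0*t + 0.5*a*t*t.
inline Vec3 StatelessPosition(Vec3 p0, Vec3 v0, Vec3 accel, float age) {
    assert(age >= 0.0f);
    return Add(p0, Add(Scale(v0, age), Scale(accel, 0.5f * age * age)));
}

// State-preserving: each step reads the previous state and produces a new
// one, so the state has to be stored somewhere between frames.
struct ParticleState {
    Vec3 position;
    Vec3 velocity;
};

inline ParticleState StepState(ParticleState s, Vec3 accel, float dt) {
    assert(dt >= 0.0f);
    s.velocity = Add(s.velocity, Scale(accel, dt));      // update velocity first
    s.position = Add(s.position, Scale(s.velocity, dt)); // then advance position
    return s;
}
```

The stateless function needs nothing but constants and the age, which is why it fits in a vertex shader; the state-preserving step needs the previous `ParticleState`, which on the GPU means keeping it in textures.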
Relationship with the original particle system
Using the original particle system involves the following steps:
1. The artist creates particles in the editor.
2. The artist specifies the particle renderer, emitter, and affector in the editor. The renderer creates and deletes the rendering data (such as billboards and models) that corresponds to the particles, and it maintains the render state. The available renderers are the billboard renderer, model renderer, billboard trail renderer, and model trail renderer.
3. The result is tested in the client, and feedback goes back to the artist for adjustment.
Processing the particle system on the GPU uses different rendering data and state, so a new renderer is derived: the shader renderer. The steps above therefore do not change; the artist only needs to learn the parameters of the new renderer.
Rendering data and state
The initial per-vertex attributes are stored in otherwise unused texture coordinate and color vertex buffers. They include position, color, position on the particle quad (upoffset, leftoffset), velocity, and lifetime.
The constant registers hold the world-view-projection matrix, the camera's right and up vectors (used together with upoffset and leftoffset to implement the different camera-facing modes), time, acceleration, and color change.
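As an illustration of the per-vertex data described above, the following sketch fills the four vertices of one particle quad. The struct layout, the corner order, and the names `ParticleVertex` and `MakeParticleQuad` are assumptions made for this example, not the engine's actual types. The three `tex1` components follow the way the vertex shader uses them: the first scales the right vector, the second scales the up vector, and the third is the total lifetime.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Hypothetical per-vertex record; all four corners of a particle share
// position, velocity, lifetime, and color, and differ only in offsets and UV.
struct ParticleVertex {
    Vec3          position;   // emission position
    float         uv[2];      // texture coordinate of this corner
    float         tex1[3];    // offset along right, offset along up, lifetime
    Vec3          velocity;   // initial velocity
    std::uint32_t startColor; // packed start color
};

inline std::array<ParticleVertex, 4> MakeParticleQuad(
    Vec3 position, Vec3 velocity, float width, float height,
    float lifetime, std::uint32_t startColor)
{
    assert(lifetime > 0.0f); // the shader takes fmod(time, lifetime)
    const float hw = 0.5f * width;
    const float hh = 0.5f * height;
    // Assumed corner order: top-left, top-right, bottom-left, bottom-right.
    const float rightOffset[4] = {-hw,  hw, -hw,  hw};
    const float upOffset[4]    = { hh,  hh, -hh, -hh};
    const float u[4]           = {0.0f, 1.0f, 0.0f, 1.0f};
    const float v[4]           = {0.0f, 0.0f, 1.0f, 1.0f};

    std::array<ParticleVertex, 4> quad{};
    for (int i = 0; i < 4; ++i) {
        quad[i].position   = position;
        quad[i].uv[0]      = u[i];
        quad[i].uv[1]      = v[i];
        quad[i].tex1[0]    = rightOffset[i];
        quad[i].tex1[1]    = upOffset[i];
        quad[i].tex1[2]    = lifetime;
        quad[i].velocity   = velocity;
        quad[i].startColor = startColor;
    }
    return quad;
}
```

Because the data never changes after emission, buffers filled this way can be written once and left alone; only the shader constants change per frame.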
Implementation
Engine rendering code segment:
// One vertex stream per attribute
const static D3DVERTEXELEMENT9 g_VertexElements[] =
{
    {0, 0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
    {1, 0, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0},
    {2, 0, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 1},
    {3, 0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 2},
    {4, 0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 3},
    {5, 0, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0},
    D3DDECL_END()
};
....
// pDevice is the engine's device wrapper (its signatures differ from IDirect3DDevice9)
m_pPositionVB  = pDevice->CreateVertexBuffer(sizeof(v3dxVector3) * m_iVertexSize, m_pParent->m_dwVBUsage, m_pParent->m_Pool);
m_pDiffuseVB   = pDevice->CreateVertexBuffer(sizeof(DWORD) * m_iVertexSize, m_pParent->m_dwVBUsage, m_pParent->m_Pool);
m_pTexCoordVB  = pDevice->CreateVertexBuffer(sizeof(v3UV2) * m_iVertexSize, m_pParent->m_dwVBUsage, m_pParent->m_Pool);
m_pTexCoordVB2 = pDevice->CreateVertexBuffer(sizeof(v3UV2) * m_iVertexSize, m_pParent->m_dwVBUsage, m_pParent->m_Pool);
m_pTexCoordVB3 = pDevice->CreateVertexBuffer(sizeof(v3dxVector3) * m_iVertexSize, m_pParent->m_dwVBUsage, m_pParent->m_Pool);
m_pIndexBuffer = pDevice->CreateIndexBuffer(sizeof(WORD) * m_iIndexSize, m_pParent->m_dwIBUsage, false, m_pParent->m_Pool);
... fill in the initial state data of the particles ...
...
// Set constants in HLSL
pEffect->SetMatrix("matworldviewproj", (D3DXMATRIX*)&matTransformation);
pEffect->SetVector("rightvector", &rightVector);
pEffect->SetVector("upvector", &upVector);
pEffect->SetVector("time_colour", &timeVec);
pEffect->SetVector("acceleration", &acceleration);
...
// Four vertices and two triangles per particle, so the primitive count is half the vertex count
m_pDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, pRenderer->m_iVertexSize, 0, pRenderer->m_iVertexSize / 2);
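The draw call above passes half the vertex count as the primitive count, which matches four vertices and two triangles per particle. A minimal sketch of how such an index buffer might be filled is shown below; the corner order (top-left, top-right, bottom-left, bottom-right) and the names `BuildQuadIndices` and `PrimitiveCount` are assumptions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a triangle-list index buffer for particleCount quads,
// four vertices and six indices per quad.
inline std::vector<std::uint16_t> BuildQuadIndices(std::size_t particleCount) {
    // 16-bit indices can address at most 65536 vertices, i.e. 16384 quads.
    assert(particleCount <= 16384);
    static const std::uint16_t kCorner[6] = {0, 1, 2, 2, 1, 3};

    std::vector<std::uint16_t> indices;
    indices.reserve(particleCount * 6);
    for (std::size_t i = 0; i < particleCount; ++i) {
        const std::size_t base = i * 4; // first vertex of this quad
        for (std::uint16_t corner : kCorner) {
            indices.push_back(static_cast<std::uint16_t>(base + corner));
        }
    }
    return indices;
}

// Two triangles per four vertices.
inline std::size_t PrimitiveCount(std::size_t vertexCount) {
    return vertexCount / 2;
}
```

Note that with `WORD` indices a single draw call is limited to 16384 particles under this layout, so larger systems would need several batches or 32-bit indices.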
D3DX effect (.fx) code:
struct VS_INPUT
{
    float3 position     : POSITION;
    float2 tex0         : TEXCOORD0;
    float3 tex1         : TEXCOORD1; // upoffset, leftoffset, totaltimetolife
    float3 tex2         : TEXCOORD2; // velocity
    float3 startdiffuse : COLOR0;
};
struct VS_OUTPUT
{
    float4 position : POSITION;
    float3 diffuse  : COLOR0;
    float2 tex0     : TEXCOORD0;
};
matrix matworldviewproj; // world-view-proj matrix
float4 rightvector;      // right vector
float4 upvector;         // up vector
float4 time_colour;      // elapsed time, delta colour
float4 acceleration;
VS_OUTPUT VS(const VS_INPUT input)
{
    // "out" is a reserved word in HLSL, so the result is named Out
    VS_OUTPUT Out = (VS_OUTPUT)0;
    // corner position = pos + right + up
    float4 right = rightvector * input.tex1.x;
    float4 up = upvector * input.tex1.y;
    float4 pos = float4(input.position, 0) + right + up;
    // live time = fmod(elapsed time, totaltimetolife)
    float fLiveTime = fmod(time_colour.x, input.tex1.z);
    // position = pos + v*t + 1/2*a*t*t
    float4 deltavel = float4(input.tex2, 0) * fLiveTime;
    deltavel = deltavel + 0.5 * acceleration * fLiveTime * fLiveTime;
    // deltavel.y = deltavel.y + time_colour.z;
    pos = pos + deltavel;
    pos.w = 1.0;
    Out.position = mul(pos, matworldviewproj);
    // color = start color + delta colour * live time
    Out.diffuse.x = input.startdiffuse.x + time_colour.y * fLiveTime;
    Out.diffuse.y = input.startdiffuse.y + time_colour.z * fLiveTime;
    Out.diffuse.z = input.startdiffuse.z + time_colour.w * fLiveTime;
    // texcoord
    Out.tex0 = input.tex0;
    return Out;
}
technique tec0
{
    pass P0
    {
        VertexShader = compile vs_1_1 VS();
        PixelShader = NULL;
    }
}
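When debugging a shader like this, it can help to have a CPU reference of the same math to compare against. The sketch below mirrors the position part of the vertex shader before the world-view-projection transform, using the kinematic formula S = vt + 1/2 * a * t * t that the article gives for acceleration. `Vec3`, `ShaderInputs`, and `ReferencePosition` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 Add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 Scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

// Per-vertex inputs, named after how the shader uses them.
struct ShaderInputs {
    Vec3  position;    // emission position
    float rightOffset; // tex1.x, scales the camera right vector
    float upOffset;    // tex1.y, scales the camera up vector
    float lifetime;    // tex1.z, total time to live
    Vec3  velocity;    // tex2
};

// Position of one quad corner before the world-view-projection transform.
inline Vec3 ReferencePosition(const ShaderInputs& in, Vec3 right, Vec3 up,
                              Vec3 accel, float elapsed)
{
    assert(in.lifetime > 0.0f);
    // The particle restarts every 'lifetime' seconds, so it loops forever.
    const float t = std::fmod(elapsed, in.lifetime);
    // Expand the corner in the camera plane.
    Vec3 p = Add(in.position,
                 Add(Scale(right, in.rightOffset), Scale(up, in.upOffset)));
    // Advance along the trajectory: v*t + 0.5*a*t*t.
    p = Add(p, Add(Scale(in.velocity, t), Scale(accel, 0.5f * t * t)));
    return p;
}
```

Because the age is `fmod(elapsed, lifetime)`, evaluating the function at `elapsed` and at `elapsed + lifetime` gives the same corner position, which is what makes the effect loop without any CPU-side respawn logic.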
Adjustable attributes in the editor
1. Default height
2. Default width
3. Maximum number of particles (the number of particles that can exist simultaneously)
4. Particle orientation
5. Camera-facing mode
6. Particle up vector
7. Whether it is a 2D particle system
Emitter
8. All emitter types and their type-specific attributes are supported (such as the inner and outer ring sizes of the ring emitter)
9. Angle
10. Start color
11. End color
12. Direction
13. Minimum lifetime
14. Maximum lifetime
15. Minimum speed
16. Maximum speed
17. Position
Affector
18. Color fade is supported
19. Linear external force is supported. "External force" here means an acceleration a that satisfies S = vt + 1/2 * a * t * t. The force mode has no effect.
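The color fade can be driven by a single per-second color delta, which is how the `time_colour` constant is used in the shader. A minimal sketch of deriving that delta from a start and an end color follows. It assumes one shared lifetime for the fade, since the delta is a single shader constant; `Color3`, `ColorDeltaPerSecond`, and `ColorAtAge` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cmath>

struct Color3 { float r, g, b; };

// Per-second color change that takes 'start' to 'end' over 'lifetime' seconds.
// Assumes a single lifetime for all particles sharing this constant.
inline Color3 ColorDeltaPerSecond(Color3 start, Color3 end, float lifetime) {
    assert(lifetime > 0.0f);
    return {(end.r - start.r) / lifetime,
            (end.g - start.g) / lifetime,
            (end.b - start.b) / lifetime};
}

// Mirrors the shader: color = start color + delta * live time.
inline Color3 ColorAtAge(Color3 start, Color3 delta, float age) {
    return {start.r + delta.r * age,
            start.g + delta.g * age,
            start.b + delta.b * age};
}
```

If particle lifetimes vary between the minimum and maximum, particles that live shorter than the assumed lifetime will not reach the end color, and longer-lived ones will overshoot it unless the result is clamped.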
Editor
1. Point emitter, color affector
2. Box emitter, acceleration affector
Others
Todo
1. Support more emitters and affectors. (Different FX files are compiled dynamically according to the emitter and affector types a particle system uses, to reduce the amount of computation.)
http://blog.csdn.net/fannyfish/archive/2006/06/22/823032.aspx
2. Support state-preserving particle systems. (Finished)
http://blog.csdn.net/fannyfish/archive/2006/07/25/976753.aspx
3. Sorting and collision
4. Allow choosing whether particles are relative to the global coordinate system or the local coordinate system.
References
1. [ShaderX3] Lutz Latta, Massively Parallel Particle Systems on the GPU
2. [ShaderX2] O'dell Hicks, Screen-aligned Particles with Minimal VertexBuffer Locking
This article is from the CSDN blog. Please cite the source when reposting: http://blog.csdn.net/fannyfish/archive/2006/06/14/797512.aspx