Getting started with compute shaders. I like vertex and fragment shaders: they're simple, they do one thing (turn vertices and colors into output on the screen), and they do it very well. But sometimes that simplicity limits you, for example when your CPU burns its cycles computing matrices and storing them in a texture.
Compute shaders solve this problem. Today I'll explain their basics and walk through a Unity example of my own that uses structured buffer data in a compute shader.
Compute shaders can be used to control the positions of a particle swarm.
What is a compute shader? Simply put, a compute shader is a program that runs on the GPU without needing to operate on mesh or texture data, works inside OpenGL's or DirectX's memory space (unlike OpenCL, which has its own), and can output buffers of data or textures and share memory across threads.
Right now Unity only supports DirectX 11 compute shaders; Mac users will be able to use them once Unity supports compute shaders via OpenGL 4.3.
This means that this is effectively a Windows-only tutorial. If you're not on a Windows machine, the examples may not work.
What are the pros and cons of using one? Two words: math and parallelization. Any problem that performs the same calculation (with no conditional branching) on every element of a data set is a perfect fit, and the more of those calculations you move onto the GPU, the greater the benefit.
Conditional branching does degrade your performance, because GPUs are bad at it, but that's no different from writing vertex and fragment shaders, so if you've written shaders before this will all feel familiar.
There is a potential pitfall, though: moving data from GPU memory back to the CPU takes time, and this is likely to be your bottleneck when working with compute shaders. Keeping your kernels optimized to work with the smallest buffers possible alleviates the problem, but it never goes away entirely.
Got all that? Good, let's get started. Since we're using DirectX, Unity's compute shaders are written in HLSL, but it's nearly indistinguishable from the other shader languages, so if you can write Cg or GLSL you'll be fine.
The first thing you need to do is create a new compute shader. Unity's Project panel has an option for this (translator's note: Project -> Create -> Compute Shader), which makes it simple. If you open the new file, you'll see auto-generated code like this:
#pragma kernel CSMain

RWTexture2D<float4> Result;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0);
}
This is a good starting point for figuring out how compute shaders work, so let's go through it line by line.
#pragma kernel CSMain
This specifies the entry function of the program (think of it as the compute shader's main). A single compute shader file can define many kernels, which you can invoke individually from script.
RWTexture2D<float4> Result;
This declares a variable holding data the shader program will use. Since we aren't working with mesh data, you need to explicitly declare what data your compute shader will read and write. The "RW" prefix on the type name specifies that the shader can both read and write that variable.
[numthreads(8,8,1)]
This line specifies the dimensions of the thread groups this compute shader spawns. GPUs take advantage of massive parallelism, so many of the threads they create run concurrently; thread groups specify how those spawned threads are organized. In the code above, each thread group contains 64 threads, addressable like an 8x8 two-dimensional array.
Determining the optimal size of your thread groups is a complicated matter that depends heavily on your target hardware. In general, think of your GPU as a collection of stream processors, each capable of executing X threads simultaneously; each processor runs one thread group at a time, so in theory you want each group to contain X threads to fully occupy a processor. I tend to treat these values as knobs to experiment with, so rather than suggesting how best to set them, I'll point you to Google if you want to know more.
The rest of the shader is ordinary code: the kernel function determines, from the thread's ID, which pixel it should operate on, and writes some data into the Result texture.
Actually running the shader. Obviously we can't just attach a compute shader to a mesh and expect it to run, especially since it has no mesh data. Compute shaders are instead dispatched from a script, which looks like this:
public ComputeShader shader;

void RunShader()
{
    int kernelHandle = shader.FindKernel("CSMain");

    RenderTexture tex = new RenderTexture(256, 256, 24);
    tex.enableRandomWrite = true;
    tex.Create();

    shader.SetTexture(kernelHandle, "Result", tex);
    shader.Dispatch(kernelHandle, 256 / 8, 256 / 8, 1);
}
A few things are worth noting here. First, you must set the enableRandomWrite flag on your RenderTexture before you call Create(). This gives your compute shader permission to write to the texture; if you don't set this flag, you can't use the texture as a write target in your shader.
Next we need a way to identify which function in the compute shader we want to invoke. FindKernel takes a string matching the name of the kernel we declared in the compute shader. Remember that a single compute shader file can contain multiple kernels (functions).
ComputeShader.SetTexture lets us upload CPU-side data to the GPU. Moving data between memory spaces introduces latency into your program, and the more data you move, the more noticeable the delay. For this reason, if you want to run a compute shader every frame, you need to optimize what data actually gets transferred.
The three integers passed to Dispatch specify the number of thread groups to spawn; the size of each group is set by the numthreads block in the compute shader. So in the example above, the total number of threads we spawn is:
32*32 thread groups * 64 threads per group = 65,536 threads.
That works out to exactly one thread per pixel of the render texture, meaning each kernel invocation manipulates a single pixel.
So now that we know how to write a compute shader that manipulates texture memory, let's see what else we can make it do.
Structured buffers. Unsurprisingly, massaging texture data feels a lot like writing vert/frag shaders; what we really want is to unleash our GPU on arbitrary data. Yes, that's doable, and it's as great as it sounds.
A structured buffer is simply an array of some data type. That type can be floats, integers, or structs. You declare one in a compute shader like this:
StructuredBuffer<float> floatBuffer;
RWStructuredBuffer<int> readWriteIntBuffer;
The element type can also be a struct, which is what we'll use in the second example of this article. We will pass a set of points into our compute shader, each paired with a matrix to transform it by. We could do this with two separate buffers (one of Vector3s and one of Matrix4x4s), but it's easier to reason about a point and its matrix when they live in the same struct.
struct VecMatPair
{
    public Vector3 point;
    public Matrix4x4 matrix;
}
We also need to define this data type in the shader itself, but HLSL has no Matrix4x4 or Vector3 types.
It does, however, have types with identical memory layouts. Our shader ends up looking like this:
#pragma kernel Multiply

struct VecMatPair
{
    float3 pos;
    float4x4 mat;
};

RWStructuredBuffer<VecMatPair> dataBuffer;

[numthreads(16,1,1)]
void Multiply (uint3 id : SV_DispatchThreadID)
{
    dataBuffer[id.x].pos = mul(dataBuffer[id.x].mat,
                               float4(dataBuffer[id.x].pos, 1.0));
}
Notice that our thread groups are now organized along a single dimension. The number of dimensions a thread group uses has no performance impact, so you can choose whatever maps most cleanly onto your problem.
Setting up a structured buffer in script is a bit more involved than our earlier texture example. For a buffer, you need to specify the byte size of a single element; both that layout information and the data itself are stored in a ComputeBuffer object. For our struct, the element size is the number of floats we store (3 for the vector, 16 for the matrix) multiplied by the size of a float (4 bytes), for a total of 76 bytes per struct. Setting it up for the compute shader looks like this:
public ComputeShader shader;

void RunShader()
{
    VecMatPair[] data = new VecMatPair[5];
    // initialize data here

    ComputeBuffer buffer = new ComputeBuffer(data.Length, 76);
    buffer.SetData(data);

    int kernel = shader.FindKernel("Multiply");
    shader.SetBuffer(kernel, "dataBuffer", buffer);
    shader.Dispatch(kernel, data.Length, 1, 1);
}
Now we need to get the transformed data back into a format we can use in our scripts. Unlike the render
texture example above, data in a structured buffer has to be explicitly transferred from GPU memory back to the CPU. In my experience this is the single largest performance drain of working with compute shaders, and the only mitigation I've found is to optimize your buffers so they are as small as possible while still being usable, and to pull data out of the shader only when you actually need it.
Getting the data back into CPU land is simple: all you need is an array of the same data type and size as the buffer, and to ask the buffer to write its data into it. If we modify the script above to write the data into a second array, it looks like this:
public ComputeShader shader;

void RunShader()
{
    VecMatPair[] data = new VecMatPair[5];
    VecMatPair[] output = new VecMatPair[5];
    // initialize data here

    ComputeBuffer buffer = new ComputeBuffer(data.Length, 76);
    buffer.SetData(data);

    int kernel = shader.FindKernel("Multiply");
    shader.SetBuffer(kernel, "dataBuffer", buffer);
    shader.Dispatch(kernel, data.Length, 1, 1);
    buffer.GetData(output);
}
You'll need to check the profiler to see exactly how much time moving data back to the CPU costs you, but I find that it really consumes performance.
Original link: http://kylehalladay.com/blog/tutorial/2014/06/27/Compute-Shaders-Are-Nifty.html
Translation: Wolf96 http://blog.csdn.net/wolf96
Unity3D compute shaders from scratch