Unity Shader: Optimizing GPU Code by Replacing Conditional Statements Such as if/else with step()

Source: Internet
Author: User
Ordinary toon shading shader:

First, look at a toon shading shader. Because toon shading has to decide which rendering region each pixel falls in, it makes a good case study.

    Shader "Unlit/newtoonshading"
    {
        Properties
        {
            _shininess ("shininess", Float) = 1
            _edge ("Edge scale", Range(0, 1)) = 0.2
            _finalcolor ("Final Color", Color) = (0.5, 0.5, 0.5, 1)
            _edgecolor ("Edge color", Color) = (0, 0, 0, 1)
        }
        SubShader
        {
            Tags { "RenderType" = "Opaque" }
            LOD 100 // the LOD value is missing in the source; 100 is assumed here

            Pass
            {
                Tags { "LightMode" = "Vertex" }

                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"

                struct appdata
                {
                    float4 vertex : POSITION;
                    float3 normal : NORMAL;
                };

                struct v2f
                {
                    float4 vertex : SV_POSITION;
                    float3 n : TEXCOORD0;
                    float3 l : TEXCOORD1;
                    float3 h : TEXCOORD2;
                    float3 v : TEXCOORD3;
                };

                float _shininess;
                float _edge;
                float4 _finalcolor;
                float4 _edgecolor;
                float4 _lightposition_world;

                v2f vert (appdata v)
                {
                    v2f o = (v2f)0;
                    float4 worldpos = mul(unity_ObjectToWorld, v.vertex);
                    float4 lightpos_world = mul(UNITY_MATRIX_I_V, unity_LightPosition[1]);
                    // Cast to float3x3 so that the float3 normal is only rotated and scaled
                    o.n = normalize(mul((float3x3)unity_ObjectToWorld, v.normal));
                    o.l = normalize(lightpos_world.xyz - worldpos.xyz);
                    o.v = normalize(_WorldSpaceCameraPos - worldpos.xyz);
                    o.h = normalize(o.l + o.v);
                    o.vertex = UnityObjectToClipPos(v.vertex);
                    return o;
                }

                fixed4 frag (v2f i) : SV_Target
                {
                    i.n = normalize(i.n);
                    i.l = normalize(i.l);
                    i.h = normalize(i.h);
                    i.v = normalize(i.v);
                    float4 kd = _finalcolor;
                    float4 ks = 0;
                    fixed4 col;

                    // Edge test
                    float edge = max(dot(i.n, i.v), 0);
                    if (edge < _edge)
                    {
                        return _edgecolor;
                    }

                    // Dark/bright test
                    float diffuselight = max(dot(i.n, i.l), 0);
                    if (diffuselight <= 0.1f)
                    {
                        // Dark area
                        kd *= 0.5f; // half the brightness of the bright area
                        ks = 0;     // no highlight
                        // If diffuselight <= 0, the angle between n and l is greater than
                        // 90 degrees: the light source is behind the surface
                        col = kd + ks;
                        return col;
                    }

                    // Highlight test
                    float specularlight = pow(max(dot(i.n, i.h), 0), _shininess);
                    if (specularlight >= 0.95f)
                    {
                        ks = float4(1.0f, 1.0f, 1.0f, 0.0f); // highlight
                    }
                    col = kd + ks;
                    return col;
                }
                ENDCG
            }
        }
    }


(Above: rendering result)

The principle of optimization:

In the fragment shader above, I optimized following the logic of ordinary CPU programming. For example, with if (edge < _edge) { return _edgecolor; }, a pixel that is judged to be on an edge returns the edge color immediately and the subsequent operations are skipped. The highlight, bright, and dark regions are then separated with further if/else tests in the same way. However, this optimization does not carry over to GPU programming. On the GPU, many pixels are processed in parallel and the fragment shader invocations in a group run in lockstep: the fragment shader for an edge pixel may return first, but it still has to wait for the last pixel in the group to return, and the next operation starts only after all of them have finished. Each fragment processor operates on hundreds of pixels per instruction, so if some fragments take one branch and others take the other, all fragments execute both branches, and each fragment writes its registers only on the branch it should have taken. In addition, flow-control instructions such as IF/ENDIF have a relatively high overhead (4 clock cycles on GeForce 6). Therefore, conditional statements such as if/else and switch/case, and overly complex logic in general, are not recommended in GPU programming. Instead, the conditional can be replaced with the step() function, building the logic out of step functions. This way all threads execute exactly the same code, which benefits the GPU in many ways.
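The lockstep behaviour described above can be illustrated with a toy cost model. The following Python sketch is only an illustration under simplifying assumptions: the group size of 32, the per-branch instruction counts, and the function name lockstep_cost are all hypothetical, and real GPUs differ in detail. It counts how many instruction slots a group of fragments spends when every fragment has to step through each branch that at least one fragment in the group takes.

```python
def lockstep_cost(takes_branch, cost_if, cost_else):
    """Instruction slots spent by one group of fragments running in lockstep.

    takes_branch: list of booleans, one per fragment in the group.
    cost_if / cost_else: hypothetical instruction counts of the two branches.
    """
    cost = 0
    if any(takes_branch):        # at least one fragment needs the 'if' side
        cost += cost_if
    if not all(takes_branch):    # at least one fragment needs the 'else' side
        cost += cost_else
    return cost


# Hypothetical numbers: the early return costs 1 slot, full shading costs 20.
uniform_edge = [True] * 32       # every fragment is an edge pixel
uniform_body = [False] * 32      # no fragment is an edge pixel
mixed = [True] * 31 + [False]    # a single non-edge pixel in the group

print(lockstep_cost(uniform_edge, 1, 20))   # 1
print(lockstep_cost(uniform_body, 1, 20))   # 20
print(lockstep_cost(mixed, 1, 20))          # 21
```

In this model the early return only pays off when the whole group agrees on the branch; a single diverging fragment makes the group pay for both sides.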

The step() version of the shader above:

    Shader "Unlit/newtoonshading_stepversion"
    {
        Properties
        {
            _shininess ("shininess", Float) = 1
            _edge ("Edge scale", Range(0, 1)) = 0.2
            _finalcolor ("Final color", Color) = (0.5, 0.5, 0.5, 1)
            _edgecolor ("Edge color", Color) = (0, 0, 0, 1)
        }
        SubShader
        {
            Tags { "RenderType" = "Opaque" }
            LOD 100 // the LOD value is missing in the source; 100 is assumed here

            Pass
            {
                Tags { "LightMode" = "Vertex" }

                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"

                struct appdata
                {
                    float4 vertex : POSITION;
                    float3 normal : NORMAL;
                };

                struct v2f
                {
                    float4 vertex : SV_POSITION;
                    float3 n : TEXCOORD0;
                    float3 l : TEXCOORD1;
                    float3 h : TEXCOORD2;
                    float3 v : TEXCOORD3;
                };

                float _shininess;
                float _edge;
                float4 _finalcolor;
                float4 _edgecolor;
                float4 _lightposition_world;

                v2f vert (appdata v)
                {
                    v2f o = (v2f)0;
                    float4 worldpos = mul(unity_ObjectToWorld, v.vertex);
                    float4 lightpos_world = mul(UNITY_MATRIX_I_V, unity_LightPosition[1]);
                    // Cast to float3x3 so that the float3 normal is only rotated and scaled
                    o.n = normalize(mul((float3x3)unity_ObjectToWorld, v.normal));
                    o.l = normalize(lightpos_world.xyz - worldpos.xyz);
                    o.v = normalize(_WorldSpaceCameraPos - worldpos.xyz);
                    o.h = normalize(o.l + o.v);
                    o.vertex = UnityObjectToClipPos(v.vertex);
                    return o;
                }

                fixed4 frag (v2f i) : SV_Target
                {
                    i.n = normalize(i.n);
                    i.l = normalize(i.l);
                    i.h = normalize(i.h);
                    i.v = normalize(i.v);
                    float4 kd = _finalcolor;
                    float4 ks = 0;
                    fixed4 col;

                    // Edge test
                    float edge = max(dot(i.n, i.v), 0);
                    edge = step(edge, _edge); // if (edge <= _edge) edge = 1, else edge = 0
                    _edgecolor *= edge;

                    // Highlight test
                    float specularlight = pow(max(dot(i.n, i.h), 0), _shininess);
                    specularlight = step(0.95f, specularlight); // if (specularlight >= 0.95f) specularlight = 1, else 0

                    // Dark/bright test
                    float diffuselight = max(dot(i.n, i.l), 0);
                    diffuselight = step(0.1f, diffuselight); // if (diffuselight >= 0.1f) diffuselight = 1, else 0

                    ks = specularlight * diffuselight;         // if diffuselight = 0 then ks = 0, else ks = specularlight (1 or 0)
                    diffuselight = diffuselight * 0.5f + 0.5f; // map 1 or 0 to 1 or 0.5

                    //     0.5*kd or kd      1 or 0    1 or 0          0 or edge color
                    col = (kd * diffuselight + ks) * (1.0f - edge) + _edgecolor;
                    return col;
                }
                ENDCG
            }
        }
    }
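The comments next to the step() calls in the listing above can be checked with a small CPU-side model. This is a minimal Python sketch, not shader code: the Python function step only mimics the HLSL intrinsic, and the helper strictly_less and the threshold value are hypothetical names and numbers chosen for illustration.

```python
def step(a, x):
    """Model of HLSL step(a, x): 1.0 when x >= a, otherwise 0.0."""
    return 1.0 if x >= a else 0.0


threshold = 0.2

# step(value, threshold) tests value <= threshold (as in the edge test).
print(step(0.1, threshold))   # 1.0
print(step(0.3, threshold))   # 0.0

# step(threshold, value) tests value >= threshold (as in the highlight test).
print(step(threshold, 0.3))   # 1.0
print(step(threshold, 0.1))   # 0.0

# Equality always yields 1.0, so a strict '<' from an if statement
# becomes '<=' once it is rewritten as step(value, threshold).
print(step(threshold, threshold))   # 1.0


def strictly_less(value, limit):
    """value < limit, built as the complement of value >= limit."""
    return 1.0 - step(limit, value)


print(strictly_less(0.2, 0.2))   # 0.0
print(strictly_less(0.1, 0.2))   # 1.0
```

Swapping the two arguments flips the direction of the comparison, and taking 1 minus the result gives the strict form when the behaviour exactly at the threshold matters.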
An explanation by example:

In HLSL, step(a, b) returns 1 when b >= a and 0 otherwise; in other words, it returns 1 when a <= b and 0 otherwise. By swapping the positions of a and b, both less-than and greater-than comparisons can be expressed. Because the return value is only 0 or 1, step() cannot directly replace if/else logic, but the same result can be obtained by transforming the algorithm, for example:

                // Edge test
                float edge = max(dot(i.n, i.v), 0);

                if (edge < _edge) {
                    return _edgecolor;
                }

In the step() version, the _edgecolor that was returned directly becomes a variable that is either (0, 0, 0) or keeps its own RGB value, and edge becomes 0 or 1; both take part in the final color calculation:

                // Edge test
                float edge = max(dot(i.n, i.v), 0);

                edge = step(edge, _edge);   // if (edge <= _edge) edge = 1, else edge = 0

                _edgecolor *= edge;
                // ... intermediate steps omitted ...
                //     0.5*kd or kd      1 or 0    1 or 0          0 or edge color
                col = (kd * diffuselight + ks) * (1.0f - edge) + _edgecolor;

If the pixel is on an edge, edge is 1, so in the final color calculation everything else is multiplied by 0 and the result is 0 + _edgecolor, that is, the edge color. If the pixel is not on an edge, edge is 0 and _edgecolor becomes 0, so the final color is "other colors" * 1 + 0 and the edge color drops out.
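The final blend can be checked on the CPU by running the branching logic and the step() logic side by side. The Python sketch below is an illustration under stated assumptions: colors are reduced to single floats, the dot products and the specular term are passed in directly, and the function names shade_branch and shade_step and all input values are hypothetical.

```python
def step(a, x):
    """Model of HLSL step(a, x): 1.0 when x >= a, otherwise 0.0."""
    return 1.0 if x >= a else 0.0


def shade_branch(n_dot_v, n_dot_l, spec, kd, edge_color, edge_limit):
    """if/else logic with early returns, as in the first shader."""
    if n_dot_v < edge_limit:
        return edge_color
    if n_dot_l <= 0.1:
        return kd * 0.5
    ks = 1.0 if spec >= 0.95 else 0.0
    return kd + ks


def shade_step(n_dot_v, n_dot_l, spec, kd, edge_color, edge_limit):
    """Branch-free logic built from step(), as in the second shader."""
    edge = step(n_dot_v, edge_limit)
    edge_color = edge_color * edge
    specular = step(0.95, spec)
    diffuse = step(0.1, n_dot_l)
    ks = specular * diffuse
    diffuse = diffuse * 0.5 + 0.5
    return (kd * diffuse + ks) * (1.0 - edge) + edge_color


KD, EDGE_COLOR, EDGE_LIMIT = 0.5, 0.125, 0.2

cases = {
    "edge":      (0.1, 0.7, 0.99),
    "dark":      (0.8, 0.05, 0.99),
    "bright":    (0.8, 0.7, 0.5),
    "highlight": (0.8, 0.7, 0.97),
}
for name, (nv, nl, spec) in cases.items():
    a = shade_branch(nv, nl, spec, KD, EDGE_COLOR, EDGE_LIMIT)
    b = shade_step(nv, nl, spec, KD, EDGE_COLOR, EDGE_LIMIT)
    print(name, a, b)   # the two values agree in all four regions

# Exactly on the edge threshold the two versions differ, because the
# original test is '<' while step(edge, _edge) behaves like '<='.
print(shade_branch(0.2, 0.7, 0.5, KD, EDGE_COLOR, EDGE_LIMIT))   # 0.5
print(shade_step(0.2, 0.7, 0.5, KD, EDGE_COLOR, EDGE_LIMIT))     # 0.125
```

Away from the exact threshold values the two functions return the same color. The same kind of difference appears when the diffuse term is exactly 0.1, where the original test is '<=' and step(0.1f, diffuselight) behaves like '>='.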

In the same way, the values that the original returned for the highlight, bright, and dark regions become variables that enter the final color calculation. For the detailed reasoning, follow the comments next to each step() call line by line.

Test:

The FPS of the two versions is basically the same, with only small fluctuations. This may be because the amount of computation is too small, or because this shader is not sensitive to the issue. At the least, it shows that the if/else version, which returns early in the CPU style, has no advantage over the step() version, which performs all the calculations. One possible reason is that the workload is too small to become a performance bottleneck. A second possibility is that the step() version removes 3 if tests but adds 3 step() calls and several extra calculations, so the gain and the cost roughly cancel out.

Assembly version:

Compiled fragment shader code (partial excerpt):
if/else version:

   0: dp3 r0.x, v1.xyzx, v1.xyzx
   1: rsq r0.x, r0.x
   2: mul r0.xyz, r0.xxxx, v1.xyzx
   3: dp3 r0.w, v4.xyzx, v4.xyzx
   4: rsq r0.w, r0.w
   5: mul r1.xyz, r0.wwww, v4.xyzx
   6: dp3 r0.w, r0.xyzx, r1.xyzx
   7: max r0.w, r0.w, l(0.000000)
   8: lt r0.w, r0.w, cb0[2].y
   9: if_nz r0.w
  10:   mov o0.xyzw, cb0[4].xyzw
  11:   ret
  12: endif
  13: dp3 r0.w, v2.xyzx, v2.xyzx
  14: rsq r0.w, r0.w
  15: mul r1.xyz, r0.wwww, v2.xyzx
  16: dp3 r0.w, r0.xyzx, r1.xyzx
  17: max r0.w, r0.w, l(0.000000)
  18: ge r0.w, l(0.100000), r0.w
  19: if_nz r0.w
  20:   mul o0.xyzw, cb0[3].xyzw, l(0.500000, 0.500000, 0.500000, 0.500000)
  21:   ret
  22: endif
  23: dp3 r0.w, v3.xyzx, v3.xyzx
  24: rsq r0.w, r0.w
  25: mul r1.xyz, r0.wwww, v3.xyzx
  26: dp3 r0.x, r0.xyzx, r1.xyzx
  27: max r0.x, r0.x, l(0.000000)
  28: log r0.x, r0.x
  29: mul r0.x, r0.x, cb0[2].x
  30: exp r0.x, r0.x
  31: ge r0.x, r0.x, l(0.950000)
  32: and r0.xyzw, r0.xxxx, l(0x3f800000, 0x3f800000, 0x3f800000, 0)
  33: add o0.xyzw, r0.xyzw, cb0[3].xyzw
  34: ret

step() version:

   0: dp3 r0.x, v3.xyzx, v3.xyzx
   1: rsq r0.x, r0.x
   2: mul r0.xyz, r0.xxxx, v3.xyzx
   3: dp3 r0.w, v1.xyzx, v1.xyzx
   4: rsq r0.w, r0.w
   5: mul r1.xyz, r0.wwww, v1.xyzx
   6: dp3 r0.x, r1.xyzx, r0.xyzx
   7: max r0.x, r0.x, l(0.000000)
   8: log r0.x, r0.x
   9: mul r0.x, r0.x, cb0[2].x
  10: exp r0.x, r0.x
  11: ge r0.x, r0.x, l(0.950000)
  12: dp3 r0.y, v2.xyzx, v2.xyzx
  13: rsq r0.y, r0.y
  14: mul r0.yzw, r0.yyyy, v2.xxyz
  15: dp3 r0.y, r1.xyzx, r0.yzwy
  16: max r0.y, r0.y, l(0.000000)
  17: ge r0.y, r0.y, l(0.100000)
  18: and r0.xz, r0.xxyx, l(0x3f800000, 0, 0x3f800000, 0)
  19: movc r0.y, r0.y, l(1.000000), l(0.500000)
  20: mul r0.x, r0.z, r0.x
  21: mad r0.xyzw, cb0[3].xyzw, r0.yyyy, r0.xxxx
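In both listings the result of a ge comparison is combined with the constant 0x3f800000 by an and instruction. One way to read this, assuming the usual convention that a true comparison yields an all-ones mask (0xFFFFFFFF) and a false one yields 0, is to decode the constant on the CPU. The Python sketch below is only an illustration of that reading; the helper names bits_to_float and masked_one are hypothetical.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def masked_one(comparison_is_true):
    """Model 'ge' followed by 'and ..., l(0x3f800000)'."""
    mask = 0xFFFFFFFF if comparison_is_true else 0x00000000
    return bits_to_float(mask & 0x3F800000)


print(bits_to_float(0x3F800000))   # 1.0
print(masked_one(True))            # 1.0
print(masked_one(False))           # 0.0
```

Under that assumption, the compare-and-mask pair produces exactly the 1 or 0 that step() is defined to return, without any flow-control instruction.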
  
