Apple's Metal Sample Projects

Last updated: 2017-10-01 Source: Internet

Uploaded by: User


Basic Buffers

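When too much data is passed to the vertex shader (more than 4096 bytes), the setVertexBytes:length:atIndex: method cannot be used; use the setVertexBuffer:offset:atIndex: method instead, which also improves performance.
In that case the argument must be an MTLBuffer, a block of memory that the GPU can access.
The _vertexBuffer.contents method returns a CPU-accessible pointer to that memory, so the same block of memory is shared by the CPU and the GPU.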
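The size check implied above can be sketched in plain C. This is only an illustration: the ExampleVertex layout, the constant name, and both helper functions are hypothetical, and only the 4096-byte limit comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limit taken from the text above: larger data must go through a buffer. */
enum { INLINE_BYTES_LIMIT = 4096 };

/* Hypothetical vertex layout: a 2D position and an RGBA color. */
typedef struct {
    float position[2];
    float color[4];
} ExampleVertex;

/* Six 4-byte floats with no padding; the arithmetic below relies on this. */
static_assert(sizeof(ExampleVertex) == 24, "unexpected vertex size");

/* Total size in bytes of `count` vertices. */
size_t vertex_data_size(size_t count) {
    return count * sizeof(ExampleVertex);
}

/* True when the data is too large to pass inline and needs a buffer instead. */
bool needs_vertex_buffer(size_t byte_count) {
    return byte_count > INLINE_BYTES_LIMIT;
}
```

With this layout, 170 vertices occupy 4080 bytes and still fit under the limit, while 171 vertices occupy 4104 bytes and would have to be placed in a buffer.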

Basic Texturing

The texture uses the MTLPixelFormatBGRA8Unorm pixel format.

[Figure: 2D texture coordinates]

Reading a texel is also known as sampling.
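To make the pixel format and the idea of sampling concrete, here is a minimal CPU-side sketch in C. It assumes a tightly packed image (4 bytes per pixel, no row padding), normalized coordinates in the range 0 to 1, and nearest-texel lookup with clamping at the edges; the type and function names are made up for this example and are not Metal API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pixel unpacked to normalized floats in [0, 1]. */
typedef struct { float r, g, b, a; } ColorRGBA;

/* BGRA8Unorm stores four 8-bit channels in blue, green, red, alpha order;
   "Unorm" maps the integer range 0..255 to 0.0..1.0. */
ColorRGBA unpack_bgra8unorm(const uint8_t px[4]) {
    ColorRGBA c;
    c.b = px[0] / 255.0f;
    c.g = px[1] / 255.0f;
    c.r = px[2] / 255.0f;
    c.a = px[3] / 255.0f;
    return c;
}

/* Convert a normalized coordinate to a texel index, clamping at both edges. */
size_t texel_index(float coord, size_t extent) {
    assert(extent > 0);
    if (!(coord > 0.0f)) return 0;           /* also catches NaN */
    if (coord >= 1.0f) return extent - 1;
    size_t i = (size_t)(coord * (float)extent);
    return i < extent ? i : extent - 1;
}

/* Nearest-texel sample of a tightly packed BGRA8Unorm image. */
ColorRGBA sample_nearest(const uint8_t *pixels, size_t width, size_t height,
                         float u, float v) {
    size_t x = texel_index(u, width);
    size_t y = texel_index(v, height);
    return unpack_bgra8unorm(&pixels[(y * width + x) * 4]);
}
```

On the GPU a sampler object performs this lookup (and optionally filtering) in hardware; the sketch only shows what "reading a texel" amounts to.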

Hello Compute

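This sample performs data-parallel computations using the GPU.

Throughout the history of the GPU, the parallel-processing architecture has stayed essentially the same, while the processing cores have become increasingly programmable. This moved GPUs from a fixed-function pipeline to a programmable pipeline and made general-purpose GPU programming (GPGPU) feasible.

An MTLComputePipelineState object can be created directly from a single kernel function.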

    // Create a compute kernel function
    id <MTLFunction> kernelFunction = [defaultLibrary newFunctionWithName:@"grayscaleKernel"];

    // Create a compute pipeline state from the kernel function
    NSError *error = nil;
    _computePipelineState = [_device newComputePipelineStateWithFunction:kernelFunction
                                                                   error:&error];
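The article does not show the body of grayscaleKernel. As a rough idea of the per-pixel work such a kernel might do, the C sketch below converts RGB to a single luminance value. The Rec. 709 luma weights and all function names are assumptions made for this example; the loop stands in for the GPU, where each iteration would run as its own thread.

```c
#include <assert.h>
#include <stddef.h>

/* Rec. 709 luma weights (an assumption; the kernel body is not in the article). */
static const float LUMA_R = 0.2126f;
static const float LUMA_G = 0.7152f;
static const float LUMA_B = 0.0722f;

/* Work for one pixel: a weighted sum of the three color channels. */
float grayscale_pixel(float r, float g, float b) {
    return LUMA_R * r + LUMA_G * g + LUMA_B * b;
}

/* CPU stand-in for the dispatch: on the GPU each iteration is one thread. */
void grayscale_image(const float *rgb, float *gray, size_t pixel_count) {
    assert(rgb != NULL && gray != NULL);
    for (size_t i = 0; i < pixel_count; ++i) {
        gray[i] = grayscale_pixel(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
}
```

Because every output pixel depends only on the matching input pixel, the iterations are independent, which is what makes the problem data-parallel.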

Split the image into tiles and process them in parallel

    // Set the compute kernel's thread group size of 16x16
    _threadgroupSize = MTLSizeMake(16, 16, 1);

    // Calculate the number of rows and columns of thread groups given the width of our input image.
    //   Ensure we cover the entire image (or more) so we process every pixel.
    _threadgroupCount.width  = (_inputTexture.width  + _threadgroupSize.width -  1) / _threadgroupSize.width;
    _threadgroupCount.height = (_inputTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;

    // Since we're only dealing with a 2D data set, set depth to 1
    _threadgroupCount.depth = 1;

    [computeEncoder dispatchThreadgroups:_threadgroupCount
                   threadsPerThreadgroup:_threadgroupSize];
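The threadgroup count above is a ceiling division. A small C sketch of the same arithmetic follows; the function names are hypothetical. Because the grid is rounded up, it can be larger than the image, so a kernel usually also needs a bounds check like the second function.

```c
#include <assert.h>
#include <stddef.h>

/* Number of groups of `group_size` needed to cover `extent` items
   (integer ceiling division, the same formula as in the snippet above). */
size_t threadgroup_count(size_t extent, size_t group_size) {
    assert(group_size > 0);
    return (extent + group_size - 1) / group_size;
}

/* Check a thread would perform before touching pixel (x, y), since the
   dispatched grid may extend past the right and bottom edges of the image. */
int thread_in_bounds(size_t x, size_t y, size_t width, size_t height) {
    return x < width && y < height;
}
```

For a 1920x1080 image and 16x16 threadgroups this gives 120 x 68 groups. The grid then spans 1088 rows, so the threads in the last 8 rows fall outside the image and should return early.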

CPU and GPU Synchronization

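The CPU and the GPU are two asynchronous processors, but they share memory, so while they run in parallel they must not read and write the same data at the same time.

If the CPU and GPU take turns within each frame and never work at the same time, simultaneous reads and writes are avoided, but performance suffers.

If instead the CPU and GPU work at the same time on the same data, they read and write it simultaneously and race with each other.

Using multiple buffers both improves performance and avoids simultaneous reads and writes: the CPU and the GPU never access the same buffer at the same time.
When the GPU finishes executing a command buffer, it calls this handler.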

    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer) {
        dispatch_semaphore_signal(block_sema);
    }];
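The bookkeeping behind the semaphore and the multiple buffers can be modeled in a few lines of single-threaded C. This is a simplified model, not the sample's code: the ring size of 3 and all names are assumptions, and where a real dispatch_semaphore_wait would block, the model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed ring size; the article does not say how many buffers are used. */
enum { MAX_BUFFERS_IN_FLIGHT = 3 };

/* Single-threaded model of the counting semaphore plus the buffer ring. */
typedef struct {
    int free_slots;  /* semaphore value: buffers the CPU may still claim */
    int next_index;  /* buffer the CPU will write next */
} BufferRing;

void ring_init(BufferRing *ring) {
    assert(ring != NULL);
    ring->free_slots = MAX_BUFFERS_IN_FLIGHT;
    ring->next_index = 0;
}

/* CPU side: claim a buffer for the next frame. Returns false where a real
   semaphore wait would block because the GPU still owns every buffer. */
bool ring_try_acquire(BufferRing *ring, int *out_index) {
    assert(ring != NULL && out_index != NULL);
    if (ring->free_slots == 0) return false;
    ring->free_slots--;
    *out_index = ring->next_index;
    ring->next_index = (ring->next_index + 1) % MAX_BUFFERS_IN_FLIGHT;
    return true;
}

/* GPU side: what the completed handler does, i.e. the semaphore signal. */
void ring_signal_completed(BufferRing *ring) {
    assert(ring != NULL && ring->free_slots < MAX_BUFFERS_IN_FLIGHT);
    ring->free_slots++;
}
```

The CPU can run up to three frames ahead of the GPU. Once all buffers are in flight it has to wait until a completed handler signals, at which point the oldest buffer becomes writable again.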

LOD with Function Specialization

Level of detail (LOD)

The more realistic the detail, the more resources it consumes, so there is a trade-off between performance and richness of detail.

    if (highLOD) {
        // Render high-quality model
    } else if (mediumLOD) {
        // Render medium-quality model
    } else if (lowLOD) {
        // Render low-quality model
    }

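However, code like the above performs poorly when written for the GPU. The number of instructions a GPU can execute in parallel depends on the number of registers allocated to the function. The GPU compiler has to allocate the maximum number of registers the function might use, even if some branches can never execute. Branch statements therefore significantly increase the number of registers required and significantly reduce the GPU's parallelism.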
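The register argument can be turned into a small worked calculation. The C sketch below uses a deliberately simplified model (one fixed register file divided evenly among threads), and every number in the usage note is hypothetical; real GPUs and compilers are more complicated.

```c
#include <assert.h>

/* Registers a function with several branches must reserve:
   the maximum over all branches, even those that never execute. */
int registers_reserved(const int *branch_registers, int branch_count) {
    assert(branch_registers != 0 && branch_count > 0);
    int max = branch_registers[0];
    for (int i = 1; i < branch_count; ++i) {
        if (branch_registers[i] > max) max = branch_registers[i];
    }
    return max;
}

/* Threads that fit at once when each reserves `registers_per_thread`. */
int concurrent_threads(int register_file_size, int registers_per_thread) {
    assert(registers_per_thread > 0);
    return register_file_size / registers_per_thread;
}
```

Suppose the high, medium, and low branches need 64, 32, and 16 registers and the register file holds 4096. The branching function reserves 64 registers per thread, so 64 threads fit. A variant compiled for the low LOD alone would reserve 16 registers, so 256 threads fit.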
