CUDA-Accelerated LUT Converter for DI Workflow

Source: Internet
Author: User
Abbreviations
    • DI Digital Intermediate
    • SP Stream Processor
    • CM Color Management
    • LUT Lookup Table
    • NF Negative Film
    • PCS Profile Connection Space
    • IMSR Input Medium to Scene-Referred transform
    • SROM Scene-Referred to Output Medium transform
    • OMRD Output Medium to Reference Display transform
    • RDPD Reference Display to Preview Display transform
Recommended Reading
    • A Proposal for OpenEXR Color Management
    • Color and Mastering for Digital Cinema
    • Digital Color Management
    • Digital Color Imaging Handbook
    • Digital Video and HDTV: Algorithms and Interfaces

Once DI is introduced into the film production process, the original images captured on film can be modified directly with digital devices and software. The first step is the IMSR process, in which a scanning device such as Thomson's Grass Valley Spirit 4K film scanner generates DPX files and stores the scanned film on the SAN. Devices differ, however: the mapping between NF color density values and the DPX logarithmic space is often different from one device to another, and it is a strictly device-dependent parameter. Correction for this is usually handled by a 1D LUT inside the device. The color space transformation itself uses a matrix or a 3D LUT directly. All of this is covered in CM-related documents.
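As an illustration of the two building blocks mentioned above, here is a minimal Python sketch of a 1D LUT applied with linear interpolation, followed by a 3x3 matrix. The correction curve and the identity matrix are made-up placeholders, not values from any real scanner, and the function names are hypothetical.

```python
def apply_lut_1d(lut, x):
    """Look up x in [0, 1] in a uniformly sampled 1D LUT with linear interpolation."""
    x = min(max(x, 0.0), 1.0)          # clamp to the LUT domain
    pos = x * (len(lut) - 1)           # fractional index into the table
    i = min(int(pos), len(lut) - 2)    # lower sample, kept in range
    t = pos - i                        # blend weight toward the upper sample
    return lut[i] * (1.0 - t) + lut[i + 1] * t


def apply_matrix(m, rgb):
    """Multiply a 3x3 matrix (nested lists, row-major) by an RGB triple."""
    return tuple(sum(m[r][c] * rgb[c] for c in range(3)) for r in range(3))


# Placeholder device correction curve with 5 entries (assumption, not real data).
correction = [(i / 4) ** 2 for i in range(5)]

# Placeholder color space matrix; a real workflow would use a measured matrix.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

pixel = (0.0, 0.5, 1.0)
corrected = tuple(apply_lut_1d(correction, c) for c in pixel)
result = apply_matrix(identity, corrected)
print(result)
```

The 1D LUT is applied per channel, which is why it suits device correction curves, while the matrix mixes channels and therefore handles the color space change.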

The advantage of a LUT is that it is fast and portable. In post-production software, a network of compute nodes can usually be baked into a LUT by feeding input values through it, but the node network itself cannot be copied between applications. For example, a Shake compositing node network cannot be exported to or imported from Fusion, although both are essentially very simple DAG nodes. Once a LUT has been generated, simple text processing is enough to make it usable in other applications. The common LUT format recommended by AMPAS/ASC is expected to become widespread, so that LUTs will be interchangeable among post-production applications. This would make CM more efficient when several post-production packages are combined in one production.
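The baking and text round trip described above can be sketched as follows. The `node_network` function is a stand-in for an application's node graph, and the text layout (a `SIZE` header followed by one value per line) is a hypothetical format chosen for illustration, not the AMPAS/ASC format or any application's native one.

```python
def bake_lut_1d(func, size):
    """Sample func at `size` evenly spaced inputs in [0, 1]."""
    return [func(i / (size - 1)) for i in range(size)]


def lut_to_text(lut):
    """Serialize to a hypothetical layout: a SIZE header, then one value per line."""
    lines = ["SIZE %d" % len(lut)]
    lines += ["%.6f" % v for v in lut]
    return "\n".join(lines) + "\n"


def lut_from_text(text):
    """Parse the layout written by lut_to_text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    size = int(lines[0].split()[1])
    values = [float(ln) for ln in lines[1:]]
    if len(values) != size:
        raise ValueError("expected %d entries, found %d" % (size, len(values)))
    return values


def node_network(x):
    """Stand-in for a node graph: a gain followed by a clamp."""
    return min(x * 1.5, 1.0)


lut = bake_lut_1d(node_network, 5)
text = lut_to_text(lut)
restored = lut_from_text(text)
print(text)
```

Converting between two real LUT formats follows the same pattern: parse one text layout into a list of values, then write the list out in the other layout.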

The converter's first step is numerical conversion: integers are converted to floating-point numbers according to the input and output value ranges defined by the LUT. For this step, highly optimized CPU code (mainly OpenMP and SIMD) runs faster than the GPU, because the CPU clock speed is high, whereas the GPU clock is split into a core frequency and an SP frequency. The existing OpenEXR fp16 type is inefficient because it lacks native support from hardware and compilers.

Next, the 3D LUT values are generated. A 3D LUT is much larger than a 1D LUT: a 32^3 LUT with 3 channels of 4-byte values, for example, occupies 32^3 × 3 × 4 bytes = 0.375 MB.

Here is the test comparison. The CUDA result is on the left and the Houdini Apprentice result is on the right. Comparing the original HD images shows that the difference between the two is very small. It comes partly from the GPU and partly from the downsampling of the transform function that occurs when the LUT source outputs the LUT (the LUT was generated in Houdini). Increasing the 3D LUT size reduces this error, although there is an upper limit.
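The steps above can be sketched in plain Python: normalizing integer code values, computing the memory footprint of a 3D LUT, and sampling a baked 3D LUT. This is not the CUDA implementation from the article. Trilinear interpolation is an assumption here, since the article does not state which interpolation it uses, and all function names are hypothetical.

```python
def code_to_float(code, bits=10):
    """Map an integer code value (e.g. 10-bit DPX) to a float in [0, 1]."""
    return code / float((1 << bits) - 1)


def lut3d_bytes(size, channels=3, bytes_per_value=4):
    """Memory footprint in bytes of a size^3 LUT with 4-byte float entries."""
    return size ** 3 * channels * bytes_per_value


def bake_lut3d(func, size):
    """Sample func(r, g, b) on a size^3 grid; the index order is [r][g][b]."""
    step = 1.0 / (size - 1)
    return [[[func(r * step, g * step, b * step) for b in range(size)]
             for g in range(size)] for r in range(size)]


def lerp(a, b, t):
    """Linear blend between two RGB triples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def sample_lut3d(lut, rgb):
    """Trilinear lookup of an RGB triple in [0, 1]^3."""
    size = len(lut)
    idx, frac = [], []
    for c in rgb:
        pos = min(max(c, 0.0), 1.0) * (size - 1)   # clamp, then scale to grid
        i = min(int(pos), size - 2)                # lower corner, kept in range
        idx.append(i)
        frac.append(pos - i)
    (r, g, b), (fr, fg, fb) = idx, frac
    # Interpolate along b first, then g, then r.
    c00 = lerp(lut[r][g][b], lut[r][g][b + 1], fb)
    c01 = lerp(lut[r][g + 1][b], lut[r][g + 1][b + 1], fb)
    c10 = lerp(lut[r + 1][g][b], lut[r + 1][g][b + 1], fb)
    c11 = lerp(lut[r + 1][g + 1][b], lut[r + 1][g + 1][b + 1], fb)
    return lerp(lerp(c00, c01, fg), lerp(c10, c11, fg), fr)


size_mb = lut3d_bytes(32) / float(2 ** 20)          # the 32^3 example from the text
identity_lut = bake_lut3d(lambda r, g, b: (r, g, b), 5)
on_grid = sample_lut3d(identity_lut, (0.5, 0.25, 1.0))
off_grid = sample_lut3d(identity_lut, (0.1, 0.6, 0.9))
print(size_mb, on_grid, off_grid)
```

An identity LUT is reproduced exactly by trilinear interpolation, which makes it a convenient sanity check. A nonlinear transform sampled on a coarse grid loses detail between grid points, which is consistent with the downsampling error noted above.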

1D LUT

3D LUT

Note: This test image is cropped from the Kodak Digital LAD test image. The original format is 10-bit log DPX.

I have both Win32 and x64 Linux versions. If you are interested, you can ask for them.
