Hadoop 3.1.1 YARN: Using FPGA

Prerequisites for Using FPGA on YARN
  • YARN supports FPGA resources, but currently only through the Intel FPGA plugin (IntelFpgaOpenclPlugin).
  • The vendor's driver must be installed on every machine that runs a YARN NodeManager, and the required environment variables must be configured.
  • Docker containers are not supported yet.
Configure FPGA Scheduling

In resource-types.xml, add the following configuration:

<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/fpga</value>
  </property>
</configuration>

For the Capacity Scheduler, DominantResourceCalculator must be configured to enable FPGA scheduling and isolation. Configure it with the following property in capacity-scheduler.xml:

Property                                       Value to set
yarn.scheduler.capacity.resource-calculator    org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
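
As a sketch, the table entry above would be written into capacity-scheduler.xml as follows. The property name and class come from the table; the enclosing <configuration> element is shown only to make the fragment complete, so in practice the <property> element is merged into the file's existing <configuration> element.

```xml
<!-- capacity-scheduler.xml: use the dominant resource calculator so that
     yarn.io/fpga is taken into account alongside memory and vcores -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
```
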
FPGA Isolation

In yarn-site.xml, add:

<property>
  <name>yarn.nodemanager.resource-plugins</name>
  <value>yarn.io/fpga</value>
</property>

This enables the FPGA isolation module on the NodeManager.

With the configuration above, YARN detects and configures FPGAs automatically. The following properties need to be set in yarn-site.xml only if the administrator has special requirements.

1) Allowed FPGA devices

Property                                                       Default value
yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices    auto

A comma-separated list of the FPGA devices managed by the YARN NodeManager. The number of FPGA devices is reported to the ResourceManager for scheduling decisions. The default value, auto, lets YARN discover FPGA devices from the system automatically.

If the administrator wants YARN to manage only a subset of the FPGA devices, specify the allowed devices manually. Because only one major device number can currently be configured in container-executor.cfg, FPGA devices are identified by their minor device numbers. For Intel devices, a common way to obtain an FPGA's minor device number is to run aocl diagnose and check the uevent for the corresponding device name.
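
For example, to let the NodeManager manage only two of a node's FPGAs, the property could be set in yarn-site.xml as below. The minor numbers 0 and 1 are hypothetical placeholders; substitute the minor numbers found on your own nodes.

```xml
<!-- yarn-site.xml: restrict YARN to the FPGAs with these minor device numbers.
     "0,1" is a hypothetical example value, not a recommendation. -->
<property>
  <name>yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices</name>
  <value>0,1</value>
</property>
```
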

2) FPGA discovery executable

Property                                                                Default value
yarn.nodemanager.resource-plugins.fpga.path-to-discovery-executables    (empty)

When yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices=auto is set, the YARN NodeManager runs the FPGA discovery executable (currently only the Intel FPGA SDK's aocl is supported) to collect FPGA information. If the value is empty (the default), the NodeManager tries to locate the discovery executable according to the vendor plugin's preference. For example, IntelFpgaOpenclPlugin looks for aocl in the directory given by the ALTERAOCLSDKROOT environment variable.
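
If aocl cannot be found through ALTERAOCLSDKROOT, the discovery executable can be set explicitly in yarn-site.xml. This sketch assumes the property takes the full path to the aocl binary; the path shown is hypothetical and depends on where the SDK is installed.

```xml
<!-- yarn-site.xml: explicit location of the FPGA discovery executable.
     The path below is a hypothetical SDK install location; adjust it to your nodes. -->
<property>
  <name>yarn.nodemanager.resource-plugins.fpga.path-to-discovery-executables</name>
  <value>/home/fpga/intelFPGA_pro/17.0/hld/bin/aocl</value>
</property>
```
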

3) FPGA vendor plugin

Property                                                      Default value
yarn.nodemanager.resource-plugins.fpga.vendor-plugin.class    org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin

Currently, only the Intel OpenCL SDK for FPGA is supported. The IP program (.aocx file) that runs on the FPGA must be written in OpenCL for the Intel platform.
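
Written out in yarn-site.xml, the plugin setting looks like the following. Because the value shown is simply the default from the table, this fragment is an illustration of the exact class name rather than a required step.

```xml
<!-- yarn-site.xml: vendor plugin class; this value is the default, so
     setting it explicitly has no effect unless another class is substituted -->
<property>
  <name>yarn.nodemanager.resource-plugins.fpga.vendor-plugin.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin</value>
</property>
```
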

4) CGroups mount

FPGA isolation uses the cgroups devices controller to isolate each FPGA device. To have YARN mount the cgroups devices subsystem automatically, add the following configuration to yarn-site.xml; otherwise, the administrator must create the devices subfolder manually to use this feature.

Property                                                   Value to set
yarn.nodemanager.linux-container-executor.cgroups.mount    true

For more details on how YARN uses cgroups, see "Using CGroups with YARN".
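
In yarn-site.xml, the cgroups mount setting from the table above is written as follows; only the property name and value from the table are used.

```xml
<!-- yarn-site.xml: let the NodeManager mount the cgroups controllers
     (including devices) itself instead of relying on a manual setup -->
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
```
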

In container-executor.cfg

In general, the following configuration needs to be added to container-executor.cfg. fpga.major-device-number and fpga.allowed-device-minor-numbers are optional settings that restrict which FPGA devices are allowed.

[fpga]
module.enabled=true
fpga.major-device-number=## Major device number of the FPGA, 246 by default. Setting this is strongly recommended.
fpga.allowed-device-minor-numbers=## Comma-separated list of allowed minor device numbers. Empty means all FPGA devices managed by YARN.

If you need to run FPGA applications in a non-Docker environment, also add:

[cgroups]
# Root of the system cgroups (cannot be empty or "/")
root=/cgroup
# Parent folder of YARN's cgroups
yarn-hierarchy=yarn
Using Distributed Shell + FPGA

Besides memory and virtual cores, the distributed shell can request additional resource types.

Run the distributed shell without Docker (here .bashrc sets the SDK-related environment variables):

yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -shell_command "source /home/yarn/.bashrc && aocl diagnose" \
  -container_resources memory-mb=2048,vcores=2,yarn.io/fpga=1 \
  -num_containers 1

Once the container starts, you should see output like the following in its log:

aocl diagnose: Running diagnose from /home/fpga/intelFPGA_pro/17.0/hld/board/nalla_pcie/linux64/libexec

------------------------- acl0 -------------------------
Vendor: Nallatech ltd

Phys Dev Name  Status   Information

aclnalla_pcie0Passed   nalla_pcie (aclnalla_pcie0)
                       PCIe dev_id = 2494, bus:slot.func = 02:00.00, Gen3 x8
                       FPGA temperature = 54.4 degrees C.
                       Total Card Power Usage = 32.4 Watts.
                       Device Power Usage = 0.0 Watts.

DIAGNOSTIC_PASSED
---------------------------------------------------------

Specifying the IP That YARN Configures Before Launching the Container

For FPGA resources, a container can set the environment variable REQUESTED_FPGA_IP_ID to have YARN download and flash an IP (the FPGA program, not a network address) onto the device before the container launches. For example, REQUESTED_FPGA_IP_ID="matrix_mul" causes YARN to search the container's local directory for an IP file (.aocx file) whose name contains "matrix_mul"; the application must distribute that file first. Currently, only one IP can be flashed for all devices. If the variable is not set, YARN assumes the user's application can find the IP file by itself. Downloading the IP and reprogramming the device in advance in YARN is not strictly necessary, because an OpenCL application can find the IP file and reprogram the device at runtime. Having YARN do it for the container, however, gives the fastest reprogramming path.
