Prerequisites for Using FPGA on YARN
- YARN currently only supports FPGA resources exposed through IntelFpgaOpenclPlugin.
- The vendor driver must be installed on the machine where the YARN NodeManager runs, and the required environment variables must be configured.
- Docker containers are not yet supported.
Configure FPGA Scheduling
In resource-types.xml, add the following configuration:
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/fpga</value>
  </property>
</configuration>
In yarn-site.xml, DominantResourceCalculator must be configured to enable FPGA scheduling and isolation. For the Capacity Scheduler, use the following property in capacity-scheduler.xml to configure DominantResourceCalculator:
| Property | Default value |
| --- | --- |
| yarn.scheduler.capacity.resource-calculator | org.apache.hadoop.yarn.util.resource.DominantResourceCalculator |
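The property above corresponds to the following capacity-scheduler.xml fragment (a sketch; the enclosing <configuration> element is shown for completeness):

```xml
<configuration>
  <property>
    <!-- DominantResourceCalculator considers extended resources such as yarn.io/fpga in scheduling -->
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
```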
FPGA Isolation
In yarn-site.xml, add:
<property>
  <name>yarn.nodemanager.resource-plugins</name>
  <value>yarn.io/fpga</value>
</property>
This enables the FPGA isolation module on the NodeManager.
If the preceding parameters are configured, YARN automatically detects and configures FPGA devices. If the administrator has special requirements, the following parameters can be set in yarn-site.xml.
1) Allowed FPGA devices
| Property | Default value |
| --- | --- |
| yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices | auto |
FPGA devices managed by the YARN NodeManager, separated by commas. The number of FPGA devices will be reported to the ResourceManager for scheduling. The default value auto means YARN automatically discovers FPGA devices from the system.
If the administrator wants only some FPGA devices to be managed by YARN, specify the available FPGA devices manually. Because currently only one major device number can be configured in container-executor.cfg, an FPGA device is identified by its minor device number. For Intel devices, you can run the aocl diagnose command and parse the uevent corresponding to the device name to obtain the minor device number.
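As a sketch of such a manual configuration: suppose parsing the uevent entries showed minor device numbers 0 and 1 (hypothetical values); the administrator could then restrict YARN to those devices in yarn-site.xml:

```xml
<property>
  <name>yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices</name>
  <!-- Comma-separated minor device numbers (hypothetical values; replace with your own) -->
  <value>0,1</value>
</property>
```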
2) Executable to discover FPGA devices
| Property | Default value |
| --- | --- |
| yarn.nodemanager.resource-plugins.fpga.path-to-discovery-executables | (empty) |
If yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices=auto is set, the YARN NodeManager runs the FPGA discovery executable (currently only IntelFpgaOpenclPlugin is supported) to collect FPGA information. If the value is empty (the default), the NodeManager searches for the executable based on the vendor plugin's conventions. For example, IntelFpgaOpenclPlugin looks for aocl under the directory given by the environment variable ALTERAOCLSDKROOT.
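If the discovery executable does not live where the vendor plugin expects, it can be pointed to explicitly. A sketch, where the SDK install path is hypothetical and must be adjusted to your environment:

```xml
<property>
  <name>yarn.nodemanager.resource-plugins.fpga.path-to-discovery-executables</name>
  <!-- Hypothetical install location of the aocl tool; adjust to your environment -->
  <value>/opt/intelFPGA_pro/17.0/hld/bin/aocl</value>
</property>
```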
3) FPGA vendor plugin to use
| Property | Default value |
| --- | --- |
| yarn.nodemanager.resource-plugins.fpga.vendor-plugin.class | org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin |
Currently, only the Intel OpenCL SDK for FPGA is supported. The IP program (.aocx file) running on the FPGA must be built for Intel's OpenCL platform.
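Expressed as a yarn-site.xml property, this default can also be set explicitly (a sketch; the value is the default shown in the table above):

```xml
<property>
  <name>yarn.nodemanager.resource-plugins.fpga.vendor-plugin.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin</value>
</property>
```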
4) CGroups mounting
FPGA isolation uses the CGroups devices controller to isolate FPGA devices. To mount the devices subsystem into CGroups automatically, add the following configuration to the yarn-site.xml file. Otherwise, the administrator must create the devices subdirectory manually for this feature to work.
| Property | Default value |
| --- | --- |
| yarn.nodemanager.linux-container-executor.cgroups.mount | true |
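Putting the isolation-related settings together, a minimal yarn-site.xml sketch (values taken from the tables above) might look like this:

```xml
<property>
  <!-- Enable the FPGA resource plugin on the NodeManager -->
  <name>yarn.nodemanager.resource-plugins</name>
  <value>yarn.io/fpga</value>
</property>
<property>
  <!-- Let YARN mount the CGroups devices controller automatically -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
```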
For more information about how YARN uses CGroups, see Using CGroups with YARN.
container-executor.cfg
You usually need to add the following configuration to container-executor.cfg. fpga.major-device-number and fpga.allowed-device-minor-numbers are optional parameters that specify the allowed FPGA devices.
[fpga]
module.enabled=true
fpga.major-device-number=## Major device number of the FPGA; 246 by default. Setting this value is strongly recommended.
fpga.allowed-device-minor-numbers=## Comma-separated minor device numbers; an empty value means all FPGA devices managed by YARN.
If you want to run FPGA programs in a non-Docker environment:
[cgroups]
# System cgroup root directory (cannot be empty or "/")
root=/cgroup
# Parent directory of the YARN cgroup hierarchy
yarn-hierarchy=yarn
Use Distributed-Shell + FPGA
In addition to memory and virtual cores, the distributed shell application can request additional resource types.
Run the distributed shell application without Docker (.bashrc configures the SDK-related environment variables):
yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -shell_command "source /home/yarn/.bashrc && aocl diagnose" \
  -container_resources memory-mb=2048,vcores=2,yarn.io/fpga=1 \
  -num_containers 1
For a launched task, you will see output like the following in the log:
aocl diagnose: Running diagnose from /home/fpga/intelFPGA_pro/17.0/hld/board/nalla_pcie/linux64/libexec
------------------------- acl0 -------------------------
Vendor: Nallatech ltd
Phys Dev Name  Status   Information
aclnalla_pcie0 Passed   nalla_pcie (aclnalla_pcie0)
                        PCIe dev_id = 2494, bus:slot.func = 02:00.00, Gen3 x8
                        FPGA temperature = 54.4 degrees C.
                        Total Card Power Usage = 32.4 Watts.
                        Device Power Usage = 0.0 Watts.
DIAGNOSTIC_PASSED
---------------------------------------------------------
Specify the IP that YARN should configure before launching the container
For FPGA resources, a container can set the environment variable REQUESTED_FPGA_IP_ID to ask YARN to download an IP and program the device with it. Note that "IP" here refers to the FPGA intellectual-property program (the .aocx file), not a network address. For example, REQUESTED_FPGA_IP_ID="matrix_mul" triggers a search for an IP file whose name contains matrix_mul in the container's local directory, so the application must distribute the IP file to each container first. Currently, only one IP can be programmed onto all allocated devices. If the environment variable is not set, the user's program must locate the IP file itself. Strictly speaking, it is not necessary for YARN to download the IP and reprogram the device in advance, because an OpenCL application can find the IP file and reprogram the device at run time; YARN performs this step for containers in order to achieve the fastest possible reprogramming.
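As a sketch, the environment variable could be passed through distributed shell's -shell_env option. The command name run_matrix_mul is hypothetical, and this assumes a matrix_mul .aocx file has already been distributed to each container:

```shell
yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -shell_env REQUESTED_FPGA_IP_ID=matrix_mul \
  -shell_command "source /home/yarn/.bashrc && ./run_matrix_mul" \
  -container_resources memory-mb=2048,vcores=2,yarn.io/fpga=1 \
  -num_containers 1
```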
Hadoop 3.1.1 - YARN - Using FPGA