How to speed up BIOS boot on a server

Source: Internet
Author: User

For a primary, front-line server, average annual downtime is an important measure of stability, so it is particularly important to recover as quickly as possible after a system failure. Today's high-end servers have multiple CPUs, memory capacity keeps growing (some machines reach 512 GB or even several TB), and more and more PCIe cards are attached, all of which greatly increase the time the BIOS spends on system self-test and device scanning. In addition, because of the traditional BIOS design, a cold boot takes several times longer than a warm boot. For example, I tested a new dual-CPU Intel E5 server: a cold boot to GRUB took 195 seconds, while a warm boot took only 52 seconds, which makes the cold boot 3.75 times slower. Why does a cold boot take so much longer than a warm boot? Are there any measures that can speed up BIOS boot?


With these questions in mind, I drew on my previous experience working on the BIOS for a domestic CPU and carefully read the Intel BIOS Writer's Guide. Here are some thoughts and findings that I hope will be useful.


Each Intel E5 chip is multi-core. During BIOS boot, the cores determine which one is the Bootstrap Processor (BSP) and which are Application Processors (APs), based on the CPU ID and an agreed selection mechanism. Normally the BSP does most of the system's initialization work, including the cache, TLB, memory controller, PCIe root complex, and devices, while the APs wait most of the time until the BSP completes the main tasks and sends an inter-processor interrupt to wake them. In fact, the APs can help with the main tasks that the BSP performs. Each core can initialize its own internal cache and TLB independently. Initializing the multiple PCIe root ports and the cards attached to them can also be done by the APs. In this way, with concurrent execution, the time to initialize the internal modules (cache, TLB, IIO, PCIe) of k processors can be reduced to roughly the time needed to initialize those modules on a single processor.
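As a rough illustration of the idea above, the following Python sketch models it with threads standing in for processors. It is a simulation under stated assumptions, not firmware code: the unit names, the simulated delays, and the helpers `init_socket`, `serial_init`, and `parallel_init` are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-socket units with simulated init costs in seconds.
UNITS = {"cache": 0.02, "tlb": 0.01, "iio": 0.03, "pcie": 0.04}


def init_socket(socket_id):
    """Initialize every unit of one socket in order and report what was done."""
    done = []
    for unit, cost in UNITS.items():
        time.sleep(cost)  # stand-in for the real initialization work
        done.append(f"socket{socket_id}:{unit}")
    return done


def serial_init(num_sockets):
    """Traditional flow: the BSP initializes every socket one after another."""
    return [init_socket(s) for s in range(num_sockets)]


def parallel_init(num_sockets):
    """Parallel flow: one worker per socket, as if the BSP had woken the APs."""
    with ThreadPoolExecutor(max_workers=num_sockets) as pool:
        return list(pool.map(init_socket, range(num_sockets)))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, t_serial = timed(serial_init, 4)
    _, t_parallel = timed(parallel_init, 4)
    print(f"serial: {t_serial:.2f}s, parallel: {t_parallel:.2f}s")
```

Both flows perform the same work and return the same results; only the elapsed time differs, because the parallel flow is bounded by the slowest socket rather than by the sum of all sockets.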


Memory controller initialization involves scanning the memory layout, training the controller parameters, running the memory read-write self-test, and so on. At this stage the APs can also share the BSP's work on the controller and complete the memory self-test, so that system self-test and device scanning are parallelized as far as possible. Suppose initialization originally takes time t and the system has n processors; after parallelization, the initialization time can ideally be reduced to t/n. Many further optimizations are possible at this stage:
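The t/n figure is the ideal case, in which all of the work splits evenly. One hedged way to estimate a more realistic number is Amdahl's law, which the text above does not mention: if only a fraction p of the time t can be parallelized across n processors, the new time is (1 - p) * t + p * t / n. The function name and the sample values below are illustrative assumptions, not measurements.

```python
def parallel_boot_time(t, n, p=1.0):
    """Estimate init time after parallelization using Amdahl's law.

    t: original (serial) initialization time
    n: number of processors sharing the parallel part
    p: fraction of t that can be parallelized (1.0 gives the ideal t/n)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return (1.0 - p) * t + p * t / n


if __name__ == "__main__":
    print(parallel_boot_time(120, 4))         # ideal case: 30.0
    print(parallel_boot_time(120, 4, p=0.5))  # half the work stays serial: 75.0
```

The second call shows why the serial remainder matters: with half of the work still on the BSP, four processors cut the time by well under half.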


First, retrain the controller parameters only when the memory layout or serial numbers change. Some memory parameters depend on the specific memory model, the motherboard routing, and similar factors. To adapt to these differences, the parameters must be trained on first boot or after a cold boot so that they fit the specific board and configuration. On a warm boot, however, if the memory layout and models read over SMBus/I2C are unchanged, the last saved parameters can be used directly, followed by the memory read-write self-test. This saves the time spent training memory parameters; for typical DDR controllers, parameter training often takes up most of the memory initialization time. This approach requires that, after each training run, the DDR controller's parameters be saved to nonvolatile storage such as flash or EEPROM. Because memory is not yet initialized at this point, no stack can be set up in RAM, so the flash/EEPROM reads and writes have to be done in more complicated assembly language. Of course, if a calling convention is agreed on and enough registers are available, this phase could also be written in C using a register-based stack.
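The reuse decision described above can be sketched as follows. This is a minimal Python model, not firmware: a dictionary stands in for flash/EEPROM, the DIMM tuples stand in for data read over SMBus/I2C, and `dimm_fingerprint`, `train_controller`, `get_training_params`, and the parameter names are all hypothetical.

```python
import hashlib
import json


def dimm_fingerprint(dimms):
    """Hash the (slot, part number, serial number) of every populated DIMM."""
    canonical = json.dumps(sorted(dimms), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def train_controller(dimms):
    """Placeholder for the slow training step; the values are made up."""
    return {"read_delay": 7 + len(dimms), "write_leveling": 3}


def get_training_params(dimms, nvram, train=train_controller):
    """Return (params, trained): reuse saved params if the DIMMs are unchanged."""
    fingerprint = dimm_fingerprint(dimms)
    saved = nvram.get("ddr_training")
    if saved and saved["fingerprint"] == fingerprint:
        return saved["params"], False  # fast path: skip training
    params = train(dimms)
    nvram["ddr_training"] = {"fingerprint": fingerprint, "params": params}
    return params, True  # slow path: trained and saved for next boot


if __name__ == "__main__":
    nvram = {}  # stands in for flash/EEPROM
    dimms = [(0, "PARTX", "SN001"), (1, "PARTX", "SN002")]
    print(get_training_params(dimms, nvram)[1])  # first boot trains
    print(get_training_params(dimms, nvram)[1])  # next boot reuses
```

Whichever path is taken, the read-write self-test should still run afterwards, since the saved parameters are only assumed to be valid for an unchanged configuration.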


Second, use multiple processors to run the memory self-test concurrently. A key step in memory initialization is running read-write tests with multiple patterns to make sure the chosen memory parameters support error-free reads and writes under many conditions. The traditional approach is to run the test patterns on one CPU, one at a time, and check whether the values read back match the values written. In fact, this stage can take full advantage of multi-core processors: the other cores on the same processor can each be assigned a different pattern and run at the same time. This both verifies the parameters and stress-tests the DDR controller attached to that CPU.
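One possible arrangement of such a test is sketched below in Python, with threads standing in for cores and a `bytearray` standing in for DRAM. The region split, the pattern rotation, and the names `check_region`, `core_job`, and `parallel_self_test` are assumptions made for illustration; real firmware would choose its own patterns and partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

# Common fill patterns: all zeros, all ones, and two alternating-bit bytes.
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


def check_region(memory, start, end, pattern):
    """Write pattern to memory[start:end], read back, return first bad offset."""
    for i in range(start, end):
        memory[i] = pattern
    for i in range(start, end):
        if memory[i] != pattern:
            return i
    return None


def core_job(memory, start, end, core_id):
    """Run every pattern on one region, starting at a core-specific pattern."""
    failures = []
    for step in range(len(PATTERNS)):
        # Rotating by core_id means neighbouring cores use different patterns.
        pattern = PATTERNS[(core_id + step) % len(PATTERNS)]
        bad = check_region(memory, start, end, pattern)
        if bad is not None:
            failures.append((bad, pattern))
    return failures


def parallel_self_test(memory, num_cores):
    """Split memory into one region per core and test all regions concurrently."""
    size = len(memory)
    bounds = [(c * size // num_cores, (c + 1) * size // num_cores)
              for c in range(num_cores)]
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        jobs = [pool.submit(core_job, memory, s, e, c)
                for c, (s, e) in enumerate(bounds)]
        return [failure for job in jobs for failure in job.result()]


if __name__ == "__main__":
    print(parallel_self_test(bytearray(4096), 4))  # healthy memory: []
```

Giving each core its own region keeps the workers from overwriting each other's data, while the rotation keeps different patterns in flight on the controller at the same time.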


Third, use multiple processors to initialize memory concurrently. Today's high-end servers are likely to have more than one processor, either to improve performance or to provide active-active or active-passive availability. Each processor has one or more memory controllers that need to be initialized. The traditional approach is for the BSP to do all of this work while the APs simply wait for it to finish. In fact, each memory controller can in theory be handed to a single core of a multi-core processor, typically a core close to that memory controller. This reduces the time to initialize m memory controllers to approximately the time to initialize one.
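The assignment step could look something like the Python sketch below. It treats "a core close to the memory controller" as "a core on the same socket", which is a simplifying assumption; the tuple layout and the function name `assign_controllers` are also hypothetical.

```python
def assign_controllers(controllers, cores):
    """Map each memory controller to its own core on the same socket.

    controllers: list of (socket_id, controller_id)
    cores:       list of (socket_id, core_id)
    Returns a dict {(socket_id, controller_id): (socket_id, core_id)}.
    """
    # Group the available cores by socket, lowest core id first.
    free = {}
    for socket_id, core_id in sorted(cores):
        free.setdefault(socket_id, []).append(core_id)

    plan = {}
    for socket_id, controller_id in sorted(controllers):
        local = free.get(socket_id)
        if not local:
            raise ValueError(f"no free core on socket {socket_id}")
        # Each controller gets a dedicated core so all of them can run at once.
        plan[(socket_id, controller_id)] = (socket_id, local.pop(0))
    return plan


if __name__ == "__main__":
    controllers = [(0, 0), (0, 1), (1, 0), (1, 1)]
    cores = [(s, c) for s in range(2) for c in range(4)]
    for controller, core in assign_controllers(controllers, cores).items():
        print(controller, "->", core)
```

With one core per controller, every controller can be initialized at the same time, which is what brings the total close to the time for a single controller.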


In addition to the general parallelization approaches described above, BIOS boot can be accelerated using features provided by the CPU. Some CPUs support fetching boot code from either LPC flash or SPI flash. Fetching instructions from SPI flash is much faster than fetching them over the LPC bus, so switching the boot mode from LPC to SPI can significantly speed up BIOS boot. This is why most processors now fetch from SPI flash. Some CPUs also support advanced cache operations: recently accessed instructions or data, together with adjacent instructions or data, are held in the cache, and subsequent accesses are served from the cache, which is several orders of magnitude faster than reading the data or instructions from SPI flash. Finally, if the BIOS boot phase sends a lot of output to a slow interface such as the serial port, that can also noticeably slow down boot, so removing unnecessary serial output wherever possible will also improve BIOS boot speed.
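To get a feel for the serial output cost, a back-of-the-envelope estimate helps. The Python sketch below assumes a UART with 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per byte) and firmware that waits for each byte to be sent; the function name, the default baud rate, and the sample log sizes are assumptions, not measurements.

```python
def serial_output_seconds(num_bytes, baud=115200, bits_per_byte=10):
    """Time in seconds to push num_bytes through a UART at the given baud rate."""
    if baud <= 0:
        raise ValueError("baud must be positive")
    return num_bytes * bits_per_byte / baud


if __name__ == "__main__":
    # 115200 bytes of log text at 115200 baud
    print(serial_output_seconds(115200))            # 10.0
    # the same amount of text at 9600 baud
    print(serial_output_seconds(115200, baud=9600))  # 120.0
```

Under these assumptions, roughly 11 KB of debug text costs about one second at 115200 baud, so trimming verbose boot logs can recover whole seconds.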


In summary, you can speed up BIOS startup through parallelization and CPU features, reducing system downtime and improving the user experience.



This article is from the "Storage Chef" blog. Reprinting is declined!
