The 8717 PCIe switch is an important device for implementing failover. It supports both Virtual Switch mode and base mode, and its DMA and address windows can be configured in different ways. Several 8717 cards can be combined into a single-NT active-passive configuration or a dual-NT active-active configuration, and both depend on the EEPROM on the 8717 switch being set correctly. Engineers therefore often have to modify these settings during commissioning, and one careless step can burn wrong settings into the EEPROM. The PCIe switch then stops working, and in severe cases the BIOS can no longer boot. How do we recover from such a situation?
First, we can consult the board vendor's technical support. They will usually provide a recovery BIOS image based on the board model and BIOS version. If that is not possible, they may advise replacing or repairing the card, which may cost money if the board is out of warranty. We can also consult PLX technical support for the 8717. They see all kinds of 8717 startup and setup problems and can give very valuable advice. The recovery methods can be summarized as follows:
1. Recover through a recovery BIOS image. This method suits cases where the BIOS image is corrupted or was flashed incorrectly. On some Supermicro motherboards, for example, you can put a BIOS image named Super.rom in the root directory of a USB flash drive, restart the machine while holding Ctrl+Home until the BIOS prints "Enter recover mode ..." on the screen, and then release the Ctrl and Home keys;
2. If the boot failure is confirmed to be caused by bad EEPROM contents, an I2C/SPI programmer is available, and the EEPROM chip can be soldered back afterwards, you can locate the EEPROM on the motherboard, desolder it with hot air, put it in the programmer, and then reprogram it through the SDK or directly with the programmer;
3. If none of the above is available, you can locate the EEPROM chip, identify its SO and GND pins from the chip model or the board schematic, and short them with a jumper wire during the BIOS enumeration phase. Once the OS is up, remove the short between SO and GND and re-burn the EEPROM with the SDK.
In theory, all of these methods can restore the EEPROM and let the system boot again. But both desoldering the EEPROM chip and shorting SO to GND with a jumper wire are troublesome hardware modifications that require a soldering iron. Is there an easier way? Below I share two experiences in which the BIOS failed to boot because of wrong EEPROM settings.
We use a recent server model in which one chassis holds two nodes. Each node has an 8717, and we configured them in single-NT mode. The first incident happened after we changed the BAR2/3 window size of the 8717 on the link port node from 2M to 128M. After the reboot, the BIOS stopped at "Enumerate PCI bus ... 91" and went no further. Rebooting and power-cycling the machine did not help, and the node on the link port side never got through PCI enumeration. After a hard day without progress, exhausted and about to give up, I suddenly remembered a statement in the 8717 Databook: "For the link port node, 8717 is a PCIe device." In other words, from the link port node's point of view the 8717 behaves like an add-in card, so powering off the virtual port node should effectively "unplug" that card and let the link port node come up. We tested this inference at once: we shut down the virtual port node and restarted the link port node, and sure enough, the link port node booted successfully. We then booted the virtual port node, used the SDK software on the virtual port node to restore the link port BAR2/3 size to its default value, and restarted the link port machine, which came up as expected.
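The size of that window change is easier to see through the standard PCI BAR sizing rule: a memory BAR must be a power of two in size, and firmware discovers the size by writing all 1s and reading back a mask. The sketch below is only an illustration of that generic rule, not PLX SDK code. The function names are hypothetical, and it models just the low 32 bits of a BAR (BAR2/3 here is presumably a 64-bit pair, which the article does not spell out).

```python
MiB = 1 << 20


def bar_size_mask(size_bytes: int) -> int:
    """Value a 32-bit memory BAR reads back after all 1s are written.

    Only address bits are modelled; the low 4 flag bits are left as 0.
    """
    if size_bytes < 16 or size_bytes & (size_bytes - 1):
        raise ValueError("a memory BAR size must be a power of two >= 16 bytes")
    return ~(size_bytes - 1) & 0xFFFFFFF0


def bar_size_from_readback(readback: int) -> int:
    """Recover the window size from the sizing readback of a 32-bit memory BAR."""
    mask = readback & 0xFFFFFFF0      # drop the type/prefetch flag bits
    if mask == 0:
        return 0                      # BAR not implemented
    return (~mask & 0xFFFFFFFF) + 1


if __name__ == "__main__":
    for size in (2 * MiB, 128 * MiB):
        print(f"{size // MiB:>4} MiB -> readback 0x{bar_size_mask(size):08X}")
```

Going from 2M to 128M asks the BIOS to find 64 times more address space for this one window, which is the kind of request that has to be satisfied during the enumeration step where the hang appeared.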
About a week later, a careless mistake while updating the 8717 EEPROM again left the system BIOS stuck at "Enumerate PCI bus ... 91". This time we were not so lucky: the bad setting was on the virtual port side, and the virtual port sits on the node's own side of the NTB, so there was no link port node we could shut down to "unplug" the virtual port. We first tried the recovery BIOS image provided by technical support. Repeated attempts had no effect, presumably because the recovery BIOS also cannot skip PCI scanning and still enumerates the NTB virtual port. We then considered bypassing the 8717 EEPROM chip, but after searching the board several times we could not locate the chip, so shorting the EEPROM SO and GND pins was not feasible. We also had no EEPROM/SPI programmer at hand and did not dare risk desoldering the EEPROM, so we had to try software methods. Drawing on earlier work experience, I thought of disabling BIOS enumeration of the PCIe root complex port to which the 8717 is attached, so that the BIOS could boot to the OS, and then rescanning all PCIe devices from the OS. That should make the virtual port detectable, after which the appropriate driver could be loaded and the EEPROM updated with the SDK. To try this, I looked in the BIOS setup menu of the link port node for an option to disable the IIO port that the 8717 is attached to. Regrettably there was none; I only found options for the link width and speed. Without much hope, I changed the link speed from 8GT/s to 2.5GT/s and the configuration of its 16 PCIe lanes from auto to four x4 links, hoping this would make link training between the 8717 and the IIO fail. A quick experiment showed that the link speed of the 8717 IIO port did drop to 2.5GT/s, but the NT link port was still present. This meant that on the virtual port node I likewise could not get the BIOS to skip PCI enumeration of the 8717 NT virtual port, so the "Enumerate PCI bus ..." hang was bound to remain.
With no other option left, we followed the advice of 8717 support: we bought a soldering iron, opened the chassis, and searched the motherboard for the 8717 EEPROM, ready to short its SO and GND pins. We still could not find the EEPROM, but then came an unexpected break: next to the 8717 chip we stumbled on a PLX disable/enable jumper that the manual does not mention. Following the markings on the board, we attached an extension wire to the jumper pins and led it outside the chassis. We first disabled the PLX so that the system got through the BIOS PCI self-test and enumeration and booted into the OS, then enabled the PLX through the extension wire and had the IIO port of the 8717 rescan all PCIe devices below it. The rescan found the 8717 upstream port but not the virtual port, and although the SDK could see the 8717 EEPROM, writing to it failed. I suspected that the manual rescan had left some settings not quite right, so I repeated the experiment with one difference: this time I re-enabled it through the jumper after the BIOS had finished booting but before the kernel was loaded, so that the kernel itself enumerated all devices. Once the operating system was up, the 8717 upstream port was visible and the virtual port was still missing, but fortunately this time the SDK could re-burn the virtual port EEPROM. We restored it to the correct default settings, restarted the machine, and the annoying "Enumerate PCI bus ..." hang was gone.
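For readers who want to try the OS-side rescan step, here is a minimal sketch. It assumes a Linux host (the article does not name the operating system) and uses the kernel's sysfs PCI interface, where writing "1" to /sys/bus/pci/rescan triggers a re-enumeration. 0x10B5 is the PCI vendor ID of PLX Technology; the function names are hypothetical and this is not part of the PLX SDK.

```python
from pathlib import Path
from typing import List

PLX_VENDOR_ID = 0x10B5  # PCI vendor ID assigned to PLX Technology


def list_devices_by_vendor(vendor_id: int, sysfs_root: str = "/sys/bus/pci") -> List[str]:
    """Return bus addresses (e.g. 0000:03:00.0) of devices with this vendor ID."""
    found = []
    devices_dir = Path(sysfs_root) / "devices"
    if not devices_dir.is_dir():
        return found
    for dev in sorted(devices_dir.iterdir()):
        try:
            if int((dev / "vendor").read_text().strip(), 16) == vendor_id:
                found.append(dev.name)
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
    return found


def rescan_pci_bus(sysfs_root: str = "/sys/bus/pci") -> None:
    """Ask the Linux kernel to re-enumerate all PCI buses (needs root)."""
    (Path(sysfs_root) / "rescan").write_text("1")


if __name__ == "__main__":
    before = set(list_devices_by_vendor(PLX_VENDOR_ID))
    try:
        rescan_pci_bus()
    except OSError as exc:
        print(f"rescan failed (root required?): {exc}")
    after = set(list_devices_by_vendor(PLX_VENDOR_ID))
    print("PLX devices newly visible after rescan:", sorted(after - before))
```

Comparing the device list before and after the rescan shows which PLX functions became visible. As the second experiment above suggests, a manual rescan may not leave the device in the same state as a normal kernel enumeration at boot.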
To summarize the lessons from these two 8717 recoveries: first, be cautious whenever you burn the bridge's EEPROM, never hasty or sloppy, because the recovery process is painful and troublesome. Second, if bad EEPROM contents keep the system from booting, first locate the problem port: virtual port or link port. If the problem is only on the link port, you can shut down the virtual port node so that the link port node boots normally, or simply correct the EEPROM settings from the virtual port side and then restart. If the problem is on the virtual port side, you need a way to bypass or skip the PLX bridge or its EEPROM so that the system can boot into the OS again, then enable the PLX or EEPROM and use software to restore the EEPROM to the correct settings.
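The decision procedure in this summary can be written down as a small lookup, which may be handy as a checklist. This is only a sketch of the steps described in this article. The names FaultySide, recovery_steps, and bypass_available are hypothetical, and the step wording paraphrases the two experiences above.

```python
from enum import Enum
from typing import List


class FaultySide(Enum):
    LINK_PORT = "link port"
    VIRTUAL_PORT = "virtual port"


def recovery_steps(faulty_side: FaultySide, bypass_available: bool = False) -> List[str]:
    """Return the recovery checklist for a bad 8717 EEPROM setting.

    bypass_available: True if the PLX or its EEPROM can be disabled in
    hardware (a disable jumper, or shorting the EEPROM SO and GND pins).
    """
    if faulty_side is FaultySide.LINK_PORT:
        return [
            "Power off the virtual-port node to 'unplug' the 8717",
            "Boot the link-port node",
            "Boot the virtual-port node",
            "Fix the link-port EEPROM settings with the SDK from the virtual-port node",
            "Restart the link-port node",
        ]
    if bypass_available:
        return [
            "Disable the PLX or its EEPROM in hardware",
            "Let the BIOS finish PCI enumeration",
            "Re-enable the PLX or EEPROM before the kernel loads",
            "Re-burn the EEPROM with the SDK",
            "Restart the machine",
        ]
    return [
        "Try the recovery BIOS image from the board vendor",
        "Reprogram the EEPROM with an external I2C/SPI programmer, or contact support",
    ]


if __name__ == "__main__":
    for number, step in enumerate(recovery_steps(FaultySide.VIRTUAL_PORT, True), start=1):
        print(f"{number}. {step}")
```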
This article is from the "Storage Chef" blog. Please contact the author before reproducing it.
Multifunction PCIe Switch II: EEPROM recovery and troubleshooting