Source: http://www.eefocus.com/b3574027/blog/15-05/312609_2e5ad.html
The following analysis is based on the Xilinx 7 series.
The CLB (configurable logic block) is the basic logic unit of a Xilinx FPGA. Each CLB contains two slices, and each slice consists of four 6-input LUTs (A, B, C, D) and eight registers.
The two slices in the same CLB have no direct connection to each other and belong to two different columns. Each column has its own fast carry chain.
Slices come in two types, SLICEL and SLICEM. A SLICEL can implement logic, arithmetic, and ROM. A SLICEM can do all of that and can additionally be configured as distributed RAM or as 32-bit shift registers. A CLB contains either two SLICELs or one SLICEL and one SLICEM.
A 7 series LUT has six inputs, A1-A6, and two outputs, O5 and O6.
It can be configured as a single 6-input lookup table, in which case O6 is the output, or as two 5-input lookup tables that share the inputs A1-A5, with A6 tied high and O5 and O6 as the two outputs.
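The two LUT modes described above can be illustrated with a small behavioral model. This is only a sketch, not vendor code: the function names lut6 and truth_table are made up for illustration, and the model assumes that A1 is the least significant address bit and that O5 always reads the lower 32 bits of the 64-bit truth table.

```python
def truth_table(func, n_inputs):
    """Pack a Boolean function of n_inputs bits into an integer truth table."""
    table = 0
    for addr in range(2 ** n_inputs):
        bits = [(addr >> i) & 1 for i in range(n_inputs)]
        if func(bits):
            table |= 1 << addr
    return table


def lut6(init, a):
    """Behavioral model of one LUT. `a` is [A1, A2, A3, A4, A5, A6].

    Returns (O6, O5). Assumption: A1 is the least significant address bit.
    """
    assert len(a) == 6 and 0 <= init < 2 ** 64
    addr6 = sum(bit << i for i, bit in enumerate(a))
    addr5 = addr6 & 0x1F              # address formed by A1..A5 only
    o6 = (init >> addr6) & 1          # full 64-entry lookup
    o5 = (init >> addr5) & 1          # lookup in the lower 32 entries
    return o6, o5


# Dual 5-input mode: the lower half holds one function, the upper half another.
and5 = truth_table(lambda b: all(b), 5)       # 5-input AND, appears on O5
xor5 = truth_table(lambda b: sum(b) % 2, 5)   # 5-input parity, appears on O6
init = (xor5 << 32) | and5

print(lut6(init, [1, 1, 1, 1, 1, 1]))  # A6 tied high: (1, 1)
print(lut6(init, [1, 0, 0, 0, 0, 1]))  # (1, 0)
```

With A6 held at 1, O6 only ever reads the upper 32 entries, so the two outputs behave as two independent 5-input functions of the same inputs.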
A 6-input LUT has a capacity of 2^6 = 64 bits. Implementing 7-input logic requires 2^7 = 128 bits, and each additional input doubles the requirement. Each slice has four LUTs, for a total of 256 bits, which is enough for any logic function of up to 8 inputs. To make this possible, each slice also includes three MUXes (multiplexers):
F7AMUX combines LUTs A and B to generate a logic function of 7 inputs.
F7BMUX combines LUTs C and D to generate a logic function of 7 inputs.
F8MUX combines all four LUTs to generate a logic function of 8 inputs.
Logic with more than 8 inputs needs more than one slice, which increases the delay of the implementation.
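To see why four LUTs plus three MUXes are enough for any 8-input function, the following sketch splits a 256-bit truth table across four 64-bit LUTs and recombines the outputs. The function names are hypothetical, and the assignment of truth-table quarters to LUTs A-D and the select polarity of the MUXes are assumptions chosen for readability, not taken from the device documentation.

```python
MASK64 = 2 ** 64 - 1


def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def slice_function8(table256, inputs):
    """Evaluate an 8-input function using four LUTs, two F7 MUXes and one F8 MUX.

    inputs[0..5] drive all four LUTs, inputs[6] selects in both F7 MUXes,
    inputs[7] selects in the F8 MUX (assumed ordering).
    """
    assert len(inputs) == 8
    # Quarter k of the table goes into LUT k (0=A, 1=B, 2=C, 3=D).
    luts = [(table256 >> (64 * k)) & MASK64 for k in range(4)]
    addr = sum(bit << i for i, bit in enumerate(inputs[:6]))
    out_a, out_b, out_c, out_d = [(init >> addr) & 1 for init in luts]
    f7a = mux2(inputs[6], out_a, out_b)   # 7-input function from A and B
    f7b = mux2(inputs[6], out_c, out_d)   # 7-input function from C and D
    return mux2(inputs[7], f7a, f7b)      # 8-input function


# Example: 8-input odd parity packed into a 256-bit truth table.
PARITY8 = sum(1 << v for v in range(256) if bin(v).count("1") % 2)
print(slice_function8(PARITY8, [1, 0, 0, 0, 0, 0, 1, 1]))  # three ones: 1
```

Each extra input beyond six is consumed by a MUX select line rather than by a LUT address line, which is why the slice tops out at 8 inputs.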
The registers in a slice can take their input either from the output of a LUT or MUX, or directly from a bypass input that uses no logic resources. The set/reset input of a register is active high. Only the CLK input has programmable polarity; inverting any other input requires extra logic. For example, an active-low reset needs additional logic to invert the RST input. Choosing between rising-edge and falling-edge clocking, however, costs nothing extra.
Distributed RAM
A SLICEM can be configured as distributed RAM in several capacities.
For data widths of more than one bit, the number of LUTs used in parallel grows by the corresponding multiple.
The choice between distributed RAM and block RAM follows these guidelines:
1. Use distributed RAM when the depth is less than or equal to 64.
2. When the depth is between 64 and 128, use distributed RAM if no spare block RAM is available or if asynchronous reads are required. Use block RAM when the data width is greater than 16.
3. Distributed RAM has better timing performance than block RAM. Distributed RAM sits in the CLB logic fabric, whereas block RAM sits in dedicated memory columns, which adds routing delay and constrains placement.
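The guidelines above can be written down as a small decision function. This is a rough sketch only: choose_ram and its parameters are hypothetical names, and the branches the text does not cover (depths above 128, and narrow memories in the 64 to 128 range when block RAM is available) are filled with assumptions that are marked in the comments.

```python
def choose_ram(depth, width, needs_async_read=False, block_ram_available=True):
    """Return 'distributed' or 'block' following the guidelines above."""
    if depth <= 64:
        return "distributed"          # guideline 1
    if needs_async_read:
        # Guideline 2 states this for depths up to 128; applying it to
        # deeper memories as well is an assumption of this sketch.
        return "distributed"
    if depth <= 128:
        if not block_ram_available:
            return "distributed"      # guideline 2: no spare block RAM
        if width > 16:
            return "block"            # guideline 2: wide data
        # Assumption: otherwise prefer distributed RAM for its timing
        # advantage (guideline 3).
        return "distributed"
    # Assumption: the text gives no rule beyond depth 128; default to block RAM.
    return "block"


print(choose_ram(depth=32, width=8))     # distributed
print(choose_ram(depth=100, width=32))   # block
```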
Shift Register (SLICEM)
A LUT in a SLICEM can be configured as a 32-bit shift register without using any flip-flops, and the four LUTs of a slice can be cascaded into a 128-bit shift register. Multiple SLICEMs can be cascaded in turn to form even longer shift registers.
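The cascading described above can be modeled in a few lines. The class Srl32 and the helper shift_chain are illustrative names, not a vendor API; the model simply assumes that each LUT holds 32 bits, exposes an addressable read tap, and passes the bit that falls off its end to the next LUT.

```python
class Srl32:
    """Behavioral model of one LUT used as a 32-bit shift register."""

    def __init__(self):
        self.bits = [0] * 32  # bits[0] is the most recently shifted-in bit

    def shift(self, d):
        """One clock: shift d in and return the bit that falls off the end."""
        out = self.bits[-1]
        self.bits = [d] + self.bits[:-1]
        return out

    def read(self, addr):
        """Read the bit that was shifted in addr + 1 clocks ago."""
        return self.bits[addr]


def shift_chain(chain, d):
    """Clock a cascade of shift registers once; return the final cascade output."""
    carry = d
    for srl in chain:
        carry = srl.shift(carry)
    return carry


# Four LUTs of one slice cascaded into a 128-bit shift register.
chain = [Srl32() for _ in range(4)]
pattern = [1 if i % 3 == 0 else 0 for i in range(200)]
for bit in pattern:
    shift_chain(chain, bit)
print(chain[0].read(0) == pattern[-1])  # True: newest bit sits at address 0
```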
MUX
One LUT can be configured as a 4:1 MUX.
Two LUTs can be configured as an 8:1 MUX.
Four LUTs can be configured as a 16:1 MUX.
Larger MUXes can be built by connecting multiple slices, but because slices have no direct connection to each other, this uses general routing resources and increases the delay.
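A 4:1 MUX fits in one LUT because it needs exactly six inputs: four data lines and two select lines. The sketch below derives the 64-bit truth table for that case and then builds a 16:1 MUX from four such LUTs plus the F7 and F8 MUXes. The pin assignment (data on address bits 0-3, select on bits 4-5) and the function names are assumptions made for this example.

```python
def mux4_init():
    """64-bit LUT truth table for a 4:1 MUX.

    Assumed pin assignment: address bits 0..3 are data D0..D3,
    address bits 4..5 are the select lines S0, S1.
    """
    init = 0
    for addr in range(64):
        data = [(addr >> i) & 1 for i in range(4)]
        sel = (addr >> 4) & 0b11
        if data[sel]:
            init |= 1 << addr
    return init


def mux16(data, sel):
    """16:1 MUX from four 4:1 LUTs; sel bit 2 drives F7, sel bit 3 drives F8."""
    assert len(data) == 16 and 0 <= sel < 16
    init = mux4_init()
    lut_out = []
    for k in range(4):
        addr = sum(data[4 * k + i] << i for i in range(4)) | ((sel & 0b11) << 4)
        lut_out.append((init >> addr) & 1)
    f7a = lut_out[1] if sel & 0b0100 else lut_out[0]
    f7b = lut_out[3] if sel & 0b0100 else lut_out[2]
    return f7b if sel & 0b1000 else f7a


print(hex(mux4_init()))  # 0xff00f0f0ccccaaaa
```

An 8:1 MUX is the same structure cut in half: two LUTs and one F7 MUX.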
Carry Chain
Each slice has a 4-bit carry chain. Each bit consists of a carry MUX (MUXCY) and an XOR gate, which together generate the carry logic for addition and subtraction. The MUXCY and the XOR gate can also be used to implement general logic.
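As an illustration of how the MUXCY and XOR gate produce an adder, the following model chains 4-bit carry blocks. The names carry4 and add are chosen for this sketch, and it assumes the usual arrangement in which the LUT supplies the propagate signal a XOR b and one operand bit feeds the data input of the carry MUX.

```python
def carry4(s, di, ci):
    """One 4-bit carry chain.

    s:  propagate bits (a XOR b, computed in the LUTs)
    di: data inputs to the carry MUXes (here: the bits of operand a)
    ci: carry in
    Returns (sum bits, carry-out bits).
    """
    o, co = [], []
    carry = ci
    for i in range(4):
        o.append(s[i] ^ carry)             # XOR gate produces the sum bit
        carry = carry if s[i] else di[i]   # MUXCY: propagate or generate/kill
        co.append(carry)
    return o, co


def add(a, b, width=8):
    """Add two unsigned numbers using cascaded 4-bit carry chains."""
    assert width % 4 == 0
    result, carry = 0, 0
    for base in range(0, width, 4):
        a_bits = [(a >> (base + i)) & 1 for i in range(4)]
        b_bits = [(b >> (base + i)) & 1 for i in range(4)]
        s = [x ^ y for x, y in zip(a_bits, b_bits)]
        o, co = carry4(s, a_bits, carry)
        for i, bit in enumerate(o):
            result |= bit << (base + i)
        carry = co[3]                      # cascades into the next slice
    return result, carry


print(add(200, 100))  # (44, 1): 300 = 256 + 44
```

When the operand bits differ, the MUX passes the incoming carry along; when they are equal, the carry out is simply that shared bit value, which covers both the generate and the kill case.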
FPGA Fundamentals 3 (Xilinx CLB Resource details--slice, distributed RAM, and block RAM)