Learn Static Timing Analysis - Learn VLSI

DMSA STA

learnvlsiadmin — Sun, 22 Sep 2024 16:43:15 +0000

PrimeTime provides an efficient way to analyze timing across different corners and operating modes. MMMC (multi-mode multi-corner) refers to performing timing analysis across multiple modes and corners.

What is DMSA in STA?

Distributed Multi-Scenario Analysis (DMSA) refers to timing analysis of different scenarios in a distributed manner. A scenario is a combination of an operating corner (process, voltage, temperature, RC corner) and an operating mode (functional mode, test mode, sleep mode, read mode, write mode, etc.).
Scenarios can differ for timing and power analysis. Some significant scenarios for timing and power are listed below:
- RC_worst_slowProcess_HighTemp_LowVoltage: This scenario is generally used for checking setup time. RC_worst means that the worst-case values of interconnect resistance and of coupling and ground capacitance are taken into account.
- Slow process refers to process variation that degrades performance, such as slow operation of both PMOS and NMOS transistors in a CMOS configuration.
- High temperature increases resistance, and low voltage (VDD) increases the time needed to charge the output capacitance; both reduce performance. Hence this slow corner is used for checking setup time in any operating mode of the chip.
- Hold timing is also checked at this corner, but it is usually clean because hold is being checked at a slow corner.
- RC_best_fastProcess_LowTemp_HighVoltage: This scenario is generally used for checking hold time. RC_best means that the best-case values of interconnect resistance and of coupling and ground capacitance are taken into account; fast process refers to process variation that speeds up the circuit, such as fast operation of both PMOS and NMOS transistors in a CMOS configuration.
- Low temperature reduces resistance, and high voltage (VDD) decreases the time needed to charge the output capacitance; both improve performance. Hence this fast corner is used for checking hold time in any operating mode of the chip.
- Setup timing is also checked at this corner, but it is usually clean because setup is being checked at a fast corner.
- RC_worst_fastProcess_LowTemp_HighVoltage: This scenario is the worst-case check for power analysis. The RC worst parasitic corner results in high power dissipation in the interconnects, whereas fast process, low temperature and high voltage result in fast signal transitions.
- Again, this corner can be used to analyze power in any functional mode.

DMSA

Distributed Multi-Scenario Analysis (DMSA) can analyze timing and power in a multi-scenario environment in parallel. To invoke PrimeTime in DMSA mode, use the -multi_scenario option:
pt_shell -multi_scenario

A single session (called the master) is opened, along with several worker processes on different hosts, which are managed by the master.
The user communicates only with the master. The master allocates tasks to the workers, each of which is dedicated to a single scenario. The number of worker processes (PrimeTime sessions) is usually equal to the number of scenarios.
A scenario is a combination of an operating condition (process, voltage, temperature) and an operating mode (such as functional mode or test mode). So, if 8 scenarios are defined, then 8 worker processes are launched, all managed by a single master.
Dividing the data needed for multi-scenario timing analysis into common data and scenario-specific data speeds up the process. Some data, such as the netlist, is common to all scenarios and does not need to be defined separately for each one, while other data, such as the SPEF files, is specific to one scenario and is not used by any other.
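To make the scenario count concrete, the sketch below enumerates scenarios as corner and mode pairs and separates common data from scenario-specific data. It is only an illustration in Python, not PrimeTime syntax; the corner names, mode names and file names are assumptions made up for the example.

```python
from itertools import product

# Hypothetical corner and mode names, chosen only for illustration.
corners = ["RCworst_slow_hightemp_lowV", "RCbest_fast_lowtemp_highV"]
modes = ["functional", "test", "sleep", "read"]

# A scenario is one (corner, mode) pair.
scenarios = [f"{mode}@{corner}" for corner, mode in product(corners, modes)]

# Data shared by every scenario (hypothetical file name).
common_data = {"netlist": "design.v"}

# Data that belongs to exactly one scenario, e.g. the SPEF for its RC corner.
specific_data = {
    name: {"spef": f"{name.split('@')[1]}.spef"} for name in scenarios
}

# One worker per scenario, as in the eight-scenario example above.
num_workers = len(scenarios)
print(num_workers)
for name in scenarios:
    print(name, specific_data[name]["spef"])
```

With 2 corners and 4 modes this yields 8 scenarios, matching the 8 workers managed by a single master.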

DMSA ECO Fixes

1. Transition Time Fixing: Transition time can be fixed in 3 ways: i) swapping the cell to a lower-Vt variant, ii) up-sizing the cell, iii) inserting a buffer.

To avoid any area penalty (that is, to avoid up-sizing a cell or adding a buffer), swapping to a lower-Vt cell is a safe option for fixing transition time on paths that violate by a small margin. Moreover, inserting a buffer may degrade the setup timing of that path. The following command is used for fixing transition time:

fix_eco_drc -type max_transition -methods size_cell -setup_margin value -hold_margin value

The size_cell method in the above command may also up-size the driver cell. To avoid this, set the PT variable eco_alternative_area_ratio_threshold to 1.0, which ensures that the cell area stays the same. The other method for fixing transition time is insert_buffer.
If the insert_buffer method is used, the list of buffers to use during insertion must also be specified. Setup and hold margins can also be specified so that both are preserved while transition time is being fixed.

2. Setup Time Fixing: The most basic step in fixing setup timing is to swap higher-Vt cells to lower-Vt cells. Other fixes, such as up-sizing or crosstalk analysis, can be tried afterwards if swapping does not improve the timing enough. The following command is used for fixing setup time:

fix_eco_timing -methods size_cell -type setup -hold_margin value -slack_lesser_than value

Setting the PT variable eco_alternative_area_ratio_threshold to 1.0 ensures that the size_cell method in the above command does not change the area of a cell. If it is set to 2.0, the cell area may grow to up to 2 times the original area. Setting it to 0 imposes no restriction on area, so the tool may size the cell to any drive strength, which is not recommended.

3. Power Optimization: Cells on timing paths with a large setup margin can be swapped to higher-Vt variants or downsized (reduced drive strength) to optimize leakage and dynamic power, and redundant buffers can be removed if the path has enough setup and hold slack. The following command is used for power optimization:

fix_eco_power -setup_margin value -pattern_priority [list …]

When -pattern_priority is specified, the tool only swaps lower-Vt cells to higher-Vt cells, according to the priority given to the Vt classes, and does no downsizing. If -pattern_priority is not specified, then the tool downsizes cells to optimize dynamic power.

4. Hold Time Fixing: Hold violations can be fixed either by swapping lower-Vt cells to higher-Vt cells or by inserting buffers. To fix hold by swapping, set the PT variable eco_alternative_area_ratio_threshold to 1.0. The following command fixes hold timing through swaps:

fix_eco_timing -type hold -methods size_cell -setup_margin value -slack_lesser_than value

The -setup_margin option in the above command ensures that setup timing is preserved with the specified margin. If hold is to be fixed by buffer insertion, then the list of buffers to be used during insertion must be specified. The -physical_mode option inserts buffers using cell placement information extracted from LEF and DEF. The following command is used for hold fixing by buffer insertion:

fix_eco_timing -type hold -methods insert_buffer -setup_margin value -physical_mode mode -slack_lesser_than value

The post DMSA STA appeared first on Learn VLSI.

How to fix setup timing violation?

learnvlsiadmin — Sun, 01 Sep 2024 13:00:10 +0000

13 ways to fix setup timing violation

1. Reducing the Clock Frequency (not preferable)
$T_{launch\_edge} + T_{launch\_latency} + T_{cq} + T_{comb} \le T_{capture\_edge} - T_{setup} + T_{capture\_latency}$
• The easiest and simplest solution is to reduce the clock frequency (increase the period), which moves the capture edge later and so adds to the required time.
• Doing this degrades performance (data rate, CPU speed, operations per second, etc.).
• The decision to reduce the clock frequency is left to the architecture team; RTL or PNR engineers cannot change it on their own.
• Sometimes this solution is not acceptable because the product standard requires a specific data rate.

2. Pipelining

What is Pipelining?

• The most common way to fix setup in RTL design is to add pipeline registers.
• The idea of pipelining is to split a large $T_{comb}$ across multiple clock cycles.
• For example, to implement the equation $A + B \times C$, one can do all the operations in one cycle, or do the multiplication in one cycle and the addition in the next cycle, as shown in the diagram.

• The disadvantages of pipelining are:
o More area due to the pipeline registers.
o More latency: instead of finishing the operation in one cycle, we finish it in multiple cycles.
o Synchronization: since the data is delayed by the pipeline registers, the downstream logic that receives the data has to account for this delay. Notice also how we needed to add a pipeline register on A as well, to synchronize $A_1$ with $B_1 \times C_1$; otherwise we would have added $A_2$ from the next sample to $B_1 \times C_1$.
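The synchronization point can be seen in a small behavioral model. The sketch below is a cycle-by-cycle Python simulation of a two-stage pipeline for A + B*C, written only for illustration (it is not RTL, and the function name is made up). It compares the result with and without a pipeline register on A.

```python
def pipelined_mac(samples, delay_a=True):
    """Simulate a 2-stage pipeline computing A + B*C, one sample per cycle.

    Stage 1 multiplies B*C and registers the product.
    Stage 2 adds A. If delay_a is True, A is registered alongside the
    product so both values belong to the same sample.
    """
    prod_reg = None  # pipeline register holding B*C
    a_reg = None     # pipeline register holding A
    outputs = []
    # Append one flush cycle so the last sample leaves the pipeline.
    for a, b, c in samples + [(None, None, None)]:
        # Stage 2 uses the values registered in the previous cycle.
        if prod_reg is not None:
            a_used = a_reg if delay_a else a
            outputs.append(None if a_used is None else a_used + prod_reg)
        # Stage 1 registers are updated at the clock edge.
        prod_reg = None if b is None else b * c
        a_reg = a
    return outputs


samples = [(1, 2, 3), (10, 4, 5)]
print(pipelined_mac(samples, delay_a=True))   # A is delayed with the product
print(pipelined_mac(samples, delay_a=False))  # A arrives one sample too early
```

Without the register on A, the first output combines $A_2$ with $B_1 \times C_1$, which is exactly the mismatch described above.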

3. Multicycle Path

What is Multicycle Path?

• This method has some similarity to pipelining: we again let the combinational path finish in multiple cycles.
• The difference is that we don’t add pipeline registers. Instead, we capture the data at a later capture clock edge.
• This can be done in 2 ways:
o Use a control circuit to mask the 1st capture edge and allow a later one.
o Use a divided clock for the capture FF, as shown in the diagram below.


Multi Cycle Path vs Pipelining
• At first it might appear that a multicycle path (MCP) and pipelining are the same, but a closer look shows a big difference.

• In the case of pipelining:
o In the 1st cycle, A, B, C enter the 1st stage of the pipeline. In the 2nd cycle, A, B, C enter the 2nd stage while a new sample enters the 1st stage.
o We receive an output every clock cycle, and the added latency due to the pipeline registers affects us only at the beginning.

• In the case of MCP:
o In the 1st cycle, A, B, C enter the circuit. In the 2nd cycle, the circuit is still busy and we can’t insert a new sample until it finishes.
o We receive an output every 2 clock cycles.
• This shows that pipelining fixes setup while keeping a high processing speed, whereas MCP slows down the processing speed.
• You can think of MCP as reducing the clock frequency, but selectively in parts of the circuit rather than in the entire circuit.
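The throughput difference can be put into numbers. The following Python sketch counts the clock cycles needed to process a stream of samples under each approach, assuming a 2-stage pipeline and a 2-cycle multicycle path as in the comparison above; the function names are illustrative.

```python
def cycles_pipelined(num_samples, stages=2):
    # A new sample enters every cycle; the latency is paid once at the start.
    if num_samples == 0:
        return 0
    return num_samples + stages - 1


def cycles_multicycle(num_samples, cycles_per_sample=2):
    # The path stays busy for cycles_per_sample cycles for every sample.
    return num_samples * cycles_per_sample


n = 100
print(cycles_pipelined(n))   # latency is amortized over the stream
print(cycles_multicycle(n))  # every sample pays the full multicycle cost
```

For a long stream the pipelined design approaches one result per cycle, while the multicycle path stays at one result every 2 cycles.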

4. Retiming

What is Retiming in VLSI

• In this method, if $T_{comb}$ is too large to fit in the clock cycle, we split the logic and move part of it to another cycle.

• Consider the example below:
o The red and green logic combined give $T_{comb} = 700\,ps$, which causes a setup violation.
o We move the green logic to the next clock cycle, to be combined with the blue logic.
o This reduces $T_{comb}$ between FF1 and FF2 from 700 ps to 500 ps, which passes setup.
o It increases $T_{comb}$ between FF2 and FF3 from 100 ps to 300 ps, but this is okay because it also passes setup. If the blue logic were large, this method would not work.
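The arithmetic of this example can be checked with a few lines of Python. The delays of 500 ps (red), 200 ps (green) and 100 ps (blue) follow from the numbers above; the 600 ps clock period is an assumption, since the example does not state it, and the check is simplified to ignore clock latency, clock-to-Q delay and setup time.

```python
def slack(clock_period_ps, t_comb_ps):
    # Simplified setup check: ignores clock latency, clock-to-Q and setup time.
    return clock_period_ps - t_comb_ps


CLOCK_PERIOD_PS = 600  # assumed; not stated in the example

red, green, blue = 500, 200, 100  # delays in ps implied by the example

before = {"FF1->FF2": red + green, "FF2->FF3": blue}
after = {"FF1->FF2": red, "FF2->FF3": green + blue}

for label, paths in (("before", before), ("after", after)):
    for path, t_comb in paths.items():
        print(label, path, t_comb, slack(CLOCK_PERIOD_PS, t_comb))
```

Under this assumed period, the FF1 to FF2 stage moves from negative to positive slack while the FF2 to FF3 stage keeps positive slack.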

• Retiming can be done manually by the RTL designer or automatically by the synthesis tools.

o In the example below, the purple logic takes A and B as inputs. If we move the green logic to the next cycle, we get B one cycle later than expected. By the time we have waited this one cycle, $A_1$ will be gone and a new $A_2$ will have arrived, which will be computed with sample $B_1$. This breaks the functionality of the circuit.

o Synthesis tools avoid any retiming that breaks the functionality, as this example did.
o The RTL designer has full control over the code, so they can fix this issue by, for example, adding a pipeline register before the purple logic to delay it by one cycle, and then handle any new issues that appear because of this added register.
o Hence, the RTL designer can do more aggressive retiming than the synthesis tools, but with extra effort.

5. Retiming + Pipelining

• The previous example shows how retiming can be combined with pipelining.
• Let’s consider the same example of $A + B \times C$:
o We can move the adder to the next clock cycle if there is margin there.
o However, we get the same issue as in the previous example: A is not synchronized with $B \times C$. So we add a pipeline register on A.
o This way we fix the setup violation and save the area of the $B \times C$ pipeline registers.

6. Optimizing Synthesis

• Synthesis tools have many features and switches that the engineer can use to improve timing and control the trade-offs between the PPA (power, performance, area) metrics.
• This topic is very large and needs a tutorial of its own, so we will demonstrate just a few of the things that can be done.

o Increase the timing effort: Most synthesis tools have switches that control the effort the tool puts into fixing a certain PPA metric or doing a certain optimization. Higher effort leads to better optimization but longer runtime, while lower effort leads to less optimization but shorter runtime.

o Decrease or disable area and power efforts: Area and power optimizations usually degrade the timing of the circuit. Reducing the effort of these optimizations or disabling them altogether may improve timing but worsen the area and power of your chip.

o Enable flattening: The RTL code consists of several modules connected to each other. By default, synthesis tools synthesize each module separately and then connect them together in the top module, thus preserving the hierarchy and the boundaries between the modules.

Another approach is to remove the module boundaries and place all cells in one hierarchy. This is called flattening and generally produces better timing results.

7. False Path

Applying False Paths in the Constraints

• False paths are timing paths that cannot occur because of the logic of the circuit.
• Consider the example below:
• Both muxes have the same select signal. This means we have 2 possible timing paths: the one going through both red logic blocks (200 + 300 = 500 ps) and the one going through both blue logic blocks (100 + 500 = 600 ps).
• The paths going through red logic then blue logic (200 + 500 = 700 ps), or blue logic then red logic (100 + 300 = 400 ps), cannot occur.

• Unless we instruct the tool to ignore these false paths, they will be considered for timing analysis, and the large $T_{comb}$ of the red-to-blue path will violate setup.
• If we don’t apply correct constraints on these paths, not only do we get fake setup violations, but we also hinder the ability of the synthesis and PnR tools to optimize the other, real violating timing paths. The tools apply their most aggressive optimizations only to the critical and worst paths, and will not consider the less critical paths for these optimizations until the most critical ones are solved.
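The effect of the shared select signal can be reproduced by enumerating the four structural paths and marking which ones can actually occur. This is an illustrative Python sketch using the delays from the example, not the way an STA tool represents paths; in a real flow the false paths would be excluded through timing constraints.

```python
from itertools import product

# Delays in ps from the example: first mux (red 200, blue 100),
# second mux (red 300, blue 500).
stage1 = {"red": 200, "blue": 100}
stage2 = {"red": 300, "blue": 500}

paths = []
for first, second in product(stage1, stage2):
    delay = stage1[first] + stage2[second]
    # Both muxes share one select, so only same-colour combinations can occur.
    is_real = first == second
    paths.append((first, second, delay, is_real))
    print(first, second, delay, "real" if is_real else "false path")

worst_all = max(delay for _, _, delay, _ in paths)
worst_real = max(delay for _, _, delay, real in paths if real)
print(worst_all, worst_real)
```

The worst structural path is 700 ps, but the worst path that can really occur is 600 ps, so timing the false path overstates the problem by 100 ps.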

8. Optimizing the Floorplan

• Floorplanning is the 1st step in the PNR flow and involves things like defining the chip size and boundaries, manually placing the major blocks (analog, SRAM, etc.) in the chip, and placing the chip ports.
• Here are some of the things that affect setup in the circuit:
o A small chip area brings the cells closer to each other and closer to the ports, which in turn reduces the wire delays. However, if the area is too small, several issues appear, such as large voltage drop, cell congestion, routing detours, crosstalk, etc.

o The placement of the major blocks in the chip affects the timing. The example on the left shows how placing the SRAMs near the IO ports might block the standard cells from being placed near their relevant ports. Not only that, but the SRAMs also block the routing, so wires must detour around them, resulting in longer wire delays.

o The placement of the ports also affects the timing. The example on the right shows how a bad placement of the ports can lead to long wire delays and buffering, which worsens $T_{comb}$.

9. Optimizing the wire delay

• In part 1 we showed how a signal propagating through an RC circuit will have a delay proportional to the resistance and the capacitance. Hence, to reduce this delay we need to reduce the resistance and capacitance of the wire.

• This will also decrease the load cap of the cell that drives the wire which will speed up the cell too.

• Reducing the resistance, $R = \rho L / A$
o Reducing the length $L$ of the wire reduces the delay. We showed some examples of how to reduce it using a better floorplan.
o Increasing the width decreases the delay. Higher metal layers have a larger default width and also a larger thickness, hence a larger cross-sectional area $A$. PNR tools use these higher layers for long and critical nets to reduce their delay. The PNR engineer can manually move wires to higher layers during ECO, or apply non-default routing rules (NDR) to these nets so that the router routes them in higher layers.

• Reducing the capacitance, $C = \epsilon A / d$
o Increasing the spacing $d$ by moving the two wires away from each other reduces the capacitance between them. We can apply an NDR to specific nets to tell the router that no other nets should be routed very close to them.
o Reducing the common distance. When two wires run alongside each other for a long distance, the common area $A$ is large, leading to a larger capacitance. We can move one of the two wires to another layer to reduce the delay.
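As a rough numeric illustration of the two formulas, the Python sketch below evaluates $R = \rho L / A$ and $C = \epsilon A / d$ for a wire before and after doubling its width and its spacing. All dimensions and material constants are assumed values chosen for readability, not data from any real process, and the parallel-plate formula ignores fringing and ground capacitance.

```python
def wire_resistance(rho, length, width, thickness):
    # R = rho * L / A, with cross-section A = width * thickness
    return rho * length / (width * thickness)


def coupling_capacitance(eps, common_length, thickness, spacing):
    # C = eps * A / d, with facing area A = common_length * thickness
    return eps * common_length * thickness / spacing


# Illustrative values only (SI units); not taken from any real process.
RHO = 1.7e-8           # ohm*m, roughly bulk copper
EPS = 3.0 * 8.854e-12  # F/m, assumed dielectric with relative permittivity 3

base_r = wire_resistance(RHO, 1e-3, 100e-9, 200e-9)
wide_r = wire_resistance(RHO, 1e-3, 200e-9, 200e-9)       # width doubled
base_c = coupling_capacitance(EPS, 1e-3, 200e-9, 100e-9)
spaced_c = coupling_capacitance(EPS, 1e-3, 200e-9, 200e-9)  # spacing doubled

print(base_r, wide_r)
print(base_c, spaced_c)
```

Doubling the width halves the resistance, and doubling the spacing halves the coupling capacitance, which is why wider, more widely spaced routing is reserved for long and critical nets.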

10. Relaxing the Power Grid

• The power grid is the metal network that delivers power from the higher metal layers down to the standard cells.
• We showed how the wire delay is affected by things like spacing and width. A wide and dense power grid leaves few routing resources for the signal nets, leaving no room to increase their spacing or width.
• However, relaxing the power grid increases the resistance of the power network, causing a larger voltage drop. So, the PNR designer has to trade off between improving timing and fixing voltage drop.

11. Upsizing

What is Upsizing in VLSI?

We showed in part 1 how MOSFET size affects the propagation delay of a cell. So, to fix setup, we can use larger cells that have less propagation delay.

There are several considerations when using this method:

Bigger cells mean more area and power consumption.

Bigger cells have larger gate capacitance. This slows down the cell that drives them, because it now sees a larger load capacitance. The gain from upsizing the cell should outweigh the slowdown of the driving cell.

Since big cells consume more power, they are likely to cause a large voltage drop on the cells around them.

During the ECO flow there might not be enough area to accommodate the bigger cell, which requires moving the cells around it and rerouting the nets to their pins. Moving the cells and rerouting could worsen the timing of these cells.
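The trade-off between speeding up a cell and slowing down its driver can be sketched with a simple RC model. The Python example below is only an illustration under assumed values: a unit-size driver, a unit resistance of 1 kΩ, a unit input capacitance of 1 fF and a 16 fF load are all made-up numbers, and parasitic delay is ignored.

```python
def chain_delay_ps(size, r_unit_kohm=1.0, c_in_unit_ff=1.0, c_load_ff=16.0):
    """Delay of a fixed unit-size driver followed by a cell of the given size.

    Uses a simple R*C model (kOhm * fF = ps); parasitic delay is ignored.
    """
    # The driver sees a gate capacitance that grows with the cell size.
    driver_delay = r_unit_kohm * (c_in_unit_ff * size)
    # The upsized cell has lower resistance and drives the load faster.
    cell_delay = (r_unit_kohm / size) * c_load_ff
    return driver_delay + cell_delay


for size in (1, 2, 4, 8, 16):
    print(size, chain_delay_ps(size))
```

In this model the total delay improves up to a size of 4 and then gets worse again, because beyond that point the slowdown of the driver outweighs the gain from the larger cell.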

12. Increasing the Driving Strength

• When we discussed upsizing, we showed that when a cell drives a large load capacitance, its output transition time gets slower, which in turn slows down the load cells. Increasing the driver strength improves the transition time, which in turn improves the delay of the load cells.

• There are several ways to increase the driving strength:
o Upsizing the driver cell:
Bigger cells produce a larger current and hence charge the load capacitance faster. This method combines the benefit of speeding up the driver by upsizing with the benefit of speeding up the load cells, because they see a better input transition time.

o Downsizing the load cells:
This decreases the load capacitance seen by the driver, which improves its propagation and transition times, which in turn speeds up the load cells. However, smaller cells have a larger delay, so for this method to work, the gain from the improved driving strength should outweigh the increase in delay due to downsizing.

o Fanout splitting:
Instead of one cell driving all the fanout, we can duplicate the driver and split the fanout between the copies, as shown in the diagram. But note that the driver of the driver now sees double the load capacitance, which increases its delay. So, you have to balance things so that the overall gain outweighs the increase in delay.

o Side load isolation:
Add a small buffer that isolates a large load from the driver. In the example shown, the driver now sees the small capacitance of the buffer instead of the large capacitance of the large NAND. This fixes the green paths but worsens the red path, because the small buffer adds a delay that increases the overall delay of the red path.

For this method to work, the red path should be passing the setup check and have a good margin to accommodate the increase in delay.

13. Breaking up Long Nets

• When a cell drives a very long wire with a large capacitance, it has poor propagation and transition times. When the long wire is broken up with buffers, the overall improvement can outweigh the delay of the added buffers.
• If the wire is very long, we can split it with an inverter pair instead of a buffer. This is better because the delay of an inverter is less than that of a buffer of the same size. This way we get more cuts in the wire (less load capacitance for each cell) with roughly the same delay as the added buffer.
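Why splitting helps can be seen from the fact that the delay of a distributed RC wire grows roughly with the square of its length. The Python sketch below compares an unbuffered wire with the same wire split into equal segments; the delay coefficient, buffer delay and wire length are assumed values, and driver resistance and buffer input capacitance are ignored.

```python
def unbuffered_delay(length_mm, k_ps_per_mm2):
    # Distributed RC wire delay grows with the square of the length.
    return k_ps_per_mm2 * length_mm ** 2


def buffered_delay(length_mm, segments, k_ps_per_mm2, t_buf_ps):
    # Split the wire into equal segments with one buffer between each pair.
    seg_len = length_mm / segments
    wire = segments * k_ps_per_mm2 * seg_len ** 2
    return wire + (segments - 1) * t_buf_ps


K = 100.0     # ps/mm^2, assumed wire delay coefficient
T_BUF = 20.0  # ps, assumed buffer delay
LENGTH = 2.0  # mm, assumed wire length

print(unbuffered_delay(LENGTH, K))
for n in (2, 4, 8):
    print(n, buffered_delay(LENGTH, n, K, T_BUF))
```

With these numbers, 4 segments give the lowest delay; adding more buffers beyond that costs more in buffer delay than it saves in wire delay.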


7 Simple points to understand Setup Time in VLSI

learnvlsiadmin — Sat, 31 Aug 2024 19:18:12 +0000

1. At time $T = T_{launch\_edge}$, data A is launched from FF1 to FF2. The data needs to reach FF2 before the next clock edge arrives at FF2 at time $T_{capture\_edge}$. The next clock edge arrives one clock period later.

$Launch = T_{launch\_edge}$
$Capture = T_{capture\_edge}$

2. The clock takes some time to reach FF1 because of the clock buffers. The launch won’t happen exactly at $T = T_{launch\_edge}$ but after the delay (latency) of the clock buffers.

$Launch = T_{launch\_edge} + T_{launch\_latency}$
$Capture = T_{capture\_edge}$

3. As we saw in part 1, once the clock reaches the FF it takes some time to push the data out to the Q pin. We called this time $T_{cq}$. This is the 1st delay data A encounters on its way to FF2.

$Launch = T_{launch\_edge} + T_{launch\_latency}$
$Delay = T_{cq}$
$Capture = T_{capture\_edge}$

4. Data A then propagates through the combinational path to reach FF2. This is the 2nd delay it encounters.

$Launch = T_{launch\_edge} + T_{launch\_latency}$
$Delay = T_{cq} + T_{comb}$
$Capture = T_{capture\_edge}$

5. As we saw in part 1, the FF requires the data to arrive some time before the clock edge in order to avoid metastability. We called this time $T_{setup}$. Hence, the data must arrive not by $T_{capture\_edge}$ but by $T_{capture\_edge} - T_{setup}$.

$Launch = T_{launch\_edge} + T_{launch\_latency}$
$Delay = T_{cq} + T_{comb}$
$Capture = T_{capture\_edge} - T_{setup}$

6. The clock takes some time to reach FF2 because of the clock buffers. The capture won’t happen exactly at $T_{capture\_edge} - T_{setup}$ but after the delay (latency) of the clock buffers.

$Launch = T_{launch\_edge} + T_{launch\_latency}$
$Delay = T_{cq} + T_{comb}$
$Capture = T_{capture\_edge} - T_{setup} + T_{capture\_latency}$

7. To make sure a setup violation doesn’t happen, data A must arrive at FF2 before the required capture time.
The difference between the required time and the arrival time is called the slack. If the slack is positive, we pass setup; if it is negative, we fail. The launch FF is called the startpoint of the timing path and the capture FF is called the endpoint.

$Launch + Delay \le Capture$
$Arrival \le Required$
$T_{launch\_edge} + T_{launch\_latency} + T_{cq} + T_{comb} \le T_{capture\_edge} - T_{setup} + T_{capture\_latency}$
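The seven steps can be collected into a short slack calculation. The Python sketch below follows the Launch, Delay and Capture terms built up above, including the clock-to-Q delay from step 3; the numeric values are hypothetical and given in picoseconds.

```python
def setup_slack(t_launch_edge, t_launch_latency, t_cq, t_comb,
                t_capture_edge, t_setup, t_capture_latency):
    # Arrival time: launch edge plus launch clock latency plus data path delay.
    arrival = t_launch_edge + t_launch_latency + t_cq + t_comb
    # Required time: capture edge minus setup time plus capture clock latency.
    required = t_capture_edge - t_setup + t_capture_latency
    return required - arrival


# Hypothetical values in ps, with a 1000 ps clock period.
passing = setup_slack(0, 120, 80, 700, 1000, 50, 100)
failing = setup_slack(0, 120, 80, 900, 1000, 50, 100)
print(passing)  # positive slack: setup passes
print(failing)  # negative slack: setup fails
```

Increasing the combinational delay from 700 ps to 900 ps is enough to turn a passing path into a failing one under these assumed values.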

Setup Timing Report

• The example we have shown is for a full-cycle path, where $T_{capture\_edge}$ comes one clock cycle after $T_{launch\_edge}$.
• This is not always the case. The capture edge could come half a cycle later, multiple cycles later, or from another clock.
o Half-cycle paths occur when the launch and capture FFs use different clock edges.
o Multicycle paths occur when the first capture edge is masked by a control circuit and another edge is used.
o Multi-clock paths occur when the launch and capture FFs use different clocks.
• The diagram shows that there can be more than one launch/capture edge combination. The STA tool considers the worst case (the purple one).
• Everything we learned still applies. We just plug different values for the clock edges into the setup equation:

$T_{launch\_edge} + T_{launch\_latency} + T_{cq} + T_{comb} \le T_{capture\_edge} - T_{setup} + T_{capture\_latency}$


Types of Delays in VLSI

learnvlsiadmin — Thu, 29 Aug 2024 06:57:18 +0000

Source Delay (or Source Latency)

It is also known as source latency. It is defined as “the delay from the clock origin point to the clock definition point in the design”, that is, the delay from the clock source to the beginning of the clock tree.
In other words, it is the time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.

Network Delay (or Network Latency)

It is also known as insertion delay or network latency. It is defined as “the delay from the clock definition point to the clock pin of the register”.
In other words, it is the time a clock signal (rise or fall) takes to propagate from the clock definition point to a register clock pin.

Insertion delay

The delay from the clock definition point to the clock pin of the register.

Transition delay

It is also known as “slew”. It is defined as the time a signal takes to change state: the time taken for the transition from logic 0 to logic 1 and vice versa, typically measured as the time the signal takes to rise from 10% (or 20%) to 90% (or 80%) of its swing, and vice versa.

Transition is the time it takes for the pin to change state.

Slew

The rate of change of a logic signal. See Transition delay.

Slew rate is the speed of the transition, measured in V/ns.

Rise Time

Rise time is the time between the signal crossing a low threshold and crossing the high threshold. It can be absolute or percent.
The low and high thresholds are either fixed voltage levels around the mid voltage level, or percent levels: 10% and 90% respectively, or 20% and 80% respectively. The percent levels are converted to absolute voltage levels at the time of measurement by calculating percentages of the difference between the starting voltage level and the final settled voltage level.

Fall Time

Fall time is the time between the signal crossing a high threshold and crossing the low threshold.
The low and high thresholds are either fixed voltage levels around the mid voltage level, or percent levels: 10% and 90% respectively, or 20% and 80% respectively. The percent levels are converted to absolute voltage levels at the time of measurement by calculating percentages of the difference between the starting voltage level and the final settled voltage level.

Absolute Rise Time

In absolute rise time, the low and high thresholds are fixed voltage levels around the mid
voltage level.

Percent Rise Time

In percent rise time, the low and high thresholds are percent levels, and are usually either
10% and 90% respectively or 20% and 80% respectively. The percent levels are
converted to absolute voltage levels at the time of measurement by calculating
percentages from the difference between the starting voltage level and the final settled
voltage level.
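The conversion from percent levels to absolute voltage levels can be sketched as follows. This Python example measures percent rise time on a sampled waveform using linear interpolation between samples; the function names and the simple 0 V to 1 V ramp are assumptions for illustration, and the starting and final settled levels are taken from the first and last samples.

```python
def crossing_time(times, volts, level):
    """Return the first time the rising waveform crosses `level`."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 < level <= v1:
            # Linear interpolation between the two neighbouring samples.
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def percent_rise_time(times, volts, low_pct=10, high_pct=90):
    # Convert percent levels to absolute voltages from the start and
    # final settled levels, then measure the time between the crossings.
    v_start, v_final = volts[0], volts[-1]
    swing = v_final - v_start
    v_low = v_start + swing * low_pct / 100
    v_high = v_start + swing * high_pct / 100
    return crossing_time(times, volts, v_high) - crossing_time(times, volts, v_low)


times = [0.0, 1.0]  # ns
volts = [0.0, 1.0]  # V, an ideal linear ramp
print(percent_rise_time(times, volts))          # 10% to 90%
print(percent_rise_time(times, volts, 20, 80))  # 20% to 80%
```

For the same edge, the 20% to 80% measurement is shorter than the 10% to 90% one, so the thresholds must always be stated along with a rise time value.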

Definition of Fall Time

Fall time is the time between the signal crossing a high threshold and crossing the low threshold. It can be absolute or percent.

Absolute Fall Time

In absolute fall time, the low and high thresholds are fixed voltage levels around the mid
voltage level.

Percent Fall Time

In percent fall time, the low and high thresholds are percent levels, and are usually either
10% and 90% respectively or 20% and 80% respectively. The percent levels are
converted to absolute voltage levels at the time of measurement by calculating
percentages from the difference between the starting voltage level and the final settled
voltage level.

Significance of Rise Time & Fall Time

This is best explained by comparing a square wave with a triangular wave. In an ideal
square wave with 50% duty cycle, the rise time will be 0 and the signal will be above
threshold for 100% of the half period time.

In a symmetric triangular wave, this is reduced to just 50%. More severely affected is the total area above the threshold, which is reduced to 25% of that of square waves.

Though the information about loss of time above threshold is conveyed by many other parameters, the information about loss of area above threshold is only conveyed by rise and fall times.

Rise Time & Fall Time Requirements

The rise and fall times should be small compared to the clock period. A factor of 10 is considered good. Very large rise or fall times carry the risk of cycles going undetected.

Also, large rise or fall times mean that the signal hovers around the mid level for too long, making the system highly susceptible to noise and to multiple triggering if there is not enough hysteresis.

This might make you think that the faster the rise and fall times are, the better the system is.
Not really. Very fast rise or fall times are not free from trouble.

They might cause severe ringing at the receiver, reducing voltage and timing margins or even causing double triggering. Fast edges can also couple to adjacent signal lines, causing false triggering on them or reducing their voltage margins.

Path delay

Path delay is also known as pin-to-pin delay. It is the delay from the input pin of the cell
to the output pin of the cell.

Net Delay (or wire delay)

The difference between the time a signal is first applied to the net and the time it reaches the other devices connected to that net.
It is due to the finite resistance and capacitance of the net. It is also known as wire delay.
Wire delay = $f(R_{net}, C_{net} + C_{pin})$

Propagation delay

For any gate, it is measured from the 50% point of the input transition to the 50% point of the corresponding output transition.
This is the time required for a signal to propagate through a gate or net. For gates, it is the time it takes for an event at the gate input to affect the gate output.
For a net, it is the delay between the time a signal is first applied to the net and the time it reaches the other devices connected to that net.
It is taken as the average of the high-to-low and low-to-high propagation delays, i.e. $T_{pd} = (T_{phl} + T_{plh})/2$.

Intrinsic delay

Intrinsic delay is the delay internal to the gate, from the input pin of the cell to the output pin of the cell.
It is defined as the delay between an input and output pair of a cell when a near-zero slew is applied to the input pin and the output sees no load. It is predominantly caused by the internal capacitance associated with the cell’s transistors.
This delay is largely independent of the size of the transistors forming the gate, because increasing the transistor size increases the internal capacitance along with the drive strength, so the two effects roughly cancel.

Extrinsic delay

Same as wire delay, net delay, interconnect delay, or flight time.
Extrinsic delay is the delay associated with the interconnect, from the output pin of one cell to the input pin of the next cell.

Input delay

Input delay is the time at which data arrives at the input pin of the block from the external circuit, with respect to the reference clock.

Output delay

Output delay is the time before the reference clock edge by which the data has to arrive at the output pin of the block, as required by the external circuit.


What is False Path?

learnvlsiadmin — Wed, 28 Aug 2024 16:49:48 +0000

Static timing analysis is exhaustive by nature. The timing tool looks at all possible timing paths and performs timing checks on them.

Because of this, it also performs timing checks on timing paths that can never actually occur. The best way to understand this is by example.

This type of circuit configuration is very common in digital circuits. The functional clock
is active only in functional mode and the test clock is active only in test mode.

This means that when a timing path starts with the functional clock launching data at the functional flop (FF)
output QF, the data should be captured by the receiving flop (RF), and the capture clock should only
be the functional clock.

Because of the exhaustive nature of timing tools, the tool will also time a path where the
functional clock launches data at the QF output of the functional flop (FF) and the data is captured at the D
input of the receiving flop (RF) by test_clk.

Given that functional clock and test clock are not active at the same time, this timing path is false and can never happen.

When the functional clock launches data at the QF output of the functional flop (FF) and it is
captured at the D input of the receiving flop (RF), it can only be sampled by the functional
clock and not the test clock, as only the functional clock is active at that time.

To drive this point further, take a look at the following circuit.

In the above figure a mux is used to select between the functional clock and the test clock. In
functional mode only the functional clock is active and the test clock is inactive.

In test mode only the test clock is active and the functional clock is turned off.
For the above circuit there are only two valid timing paths.

First timing path is where functional clock launches the data at Q output of launch flop
(LF) and this data is captured again by functional clock at capture flop(CF) input D.

Second timing path is similar but with test clock, i.e. where test clock launches the
data at Q output of launch flop(LF) and this data is captured by test clock at capture
flop(CF) input D.

But because of the exhaustive nature of static timing analysis, the timing tool by
default comes up with four timing paths.
1) Functional clock launch => Functional clock capture.
2) Functional clock launch => Test clock capture.
3) Test clock launch => Test clock capture.
4) Test clock launch => Functional clock capture.

As you can see, only paths 1) and 3) are valid; paths 2) and 4) are false. An explicit
exception or override needs to be provided to the timing tool to address these false
paths. In SDC-based tools this is typically done with set_false_path or set_clock_groups.
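The enumeration of the four launch/capture combinations, and the filtering of the false ones, can be sketched in a few lines. This only models the reasoning and is not how a timing tool is implemented; the clock names func_clk and test_clk and the helper classify are hypothetical.

```python
from itertools import product

clocks = ["func_clk", "test_clk"]      # hypothetical clock names
exclusive = {frozenset(clocks)}        # clock pairs that are never active together

def classify(launch, capture):
    """A path is false if its launch and capture clocks are mutually exclusive."""
    return "false" if frozenset((launch, capture)) in exclusive else "valid"

# Exhaustively pair every launch clock with every capture clock, as the tool does.
paths = {(l, c): classify(l, c) for l, c in product(clocks, repeat=2)}
for (l, c), kind in paths.items():
    print(f"{l} launch => {c} capture: {kind}")
```

Only the two same-clock combinations come out as valid, matching paths 1) and 3) in the list above.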

The post What is False Path? appeared first on Learn VLSI.

Multicycle Path

learnvlsiadmin — Wed, 28 Aug 2024 15:57:05 +0000

What are multi cycle paths?

By default, timing paths are a single cycle long. Here is what that really means. In digital
circuits, memory elements like flip-flops or latches launch new data at the beginning
of the clock cycle.

During the clock cycle, the actual computation is performed
through the combinational logic. At the end of the clock cycle the data is ready and is
captured by the next memory element at the rising edge of the next clock cycle, which
is the same as the end of the current clock cycle.

Following figure illustrates this.

As shown in the figure, the launch flop generates a new set of data at its
output pin Q with every rising edge of the clock. Similarly, the
capture flop samples its input data at every rising edge of the clock.

As you can see in the figure the data launched on rising edge ‘1’ (in red) is supposed to be
captured by capture edge ‘1’ (in blue). Similarly capture edge ‘2’ corresponds to
launch edge ‘2’ and so on.
This is called a single cycle timing path.

There is one clock cycle from the launch of the data to the capture of the data. By default, timing tools assume this to be the circuit behavior. Timing tools will perform a setup check with respect to a capture clock edge, which is one clock cycle after the launch clock edge.

But this may not be the case every time. Many times what happens is that the
combinational delay from the launch flop to the capture flop is more than one clock
cycle.

In such cases, one cannot keep launching data at the beginning of every clock cycle and hope to capture correct data at the end of every clock cycle. In such cases data launched at the beginning of a clock cycle will just not reach the capture edge at the end of clock cycle.

When this is the case, the circuit designer has to account for this fact in the design of the
circuit.

If the combinational delay from launch flop to the capture flop is more than
one clock cycle, but less than two clock cycles, the circuit designer has to design the
circuit in such a way that data is not launched from the launch flop at every clock
cycle but is launched at every other clock cycle.

And the data launched at the beginning of a clock cycle is captured not after one clock cycle, but two clock cycles.
Following figure depicts this.

As shown in the figure, let’s say that we have a circuit where we know that
combinational delay from launch flop to the capture flop is more than one clock cycle
but quite a bit less than two clock cycles, such that it can meet setup time requirements
comfortably in two clock cycles, but it doesn’t meet setup in one clock cycle.

You can see in the figure the data launched at the launch clock ‘1’, approximately
arrives at the capture flop (Data to be captured (D)) after about one and half clock
cycles.

As mentioned earlier, by default timing tools assume that all timing paths
are one clock cycle long.

In other words, if the data was launched at launch clock ‘1’, the timing tool will assume that it needs to be captured at the capture edge
one clock cycle after the launch edge, which is the capture edge shown in the figure with
a black rising arrow.

The timing tool will by default check setup with respect to the capture edge shown in the figure
with the black rising arrow, and will report that the input data to the capture flop (Data to be
captured (D)) fails setup at the capture flop, as it arrives later than the capture
edge.

This check is shown in the figure with a dotted line. In reality we know that this is
a false setup check. The capture flop input setup check should be against the capture
edge shown in blue.

As stated earlier, our design here is such that we expect
data to take two clock cycles to travel from the launch flop to the capture flop, and we have
designed our circuit such that the launch flop doesn’t launch new data every clock cycle
but every other clock cycle, as shown by the red launch edges.

In such a scenario, we need to provide the timing tool with an exception or override that tells it to postpone its default setup check by one clock cycle.

In other words, we need to ask the timing tool to allow one additional clock cycle for the setup check. Usually this is achieved by something like the following.

set_multicycle_path 2 -setup -from <launch flop> -to <capture flop>

Here ‘2’ is the clock cycle count. It instructs the timing tool to use 2 clock cycles, instead of the default 1, for the setup check on the specified paths.
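The effect of moving the setup check out by one cycle can be checked with simple arithmetic. The sketch below assumes an ideal clock with no skew; the function setup_slack and all numbers are hypothetical, chosen so that the data arrives about one and a half cycles after launch, as in the description above.

```python
def setup_slack(period, data_arrival, setup, cycles=1):
    """Setup slack when the capture edge is `cycles` periods after the launch edge."""
    required = cycles * period - setup
    return required - data_arrival

period, setup = 10.0, 0.5   # hypothetical values, in ns
arrival = 15.0              # clock-to-out plus combinational delay, ~1.5 cycles

default_slack = setup_slack(period, arrival, setup)        # default single-cycle check
mcp_slack = setup_slack(period, arrival, setup, cycles=2)  # check moved to the second edge
print(f"single cycle: {default_slack:+.1f} ns, two cycles: {mcp_slack:+.1f} ns")
```

With the default check the path appears to fail by several nanoseconds, while the two-cycle check passes comfortably, which is the false violation the exception removes.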

The post Multicycle Path appeared first on Learn VLSI.

Lockup latch to avoid Hold violation

learnvlsiadmin — Tue, 27 Aug 2024 13:16:05 +0000

How does lockup latch help with avoiding hold violations?

If you understand hold time check very well, or if you have been analyzing the
waveforms for hold time check, you will realize that hold time issues start happening
as soon as launch and capture clock edge align with each other or are very close to
each other.

We know that the further apart the launch and capture edges are, with the launch
edge later than the capture edge, the less of a hold time concern there is.

We know that when launch and capture clock are from the same source and have same
waveform, the greatest distance between an edge in launch clock and an edge in
capture clock can not be greater than clock phase.

If you try to separate them further, the launch edge gets closer to the neighboring capture edge on the other side.
If the falling edge of the clock is the launch edge and the rising edge is the capture edge,
the launch and capture edges are a phase apart, and as long as the launch
edge happens after the capture edge, we have a phase worth of margin for the hold
check.

The same is true for the case where the falling clock edge is the capture edge and the rising
edge is the launch edge. The key is that they are a clock phase apart and launch happens
later than capture.

This is exactly what a lockup latch achieves. It changes the launch edge from rising to
falling, while the capture edge remains rising.

So, we get launch and capture edges to be farthest apart (a clock phase), giving us the best possible hold time protection.

Also, launch happens later than capture, which is what we want. Let’s take a look at the figure below
to better understand this.

Here we are assuming launch and capture flops to be rising edge triggered. As shown
in the figure, before the lockup latch is added it’s a simple setup and hold check.

The issue is with this type of hold check (also called a race, because the launch and capture edges are the same edge).
It can be very difficult and expensive to fix this type of hold violation: if the launch and capture clocks share a common point only far upstream, there can be quite a large
clock uncertainty.

This is very typical for scan or test clocks where last flop in one scan chain is in a specific clock domain and first cell of next scan chain is in a different clock domain. There could be large hold violations for such paths.

The low-phase latch launches data at the falling edge of the clock and remains transparent
during the low phase. Essentially, by introducing the lockup latch we moved the launch edge
from rising to falling. Now our launch and capture edges are a clock phase apart, and
we have a clock phase worth of margin (slack) to meet the hold time requirement.

As shown in the figure, there can be a large uncertainty between testclock_a and
testclock_b. If you recall the hold margin equation, the larger the clock uncertainty,
the larger the negative slack that will have to be fixed.

In such situations the lockup latch is introduced between the two chains to address the
hold violation.
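The half-cycle of margin gained from the lockup latch can be put in numbers. This is a minimal sketch under simplifying assumptions: the latch's own delay is folded into the clock-to-out term, the uncertainty between the two test clocks is a single lumped number, and all values and names are hypothetical.

```python
def hold_slack(launch_edge, capture_edge, clk_to_q, data_delay, hold, uncertainty):
    """Hold slack: earliest data arrival minus the time until which data must stay stable."""
    arrival = launch_edge + clk_to_q + data_delay
    required = capture_edge + hold + uncertainty
    return arrival - required

period = 10.0  # hypothetical clock period, in ns
common = dict(capture_edge=0.0, clk_to_q=0.2, data_delay=0.1, hold=0.1, uncertainty=1.0)

without_latch = hold_slack(launch_edge=0.0, **common)       # same-edge race
with_latch = hold_slack(launch_edge=period / 2, **common)   # launch moved to the falling edge
print(f"without latch: {without_latch:+.1f} ns, with latch: {with_latch:+.1f} ns")
```

The difference between the two results is exactly half a period, i.e. the clock phase of margin described above.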

One has to realize that the lockup latch doesn’t come completely free. Because it changes
the launch edge from rising to falling, we are shortening our setup (max) timing path
from the original launch flop to the capture flop from a full clock cycle to half a clock
cycle (a clock phase).

Normally you would think of adding a lockup latch only if you had
hold issues to begin with, which usually means there was no setup problem to begin with:
if you had hold problems, there wasn’t much path delay from the
launch flop to the capture flop.

Does the location of the lockup latch matter? What if in the previous example you moved the lockup latch from near the launch flop to near the capture flop?

The location of the lockup latch very much matters. When you introduce a lockup latch between two flops, you are essentially breaking the timing path into two segments.

One path runs from the original launch flop to the lockup latch, and the other from the
lockup latch to the original capture flop.

There is a reason why we didn’t bother about the timing path from the launch flop to
the lockup latch. The original launch flop launches data at the rising edge of the clock, and the low-phase
lockup latch closes (captures data) at the rising edge of the clock as well.

This could be a hold time issue, but it really is not, because we clocked the lockup latch with the same
clock that clocks the launch flop. In fact it is essential that we do this and place the low-phase
lockup latch right next to the launch flop.

Doing so ensures that there is no hold time issue from the launch flop to the lockup latch: essentially the same clock net drives both, hence there cannot be a data race from the launch flop
to the lockup latch.

As shown in this figure, there is a hold check that is supposed to happen from the
launch flop to the lockup latch, but it really is not an issue because of the same clock
edge first launching data and then capturing the data.

Many timing tools understand this configuration and might not report this hold check, and even if timing tool reports this hold check, it should pass.

You can see that once the lockup latch is moved close to the capture flop, the hold
violation from the launch flop to the lockup latch becomes the real issue, as the two clocks
are now different and could come from different domains, as we saw in the test clock
example. The lockup latch then serves no purpose in fixing the hold violation.
Hence it is vital to place the lockup latch at the correct location with the correct clock.

The post Lockup latch to avoid Hold violation appeared first on Learn VLSI.

Hold Failure to a Flipflop

learnvlsiadmin — Tue, 27 Aug 2024 12:49:02 +0000

What is Hold time?

As we saw in the previous question about setup time, for any sequential element, e.g. a latch
or flip-flop, data needs to be held stable when the clock capture edge is active.

More precisely, data needs to be held stable for a certain time after the clock capture edge,
because if data changes near the capture edge, the sequential element can go
into a metastable state and can capture the wrong value at the output.

The time for which data needs to be held stable after the clock capture edge
is called the hold time requirement of that sequential element.

Hold Time Failure to a Flipflop

Like setup, there is a ‘Hold’ requirement for each sequential element (flop or a latch).
That requirement dictates that after the assertion of the active/capturing edge of the
sequential element input data needs to be stable for a certain time/window.

If input data changes within this hold window, the output of the sequential element could go metastable or could capture unintended input data. Therefore, it is crucial that input data be held until the hold requirement is met for the sequential element in question.

In our figure below, data at input pin ‘In’ of the first flop is meeting setup and is
correctly captured by first flop. Output of first flop ‘FF1_out’ happens to be inverted
version of input ‘In’.

As you can see once the active edge of the clock for the first flop happens, which is
rising edge here, after a certain clock to out delay output FF1_out falls. Now for sake
of our understanding assume that combinational delay from FF1_out to FF2_in is very
very small and signal goes blazing fast from FF1_out to FF2_in as shown in the figure
below.

In real life this could happen for several reasons. It could happen by design
(imagine no device between the first and second flop and just a short wire, or even
both flops abutting each other). It could be because of device variation, where you
end up with very fast devices along the signal path. There could also be
capacitive coupling with adjacent wires that favors the transition along the
FF1_out to FF2_in path (a node adjacent to FF2_in might be transitioning from high to low
with a sharp slew rate, which couples favorably with FF2_in going down and
speeds up the FF2_in fall delay).

In short in reality there are several reasons for device delay to speed up along the
signal propagation path. Now what ends up happening because of fast data is that
FF2_in transitions within the hold time requirement window of flop clocked by clk2
and essentially violates the hold requirement for clk2 flop.

This causes the falling transition of FF2_in to be captured in the first clk2 cycle, whereas
the design intention was to capture the falling transition of FF2_in in the second cycle of clk2.

In a normal synchronous design, where you have a series of flip-flops clocked by a grid
clock (the clock shown in the figure below), the intention is as follows. In the first clock cycle of clk1 &
clk2, FF1_out transitions, and there is enough delay from FF1_out to FF2_in
that the hold requirement for the first clock cycle of clk2 is met
at the second flop. FF2_in then meets setup before the second clock cycle of clk2, and
when the second clock cycle starts, the original transition of
FF1_out is propagated to Out at the active edge of clk2.

Now if you notice, there is skew between clk1 and clk2: the skew makes the clk2 edge
come later than the clk1 edge (ideally we expect clk1 & clk2 to be aligned perfectly,
but that’s only ideally!).

In our example this exacerbates the hold issue. If both clocks
were perfectly aligned, the FF2_in fall would have happened later relative to the clk2 edge and would have met
the hold requirement for the clk2 flop, and we wouldn’t have captured wrong data!

If hold violations exist in the design, is it OK to sign off?

You cannot sign off the design if you have hold violations, because hold violations
are functional failures. Setup violations are frequency dependent.

You can reduce the frequency to avoid setup failures. Hold violations, which stem from a race on the same clock
edge, are frequency independent and are functional failures, because you can end
up capturing unintended data, thus putting your state machine in an unknown state.
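A small numeric sketch shows both points: skew that delays the capture clock eats into hold margin, and the clock period never enters the hold check. The function name, sign convention and values are assumptions chosen for illustration.

```python
def hold_slack(clk_to_q, comb_delay, hold, skew):
    """Same-edge hold check; skew > 0 means clk2 arrives later than clk1.

    There is no clock-period argument: the result does not depend on frequency.
    """
    return (clk_to_q + comb_delay) - (hold + skew)

# Hypothetical values in ns for a very fast FF1_out -> FF2_in path.
aligned = hold_slack(clk_to_q=0.15, comb_delay=0.05, hold=0.1, skew=0.0)
skewed = hold_slack(clk_to_q=0.15, comb_delay=0.05, hold=0.1, skew=0.25)
print(f"aligned clocks: {aligned:+.2f} ns, clk2 late by 0.25 ns: {skewed:+.2f} ns")
```

With aligned clocks the fast path still meets hold, while the late clk2 turns the same path into a violation that no change of clock frequency can repair.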

The post Hold Failure to a Flipflop appeared first on Learn VLSI.

Setup Failure of a Flipflop

learnvlsiadmin — Tue, 27 Aug 2024 12:38:42 +0000

What is setup time?

For any sequential element, e.g. a latch or flip-flop, input data needs to be stable when
the clock capture edge is active.

More precisely, data needs to be stable for a certain time before the clock capture edge, because if data changes near the capture edge, the sequential element (latch or flip-flop) can go into a metastable state. It can take an unpredictable amount of time to resolve the metastability, and the element can settle at a state that differs from the input value, thus capturing an unintended value at the
output.

The time for which input data needs to be stable before the clock capture edge
is called the setup time of that sequential element.

Timing Propagation from one Flipflop to another Flipflop

Following is a simple structure where the output of a flop goes through some stages of
combinational logic, represented by the pink bubble, and is eventually sampled by the
receiving flop.

Receiving flop, which samples the FF2_in data, poses timing requirements on the input data signal.
The logic between FF1_out to FF2_in should be such that signal transitions could
propagate through this logic fast enough to be captured by the receiving flop.

For a flop to correctly capture input data, the input data to flop has to arrive and become
stable for some period of time before the capture clock edge at the flop.
This requirement is called the setup time of the flop.

Usually, you’ll run into setup time issues when there is too much logic between two flops, i.e. the combinational delay is too large. Hence this is sometimes called a max delay or slow delay timing issue, and the constraint is called a max delay constraint.

In figure there is max delay constraint on FF2_in input at receiving flop. Now you can
realize that max delay or slow delay constraint is frequency dependent.

If you are failing setup to a flop and you slow down the clock frequency, your clock cycle time
increases; hence you have more time for your slow signal transitions to propagate
through, and you’ll now meet setup requirements.

Typically, your digital circuit is run at a certain frequency, which sets your max delay constraints. The amount of time by which the signal meets or misses the setup requirement is called the setup
(or max) slack or margin; it is negative when the signal falls short.
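The frequency dependence of the setup check can be sketched as follows, assuming an ideal clock without skew. The functions setup_slack and min_period and the numbers are hypothetical, used only to illustrate the arithmetic.

```python
def setup_slack(period, clk_to_q, comb_delay, setup):
    """Positive slack: data arrives early enough. Negative slack: setup violation."""
    return (period - setup) - (clk_to_q + comb_delay)

def min_period(clk_to_q, comb_delay, setup):
    """Smallest clock period at which the setup check just passes (slack = 0)."""
    return clk_to_q + comb_delay + setup

# Hypothetical values in ns.
timing = dict(clk_to_q=0.3, comb_delay=4.5, setup=0.2)
fast = setup_slack(period=4.0, **timing)   # 250 MHz clock
slow = setup_slack(period=6.0, **timing)   # slower clock
print(f"4 ns period: {fast:+.1f} ns, 6 ns period: {slow:+.1f} ns, "
      f"minimum period: {min_period(**timing):.1f} ns")
```

The same path fails at the shorter period and passes at the longer one, and the minimum period is simply the sum of the three delay terms.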

Setup time failure of a Flipflop

The following figure visually describes a setup failure. As you can see, the first flop
releases the data at the active edge of the clock, which happens to be the rising edge.
FF1_out falls some time after clk1 rises.

The delay from the clock rising to the data changing at output pin is commonly
referred to as clock to out delay. There is finite delay from FF1_out to FF2_in through
some combinational logic for the signal to travel.

After this delay signal arrives at second flop and FF2_in falls. Because of large delay
from FF1_out to FF2_in, FF2_in falls after the setup requirement of second flop,
indicated by the orange/red vertical dotted line.

This means the input signal to the second flop, FF2_in, is not stable for the setup time required by the flop; hence the flop goes metastable and doesn’t correctly capture this data at its output.

As you can see one would’ve expected ‘Out’ node to go low, but it doesn’t because of
setup time or max delay failure at the input of the second flop.

Setup time requirement dictates that input signal be steady during the setup window ( which is a certain time before the clock capture edge ).
As mentioned earlier if we reduce frequency, our cycle time increases and eventually
FF2_in will be able to make it in time and there will not be a setup failure. Also notice
that a clock skew is observed at the second flop.

The clock to second flop clk2 is not aligned with clk1 anymore and it arrives earlier, which exacerbates the setup failure.

This is a real-world situation: the clock will not arrive at all receivers at the same time,
and the designer has to account for clock skew. We’ll talk separately about clock
skew in detail.

The post Setup Failure of a Flipflop appeared first on Learn VLSI.

Launch edge and Capture edge

learnvlsiadmin — Tue, 27 Aug 2024 12:28:52 +0000

What is a Launch edge?

In synchronous design, certain activity or certain amount of computation is done
within a clock cycle.

Memory elements like flip-flop and latches are used in
synchronous designs to hold the input values stable during the clock cycle while the
computations are being performed.

The beginning of the clock cycle initiates the activity, and by the end of the clock cycle
the activity has to be completed and the results have to be ready.

Memory elements in a design transfer data from input to output on either rising or the falling edge of the
clock. This edge is called the active edge of the clock.

During the clock cycle, data propagates from output of one memory element, through
the combinational logic to the input of second memory element.

The data has to meet a certain arrival time requirement at the input of the second memory element.

As shown in the above figure, the active edge of the clock (shown in red) at the first
memory element makes new data available at the output of the memory element and
starts the data propagating through the logic.

Input ‘in’ has risen to one before the first active(rising) edge of the clock, but this value of ‘in’ is transferred to Q1 pin only when clock rises.

This active edge of the clock is called the launch edge, because it
launches the data at the output of first memory element, which eventually has to be
captured by next memory element along the data propagation path.

What is Capture edge?

As we discussed in previous question, the way synchronous circuits work, certain
amount of computation has to be done within a clock cycle.

At the launch edge of the clock, the launching memory elements transfer a fresh set of data to their output pins. This new data ripples through the combinational logic that carries
out the stipulated computation.

By the end of the clock cycle, the newly computed data has to be available at the next set of
memory elements, because the next active clock edge, which signifies the end of one
clock cycle, captures the computed results at the D2 pin of the memory element and
transfers them to the Q2 pin for the subsequent clock cycle.

This next active edge of the clock, shown in blue in figure 1, is called the capture edge, as it
captures the results at the end of the clock cycle.
There are some caveats to be aware of. The data at D2 has to arrive a certain time before
the capture edge of the clock in order to be captured properly. This is called the setup time
requirement, which we will discuss later.

Although it is said that computation has to be done within one clock cycle, it is not
always the case. In general, it is true that computation has to be done within one clock
cycle, but many times, computation can take more than one cycle. When this happens
we call it a multi cycle path.

The post Launch edge and Capture edge appeared first on Learn VLSI.