MediaTek Interview QA

1. Why are we not checking the hold before CTS?

Before CTS, the clock is ideal, which means the actual skew is not yet modeled: the clock is assumed to reach all the flops at the same time. So we do not have the skew and transition numbers of the clock path, but the available information is sufficient to perform setup analysis, since setup violations depend mainly on the data-path delay. The clock is propagated only after CTS (the actual clock tree is built; clock buffers, the clock tree hierarchy, clock skew, and insertion delay come into the picture), and that is why hold violations are fixed only after CTS.
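The dependence can be sketched numerically. The Python below is an illustrative sketch with made-up delay numbers, not tool output: with an ideal clock (skew = 0), the setup check already has everything it needs, while the hold check only becomes meaningful once CTS produces real skew.

```python
# Illustrative sketch (not tool output): why setup can be analyzed with an
# ideal clock, while hold needs the real clock tree. All numbers in ns are
# made up for the example.

def setup_slack(clock_period, data_path_delay, setup_time, skew=0.0):
    # Setup check: data launched at edge N must arrive before edge N+1,
    # minus the setup time. skew = capture latency - launch latency.
    return (clock_period + skew) - (data_path_delay + setup_time)

def hold_slack(data_path_delay, hold_time, skew=0.0):
    # Hold check: data launched at edge N must not arrive before the same
    # capture edge plus the hold time.
    return data_path_delay - (hold_time + skew)

# Pre-CTS the clock is ideal (skew = 0), so the setup result is already
# dominated by the data-path delay:
print(setup_slack(clock_period=2.0, data_path_delay=1.6, setup_time=0.1))

# The same short path passes hold with an ideal clock but fails once CTS
# introduces real skew -- which is why hold is fixed only after CTS:
print(hold_slack(data_path_delay=0.2, hold_time=0.05, skew=0.0))
print(hold_slack(data_path_delay=0.2, hold_time=0.05, skew=0.25))
```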

2. Can both Setup and Hold violations occur in same start and end points?

Yes, if the start and end points are connected by different combinational paths: a long max-delay path can cause a setup violation while a short min-delay path between the same flops causes a hold violation.

3. What is the derate value that can be used?

  • For setup check derate data path by 8% to 15%, no derate in the clock path.
  • For hold check derate clock path by 8% to 15%, no derate in the data path.
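As a minimal sketch, applying a flat derate is just scaling a path delay. The 8% to 15% range comes from the answer above; the delay values below are hypothetical.

```python
# Minimal sketch of a flat OCV derate as plain scaling. The 8-15% range is
# from the answer above; the delay values are hypothetical.

def apply_derate(delay_ns, derate_pct):
    """Scale a path delay by derate_pct percent (positive = slower)."""
    return delay_ns * (1 + derate_pct / 100.0)

data_path = 1.50   # ns, hypothetical data-path delay
clock_path = 0.80  # ns, hypothetical clock-path delay

# Setup check: derate the data path late (slower); clock path untouched.
setup_data_delay = apply_derate(data_path, 10)

# Hold check: derate the clock path; data path untouched.
hold_clock_delay = apply_derate(clock_path, 10)

print(setup_data_delay, hold_clock_delay)
```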

4. What are the corners you check for timing sign-off? Are there any changes in the derate value for each corner?

  • Corners: Worst, Best, Typical.
  • Same derating value for best and worst. For typical it can be less.

5. Where do you get the WLM’s? Do you create WLM’s? How do you specify?

  • Wire Load Models (WLMs) are available from the library vendors.
  • We don't create WLMs.
  • WLMs are selected depending on the design area.

6. Where do you get the derating value? What are the factors that decide the derating factor?

  • The derating value is decided based on guidelines and suggestions from the library vendor and on previous design experience.
  • PVT variation is the factor that decides the derating factor.

7. Setup Fixes during placement and Setup and hold fixes during CTS?

SETUP FIXES

During Placement Stage:

  1. Timing path groups: We can use this option to resolve setup timing during the placement stage. It groups a set of paths or endpoints for cost-function calculations. The delay cost function is the sum over all groups of (weight * violation), where violation is the amount by which setup was violated for the paths within the group. If there is no violation within a group, its cost is zero. Groups enable you to specify a set of paths to optimize even though there might be larger violations in another group. When endpoints are specified, all paths leading to those endpoints are grouped.
    ICC Syntax:
    group_path [-weight weight_value] [-critical_range range_value] -name group_name [-from from_list] [-through through_list] [-to to_list]
    Eg: group_path -name "group1" -weight 2.0 -to {CLK1A CLK1B}
  2. Create Bounds: We can constrain the placement of relative placement cells by defining move bounds with fixed coordinates. Both soft bounds and hard bounds are supported for relative placement cells, and both rectangular bounds and rectilinear bounds are supported. To constrain relative placement by using move bounds, use the create_bounds command.
    ICC Command:
    create_bounds -coordinate {100 100 200 200} "U1 U2 U3 U4" -name bound1
  3. If the design has timing violations, we can rerun place_opt with the -timing and -effort high options.
    ICC Command: place_opt -timing -effort high
    Timing driven placement tries to place cells along timing critical paths close together to reduce net RCs and meet setup timing.
  4. Change the floorplan (macro placement, macro spacing and pin orientation) to get better timing.

During CTS Stage

  1. Increase the drive strength of data-path logic gates: A cell with better drive strength can charge the load capacitance quickly, resulting in less propagation delay. The output transition also improves, resulting in better delay in the succeeding stages. A gate with better drive strength has lower resistance, effectively lowering the RC time constant and hence providing less delay. For example, if an AND gate of drive strength 'X' has a pull-down resistance equivalent to 'R', the one with drive strength '2X' will have R/2 resistance. Thus, a bigger AND gate with better drive strength will have less delay.
  2. Use data-path cells with lower threshold voltages: HVT swap, i.e., change HVT cells into SVT/RVT or LVT. A lower Vt decreases the transition time, so the propagation delay decreases. Replacing HVT with RVT or LVT cells will therefore speed up the path.
  3. Buffer insertion: If a net is long, we insert a buffer to boost the signal. It decreases the transition time, which decreases the wire delay. If the reduction in wire delay (due to the improved transition) is greater than the cell delay of the added buffer, the overall delay decreases.
  4. Reduce the amount of buffering in the path: This reduces the cell delay but increases the wire delay. So, if the cell delay removed is more than the wire delay added, the effective stage delay decreases.
  5. Route the net using Higher metal layers
  6. Replace a buffer with 2 inverters: Two inverters placed apart along the net give better transition times than a single buffer, so the RC delay of the wire decreases, while the cell delay of one buffer is roughly equal to the cell delay of two inverters.
  7. Play with clock skew: Positive skew helps improve the setup slack. So, to fix a setup violation, we may either increase the clock latency of the capturing flip-flop or decrease the clock latency of the launching flip-flop. However, in doing so, we need to be careful about the setup and hold slack of the other timing paths formed from/to these flip-flops. This is called useful skew: intentionally adding delay in the clock path in order to meet timing.
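The useful-skew trade in point 7 can be sketched with hypothetical numbers: adding latency on the capture clock branch turns a setup violation into positive slack.

```python
# Hypothetical numbers (ns) showing the useful-skew trade: adding latency on
# the capture clock branch turns a setup violation into positive slack.

def setup_slack(period, launch_latency, capture_latency, data_delay, tsetup):
    # Data must arrive at the capture flop before the next capture edge:
    # launch_latency + data_delay + tsetup <= period + capture_latency
    return (period + capture_latency) - (launch_latency + data_delay + tsetup)

# A failing path with balanced clock latencies:
before = setup_slack(period=2.0, launch_latency=0.5, capture_latency=0.5,
                     data_delay=2.1, tsetup=0.1)   # negative: violated

# Intentionally add 0.3 ns of delay on the capture clock (useful skew):
after = setup_slack(period=2.0, launch_latency=0.5, capture_latency=0.8,
                    data_delay=2.1, tsetup=0.1)    # positive: met

print(before, after)
# Caveat, as noted above: the extra capture latency eats into the hold margin
# of paths launched from that same flop, so those paths must be re-checked.
```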

Ways to fix Hold Violation

  1. Decrease the drive strength of data-path logic gates (downsize cells to add delay)
  2. Use data-path cells with higher threshold voltages (LVT to RVT/HVT swap)
  3. Buffer or delay-cell insertion in the data path (or removal of excess buffering in the clock path)
  4. Detour the route or use higher-resistance routing to add wire delay
  5. Increase the clk->q delay of the launching flip-flop
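As a numeric sketch (hypothetical values), a hold violation caused by clock skew is fixed by adding delay in the data path, e.g. an inserted buffer:

```python
# Hypothetical numbers (ns): a short path that fails hold because of clock
# skew, fixed by inserting delay (e.g. a buffer) in the data path.

def hold_slack(data_delay, thold, skew):
    # skew = capture clock latency - launch clock latency
    return data_delay - thold - skew

violating = hold_slack(data_delay=0.10, thold=0.05, skew=0.15)  # negative

buffer_delay = 0.20  # assumed delay of the inserted buffer/delay cell
fixed = hold_slack(data_delay=0.10 + buffer_delay, thold=0.05, skew=0.15)

print(violating, fixed)
```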

8. Why don’t you derate the clock path by -10% for worst corner analysis?

We can, but it may not be as accurate as the data-path derate.

9. What are the importance and need of an MMMC file in VLSI physical design?

  • A Multi-Mode Multi-Corner (MMMC) file in physical design enables analysis of the design over varied modes and corners.
  • A VLSI design can operate in different modes (functional, test, etc.), with each mode analyzed at varied process corners.
  • We need to ensure that the design is stable across all corners, specifically the PVT corners (Process, Voltage & Temperature).
  • During the physical design flow (in the prescribed tools from Cadence, Synopsys, etc.), the MMMC file supplies all the relevant details for obtaining the desired design.

10. What are Timing DRVs? Explain the causes and fixes.

Timing DRVs:

  • Max Tran
  • Max Cap
  • Max Fanout

Causes:

  1. HVT cells give slower transitions: HVT cells have larger threshold voltages compared to LVT and RVT cells. Hence, they take more time to turn ON, resulting in larger transition times.
  2. Weak driver: The driver is not able to drive the load, resulting in a bad transition at the driven cell. Thus, the delay increases.
  3. Load is more: A driving cell cannot drive a load larger than what it is characterized for. This is set in the .lib using the max cap value. If the load that a cell sees increases beyond its maximum capacitance value, it causes a bad transition and hence increased delay.
  4. Net length is large: The larger the net length, the larger the resistance and the worse the transition, which results in a trans violation. The RC value of a long net also increases the load seen by a cell, causing max cap violations as well.
  5. Fanout is too large: If the fanout increases beyond what the driver cell is characterized for, it causes max fanout violations. The increased load results in a max cap violation, which indirectly causes a max tran violation as well.

Fixes:

  • Replace HVT cells with LVT cells.
  • Upsize the driver.
  • Split long nets by buffering: the longer the net, the larger the resistance; putting a buffer at the middle of a long net splits the resistance in half.
  • Reduce the load by reducing the fanout (load splitting by buffering or cloning) or by downsizing the driven cell.

11. Why do we emphasize on setup violation before CTS and hold violation after CTS?

The setup check of a valid timing path compares the maximum data-network computation time against the clock edge arrival time at the sink. Until the post-CTS stage, we assume all clocks are ideal networks that reach every possible clock sink of the chip in zero time.

What we need to focus on is implementing the data path in such a way that it takes no more than one clock period from start point to end point (assuming a full-cycle valid timing path). Of the two components of the setup check, one is always a constant (the clock period) and the other is the data-path delay, which we have every option to play with until the CTS stage completes.

If we can't meet this stretch goal before CTS, we are going to have a hard time closing timing later. Hence, until the CTS stage, we focus on the data-path synthesis and the data-network physical implementation alone.
I hope it is clear why we focus on setup timing before CTS stage.

Let’s see in the other view, why don’t we just focus on hold time ?

The hold check of a path compares the minimum data-path delay against the clock edge time. Since an ideal clock reaches every sink of the chip in zero time, the minimum data-path delay is almost always greater than the hold requirement of a flop or timing-path endpoint.

So, unless there is going to be a change in the clock-path network delay, there is no point in analyzing the hold timing of a valid path, right? (But at a bare minimum, one can review the gross hold timing paths just to see whether a path is a false path or a multicycle path.)

12. What should we do if there is a setup violation after placements even though we completed the optimization?

Setup violation after placement is nothing to worry about. Well, unless it comes from a really bad placement of modules. Have a look at the macro placement and the module placement and see if something looks bad. For example, if there’s a module for instruction fetch and it’s getting split and placed in two or three different clusters then we may want to attack this with module placement guides or bounds.

Let the tool have the right constraints at place stage, and maybe give another round with timing effort flag marked as high.

Let more rounds of optimization happen in the CTS and routing stages. Each of these will revisit the issue and make some improvements. I have seen bad slacks like -500 ps and 30000-plus failing paths, but these are aggressively dealt with by the timing team in STA (using techniques like upsizing, fixing max cap, max fanout and max transition violations, and putting in LVT cells).

Additional note: the routing engine and timing engine used at the place stage are not signoff quality, and do not come anywhere close to what a tool like Tempus or PrimeTime can evaluate.

13. What is meant by insertion delay in VLSI physical design?

  • The insertion delay concept comes into the picture in clock tree synthesis.
  • While building the clock tree, CTS starts building the clock from the clock source to the sinks.
  • Once the clock tree is built, the clock signal has to travel from the source to the sinks. The amount of time taken by the clock signal to travel from the source to the sinks is called the insertion delay.

Ex:
The clock source is at point A, so the clock tree is built starting from point A, and it has to reach the sinks (flops) at points B, C and D. So the clock signal has to travel from point A to points B, C and D. In between, CTS builds some logic to balance all three sinks, because the signal has to reach the three sinks B, C and D at the same time; this is called skew balancing (the main aim of CTS).

The amount of time taken by clock signal from point A to B C D is called insertion delay.

You can refer to LATENCY concepts for more in depth information.
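The A to B/C/D example above can be sketched with hypothetical per-sink latencies: the longest source-to-sink delay is reported as the insertion delay, and the spread between sinks is the skew that CTS balances toward zero.

```python
# Sketch of the A -> B/C/D example with hypothetical per-sink clock latencies
# (ns). The longest source-to-sink delay is the insertion delay; the spread
# between sinks is the skew that CTS balances toward zero.

latency = {"B": 1.20, "C": 1.25, "D": 1.22}  # source A to each sink flop

insertion_delay = max(latency.values())
skew = max(latency.values()) - min(latency.values())

print(insertion_delay, skew)
```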

14. Why don’t we do routing before CTS in VLSI Physical Design?

  • Routing should be done once the design is at a stage where all of the data and clock logical nets are balanced and synthesized properly. Laying down the actual metal routing requires all of the design objects (cells) to be placed at legal sites, which happens post-placement. But that does not mean the design is ready for routing; you should also consider the high-fanout nets and clock network signals after placement. Until this stage, clocks are ideal networks (assumed to drive any number of loads without any buffering).
  • During logic synthesis we do not balance high-fanout nets and clock nets, so a single clock port might be driving thousands of flops (with a virtual route, even after placement). CTS is the stage where this kind of loading is synthesized into a balanced tree to achieve minimum skew and latency for all sinks (flops).
  • Until you finish the logical synthesis of the clocks, you should not route anything. As soon as you finish CTS, you can start routing the design: clocks first, followed by data signals.

15. What is a path group in VLSI, and why is it done?

  • As the name indicates, it is a group of paths.
  • The reason why paths are grouped is to guide the efforts of the synthesis engine.
  • for e.g. Let us assume that you start with all paths in a single path group.
  • In this case the synthesis engine will spend most of its time optimizing the logic of the worst-case violators, and once it meets timing it will move on to the next worst-case violator, and so on.
  • Now, looking at the initial timing report, you might have identified:
  • Paths that need an architectural change (e.g., a cascade of adders/multipliers to be replaced by pipelined logic); you do not want the synthesis engine to spend much time optimizing this logic, so make it a separate path group with lower priority.
  • Low-violation paths that did not get optimized because all the effort was spent on high-violation paths. Make separate path groups of these two sets.

16. What is the benefit of having separate path groups for I/O logic paths in VLSI?

  • Path groups form the foundation of the optimization cost function in synthesis and PnR tools. More realistic path groups make it easier for the tool to reach an optimum in all respects.
  • Most of the time, our I/O constraints are budgeted rather than actual, and they might not be clean from a clock-domain perspective. So they might impact the QoR if they are kept in the same group as internal paths. Also, the tool works on the most critical path and tries to optimize paths within a certain range of it, called the critical range. If an I/O path comes up as the most critical path, the tool might not work on the internal paths, giving a sub-optimal design.
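A hedged SDC/Tcl sketch of such grouping (group names and weights are illustrative assumptions, not from the source): keeping budgeted I/O paths out of the internal path group stops the critical-range optimization from being dominated by an I/O path.

```tcl
# Illustrative grouping: names and weights are assumptions, not from the text.
group_path -name INPUTS  -from [all_inputs]
group_path -name OUTPUTS -to [all_outputs]
group_path -name REG2REG -from [all_registers] -to [all_registers] -weight 2.0
```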

17. While fixing timing, how do I find a false path in VLSI design?

False path is a very common term used in STA. It refers to a timing path that is not required to be optimized for timing, as it never needs to be captured within a limited time in the normal working situation of the chip. In the normal scenario, a signal launched from a flip-flop has to be captured at another flip-flop in only one clock cycle. However, there are certain scenarios where it does not matter at what time the signal originating from the transmitting flop arrives at the receiving flop. The timing path in such scenarios is labeled a false path and is not optimized for timing by the optimization tool.
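Typical candidates are asynchronous clock-domain crossings and quasi-static configuration registers. A hedged SDC sketch (the clock and cell names are hypothetical):

```tcl
# Paths crossing asynchronous clock domains are never expected to be
# captured in one cycle, so they are declared false paths.
set_false_path -from [get_clocks clk_a] -to [get_clocks clk_b]
# A quasi-static configuration register that is written once and then stable:
set_false_path -from [get_cells cfg_mode_reg*]
```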

18. What makes meeting timing on clock gating paths very challenging? What makes it more critical than a regular setup/hold flop to flop timing path?

  • While building the clock tree, we try to balance all the flops. This makes the clock at a clock gate (CG) driving a bunch of flops arrive early in the clock tree, by the delay of the CG itself. The time available to meet setup at the clock-gating latch therefore becomes the clock period minus that delay, making it tighter to meet.
  • Moreover, if the fanout of the CG is more than its drive capability, a small buffer tree (or maybe 2 parallel buffers) is added after it, making the clock arrival at the CG even earlier and hence making setup even more difficult to meet.

19. What is the difference between a static IR drop and a dynamic IR drop analysis?

Static IR drop is the voltage drop when a constant (average) current is drawn through the resistive power network; it occurs when the circuit is in steady state. Dynamic IR drop is the drop when a high transient current is drawn from the power network due to heavy simultaneous switching of cells. To reduce static IR drop, increase the width of the power network or design a more robust power grid; to reduce dynamic IR drop, reduce the toggle rate or place decap cells near high-switching cells.
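A toy Python sketch of the steady-state picture (all values are assumed): the drop across each rail segment is the current through it times its resistance, accumulated toward the farthest cell, which is why widening straps (lowering R) reduces static IR drop.

```python
# Toy steady-state sketch (all values assumed): on a single power rail, the
# drop across each segment is the current through that segment times its
# resistance, accumulated toward the farthest cell.

rail_r_per_segment = 0.05                    # ohms per segment, assumed
tap_currents = [0.010, 0.008, 0.012, 0.009]  # amps drawn at each tap, assumed

drops = []                        # cumulative drop seen at each tap
drop = 0.0
remaining = sum(tap_currents)     # current entering the rail
for i in tap_currents:
    drop += remaining * rail_r_per_segment   # V = I * R for this segment
    drops.append(drop)
    remaining -= i                           # current peels off at each tap

print(drops[-1])  # the farthest cell sees the worst static IR drop
```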

20. What is the need of Static IR drop analysis?

IR drop is the voltage drop in the metal wires of the power grid before the supply reaches the VDD pins of the standard cells. Due to IR drop, the effective VDD at a cell is reduced, and the resulting change in cell delays can cause timing issues.

21. Explain Clock Tree Options for building better Clock Tree?

Five special clock options are available to address clock balancing. They greatly expand your ability to control clock tree building.

Clock phase:

  • A clock phase is a timer event that is associated with a particular edge of the source clock.
  • Each clock domain is created with two clock phases: the rising edge and the falling edge.
  • The clock phases are named after the timing clock, with R or F to denote the rising or falling clock phase.
  • These phases propagate through the circuit to the endpoints, so that events at the clock pins can be traced to events driven by the defined clocks.
  • Because the tool is capable of propagating multiple clocks through a circuit, any clock pin can have two or more clock phases associated with it.
  • For example, if CLKA and CLKB are connected to the i0 and i1 inputs of a 2:1 MUX, all clock pins in the fan-out of this MUX have four clock phases associated with them: CLKA:R, CLKA:F, CLKB:R, and CLKB:F. (This assumes that you allow the propagation of multiple clock phases.)

Skew phase:

  • A skew phase is a collection of clock phases.
  • Each clock phase is placed into the skew phase of the same name.
  • When a clock is defined, skew phases are also automatically created. They are created with the same names as the clock phases that are created.

Skew group

  • Clock tree skew balancing is done on a per-skew group basis.
  • A skew group is a subdivision of a clock phase.
  • Normally, all pins in a clock phase are in group 0 and are balanced as a group.
  • If you have created a set of pins labeled as group 1, for example, then the skew phase containing these pins will be broken into two skew groups: one containing the user-specified group, and one containing the "normal" clock pins.
  • This feature is useful if we want to segregate certain sets of clock pins and not balance them with the default group. We can now define multiple groups of pins and balance them independently.

Skew anchor or Sink Point

  • A skew anchor is a clock endpoint pin that controls a downstream clock tree.
  • For example, a register that is a divide-by-2 clock generator has a clock input pin that is a skew anchor, because the arrival time of the clock at that clock pin affects the arrival times of all the clocks in the generated domain that begins at the register Q pin.

Skew offset

  • The skew offset is a floating-point number used to describe certain phase relationships that exist when placing multiple clocks with different periods, or different edges (phases) of the same clock, into the same skew phase.
  • Use the skew offset to adjust the arrival time of a specific clock phase when you want to compare it to another clock phase in the same group.

22. How does a skew group relate to the clock phase and skew phase?

  • A skew group is a set of clock pins that have been declared as a group. By default, all clock pins are placed in group 0, so each skew phase contains one group.
  • If the user has created a group of pins labeled with the number 1, for example, then the skew phase that contains these pins will be broken into two skew groups: one containing the user-specified group, and one containing the default group-0 pins.
  • This is useful for segregating groups of clock pins that have special circumstances and that you do not want to be balanced with the default group.
  • Skew optimization is performed on a per-skew-group basis and takes place after the basic clock tree is inserted.

23. Why to reduce Clock Skew?

  • Reducing clock skew is not just a performance issue; it is also a manufacturing issue.
  • Scan-based testing, which is currently the most popular way to structurally test chips for manufacturing defects, requires minimal skew to allow the error-free shifting of scan vectors to detect stuck-at and delay faults in a circuit.
  • Hold failures at the best-case PVT corner are common in these circuits, since there are typically no logic gates between the output of one flop and the scan input of the next flop on the scan chain.
  • Managing and reducing clock skew often resolves these hold failures.
