Design for Testability: The Need for Modern VLSI Design
DFT stands for Design for Testability, an important branch of VLSI design. In crude terms, it involves putting test structures on the chip itself that later help in testing the device for various defects before the part is shipped to the customer.
Have you ever wondered how electronic devices keep shrinking? Back in the 90s, mobile phones used to be big and heavy, with only basic features.
But nowadays we have sleek, lightweight phones with all sorts of features: camera, Bluetooth, music player and, not to forget, faster processors.
All that’s possible because of the scaling of technology nodes. The technology node refers to the channel length of the transistors which form the constituents of your device, and we keep moving to ever shorter channel lengths; some companies are working on technology nodes as small as 18nm. The smaller the channel length, the more difficult it is for the foundries to manufacture, and the higher the chances of manufacturing faults.
Possible manufacturing faults include opens and shorts.

The figure shows two metal lines, one of which got “open” while the other got “shorted”. As we move to lower technology nodes, not only is the device size shrinking, but we can also pack more transistors on the same chip, so density is increasing. Manufacturing faults have therefore become unavoidable, and DFT techniques enable us to test for these (and other kinds of) faults.
Kinds of defects:
- Opens and shorts, as mentioned above, can cause functional failures. When a node gets shorted to ground, the defect is referred to as a stuck-at-0 (SA0) fault; when a node gets shorted to the power supply, it is referred to as a stuck-at-1 (SA1) fault.
- Speed Defect: May arise due to coupling of a net with an adjacent net, which affects the signal transition on it.
- Leakage Defect: A path might exist between the power supply and ground, causing excessive leakage power dissipation in the device.
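To make the stuck-at idea concrete, here is a minimal sketch (not a real ATPG flow; the gate and fault models are illustrative) of how a tester distinguishes a good circuit from one whose output node is stuck at 0 or 1: find an input vector for which the good and faulty circuits disagree.

```python
# Toy illustration: a 2-input AND gate whose output node may be
# stuck-at-0 (shorted to ground) or stuck-at-1 (shorted to supply).

def and_gate(a, b, fault=None):
    """fault: None (good circuit), 'SA0' or 'SA1' on the output node."""
    if fault == "SA0":
        return 0
    if fault == "SA1":
        return 1
    return a & b

def detecting_vectors(fault):
    """Input vectors whose output differs between good and faulty circuit."""
    return [(a, b) for a in (0, 1) for b in (0, 1)
            if and_gate(a, b) != and_gate(a, b, fault)]

print(detecting_vectors("SA0"))  # [(1, 1)]: only 1,1 exposes a stuck-at-0
print(detecting_vectors("SA1"))  # [(0, 0), (0, 1), (1, 0)] expose a stuck-at-1
```

Note how few vectors detect the SA0 fault here: a test pattern must both *control* the node to the value opposite the fault and *observe* the result, which is exactly what DFT structures are built for.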
DFT Modes: Perspective
As more and more transistors find their way onto a single SoC, Design for Testability (DFT) is becoming an increasingly important function of the SoC design cycle.
As the technology nodes are shrinking consistently, the probability of the occurrence of the faults is also increasing which makes DFT an indispensable function for modern sub-micron SoCs.
What are the possible faults within an SoC, and what are the ways to detect them? We will take them up briefly.
Imagine that you own a chip manufacturing company for the automotive industry. The end application could be something meant for infotainment, engine management, rear-view camera, Ethernet connectivity, power windows, or a critical application like collision detection or air-bag control. You wouldn’t want to sell a faulty chip to your customers for two main reasons:
- Trust of the customer which would impact the goodwill of the company.
- Loss of business: maybe because the customer opted for another semiconductor vendor, or even worse, the chip failed at the user end and the user sued your customer, who ultimately sued you!
Hence, it is pretty important to test the chip before shipping it out to the customers.
Types of faults and their detection:
- Structural Fault: Basically refers to faults due to faulty manufacturing at the fabs. Even a tiny dust particle can cause shorts or opens in an SoC.
Let’s try to understand this with our example. Say you manufactured the chip; there is a fair probability of some structural inadequacies in the form of shorts or opens. Imagine any digital circuit: a single short or a single open can cause the entire functionality of the device to go haywire. Structural testing is done in the DFT tests (or modes) called Shift and Stuck-At Capture. We’ll discuss these in detail in the upcoming posts. Note that these tests are conducted after manufacturing, before shipping the part to the customer.
- Transition Faults: Signal transitions are the switching of voltage levels from ‘high’ to ‘low’ or vice-versa. There is a designated time before the clock edge during which all the signals should be stable at the input of the flop (a very crude definition of setup time), and a designated time after the clock edge during which none of the signals should change state at the input of the flop (a very crude definition of hold time). Any such fault in the transition times (conversely: setup or hold violations) is referred to as a transition fault.
- Going back to our example: suppose you first filtered out the chips which had some structural fault. Now you would test the remaining chips for transition faults. What would happen if you shipped a chip with a transition fault to a customer? If it had a setup violation, the chip would not be able to work at the specified frequency.
- However, it would still work at a slower frequency. If it had a hold violation, the chip would not work at all! One possible consequence from our example: in the event of a collision, you would expect the air bag to open within a few micro- or nano-seconds, but it might end up taking seconds. Unfortunately, that would be too late.
- The At-Speed test is used to screen the chip for transition faults.
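The setup intuition above can be put in numbers. The sketch below computes setup slack for a hypothetical flop-to-flop path (all delay figures are invented for illustration): negative slack at the rated frequency is precisely the failure mode where the chip only works when slowed down.

```python
# Illustrative numbers only: setup slack for one flop-to-flop path.
# Negative slack means the path fails at that clock period -- the
# setup-violation symptom described above.

def setup_slack_ns(clock_period, t_clk_to_q, t_comb, t_setup):
    """Slack = period - (launch delay + logic delay + setup requirement)."""
    return round(clock_period - (t_clk_to_q + t_comb + t_setup), 3)

# At 500 MHz (2.0 ns period) this hypothetical path fails...
print(setup_slack_ns(2.0, 0.3, 1.6, 0.2))   # -0.1 ns: setup violation
# ...but passes at the slower 400 MHz (2.5 ns period).
print(setup_slack_ns(2.5, 0.3, 1.6, 0.2))   # 0.4 ns: meets timing
```

A hold check, by contrast, does not depend on the clock period at all, which is why a hold violation cannot be worked around by slowing the clock.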
Broadly speaking, there are only the two types of faults discussed above. However, there’s another scenario which can arise.
Imagine that your car has an SoC which senses a collision and opens the air bag within a few micro-seconds of the collision. You would expect it to open up if such a scenario arises. But what if your car is, let’s say, 6 years old and the chip is no longer functioning as expected? In this case, you would like to test the chip first, and if it is fine, proceed to ignite the engine and start the car. Such a scenario demands conducting a test while the chip is in operation.
- Such a DFT test is called an LBIST (Logic Built-In Self-Test). In an LBIST test, one tests the entire chip, or a sub-part of it, for structural and/or transition faults. The analogous test for memories is referred to as MBIST (Memory Built-In Self-Test).
An important characteristic of a built-in self-test is that it is self-sufficient: following a software trigger, it carries out the test internally, without any external input, and sends out a PASS/FAIL signature.
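A toy sketch of that self-test idea, under heavy simplification (the 4-bit LFSR, its taps, and the 16-bit compactor are all arbitrary choices, not any production LBIST scheme): pseudo-random patterns are generated on-chip, responses are compacted into a signature, and only PASS/FAIL leaves the device.

```python
# Toy LBIST flavour: a 4-bit LFSR generates pseudo-random patterns, the
# circuit responses are compacted into a 16-bit signature, and the test
# reports PASS/FAIL against a golden signature from a known-good device.

def lfsr_patterns(seed=0b1001, n=15, width=4):
    """Fibonacci LFSR; feedback = XOR of bits 3 and 0 (arbitrary taps)."""
    state = seed
    for _ in range(n):
        yield state
        fb = ((state >> 3) ^ state) & 1
        state = ((state << 1) | fb) & ((1 << width) - 1)

def signature(circuit):
    """Compact all responses into one value (crude MISR-like compactor)."""
    sig = 0
    for pattern in lfsr_patterns():
        sig = ((sig << 1) & 0xFFFF) ^ (sig >> 15) ^ circuit(pattern)
    return sig

def run_bist(circuit, golden):
    return "PASS" if signature(circuit) == golden else "FAIL"

good = lambda x: x & 0b0110            # the logic under test
golden = signature(good)               # captured once from a good device
faulty = lambda x: (x & 0b0110) | 1    # same logic with an SA1 on bit 0

print(run_bist(good, golden))    # PASS
print(run_bist(faulty, golden))  # FAIL: the fault corrupts the signature
```

The key property is visible even in this toy: no per-pattern data crosses the chip boundary, only the final signature.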
Two Pillars of DFT: Controllability & Observability
I haven’t given an equitable share of attention to DFT, and now, it’s time to make some amends!
Just like Timing is built on two pillars, Setup & Hold, the entire DFT is built on two pillars: Controllability & Observability.
Very often you would find DFT folks cribbing that they can’t control a particular node, or don’t have any mechanism to observe a particular node in question. You may like to review the previous post: DFT Modes: Perspective before proceeding further.
Shifting our attention to the pillars of DFT, let’s define the two.
- Controllability: The ability to set a desired value (either 0 or 1) at any particular node of the design. If the DFT folks have that ability, they say that the node is ‘controllable’, which in turn means they can force a value of either 0 or 1 on that node!
- Observability: The ability to actually observe whether the value at a particular node is 0 or 1 by forcing some pre-defined inputs. Note that, unlike a circuit we draw on paper, an SoC is a colossal design and one can observe a node only via the output ports. So DFT folks need a mechanism to excite a node, fish that value out of the SoC via some output port, and then ‘observe’ it!
Ideally, it is desired to have each and every node of the design controllable and observable. But reality continues to ruin the life of DFT folks! (Source: Calvin & Hobbes).
It is not always possible, or rather practical, to have all the nodes in a design under your control, because of the sheer complexity of modern SoCs.
That is why you would hear them talk about ‘Coverage’. If coverage is, say, 99%, it means we have the ability to control and observe 99% of the nodes in the design (a pretty big number, indeed!).
Now let’s take some simple examples.

In the above example, if we can control the flops such that the combo clouds produce 1 at both inputs of the AND gate, we say that node X is controllable for 1.
Similarly, if we can control either input of the AND gate to 0, we say that node X is controllable for 0. Now let’s say we wish to observe the output of FF1. If we can somehow replicate the value of FF1 at X by making the combo clouds and the AND gate transparent to the value at FF1, we say that the output of FF1 is observable.
Intuition tells us that for the AND gate to be transparent, we need controllability of the other input for 1, because when one input of an AND gate is 1, whatever value is on the other input is simply passed on!
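That transparency argument is small enough to check in a few lines of code (a sketch of the logic only; the flop and combo clouds are abstracted away):

```python
# The AND-gate transparency argument in code: the gate passes the value
# on one input only while the other input is held at 1.

def and2(a, b):
    return a & b

# Other input controllable for 1: node X replicates FF1 -> FF1 is observable.
assert all(and2(ff1, 1) == ff1 for ff1 in (0, 1))

# Other input stuck at 0: X reads 0 either way -> FF1 is not observable.
assert and2(0, 0) == and2(1, 0) == 0

print("AND gate is transparent only when the side input is 1")
```

The same reasoning gives the dual rule for an OR gate: it becomes transparent when its other input is held at 0.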
Dynamics of Scan Testing
In accordance with Moore’s Law, the number of transistors on integrated circuits doubles roughly every two years. While such high packing densities allow more functionality to be incorporated on the same chip, it is becoming an increasingly arduous task for foundries across the globe to manufacture defect-free silicon.
This predicament has elevated the significance of Design for Testability (DFT) in the design cycle over the last two decades. Shipping a defective part to a customer could not only result in loss of goodwill for the design company but, even worse, might prove catastrophic for the end users, especially if the chip is meant for automotive or medical applications.
Scan testing is a method to detect various manufacturing faults in the silicon. Although many types of manufacturing faults may exist, in this post we discuss the method to detect faults like shorts and opens.
Figure 1 shows the structure of a Scan Flip-Flop. A multiplexer is added at the input of the flip-flop, with one input of the multiplexer acting as the functional input D, while the other is Scan-In (SI).
The selection between D and SI is governed by the Scan Enable (SE) signal.

Figure 1: Scan Flip-Flop
Using this basic Scan Flip-Flop as the building block, all the flops are connected in the form of a chain, which effectively acts as a shift register. The first flop of the scan chain is connected to the scan-in port and the last flop to the scan-out port.
Figure 2 depicts one such scan chain, with the clock signal in red, the scan chain in blue and the functional path in black. Scan testing is done to detect manufacturing faults in the combinational logic block.
To do so, the ATPG (Automatic Test Pattern Generation) tool tries to excite each and every node within the combinational logic block by applying input vectors at the flops of the scan chain.

Figure 2: A Typical Scan Chain
Scan operation involves three stages: Scan-in, Scan-capture and Scan-out. Scan-in involves shifting in and loading all the flip-flops with an input vector.
During scan-in, the data flows from the output of one flop to the scan-input of the next, just like a shift register. Once the sequence is loaded, one clock pulse (also called the capture pulse) is allowed to excite the combinational logic block, and the output is captured at the receiving flops.
The data is then shifted out and the signature is compared with the expected one. Modern ATPG tools can use the captured sequence as the input vector for the next shift-in cycle.
Moreover, in case of any mismatch, they can point to the nodes where one can possibly find a manufacturing fault. Figure 3 shows the sequence of events during scan-shift and scan-capture.

Figure 3: Waveforms for Scan-Shift and Capture
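The three-stage sequence can be sketched in a toy model (the chain is just a Python list and the combinational block an arbitrary function; the "good" and "faulty" logic below are invented for illustration, not any real netlist):

```python
# Toy model of the scan-in / capture / scan-out sequence described above.

def scan_in(chain_len, vector):
    """Shift a test vector into the chain (SE = 1), one bit per clock."""
    chain = [0] * chain_len
    for bit in vector:                 # each pulse moves data one flop along
        chain = [bit] + chain[:-1]
    return chain

def capture(chain, comb_logic):
    """One capture pulse (SE = 0): flops load the combinational outputs."""
    return comb_logic(chain)

# Example: 3-flop chain; the "good" comb block rotates the flop values,
# while the "faulty" one has the first flop's input stuck at 0.
good_logic = lambda q: [q[-1]] + q[:-1]
faulty_logic = lambda q: [0] + q[:-1]

loaded = scan_in(3, [1, 0, 1])                # scan-in phase
expected = capture(loaded, good_logic)        # what a good die captures
observed = capture(loaded, faulty_logic)      # what the faulty die captures
print(expected, observed)                     # [1, 1, 0] [0, 1, 0]
print("FAIL" if observed != expected else "PASS")  # mismatch exposes the fault
```

Shifting the captured values back out (conceptually, `chain_len` more shift pulses) and comparing them against the expected response is exactly the signature comparison the tester performs.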
Shift Frequency: A trade-off between Test Cost and Power Dissipation
It must be noted that the number of shift-in and shift-out cycles equals the number of flip-flops in the scan chain. For a scan chain of, let’s say, 100 flops, one would require 100 shift-in cycles, 1 capture cycle and 100 shift-out cycles.
The total testing time is therefore mainly dependent on the shift frequency, because there is only one capture cycle. Tester time is a significant parameter in determining the cost of a semiconductor chip; the cost of testing a chip may be as high as 50% of its total cost.
From a timing point of view, a higher shift frequency should not be an issue, because the shift path is essentially a direct connection from the output of the preceding flop to the scan-input of the succeeding flop, so the setup timing check is always relaxed.
Despite the fact that a higher shift frequency would mean lower tester time and hence lower cost, the shift frequency is typically kept low (of the order of tens of MHz). The reason lies in dynamic power dissipation.
It must be noted that during shift mode, the outputs of all the flops in the scan chain toggle, and so do the nodes within the combinational logic block, even though its output is not being captured.
The resulting switching activity could well exceed that of the functional mode. A higher shift frequency could lead to two scenarios:
- Voltage Droop: A higher rate of toggling within the chip would draw more current from the supply, and hence cause a voltage droop because of IR drop. This could pull the voltage below the safe margin, and the devices might fail to operate properly.
- Increased Die Temperature: High switching activity might create local hot-spots within the die and thereby increase the temperature above the worst-case temperature for which timing was closed. This could again result in failure of operation, or in the worst case, it might cause thermal damage to the chip.
