Testability or Design For Test
There are two different terms used to talk about the subject of this course: testability and design for test.
For sure, the activity we are covering is directly linked to the testing of components. However, the term testability does not clearly show the impact of such an activity in terms of design. That is the reason why I prefer using design for test (DFT), which is also the term used in the professional world.
Design for Test is a set of design elements and techniques that allow catching manufacturing defects on a given device. From the final user's point of view, none of the DFT parts is visible, since they do not bring any usable feature.
DFT is all about making sure a part coming out of the fab is free of silicon defects and is therefore ready for use.
Be careful with the target of DFT. It is not a way to catch design bugs. If there is a mistake in the design (wrong functionality, performance issue), the test phase run on each device thanks to the DFT elements will not be able to detect it. This is not the scope of DFT. Bugs are supposed to be caught earlier in the development flow, thanks to verification (simulation at RTL and gate level) and validation (simulation on an FPGA prototype).
As with most other activities, the final goal of DFT is making money, or rather not losing money because of manufacturing defects.
As a System on Chip (SoC) provider, you must ensure that any device you sell to a customer is a working part, free of silicon defects. However, even with the best DFT ever and the most robust process and technology, you will never be able to cover a SoC at 100% during test. You must then achieve the best coverage you can, in order to have, after test, the best possible confidence in the devices you are shipping to your customers.
By contract, silicon device providers guarantee a DPPM objective to their customers. DPPM stands for Defective Parts Per Million. It gives the number of bad devices you are "allowed" to ship (by error) to your customer. If you have more failing parts, you will be charged a financial penalty. The DPPM objective is customer dependent; however, there are some standard values used for the different markets:
- 100 DPPM for consumer market (mobile phone, computer…)
- 10 DPPM for automotive, aeronautics and military.
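To make the metric concrete, here is a minimal sketch (the shipment and defect counts are illustrative, not real figures) of how a DPPM figure is computed and checked against a contractual objective:

```python
def dppm(defective_shipped: int, total_shipped: int) -> float:
    """Defective Parts Per Million: bad devices that escaped test, per million shipped."""
    return defective_shipped / total_shipped * 1_000_000

# Example: 3 defective devices escape test out of 5 million shipped.
consumer_objective = 100  # DPPM objective for the consumer market
result = dppm(3, 5_000_000)
print(result, result <= consumer_objective)  # about 0.6 DPPM, within the objective
```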
Test use cases
As mentioned earlier, there is no way to cover a given device at 100%. It means that in some cases, a manufacturing defect is not caught by the tests run on the die.
There are then four different situations that may occur, depending on whether the die is good or defective and whether the test passes or fails:
There are two cases where the test delivers the right diagnosis: a good die passes, or a defective die fails (fortunately, these are the most common cases if the DFT work has been done correctly).
Then there is the case where you trash a functional die because it fails the test. This is of course a direct loss of money, because it is a part you could have sold to a customer. But it is not the worst case.
The worst case is when the test sequence is not able to catch a manufacturing defect on the die. This is the case that needs to be prevented as much as possible. In such a situation, a non-functional component is shipped to your customer, who will integrate it in their own system and then sell it to their own customers. After a period of time, the final user may identify an issue with their device. When your customer identifies that the problem comes from the part you sold them, it will be counted as one more part in the DPPM count. When the maximum value is exceeded, you will be charged financial penalties that can represent a significant amount of money.
Structural versus Functional test
Now that we know what kind of situation must be avoided, we need to identify how to test every single die out of the fab with a high level of fault coverage.
The classical methodology to test a component is to run what we call functional tests. In the case of a microcontroller, it means writing some embedded code and making sure that the component works as expected. The problem with this kind of methodology is its efficiency, which corresponds to the ratio between coverage and test time. Functional testing is slow: covering 90% of a current microcontroller would take about 50 years for every single part!
A 50-year test time is not acceptable in terms of cost, and of course, if you need years to test each device, you can be sure you would miss your market opportunity.
Design for Test techniques are mostly based on a structural approach. You no longer care about the function itself; instead, you need to make sure that each atomic function (each logic gate) satisfies its own spec.
With the structural approach, it is possible to cover current microcontrollers with 95%+ coverage in seconds.
DFT job: make it fast, make it small
Nobody wants to pay for test. Your customers expect you to deliver good devices; they will not pay for test.
Silicon area is expensive. Since the test logic does not bring visible new functionality, there is no simple way to be rewarded for it.
When you create DFT elements, you must then keep in mind that they should stay as tiny as possible. On the other hand, you need to increase the efficiency of new or existing DFT mechanisms.
In other words, you need to solve the problem of creating something small that tests a component with very high coverage in a short period of time.
DFT is intrusive
Never forget the D of DFT. Since there are design elements, there is a significant impact on the functional design itself. It is not because you create a functional IP that it will be ready or friendly for DFT. And if an IP cannot be tested, it cannot be sold!
Test logic usually represents about 10% of the total amount of logic in the design.
A good practice is to keep in mind the potential impact of DFT. When you design a module (digital or analog), you must make sure it will be testable. This is particularly the case for the analog parts, for which the test solution must be designed at the same time as the functional one.
A side effect of DFT
A manufacturing defect can leave a device functional but with degraded performance. You can optimize your revenue by not trashing a die that does not reach its expected performance.
This is speed binning: when a device does not work at a given frequency, you test it at a slower frequency and see whether the test passes. If so, you can sell it at a lower price. Speed binning was introduced massively by Intel with its first Pentium product family.
We have seen that testing the design as exhaustively as possible is mandatory to avoid financial penalties. But we need to understand what a manufacturing defect is, or rather how to model one.
For digital logic (flip-flops, combinational gates…) the main fault models are:
- Stuck-at fault
- Transition fault
Memories (such as RAM, ROM…) have the same kinds of faults for their digital part, and the following kind for their memory array:
- Neighborhood coupling
For analog IPs, the fault model depends on the IP itself and cannot be generalized (even if the stuck-at issue remains relevant whatever the case).
A stuck-at fault represents a short circuit to VDD or to VSS. It changes the behavior of a logic gate in some situations.
Following is the "normal" truth table of this AND gate:

A B | Y
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Let us now consider that there is a short circuit on the B input of the gate. We call this short circuit to VSS a stuck-at 0. It means that whatever the value on input B is, the AND gate always sees a logic 0. The truth table is now the following one:

A B | Y
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 0

In most cases there is no change, but when A and B are both 1, the output of the AND logic gate is no longer correct! This is a defect that can give our entire device a functional issue.
The component must then be tested in order to catch that kind of situation. As illustrated above, a stuck-at 0 on input B can be detected by applying a logic 1 on both A and B and looking at the value of output Y. If the output is 0, there is a stuck-at 0 on B; if it is 1, the logic gate is correct.
You may notice that it is not possible to tell the difference between a stuck-at 0 on A and a stuck-at 0 on B. This is true, but we do not mind. With A and B at 1, seeing a 0 on output Y means there is a defect on the gate, and the component needs to be trashed.
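The stuck-at 0 example above can be simulated in a few lines. This is a minimal sketch: the fault is injected by forcing B to 0, and an exhaustive sweep confirms that A=1, B=1 is the only pattern that distinguishes a good gate from a faulty one:

```python
def and_gate(a: int, b: int, stuck_at_0_on_b: bool = False) -> int:
    """2-input AND gate; optionally inject a stuck-at-0 defect on input B."""
    if stuck_at_0_on_b:
        b = 0                      # the gate always sees a logic 0 on B
    return a & b

# Sweep all input combinations and report where good and faulty gates differ.
for a in (0, 1):
    for b in (0, 1):
        good = and_gate(a, b)
        faulty = and_gate(a, b, stuck_at_0_on_b=True)
        if good != faulty:
            print(f"A={a} B={b}: good={good}, faulty={faulty}")
# prints only: A=1 B=1: good=1, faulty=0
```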
What does the methodology look like then? It is always the same algorithm:
- Put the opposite value of the stuck-at on the point you want to check (a 1 for a stuck-at 0 and a 0 for a stuck-at 1).
- Propagate the required value from a point that can be controlled from the outside (an input pin). It might require going through some gates.
- Propagate the resulting value at the stuck-at point to an observable point (an output pin), making the gates on the way transparent.
Following is an example:
We call a test pattern the set of values to be applied on the controllable points together with the value that needs to be observed. In our example, the test pattern is:
A=1 / B=1 / C=0 / D=0 / output=0
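Since the example circuit itself is shown in a figure, here is a sketch using a hypothetical netlist, Y = (A AND B) OR (C AND D), chosen because it is consistent with the pattern above: A=1/B=1 drives the opposite of the stuck-at value onto B, and C=0/D=0 makes the OR gate transparent so the fault effect reaches the output. The netlist is an assumption for illustration, not the actual figure:

```python
def circuit(a: int, b: int, c: int, d: int, stuck_at_0_on_b: bool = False) -> int:
    """Hypothetical netlist Y = (A AND B) OR (C AND D); B may be stuck at 0."""
    if stuck_at_0_on_b:
        b = 0                      # injected manufacturing defect on input B
    return (a & b) | (c & d)

pattern = dict(a=1, b=1, c=0, d=0)                 # the test pattern above
print(circuit(**pattern))                          # good die: 1
print(circuit(**pattern, stuck_at_0_on_b=True))    # defective die: 0, trash it
```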
The stuck-at fault model is a static model, which means the frequency of the design has no impact on it. While that kind of issue does occur on silicon, performance degradation must also be considered. This is the purpose of transition fault testing, which introduces two new fault models:
- Slow to rise
- Slow to fall
They aim to model a defect on the slew rate of a signal when its state switches. Ideally, the transition of a given signal is instantaneous, but in real life this is never the case. Because of cross-coupling effects, the slew rate of a given signal might be degraded. A transition then needs more than one clock cycle to go from the output of a given flip-flop to the next one.
In such a situation, the only way to make the component work is to reduce the clock frequency. But then, the expected frequency is not met.
Catching such a fault is a bit more complex than with the stuck-at approach. You have to create a transition on a given point (it might be the output of a DFF, but also any single pin of a logic gate), and then capture the design's response one cycle later. This is therefore an "at-speed" test that requires working at the functional frequency to remain meaningful.
There are two steps to cover a transition fault:
- Launch phase: create an edge on the source point.
- Capture phase: get the design's reaction one clock cycle after the edge creation, and check whether the captured value is the expected one.
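The launch/capture sequence can be modeled with a toy timing check. The delays and clock period below are assumed numbers for illustration: a path is "slow to rise" when its rise delay exceeds one clock period, so the capture flop misses the launched edge:

```python
def capture_after_one_cycle(path_delay_ns: float, clock_period_ns: float) -> int:
    """Launch a 0->1 edge at the source flop; return the value captured one cycle later."""
    # The edge is captured only if it propagates within one clock period.
    return 1 if path_delay_ns <= clock_period_ns else 0

print(capture_after_one_cycle(0.8, 1.0))  # 1: edge arrives in time, test passes
print(capture_after_one_cycle(1.3, 1.0))  # 0: slow-to-rise defect detected
```

This also shows why the test must run at speed: with a relaxed 2.0 ns period, the 1.3 ns defective path would capture a 1 and the fault would escape.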
At this stage you should be comfortable with the purpose of DFT and its mission. It is important to keep in mind that the main mission is to make money by catching manufacturing defects.
We have presented two fault models (stuck-at and transition fault) that apply to digital logic. Testing such faults requires the capability to control and observe every internal point of a design. We will see next how the scan technique allows performing efficient tests for both stuck-at and transition faults.