Last October, I wrote a blog on Functional Safety (FuSa), which provided a high-level overview of this critical, complex topic. It proved to be one of my most popular blogs to date. Given the strong interest in FuSa and ISO 26262, it seemed appropriate to do a deeper dive, with a specific focus on random fault coverage, which was only briefly covered in the previous blog.
To recap, the key points regarding FuSa are as follows:
Now that the reader is well versed in FuSa and ready to be a Safety Manager (a real term for the individual responsible for overseeing a company's safety efforts, a role required for compliance with the standard), we are going to spend some time taking a more in-depth look at random fault coverage.
Random hardware failures occur unpredictably over the lifetime of a product. Individual failures cannot be predicted, but they are probabilistic in nature, so their rate can be estimated statistically. This is the basis for the term probabilistic metric for random hardware failures (PMHF). These failures occur for various reasons that are independent of design and quality rigor. Typically, random failures occur at different rates during three distinct periods of the product's lifetime.
As part of the safety analysis of a device, the potential failure modes, including those due to neutron strikes, are thoroughly evaluated. Random failure rates are measured in failures in time (FIT). One FIT is equal to one failure in 1 billion device operating hours, which for a single device is roughly 114,000 years. To say these specifications are stringent is perhaps an understatement, but such extremely low failure rates are important when considering that the electronics ultimately have control over the vehicle.
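To make the FIT arithmetic concrete, here is a minimal Python sketch of the conversion. The function names, the fleet size, and the operating hours are hypothetical and chosen purely for illustration. They do not come from ISO 26262.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def fit_to_failures_per_hour(fit: float) -> float:
    """Convert a FIT rate to failures per device operating hour."""
    return fit * 1e-9


def years_per_failure(fit: float) -> float:
    """Mean operating time between failures for one device, in years."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return (1e9 / fit) / HOURS_PER_YEAR


def expected_failures(fit: float, devices: int, hours_each: float) -> float:
    """Expected number of failures across a fleet of identical devices."""
    return fit_to_failures_per_hour(fit) * devices * hours_each


# One FIT for a single device: about 114,155 years between failures.
print(f"{years_per_failure(1):,.0f} years")

# Hypothetical fleet: 1,000,000 devices at 10 FIT, 8,000 operating hours each.
print(f"{expected_failures(10, 1_000_000, 8_000):.1f} expected failures")
```

The fleet view is why such small rates matter. A figure that looks negligible for a single device adds up to a noticeable number of failures once it is multiplied across a large number of vehicles.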
In addition to evaluating the PMHF of the device, a separate analysis looks at how well a design can withstand a single-point fault. This is referred to as the single point fault metric (SPFM). The metric evaluates how effectively the safety mechanisms detect and handle single-point (isolated) faults. In other words, it shows whether there is a case in which a single fault of a specific type can overwhelm the safety mechanism.
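As a rough illustration, the SPFM is usually expressed as one minus the share of the total safety-related failure rate that is either a single-point fault (no safety mechanism at all) or a residual fault (one that escapes an existing mechanism). The sketch below assumes that formulation. The `Element` class, the element names, and all FIT values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    fit_total: float         # total failure rate of the element, in FIT
    fit_single_point: float  # faults not covered by any safety mechanism
    fit_residual: float      # faults that escape an existing safety mechanism


def spfm(elements: list[Element]) -> float:
    """SPFM = 1 - sum(single-point + residual FIT) / sum(total FIT)."""
    total = sum(e.fit_total for e in elements)
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    uncovered = sum(e.fit_single_point + e.fit_residual for e in elements)
    return 1.0 - uncovered / total


# Hypothetical safety-related elements of a device.
elements = [
    Element("cpu_core", fit_total=50.0, fit_single_point=0.0, fit_residual=0.5),
    Element("sram", fit_total=30.0, fit_single_point=0.0, fit_residual=0.3),
    Element("clock_monitor", fit_total=20.0, fit_single_point=0.2, fit_residual=0.0),
]

print(f"SPFM = {spfm(elements):.2%}")
```

In this made-up example, 1 FIT out of 100 FIT is uncovered, so the SPFM comes out at 99%.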
The final key metric evaluated in the context of achieving a given ASIL is the latent fault metric (LFM). It measures how effectively a system's safety mechanisms detect latent faults, meaning faults that cause no immediate problem and may therefore go undetected for extended periods of time. The required values for the various metrics by ASIL are shown in the table below.
Consistent with the points made earlier, higher ASILs drive more stringent requirements.
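The following Python sketch shows how the three metrics might be checked against per-ASIL targets. The target values in `ASIL_TARGETS` are the figures commonly cited from ISO 26262-5. They are an assumption here, not a reproduction of the original table, and should be verified against the standard. The LFM formula is the usual formulation (latent faults as a share of the failure rate that is left after removing single-point and residual faults). The function names and input values are hypothetical.

```python
# Commonly cited ISO 26262-5 targets (assumption: verify against the standard).
# SPFM and LFM are minimum fractions; PMHF is an upper bound in FIT.
ASIL_TARGETS = {
    "B": {"spfm": 0.90, "lfm": 0.60, "pmhf_fit": 100.0},
    "C": {"spfm": 0.97, "lfm": 0.80, "pmhf_fit": 100.0},
    "D": {"spfm": 0.99, "lfm": 0.90, "pmhf_fit": 10.0},
}


def lfm(fit_total: float, fit_single_point: float,
        fit_residual: float, fit_latent: float) -> float:
    """LFM = 1 - latent FIT / (total - single-point - residual FIT)."""
    remaining = fit_total - fit_single_point - fit_residual
    if remaining <= 0:
        raise ValueError("no failure rate left after single-point/residual faults")
    return 1.0 - fit_latent / remaining


def highest_asil_met(spfm: float, lfm_value: float, pmhf_fit: float):
    """Return the most stringent ASIL whose targets are all met, else None."""
    for level in ("D", "C", "B"):
        target = ASIL_TARGETS[level]
        if (spfm >= target["spfm"]
                and lfm_value >= target["lfm"]
                and pmhf_fit < target["pmhf_fit"]):
            return level
    return None


# Hypothetical device: 100 FIT total, 1 FIT single-point plus residual,
# 4.95 FIT latent, and a PMHF of 5 FIT.
device_lfm = lfm(fit_total=100.0, fit_single_point=0.2,
                 fit_residual=0.8, fit_latent=4.95)
print(f"LFM = {device_lfm:.2%}")
print("Highest ASIL met:", highest_asil_met(0.99, device_lfm, 5.0))
```

Checking from ASIL D downward reflects the point above: a device that meets the strictest targets also meets the looser ones, so the first match is the highest level supported.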
Yet again, we have only scratched the surface. It is probably easiest to get your arms around this topic by taking it in small, bite-sized pieces. There is much more to cover in this complex field, which is of extreme importance as growing numbers of increasingly complex semiconductor devices take greater control over the vehicle.
In upcoming blogs, we will look at the concept of decomposition, or how to achieve ASIL D random fault coverage at the system level, while employing devices that are only certified to support ASIL B random fault coverage.