ISO 14971 - the effect of resources

Risk management is an extremely powerful tool for making decisions based purely on risk. Used properly, it can strip away preconceptions and vested interests and lay a problem out in bare scientific, objective terms, allowing decisions on both the need for and the effectiveness of risk control to be made with reasonable confidence. The importance of risk management with respect to medical devices cannot be overstated, due to the wide, varied and highly device-specific hazards which cannot be easily addressed by published technical standards.

But like all powerful tools, it takes a lot of energy (resources) to operate. And this is where the modern application of risk management to medical devices falls down. There is a lot of discussion about problems with schemes to estimate risk and various other complications associated with practical application of risk management. But there seems to be little discussion on the overall impact of the resources needed for effective risk management. Instead, there is often an implicit assumption that the available resources are unlimited.

Yet it may be that resource limitation is the root cause of all problems associated with practical application. Resource limitation has the effect of dumbing down the risk evaluation, and in the process introducing errors due to oversimplification. A non-clinical example, detailed below, shows how the limited information given in a typical risk management summary table (also called a traceability matrix) often does not stand up to scientific review.

But while our first response is that manufacturers must improve the quality of the analysis, for a manufacturer to survive, dumbing down is the only option. As the example shows, a single line in a table is replaced by around 1000 words (2 pages) just explaining the background behind the decisions on risk acceptability, and the analysis in the example also points to the need for more research to prove the risk control is effective.

In the regulatory context risk management is often considered a top level requirement, forming an umbrella over all possible sources of risk over the lifetime of the medical device. One interpretation of this is that the risk management process must cover every possible sequence of events leading to harm. Yet if we take into account the incredible complexity associated with a medical device (not only the design but also the production process, transport, installation, normal use and foreseeable misuse), there would be an unlimited number of events and event combinations to consider, an amount far beyond any practical capacity we have to analyse.

It may be tempting to assume that without risk management, products (medical devices) would be unsafe. This would suggest that we need to place ever-increasing resources into risk management. There are two reasons why this is not so.

As Sir Isaac Newton once said, "If I have seen further it is by standing on the shoulders of Giants."

Most modern medical devices, production processes, distribution and use have evolved over the last 50-100 years. This process of evolution has involved an enormous amount of trial and error, feedback and improvement. It is a reality that along the way people have been hurt or killed, property has been damaged and the environment harmed. Some of these adverse events are, in hindsight, easily predictable, but most are not. The things we worry about rarely happen, while we get blindsided by things we didn't think about. Even when events are predictable, we often make wrong assumptions to justify that the risk is acceptable, and it is these assumptions that turn out to be wrong in the long term. But while we may often focus on what went wrong, we can easily overlook the millions of technical decisions which turned out to be right in the long run.

So, we are now at a point where the majority of risk in a medical device is well below acceptable limits not due to risk management, but due to evolution.

It is certainly an uncomfortable point to realise that safety has largely come from trial and error, which exposed many people to harm along the way. But risk management purists need to realise that it is impossible to go back and document the reason for every technical decision involved in the lifetime of a medical device on the basis of risk management. A purist needs to consider that even something as simple as a screw holding a medical device together has a range of critical parameters which make the screw work: length, diameter, pitch, material, production process and so on. In a final medical device, these parameters have been chosen based on what has been known to be effective in the past. While failure of the screw can lead to harm, it is clearly beyond practical resources to analyse the risks associated with, for example, different pitches (angle of thread), and thereby arrive at the selected pitch through the risk management process.

It would seem reasonable to focus risk management resources on new technology (features or applications), in a bid to prevent or limit more people being exposed to unnecessary risk while experience is gained with the new technology.

The second point is to realise that the risk management process itself has an element of risk. The use of additional resources for risk management will result in quantifiable increases in the cost of a medical device and also delays to market. These in turn increase the risk to the patient, by impacting the availability of the device at the time when it is needed. Health care has a limited budget; equipment which is too expensive will not be purchased. A two-year delay to market while the risk management gets completed means two years' worth of patients who could not get the benefits of the device. And, perhaps the most important point, the weight of risk management can act as a disincentive for innovation and improvement of medical devices in general, leaving us with only the current technology and a fear of trying anything new.

This is not to say that risk management has no value, only that we need to put reasonable limits on the amount of resources used in the risk management process.

How can we judge what is reasonable?

Although we often talk about balancing risk and benefit, this is somewhat of a misnomer. In reality we want the benefit to outweigh risk by a large factor. But this itself gives us no practical means to judge how much risk is acceptable; there is no easy way to decide what "factor" is reasonable. To get around this, we could consider a new concept of "net benefit" which takes into account not only risks and benefits, but also the costs associated with the risk management process, including both analysis and risk controls. There are four key parameters involved in net benefit:

  • actual benefit of the device (treatment, diagnosis, etc)
  • residual risk, after risk controls are applied
  • costs of risk controls (design, production costs), and
  • costs of the risk management process (personnel and other costs associated with risk analysis, including research and experimentation to support risk estimation, decisions on risk control and risk control effectiveness)


In theory, it is possible to estimate these parameters using a common unit, such as money. If so, we could combine them into an equation:

Net benefit = benefit - risks - costs of risk control - costs of analysis

If we did, we could vary the parameters to find the point of maximum net benefit. These factors are not independent: for example, reducing residual risk can increase costs; in some cases reducing risk can also reduce the actual benefit or introduce other risks. And the costs of analysis may swamp any potential net benefit.
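Under the (admittedly idealised) assumption that all four parameters can be expressed in money, the trade-off can be sketched numerically. The figures below are invented purely for illustration; they simply show how spending more on analysis reduces residual risk up to a point, beyond which the extra cost swamps the gain:

```python
def net_benefit(benefit, residual_risk, control_cost, analysis_cost):
    """Net benefit = benefit - risks - costs of risk control - costs of analysis."""
    return benefit - residual_risk - control_cost - analysis_cost

# Illustrative (invented) figures for three levels of risk management effort.
# More analysis lowers residual risk, but with diminishing returns.
options = {
    "minimal analysis":    net_benefit(benefit=100, residual_risk=30, control_cost=5,  analysis_cost=1),
    "moderate analysis":   net_benefit(benefit=100, residual_risk=10, control_cost=10, analysis_cost=5),
    "exhaustive analysis": net_benefit(benefit=100, residual_risk=8,  control_cost=15, analysis_cost=40),
}
best = max(options, key=options.get)  # the level of effort with the highest net benefit
```

With these invented numbers the moderate option wins: the exhaustive analysis only shaves a little more residual risk while its analysis cost dominates, which is exactly the resource-limitation argument made above.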

Of course, in the real world it can be difficult to apply monetary estimates for these parameters, and even this process would have its own associated costs. The key point here is to realise that there are limits to the amount of resources we can apply to risk management. Just like the evolution of a medical device, we need to make some initial guess as to what is appropriate, and then use feedback based on experience.

Allocation of resources is already specified in ISO 14971, under the responsibility of top management. However, the standard does not give any hints on how to decide what amount of resources might be appropriate. It also fails to highlight the different options for handling risks, which may be applied with differing amounts of resources. In general these can be considered as:

  • undocumented risk management, where risk is controlled through common sense technical decisions made by qualified, experienced personnel;
  • simplistic risk management, which largely involves making lists of the actions taken (or not), with management controls to ensure the action is taken, and decisions for actions based on simple risk estimates without justification;
  • analytical risk management, where decisions are based on true risk estimates, evaluated scientifically and objectively, with records retained to support the risk estimation.

A careful look at ISO 14971 indicates that it is based on simplistic risk management: the standard requires that decisions are based on estimates of risk, but the reasons (evidence) behind the risk estimations are not required to be recorded. So, an improvement to the standard could be first to lay a foundation on which resource decisions can be made, and also to recognise the three different methods shown above. In reality, undocumented risk management is widely used; no risk management summary table can really cover all events that may lead to harm, so this would perhaps lead to little practical change, but it would at least provide users of the standard with some justification of what to include in and exclude from a table. The most serious omission is that analytical risk management, with detailed (costly) analysis, is rarely performed. The option should be raised, with responsibility placed on top management to decide when to use it, and with guidance indicating that use should be considered in cases where net benefit could be improved.


Dumbing down: example of unscientific risk evaluation

A relatively simple non-clinical example best illustrates the effect of oversimplification in modern risk management: consider a manufacturer of a medical device such as a diagnostic ECG that uses a thermal printer. When a user needs to replace the paper, they might touch the printer's thermal head, exposing themselves to a burn related hazard. The risk summary table (traceability matrix) might typically include a line such as:

Hazard: burn (hot thermal head) | Sequence of events: user touches head while changing paper | Severity: minor | Probability: remote | Risk: ALARP | Risk control: warning label | Residual risk: acceptable

This table approach is widely used and accepted in medical device regulation. But a close analysis finds it to be missing critical information and to contain questionable decisions.

The first problem here is that the hazard itself has yet to be characterized - yes it’s a burn, but what kind of burn? To know this, we need to know characteristics of the hazard, which are usually not shown in the traceability matrix format. In this case, critical characteristics are the temperature of the thermal head, and also the kind of material (thermal conductivity) since both of these determine the extent of a burn. The traceability matrix does not contain this information, yet it is critical to knowing the severity, and also plays an important role in knowing the effectiveness of any risk controls.

Let's assume that, when questioned, the manufacturer says they think the head can get up to 100°C and the material is metal; thus we can conclude that a moderate burn is possible. Based on published standards, temperatures above 85°C are increasingly dangerous as our touch reflex can no longer be relied on to limit the burn, and if the material is metal the thermal energy can be quickly transferred, increasing the severity of the burn. Overall, we are most likely talking about the potential for a 1st degree burn.
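The reasoning above (temperature relative to the 85°C reflex threshold, plus material conductivity) can be captured as a rough screening rule. This is only a sketch of the paragraph's logic, not a reproduction of any published standard's tables, and the returned categories are invented labels:

```python
def burn_screen(surface_temp_c, material_is_metal):
    """Rough burn-potential screen for brief accidental contact.

    Mirrors the reasoning above: above ~85 degC the touch reflex can no
    longer be relied on to limit contact, and metals transfer heat fast
    enough to worsen the burn. Thresholds and labels are illustrative.
    """
    if surface_temp_c <= 85:
        return "touch reflex likely limits the burn"
    return "burn likely (fast heat transfer)" if material_is_metal else "burn possible"
```

For the example's 100°C metal head this flags a likely burn, whereas the same temperature on a low-conductivity plastic surface would be screened as less severe.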

The second problem is the missing detail of the sequence of events leading to a burn, information we need to estimate the overall probability of harm, which in turn directly impacts the estimated risk. Again this detail is missing from the table. In this case a key piece of information is how often the user must change the paper. The manufacturer replies approximately once per day; thus we can conclude that the user is exposed to the possibility of touching the heater head at least 250 times per year (assuming use on working days).

Although the user is unlikely to touch the heater head every time, it is almost certain that at some point over the course of the year the user will touch it. For this kind of event, it is reasonable to consider units for probability of "events / device / year"; with these units, the probability of the user contacting the heater head over one year is sufficiently close to 1 that it can be considered to be 1, or should be the highest value on a qualitative scale for probability.
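The arithmetic behind this claim is easy to check: even a modest per-change probability of touching the head compounds over 250 exposures into a near-certainty. The 2% per-change figure below is an assumption for illustration only; the conclusion holds for any plausible value:

```python
def annual_probability(p_per_event, events_per_year):
    """P(at least one occurrence over a year), assuming independent events."""
    return 1 - (1 - p_per_event) ** events_per_year

# Assumed: a 2% chance of touching the hot head on any given paper change.
p_year = annual_probability(p_per_event=0.02, events_per_year=250)
# Over 250 changes per year the annual probability exceeds 99%.
```

This is why "events / device / year" units are useful here: the per-event probability looks reassuringly small, while the annual figure is effectively 1.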

A 1st degree burn is not a serious event, but with such a high probability an ALARP risk classification (as low as reasonably practicable) seems odd. Even just considering the cost of dealing with customer complaints about burns, it makes little sense to treat this lightly. Thus, not only are there details missing, something also seems to be wrong with the analysis. It is unclear whether the problem is the probability scheme, the severity scheme or the risk acceptance table, but why the result ended up in the ALARP range needs to be investigated.

The chosen countermeasure is a warning label, which should only be considered if the other options of inherent protection or protection systems have been explored and found to be impractical. In the table there is again no evidence of why these options were not practical. When questioned, the manufacturer explains that it is difficult to design the heater head to avoid the need for contact, but this does not stand up to reason: several possible technical solutions spring to mind which would either reduce the temperature (inherent) or limit access to it (protection).

For residual risk, again, the traceability matrix approach often lacks any information on why the residual risk was deemed to be acceptable. Typically it is expected that with a risk control in place, the probability of harmful events will reduce by a significant amount. The reduction in probability (and hence risk) can be estimated based on the effectiveness of the risk control. For a risk control to be significant it must usually have an effectiveness of at least 90%, that is, enough to move the risk one step in a logarithmic risk acceptability scheme. But it is difficult to see how a label can claim to be 90% effective; a realistic expectation would be around 50%. Given the user is exposed 250 times per year, it remains likely that a burn will occur at least once a year even with the label in place.
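Combining the exposure count with an assumed control effectiveness makes the point concrete. Both the per-contact burn probability and the effectiveness figures below are assumptions for illustration, not measured values:

```python
def expected_events(events_per_year, p_harm_per_event, control_effectiveness):
    """Expected number of harmful events per year after a risk control is applied."""
    return events_per_year * p_harm_per_event * (1 - control_effectiveness)

# Assumed: 250 paper changes/year, 2% chance of a burn per change.
with_label = expected_events(250, 0.02, 0.50)   # 50%-effective warning label
with_strong_control = expected_events(250, 0.02, 0.90)  # 90%-effective control
```

At 50% effectiveness the label still leaves multiple expected burns per year, whereas a 90%-effective control (one step on a logarithmic scheme) brings the expectation below one event per year, which is why the label alone is hard to defend.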

At this point the manufacturer admits the printer unit is actually bought from another manufacturer, and it is widely used in other medical devices. Without saying so directly, the manufacturer is trying to invoke “state of the art” as a justification. Even if a “state of the art” argument is acceptable (which is questionable), again, this needs to be actually written down, and the residual risk acknowledged.

Then the manufacturer says that in fact the heater head is not always hot; it only reaches the higher temperatures if it is run continuously. This is good news, but we need details: how long does it take to reach the hot condition? How often would it be expected to be used this way? What is the overall probability that the head is over 85°C when the paper is changed?

Finally, the manufacturer believes the printer manufacturer has a thermistor in the head, which is monitored by the software and used to limit the internal temperature to 100°C, with the intention of limiting long term damage to the head. They also believe that this control might keep the temperature of accessible parts below 85°C. If this is the case, then the label is not the true risk control; rather, it is the temperature limiting circuit. So this is an even better outcome: we have a protection system which limits the risk. But before we can accept this, we need specifications, verification tests and production controls ensuring that the accessible parts of products in the market do indeed stay below 85°C, as well as consideration of fault conditions.
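If the thermistor-based limiter does exist, its logic is presumably something like the sketch below: the software cuts heater power at the limit and holds it off until the temperature falls back (hysteresis). All thresholds and the sensor/heater interface here are assumed; the real design, its verification, and its behaviour under thermistor faults would need to be confirmed with the printer manufacturer:

```python
# Hypothetical sketch of a software temperature limiter for the print head.
# Both thresholds are assumptions, not the printer's actual specification.
INTERNAL_LIMIT_C = 100.0  # claimed internal temperature limit
RESUME_BELOW_C = 90.0     # hysteresis: stay off until temperature falls back

def heater_enable(sensed_temp_c, currently_enabled):
    """Decide whether the heater may run this control cycle."""
    if sensed_temp_c >= INTERNAL_LIMIT_C:
        return False  # at or above the limit: always cut power
    if not currently_enabled and sensed_temp_c > RESUME_BELOW_C:
        return False  # once tripped, wait for the temperature to drop
    return True
```

Note what this sketch does not address: a failed (open or shorted) thermistor would feed the limiter a wrong reading, which is exactly the fault-condition analysis the text says must still be performed.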

In summary, it turns out that the actual risk control is not the label but the temperature limiting circuit. The manufacturer needs to rework the table information and establish specifications and verification in terms of the temperature of the part that can be contacted, along with production controls if considered necessary. Behaviour of the temperature limiting circuit under fault conditions should also be investigated.