Kepco Power Solutions
718-461-7000

FAULT TOLERANT POWER SYSTEMS

by: Paul O'Boyle, Senior Design Engineer, Kepco, Inc.

Recognizing the Need. The growing dependence of our society upon electronic data processing has created a need for continuously operational systems. The original applications for this level of performance (financial institutions, air and rail transportation, telephone systems, etc.) have been joined by requirements from almost every area of business. Local area networks, process controls, daily business transaction recording and many other systems have the potential to suffer substantial loss if system operation is disrupted for any reason. A failed burn-in process for microcomputer ICs could cost many thousands in lost revenue. Downtime on a computer which provides credit card authorizations could translate to a substantial loss in sales.

Fault Tolerance as a Solution. The solution for these and the many other applications that need dependable operation is to design fault tolerance into as many portions of the system as possible. Here we will discuss the implementation of fault tolerance in the power conversion portion of such systems. Fault tolerance has existed in software and in digital hardware for a number of years, but there has been a growing realization over the past decade that the reliability of any electronic system is dictated first by the reliability of the power conversion system. All modern electronic systems require some form of power conversion to convert the source power delivered by the a-c mains to isolated, regulated and conditioned low-voltage d-c power. Any interruption of this power conversion process can result in immediate and unexpected loss of system operation. The need to minimize this potential failure mode has resulted in the development of fault-tolerant power systems.

Common Traits. A fault-tolerant power system is one composed of multiple power converters configured so as to maintain the integrity of the output power bus in the event of any one single-point failure within the power conversion system. Fault-tolerant power systems can be designed in a variety of configurations, but all share the following common traits:

  • Sufficient capacity to sustain power bus operation in the event of any single-point power system fault (redundancy);
  • Ability to isolate and localize a failure to a single replaceable module (fault isolation and detection);
  • Design which permits extraction of the faulty module and insertion of a replacement without interruption of the power bus (on-line replacement, or "hot-swap").
A review of these traits is provided along with some insight as to the tradeoffs which system designers must weigh in matching the appropriate power conversion system to the application.

Much attention has been given by power supply designers in recent years to redundancy issues. The simplest implementation of redundancy is to operate multiple power converters in output-parallel configuration with sufficient load current capacity (ampacity) to support the load in the event of the loss of one or more converter modules (see Figure 1). The power converters must provide a constant-current or "rectangular" type of overload characteristic, since load current delivery is based on the effective output voltage setting of each power converter in descending priority and thus only one power converter is actually operating in voltage-stabilized mode. The chief disadvantages of this technique are unequal operating stress which leads to increased power converter failure rates, and degraded transient recovery in the event of a power module failure.
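As a rough illustration of the arithmetic behind this configuration, the following Python sketch sizes an N+1 system and shows how load current is taken up in descending order of output voltage setting when no forced sharing is present. The function names, the 20 A module rating and the 45 A load are hypothetical values chosen for illustration; they are not taken from any particular product.

```python
import math


def modules_required(load_amps: float, module_amps: float, spares: int = 1) -> int:
    """Modules needed to carry the load (N) plus the chosen number of spares."""
    if load_amps <= 0 or module_amps <= 0 or spares < 0:
        raise ValueError("load and module rating must be positive, spares >= 0")
    n = math.ceil(load_amps / module_amps)
    return n + spares


def priority_delivery(load_amps: float, current_limits: list[float]) -> list[float]:
    """Current delivered by each module without forced load sharing.

    current_limits is ordered from the highest output voltage setting to the
    lowest. Each module runs at its constant-current limit until the remaining
    load falls below that limit; that one module regulates the bus voltage and
    any modules after it deliver nothing.
    """
    remaining = load_amps
    delivered = []
    for limit in current_limits:
        amps = min(limit, remaining)
        delivered.append(amps)
        remaining -= amps
    if remaining > 1e-9:
        raise ValueError("load exceeds total system capacity")
    return delivered


if __name__ == "__main__":
    count = modules_required(45.0, 20.0)               # 3 for the load + 1 spare
    print(count)
    print(priority_delivery(45.0, [20.0] * count))     # two at limit, one regulating, one idle
```

With these assumed numbers, two modules sit continuously at their current limit while the fourth is idle, which is the unequal operating stress described above.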

Load Sharing. Nearly all new power supply designs incorporate special circuitry to provide forced load sharing, either passive or active (see Figure 2), and current-stabilized overload protection to optimize their use in parallel-redundant power system applications. Load (or current) sharing permits an even distribution of the system load current among multiple output-paralleled power converters. This results in lower operating temperatures and reduced failure rates as well as improved response time. The cost and complexity of the actual circuitry is low; some manufacturers now provide dedicated control circuits which allow implementation of load sharing with existing power converters using external circuitry. An example of this is the Kepco FCS board assemblies, which will add forced load sharing to any power supply equipped with remote error sensing.
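The effect of forced load sharing on module stress can be sketched with simple arithmetic. The Python fragment below assumes ideal (perfectly equal) sharing, which real sharing circuits only approximate; the function names and the example of four hypothetical 20 A modules carrying 45 A are illustrative assumptions.

```python
def share_per_module(load_amps: float, n_modules: int) -> float:
    """Current per module under ideal forced load sharing."""
    if n_modules < 1:
        raise ValueError("at least one module is required")
    return load_amps / n_modules


def utilization(load_amps: float, n_modules: int, module_amps: float) -> float:
    """Fraction of each module's rating in use under ideal sharing."""
    return share_per_module(load_amps, n_modules) / module_amps


if __name__ == "__main__":
    load, rating = 45.0, 20.0
    # All four modules healthy, then one module lost.
    for active in (4, 3):
        amps = share_per_module(load, active)
        pct = 100.0 * utilization(load, active, rating)
        print(f"{active} modules: {amps:.2f} A each, {pct:.0f}% of rating")
```

Under these assumptions every module runs well below its rating in normal operation, and the surviving modules still have headroom after a single failure.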

Source Power Loss. Decisions regarding redundancy are not limited to power conversion issues. True fault-tolerant power systems should address the possible loss of source power as well as loss of power conversion. Indeed, many fault-tolerant power systems require separately generated and protected power sources for each of the multiple power converters used to generate the d-c power bus. Others use either on-line or off-line uninterruptible power sources (UPS) with battery- or generator-backup in the event of primary power loss. Still others, most notably telecommunication (telcom) systems, use a distributed power architecture consisting of a combination of all of the above applied to both source and load circuits.

Life Cycle Costs. The burden of these additional protective functions (battery chargers, maintenance, wiring, etc.) adds significant life-cycle cost which the system designer must weigh against the actual performance required when determining which protection to specify. For instance, use of an on-line UPS for source power redundancy requires consideration of the inrush start-up current of the power converters, while specification of an off-line UPS requires specifying the correct relationship between output ride-through time and UPS transfer time to preserve power bus integrity. Batteries create their own overhead burdens in the form of maintenance, charging requirements and environmental considerations.

"Hot Swap" Considerations. The recent proliferation of "hot-swap" power systems indicates a growing need for continuous system operational readiness. This requirement embodies several related functions involving human engineering as well as power bus performance. Issues such as module form factor and weight, connector insertion/extraction force and module retention mechanisms are typical intangibles which enter into the equation. The use of self-aligning (blind-mate) connectors with integral or separate mechanical keying mechanisms to prevent insertion of an incorrect replacement module is an additional factor. The ultimate goal is to enable transparent replacement of a faulty module, that is, with minimal disturbance to the power bus. The most common standard is to limit the bus transients induced by the replacement actions to the amplitude and recovery times normally associated with step-load response, although the specific system requirements will dictate to the system engineer the allowable disturbance levels.

Fault Detection/Isolation. The most critical functions of any fault tolerant power system are fault detection and fault isolation. Fault detection is the ability to accurately and consistently identify and localize a failure to a specific replaceable module, while fault isolation, as the words imply, isolates the system from any adverse effects of the failure. These functions are the basic elements of any fault-tolerant power system. Properly designed, they will maintain power bus integrity; without them, no amount of redundancy or replaceability will salvage system operation.

The fault detector function involves a complex interrelationship between the power converter modules and the power system itself. By nature, a properly designed fault-tolerant power system will endure the failure of one or more elements with no apparent effect on the power bus, yet the power system must be capable of detecting and localizing this failure to a single replaceable subassembly without the benefit of direct observation of the output. This requires simultaneous measurement of multiple parameters, both internal and external, and interpretation of their combined values to determine if the power converter is operating properly. The following examples are offered to illustrate the difficulties involved.

Consider the method for detecting an output-low failure. For a simple, non-redundant power system employing a single power converter, the fault detector need only monitor output voltage (or current, in current-stabilized applications) to determine if the output is operating within specification. Any power bus fault must be the result of failure of the one and only power converter in the system. In the case of the simplest redundant power system, that of two output-paralleled power converters, the task becomes much more complex. Assuming that N+1 redundancy is provided, if the output of one of the two power supplies fails low, the second power supply will continue to support the load, hence no output fault is present. One of the power supplies has failed, however, and the fault detector design must be capable of determining which of the two power converters is defective so that the power system can be serviced. The problem intensifies when three or more power converters comprise the power system.

Several methods can be implemented to address this problem, each with its own disadvantages. The most direct method is to insert a diode in series with each output between the power converter output and the power bus itself, and to monitor the output of the power converter itself: in the event of a low-output fault, the diode blocks the power bus voltage from forcing the power converter output high, and the fault detector of the defective converter can now measure and report the output failure.

There are problems with this approach, however. The series diode introduces a significant power dissipation penalty, since all of the load current drawn by the power bus flows through these diodes. They are therefore normally quite large and expensive and in most applications require some amount of heat sinking. These diodes are essential for on-line replacement applications. If the system does not demand on-line replacement of power modules, only redundancy and fault indication, then this is a tremendous efficiency burden to carry. Furthermore, if the diode fails shorted, as is most common in these applications, the fault detector now monitors the power bus directly and can no longer detect low-output failures. Additional circuitry can be employed to monitor the voltage drop across the diode in order to detect a shorted device. However, the circuitry must be capable of distinguishing voltage drops of the same order of magnitude as the output ripple voltage, and in applications where power bus load current varies significantly this technique can be very inconsistent.
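The size of the dissipation penalty follows directly from the diode forward drop. The short Python sketch below estimates the total diode loss and the resulting efficiency penalty; the forward-drop figures (0.5 V and 0.7 V) and the bus voltages and currents are assumed round numbers for illustration, and the drop is treated as constant with current, which is a simplification.

```python
def diode_loss_watts(load_amps: float, forward_drop_volts: float) -> float:
    """Total power dissipated in the blocking diodes.

    All load current passes through the diodes, so the total loss depends
    only on the bus current and the (assumed constant) forward drop.
    """
    return load_amps * forward_drop_volts


def efficiency_penalty(bus_volts: float, forward_drop_volts: float) -> float:
    """Fraction of converter output power lost in the diode."""
    return forward_drop_volts / (bus_volts + forward_drop_volts)


if __name__ == "__main__":
    # Hypothetical 5 V, 100 A logic bus with an assumed 0.5 V diode drop.
    print(diode_loss_watts(100.0, 0.5))              # 50.0 W to be heat-sunk
    print(f"{efficiency_penalty(5.0, 0.5):.1%}")     # roughly 9% of output power
    # Hypothetical 48 V bus with an assumed 0.7 V drop: far smaller penalty.
    print(f"{efficiency_penalty(48.0, 0.7):.1%}")
```

The comparison suggests why the penalty is most painful on low-voltage, high-current buses.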

Similar problems exist for output-high (overvoltage) failures of the power converters. Consider again the case of two power converters operating in N+1 redundancy, now with output blocking diodes installed. If one converter fails output-high, the second converter senses an overvoltage condition and stops delivering output power, thereby avoiding the pitfall of having all of the output-paralleled power converters follow the defective module into overvoltage (often termed "selective overvoltage"). The problem is that both power converters now show an output fault. If the output-high fault generates a power converter shutdown, the second converter will recover and the fault signal will be valid. If overvoltage shutdown is not achieved, however, either by design omission or failure characteristic, the system operator will not be able to determine which power converter has failed.

A better way is for the fault detector to monitor both the power bus voltage and the current delivered by each power converter, and to determine whether or not each power converter is operating properly based on logical analysis of these two readings. Fault indications are then only issued for conditions which indicate abnormal module operating conditions, significantly improving the accuracy of the detector circuit and negating the need for the blocking diode except in "hot-swap" applications. This method is not entirely foolproof, since it cannot detect shorted blocking diodes nor does it eliminate their need in on-line replaceable power systems; yet it is the most complete and accurate method presently available to determine operating status of output-parallel power converters while on-line, and represents only a modest increase in circuit complexity.
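One way to picture the logical analysis of bus voltage and module current is as a small decision table. The Python sketch below is an assumed set of rules written for illustration only; it is not the circuit logic of any specific product. The thresholds (v_min, v_max, i_min, i_limit) are hypothetical parameters, and the rules assume forced load sharing with enough load that every healthy module delivers at least i_min.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    MODULE_FAULT = "module fault"
    BUS_OVERLOAD = "bus overload"


def classify(bus_volts: float, module_amps: float, *,
             v_min: float, v_max: float,
             i_min: float, i_limit: float) -> Status:
    """Judge one module from the shared bus voltage and its own output current."""
    if bus_volts > v_max:
        # Bus overvoltage: a healthy module backs off to zero current, so a
        # module still sourcing current is the one driving the bus high.
        return Status.MODULE_FAULT if module_amps >= i_min else Status.OK
    if bus_volts < v_min:
        # Bus low: a module at its current limit is doing all it can, so the
        # problem lies with the load; anything less points at the module.
        return Status.BUS_OVERLOAD if module_amps >= i_limit else Status.MODULE_FAULT
    # Bus in range: a module contributing no current has failed low, even
    # though the bus itself looks normal.
    return Status.OK if module_amps >= i_min else Status.MODULE_FAULT


if __name__ == "__main__":
    limits = dict(v_min=4.75, v_max=5.25, i_min=1.0, i_limit=20.0)
    print(classify(5.0, 10.0, **limits))   # normal sharing
    print(classify(5.0, 0.0, **limits))    # failed low, masked by redundancy
    print(classify(5.6, 12.0, **limits))   # driving the overvoltage
    print(classify(5.6, 0.0, **limits))    # backed off correctly
```

Note that in the overvoltage case only the offending module is flagged, so the operator is not left with two simultaneous fault indications.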

The issues affecting fault-tolerant power system design and selection are often a result of the basic performance required by the power bus. An example of this is power bus overload/short circuit protection. In conventional single-converter power systems, the maximum overload current delivered to the power bus in the event of a load failure is determined by the power rating of the converter and/or adjustment of the maximum current limit value. The use of high-redundancy power systems (N+2, N+3, etc.) creates special handling problems, especially in telecommunications applications where the power converter must operate in both voltage- and current-stabilized output regulation modes. The concept of excess capacity becomes a dangerous problem if the power bus is shorted and all of the power converters now deliver their maximum output current through the system's load wiring. Significant thermal damage and even insulation fires are possible in this event unless the system engineer recognizes the danger.
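The scale of the hazard is easy to estimate: into a short circuit, every paralleled module delivers its full current limit. The Python sketch below uses assumed figures (five hypothetical 20 A modules in an N+2 arrangement, each limited at 22 A, feeding wiring rated for 75 A) purely to illustrate the check; the function names are likewise illustrative.

```python
def max_fault_current(n_modules: int, current_limit_amps: float) -> float:
    """Worst-case bus current when every module is in current limit."""
    return n_modules * current_limit_amps


def wiring_is_adequate(wire_ampacity_amps: float, n_modules: int,
                       current_limit_amps: float) -> bool:
    """True if the load wiring can carry the worst-case fault current."""
    return wire_ampacity_amps >= max_fault_current(n_modules, current_limit_amps)


if __name__ == "__main__":
    # A 60 A load needs N = 3 modules; N+2 redundancy makes five in total.
    print(max_fault_current(5, 22.0))            # 110.0 A into a shorted bus
    print(wiring_is_adequate(75.0, 5, 22.0))     # False: wiring sized for the load only
```

Wiring sized comfortably for the 60 A load in this example would be asked to carry nearly twice that current during a bus short.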

Solutions include distributed load protection devices (fuses, circuit breakers, thermistors, etc.) and sizing of load wiring based on the maximum possible current delivery of the power system. Many power converter designs include either fixed or optional timeout circuits as part of the overcurrent protection circuitry which shut down the power converter after a time period of 10-30 seconds, on the assumption that long-term overloads represent major load problems and that the system has already been compromised. This is not a viable option for power converters supporting battery-based power buses such as are used in many telecommunication applications, as long-term current-stabilized operation is a normal operating condition.
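The behavior of such a timeout circuit can be modeled as a simple timer that accumulates time spent in current limit and latches a shutdown once the timeout is reached. The Python sketch below is a behavioral model under assumed rules (the timer resets when the overload clears, and shutdown latches until reset); the class name, the 20-second default and the enable flag are hypothetical, with the flag standing in for disabling the timeout in battery-bus applications.

```python
from dataclasses import dataclass


@dataclass
class OvercurrentTimeout:
    """Behavioral model of an overcurrent shutdown timer (assumed rules)."""
    timeout_s: float = 20.0    # hypothetical value within the 10-30 s range
    enabled: bool = True       # False models continuous current-limited operation
    elapsed_s: float = 0.0
    latched: bool = False

    def step(self, in_current_limit: bool, dt_s: float) -> bool:
        """Advance the model by dt_s seconds; return True if shut down."""
        if not self.enabled:
            return False       # timeout disabled: never shuts down
        if self.latched:
            return True        # stays off until externally reset
        if in_current_limit:
            self.elapsed_s += dt_s
            if self.elapsed_s >= self.timeout_s:
                self.latched = True
        else:
            self.elapsed_s = 0.0   # overload cleared: start over
        return self.latched


if __name__ == "__main__":
    timer = OvercurrentTimeout()
    states = [timer.step(True, 1.0) for _ in range(20)]
    print(states[18], states[19])   # still running at 19 s, shut down at 20 s

    charger = OvercurrentTimeout(enabled=False)
    print(any(charger.step(True, 1.0) for _ in range(3600)))   # never trips
```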

Available Products. Modern power converter designs incorporating many of the features discussed above are available from several manufacturers, among them HC Power (HC1010 Series), Lambda/Qualidyne (MPS Series) and Kepco, Inc. (HSP Series). All represent products specifically designed for fault-tolerant power systems used in the international marketplace. They include such features as wide-range (universal) input with power factor correction, internally-mounted output isolation diodes, forced load sharing circuitry, blind-mate connectors and fault detector circuitry with both visual and electrical indicators. The Kepco HSP Series logical fault detector with selective overvoltage shutdown provides accurate fault detection and fault isolation both with and without the optional isolation diode. The current limiting circuit includes a switch-selectable 20-second lockout timer to give the user the option of continuous current-stabilized operation (as for battery charger applications) or delayed shutdown for load/wiring protection if required. Another switch-selectable function activates a "current walk-in" circuit to provide slow output current rise rates, a requirement of Bellcore-type telcom battery rectifiers.

Other features of the Kepco HSP Series power converters include remote analog programming of both output voltage and current limit regulation levels, Bellcore-style signal outputs (isolated Form-C relay contacts) and an isolated remote inhibit circuit which includes TTL-level inputs for both positive and negative logic. The inhibit circuit operates from a separate 5V supply which is available to the user for loads up to 100 mA. Kepco rack adapters are available for both plug-in and fixed applications. Kepco also makes a version of these power converters called HSM. The HSM power converters incorporate many of the features of the HSP except that they are designed for fixed installation rather than pluggable applications. The elimination of visual indicators, extraction handle and retention mechanisms results in a size reduction, specifically in module length. The modules are about 15% shorter.

Future Considerations. As the world's dependence on electronic data and control increases, fault-tolerant power system applications will continue to expand both in size and performance requirements. Just as with distributed power architectures, the change will be evolutionary rather than revolutionary. Future improvements should include greater intelligence in the area of fault isolation, with feedback from all areas applied to a central controller which will have authority to reconfigure the overall system "on the fly". Multiple redundant power converters will be held in stand-by mode, to be brought on-line or off-line as power bus load conditions warrant. Bus voltage should be controllable to optimize the operation of the system based on external influences (temperature, source power conditions, etc.).

BIOGRAPHICAL INFORMATION

Electrical Engineer specializing in new product development
Polytechnic Institute of Brooklyn, BSEE program
Design experience including both military and industrial power converters
Presently Engineering Group Leader for Switchmode Power Supply Development

Volume 7, No. 1.

KEPCO, INC. • 131-38 SANFORD AVENUE • FLUSHING, NY. 11355 U.S.A.
TEL (718) 461-7000 • FAX (718) 767-1102
www.kepcopower.com • email: hq@kepcopower.com