Symposium Paper

SUCCESSFUL TEAM PROGRAMMING

OF A MULTIPROCESSOR REAL TIME PLANT PROTECTION SYSTEM

BY
L. R. ERICKSON

NUCLEAR FUELS BUSINESS UNIT
WESTINGHOUSE IDAHO NUCLEAR CO., INC.
BUILDING CPP-656
P.O. BOX 4000
IDAHO FALLS, ID 83403
(208) 526-3568 FTS 583-3568

WESTINGHOUSE ENGINEERING COMPUTER APPLICATIONS SYMPOSIUM
HOWARD JOHNSON'S HOTEL
MONROEVILLE, PENNSYLVANIA
NOVEMBER 1987

INTRODUCTION

This paper describes the team programming of the Plant Protection System (PPS) of the Fluorinel Dissolution Process (FDP) facility at the Idaho National Engineering Laboratory (INEL), Idaho Chemical Processing Plant (ICPP). Initial testing of the PPS as delivered showed the programming to be inadequate for operation of the plant. A decision was made to correct the programming, and highly qualified and experienced people were hired to accomplish the reprogramming. The outcome of the project is demonstrated by the successful operation of the system. Safety and operability were raised to the level required for startup of the plant in September 1986. The experiences gained in completing the project showed that people were the most important factor in the success of the project. Program design with user input, independent testing, the use of a consultant, good hardware and software tools, and management support were also important factors in successful completion of the reprogramming.

Process Description

The ICPP is a Government facility owned by the Department of Energy (DOE), and is operated under contract by Westinghouse Idaho Nuclear Co., Inc. (WINCO). The ICPP's mission is to recover the unburned uranium from the spent fuel rods removed from Government reactors. The recovered uranium is recycled into new fuel for the reactors. In preparation for the separation and purification process, the fuel rods must first be dissolved in an aqueous solution of nitric and hydrofluoric acids, with various other reagents added to provide the necessary chemical properties. Because of increased demand and the antiquation of the existing facilities (originally built as a pilot plant in the 1950's), a new dissolution plant was required. The Fluorinel Dissolution Process facility has been designed and constructed to handle anticipated demands into the next century.

Plant Protection System Description

The prevention of the accumulation of a critical mass of fissile material is of paramount importance in the dissolution and processing of the nuclear fuel. A computer based system was specified and purchased to ensure safe operation of the FDP. A safety system of this type at the INEL is formally designated as a Plant Protection System (PPS), and is defined as:

"An integrated arrangement of active devices, including sensors, signal conditioners, logic elements, and actuators, that function in conjunction with passive plant structures (and in some instances, operator actions) (a) to prevent unacceptable release or spread of radioactive materials, (b) to mitigate the consequences of exceeding a safety limit, or (c) to prevent plant damage beyond the limits specified in the design basis."

The PPS performs protective functions in the process to prevent exceeding the uranium mass accumulation safety limit, and to assure that reagent nuclear poison (cadmium and boron) levels are maintained within proper limits. The protective action is a trip. A trip is the opening of a contact output which controls a process device, stopping any process activity which has been defined as unsafe.
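The trip decision described above can be sketched in a few lines. This is only an illustration of the idea, not the PPS logic itself: the names `Limits` and `contact_closed`, and every numeric value used with them, are hypothetical and do not come from the FDP design.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical limit set; the real PPS limits are not given in this paper."""
    max_uranium_mass: float
    min_poison_conc: float
    max_poison_conc: float


def contact_closed(uranium_mass: float, poison_conc: float, limits: Limits) -> bool:
    """Return True to keep the contact output closed (process may run).

    Returning False opens the contact, which is the trip.
    """
    mass_ok = uranium_mass <= limits.max_uranium_mass
    poison_ok = limits.min_poison_conc <= poison_conc <= limits.max_poison_conc
    return mass_ok and poison_ok
```

In this sketch either condition alone is enough to open the contact, so a mass excursion and a poison level outside its band both lead to the same protective action.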

The PPS is divided into two redundant subsystems, Divisions A and B, both of which are connected to a single common subsystem as shown in Figure 1. Each division contains two Multiplexers (MUX's) and a Logic Processor (LP). The two MUX's are connected to the LP, and along with the LP provide the interface to the process. Both of the division LP's are connected to the Communication Processor (the single common subsystem). The Communication Processor (CP) provides the operator interface to the PPS and sends data to the Data Processing System Enhanced (DPSE) main FDP process control computer.

Identical process information from separate redundant sources is applied to each division. The MUX's are located in the process area, and obtain most of the process information and provide the contact output trips. The LP's are located in the main control room and obtain the vessel levels and poison monitor concentration values. The LP's provide decision-making trip logic and process calculations. All operator interaction with the PPS is accomplished through the CP via four CRT terminals connected to the CP, with each terminal located in the area appropriate to the actions to be taken on that terminal. The CP also contains a line printer for alarm, status, and advisory messages, a color CRT for alarm and advisory messages, and contact outputs to drive process status indicators.

The PPS was specified and supplied as a complete turn-key system including all system and application software. As supplied by the subcontractor, all units are Multibus based using Intel board level components. The boards selected use the Intel 808x microprocessors, with 8088's in the MUX's and 8086's in the LP's and the CP. The Intel iRMX86 real time operating system was supplied, with application software written in PLM86. All software is in EPROM. The PPS has no mass storage and no on-line programming change capability.

THE PROBLEM

Upon installation of the PPS equipment, assistance was obtained from EG&G Idaho, Inc., the main INEL contractor, in testing the PPS prior to preliminary FDP startup. Many software faults and inadequacies were discovered, but the subcontractor resolved or corrected them well enough that the system could be accepted. DOE then allowed "cold" startup of the plant, using simulated and unirradiated fuel rods to test the chemical and mechanical functions of the process. A maintenance programmer was hired by WINCO to support the PPS during the cold operations. The cold runs resulted in major changes in the process and equipment, and revealed that the PPS software was unreliable and interfered with operation and safety. The experiences of the maintenance programmer also showed that the application software was all but totally unmaintainable.

PERSONNEL SELECTION AND PROJECT PREPARATION

The decision was made to assemble a WINCO team to make the changes required and correct the PPS deficiencies prior to "hot" (that is, dissolution of irradiated fuel) startup of the FDP. Bringing the plant to full operation was very high on the Department of Energy's priorities, and the team would have to be assembled and trained, and would have to complete the task, within 12 months. The assembled team consisted of seven people: a supervisor, a lead engineer/programmer/hardware designer (the author), four programmers, and a verification and validation test expert. The supervisor had been responsible for the PPS Specification and technical contract administration, and the test expert, an employee of EG&G Idaho, had performed the testing of the installed PPS. These were the only people on the team with prior knowledge of the system. With hot startup scheduled for September 1986, WINCO hired one programmer in August 1985, the author in September, another programmer in October, and brought two programmers from EG&G Idaho into the project during October. A consulting contract for on-site help was negotiated with a firm specializing in iRMX86 and PLM86.

Experience

Because of the tight schedule, all personnel were required to have extensive direct experience in real time control computer systems. The test expert, as mentioned previously, had experience on the PPS itself. In addition, she had experience at Ford Motor Company testing microprocessor based ignition and other automotive control systems. Her educational background includes a BS in math and an MS in computer science. She also teaches evening undergraduate programming classes for the University of Idaho. The first hired WINCO programmer had worked on computerized instrumentation and data acquisition for testing conducted at the Power Burst Facility reactor at the INEL. Previously he had worked for NCR in hardware and software integration and analysis. He has a BSEE and has completed course work for an MSEE. The second WINCO programmer has BS and MS degrees in Math. His twenty-two years of programming experience includes scientific programming, microprocessors, communication, and operating systems. Most recently prior to joining WINCO, he had worked on optimizing assembly language embedded firmware for Diablo printers. One of the EG&G programmers has BSME and MSEE degrees. His experience includes extensive work in computerized data acquisition and control hardware and software for nuclear reactors. The other EG&G programmer was pulled from his primary responsibility of networking the INEL computers and workstations to work on the PPS. While he does not hold a recognized degree as such, his work experience and Navy training are equivalent to a BS in engineering. The author has a BSME and is a Registered Professional Engineer. His experience includes eleven years of hardware design and programming of microcomputer and microcontroller based data acquisition and control systems and dedicated microcontrollers.

Training

The first hired programmer was sent to an Intel PLM86 programming course for two weeks in September. He, the two EG&G programmers, and the author attended a two week iRMX86 operating system class during October. The other WINCO programmer attended the operating system class in December. The supervisor had previously attended an on site iRMX86 and PLM86 class given by Intel. The test expert had learned PLM86 and iRMX86 while testing the system.

Development Tools

An Intel System 310 computer had been purchased by WINCO as a development system before the new team members were hired. Full licensing for our use of PLM86 and iRMX86 was also already at hand. Additional development tools were specified and acquired by the first WINCO programmer and the author during the period preceding attendance of the operating system class. The additional development tools included power supplies, card cages, and racks to simulate the PPS for software testing using boards out of spares. A Compaq portable computer and an accessory serial data tap board with software were purchased for debug and testing. Multibus battery backed low power CMOS memory boards were obtained to allow downloading of programs to memory on the simulation system. The boards could then be write protected, physically carried out to the plant, and installed in the PPS for testing, without having to burn EPROMs every time a change was made during debug and test. EPROMs were not burned until final test and startup. Low cost in circuit emulators for 8085 and 8088 microprocessors were purchased for hardware testing and software debugging.

A solid foundation had been built for successful reprogramming of the PPS.

PROJECT DEFINITION AND DESIGN

During the period up to January of 1986, the team programmers became familiar with the development tools, and configured them into a workable system. The supervisor and the author defined the system operational requirements and developed a method of meeting the requirements. The supervisor developed and implemented a Software Change Request (SCR) procedure. This involved filling out an SCR form describing the implementation of a software requirement, after which it was distributed to cognizant user personnel for their review. After the review period, a meeting was held on the SCR to resolve and/or incorporate the reviewers' comments. The SCR was then reissued including the changes for approval. Using this procedure resulted in a maximum of definition and a minimum of last minute major changes in the programming. The SCR's were prepared using simple top level logic flow diagrams to show the operating logic to the reviewers. In this manner the SCR procedure actually was also a top down software design tool that combined user and programmer input to produce both definition and design outputs. This technique was invaluable in meeting the goals imposed upon the team.

The problems in the original PPS programming evidently were a result of not doing preliminary top level design work. It appeared in studying the existing programming and its operation that the capabilities of the hardware and software had not been considered during development. A database manager had been built from scratch in spite of the fact that iRMX86 has built-in data management utilities. The database itself had been defined in a complicated classical file type data processing manner rather than simply using structures. Operating system overhead for context switching and interrupt handling (especially communications interrupts) had not been adequately allowed for, with resulting poor system response and unreliable communications. In addition, from talking with persons familiar with the PPS acquisition, user input came as a response to the programming, not as input to the program design. After thorough review, it was decided that modifying the existing programming was not practicable, and a fresh start was required.

The iRMX operating system features true multi-tasking as well as real-time response. Included in the multi-tasking are data security, inter-task communication, task synchronization, and dynamic memory allocation. These capabilities allowed splitting the project into relatively self contained sections (tasks), each of which could be assigned to an individual with a large degree of isolation from the other tasks. This approach does necessitate, however, that whatever areas of interaction do occur be defined as completely as possible before programming begins. In the case at hand, this meant that the interprocessor and intertask communications protocols and the common data had to be defined and structured. These definitions are what the author refers to as "glue" - that is, they hold the whole program together.

With the top level design and glue at hand, a full day team meeting was held away from the normal workplace to minimize interruptions and distractions. During this meeting, the overall project scope and schedule, the glue, and the task breakdown and functions were discussed. After everything that was known at that point had been communicated to the programmers, individual tasks and schedules were assigned.

Following the kick off meeting, actual programming of the system commenced. Since the Multiplexers were trivial in comparison to the other units, they were programmed first. Emphasis was placed on keeping the programming as simple as feasible.

WORKING AS A TEAM

PLM86 has the necessary features to place public declarations and literals (equivalent to C defines or assembly equates) in include files so that data, labels, constants, and formats can be used in common by all programmers. Libraries are supported as well. People were assigned to maintain the common files on the basis of which file they were most directly involved with. Any changes to a common file were to be made only by the person assigned, and all references to common data and procedures were to be made by including the appropriate declaration or literal file. Local variables, labels, and programming style were left to the discretion of the programmer (although it was stressed that goto's were undesirable and unnecessary). It became apparent as the Multiplexer modules were tested on the simulation system that one programmer was unwilling to conform to even this minimal degree of interference with his programming. It was also discovered that over complication (in both function and form) was a problem with him and another programmer. Fortunately, the other two programmers were superlative performers from both individual and team standpoints.

As work progressed on to the Logic Processor programming, the common include files became even more important in keeping the task modules coordinated. The initial design was not detailed down to every exact public variable and constant name and type. The author feels that in order to design to that low a level, the logic must be detailed to the point that one might as well write the code. In addition to the inter-task variables, critical variables that might normally be kept locally had to be kept public where they could be located in non-volatile battery backed RAM with a CRC-16 checksum maintained. The checksum of public variables is used to see if the data is still valid after a system shutdown. If the data is intact, a warm start is allowed to bring the system up in the same state as when it shut down. If the checksum test fails, the system must be initialized, and this has an adverse effect on plant operation. The common include files, in other words, were critical in maintaining good system design and proper system performance. Especially important were the correct use of structures and referring to variables by their names from the structure. The two superior programmers worked with the others in keeping the common files maintained and coordinated.
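The warm start check described above can be illustrated with a short sketch. The paper does not say which CRC-16 polynomial was used or how the public variables were laid out, so the polynomial (the common reflected form 0xA001) and the three packed fields below are assumptions chosen only for illustration, as are the names `save_state` and `start_mode`.

```python
import struct


def crc16(data: bytes, crc: int = 0x0000) -> int:
    """Bitwise CRC-16, reflected polynomial 0xA001 (assumed, not from the paper)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def save_state(level: int, poison: int, tripped: bool) -> bytes:
    """Pack hypothetical public variables and append their CRC-16."""
    payload = struct.pack("<HHB", level, poison, int(tripped))
    return payload + struct.pack("<H", crc16(payload))


def start_mode(ram_image: bytes) -> str:
    """Return 'warm' if the stored CRC matches the data, otherwise 'cold'."""
    if len(ram_image) < 3:
        return "cold"
    payload = ram_image[:-2]
    (stored,) = struct.unpack("<H", ram_image[-2:])
    return "warm" if crc16(payload) == stored else "cold"
```

An image written by `save_state` and read back unchanged yields a warm start, while a single flipped bit anywhere in the image forces a cold start, which corresponds to the full initialization the text describes.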

As the LP programming neared completion, work was started concurrently on the Communications Processor. Once again, a top level design was done, and task assignment made to correspond where possible to the fit between the LP tasks and the CP tasks. That is, a person would program the CP task that interacted with the task he had programmed on the LP. The common include files were once again used to tie the individual modules together.

Implementation followed a bottom up method. Procedures and tasks were tested and debugged on the simulation hardware as they were completed. As the lower levels are built, modifications to the top level (especially the glue) are required. This "outside-in" iteration process mandates a certain level of communication and good programming practices for successful completion of a team project. While programming is by necessity a solo endeavor, a good programmer uses the practices needed in team programming. That is, good programming includes small independent procedures, using literals for constants, sound data structuring, and hiding or elimination of information not specifically public. Any experienced programmer who has had, first, to debug his or her own code and, second, to go back and maintain his or her (or someone else's) code after six months should automatically write modularized programs. If these practices are followed, using include files is second nature, and interface between tasks is almost trivial. By this definition, a good programmer is a good team programmer.

Another characteristic needed for the real time programmer is the ability to recognize and allow for system capabilities and limitations. The original programming limited all two-way communications to a maximum speed of 1200 baud because of (a) a system interrupt with overhead on each received or transmitted character, (b) processing each character within the interrupt service routine, and (c) using a complicated communications protocol. System performance was further degraded by the protocol effectively limiting communications to less than simplex performance, and by implementing too many messages carrying too much unneeded and/or redundant information. A real time computer person must realize when operating systems and stock computers are not adequate and make modifications when necessary. In this case, intelligent serial communications boards were purchased, and custom firmware was written and installed to offload the sending and receiving of complete messages (including CRC generation and checking) from the main CPU and operating system. The system firmware and software were also not adequate for the terminals used, and were rewritten. By rewriting and installing smart communication boards, system communications were speeded up to 9600 baud full duplex with no communications dropouts. Message format was also simplified, and the overall result was fast and reliable operation.
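The idea of handing the main CPU only complete, already checked messages can be sketched as follows. The actual PPS message format is not given in this paper, so the frame layout here (one byte of message ID, a two byte length, the body, and a two byte CRC) is hypothetical, as are the names `build_frame` and `parse_frame`. The CRC is computed with Python's standard `binascii.crc_hqx`, which uses the CRC-CCITT polynomial; the polynomial used on the PPS boards is not stated.

```python
import binascii
import struct


def build_frame(msg_id: int, body: bytes) -> bytes:
    """Build a hypothetical frame: ID (1 byte), length (2), body, CRC-16 (2)."""
    header = struct.pack(">BH", msg_id, len(body))
    crc = binascii.crc_hqx(header + body, 0)
    return header + body + struct.pack(">H", crc)


def parse_frame(frame: bytes):
    """Return (msg_id, body) for a good frame, or None if it is damaged."""
    if len(frame) < 5:
        return None
    msg_id, length = struct.unpack(">BH", frame[:3])
    if len(frame) != 3 + length + 2:
        return None
    (crc,) = struct.unpack(">H", frame[-2:])
    if binascii.crc_hqx(frame[:-2], 0) != crc:
        return None
    return msg_id, frame[3:-2]
```

In an arrangement like this the communications board would run the equivalent of `parse_frame` itself, so the main processor is interrupted once per valid message rather than once per character.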

Along the same lines, one of the project programmers wrote a task to update the alarm display once per second. Since the new firmware allowed a buffered message length greater than 4000 characters without increased system overhead, he elected to rewrite the screen on every update. As an experienced real time programmer he should have immediately realized that you can't update once per second when your update scheme needs more than four seconds to write the data at 9600 baud.
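The arithmetic behind this observation is easy to check. The sketch below assumes 10 bits on the line per character (one start bit, eight data bits, one stop bit); the paper does not state the actual character framing, and the function names are hypothetical.

```python
def transmit_seconds(chars: int, baud: int, bits_per_char: int = 10) -> float:
    """Time to send `chars` characters over a serial line at `baud` bits/s."""
    return chars * bits_per_char / baud


def max_chars_per_update(baud: int, period_s: float, bits_per_char: int = 10) -> int:
    """Largest message that fits in one update period at the given rate."""
    return int(baud * period_s / bits_per_char)


full_screen = transmit_seconds(4000, 9600)   # about 4.17 seconds
budget = max_chars_per_update(9600, 1.0)     # 960 characters
```

Under these assumptions a 4000 character rewrite takes a little over four seconds, while a one second update period leaves room for only about 960 characters, so only the changed portion of the display can be sent on each update.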

To reiterate and emphasize, good team programmers use what are currently accepted as good programming practices, and good real time programmers work with an awareness of the system's capabilities.

SYSTEM TEST AND DEBUGGING

After the LP programming was completed, and the tasks and modules had been tested and debugged as far as possible on the simulator hardware, the programming was downloaded onto the CMOS memory cards and installed in the PPS in the plant. The test expert had completed her test plan by this time, and was called in to observe, advise, and help. The operating software was installed and running in a relatively short period of time thanks to the testing that had been accomplished on the simulation hardware. System testing under the direction of the test person commenced in June. The testing revealed many serious flaws in the programming, many of which had to be corrected immediately before further tests could continue. By working twelve hour days six days per week, the team brought the system more or less to operating status in time for plant cold testing in July. Twenty-four hour coverage was necessary to support the cold testing, and a twelve hour on, twelve hour off, four day on, four day off schedule with single person coverage was instituted. Operation and concurrent testing during this time showed up even more problems, but the system was kept limping along. The following items became apparent:

1. Exhaustion brought on by long hours and irregular schedule was adversely affecting productivity and quality.

2. Teamwork was almost impossible to maintain with everyone working different hours.

3. Two of the programmers' code was not of adequate quality to be run in the PPS.

A list of deficiencies thus far discovered was compiled, and the above observations were brought to management's attention. In conference with the project manager and the Vice President in charge of the project, a case was made to halt testing, get back on a regular schedule, and fix the programming. This approach was accepted. One of the programmers of marginal code had been previously transferred back to his original job by this time, and the other was now transferred to another organization with needs more in keeping with his interests. The author, the remaining two programmers, and the tester came back together on days and started rewriting major sections of the program. The system was rewritten, installed, tested, and burned into EPROM by the first week of September.

Independent Testing

Ideally the verification and validation test should be done completely independently of the programming. Due to the pressure of time, the test expert allowed this principle to be violated, and provided advice and constructive criticism in the initial design of the programming and during the correction of deficiencies her tests uncovered. This help proved critical in getting the job done, but more importantly she did not allow any "that's good enough" in areas that might later be questioned. She also concurred in the judgement to stop and redo the code, and helped provide input to management to make this critical decision.

The objectivity required to find mistakes and not accept borderline situations cannot be expected of the person who writes the program. Although a good programmer will look for every way possible to fail his or her programming, and fix bugs before continuing on to a new task, a fresh outlook will find problems that the programmer has been too close to see (the forest for the trees syndrome). An independent tester is not so closely involved with the actual logic, and will find things unanticipated by the programmer.

If you want your programming to work, if the programming must work, use a test expert.

USING THE CONSULTANT

During the programming, it was arranged for the consultant provided under the aforementioned contract to spend one week per month for four months working with us. The particular consultant hired had reviewed the PPS during initial testing and was reasonably familiar with the system. Unlike what sometimes happens, he was indeed an expert. His help in configuring the operating system, providing a customizable exception handler, and pointing out the bugs and idiosyncrasies of the PLM compiler was of genuine value. More than once, he would look over someone's programming and point out something that wouldn't work - not because of faulty logic or programming, but because of our compiler, the operating system, or the language. If he didn't have an answer to a question, he had no qualms about calling the other people in his office and checking until he did have the answer.

MANAGEMENT

When it came time to make a difficult decision on fixing the programming, project management listened to and accepted the advice of those on the team. The outward qualifications of the people hired, and the extent of the hardware and software support were outstanding. An entire paper could be written on the support given by management, and that support's role in successful completion. The PPS would not be successfully operating without the high level of backing provided by management.

RESULTS

The PPS has been operating successfully since September 1986. Twenty-four volumes of code with seven unique application programs were reduced to five volumes and three separate programs. Safety and operability were raised to the level required for full plant operation. Reliability was raised to the level that only two (non-critical) bugs have been discovered in eleven months of continuous operation. Maintainability is such that a list of desired changes to be implemented during an upcoming plant maintenance turnaround is expected to take approximately four manweeks to program. The supervisor, one WINCO programmer, and the author received the George Westinghouse Signature Award for Excellence for their work on the project.

CONCLUSIONS

People are the overriding factor in successful team programming. Good programmers have an interest in the job and a personal need to see their work performing a task well. They will complete a project in spite of obstacles. Indifferent or incapable programmers are obstacles. If it becomes apparent that a programmer is of the latter sort, reassign him or her to some other project immediately! Outstanding qualifications do not necessarily mean that a programmer will be suited to a particular project. Only observation of a person's work on the job at hand will show whether or not that person is suitable for the work being done. The delays involved in trying to make poor programming work are longer than writing the program to begin with. In addition (as was the case here), the faulty code may have to be rewritten.

Even if the schedule does not allow for it, do a top level design with user input. Understanding what needs to be done is a must before writing the program. After the design is adequate to begin, start writing from the bottom up. Test procedures and modules as they are developed. Changes made after other parts of the program are dependent upon a module cause grief.

Use a modern modular language such as C, Pascal, PLM, etc. Good programs can be written in Fortran or Basic, just as poor programs can be written in a structured language, but the proper tools make the task easier. The other tools such as a real time operating system, development system, hardware simulation system, etc. must also be provided. It is possible to do a job with inadequate tools, but quality and schedule will suffer.

Use independent testing. A test person not directly involved with the programming will discover unforeseen circumstances that have not been allowed for.

If a good consultant can be found, the cost of retaining him or her will probably be worth it.

Supervision and management support must include respect for the programmers' capabilities and consideration of their judgement.

Do not work extended periods of twelve hour days, seven days a week. Fatigue leads not just to diminishing returns; it leads to negative returns. A section of code with a minor error can easily be turned into a piece of code with a major logical fallacy when "corrected" by a brain dead person.

First and last - people make success or failure.

SUMMARY

The reprogramming of the Plant Protection System was a success due to the people on the project. Most of the procedures commonly accepted as good programming practice were adopted. Support for the project was superlative. If the factors that brought this effort to a successful conclusion are included in a programming job, the results are almost guaranteed to be of excellent quality.

REFERENCES

1. Glasstone, S. and Sesonske, A., Nuclear Reactor Engineering, Van Nostrand Reinhold Company, New York, N.Y. (1967)

2. Westinghouse Idaho Nuclear Co., Inc., ECR-444, Appendix A, PPS Design Guidelines and Requirements (1985)

Work supported by the U.S. Department of Energy Assistant Secretary for Nuclear Energy under DOE Contract No. DE-AC07-84ID12345.

The submitted manuscript has been authored by a contractor of the U.S. Government under DOE Contract No. DE-AC07-84ID12345. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.