NASA learning: the need to overcommunicate

During STS 51-I, a minor incident in which procedures were performed out of order, followed by multiple people failing to speak up, turned into a serious situation.  On that flight, the Shuttle carried two communication satellites using Payload Assist Modules (PAM).  The PAM is a rocket that pushes the satellite from the Shuttle’s 160-nautical-mile-high orbit to an orbit that is 22,000 miles high.  The PAM/satellite combination sits in a sturdy cradle in the Shuttle’s payload bay.  A sunshield covers each satellite to keep its temperature stable.  The sunshield is so flimsy that it must be launched in the open position so that it has sufficient support.

During launch, the Remote Manipulator System (RMS – the robotic arm) is rolled inward toward the payload bay.  Some time after the payload bay doors are opened, the RMS is rolled outward.  There is a camera mounted on the elbow joint of the RMS.  It is very close to the open PAM sunshield, so the camera is oriented during launch to avoid contact.  If the camera is used before the RMS is rolled out, it could contact the sunshield.  There is a Flight Rule that sets the policy for its use.

Shortly after the payload bay doors are opened, a health check is performed on each satellite to ensure that it survived the launch environment.  After that check, the satellite is powered down and the sunshield is closed.  The satellite has a small antenna that extends soon after it is deployed from the Shuttle.  That antenna has several system features to keep it from being able to deploy under the sunshield.  One of those safety features is enabled by removing power from the satellite.

On STS 51-I, after the payload bay doors were open, the crew forgot to perform the satellite health check and closed the sunshield.  The Mission Controllers noted this mistake.  This led to several discussions about how to handle the situation.  The team decided that they would like to perform the health check.  The Payload Officer asked everyone involved if there was any reason not to reopen the sunshield and perform the health check.  Everyone agreed that there was no reason not to reopen the sunshield to perform the health check.  The Flight Director agreed and Capcom passed the instructions to the crew. 

The crew opened the sunshield and then reported a problem to Houston.  When the crew reopened the sunshield, it caught on the elbow camera mounted on the RMS, was bent substantially, and stuck midway between open and closed.

The crew said that a crewman had noticed the sunshield was jerky as it closed.  He didn’t report it earlier because he didn’t recognize the jerkiness as being out of the ordinary; he had other duties on the flight and hadn’t been trained on the PAM.

Everyone in Mission Control recognized the seriousness of the situation and the need to assess it without making it any worse.  Several things immediately came to mind.  The sunshield might be sticking up high enough to interfere with closing the payload bay doors.  The satellite’s temperature would have to be managed to keep its propellant from freezing.  If it froze, the propellant lines would burst, allowing the fuel to contaminate the payload bay and creating a fire hazard, especially during reentry.  If the team couldn’t find a way to keep the sun from shining more than a few minutes on the solar arrays, the overheating would destroy the arrays and ruin the satellite.

It might be possible to use the robot arm to push the sunshield out of the way to allow the deployment.  If that didn’t work, a spacewalk might be needed to push the sunshield out of the way.  Dealing with the errant sunshield might necessitate canceling the other mission objectives, including deploying two other satellites as well as the planned repair of an ailing satellite already in orbit.

The decision was made to use the RMS to push the sunshield fully open.  When the crew unlatched the RMS and started to work on the sunshield, the RMS’s computer control failed.  Normally, the arm is controlled by two joysticks whose point of reference can be changed among several modes.  This automatic feature failed.  The crew then had to manually move each joint one at a time in a very tedious manner.  To their great credit, they skillfully pushed the sunshield to the fully open position so that it didn’t interfere with the satellite.

Because the sunshield was stuck in the open position, the satellite had to be deployed within a few hours instead of waiting until the next day.  The second satellite using the other PAM was already scheduled for deployment on the first day of the flight.  We slightly modified the procedures and successfully deployed two satellites from the Shuttle in one day, a first.

During the deep dive discussion after the sunshield was stuck, it was announced that a controller in the back room had seen higher motor currents as the sunshield initially closed than had ever been seen on previous flights.  He discussed this with his McDonnell Douglas leader and lobbied to tell the larger team.  The leader decided to not pass this observation to the larger team because a) the currents were within limits and b) there was no other reason to think anything might be wrong.  When the crew forgot the health check and the Mission Controllers were asked by the Payload Officer if there was any reason to not open the sunshield and perform the health check, the McDonnell Douglas leader should have then mentioned the higher motor currents.  That would have led to more discussions and the crew probably would have mentioned the jerky sunshield closure.  That would have avoided the whole series of subsequent problems. 

After the flight, the crew was asked publicly about the incident.  They blamed it on “procedural traps” in their checklist.  Commander Joe Engle also blamed a changing manifest that did not give them time to train and understand the nuance of the camera/sunshield relationship. 

The procedures that they used to perform the health check and to close the sunshield were technically my procedures, but they were published in another person’s book for convenience.  The crew named that book publicly, causing a lot of uncalled-for professional embarrassment.

The changing manifest had nothing to do with this crew error.  While they had several manifest changes, this particular arrangement of satellites in the payload bay that could interfere with the camera had been the same on two previously assigned manifests for this crew.

When I left NASA, my colleagues presented me with a “Mike Lounge” chair embedded with dozens of mouse traps labeled as “Procedural Traps”.  We all got a big laugh out of that. 

Lessons Learned

1. Overcommunicate.  Make sure that everyone who needs to know is told what has happened and what is going to happen.  Don’t assume they know.  (I’m going to break left and you break right.  I’m going to pull onto the runway now.  I’m going to pressurize the pipe with steam.  I’m going to turn the electricity onto this circuit.  I’m going to use data from the new IT system to make operational decisions beginning tonight.)

2. Listen to the young guys.  They may know more about an item than the experienced person.

3. Listen to the outsider.  He may notice something that the experts didn’t see.

4. Don’t worry about being embarrassed.  It’s better to look a little foolish than to actually be very foolish.

5. Leaders should not shirk responsibility and should make damn sure that they never blame the blameless.  While I think the world of Joe Engle, he made a big leadership error on this matter.