A maintenance worker may have accidentally switched off the power supply at the centre of the British Airways IT failure, which caused disruption for 75,000 passengers, it has been claimed.
The investigation into the fiasco is likely to focus on human error rather than any equipment failure, according to The Times, after an internal investigation found that the power supply was working correctly.
Quoting a BA source, the newspaper reported a rumour that a contractor doing maintenance had inadvertently switched the supply off, although this has not been confirmed.
The chaos was caused by a sudden power loss at BA’s two main data centres last Saturday, May 27. The problem was then worsened by an uncontrolled reboot, which shut down the entire IT system. All information about flights, baggage and passengers was lost, and travellers were left stranded over the bank holiday weekend, with at least 700 flights cancelled at Heathrow and Gatwick.
Bill Francis, Head of Group IT at BA’s owner International Airlines Group (IAG), sent an email to staff, seen by the Press Association, which confirmed that the shutdown had not been caused by an IT failure or software issues.
His email revealed that the investigation so far had found that an Uninterruptible Power Supply to a core data centre at Heathrow was overridden on Saturday morning.
He said: “This resulted in the total immediate loss of power to the facility, bypassing the backup generators and batteries. This in turn meant that the controlled contingency migration to other facilities could not be applied.
“After a few minutes of this shutdown of power, it was turned back on in an unplanned and uncontrolled fashion, which created physical damage to the system, and significantly exacerbated the problem.
“This was entirely a problem relating to the power supply. It was not an IT failure, and there were no software issues.
“The fix consisted of physically replacing servers that had been damaged, then bringing all of BA’s 700-plus applications back online in a controlled fashion while ensuring that all data was consistent across the system. All of the systems are now back up and running.”