BlackBerry smartphone users in North America recently experienced one of the longest and most significant BlackBerry outages in recent years. BlackBerry-maker Research In Motion (RIM) appears to have resolved the problem, and service is mostly restored. Though the company was tight-lipped on potential causes yesterday, I just got the following statement from RIM:
“A service interruption occurred last week that affected BlackBerry customers in the Americas. Message delivery was delayed or intermittent during the service interruption. Phone service and SMS services on BlackBerry smartphones were unaffected. Root cause is currently under review, but based on preliminary analysis, it currently appears that the issue stemmed from a flaw in two recently released versions of BlackBerry Messenger (versions 184.108.40.206 and 220.127.116.11) that caused an unanticipated database issue within the BlackBerry infrastructure. RIM has taken corrective action to restore service.
“RIM has also provided a new version of BlackBerry Messenger (version 18.104.22.168) and is encouraging anyone who downloaded or upgraded BlackBerry Messenger since December 14th to upgrade to this latest version which resolves the issue. RIM continues to monitor its systems to maintain normal service levels and apologizes for any inconvenience to customers.”
So it looks like the two recent BlackBerry Messenger updates are the culprits behind yesterday's BlackBerry service outage. If you're running either of the problem versions, you should update your software immediately via BlackBerry App World or at BlackBerry.com/Messenger using your BlackBerry Browser.
The outage is the second major North American BlackBerry service disruption for RIM and its customers in a week's time. The occurrence of two major BlackBerry outages so close together is uncommon, and RIM typically prides itself on near-perfect uptime statistics. I can't help but wonder if the first outage earlier this week is somehow connected to yesterday's fiasco, but RIM isn't providing any additional information at this point.
I had suspicions the outages were at least related to BBM. It's very unusual for RIM to release two updates to one of its core applications so shortly after one another (the problem versions were released within days of each other), and when I saw the company had issued yet another update yesterday, BBM v22.214.171.124, I was able to connect the dots. RIM's statement this morning confirmed my suspicions.
Check out the following information from BlackBerry-server monitoring-software firm BoxTone for a detailed timeline of yesterday's BlackBerry outage:
* “Between 3:00 and 4:00 PM EST – Problems with BBM and BIS internet browsing reported around the web.
* “Between 6:30 and 7:00 PM – The problem extended to BES email, preventing the delivery of BES emails to and from BlackBerry smartphones. At each of our customers, BoxTone detected a greater than normal quantity of users with messages pending, based on our learned baseline of what is normal for each server and carrier, and immediately generated a warning alert [to] our customers before the flood of user calls. BoxTone also placed all affected BES and Carriers in a Critical state on our customers' Operations Dashboards (depicted by the red dots next to each BES and carrier). The steady growth in Pending Messages beginning around 6:45 continued until the issue was resolved early this morning. From our monitoring data, it appears that BES were able to communicate with the RIM NOC throughout the outage; however, the NOC was unable to deliver messages.
* “At approximately 12:09 AM, BoxTone detected a brief disconnect in the SRP connection of each BES to the NOC; it appears RIM reset the NOC SRP connection to complete their fixes. Following this reset, delivery of BES mail resumed.
* “By 2:45 AM or earlier, BoxTone detected that most of our customers had returned to their normal (baselined) service levels, and that the backlog of pending mail had been delivered. BoxTone generated notifications informing our users that their service levels had returned to normal and updated the status of the BES and carriers to Normal.”
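BoxTone doesn't say how it learns its baselines or decides when a server goes from Normal to Critical, but the general idea in the timeline above, comparing the current count of users with pending messages against what is normal for that server, can be sketched roughly as follows. This is only an illustration under my own assumptions: the function name `classify_pending`, the use of a mean and standard deviation as the "baseline," and the two threshold values are all hypothetical and are not taken from BoxTone's product.

```python
from statistics import mean, stdev


def classify_pending(history, current, warn_sigmas=2.0, crit_sigmas=4.0):
    """Classify a server's pending-message count against its own history.

    history: past counts of users with messages pending (the "baseline").
    current: the latest observed count.
    The sigma thresholds are illustrative assumptions, not BoxTone's values.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")

    baseline = mean(history)
    # Floor the spread at 1.0 so a perfectly flat history does not
    # cause a division by zero or alerts on tiny fluctuations.
    spread = max(stdev(history), 1.0)

    # How many standard deviations above normal is the current count?
    deviation = (current - baseline) / spread

    if deviation >= crit_sigmas:
        return "Critical"
    if deviation >= warn_sigmas:
        return "Warning"
    return "Normal"


if __name__ == "__main__":
    normal_evening = [10, 12, 11, 9, 13, 10, 12]  # made-up sample data
    for count in (12, 15, 40):
        print(count, classify_pending(normal_evening, count))
```

With a scheme like this, a steady climb in pending messages such as the one BoxTone describes starting around 6:45 PM would first trip the warning threshold and then the critical one, and the status would fall back to Normal once the backlog drained.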