This post is a follow-up to my March 20, 2013 post "Boulder Community Hospital computer system crash: Either you're in control of your information systems, or they're in control of you".

In a March 24, 2013 Denver Post article, "Boulder Community Hospital computer records back on line", the following statements are made:


The computer system that Boulder Community Hospital uses to manage patient records, which had been down for almost two weeks, is now up and running again, hospital officials said Saturday.

Meditech, the system used by the hospital to manage patient records, went down March 12 and affected the hospital, its Foothills campus, eight laboratories and six imaging centers. It was put back into full service at about 3 p.m. Friday, according to hospital spokesman Rich Sheehan.

Sheehan said an investigation showed the outage was a result of a malfunction in one of the main computer servers ... the hospital has replaced the hard drives for the server that failed and are inspecting the remaining servers ... [the failure] resulted in the system being unable to access patient information. The malfunction affected both the primary server and a backup server kept off-site.


A hard drive failure led to a two-week outage of an entire EHR system and its off-site backup server? A mission-critical system in a hospital is so fragile that a hard drive failure caused a two-week outage?

If so, that by itself shows, at best, poor overall system design with regard to reliability and redundancy (any server worth its salt keeps its hard drives in a failure-tolerant configuration, e.g., RAID). The explanation is also not quite credible on its face: a remote server should not be taken down by the failure of a local server. I suspect the failure involved more than a hard drive, such as software bugs or configuration errors, widespread hardware and/or network failure, or even sabotage.
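
To illustrate why a single failed drive should not by itself take down a properly configured server, here is a minimal Python sketch of the XOR parity idea behind RAID 5. It is a toy under stated assumptions, not a description of any real controller or of the hospital's installation: the block contents and the three-data-drive stripe are hypothetical, and real RAID 5 rotates parity across drives and works on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the parity idea behind RAID 5)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three equal-length data blocks on three drives.
data = [b"PATIENT1", b"LABS-042", b"IMG-0007"]
parity = xor_blocks(data)  # stored on a fourth drive

# Simulate losing drive 1: rebuild its block from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
print(rebuilt.decode())  # LABS-042
```

Because each block equals the XOR of all the others in its stripe, losing any one drive (data or parity) loses no data. Parity protects only against drive loss, though; bad data written by buggy software is preserved just as faithfully as good data.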

The following statement also lacks believability on its face:

... All patient data was recovered except for an eight-hour period the day of the outage. Sheehan said the hospital had to re-create, re-enter and validate the patient information for that eight-hour period before the system could resume normal operations.

If an information system is down for two weeks, two weeks' worth of data is lost.

... Sheehan said the hospital has replaced the hard drives for the server that failed and are inspecting the remaining servers. The hospital is also now doing data backups every four hours as opposed to every six hours, and is planning on doing hourly backups by the end of the week.

Replacing a failed hard drive is an inadequate precaution. A 'system redundancy makeover' seems in order to prepare for the next hard drive failure. Hard drives have a well-known MTBF (mean time between failures) and annualized failure rate. (The very Seagate ST3750528AS hard drive in the PC on which I am typing this post has an annualized failure rate of 0.34%, per the manufacturer's publicly available literature.)
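
The arithmetic behind that point can be sketched in Python. This is an illustration under stated assumptions: drives fail independently and at a constant rate (the exponential model), so AFR = 1 - exp(-8760 / MTBF), with 8760 being the hours in a year; a simpler linear approximation, AFR ≈ 8760 / MTBF, is also commonly quoted. The fleet sizes are hypothetical. Even at a 0.34% AFR, a facility running 100 such drives has roughly a 29% chance of at least one drive failure in a given year.

```python
import math

HOURS_PER_YEAR = 8760


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate, assuming a constant failure rate (exponential model)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_afr(afr: float) -> float:
    """Inverse of afr_from_mtbf under the same assumption."""
    return -HOURS_PER_YEAR / math.log(1 - afr)


def p_any_failure(afr: float, drives: int) -> float:
    """Chance that at least one of `drives` independent drives fails within a year."""
    return 1 - (1 - afr) ** drives


afr = 0.0034  # the 0.34% AFR quoted above
print(f"implied MTBF: {mtbf_from_afr(afr):,.0f} hours")
for n in (1, 10, 100, 1000):  # hypothetical fleet sizes
    print(f"{n:>5} drives: {p_any_failure(afr, n):.1%} chance of at least one failure per year")
```

The takeaway is that drive failures are a statistical certainty at scale, so the design question is not whether a drive will fail but whether the system keeps running when one does.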
 

... An independent consulting firm also has been hired to conduct an investigation. The hospital said it expects a report within a few weeks. 

As other organizations are using Meditech products, Joint Commission safety standards (as I wrote in my 2009 JAMA letter to the editor "Health Care Information Technology, Hospital Responsibilities, and Joint Commission Standards") call for sharing the results of that report with other organizations. I had discussed this letter numerous times with senior Joint Commission leadership.

Will sharing of the independent consultant firm's report happen?  Probably not.

However, rest assured that Colorado plaintiffs' attorneys will request it in any malpractice suits arising from events during the outage.

-- SS
