Downtime explained

After several sleepless nights we now appear to be on top of our downtime crisis, so I can take some time to explain what happened.

Clear Books is a startup. It’s exciting and we are adapting all the time, but sometimes we make mistakes. We’re learning from our mistakes, improving all the time and maturing as a business.

Everyone at Clear Books is extremely sorry for the recent downtime. Our reputation and livelihood depend on providing an excellent service, so when there is no service it’s a heart-stopping moment for our entire team and not much fun.

Thank you to all our customers who have supported us during this difficult period. Your rallying comments really help raise our spirits.

I will begin with a quick summary of how we have changed in our short three-year history.

Year 1

Any kind of enterprise level hosting solution was far from my mind at this very early stage. Instead the focus was developing a basic application.

Year 2

Then we suffered a major hardware failure on our single database server. Having a single database server was a mistake, although such a severe hardware failure felt like bad luck (but then you have to expect the worst). Customers working over the bank holiday weekend lost their weekend’s worth of data, as we had to resort to restoring from offsite backups. We reacted immediately by introducing real-time replication of our database server (master/slave) to ensure data would be replicated and safe in the future.

Year 3

And now we have just suffered a serious accessibility issue at the web tier. There was a single point of failure: our NFS server. For the techies amongst you, full details are provided here by Senior System Architect at CatN, Mark Sutton. CatN has also apologised to its customers and outlined the changes they are making here in a post by CatN’s commercial director, Joe Gardiner.

What is CatN?

CatN is Fubra Limited’s cluster hosting solution. Customers should be familiar with Fubra because you sign into Clear Books with your Fubra Passport and you make payments through the Fubra Payments system. It’s widely documented that Fubra Limited is a 50% shareholder in Clear Books too. Clear Books is hosted on the CatN platform.

CatN is in beta. What this means is CatN is planning to launch as a fully redundant system in January 2012 to the general public. At the current time CatN has no redundancy for its NFS server.

Clear Books Backup Cluster

Clear Books is working with CatN to ensure that Clear Books has full redundancy before January 2012. We have already made significant progress with this as we were put to the test on Tuesday morning when the NFS server failed again. We were able to change our DNS record and resurrect access to Clear Books from a backup web server. This ensured Clear Books remained accessible. Shortly, we will be moving to managed DNS so that the switch over will be seamless.
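For the techies, here is a minimal sketch of the kind of decision a DNS switch-over involves: try the servers in priority order and point the record at the first one that responds. This is an illustration only, not our actual tooling. The function names, the port and the addresses (taken from the reserved documentation range) are all assumptions.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_health_check(address: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to address:port succeeds within the timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_target(
    servers: Sequence[str],
    is_healthy: Callable[[str], bool] = tcp_health_check,
) -> Optional[str]:
    """Return the first healthy server in priority order, or None if all are down."""
    for address in servers:
        if is_healthy(address):
            return address
    return None


# Hypothetical addresses: primary web server first, backup second.
WEB_SERVERS = ["192.0.2.10", "192.0.2.20"]
```

A managed DNS service would run a check like this continuously and update the record itself, which is what removes the manual step.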

Communicating

It’s really difficult to set an expectation for the resolution time of a server issue. By “hoping things will be resolved within the hour” we are simply setting customers up for potential disappointment and more frustration. Therefore if we suffer downtime in the future we won’t put a time estimate on a solution. Instead we will:

This way customers can spend time focusing on other tasks. When the system is back up and running we will inform you immediately.

Compensation

We are also exploring incorporating a credit system into our subscriptions such that compensation will be applied to all customer accounts for any significant unscheduled downtime.
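As a rough illustration of how such a credit could be worked out, the sketch below applies a pro-rata share of the monthly fee for the hours lost. Nothing here is decided: the flat 30-day month, the one-hour threshold for "significant" downtime and the function name are all assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumptions for illustration only, not an agreed policy.
HOURS_IN_BILLING_MONTH = Decimal(30 * 24)  # flat 30-day month
SIGNIFICANT_DOWNTIME_HOURS = Decimal(1)    # shorter outages earn no credit


def downtime_credit(monthly_fee: Decimal, downtime_hours) -> Decimal:
    """Return a pro-rata credit for unscheduled downtime, rounded to the penny."""
    hours = Decimal(str(downtime_hours))
    if hours < SIGNIFICANT_DOWNTIME_HOURS:
        return Decimal("0.00")
    # Never credit more than one full month's fee.
    hours = min(hours, HOURS_IN_BILLING_MONTH)
    credit = monthly_fee * hours / HOURS_IN_BILLING_MONTH
    return credit.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Under these assumptions, 36 hours of downtime on a 20.00 monthly subscription would give a credit of 1.00.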

Status Page

Please take note of our Clear Books Status page, which collates tweets from Clear Books and CatN to provide real-time updates. If you cannot access Clear Books, search Google for “Clear Books status” to find this externally hosted website.

What Next?

In the past we made a mistake with database redundancy and we addressed it. Now we have made a mistake with accessibility, and we are addressing it. We are learning and improving all the time. Judge us on how we respond and we will continue to work hard to build you and your business a bigger and better online accounting system.

About the Author - Tim Fouracre

Tim founded Clear Books in 2008. Like many small business owners he worked from home for 15 months to get his startup off the ground. Today Tim enjoys helping Clear Books, its customers and its growing team innovate and achieve. Tim did his GCE O Levels in Ghana.

  • http://www.lucidrep.com Hat Margolies

    Hi Tim,

    Thanks for explaining so fully what happened, but I do think that there were failings in the fact that you couldn’t contact customers and let them know that there was a problem via email and that it was only through Twitter that you could find out what was going on. I also found that when I had problems with the system again on Sunday there was a slightly incredulous air to the response, as if it was my fault, rather than that the system had failed again – as it had…
    Once the system was up again, I’m really surprised you haven’t emailed all your customers to apologise and even to offer some sort of compensation for the stress it caused, coming at the end of the month.
    I hope that the problems are solved now, and that you will have a more efficient way of letting people know there is a problem if it occurs on this scale again.

  • Paul

    Thank you for the apology, the informative explanation and most importantly the well explained way forward. Good communication is the keystone to any relationship!

  • Robert

    Many thanks for your full explanation Tim – it is good to know you guys have addressed the issues now – it has put my mind at rest.

  • http://www.clearbooks.co.uk Tim Fouracre

    Thanks for your positive comments.

    I’ve blogged again giving some more details about the next steps we are taking.

    http://www.clearbooks.co.uk/2011/11/04/next-steps-to-a-stable-platform/

  • Adam

    Sorry had to chuckle at this line:

    “CatN is in beta. What this means is CatN is planning to launch as a fully redundant system in January 2012 to the general public. At the current time CatN has no redundancy for its NFS server.”

    Well they’ve got a couple of months then :p

  • http://www.turnkeyit.co.uk Mike Turner

    Tim,

    “CatN is in beta. What this means is CatN is planning to launch as a fully redundant system in January 2012 to the general public. At the current time CatN has no redundancy for its NFS server.”

    I’m quite surprised that Clearbooks is using a hosting system which is in beta and not fully operational or fully supported.

    • http://www.clearbooks.co.uk Tim Fouracre

      @Mike CatN is not a beta company – they have blue chip clients on their private clusters. Their vCluster hosting solution and control panel, which we are currently on, is in private beta and is being used by a select few businesses.

      Clear Books recently recruited a full time sys admin who is working with the CatN engineers and our developers to build our own private Cluster bespoke hosting solution.