Apologies for downtime on Friday

I was on holiday last week and it ended badly with a phone call on Friday afternoon from one of our developers telling me that Clear Books was down.

There is nothing that makes my stomach turn more than unexpected downtime. It’s demoralising for our team and frustrating for our customers.

We are truly sorry to our customers for any disruption caused on Friday afternoon.

Our aim is to provide 100% uptime but things sometimes go wrong. For the record here are our uptime statistics:

This month, skewed by yesterday, simply isn’t good enough.

Our hosting provider, CatN, had their team of engineers working on the problem immediately and provided updates on Twitter. We also announced the problem on our system status page. There was no data loss or corruption associated with this downtime.

Together with CatN, we will now review the cause and the measures needed to prevent this happening again, and will provide an update shortly.

Thank you for your continued support.

About the Author - Tim Fouracre

I've always been interested in computers and finance. I started out as a PHP developer and later qualified as a Chartered Accountant at KPMG.

Programming and accounting came together when I co-founded Clear Books online accounting software back in July 2008.

Showing 18 Comments

  1. Michael G

    Tim

    The lack of any updates on your system update page was a real worry to many of us.
    Twitter provided little response from Clearbooks – really unlike you.

    Glad it’s all resolved.

    7 months ago

  2. @Michael

    This is something we definitely need to improve.

    With communication in mind the system status site was recently introduced to provide a focused place for updates. However, it’s probably not widely known about at the moment and we didn’t utilise it enough.

    We need to get better at passing on updates from our host, CatN. Perhaps we should incorporate their Twitter feed into our system status blog, and/or auto-retweet their messages in our own Twitter feed, so that the latest developments are passed on immediately.

    7 months ago

  3. Andrew Taylor

    To be honest, your uptime stats are alarming. For such an important web app, one that comes at a cost to customers, three nines (99.9%) is not an unrealistic target. By any standards, and even looking at a full year of stats, you're falling way short of that.

    I’d suggest you’ve outgrown your current host and it’s time to step up to the next level.

    Obviously we haven’t yet had an incident report, but it harks back to your previous large outage. Last time you had no replicated slave DB and your master went. This time we’ve heard about NFS problems. It sounds like a pretty precarious arrangement if both your website and the CatN website are reliant on the same NFS mount. It sounds like you have a large pool of disks that’s presented to the rest of the tiers via a single NFS host and you’re not utilising local storage at all. A poor man’s NAS. That’s a very obvious single point of failure and there are much better ways to achieve the same level of distribution.

    CatN simply aren’t cutting it any more for your needs.

    Andrew

    7 months ago
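
For readers unfamiliar with the term, the "three nines" target in the comment above can be turned into a concrete downtime budget. This is only a rough sketch: the 30-day month and 365-day year are simplifying assumptions, and the function name is made up for illustration.

```python
# Downtime permitted by an availability target over a given period.
# ASSUMPTION: a 30-day month and a 365-day year, chosen for simplicity.

MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_percent: float, days: float) -> float:
    """Return the minutes of downtime permitted over `days` at the given availability."""
    downtime_fraction = 1 - availability_percent / 100
    return downtime_fraction * days * MINUTES_PER_DAY


if __name__ == "__main__":
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        minutes = allowed_downtime_minutes(99.9, days)
        print(f"99.9% over a {label}: {minutes:.1f} minutes ({minutes / 60:.2f} hours)")
```

Under these assumptions, 99.9% availability leaves roughly 43 minutes of downtime per month, or a little under 9 hours per year.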

  4. There have been a number of outages in the last 40 days and, whilst we all have difficulties, Clear Books is providing a service that can’t be replicated when it is offline. It is therefore incumbent on you to ensure your system is robust and that a backup system is in place that can take over, at an alternate data center for example.
    It is not acceptable for this downtime to continue on the current trend.
    This is a cloud-based system; security and service provision have to be top of the agenda. So far, results speak louder than words.

    A final point about downtime notification: an email should be sent to all clients as soon as the system suffers a prolonged outage (say, more than 5 minutes) to explain what is happening and when it is likely to be resolved, with updates every 30-60 minutes. It is not acceptable to me that you rely solely on one form of communication, such as Twitter or the help pages. Email should be the official mode of communication, with the other services as real-time or additional channels.

    7 months ago
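
The email cadence proposed in the comment above (a first email once an outage passes 5 minutes, then an update every 30-60 minutes) could be sketched roughly as follows. The function name and its defaults are hypothetical, chosen only to mirror the numbers in the comment; nothing here describes how Clear Books actually sends email.

```python
# Hypothetical sketch of the suggested notification cadence.
# ASSUMPTION: first email at the 5-minute mark, then one every `interval` minutes.


def notification_offsets(outage_minutes: int, first_after: int = 5, interval: int = 30) -> list[int]:
    """Return the minutes since the outage began at which an email would be sent."""
    if first_after < 0 or interval <= 0:
        raise ValueError("first_after must be >= 0 and interval must be > 0")
    # range() is empty when the outage ended before the first threshold was reached.
    return list(range(first_after, outage_minutes + 1, interval))


if __name__ == "__main__":
    print(notification_offsets(4))                  # [] (too short to trigger an email)
    print(notification_offsets(100))                # [5, 35, 65, 95]
    print(notification_offsets(130, interval=60))   # [5, 65, 125]
```

Only the send times are computed here; sending the emails is left out, since it depends on a mail system that keeps working during the outage.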

  5. @Andrew

    Thanks for your comments. We will be better placed to respond once we have had a full debrief.

    @Flamefix

    We will explore your auto email idea further.

    7 months ago

  6. Why do you try and blame your host when your host is yourself? You work at Fubra (LinkedIn says so), which bought clear books, which also owns catN, right?

    7 months ago

  7. @Anonymous

    I don’t think any of our customers would post anonymously so I am not sure who you are or what your agenda is.

    For what it’s worth I co-own Clear Books. I don’t work at Fubra (about 8 years ago I did).

    Fubra never ‘bought’ Clear Books. They are an original 50% shareholder.

    Yes CatN is owned by Fubra, 100%.

    For the above reasons we have a very close relationship when it comes to our hosting as Fubra has a vested interest in making sure Clear Books runs smoothly on their CatN platform (it’s not ours).

    When Clear Books has upset customers due to unscheduled downtime then so does Fubra and they share our pain. That makes them more resolved to improve and address issues.

    We are not tied down to CatN as a host provider. As Andrew Taylor suggests, we could move elsewhere; however, there are many benefits to the relationship we have.

    My post is reaching out to our customers to apologise for the inconvenience caused yesterday.

    7 months ago

  8. Reece

    Will there be any compensation for the hours lost during the outage? Thanks.

    7 months ago

  9. @Reece

    It’s a fair question, but if we were to refund everyone for the 1.52% of downtime this month we would need to pay out 8p on the growth plan, 15p on the established and 23p on the premium.

    I appreciate your request to be compensated, but it would require a lot of effort to put in place for not much gain to anyone. It would also set a precedent for future downtime. We cannot guarantee that there will not be downtime in the future. We can guarantee that we will review what happened and do our best to prevent it happening again.

    7 months ago
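
The pence figures in the reply above follow from a simple pro-rata calculation. Here is a minimal sketch, assuming monthly prices of £5, £10 and £15 for the growth, established and premium plans. These prices are not stated anywhere in this thread; they were chosen only because they reproduce the quoted 8p, 15p and 23p.

```python
from decimal import Decimal, ROUND_HALF_UP

# 1.52% downtime for the month, as quoted in the reply.
DOWNTIME_FRACTION = Decimal("0.0152")

# ASSUMPTION: monthly plan prices in pounds. Not taken from the thread.
ASSUMED_MONTHLY_PRICE = {
    "growth": Decimal("5.00"),
    "established": Decimal("10.00"),
    "premium": Decimal("15.00"),
}


def pro_rata_refund_pence(monthly_price: Decimal, downtime_fraction: Decimal = DOWNTIME_FRACTION) -> int:
    """Return the refund in whole pence for the share of the month spent offline."""
    pence = monthly_price * 100 * downtime_fraction
    return int(pence.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for plan, price in ASSUMED_MONTHLY_PRICE.items():
        print(f"{plan}: {pro_rata_refund_pence(price)}p")
    # growth: 8p
    # established: 15p
    # premium: 23p
```

`Decimal` is used instead of floats so that rounding to whole pence is exact and predictable.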

  10. @Reece

    We’ve been talking about this internally and I think I was too hasty in my reply dismissing your suggestion.

    We are going to investigate how we can incorporate a refund credit system into our subscription system. Every penny counts after all. More to follow later.

    7 months ago

  11. I am not technically minded at all when it comes to computers etc. All I know is that I couldn’t access Clearbooks on Friday and, more importantly, I had no clue as to what was going on. I don’t use Twitter and I had to ask a question on an accounting website to eventually find out what the problem was.

    I completely agree with Flamefix that you should send your subscribers an automatic email in situations like this. Maybe not as often as he suggested but certainly enough so that we are kept fully informed as to progress. Surely it must be easy enough to set up. After all, you do it with your newsletters and weekly summaries of our account information.

    7 months ago

  12. @Flamefix @Steve

    The great thing about twitter is it lends itself to rapid updates.

    There isn’t much time to organise email campaigns when everyone is focused on solving the problem and getting the application back up and running as soon as possible.

    We certainly need to do more to raise awareness about our system status page though so please bookmark http://www.accountingsystemstatus.co.uk/ and visit that any time there is an issue.

    We also need to do better at using http://www.accountingsystemstatus.co.uk as a focused area for clear and regular updates.

    7 months ago

  13. @Tim no agenda; I was a user of another piece of online accounting software and I just look for transparency with SaaS as it’s important for the industry. I think you’re doing well with your posts here now all is back up. I hope you sort out the issue and improve uptime as more competition in the online accounting market is good because I quit using the other lot for various reasons ;) (not related to uptime though)

    It’s just that if the supplier I used blamed their host, and their host owned 50% of the shares (sorry for my ‘bought’ assumption – but my point still stands a little), I’d be frustrated as it’s not too clear.

    In summary I agree with Andrew mostly – I’d take a look at your infrastructure because running an important business tool on what must be a complex system shared with other users probably isn’t a good idea – whilst everyone has issues, if you ran your own system any issues would be smaller and easier to recover from.

    Re others’ posts about uptime: if it includes scheduled maintenance, the uptime is OK and far better than the major banks’ online banking uptime!

    7 months ago

  14. Down again today – this really isn’t good enough. I would suggest that you look into some kind of backup solution which can kick in when your host goes down – I was stuck on Friday unable to invoice my customers who were coming in to pick up jobs, and stuck today trying to get the accounts up to date before Monday. I don’t know any technicalities of this, but it must be possible. I’m sure Gmail doesn’t depend on one server.

    Clear Books is a brilliant system – yet when the system is down, all the hard brilliant work put in by your team is worthless if I can’t even access the system.

    7 months ago

    Given that you are running a SaaS business, you need a proven process for this type of event, because even with 99.9% uptime it will keep happening (hopefully just less often). Such a process needs to include clear communication to customers, and a communications channel which is not caught up in the outage (e.g. your website has been unreachable, let alone the accounting system). So Twitter or another posting mechanism would be good.

    I also think you should drop your formal corporate speak down a notch. You have a number of customers who really believe in your product and what you are doing, and your system stands out above other boring tired dated systems. (But at least they seem to run!)

    With such users, they will stick with you, support you and assist you in growing the business ….. but only if you play fair with them. Which includes frank information about errors, and not glossed-over corporate speak to excuse and transfer blame.

    And I am guessing you need to look seriously at investing in better infrastructure with mirrors, failover and no single point of failure. That’s what SaaS is all about.

    7 months ago

  16. Guys,

    I’ve posted a brief update here after this morning: http://www.clearbooks.co.uk/2011/10/31/update-on-downtime/

    A more detailed update will follow shortly.

    Thanks

    Tim

    7 months ago

  17. Steve

    Sorry Tim,

    I know you are trying hard and you are a good bunch of people at Clear Books but this is the second time I have visited my clients to do the month end accounts and I have been sat in their office twiddling my thumbs. Really frustrating.

    I’m not angry at Clear Books, just disappointed that the server problems are still with us and also that the buggy parts of the software have not been cleared up.

    7 months ago

  18. Hi.

    Any updates on this? Have the failures been occurring often?

    Just asking because, after looking at aaaaall the accounting apps available, we found Clearbooks today and it looks like it has all we need, but we are worried about stability.

    Will appreciate any feedback.

    Thanks.

    1 week ago