Details of the Paystack outage on August 9, 2019

An explanation of the root cause of the August 9 outage

On Friday 9 August 2019, the services that power Paystack's transaction processing experienced elevated error rates. The incident started at about 11 pm WAT on Thursday 8th August and was mostly resolved by 6 pm WAT on Friday 9th August, with a handful of intermittent issues lingering until 10 am WAT on Monday 11th August. The duration and severity of errors and timeouts varied considerably from service to service, as explained below. A detailed description of the failures will follow in a future blog post.

Customers may have experienced increased latency (i.e. an abnormally long period to get responses from the product) and intermittent errors. This resulted in some customers not receiving a receipt and/or value right after being debited.

We apologize unreservedly to our customers whose services or businesses were impacted during this incident, and we're taking immediate steps to improve the platform's performance. Internal teams are meeting regularly and performing a full post-mortem to understand how this occurred and how we can prevent it from ever occurring again.

Root cause and how we fixed it

This was a major outage, both in its scope and duration. As is always the case in such instances, multiple failures combined to amplify the impact.

During the lifecycle of a transaction request on Paystack, several messages are generated and pushed to queues. This helps ensure that our response times stay very low. These message queueing systems play significant roles in our infrastructure, including automating event broadcasting, delivering receipts to our customers, and using the different available transaction processing failovers.
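To illustrate the general pattern described above, here is a minimal sketch in Python. It is not Paystack's actual code: the in-process `queue.Queue` stands in for a real message broker, and the names `handle_transaction`, `worker`, and `run_demo` are hypothetical. The request handler does only the critical work inline and defers receipts and event broadcasts to a background worker, which is what keeps response times low.

```python
import queue
import threading


def handle_transaction(reference, amount, side_effects):
    """Do the critical work inline, then enqueue the follow-up tasks."""
    result = {"reference": reference, "amount": amount, "status": "success"}
    # Receipts and event broadcasts are deferred so the response is not delayed.
    side_effects.put({"type": "receipt", "reference": reference})
    side_effects.put({"type": "event", "reference": reference})
    return result


def worker(side_effects, delivered):
    """Drain the queue in the background; a None message stops the worker."""
    while True:
        message = side_effects.get()
        if message is None:
            break
        delivered.append(message)  # placeholder for sending a receipt or event


def run_demo():
    """Process one hypothetical transaction and return what was delivered."""
    side_effects = queue.Queue()  # stand-in for a real message broker
    delivered = []
    thread = threading.Thread(target=worker, args=(side_effects, delivered))
    thread.start()
    response = handle_transaction("ref_001", 5000, side_effects)
    side_effects.put(None)  # tell the worker to stop
    thread.join()
    return response, delivered
```

The trade-off is visible in `handle_transaction`: if `side_effects.put` fails, the transaction itself cannot complete cleanly, which is why the queue becomes a critical dependency.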

Around 11:10 pm WAT on Thursday 8th August 2019, we started observing an abnormally rapid rise in our transaction failure rates.

On further investigation, our Engineering team traced the issue to a failure when attempting to post to one of our messaging systems. This was triggered by a fault, not yet identified, in the image used by our instances.

By 4:00 pm WAT on Friday, our engineers had pushed an update to temporarily stop queueing messages, so that every required step was concluded within the same API call. Later, we deployed a script to send receipts and events for successful transactions concluded during the downtime.
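A backfill script of the kind mentioned above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the script we ran: the record fields (`reference`, `status`, `completed_at`), the `backfill` function, and the `send` callback are all hypothetical. The important property is the idempotency guard, which ensures that re-running the script never sends a customer a second receipt.

```python
from datetime import timedelta, timezone

WAT = timezone(timedelta(hours=1))  # West Africa Time is UTC+1


def backfill(transactions, start, end, already_sent, send):
    """Send receipts for successful transactions in [start, end], once each."""
    sent_now = []
    for tx in transactions:
        if tx["status"] != "success":
            continue  # only successful transactions get a receipt
        if not (start <= tx["completed_at"] <= end):
            continue  # outside the downtime window
        if tx["reference"] in already_sent:
            continue  # idempotency guard: never send a second receipt
        send(tx)
        already_sent.add(tx["reference"])
        sent_now.append(tx["reference"])
    return sent_now
```

In this sketch, `start` and `end` would be timezone-aware datetimes bounding the downtime, and `already_sent` would be loaded from wherever delivered receipts are recorded.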

Prevention and follow-up

While we have rerouted traffic away from the failing dependency, we have also begun a dialogue with our messaging service provider to pinpoint why this occurred.

We've also started looking at ways to have failovers in our messaging system architecture to resolve this single point of failure and make it more resilient, including implementing fallbacks to handle the message calls in the event of failures like this in the future.
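As a rough sketch of what such a fallback could look like, the Python function below tries a list of publishers in order and, if all of them fail, processes the message inline. This is one possible design, not a description of what we have built: `publish_with_fallback`, the `publishers` list, and `handle_inline` are hypothetical names.

```python
def publish_with_fallback(message, publishers, handle_inline):
    """Try each (name, publish) pair in order; process inline if all fail."""
    for name, publish in publishers:
        try:
            publish(message)
            return name  # report which path accepted the message
        except Exception:
            # A real system would catch narrower errors, log, and alert here.
            continue
    # Last resort: do the work in the request itself rather than lose it.
    handle_inline(message)
    return "inline"
```

Returning the name of the path that succeeded makes it easy to monitor how often the primary queue is being bypassed.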

We're performing an ongoing post-mortem process at Paystack to ensure we fully understand what went wrong and what we could have done better, and to identify steps to eliminate the possibility of this happening in future.

As we learn more, we'll share a follow-up with more details of our findings as well as the additional measures we're taking.

Conclusion

Every day, billions of Naira worth of value hurtles at lightning speed through Paystack, on behalf of some of the most ambitious companies in Africa. It is a responsibility we take absurdly seriously.

Most days, this complicated dance happens seemingly effortlessly, but it takes a day like Friday to remind us of how many people rely on us to come through, flawlessly, every day, without fail. We measure a downtime like this not only in currency, but in the many thousands of friends we let down.

This was an upsetting situation for our customers and our team. We're embarrassed that this happened, and are committed to making this a profound learning experience. We're deeply sorry for the negative impact this had on our customers, and are putting in place all possible changes to ensure that something like this doesn't happen again.

Thank you sincerely to the many thousands of companies who rely on Paystack to power your growth. We strive to be even more worthy of your trust, every day.
