Incident Report: Thursday, 20:25 – Friday 01:15 UTC

Last night, a failure in the storage layer made most of our services unavailable. The week before, we replaced a failed hard drive. The week before that, a so-called Virtual Fabric Adapter failed, causing a hypervisor to shut itself off. Because the most recent incident caused the most serious downtime, we begin our report with it.


Incident Report: Backend Down

Earlier this morning, at 04:38 UTC, one of the twenty-two IMAP backends in production stopped serving its mail spool, showing Input/Output errors on its disk. Our Standard Operating Procedure is to examine the log files, flush vm caches, stop the virtual machine, and start it back up again. We carried this out at 05:48 UTC, but the IMAP backend in question did not come back up cleanly.
