Posts Tagged ‘Incident Report’

Incident Report: Thursday, 20:25 – Friday 01:15 UTC

Posted on: September 21st, 2018 by

Last night, a failure in the storage layer made most of our services unavailable. The week before, we replaced a failed hard drive. The week before that, a so-called Virtual Fabric Adapter failed, causing a hypervisor to shut itself off. Since the most recent incident caused the most serious downtime, that’s where we’ll start our reporting.

Incident Report: DNSSEC record expired

Posted on: December 10th, 2017 by

On Saturday morning (CET), the DNSSEC records expired on one of our DNS servers. This caused a group of customers to have trouble logging in and connecting to Kolab Now services. The record has been renewed, and all customers should have access again (as of 22:49 CET; please read below).

Incident Report: Hypervisor Failure

Posted on: November 27th, 2017 by

This weekend, at approximately 12:00 UTC on Sunday, an issue occurred on one of the hypervisors. It went unnoticed for too long and was finally resolved on Monday morning. This post explains what happened, why it happened, and what we’re going to do to address the situation.

Incident Report: Backend Down

Posted on: October 17th, 2017 by

Earlier this morning, at 04:38 UTC, one of the twenty-two IMAP backends in production stopped serving its mail spool, showing Input/Output errors on its disk. Our Standard Operating Procedure is to examine log files, flush vm caches, stop the virtual machine, and start it back up again. The restart occurred at 05:48 UTC, but the IMAP backend in question did not come back up cleanly.
