Posts Tagged ‘Incident Report’

Incident Report: IMAP backend server out of memory

Posted on: June 8th, 2021 by

On Monday night, the 7th of June 2021, at 18:29 UTC, a process on one of our many backend servers was allocating memory faster than it could release it. This caused the server to run out of memory and stop responding. The server was serving IMAP for a limited group of users, who were impacted by the incident.

The server is heavily monitored, so alarms went off with the staff. They drew little attention, however, because the incident happened in a low-traffic period and because the systems are built to recover from such situations on their own. Systems in this class usually just restore themselves and keep working. This time, however, the server did not come back up in a timely manner, and the affected users saw their mailboxes freeze and their mail become unavailable.

The staff realized that something was not as it should be and restored the server manually. At 20:27 UTC the server was back up and running, and all mailboxes were available again.

No data was in danger, and mail delivery to the impacted mailboxes continued during the incident.

We apologize for any inconvenience that this incident may have caused.

 

Incident Report: Network Interruption

Posted on: May 28th, 2021 by

At 06:00 UTC on Wednesday, May 26th, Kolab Now fell silent. All connections were dropped. What happened?

Incident Report: Lock down of firewalls

Posted on: April 28th, 2021 by

At 10:10 UTC this morning, Wednesday, April 28th, parts of our environment were being updated. The updates included a Security-Enhanced Linux (SELinux) configuration on one layer of our firewalls. Unfortunately, this new configuration locked up these firewalls, and all traffic was blocked for a group of users.

The problem was confirmed corrected at 10:36 UTC.

No data was in any danger of being compromised during the incident.

We apologize for the inconvenience.

Incident Report: Storage Failure

Posted on: January 29th, 2020 by

At 10:23 UTC this morning, Wednesday, January 29th, our environment experienced a catastrophic storage failure. The time to resolution for the underlying problem was approximately 80 minutes, and full service was restored approximately 60 minutes thereafter, at 12:48 UTC.

Incident Report: Cascading Performance Problems

Posted on: March 5th, 2019 by

From last Sunday afternoon through Monday evening and throughout Monday night, performance problems degraded the Kolab Now service, to the point of services becoming unavailable.

Incident Report: Thursday, 20:25 – Friday 01:15 UTC

Posted on: September 21st, 2018 by

Last night, a failure in the storage layer caused most of our services to be unavailable. In the week before, we replaced a failed hard drive. In the week before that, a so-called Virtual Fabric Adapter failed, causing a hypervisor to shut itself off. Since the most recent incident was the more serious downtime, that’s what we’ll start our reporting on.

Incident Report: DNSSEC record expired

Posted on: December 10th, 2017 by

On Saturday morning (CET), the DNSSEC records expired on one of our DNS servers. This caused a group of customers to have trouble logging in and connecting to Kolab Now services. The record has been renewed, and all customers should have access as of 22:49 CET (please read below).

Incident Report: Hypervisor Failure

Posted on: November 27th, 2017 by

This weekend, at approximately 12:00 UTC on Sunday, an issue arose on one of the hypervisors. It went unnoticed for too long and was finally resolved on Monday morning. This post explains what happened, why it happened, and what we’re going to do to address the situation.

Incident Report: Backend Down

Posted on: October 17th, 2017 by

Earlier this morning, at 04:38 UTC, one of the twenty-two IMAP backends in production stopped serving its mail spool, showing Input/Output errors on its disk. Our Standard Operating Procedure is to examine log files, flush vm caches, stop the virtual machine, and start it back up again. This occurred at 05:48 UTC. The IMAP backend in question did not come back up cleanly.
