Author Archive for Mads Petersen

Incident report: “Stability!”

Posted on: September 1st, 2023 by Mads Petersen

Welcome back!

We were finally able to bring our blog back. Over the past two weeks we dealt with a long-lasting incident. We did what we could to keep the primary services usable while we worked on the systems, but we know there was unplanned downtime. As the blog itself was a victim of the situation, we could not keep users informed as well as we wanted (and we really do not like X, ‘formerly known as Twitter’), so here, at last, is the incident report from beginning to end.

> Continue Reading

Service window postponed – Due diligence..

Posted on: June 23rd, 2023 by Mads Petersen

Last week we announced a service window that was supposed to be in session about now. However, as part of our due diligence we went through the list of activities last night and found that we had overlooked an important step in the plan. The operations in question touch the backend servers and are quite sensitive, and we would like to avoid any problems. We are therefore postponing the current service window and declaring a new one for the coming Friday:

Friday 2023-06-30 at 08:00 UTC until 13:00 UTC.

> Continue Reading

Announcing Service Window Friday 23rd of June: The everlasting race to get rid of old infrastructure..

Posted on: June 16th, 2023 by Mads Petersen

The Kolab Now staff will update parts of the platform to get rid of even more old infrastructure and to ensure the best possible service for users. We therefore declare a service window on:

Friday 2023-06-23 at 08:00 UTC until 13:00 UTC.

> Continue Reading

Announcing Service Window: Caching infrastructure..

Posted on: April 17th, 2023 by Mads Petersen

Lately we have had reports from users of the Kolab webclient who saw a variety of issues, such as login failures, data entries disappearing and reappearing, and folder creation failing. As this was seriously affecting the reliability of the webclient, we put the deployment of new features on hold to investigate the root cause of these issues.

> Continue Reading

Update on the ongoing performance issue

Posted on: February 2nd, 2023 by Mads Petersen

During the last two days Kolab Now has experienced serious performance degradation. Logins to, and operations in, the webclient are taking a long time, and users connecting via desktop or mobile clients might not be able to synchronize.

By the end of Wednesday we thought the root cause had been found, and many users saw better performance for a while, but on Thursday morning performance dropped back to unacceptable levels.

> Continue Reading

Announcing Service Window: More replacement of old infrastructure..

Posted on: January 20th, 2023 by Mads Petersen

This Friday’s service window went well, and the work of replacing some old infrastructure opened opportunities to proceed further down the same track. With that in mind, the Kolab Now operations team will perform more maintenance on the infrastructure of the Kolab Now platform on Friday, 27th of January 2023 at 10:00 UTC. The update will move more of the backend services to newer infrastructure. No new features visible to end users will be enabled during this maintenance.

The service window is expected to last for no more than 4 hours, ending on Friday, 27th of January 2023, at 13:00 UTC.

> Continue Reading

Incident report: ‘Gateway Timed-out’ at login

Posted on: January 14th, 2023 by Mads Petersen

On Saturday, January 14th, 2023, from approximately 03:47:12 UTC, a group of Kolab Now users observed a “Gateway time-out” error when trying to log in to the webclient or the dashboard, or to connect with any desktop or mobile client. The Kolab Now main page https://kolabnow.com/ and all other services (support, blog and knowledge base) remained available with no issues for these same users.

The issue lasted until approximately 08:04:22 UTC and was caused by a db server in one of the clusters being stuck in a large operation. It caused trouble with the login procedure but had no impact on the mail flow; no mail was lost or delayed during the incident.

> Continue Reading

Announcing Service Window: The death of old infrastructure..

Posted on: January 13th, 2023 by Mads Petersen

On Friday, 20th of January 2023 at 09:00 UTC, the Kolab Now operations team will perform maintenance on the infrastructure of the Kolab Now platform. The update will move part of the backend services to newer infrastructure and will bring more stability and better manageability to the backoffice systems. No new features visible to end users will be enabled during this maintenance.

The service window is expected to last for no more than 4 hours, ending on Friday, 20th of January 2023, at 13:00 UTC.

> Continue Reading

The Annual Certificate Refresh

Posted on: January 2nd, 2023 by Mads Petersen

Another year has passed, and we once again needed to refresh our certificates.

As in previous years (2018, 2019, 2020, and so on), we rolled over the certificates across all systems during the last few days of the year.

The new certificate is in place, and applies to https://kolabnow.com/, imaps://imap.kolabnow.com and smtps://smtp.kolabnow.com.

As in previous years, we publish the fingerprint here:

SHA-256:

FA:2A:BF:A9:F8:FA:67:E7:7B:0D:B2:2C:C2:F2:8B:F8:30:4C:77:84:55:38:02:B4:DD:66:7F:DA:F4:C7:DC:49

SHA-1:

B6:FE:BF:F6:3A:17:CD:E4:2D:02:7F:AF:C9:EF:0C:45:32:F9:0A:3B
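
If you would like to verify the fingerprint yourself before trusting a connection, a check along the following lines works. This is a minimal sketch, not an official Kolab Now tool; the ports used (443 for HTTPS, 993 for IMAPS, 465 for SMTPS) are the standard ones and are our assumption here.

```python
# Minimal sketch: fetch a server's TLS certificate and print its SHA-256
# fingerprint, for comparison against the value published above.
# The host/port pairs below are assumptions based on the standard ports.
import hashlib
import socket
import ssl

def tls_fingerprint(host: str, port: int) -> str:
    """Return the SHA-256 fingerprint of the server's certificate, colon-separated."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)  # certificate in DER form
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

if __name__ == "__main__":
    for host, port in [("kolabnow.com", 443),
                       ("imap.kolabnow.com", 993),
                       ("smtp.kolabnow.com", 465)]:
        print(f"{host}:{port}  {tls_fingerprint(host, port)}")
```

The printed value should match the SHA-256 fingerprint published above; if it does not, do not trust the connection.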


Incident report: db server stuck in large operation..

Posted on: June 7th, 2022 by Mads Petersen

On Monday, June 6th, 2022, from approximately 11:24 UTC, a group of Kolab Now users observed a “Gateway time-out” error when trying to log in to the webclient or the dashboard, or to connect with any desktop or mobile client. The Kolab Now main page https://kolabnow.com/ and all other services (support, blog and knowledge base) remained available with no issues for these same users.

The issue lasted for about an hour, until approximately 12:19 UTC, and was caused by a db server in one of the clusters being stuck in a large operation. It caused trouble with the login procedure but had no impact on the mail flow; no mail was lost or delayed during the incident.

The operations team was warned by the monitoring system about the stuck db server and had hands on keyboards right away. Some users in the group saw service return earlier than others.
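
As an illustration of the kind of check such a monitoring system can run, here is a minimal sketch that flags long-running statements on a MySQL/MariaDB-style db server. The engine, credentials, and the five-minute threshold are all assumptions made for the example; the post does not name the actual database or monitoring tooling.

```python
# Minimal sketch of a long-running-query check, assuming a MySQL/MariaDB
# server. Engine, credentials, and threshold are assumptions for the example.
import mysql.connector  # pip install mysql-connector-python

THRESHOLD_SECONDS = 300  # flag anything running longer than 5 minutes (arbitrary)

def long_running_queries(host: str, user: str, password: str) -> list[dict]:
    """Return statements that have been executing longer than the threshold."""
    conn = mysql.connector.connect(host=host, user=user, password=password)
    try:
        cur = conn.cursor(dictionary=True)
        cur.execute(
            "SELECT id, user, time, state, info "
            "FROM information_schema.processlist "
            "WHERE command <> 'Sleep' AND time > %s",
            (THRESHOLD_SECONDS,),
        )
        return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    for row in long_running_queries("db.example.net", "monitor", "secret"):
        print(f"stuck? id={row['id']} time={row['time']}s state={row['state']}")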

Further investigation is going into reevaluating the amount of resources assigned to the db clusters and other servers for large operations of this type.

If you were among the users impacted by this, please accept our apologies for any inconvenience caused.