UNDER OBSERVATION: Slowdown on mu01 and mu11

  • Monday, 27th April, 2020
  • 14:15 UTC

2020-04-27 14:15 UTC: We are experiencing a slowdown on mu01 and mu11 servers and are investigating. This announcement will be updated when solved.
2020-04-27 15:35 UTC: We have recovered and restarted the node server and will be monitoring it while investigating the root cause of the overload. A reboot may be scheduled later, once the investigation is complete. Sorry for the inconvenience.
2020-05-02 22:00 UTC: We will be conducting planned maintenance in the window of 23:00 UTC to 00:00 UTC, with two server node reboots of a probable 1-3 minutes downtime each, in order to update the redundant-disk software that caused the slowdown and to apply a kernel security update.
2020-05-04 10:20 UTC: Maintenance work has been completed. System is under continued observation.

2020-05-06 14:40 UTC: A database overload is slowing down and intermittently interrupting database services. We are working to restore services quickly.
2020-05-06 21:40 UTC: Investigations have been completed; the database server has been reconfigured, restarted, tested, and put back online. The system stays under observation.

2020-05-14 14:00 UTC: We are investigating a major slowdown on the mu01 and mu11 servers and are working to fix it as soon as possible. We will update this announcement when it is resolved.
2020-05-14 14:35 UTC: Investigations have been completed; the server cache has been cleared and apache2 restarted, then the system tested and put back online. The system stays under observation.
2020-05-14 15:00 UTC: We have re-opened this case, as we are experiencing another slowdown.
2020-05-14 15:20 UTC: The server cache has been cleared again and the system is back online. We are investigating.

2020-05-15 13:00 UTC: Slowdowns have reappeared; we are on it.
2020-05-15 15:30 UTC: We had to take the server offline, check it, and restart it on a different cluster node with a larger memory allocation. We are now investigating the root cause.
2020-05-17 16:00 UTC: Early this Sunday morning we were finally able to narrow down the randomly slowing element in the cluster chain: under particular conditions, our distributed redundant file system generates massive amounts of disk data. We are continuing to investigate this.

2020-05-20 09:25 UTC: Slowdowns here again, we are looking into it.
2020-05-20 09:43 UTC: Overload resolved, we are monitoring the system.

2020-06-14 16:46 UTC: Some intermittent slowdowns reappeared this Sunday afternoon; we are looking into them.
2020-06-14 17:10 UTC: We have rebooted the server and are investigating the logs to find the root cause.
2020-06-15 10:30 UTC: Unplanned maintenance: we are rebooting the server for an upgrade.
2020-06-15 11:10 UTC: System rebooted; we are continuing observation and planning a migration to a new system.
