Resolved -
Date: February 5-6, 2024 (Start time after 22:00 UTC on February 5th; end time 09:20 UTC on February 6th)
Impact: An outage of the Shared Data Plane interrupted transaction processing.
Description: An investigation of system logs revealed a disruption in network connectivity between nodes within the Hazelcast cluster. Hazelcast serves as the messaging layer between components in our Orchestrator. The network disruption prevented these components from communicating, which halted message processing and, consequently, transaction processing.
Corrective Actions & Future Plans:
A root cause analysis is underway to determine the specific cause of the network disruption within the Hazelcast cluster. Separately, the team plans to replace Hazelcast with an alternative messaging platform.
Enhancements to the monitoring framework are implemented to proactively detect and alert on similar connectivity issues, enabling prompt intervention and minimizing service disruption.
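For illustration only, a connectivity check of the kind described above could be sketched as follows. This is a minimal sketch, not the monitoring actually deployed: the function names and the peer list are hypothetical, and it assumes cluster members listen on Hazelcast's default member port 5701. It only tests whether a TCP connection can be opened from the probing host to each peer.

```python
import socket

# Assumption: members use Hazelcast's default member port.
# Adjust if the cluster is configured differently.
HAZELCAST_DEFAULT_PORT = 5701


def is_reachable(host, port=HAZELCAST_DEFAULT_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_peers(peers, timeout=2.0):
    """Return the (host, port) pairs from `peers` that could not be reached."""
    return [(host, port) for host, port in peers
            if not is_reachable(host, port, timeout)]
```

A scheduler could call unreachable_peers with the list of cluster members on each node at a short interval and raise an alert whenever the returned list is non-empty. A successful TCP connection does not prove that the cluster is healthy, so a check like this would complement, not replace, cluster-level health metrics.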
Feb 5, 13:00 PST