From June 10 to June 12, 2024, fewer than 2.5% of Logs were intermittently unavailable to view or search in the Redox dashboard, for a cumulative total of approximately two hours. Message processing was unaffected.
What Happened & How We Responded
On the evening of June 10, we experienced bottlenecks in the throughput of our Logs processing.
On June 11 at 00:24 CT, we scaled up processing capacity and added observability metrics. This initially appeared to resolve the issue.
On June 12 at 08:42 CT, the issue resurfaced. We identified and fixed a code limitation that caused our Logs processing to fall behind on incoming payloads. This fix fully resolved the issue.
What We Are Doing About This
The code fix we made will remain in place going forward.
We have added monitoring that will alert us to similar failures in the future.
Posted Jun 24, 2024 - 08:46 CDT
Resolved
This incident has been resolved.
Posted Jun 11, 2024 - 14:54 CDT
Monitoring
A fix has been implemented and we are monitoring the results.