Root cause
On Monday 10/4, changes deployed between 2:41PM CT and 7:00PM CT as part of ongoing infrastructure scalability work inadvertently increased the volume of DNS lookups made by our services.
What happened?
Degraded DNS performance in turn degraded the application services that depend on it. We reverted some of the changes on Monday (10/4) evening, which immediately improved performance. As traffic volume increased on Tuesday 10/5, we experienced a more severe degradation. Reverting all changes deployed between 2:41PM CT and 7:00PM CT on Monday 10/4 restored DNS lookup performance and returned message processing to normal speed.
Impact on customers
Customers with high-volume MLLP feeds saw messages queue up on the HCO side: any HTTP request requiring DNS resolution sporadically incurred added latency on the order of 5-10 seconds.
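For illustration, the sketch below times a bare DNS lookup separately from the HTTP request that depends on it; this is the kind of measurement that isolates resolver latency from application latency. The hostname and iteration count are hypothetical, not values from this incident.

    import socket
    import time

    # Illustrative only: time the DNS lookup on its own, apart from the
    # HTTP request that depends on it. The hostname and iteration count
    # below are hypothetical.
    HOSTNAME = "api.example.com"

    def dns_lookup_seconds(hostname: str) -> float:
        start = time.monotonic()
        socket.getaddrinfo(hostname, 443)
        return time.monotonic() - start

    for _ in range(5):
        print(f"{HOSTNAME}: resolved in {dns_lookup_seconds(HOSTNAME):.3f}s")
        time.sleep(1)

During the incident window, a probe like this would have shown the sporadic multi-second resolution times that drove the end-to-end request latency.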
Learnings / Follow-ups
Internal teams are evaluating tooling that will let us better test for, and alert on, network degradation.
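As one possible shape for that alerting, the sketch below periodically probes DNS resolution time and flags sustained slowness. The hostname, threshold, and sample count are assumptions for illustration, not values any team has committed to.

    import socket
    import statistics
    import time

    # Minimal sketch of a DNS-latency probe that could back such alerting.
    # The hostname, threshold, and sample count are assumptions for
    # illustration only.
    HOSTNAME = "internal-service.example.com"
    THRESHOLD_SECONDS = 1.0
    SAMPLES = 10

    def dns_lookup_seconds(hostname: str) -> float:
        start = time.monotonic()
        socket.getaddrinfo(hostname, 443)
        return time.monotonic() - start

    latencies = []
    for _ in range(SAMPLES):
        try:
            latencies.append(dns_lookup_seconds(HOSTNAME))
        except socket.gaierror:
            # Treat outright resolution failures as worst-case latency.
            latencies.append(float("inf"))
        time.sleep(1)

    median = statistics.median(latencies)
    if median > THRESHOLD_SECONDS:
        print(f"ALERT: median DNS lookup {median:.2f}s exceeds {THRESHOLD_SECONDS:.1f}s")
    else:
        print(f"OK: median DNS lookup {median:.2f}s")

Alerting on a median over several samples, rather than a single slow lookup, avoids paging on the one-off resolution delays that are normal in any network.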