Results-Routing Incident Retrospective
A blameless retrospective on a misrouting incident in a lab ↔ EMR results pipeline. A cached results report could be delivered to the wrong recipient when the report API failed and a manual re-push went out against the wrong matching values. I led the response: detection, containment, transparent disclosure through the Privacy Officer, and a systemic fix that removed the error class rather than asking people to be more careful. The focus throughout was the system that allowed the mistake, not the person who made it.
Anonymized and recreated for illustration. This retrospective is fully anonymized. It contains no patient data and no real practice, provider, company, or vendor names, and no real dates. The specifics, including the timeline and any figures, are representative and presented to show how I lead an incident, not to report a particular event.
How caching plus an API failure created a misrouting risk class
The setup: results reports were cached so the pipeline could keep serving them quickly and survive transient upstream hiccups. That cache was a reasonable performance and resilience choice on its own.
The failure mode: when the report API failed or returned incomplete data, the pipeline could fall back to a cached report that was stale or incorrect and deliver it to an EMR. Recovering from that meant a manual re-push of the correct report.
Why the recovery was fragile: a manual re-push depended on a person matching the correct HL7 MSH values and the right practice and provider by hand. That hand-matching is exactly the step where a report can be sent to the wrong recipient. In one incident, a report was delivered to the wrong recipient practice. The conditions, a cache that could serve bad data plus a manual recovery with no guardrails, formed a whole class of misrouting risk, not a one-off slip.
The incident timeline
IllustrativeI ran the response as a clear sequence so nothing was improvised under pressure: detect, contain the spread, escalate to the right owners, then resolve and confirm. The stages below are representative, with no real dates or clock times.
Detection
Mismatch surfaced
A recipient mismatch between the delivered report and the expected practice was flagged.
Containment
Stopped the spread
Paused further automated delivery on the affected path so no additional reports could misroute.
Escalation
Right owners in
Looped in the Privacy Officer and Engineering to handle disclosure and the technical investigation in parallel.
Resolution
Corrected and confirmed
Correct report delivered to the right recipient; written confirmation obtained that the misdelivered report was deleted.
Handling it transparently, by the book
Once the misroute was confirmed, the priority was correct, transparent handling rather than a quiet fix. I routed the disclosure through the people whose job it is to get it right and made transparency to both affected parties the default.
Engaged the Privacy Officer
Brought in the Privacy Officer for proper handling before acting, so disclosure followed the correct process rather than my best guess in the moment.
Notified both parties
Notified both affected parties for transparency: the intended recipient and the practice that received the report in error, so no one was left unaware.
Confirmed deletion
Obtained written confirmation that the incorrectly received report was deleted, closing the loop with a record rather than a verbal assurance.
Who I coordinated, and how comms flowed
An incident is a coordination problem as much as a technical one. I kept a small, clear set of owners moving in parallel and sequenced communication so the right party heard the right thing at the right time, with no mixed messages.
Privacy & Compliance
Owned the disclosure decision and process. Handoff: I gave them the confirmed facts and the affected parties; they set how and what we communicated.
Engineering
Paused the affected delivery path, ran the technical investigation, and built the systemic fix. Handoff: a shared, reproducible picture of the failure mode.
Support
Carried the approved messaging to the affected parties and captured the written deletion confirmation. Handoff: a single, agreed script from Privacy.
Program management
I sequenced the response, kept one source of truth on status, and made sure containment, disclosure, and the fix advanced together rather than tripping over each other.
Root-cause analysis
IllustrativeThe point of the analysis was to name the system conditions that let this happen, not to find someone to blame. Two contributing factors combined into the misroute, and both were fixable in the system itself.
Cache served bad data on failure
When the report API failed or returned incomplete data, the pipeline could fall back to a cached report that was stale or incorrect and deliver it anyway. There was no validation gate to stop a known-bad report from going out on the failure path.
Manual re-push had no guardrails
Recovery leaned on a person matching HL7 MSH values and the right practice and provider by hand. With no automated verification of that match before delivery, a single mismatch could route a report to the wrong recipient.
Engineering the error class out, not retraining humans
The durable fix was to make the misroute impossible by design rather than to ask people to be more careful on a fragile manual step. We closed both contributing factors in the pipeline itself and pulled the manual re-push off the critical path.
Cache validation on failure
On an API failure or incomplete response, the pipeline now validates the cached report before it can be delivered, so a stale or incomplete report is held rather than served. A known-bad report no longer leaves the system on the failure path.
Automated MSH / practice verification
Recipient matching is verified automatically before delivery: the HL7 MSH values and the practice and provider are checked against the expected destination, so a mismatch is caught by the system instead of riding on a hand-match.
The principle: reducing reliance on a manual re-push removed the step where a human had to get the matching right under pressure. The error class was designed out of the system, so the same mistake cannot recur the same way.
Blameless postmortem, and what changed afterward
- check_circleFix the system, not the person. The retrospective stayed blameless throughout. A manual step with no guardrails will eventually produce a wrong outcome regardless of who runs it, so the honest conclusion was to remove the fragile step, not to coach the person who hit it.
- check_circleGuardrails now prevent the misroute class. Cache validation on failure and automated recipient verification close both contributing factors, so a stale report cannot be served and a mismatched recipient cannot be delivered to without the system catching it.
- check_circleThe disclosure path is codified. Engaging the Privacy Officer first, notifying both parties, and capturing written confirmation of deletion became the known, repeatable response, so the next team to face an incident is not improvising the handling.