Data capture outage for orgs on EU data center
Incident Report for Fullstory
Postmortem

2023/09/13 EU1 Data Capture Outage

Due to an infrastructure issue caused by a sudden spike of unanticipated traffic, end user session data was not captured for both web and mobile sessions on 2023-09-13 between 18:15 UTC and 19:05 UTC, for all Orgs hosted in our European (EU1) data center. Orgs hosted on our North American (NA1) data center were not impacted.

Any existing sessions being captured during the impacted timeframe may have gaps in intermediate pages, resulting in missing segments of time in playback, and any new sessions (web or mobile) started during this time may have been dropped. Analytics features that rely on this session and event data are also impacted, which means that there may be missing data points in metrics, funnels, dashboards, and conversions.

This postmortem details the impact on our customers, the root cause of the issue, how we addressed the problem, and the steps we're taking to prevent this and similar types of issues in the future.

Customer Impact

All Orgs on our EU data center were impacted by this incident. You can check whether your Org was impacted by seeing if your Org’s ID ends in “-eu1”. Web and mobile data sent to FullStory on 2023-09-13 between 18:15 UTC and 19:05 UTC was not captured and is not recoverable.
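If you want to flag records in your own systems that may fall inside the gap, the check described above can be sketched roughly as follows. This is a minimal illustration, not a FullStory API: the function names and the sample Org IDs are hypothetical, and the only facts taken from this report are the “-eu1” suffix and the 18:15 to 19:05 UTC window.

```python
from datetime import datetime, timezone

# Outage window as stated in this postmortem (UTC).
OUTAGE_START = datetime(2023, 9, 13, 18, 15, tzinfo=timezone.utc)
OUTAGE_END = datetime(2023, 9, 13, 19, 5, tzinfo=timezone.utc)


def org_in_eu1(org_id: str) -> bool:
    """Return True if the Org ID ends in "-eu1" (hosted in the EU data center)."""
    return org_id.endswith("-eu1")


def in_outage_window(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls inside the outage window."""
    if ts.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse to guess their zone.
        raise ValueError("timestamp must be timezone-aware")
    return OUTAGE_START <= ts <= OUTAGE_END


def possibly_affected(org_id: str, ts: datetime) -> bool:
    """Hypothetical helper: EU1 Org and a timestamp inside the window."""
    return org_in_eu1(org_id) and in_outage_window(ts)


if __name__ == "__main__":
    # The Org IDs below are made-up examples, not real identifiers.
    sample = datetime(2023, 9, 13, 18, 30, tzinfo=timezone.utc)
    print(possibly_affected("o-EXAMPLE-eu1", sample))
    print(possibly_affected("o-EXAMPLE-na1", sample))
```

Timestamps in other zones are compared correctly as long as they carry zone information, since Python normalizes aware datetimes before comparing them.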

Root Cause

On 2023-09-13 at 18:15 UTC our EU data center experienced a sudden, unanticipated spike in traffic. Our backend data capture service was unable to scale fast enough to accommodate the increase and eventually crashed. Each attempted restart then crashed as well, because the service still could not scale fast enough to handle the incoming traffic.

Resolution

Our on-call operations team was alerted as soon as the issue presented itself and intervened to resolve it, including manually scaling up the data capture service so that it could resume normal operation.

Process Changes and Prevention

So far we have:

  • Scaled up existing resources for our data capture service to handle the new volume of traffic
  • Updated our infrastructure so that the data capture service will be able to scale up faster in the future

To prevent a recurrence of this incident we will be:

  • Further modifying our service scaling policies so that we can handle similar traffic spikes more smoothly
  • Improving our monitoring and alerting so we can address these kinds of issues more quickly

We deeply regret this incident and invite any FullStory customer who was materially affected to contact support@fullstory.com. We stand ready to fully address all of your concerns.

Posted Sep 20, 2023 - 15:05 EDT

Resolved
End user sessions and data were not captured from 2:25pm ET to 3:05pm ET for accounts on our EU data center. All missing activity during this time period is non-recoverable and may impact Metrics, Funnels, Dashboards, and Conversions. During this time, native mobile builds could also have failed. Re-running the build will properly upload the assets.

This issue did not impact any FullStory accounts utilizing our US data center. (Your FullStory URLs would start with app.eu1.fullstory.com if you are utilizing our EU data center.)
Posted Sep 15, 2023 - 15:33 EDT
This incident affected: Data Capture (Web Capture, Native Mobile Capture) and API.