Blank pages in a subset of sessions.
Incident Report for Fullstory
Postmortem

Session Replay Corruption Postmortem

Due to a software defect, session replay data for approximately 1% of sessions captured between July 18 (approximately 16:07:21 UTC) and July 31 (20:45:29 UTC) was corrupted. Product analytics for the affected web sessions are intact, but the sessions themselves cannot be viewed in session replay. Accessing one of these sessions displays an “Unable to retrieve session” error. During an attempt to recover the lost sessions, mobile analytics data for the affected sessions was also deleted for some accounts. Your Customer Success Manager will reach out to you if your account was impacted by this deletion. If you do not have a Success Manager and believe you are missing mobile analytics data, please contact support@fullstory.com.

This postmortem details the impact on our customers, the root cause of the issue, how we addressed the problem, and the steps we're taking to prevent this and similar types of issues in the future.

Customer Impact

Approximately 1% of the web and mobile sessions captured during the incident window are unplayable. Product analytics for the mobile sessions impacted by the mitigation attempt are irrecoverable.

Root Cause

On July 18th, a code change was introduced to address clock skew during session creation. This change inadvertently affected the archival process for raw session data. Pages that were recorded and appeared to be "from the future" had their timestamps clamped down to the server time. If those pages were still sending data, the page would eventually be reinitialized with the "correct" future timestamp. As a result, these sessions were either completely discarded or misattributed in our event storage database.
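The failure mode described above can be illustrated with a minimal sketch. It assumes the page timestamp is embedded in the archival object key, which the planned change to the object key suggests but which is not spelled out here. The function names, key layout, and timestamp values are hypothetical and are not Fullstory's actual implementation.

```python
# Hypothetical sketch: names, key layout, and values are illustrative
# assumptions, not Fullstory's actual implementation.

def clamp_to_server_time(client_start_ms: int, server_now_ms: int) -> int:
    """Clamp a page start time that appears to be in the future."""
    return min(client_start_ms, server_now_ms)


def archive_key(session_id: str, page_id: str, start_ms: int) -> str:
    """Build an archival key that embeds the page start timestamp (assumed layout)."""
    return f"{session_id}/{page_id}/{start_ms}"


server_now_ms = 1_689_696_441_000        # server clock when the page is created
client_start_ms = server_now_ms + 5_000  # client clock runs 5 seconds ahead

# At creation, the "future" timestamp is clamped down to server time.
key_at_creation = archive_key(
    "session-1", "page-1", clamp_to_server_time(client_start_ms, server_now_ms)
)

# Later, the page is reinitialized with its original, unclamped timestamp.
key_after_reinit = archive_key("session-1", "page-1", client_start_ms)

# The same page now resolves to two different keys, so data written under one
# key is not found when the page is looked up under the other.
print(key_at_creation)
print(key_after_reinit)
print(key_at_creation == key_after_reinit)
```

Under these assumptions, a single page ends up associated with two keys, which is one way data could be discarded or attributed to the wrong record.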

During an attempt to repair raw session data for the missing web sessions, product analytics associated with the affected 1% of mobile sessions were inadvertently deleted.

Resolution

Upon identifying the defect, we promptly deployed a fix that restored correct page archival of raw session data for all subsequent sessions. All sessions recorded post-incident are and will remain playable.

Process Changes and Prevention

To prevent a recurrence, we've implemented the following action items:

  • We've enhanced monitoring and alerting for failures impacting the archival of session replay data.
  • We've updated session replay metadata to be completely immutable to eliminate the possibility of raw event storage becoming corrupted.

We are also working on:

  • Eliminating the timestamp from the object key used for session replay archival, which will reduce the possibility of future clock-related session archival issues.
  • Integrating monitoring and alerting within the playback client to detect and log session storage issues that cause playback failures, so that this type of issue is detected immediately.
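The two in-progress items can be sketched together as follows. This is only an illustration under stated assumptions: the key layout, the in-memory `archive` dictionary standing in for object storage, and the function names are all hypothetical.

```python
# Hypothetical sketch: the key layout, the dict standing in for object
# storage, and all names are illustrative assumptions.
import logging

logger = logging.getLogger("playback")


def stable_archive_key(session_id: str, page_id: str) -> str:
    """Build an archival key with no timestamp, so clock skew cannot change it."""
    return f"{session_id}/{page_id}"


def find_missing_pages(session_id: str, page_ids: list, archive: dict) -> list:
    """Return the page IDs whose archived data is absent, logging each one."""
    missing = [
        page_id
        for page_id in page_ids
        if stable_archive_key(session_id, page_id) not in archive
    ]
    for page_id in missing:
        logger.warning(
            "missing archived page %s for session %s", page_id, session_id
        )
    return missing


# Only page-1 was archived; page-2 is expected by playback but absent.
archive = {stable_archive_key("session-1", "page-1"): b"raw page data"}
missing = find_missing_pages("session-1", ["page-1", "page-2"], archive)
print(missing)
```

In this sketch the key depends only on the session and page identifiers, and a check run at playback time reports any page that cannot be found, so a gap would be logged the first time an affected session is opened.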

We deeply regret this incident and invite any FullStory customer who was materially affected to contact support@fullstory.com. We stand ready to fully address all of your concerns.

Posted Sep 08, 2023 - 10:30 EDT

Resolved
From July 18 to July 31 there was a bug in our raw data storage that affected a subset of sessions (~1%). These sessions currently fail to play back as expected. During attempted recovery, mobile analytics data for the affected sessions was also deleted for some accounts. Please see our detailed postmortem for more information.
Posted Jul 31, 2023 - 21:00 EDT