Service Unavailability Escalation Matrix

Last updated: January 19, 2026

We maintain a clearly defined escalation path, with documented roles, responsibilities, and contact information, to ensure a timely and effective response when the Astra platform or any of its services is unavailable.

This escalation process is designed to minimize downtime, ensure accountability, and provide transparent communication during incidents.

Escalation Framework

Level 1: Automated Detection & Initial Triage

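Trigger

  • Automated monitoring alert indicating that the Astra platform or one of its services is unavailable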

Responsibility

  • On-call Engineering team

Actions

  • Acknowledge alerts

  • Perform initial diagnosis

  • Apply immediate remediation or rollback if possible

Availability

  • 24×7 on-call rotation

Level 2: Engineering Escalation

Trigger

  • Level 1 cannot resolve the issue within the defined response time

  • Incident impacts multiple customers or core platform functionality

Responsibility

  • Senior Engineers / Engineering Lead on-call

Actions

  • Deep technical investigation

  • Coordinate cross-service fixes

  • Communicate incident status internally

Level 3: Incident Management & Leadership

Trigger

  • Prolonged outage

  • Security, data integrity, or compliance impact

  • Customer-facing SLA risk

Responsibility

  • Incident Commander

  • Engineering Manager / CTO

Actions

  • Overall incident coordination

  • Decision-making on customer communications

  • External dependency escalation

  • Post-incident review initiation
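
The triggers for the three levels above can be read as a simple decision rule. The following is a minimal sketch of that rule, not our production tooling: the class and field names are illustrative, and the two time thresholds are hypothetical placeholders, since the actual response-time and outage-duration values are defined in internal runbooks rather than in this document.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    TRIAGE = 1       # Level 1: Automated Detection & Initial Triage
    ENGINEERING = 2  # Level 2: Engineering Escalation
    LEADERSHIP = 3   # Level 3: Incident Management & Leadership


@dataclass
class Incident:
    minutes_unresolved: int                    # time since the alert fired
    multi_customer_or_core: bool = False       # Level 2 trigger
    security_data_or_compliance: bool = False  # Level 3 trigger
    sla_at_risk: bool = False                  # Level 3 trigger


# Hypothetical thresholds for illustration only; real values are not
# stated in this document.
RESPONSE_TIME_MIN = 30
PROLONGED_OUTAGE_MIN = 120


def escalation_level(incident: Incident) -> Level:
    """Return the highest escalation level whose trigger is met."""
    if (incident.minutes_unresolved > PROLONGED_OUTAGE_MIN
            or incident.security_data_or_compliance
            or incident.sla_at_risk):
        return Level.LEADERSHIP
    if (incident.minutes_unresolved > RESPONSE_TIME_MIN
            or incident.multi_customer_or_core):
        return Level.ENGINEERING
    return Level.TRIAGE
```

Checking the Level 3 triggers first means an incident with security or SLA impact goes straight to the Incident Commander, regardless of how long it has been open.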

Vendor & Third-Party Escalation

For incidents involving external dependencies (cloud providers, monitoring tools, or third-party integrations):

Responsibility

  • Engineering Lead or Incident Commander initiates vendor escalation

Actions

  • Open priority support tickets with vendors

  • Engage vendor on-call or premium support channels

  • Track vendor response and resolution timelines

Communication & Contact Information

Internal Contacts

  • On-call Engineering: Rotating schedule maintained internally

  • Engineering Leadership: Escalated via defined severity thresholds

  • Incident Commander: Assigned per incident

Contact details and on-call schedules are maintained in internal runbooks and incident management systems and are available to authorized personnel.

Customer Communication

  • Status updates are provided via:

    • Public Status Page

    • Direct customer notifications for high-severity incidents

  • Communication cadence follows incident severity guidelines

Documentation & Review

  • Escalation paths, roles, and contact information are documented in internal runbooks.

  • On-call rotations are reviewed and updated regularly.

  • Every major incident undergoes a post-incident review with documented root cause analysis and corrective actions.

Continuous Improvement

We periodically test and refine our escalation process through incident simulations and real-world reviews to ensure readiness and effectiveness.