PagerDuty Integration
Real-time Infrastructure Monitoring
Integration Setup
Map SystemPulse monitoring endpoints directly to your PagerDuty service catalog. Configure API routing, webhook payloads, and severity thresholds to ensure every incident triggers the correct on-call notification channel.
Generate API Access Token
Navigate to SystemPulse Settings > Integrations. Copy the read-write token for the `prod-eu-west-1` environment. Token prefix must match `sp_pd_` to bypass rate limits and enable audit logging.
Configure Webhook Endpoint
In PagerDuty, create a new Integration using the `SystemPulse v2.4` vendor type. Paste the generated webhook URL into the HTTP target field. Enable TLS 1.3 verification and set timeout to 5000ms.
Map Severity Levels
Align SystemPulse alert tiers with PagerDuty urgency types. Map `Critical` to `High`, `Warning` to `Low`, and `Info` to `Acknowledged`. Save routing rules to prevent duplicate alerts across the `infra-core` service.
Escalation Policies & On-Call Rotations
Define multi-tier response workflows that automatically route unresolved incidents to senior engineers. Sync SystemPulse health checks with PagerDuty’s scheduling engine to maintain 24/7 coverage across global teams.
Initial Triage Window
Notifications route to the `platform-sre` rotation. Engineers have a 15-minute acknowledgment window before the incident escalates. Auto-assigns to the primary on-call member for the `us-east-1` region.
Senior Engineering Handoff
If unresolved after 15 minutes, PagerDuty triggers the `infra-lead` escalation path. SMS, voice call, and Slack `#incidents-critical` channel notifications activate simultaneously. Includes runbook links for database failover procedures.
Weekly On-Call Sync
Configure recurring schedules using PagerDuty’s native calendar API. SystemPulse validates rotation coverage every 6 hours and triggers a `coverage_gap` alert if fewer than two engineers are assigned to the `prod-monitoring` schedule.