Overview

This endpoint returns all alerts that have been triggered by a specific check. Use it to analyze an individual check's alert history and alerting patterns, and to understand its reliability and performance trends over time.
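
A minimal request sketch using fetch, assuming the base URL, path, and headers shown in the curl examples later on this page (the token, account ID, and check ID are placeholders):

// Fetch the alert history for a single check (placeholder credentials)
const checkId = 'check_api_health';

const res = await fetch(`https://api.checklyhq.com/v1/check-alerts/${checkId}`, {
  headers: {
    'Authorization': 'Bearer cu_1234567890abcdef',               // placeholder API key
    'X-Checkly-Account': '550e8400-e29b-41d4-a716-446655440000'  // placeholder account ID
  }
});

const { data, meta } = await res.json(); // alerts plus pagination and summary metadata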

Response Example

{
  "data": [
    {
      "id": "alert_789123456",
      "checkId": "check_api_health",
      "checkName": "Production API Health Check",
      "checkType": "api",
      "alertType": "check_failure",
      "severity": "critical",
      "status": "resolved",
      "message": "Check failed: HTTP 500 Internal Server Error",
      "triggeredAt": "2024-01-25T14:30:00.000Z",
      "resolvedAt": "2024-01-25T14:45:00.000Z",
      "duration": 900000,
      "location": {
        "id": "us-east-1",
        "name": "N. Virginia, USA"
      },
      "checkResult": {
        "id": "result_123456789",
        "responseTime": 5240,
        "statusCode": 500,
        "error": "Internal Server Error",
        "url": "https://api.production.com/health",
        "headers": {
          "content-type": "application/json",
          "server": "nginx/1.18.0"
        },
        "body": "{\"error\": \"Database connection failed\"}"
      },
      "alertChannel": {
        "id": "channel_email_prod",
        "name": "Production Team Email",
        "type": "email",
        "target": "prod-team@example.com"
      },
      "escalationLevel": 2,
      "acknowledgedBy": {
        "id": "user_456",
        "name": "Sarah DevOps",
        "email": "sarah@example.com"
      },
      "acknowledgedAt": "2024-01-25T14:32:00.000Z",
      "resolutionNotes": "Database connection pool exhausted. Restarted database service.",
      "tags": ["api", "production", "critical", "database"]
    },
    {
      "id": "alert_456789123",
      "checkId": "check_api_health",
      "checkName": "Production API Health Check",
      "checkType": "api",
      "alertType": "performance_degradation",
      "severity": "medium",
      "status": "resolved",
      "message": "Response time exceeded threshold: 2.1s > 2.0s",
      "triggeredAt": "2024-01-24T09:15:00.000Z",
      "resolvedAt": "2024-01-24T09:22:00.000Z",
      "duration": 420000,
      "location": {
        "id": "eu-west-1",
        "name": "Ireland, Europe"
      },
      "checkResult": {
        "id": "result_456789123",
        "responseTime": 2100,
        "statusCode": 200,
        "url": "https://api.production.com/health"
      },
      "alertChannel": {
        "id": "channel_slack_ops",
        "name": "Operations Slack",
        "type": "slack",
        "target": "#ops-alerts"
      },
      "escalationLevel": 0,
      "acknowledgedBy": null,
      "acknowledgedAt": null,
      "resolutionNotes": "Performance returned to normal after CDN cache warming.",
      "tags": ["api", "production", "performance"]
    }
  ],
  "meta": {
    "checkId": "check_api_health",
    "checkName": "Production API Health Check",
    "totalAlerts": 47,
    "currentPage": 1,
    "totalPages": 5,
    "limit": 10,
    "timeframe": {
      "from": "2024-01-01T00:00:00.000Z",
      "to": "2024-01-31T23:59:59.999Z"
    },
    "summary": {
      "totalFailures": 12,
      "totalRecoveries": 12,
      "totalPerformanceDegradations": 23,
      "averageResolutionTime": 480000,
      "longestOutage": 1800000,
      "uptimePercentage": 99.1,
      "mttr": 480,
      "mttd": 120
    }
  }
}

Common Use Cases

Check Reliability Assessment

Evaluate reliability and uptime of individual checks
GET /v1/check-alerts/{checkId}?period=30d&includeSummary=true

Performance Trend Analysis

Analyze performance degradation patterns over time
GET /v1/check-alerts/{checkId}?alertType=performance_degradation

Incident Response Analysis

Evaluate team response times and incident handling
GET /v1/check-alerts/{checkId}?severity=critical&status=resolved

Alert Optimization

Optimize alert thresholds and reduce false positives
GET /v1/check-alerts/{checkId}?period=90d&limit=100

Query Parameters

Time Range

  • from (string): Start date for alerts (ISO 8601 format)
  • to (string): End date for alerts (ISO 8601 format)
  • period (string): Predefined time period (24h, 7d, 30d, 90d)

Example: ?from=2024-01-01T00:00:00Z&to=2024-01-31T23:59:59Z
Default: Last 30 days if no time parameters are provided

Filtering

  • alertType (string): Filter by alert type (check_failure, check_recovery, performance_degradation)
  • severity (string): Filter by severity level (low, medium, high, critical)
  • status (string): Filter by alert status (triggered, acknowledged, resolved)
  • location (string): Filter by monitoring location

Example: ?alertType=check_failure&severity=critical&status=resolved

Pagination and Sorting

  • page (integer): Page number (default: 1)
  • limit (integer): Number of alerts per page (default: 10, max: 100)
  • sortBy (string): Sort field (triggeredAt, duration, severity)
  • sortOrder (string): Sort order (asc, desc; default: desc)
  • includeSummary (boolean): Include alert summary statistics (default: true)

Default: Returns the first 10 alerts sorted by most recent
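
As a sketch of how these parameters combine, the snippet below builds a filtered, sorted, paginated query string; the parameter names come from the list above, and the check ID is the placeholder used elsewhere on this page:

// Combine time range, filter, and pagination parameters into a query string
const params = new URLSearchParams({
  period: '90d',
  alertType: 'check_failure',
  severity: 'critical',
  status: 'resolved',
  sortBy: 'triggeredAt',
  sortOrder: 'desc',
  limit: '100'
});

// e.g. period=90d&alertType=check_failure&severity=critical&status=resolved&...
const url = `https://api.checklyhq.com/v1/check-alerts/check_api_health?${params}`;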

Alert Summary Metrics

Reliability Metrics

  • Uptime Percentage: Overall availability during the time period
  • Total Failures: Number of failure alerts
  • Total Recoveries: Number of recovery alerts
  • MTTR: Mean Time To Recovery (seconds)
  • MTTD: Mean Time To Detection (seconds)

Performance Metrics

  • Performance Degradations: Number of performance alerts
  • Average Resolution Time: Mean time to resolve issues
  • Longest Outage: Duration of the longest outage
  • Escalation Rate: Percentage of alerts that escalated

Alert Patterns

  • Peak Alert Hours: Times when most alerts occur
  • Location Breakdown: Alerts by monitoring location
  • Severity Distribution: Breakdown of alert severities
  • Recovery Trends: How quickly issues are resolved
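
The summary object in the response example above does not include the location and severity breakdowns directly, but they can be derived client-side; a minimal sketch, assuming alerts holds the data array from the response:

// Group alerts by severity and by monitoring location (client-side breakdowns)
const bySeverity = {};
const byLocation = {};

for (const alert of alerts) {
  bySeverity[alert.severity] = (bySeverity[alert.severity] || 0) + 1;
  const loc = alert.location ? alert.location.name : 'unknown';
  byLocation[loc] = (byLocation[loc] || 0) + 1;
}

console.log('Severity distribution:', bySeverity); // e.g. { critical: 1, medium: 1 }
console.log('Location breakdown:', byLocation);    // e.g. { 'N. Virginia, USA': 1, 'Ireland, Europe': 1 }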

Team Response

  • Acknowledgment Rate: Percentage of alerts acknowledged
  • Response Time: Time from alert to acknowledgment
  • Resolution Notes: Documentation of fixes applied
  • Escalation Paths: How alerts were escalated

Alert Analysis Use Cases

Evaluate the reliability of individual checks:
  • Calculate uptime percentages for SLA reporting
  • Identify patterns in check failures
  • Track improvement trends after optimizations
  • Compare reliability across different time periods
// Example: Calculate check reliability metrics
const alerts = response.data;
const meta = response.meta;
const summary = meta.summary;

const reliabilityScore = {
  uptime: summary.uptimePercentage,
  mttr: summary.mttr / 60, // convert seconds to minutes
  mttd: summary.mttd / 60, // convert seconds to minutes
  failureRate: (summary.totalFailures / meta.totalAlerts) * 100 // totalAlerts lives on meta, not summary
};

console.log(`Check reliability: ${reliabilityScore.uptime}% uptime`);
console.log(`Average recovery time: ${reliabilityScore.mttr} minutes`);
Analyze performance degradation patterns:
  • Track response time alert frequency
  • Identify performance regression patterns
  • Monitor seasonal performance trends
  • Detect gradual performance degradation
# Example: Analyze performance degradation trends
from collections import defaultdict
from datetime import datetime

# alerts is the `data` array from the API response
performance_alerts = [
    alert for alert in alerts
    if alert['alertType'] == 'performance_degradation'
]

# Group by time of day to find performance patterns
hourly_performance_issues = defaultdict(int)

for alert in performance_alerts:
    hour = datetime.fromisoformat(alert['triggeredAt'].replace('Z', '+00:00')).hour
    hourly_performance_issues[hour] += 1

peak_hours = sorted(hourly_performance_issues.items(), key=lambda x: x[1], reverse=True)[:3]
print(f"Peak performance issue hours: {peak_hours}")
Evaluate incident response effectiveness:
  • Measure team response times
  • Track escalation patterns
  • Analyze resolution approaches
  • Identify areas for process improvement
# Example: Get critical alerts with response times
curl -X GET "https://api.checklyhq.com/v1/check-alerts/check_api_health?severity=critical&includeSummary=true" \
  -H "Authorization: Bearer cu_1234567890abcdef" \
  -H "X-Checkly-Account: 550e8400-e29b-41d4-a716-446655440000"
Optimize alert thresholds and configurations:
  • Analyze false positive rates
  • Evaluate alert noise levels
  • Optimize severity classifications
  • Fine-tune alert thresholds
// Example: Analyze alert patterns for optimization
const alertsByType = alerts.reduce((acc, alert) => {
  const key = `${alert.alertType}_${alert.severity}`;
  if (!acc[key]) {
    acc[key] = { count: 0, avgDuration: 0, escalations: 0 };
  }
  acc[key].count++;
  acc[key].avgDuration += alert.duration || 0;
  if (alert.escalationLevel > 0) acc[key].escalations++;
  return acc;
}, {});

// Calculate optimization insights
Object.keys(alertsByType).forEach(key => {
  const data = alertsByType[key];
  data.avgDuration = data.avgDuration / data.count;
  data.escalationRate = (data.escalations / data.count) * 100;
  
  if (data.escalationRate < 5 && data.avgDuration < 300000) {
    console.log(`Consider raising threshold for ${key}: low escalation rate and quick resolution`);
  }
});

Additional Examples

curl -X GET "https://api.checklyhq.com/v1/check-alerts/check_api_health?period=30d&includeSummary=true" \
  -H "Authorization: Bearer cu_1234567890abcdef" \
  -H "X-Checkly-Account: 550e8400-e29b-41d4-a716-446655440000"
This endpoint provides detailed alert history for individual checks, enabling deep analysis of check reliability, performance patterns, and incident response effectiveness.