API Health Monitor
Ping your API endpoints, track uptime, measure response times, and maintain a status page
🧂 Ingredients
🔌 APIs
- HTTP requests: ping endpoints and measure response times
- Alerting: instant alerts for downtime and degraded performance
- Logging: record uptime data and response-time history
- Status page: update a public status page with current service status
📋 Step-by-Step Build Guide
1. Define your endpoints to monitor: URL, expected status code, max acceptable response time, check interval.
Validate that every required field is present; if anything is missing, ask the user for it before proceeding. Once checks run, verify each result matches the expected output format, retry transient failures, and log and alert on persistent ones.
⚠️ IMPORTANT: Step 2 (setting up the cron job) requires human action. Let your human know what they need to do, and wait for their confirmation before moving on.
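As a sketch, the endpoint definitions could live in a small Python structure like the one below; the field names are illustrative, not a fixed schema.

```python
# Illustrative endpoint configuration (field names are assumptions, not a fixed schema).
ENDPOINTS = [
    {
        "url": "https://api.example.com/health",  # endpoint to ping
        "expected_status": 200,                   # status code that counts as "up"
        "max_response_ms": 500,                   # slower than this is "degraded"
        "interval_minutes": 5,                    # how often to check
    },
]

def validate_endpoint(cfg: dict) -> list:
    """Return the names of any required fields missing from an endpoint entry."""
    required = ("url", "expected_status", "max_response_ms", "interval_minutes")
    return [field for field in required if field not in cfg]
```

A missing-field list makes it easy to ask the user for exactly what is absent before the monitor starts.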
2. Set up a cron job to ping each endpoint at the configured interval (e.g., every 5 minutes).
Define the cron schedule as minute hour day-of-month month day-of-week. Common patterns:
- Every 5 minutes: "*/5 * * * *"
- Daily at 7:30am: "30 7 * * *"
- Weekdays at 9am: "0 9 * * 1-5"
- First of the month at 8am: "0 8 1 * *"
Each run should log its start time, execute the pipeline steps in sequence, retry transient errors (skipping and alerting on persistent ones), log completion time and status, and send output through the configured delivery channel.
🧑 Human Required
Configure the schedule. Your agent needs to know when to run this task. You'll be asked:
- **When**: time of day and timezone (e.g., "7:30 AM Eastern")
- **How often**: daily, weekdays only, weekly, etc.
- **Delivery**: where to send results (Slack, email, SMS, etc.)
The agent handles the scheduling; you just set your preferences.
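For intervals expressed in minutes, a small helper can translate the configured interval into a cron expression. This is a sketch: only intervals that divide an hour evenly map cleanly onto the minute field this way.

```python
def cron_for_interval(minutes: int) -> str:
    """Build a cron expression that fires every `minutes` minutes.

    Only intervals that evenly divide an hour (1-59) map onto the
    minute field; anything else needs a different scheduling strategy.
    """
    if not 1 <= minutes <= 59 or 60 % minutes != 0:
        raise ValueError("interval must evenly divide an hour (1-59)")
    return f"*/{minutes} * * * *"
```

For example, a 5-minute interval yields `*/5 * * * *`, matching the pattern listed in the step above.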
3. For each check, record: timestamp, status code, response time, success/failure.
Validate that all required inputs are available, run the check, verify the result matches the expected output format, retry transient failures (logging and alerting on persistent ones), and return structured output with status and any relevant data.
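A minimal check-and-record function using only the Python standard library; the record keys are illustrative, and the URL and timeout come from whatever configuration you defined in step 1.

```python
import time
from urllib import request, error

def check_endpoint(url: str, timeout_s: float = 10.0) -> dict:
    """Ping `url` once and return a structured check record."""
    started = time.time()
    record = {"url": url, "timestamp": started, "status_code": None,
              "response_ms": None, "success": False}
    try:
        with request.urlopen(url, timeout=timeout_s) as resp:
            record["status_code"] = resp.status
            record["success"] = True
    except error.HTTPError as exc:      # got a response, but a non-2xx status
        record["status_code"] = exc.code
    except (error.URLError, OSError):   # no response at all (DNS, refused, timeout)
        pass
    record["response_ms"] = round((time.time() - started) * 1000, 1)
    return record
```

Appending each record to durable storage (a CSV, SQLite table, etc.) gives you the history the uptime calculations in step 6 need.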
4. If an endpoint returns non-200 or exceeds its max response time, mark it as degraded/down.
Apply the same error handling as above: retry transient failures, log and alert on persistent ones, and return structured output with status and any relevant data.
⚠️ IMPORTANT: Step 5 (Slack alerts) requires human action. Let your human know what they need to do, and wait for their confirmation before moving on.
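The degraded/down rule can be sketched as a pure function over one check record; the record keys are illustrative and should match whatever schema you record checks in.

```python
def classify(record: dict, expected_status: int, max_response_ms: float) -> str:
    """Map one check record to 'up', 'degraded', or 'down'."""
    # No response, or the wrong status code, means the endpoint is down.
    if not record["success"] or record["status_code"] != expected_status:
        return "down"
    # A correct response that is too slow counts as degraded.
    if record["response_ms"] > max_response_ms:
        return "degraded"
    return "up"
```

Keeping classification separate from the HTTP call makes the thresholds easy to test and tune per endpoint.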
5. Alert immediately in Slack when status changes: up→down or down→up, with duration of outage
Post a message to Slack using the Web API.
POST https://slack.com/api/chat.postMessage
Headers: Authorization: Bearer {SLACK_BOT_TOKEN}, Content-Type: application/json
Body: {
"channel": "{channel_id}",
"text": "{fallback_text}",
"blocks": [{ "type": "section", "text": { "type": "mrkdwn", "text": "{formatted_message}" }}]
}
Use Slack mrkdwn formatting: *bold*, _italic_, `code`, > blockquote.
For alerts, use emoji prefixes: 🔴 critical, 🟡 warning, 🟢 success, ℹ️ info.
Keep messages scannable — use bullet points for lists.
Expected response: { "ok": true, "ts": "..." }. If "ok" is false, check the "error" field.
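A sketch of the Slack call in Python using only the standard library. The `chat.postMessage` endpoint, Bearer header, and mrkdwn block shape come from the step above; the function names and severity map are illustrative.

```python
import json
from urllib import request

# Emoji prefixes per severity, matching the alert convention above.
EMOJI = {"critical": "🔴", "warning": "🟡", "success": "🟢", "info": "ℹ️"}

def build_alert(channel: str, severity: str, text: str) -> dict:
    """Build a chat.postMessage payload with an emoji-prefixed mrkdwn block."""
    message = f"{EMOJI[severity]} {text}"
    return {
        "channel": channel,
        "text": message,  # fallback text for notifications
        "blocks": [{"type": "section",
                    "text": {"type": "mrkdwn", "text": message}}],
    }

def post_to_slack(payload: dict, bot_token: str) -> dict:
    """POST the payload to Slack and return the decoded JSON response."""
    req = request.Request(
        "https://slack.com/api/chat.postMessage",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {bot_token}",
                 "Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    if not body.get("ok"):
        raise RuntimeError(f"Slack error: {body.get('error')}")
    return body
```

Usage: `post_to_slack(build_alert("C0123456", "critical", "api.example.com is *down* (3m 12s)"), token)` with your bot token and channel ID substituted in.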
6. Calculate uptime percentage: (successful checks / total checks) × 100 over rolling 24h, 7d, 30d windows
Validate the input data (nulls, out-of-range values, duplicates), apply the calculation, and compare against benchmarks or previous periods where available. Format percentages to 2 decimal places and counts as whole numbers. Include the current value, previous value, absolute and percentage change, and trend direction (↑↓→). Flag anomalies: values more than 2 standard deviations from the mean. If there is insufficient data for a reliable calculation, state the minimum needed and return partial results.
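The rolling-window uptime formula can be sketched directly; the record fields `timestamp` and `success` are assumptions about your check-log schema.

```python
import time

def uptime_percent(records, window_hours, now=None):
    """(successful checks / total checks) x 100 over a rolling window.

    Returns None when the window contains no checks, rather than
    guessing at a percentage from zero data points.
    """
    now = time.time() if now is None else now
    cutoff = now - window_hours * 3600
    window = [r for r in records if r["timestamp"] >= cutoff]
    if not window:
        return None
    ok = sum(1 for r in window if r["success"])
    return round(100 * ok / len(window), 2)
```

Calling it with `window_hours` of 24, 168, and 720 gives the 24h, 7d, and 30d figures the step describes.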
7. Optionally update a status page with the current status of each service.
Validate inputs, push the update, verify the response, retry transient failures (alerting on persistent ones), and return structured output with status and any relevant data.
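If the status page is Atlassian Statuspage, its component API accepts a PATCH with the new status. This is a hedged sketch: the endpoint shape and status names should be verified against the Statuspage API documentation for your account.

```python
import json
from urllib import request

# Maps our internal states to Statuspage component statuses
# (names taken from Statuspage's component API; verify against your account).
STATUSPAGE_STATUS = {
    "up": "operational",
    "degraded": "degraded_performance",
    "down": "major_outage",
}

def statuspage_request(page_id: str, component_id: str, state: str, api_key: str):
    """Build the PATCH request that sets one component's status."""
    body = {"component": {"status": STATUSPAGE_STATUS[state]}}
    return request.Request(
        f"https://api.statuspage.io/v1/pages/{page_id}/components/{component_id}",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"OAuth {api_key}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )
```

Sending it is then `urllib.request.urlopen(statuspage_request(...))`; keeping the request builder pure makes the state-to-status mapping easy to test offline.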
8. Daily: health summary covering all endpoints, current status, uptime %, average response time, and any incidents.
Gather the data, verify it matches the expected output format, retry transient failures (alerting on persistent ones), and deliver the summary through the configured channel.
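A sketch of the daily summary formatter; the per-service fields are assumptions about what the earlier steps have aggregated.

```python
def daily_summary(services) -> str:
    """Render the daily health summary as Slack-style mrkdwn bullets.

    Each entry is assumed to carry: name, state, uptime_24h, avg_ms, incidents.
    """
    icon = {"up": "🟢", "degraded": "🟡", "down": "🔴"}
    lines = ["*Daily API Health Summary*"]
    for svc in services:
        lines.append(
            f"• {icon[svc['state']]} {svc['name']}: "
            f"{svc['uptime_24h']:.2f}% uptime, "
            f"avg {svc['avg_ms']:.0f} ms, "
            f"{svc['incidents']} incident(s)"
        )
    return "\n".join(lines)
```

The result drops straight into the Slack message body from step 5, keeping one line per endpoint so the summary stays scannable.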
🤖 Example Agent Prompt
Monitor my API endpoints. For each endpoint I define (URL, expected status code, max acceptable response time, check interval), ping it on schedule and record the timestamp, status code, response time, and success/failure. Mark an endpoint degraded/down when it returns non-200 or exceeds its max response time. Alert me in Slack immediately when status changes (up→down or down→up), including the duration of the outage. Calculate uptime percentages over rolling 24h, 7d, and 30d windows, optionally update the status page, and send me a daily health summary. If any required data is missing, ask me before proceeding.
Copy this prompt into your agent to get started.