🔧

CI/CD Pipeline Monitor

Monitor your CI/CD pipelines, alert on failures, and provide failure analysis

intermediate | ⏱ 30 minutes | 🔄 7 swappable alternatives

🧂 Ingredients

🔌 APIs

GitHub Actions workflow runs, logs, and status (required)

github_actions_workflow_runs_logs_and_status

🔄 Alternatives:

- GitLab: Built-in CI/CD, self-hostable
- Bitbucket: Atlassian ecosystem integration

Pipeline status notifications and failure alerts (required)

pipeline_status_notifications_and_failure_alerts

🔄 Alternatives:

- Discord: Free, great for communities
- Telegram: Simple bot API, no approval needed
- Teams: Enterprise/Office 365 integration

Track build metrics over time (optional)

track_build_metrics_over_time

🔄 Alternatives:

- Airtable: Better for structured data + API
- Notion Databases: More flexible views

📋 Step-by-Step Build Guide

STEP 1

1. Connect to GitHub Actions API and poll for workflow run completions

Use the GitHub REST API to list completed workflow runs.

GET https://api.github.com/repos/{owner}/{repo}/actions/runs?status=completed
Headers: Authorization: Bearer {GITHUB_TOKEN}, Accept: application/vnd.github+json

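Parse the response and extract the key fields from each entry in workflow_runs (id, name, head_branch, status, conclusion, html_url).
Handle pagination if results exceed one page (follow the rel="next" URL in the Link header).
Rate limit: GitHub allows 5,000 requests/hour per authenticated user. If you get a 403 or 429, check the X-RateLimit-Remaining and X-RateLimit-Reset headers before retrying.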

Format the output concisely with the most important information first.
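A minimal sketch of the polling step, assuming Python's standard library and the `actions/runs` listing endpoint. The function names (`build_runs_request`, `next_page_url`, `summarize_runs`, `fetch_all_completed_runs`) are illustrative and not part of the recipe; only the last one touches the network.

```python
import json
import re
import urllib.request

API_ROOT = "https://api.github.com"


def auth_headers(token):
    """Headers from the step above; the token is supplied by the human."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }


def build_runs_request(owner, repo, token, per_page=100):
    """Build (but do not send) a request for completed workflow runs."""
    url = (
        f"{API_ROOT}/repos/{owner}/{repo}/actions/runs"
        f"?status=completed&per_page={per_page}"
    )
    return urllib.request.Request(url, headers=auth_headers(token))


def next_page_url(link_header):
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None


def summarize_runs(payload):
    """Keep only the fields the later steps need from one page of results."""
    return [
        {
            "id": run["id"],
            "workflow": run["name"],
            "branch": run["head_branch"],
            "conclusion": run["conclusion"],
            "url": run["html_url"],
        }
        for run in payload.get("workflow_runs", [])
    ]


def fetch_all_completed_runs(owner, repo, token):
    """Follow pagination until no rel="next" link remains (network I/O)."""
    runs = []
    request = build_runs_request(owner, repo, token)
    while request is not None:
        with urllib.request.urlopen(request) as response:
            runs.extend(summarize_runs(json.load(response)))
            next_url = next_page_url(response.headers.get("Link"))
        request = (
            urllib.request.Request(next_url, headers=auth_headers(token))
            if next_url
            else None
        )
    return runs
```

A real poller would also remember the highest run `id` it has already processed so that each completed run is reported only once.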

🧑 Human Required

Connect the API

1. Sign up for the service if you don't have an account
2. Find the API settings in your account dashboard
3. Generate an API key or access token
4. Share the key with your agent when prompted

💡 Most services have a free tier that's sufficient to get started.

STEP 2

2. On success: post a brief ✅ notification to Slack with build time

Post a message to Slack using the Web API.

POST https://slack.com/api/chat.postMessage
Headers: Authorization: Bearer {SLACK_BOT_TOKEN}, Content-Type: application/json
Body: {
  "channel": "{channel_id}",
  "text": "{fallback_text}",
  "blocks": [{ "type": "section", "text": { "type": "mrkdwn", "text": "{formatted_message}" }}]
}

Use Slack mrkdwn formatting: *bold*, _italic_, `code`, > blockquote.
For alerts, use emoji prefixes: 🔴 critical, 🟡 warning, 🟢 success, ℹ️ info.
Keep messages scannable — use bullet points for lists.

Expected response: { "ok": true, "ts": "..." }. If ok is false, check the "error" field.
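As a rough illustration, the success notification could be assembled like this. The helper names are hypothetical, and the sketch assumes the run's start and finish timestamps arrive as ISO 8601 strings ending in `Z`; `post_to_slack` is the only function that performs a network call.

```python
import json
import urllib.request
from datetime import datetime


def build_duration_seconds(started_at, finished_at):
    """Seconds between two ISO 8601 timestamps such as 2024-05-01T12:00:00Z."""
    def parse(ts):
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return int((parse(finished_at) - parse(started_at)).total_seconds())


def success_payload(channel_id, workflow, branch, seconds):
    """Build the chat.postMessage body for a passing build."""
    minutes, secs = divmod(seconds, 60)
    duration = f"{minutes}m {secs:02d}s"
    return {
        "channel": channel_id,
        # Plain-text fallback shown in notifications.
        "text": f"{workflow} passed on {branch} in {duration}",
        "blocks": [{
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"✅ *{workflow}* passed on `{branch}` in {duration}",
            },
        }],
    }


def post_to_slack(token, payload):
    """Send the payload; raise if Slack answers with ok: false (network I/O)."""
    request = urllib.request.Request(
        "https://slack.com/api/chat.postMessage",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if not body.get("ok"):
        raise RuntimeError(f"Slack error: {body.get('error')}")
    return body["ts"]
```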

STEP 3

3. On failure: pull the failed job logs, identify the error message and failing step

Steps:
1. Validate all required inputs are available
2. Execute the operation described above
3. Verify the result meets expected output format
4. Handle errors gracefully — retry transient failures, log and alert on persistent ones
5. Return structured output with status and any relevant data

If any required data is missing, request it from the user before proceeding.
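One possible way to carry out the failure analysis, shown as a sketch. It assumes the jobs for a run come from `GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs` and the plain-text log from `GET /repos/{owner}/{repo}/actions/jobs/{job_id}/logs`, and that error lines in the log carry the `##[error]` marker. The marker is an assumption about log format, so adjust `ERROR_MARKER` if your logs differ. Both functions work on data that has already been downloaded.

```python
ERROR_MARKER = "##[error]"  # assumed marker for error lines in job logs


def failing_steps(jobs_payload):
    """Return (job name, step name) pairs whose conclusion is "failure"."""
    pairs = []
    for job in jobs_payload.get("jobs", []):
        if job.get("conclusion") != "failure":
            continue
        for step in job.get("steps", []):
            if step.get("conclusion") == "failure":
                pairs.append((job["name"], step["name"]))
    return pairs


def error_excerpt(log_text, context=2, max_lines=20):
    """Collect each error-marked line plus a few preceding lines of context."""
    lines = log_text.splitlines()
    keep = set()
    for index, line in enumerate(lines):
        if ERROR_MARKER in line:
            keep.update(range(max(0, index - context), index + 1))
    return "\n".join(lines[i] for i in sorted(keep)[:max_lines])
```

If `error_excerpt` returns an empty string, falling back to the last few lines of the log is a reasonable default, since the failing command's output usually sits at the end.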

⚠️ IMPORTANT: The next step (Step 4: "Post a detailed failure alert to Slack") requires human action. Before you finish this step, tell your human that they will need to take action alongside the agent. Don't proceed to the next step until your human confirms they're ready.

STEP 4

4. Post a detailed failure alert to Slack: workflow name, branch, failing step, error excerpt, and link to full logs

Post a message to Slack using the Web API.

POST https://slack.com/api/chat.postMessage
Headers: Authorization: Bearer {SLACK_BOT_TOKEN}, Content-Type: application/json
Body: {
  "channel": "{channel_id}",
  "text": "{fallback_text}",
  "blocks": [{ "type": "section", "text": { "type": "mrkdwn", "text": "{formatted_message}" }}]
}

Use Slack mrkdwn formatting: *bold*, _italic_, `code`, > blockquote.
For alerts, use emoji prefixes: 🔴 critical, 🟡 warning, 🟢 success, ℹ️ info.
Keep messages scannable — use bullet points for lists.

Expected response: { "ok": true, "ts": "..." }. If ok is false, check the "error" field.
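The failure alert can reuse the same request shape with a richer message. The sketch below is one way to lay out the five fields named in this step; `failure_payload` is a hypothetical helper, and the 3,000-character cap reflects Slack's limit on a section block's text, which is why the excerpt is trimmed from the front.

```python
SECTION_TEXT_LIMIT = 3000  # cap on a section block's text field
FENCE = "`" * 3            # code fence, built indirectly to keep this example tidy


def failure_payload(channel_id, workflow, branch, step, excerpt, logs_url):
    """Build the chat.postMessage body for a failed build."""
    header = "\n".join([
        f"🔴 *{workflow}* failed",
        f"• Branch: `{branch}`",
        f"• Failing step: `{step}`",
        f"• <{logs_url}|View full logs>",
    ])
    # Reserve room for the header, both fences, and the three newlines.
    budget = max(0, SECTION_TEXT_LIMIT - len(header) - 2 * len(FENCE) - 3)
    # Keep the tail of the excerpt, where the error usually appears.
    trimmed = excerpt[max(0, len(excerpt) - budget):]
    message = f"{header}\n{FENCE}\n{trimmed}\n{FENCE}"
    return {
        "channel": channel_id,
        "text": f"{workflow} failed on {branch} at step {step}",
        "blocks": [{
            "type": "section",
            "text": {"type": "mrkdwn", "text": message},
        }],
    }
```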

STEP 5

5. Detect flaky tests: track tests that fail intermittently (fail, pass on retry) and flag them

Steps:
1. Validate all required inputs are available
2. Execute the operation described above
3. Verify the result meets expected output format
4. Handle errors gracefully — retry transient failures, log and alert on persistent ones
5. Return structured output with status and any relevant data

If any required data is missing, request it from the user before proceeding.
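A sketch of the "fail, pass on retry" rule. It assumes per-test results have already been collected into records with `test`, `commit`, `attempt`, and `passed` keys; that record shape is hypothetical, and how you obtain it (for example from test report artifacts) depends on your pipeline.

```python
from collections import defaultdict


def find_flaky_tests(results, min_occurrences=1):
    """Count, per test, the commits where it failed and later passed on retry.

    A test that fails on every attempt is a real failure, not a flaky one,
    so it is not counted.
    """
    attempts_by_key = defaultdict(list)
    for record in results:
        key = (record["test"], record["commit"])
        attempts_by_key[key].append((record["attempt"], record["passed"]))

    flaky_counts = defaultdict(int)
    for (test, _commit), attempts in attempts_by_key.items():
        seen_failure = False
        for _attempt, passed in sorted(attempts):
            if not passed:
                seen_failure = True
            elif seen_failure:
                flaky_counts[test] += 1
                break
    return {t: n for t, n in flaky_counts.items() if n >= min_occurrences}
```

Raising `min_occurrences` filters out one-off infrastructure hiccups and flags only tests that flip repeatedly.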

STEP 6

6. Track metrics: build success rate, average build time, most common failure reasons

Process the data and calculate the requested metrics.

Steps:
1. Validate input data — check for nulls, out-of-range values, duplicates
2. Apply the calculation/aggregation logic
3. Compare against benchmarks or previous periods if available
4. Format results with appropriate precision (2 decimal places for percentages, whole numbers for counts)

Include: current value, previous value, change (absolute and %), trend direction (↑↓→).
Flag any anomalies: values >2 standard deviations from the mean.

If insufficient data for a reliable calculation, state the minimum needed and return partial results.
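The three metrics in this step could be computed along these lines. This is a sketch under assumptions: each run is a dict with `conclusion`, `duration_s`, and `failure_reason` keys (a hypothetical shape), and only runs that ended in success or failure are counted.

```python
from collections import Counter
from statistics import mean


def build_metrics(runs, top_n=3):
    """Success rate, average build time, and most common failure reasons."""
    finished = [r for r in runs if r["conclusion"] in ("success", "failure")]
    if not finished:
        # Not enough data: state the minimum needed instead of guessing.
        return {"runs": 0, "note": "need at least 1 completed run"}
    successes = sum(r["conclusion"] == "success" for r in finished)
    reasons = Counter(
        r["failure_reason"]
        for r in finished
        if r["conclusion"] == "failure" and r.get("failure_reason")
    )
    return {
        "runs": len(finished),
        "success_rate_pct": round(100 * successes / len(finished), 2),
        "avg_build_s": round(mean(r["duration_s"] for r in finished)),
        "top_failures": reasons.most_common(top_n),
    }


def change(current, previous):
    """Return (absolute change, percent change or None, trend arrow)."""
    delta = current - previous
    pct = round(100 * delta / previous, 2) if previous else None
    arrow = "↑" if delta > 0 else "↓" if delta < 0 else "→"
    return delta, pct, arrow
```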

STEP 7

7. Alert if build time increases significantly (>30% slower than 7-day average)

Monitor the data for anomalies and trigger alerts when thresholds are exceeded.

Detection rules:
1. Compare current values against defined thresholds
2. Check for sudden changes (here, a build more than 30% slower than the 7-day rolling average)
3. Look for pattern breaks (missing expected data, unusual timing)
4. Cross-reference multiple signals for higher confidence

For each detected anomaly:
- Severity: 🔴 Critical (immediate action) / 🟡 Warning (attention needed) / 🔵 Info (notable)
- What: specific metric and current value
- Why: what threshold or pattern was violated
- Context: recent trend, baseline comparison
- Suggested action: what to do about it

Suppress duplicate alerts — don't re-alert for the same issue within the configured cooldown period.
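A minimal sketch of the 30% rule with cooldown suppression. The `check_build_time` name, the 6-hour default cooldown, and the `(finished_at, duration_seconds)` history shape are assumptions made for illustration; the 30% threshold and 7-day window come from the step title.

```python
from datetime import datetime, timedelta
from statistics import mean


def check_build_time(current_s, history, now, last_alert_at=None,
                     threshold=0.30, window=timedelta(days=7),
                     cooldown=timedelta(hours=6)):
    """Return an alert dict, or None when no alert should be sent.

    history is a list of (finished_at, duration_seconds) tuples.
    """
    recent = [d for finished_at, d in history if now - window <= finished_at < now]
    if not recent:
        return None  # no baseline yet
    baseline = mean(recent)
    if baseline <= 0:
        return None
    increase = (current_s - baseline) / baseline
    if increase <= threshold:
        return None
    if last_alert_at is not None and now - last_alert_at < cooldown:
        return None  # duplicate suppressed during the cooldown period
    return {
        "severity": "🟡 Warning",
        "what": f"Build took {current_s}s",
        "why": f"{round(increase * 100)}% slower than the 7-day average",
        "context": f"7-day average is {round(baseline)}s over {len(recent)} builds",
        "action": "Check recent dependency, cache, or runner changes",
        "baseline_s": baseline,
    }
```

The caller is expected to store the time of the last alert it sent and pass it back as `last_alert_at` on the next check.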

STEP 8

8. Weekly: CI/CD health report — success rate, average build time, flaky tests, most common failures

Compile the gathered data into a structured report.

Format as clean Markdown with:
- Title/date header
- Executive summary (2-3 sentences)
- Key metrics section with actual numbers
- Detailed sections with bullet points
- Action items or recommendations at the end

Keep it scannable — busy people read reports in 30 seconds.
Use emoji sparingly for visual anchors (📊 metrics, ✅ wins, ⚠️ concerns, 📋 action items).
Include data comparisons: "X this period vs Y last period (↑Z%)"

If any data source was unavailable, note it clearly: "⚠️ [Source] data unavailable — excluded from this report."
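To make the report layout concrete, here is one way it might be rendered. The section titles, the `weekly_report` signature, and the metric keys (`runs`, `success_rate_pct`, `avg_build_s`) are assumptions for illustration, not a fixed schema.

```python
def compare(label, now, before, unit=""):
    """Render one "X this period vs Y last period" line with a trend arrow."""
    if before:
        pct = round(abs(now - before) / before * 100, 1)
        arrow = "↑" if now > before else "↓" if now < before else "→"
        delta = f" ({arrow}{pct}%)"
    else:
        delta = ""  # no previous value to compare against
    return f"- {label}: {now}{unit} this period vs {before}{unit} last period{delta}"


def weekly_report(week_of, current, previous, flaky_tests, top_failures,
                  unavailable=()):
    """Return the weekly CI/CD health report as a Markdown string."""
    lines = [
        f"# CI/CD Health Report: week of {week_of}",
        "",
        f"{current['runs']} builds ran this week with a "
        f"{current['success_rate_pct']}% success rate.",
        "",
        "## 📊 Key metrics",
        compare("Success rate", current["success_rate_pct"],
                previous["success_rate_pct"], "%"),
        compare("Average build time", current["avg_build_s"],
                previous["avg_build_s"], "s"),
        "",
        "## ⚠️ Flaky tests",
        *([f"- {name}" for name in flaky_tests] or ["- None detected"]),
        "",
        "## 📋 Most common failures",
        *([f"- {reason} ({count})" for reason, count in top_failures] or ["- None"]),
    ]
    if unavailable:
        lines.append("")
        lines += [
            f"⚠️ {source} data unavailable: excluded from this report."
            for source in unavailable
        ]
    return "\n".join(lines)
```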

🤖 Example Agent Prompt

Use the GitHub REST API to list completed workflow runs.

GET https://api.github.com/repos/{owner}/{repo}/actions/runs?status=completed
Headers: Authorization: Bearer {GITHUB_TOKEN}, Accept: application/vnd.github+json

Parse the response and extract the key fields.
Handle pagination if results exceed one page (follow the rel="next" URL in the Link header).
Rate limit: GitHub allows 5,000 requests/hour per authenticated user. If you get a 403 or 429, check the X-RateLimit-Remaining and X-RateLimit-Reset headers before retrying.

Format the output concisely with the most important information first.

Copy this prompt into your agent to get started.