CI/CD Pipeline Monitor
Monitor your CI/CD pipelines, alert on failures, and provide failure analysis
Ingredients
APIs
github_actions_workflow_runs_logs_and_status
pipeline_status_notifications_and_failure_alerts
track_build_metrics_over_time
Step-by-Step Build Guide
1. Connect to the GitHub Actions API and poll for completed workflow runs
Use the GitHub REST API to list workflow runs for the repository.
GET https://api.github.com/repos/{owner}/{repo}/actions/runs?status=completed
Headers: Authorization: Bearer {GITHUB_TOKEN}, Accept: application/vnd.github.v3+json
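Parse the response and extract the key fields for each entry in workflow_runs (e.g. id, name, head_branch, conclusion, html_url, run_started_at, updated_at). Remember which run IDs you have already processed so each completed run is reported only once.
Handle pagination if results exceed one page (check the Link header).
Rate limit: GitHub allows 5,000 requests/hour with an authenticated token. If you get a 403 or 429, check the X-RateLimit-Remaining and X-RateLimit-Reset headers.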
Format the output concisely, with the most important information first.
Human Required
Connect the API
1. Sign up for the service if you don't have an account.
2. Find the API settings in your account dashboard.
3. Generate an API key or access token (for GitHub, a personal access token that can read the repository's Actions data).
4. Share the key with your agent when prompted.
Tip: Most services have a free tier that's sufficient to get started.
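The polling step above can be sketched in Python with only the standard library. This is a minimal illustration, not a definitive client. The function names, the summary dict shape, and the `seen_ids` set are hypothetical. Build time is approximated as `updated_at - run_started_at`, which is an assumption rather than an official duration field.

```python
import json
import re
import urllib.error
import urllib.request
from datetime import datetime

API = "https://api.github.com"


def parse_next_link(link_header):
    """Return the rel="next" URL from a GitHub Link header, or None on the last page."""
    for part in (link_header or "").split(","):
        match = re.search(r'<([^>]+)>;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def _ts(value):
    # GitHub timestamps end in "Z"; fromisoformat() only accepts that suffix on 3.11+.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_run(run):
    """Keep the key fields of one workflow run (hypothetical summary shape)."""
    duration = (_ts(run["updated_at"]) - _ts(run["run_started_at"])).total_seconds()
    return {
        "id": run["id"],
        "name": run["name"],
        "branch": run["head_branch"],
        "conclusion": run["conclusion"],
        "attempt": run.get("run_attempt", 1),
        "duration_s": int(duration),  # approximation: last update minus start
        "url": run["html_url"],
    }


def fetch_completed_runs(owner, repo, token, seen_ids, max_pages=5):
    """Return summaries of completed runs not yet in seen_ids, following pagination."""
    url = f"{API}/repos/{owner}/{repo}/actions/runs?status=completed&per_page=100"
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"}
    new_runs = []
    for _ in range(max_pages):
        try:
            with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
                body = json.load(resp)
                next_url = parse_next_link(resp.headers.get("Link"))
        except urllib.error.HTTPError as err:
            # An exhausted primary rate limit shows up as 403 or 429 with 0 remaining.
            if err.code in (403, 429) and err.headers.get("X-RateLimit-Remaining") == "0":
                reset = err.headers.get("X-RateLimit-Reset")
                raise RuntimeError(f"GitHub rate limit exhausted; resets at epoch {reset}") from err
            raise
        for run in body["workflow_runs"]:
            if run["id"] not in seen_ids:
                seen_ids.add(run["id"])
                new_runs.append(summarize_run(run))
        if not next_url:
            break
        url = next_url
    return new_runs
```

Call `fetch_completed_runs` on a timer with a persistent `seen_ids` set. Each call returns only runs it has not returned before, reading at most `max_pages` pages of results.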
2. On success: post a brief notification to Slack with the build time
Post a message to Slack using the Web API.
POST https://slack.com/api/chat.postMessage
Headers: Authorization: Bearer {SLACK_BOT_TOKEN}, Content-Type: application/json
Body: {
"channel": "{channel_id}",
"text": "{fallback_text}",
"blocks": [{ "type": "section", "text": { "type": "mrkdwn", "text": "{formatted_message}" }}]
}
Use Slack mrkdwn formatting: *bold*, _italic_, `code`, > blockquote.
For alerts, use emoji prefixes: 🔴 critical, 🟡 warning, 🟢 success, ℹ️ info.
Keep messages scannable: use bullet points for lists.
Expected response: { "ok": true, "ts": "..." }. If ok is false, check the "error" field.
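The success notification can be sketched in Python as follows. This is a minimal, standard-library-only version of the `chat.postMessage` request described above. The `run` dict keys (`name`, `branch`, `duration_s`, `url`) and the helper names are hypothetical, and the bot token and channel ID are assumed to be supplied by the caller.

```python
import json
import urllib.request

SLACK_POST_URL = "https://slack.com/api/chat.postMessage"


def format_duration(seconds):
    """Render a build time such as 270 -> '4m 30s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs:02d}s" if minutes else f"{secs}s"


def build_success_payload(channel_id, run):
    """Build a chat.postMessage body for a successful run (hypothetical run keys)."""
    took = format_duration(run["duration_s"])
    summary = f"🟢 *{run['name']}* passed on `{run['branch']}` in {took}"
    return {
        "channel": channel_id,
        "text": f"{run['name']} passed in {took}",  # plain fallback for notifications
        "blocks": [
            # <url|label> is Slack's mrkdwn link syntax.
            {"type": "section", "text": {"type": "mrkdwn", "text": f"{summary} (<{run['url']}|view run>)"}}
        ],
    }


def post_to_slack(token, payload):
    """POST the payload and return the message ts; raise if Slack reports ok=false."""
    req = urllib.request.Request(
        SLACK_POST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    if not body.get("ok"):
        raise RuntimeError(f"Slack API error: {body.get('error')}")
    return body["ts"]
```

Keeping payload construction separate from the HTTP call makes the message format testable without a Slack workspace.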
3. On failure: pull the failed job logs and identify the error message and the failing step
Steps:
1. Validate that all required inputs are available.
2. List the jobs for the failed run (GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs), find the job and step whose conclusion is failure, and download that job's logs (GET /repos/{owner}/{repo}/actions/jobs/{job_id}/logs).
3. Verify that the result matches the expected output format.
4. Handle errors gracefully: retry transient failures; log and alert on persistent ones.
5. Return structured output with the status and any relevant data.
If any required data is missing, request it from the user before proceeding.
⚠️ IMPORTANT: The next step (Step 4: "Post a detailed failure alert to Slack") requires human action. Before you finish this step, let your human know they'll need to take action alongside the agent. Don't proceed to the next step until your human confirms they're ready.
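The failure analysis can be sketched in Python using the "list jobs for a workflow run" and "download job logs" REST endpoints. The logs endpoint answers with a redirect to a short-lived download URL. The error-line patterns are a heuristic assumption, not a GitHub-defined format, and the function names and result keys are hypothetical.

```python
import json
import re
import urllib.request

API = "https://api.github.com"
# Actions log lines start with an ISO-8601 timestamp, e.g. "2024-05-01T10:00:05.1234567Z ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z\s?")
# Heuristic markers for error lines; tune these for your toolchain.
ERROR_HINT = re.compile(r"##\[error\]|\berror\b|FAILED", re.IGNORECASE)


def gh_get(url, token, raw=False):
    """GET a GitHub API URL; return parsed JSON, or decoded text when raw=True."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    # An unredirected header keeps the token from being forwarded to the
    # download host that the logs endpoint redirects to.
    req.add_unredirected_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
    return data.decode("utf-8", errors="replace") if raw else json.loads(data)


def find_failing_step(jobs_payload):
    """Return the first failed job and its first failed step, or None."""
    for job in jobs_payload.get("jobs", []):
        if job.get("conclusion") != "failure":
            continue
        failed = [s for s in job.get("steps", []) if s.get("conclusion") == "failure"]
        step = failed[0]["name"] if failed else "(unknown step)"
        return {"job_id": job["id"], "job": job["name"], "step": step, "url": job["html_url"]}
    return None


def extract_error_excerpt(log_text, max_lines=5):
    """Keep the last few log lines that look like errors, minus timestamps."""
    hits = []
    for line in log_text.splitlines():
        line = TIMESTAMP.sub("", line)
        if ERROR_HINT.search(line):
            hits.append(line.strip())
    return hits[-max_lines:]


def analyze_failed_run(owner, repo, run_id, token):
    """Fetch jobs for a failed run, locate the failing step, and attach an excerpt."""
    jobs = gh_get(f"{API}/repos/{owner}/{repo}/actions/runs/{run_id}/jobs", token)
    failure = find_failing_step(jobs)
    if failure is None:
        return None
    log = gh_get(f"{API}/repos/{owner}/{repo}/actions/jobs/{failure['job_id']}/logs", token, raw=True)
    failure["excerpt"] = extract_error_excerpt(log)
    return failure
```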
4. Post a detailed failure alert to Slack: workflow name, branch, failing step, error excerpt, and link to full logs
Post a message to Slack using the Web API.
POST https://slack.com/api/chat.postMessage
Headers: Authorization: Bearer {SLACK_BOT_TOKEN}, Content-Type: application/json
Body: {
"channel": "{channel_id}",
"text": "{fallback_text}",
"blocks": [{ "type": "section", "text": { "type": "mrkdwn", "text": "{formatted_message}" }}]
}
Use Slack mrkdwn formatting: *bold*, _italic_, `code`, > blockquote.
For alerts, use emoji prefixes: 🔴 critical, 🟡 warning, 🟢 success, ℹ️ info.
Keep messages scannable: use bullet points for lists.
Expected response: { "ok": true, "ts": "..." }. If ok is false, check the "error" field.
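The detailed failure alert can be sketched in Python as follows. It only builds the `chat.postMessage` body; sending it is the same POST as for the success message. The `failure` dict keys are hypothetical. The escaping of `&`, `<` and `>` follows Slack's mrkdwn rules, and the 3000-character cap reflects Slack's limit on section block text.

```python
FENCE = "`" * 3  # Slack mrkdwn code-block delimiter
SECTION_TEXT_LIMIT = 3000  # Slack's maximum length for section block text


def escape_mrkdwn(text):
    """Escape the three characters Slack treats as control characters."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def build_failure_payload(channel_id, failure, max_excerpt_chars=500):
    """Build a chat.postMessage body for a failed run (hypothetical `failure` keys)."""
    raw_excerpt = "\n".join(failure["excerpt"]) or "(no error lines found)"
    excerpt = escape_mrkdwn(raw_excerpt[:max_excerpt_chars])
    lines = [
        f"🔴 *{escape_mrkdwn(failure['workflow'])}* failed on `{escape_mrkdwn(failure['branch'])}`",
        f"• Failing step: *{escape_mrkdwn(failure['job'])} / {escape_mrkdwn(failure['step'])}*",
        f"• <{failure['url']}|View full logs>",
        f"{FENCE}{excerpt}{FENCE}",
    ]
    # Safety net in case very long names push the text past Slack's limit.
    text = "\n".join(lines)[:SECTION_TEXT_LIMIT]
    return {
        "channel": channel_id,
        "text": f"{failure['workflow']} failed on {failure['branch']}: {failure['step']}",
        "blocks": [{"type": "section", "text": {"type": "mrkdwn", "text": text}}],
    }
```

The excerpt is truncated before escaping so an entity such as `&lt;` is never cut in half.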
5. Detect flaky tests: track tests that fail intermittently (fail, pass on retry) and flag them
Steps:
1. Validate that all required inputs are available.
2. Execute the operation described above.
3. Verify that the result matches the expected output format.
4. Handle errors gracefully: retry transient failures; log and alert on persistent ones.
5. Return structured output with the status and any relevant data.
If any required data is missing, request it from the user before proceeding.
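One way to flag flaky tests is to compare outcomes across attempts of the same commit. A test that fails on one attempt and passes on a later one counts as a "flip." The attempt number could come from a workflow run's `run_attempt` field. The record shape (`sha`, `attempt`, `test`, `passed`) and the `min_flips` cut-off are hypothetical. Parsing per-test results out of test reports is outside this sketch.

```python
from collections import defaultdict


def detect_flaky_tests(results, min_flips=2):
    """Flag tests that failed and then passed on a later attempt of the same commit.

    results: iterable of dicts with hypothetical keys sha, attempt, test, passed.
    Returns [(test_name, flip_count)] for tests with at least min_flips flips,
    most frequent first.
    """
    outcomes = defaultdict(list)
    for r in results:
        outcomes[(r["sha"], r["test"])].append((r["attempt"], r["passed"]))

    flips = defaultdict(int)
    for (_sha, test), attempts in outcomes.items():
        attempts.sort()  # order by attempt number
        first_fail = next((n for n, passed in attempts if not passed), None)
        # A pass on any later attempt of the same commit counts as one flip.
        if first_fail is not None and any(passed and n > first_fail for n, passed in attempts):
            flips[test] += 1

    flagged = [(test, count) for test, count in flips.items() if count >= min_flips]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))
```

Requiring at least two flips by default avoids flagging a test after a single unlucky retry. A test that fails on every attempt is treated as a genuine failure, not as flaky.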
6. Track metrics: build success rate, average build time, most common failure reasons
Process the data and calculate the requested metrics. Steps:
1. Validate the input data: check for nulls, out-of-range values, and duplicates.
2. Apply the calculation/aggregation logic.
3. Compare against benchmarks or previous periods if available.
4. Format results with appropriate precision (2 decimal places for percentages, whole numbers for counts).
Include: current value, previous value, change (absolute and %), and trend direction (↑ ↓ →).
Flag any anomalies: values more than 2 standard deviations from the mean.
If there is insufficient data for a reliable calculation, state the minimum needed and return partial results.
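The metric calculations above can be sketched in Python as three small functions. The run dict keys (`conclusion`, `duration_s`, `failure_reason`) are hypothetical. Counting `timed_out` as a failure and leaving cancelled or skipped runs out of the success rate are assumptions you may want to change. Anomalies use the population standard deviation.

```python
from collections import Counter
from statistics import mean, pstdev

FAILED = {"failure", "timed_out"}  # assumption: timeouts count as failures


def build_metrics(runs):
    """Summarize one period of runs (hypothetical keys: conclusion, duration_s, failure_reason)."""
    finished = [r for r in runs if r["conclusion"] == "success" or r["conclusion"] in FAILED]
    if not finished:
        return None  # insufficient data; the caller should report that
    successes = sum(1 for r in finished if r["conclusion"] == "success")
    reasons = Counter(r.get("failure_reason") or "unknown" for r in finished if r["conclusion"] in FAILED)
    return {
        "runs": len(finished),
        "success_rate": round(100 * successes / len(finished), 2),  # percent, 2 dp
        "avg_build_s": round(mean(r["duration_s"] for r in finished)),  # whole seconds
        "top_failures": reasons.most_common(3),
    }


def compare(current, previous):
    """Current vs previous value with absolute and % change plus a trend arrow."""
    change = current - previous
    pct = round(100 * change / previous, 2) if previous else None
    trend = "↑" if change > 0 else "↓" if change < 0 else "→"
    return {"current": current, "previous": previous, "change": round(change, 2),
            "change_pct": pct, "trend": trend}


def flag_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    if len(values) < 3:
        return []  # too few points for a meaningful spread
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) > threshold * sigma]
```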
7. Alert if build time increases significantly (>30% slower than 7-day average)
Monitor the data for anomalies and trigger alerts when thresholds are exceeded. Detection rules:
1. Compare current values against the defined thresholds.
2. Check for sudden changes (here, a build more than 30% slower than the 7-day rolling average).
3. Look for pattern breaks (missing expected data, unusual timing).
4. Cross-reference multiple signals for higher confidence.
For each detected anomaly, report:
- Severity: 🔴 Critical (immediate action) / 🟡 Warning (attention needed) / 🔵 Info (notable)
- What: the specific metric and its current value
- Why: which threshold or pattern was violated
- Context: the recent trend and baseline comparison
- Suggested action: what to do about it
Suppress duplicate alerts: don't re-alert for the same issue within the configured cooldown period.
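The build-time rule and the duplicate-alert cooldown can be sketched in Python. The 30% threshold and 7-day window come from the step above. The minimum sample count, the six-hour cooldown, the suggested-action text, and the rule that twice the threshold escalates to critical are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def check_build_time(history, current_s, now: datetime, threshold=0.30, window_days=7, min_samples=3):
    """Return an alert dict if current_s is more than `threshold` above the rolling average.

    history: list of (finished_at, duration_s) tuples for earlier successful runs.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [d for finished_at, d in history if cutoff <= finished_at < now]
    if len(recent) < min_samples:
        return None  # too little data for a reliable baseline
    baseline = mean(recent)
    increase = (current_s - baseline) / baseline
    if increase <= threshold:
        return None
    # Hypothetical escalation rule: twice the threshold counts as critical.
    severity = "🔴 Critical" if increase >= 2 * threshold else "🟡 Warning"
    return {
        "severity": severity,
        "what": f"build time {current_s:.0f}s",
        "why": f"{increase:.0%} slower than the {window_days}-day average (limit {threshold:.0%})",
        "context": f"baseline {baseline:.0f}s over {len(recent)} runs",
        "action": "check recent dependency, cache, or runner changes",
    }


class AlertCooldown:
    """Suppress repeat alerts for the same key within a cooldown window."""

    def __init__(self, cooldown=timedelta(hours=6)):
        self.cooldown = cooldown
        self._last_sent = {}

    def should_send(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False
        self._last_sent[key] = now
        return True
```

A natural cooldown key is the workflow name plus the rule name, so that a slow build and a failing build on the same workflow do not suppress each other.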
8. Weekly: CI/CD health report covering success rate, average build time, flaky tests, and most common failures
Compile the gathered data into a structured report. Format it as clean Markdown with:
- A title/date header
- An executive summary (2-3 sentences)
- A key metrics section with actual numbers
- Detailed sections with bullet points
- Action items or recommendations at the end
Keep it scannable; busy people read reports in 30 seconds. Use emoji sparingly as visual anchors (e.g. 📊 metrics, ✅ wins, ⚠️ concerns, 📌 action items).
Include data comparisons: "X this period vs Y last period (↑Z%)".
If any data source was unavailable, note it clearly: "⚠️ [Source] data unavailable; excluded from this report."
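The report layout above can be rendered with a small Python function. The input shapes (`success_rate`, `avg_build_s`, lists of `(name, count)` pairs) and the action-item wording are hypothetical. The numbers in the `__main__` block are illustrative only.

```python
from datetime import date


def pct_change(current, previous):
    """Format a period-over-period change such as '↑12.50%'."""
    if not previous:
        return "n/a"
    delta = 100 * (current - previous) / previous
    arrow = "↑" if delta > 0 else "↓" if delta < 0 else "→"
    return f"{arrow}{abs(delta):.2f}%"


def weekly_report(week_of, cur, prev, flaky, top_failures, missing_sources=()):
    """Render the weekly CI/CD health report as Markdown.

    cur/prev: dicts with hypothetical keys success_rate (percent) and avg_build_s.
    flaky: [(test, flips)]; top_failures: [(reason, count)].
    """
    rate_change = pct_change(cur["success_rate"], prev["success_rate"])
    time_change = pct_change(cur["avg_build_s"], prev["avg_build_s"])
    lines = [
        f"# CI/CD Health Report: week of {week_of.isoformat()}",
        "",
        f"Builds succeeded {cur['success_rate']:.2f}% of the time ({rate_change} vs last week). "
        f"Average build time was {cur['avg_build_s']}s ({time_change}). "
        f"{len(flaky)} flaky test(s) were flagged.",
    ]
    for source in missing_sources:
        lines += ["", f"⚠️ {source} data unavailable; excluded from this report."]
    lines += [
        "",
        "## 📊 Key metrics",
        f"- Success rate: {cur['success_rate']:.2f}% this week vs {prev['success_rate']:.2f}% last week ({rate_change})",
        f"- Average build time: {cur['avg_build_s']}s this week vs {prev['avg_build_s']}s last week ({time_change})",
        "",
        "## ⚠️ Flaky tests",
    ]
    lines += [f"- `{test}`: failed then passed on retry {n} time(s)" for test, n in flaky] or ["- None flagged"]
    lines += ["", "## Most common failures"]
    lines += [f"- {reason}: {count}" for reason, count in top_failures] or ["- None recorded"]
    lines += ["", "## 📌 Action items"]
    if flaky:
        lines.append(f"- Fix or quarantine `{flaky[0][0]}`, the most frequent flaky test")
    if top_failures:
        lines.append(f"- Investigate the most common failure: {top_failures[0][0]}")
    if not flaky and not top_failures:
        lines.append("- No action needed this week")
    return "\n".join(lines)


if __name__ == "__main__":
    print(weekly_report(date(2024, 5, 6), {"success_rate": 90.0, "avg_build_s": 330},
                        {"success_rate": 80.0, "avg_build_s": 300}, [("test_login", 3)], [("tests", 4)]))
```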
Example Agent Prompt
Use the GitHub API to fetch the relevant data.
GET https://api.github.com/repos/{owner}/{repo}/{endpoint}
Headers: Authorization: Bearer {GITHUB_TOKEN}, Accept: application/vnd.github.v3+json
Parse the response and extract the key fields.
Handle pagination if results exceed one page (check the Link header).
Rate limit: GitHub allows 5,000 requests/hour with an authenticated token. If you get a 403 or 429, check the X-RateLimit-Remaining and X-RateLimit-Reset headers.
Format the output concisely, with the most important information first.
Copy this prompt into your agent to get started.