Get Alerted When Cron Jobs Fail (or Silently Don’t Run)
The problem
Your nightly backup job runs at 3am. It fails. You notice the next morning when someone asks why the data looks stale. The window where fixing the broken backup was cheap closed eight hours ago. Now you are restoring from a two-day-old snapshot while explaining in Slack what went wrong.
That is the visible failure mode. The invisible one is worse: cron jobs that run but do not actually do their job. The script returned exit code 0, so cron is satisfied, but the file was not written because a disk was full, the API call silently timed out, or the database connection dropped mid-transaction and the script’s error handling swallowed the exception. “Success” is a lie and nobody finds out.
And the worst one: cron jobs that do not run at all. The machine was rebooted and the cron daemon was not restarted. The crontab was accidentally wiped. The schedule was written incorrectly (0 3 * * * versus 3 0 * * *, which nobody catches until the missing output becomes obvious days later). No event fires because no job ran. There is nothing to notice because nothing happened.
The root cause is specific: cron itself has no notification layer. You write a crontab entry, you hit save, and the only indication that anything is working is the downstream effect of the job, which is usually invisible until it is missing.
The general shape of the fix
The fix has two independent parts, because the failure modes are different.
Part one: explicit failure alerts. Wrap your cron job so that it posts to API Alerts on both success and failure. You get a notification when the job ran and completed, whether the completion was happy or not. This catches the “it ran but it failed” case.
Part two: heartbeat pattern for silent failures. Because a job that never runs cannot tell you it never ran, you need an external watchdog. The pattern is: something outside your cron job checks “did we receive the expected success event in the last N hours?” and alerts if the answer is no. This catches the “it never ran at all” case.
Part one is always worth doing because the snippet is tiny. Part two is worth doing for jobs whose silent failure would be catastrophic: backups, data pipelines, billing cycles, anything where “no output” is indistinguishable from “output was correct and nobody needed to look.”
For part one, the implementation looks slightly different per cron platform, but the pattern is always the same: run the job, capture the exit code, post to API Alerts with a success or failure message. Use a dedicated channel per environment so production backups and staging backups stay separate on your phone.
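Stripped of platform details, the shape is just this (a sketch: `false` stands in for the real job, and the echo lines stand in for the curl calls shown in the platform sections below):

```shell
# Minimal shape of the wrapper pattern: run the job, capture its exit code
# immediately (before any other command overwrites $?), then branch.
output=$(false 2>&1)   # `false` stands in for the real job; it always fails
status=$?
if [ $status -eq 0 ]; then
  echo "post success event"        # real version: curl to API Alerts
else
  echo "post failure event (exit $status)"
fi
```

The one detail that trips people up is that `$?` must be read on the very next line after the job runs; any intervening command replaces it.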
Implementation by cron platform
Linux cron with a wrapper script
The cleanest pattern is a wrapper script that runs your actual job and handles the notification based on exit status. Save this as /usr/local/bin/run-with-alerts.sh:
#!/bin/bash
# Usage: run-with-alerts.sh <job-name> <command...>
JOB_NAME="$1"
shift
OUTPUT=$("$@" 2>&1)
STATUS=$?
if [ $STATUS -eq 0 ]; then
  curl -s -X POST https://api.apialerts.com/event \
    -H "Authorization: Bearer $API_ALERTS_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"channel\": \"cron\",
      \"event\": \"cron.$JOB_NAME.success\",
      \"title\": \"$JOB_NAME completed\",
      \"tags\": [\"cron\", \"success\"]
    }"
else
  # Truncate output so the alert stays readable, then make it JSON-safe:
  # flatten newlines, escape backslashes first, then escape quotes
  TAIL=$(echo "$OUTPUT" | tail -c 500 | tr '\n' ' ' | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl -s -X POST https://api.apialerts.com/event \
    -H "Authorization: Bearer $API_ALERTS_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"channel\": \"cron\",
      \"event\": \"cron.$JOB_NAME.failure\",
      \"title\": \"$JOB_NAME FAILED (exit $STATUS)\",
      \"message\": \"$TAIL\",
      \"tags\": [\"cron\", \"failure\"]
    }"
fi
# Preserve the job's exit status so cron (and anything else watching) sees it
exit $STATUS
Then in your crontab:
0 3 * * * API_ALERTS_KEY=your-key /usr/local/bin/run-with-alerts.sh nightly-backup /opt/scripts/backup.sh
You now get a push notification on every cron run, tagged by job name, with the last 500 characters of output on failure. Use separate channels per job (swap the hardcoded cron channel for a per-job one) so nightly backups, hourly imports, and weekly cleanup do not pile into one noisy feed.
GCP Cloud Scheduler
Cloud Scheduler jobs can have HTTP targets directly, so the simplest path is to point the job at the API Alerts event endpoint. But that only alerts when Cloud Scheduler successfully invokes something; it does not alert on the outcome of the work itself.
The better pattern: Cloud Scheduler triggers a Cloud Function or Cloud Run service that does the actual work, and the function calls API Alerts on completion (success or failure). The function has access to the result, so it can include real context in the alert.
// Cloud Function (Node.js) called by Cloud Scheduler
import { ApiAlerts } from '@apialerts/apialerts-js'

ApiAlerts.configure(process.env.API_ALERTS_KEY)

export async function nightlyTask(req, res) {
  try {
    const result = await doTheWork()
    await ApiAlerts.sendAsync({
      channel: 'cron',
      event: 'gcp.nightly.success',
      title: 'Nightly task complete',
      message: `Processed ${result.count} records`,
      tags: ['gcp', 'cron', 'success'],
    })
    res.status(200).send('ok')
  } catch (err) {
    await ApiAlerts.sendAsync({
      channel: 'cron',
      event: 'gcp.nightly.failure',
      title: 'Nightly task FAILED',
      message: String(err),
      tags: ['gcp', 'cron', 'failure'],
    })
    res.status(500).send('failed')
  }
}
Kubernetes CronJobs
Kubernetes CronJobs wrap an ephemeral Pod that runs on a schedule. The cleanest pattern is to wrap the container command in a shell that reports to API Alerts on completion.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: mycompany/backup:latest
              env:
                - name: API_ALERTS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: api-alerts
                      key: key
              command: ["/bin/sh", "-c"]
              args:
                - |
                  if /opt/backup.sh; then
                    curl -s -X POST https://api.apialerts.com/event \
                      -H "Authorization: Bearer $API_ALERTS_KEY" \
                      -d '{"channel":"cron","event":"k8s.backup.success","title":"Backup complete","tags":["k8s","backup"]}'
                  else
                    curl -s -X POST https://api.apialerts.com/event \
                      -H "Authorization: Bearer $API_ALERTS_KEY" \
                      -d '{"channel":"cron","event":"k8s.backup.failure","title":"Backup FAILED","tags":["k8s","backup","failure"]}'
                  fi
          restartPolicy: OnFailure
For more complex jobs, consider a small “notify” sidecar or a Kubernetes Event watcher that catches Job completion events and reports them, but the inline shell pattern covers the common case.
GitHub Actions schedule triggers
If you use GitHub Actions as a cron replacement (common for teams that already live in GitHub), use the same apialerts/notify-action@v2 pattern from the CI/CD failure alerts use case, just with a schedule: trigger instead of on: push:
name: Nightly backup

on:
  schedule:
    - cron: '0 3 * * *'

jobs:
  backup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run backup
        id: backup
        run: ./scripts/backup.sh
      - name: Notify
        if: success() || failure()
        uses: apialerts/notify-action@v2
        with:
          api_key: ${{ secrets.API_ALERTS_KEY }}
          channel: 'cron'
          message: ${{ job.status == 'success' && 'Nightly backup complete' || 'Nightly backup FAILED' }}
          tags: 'cron,backup'
          link: ${{ format('{0}/{1}/actions/runs/{2}', github.server_url, github.repository, github.run_id) }}
For the full GitHub Actions configuration, see the GitHub Actions integration page.
Code-level schedulers (APScheduler, node-cron, robfig/cron)
If your “cron” is actually a long-running process with an in-code scheduler library, call the API Alerts SDK directly from inside the job function. This is often the cleanest pattern because you have full access to the job’s result, exceptions, and context.
Python with APScheduler:
from apscheduler.schedulers.blocking import BlockingScheduler
from apialerts import ApiAlerts

ApiAlerts.configure("your-api-key")

scheduler = BlockingScheduler()

@scheduler.scheduled_job('cron', hour=3)
def nightly_backup():
    try:
        count = run_backup()
        ApiAlerts.send(
            channel='cron',
            event='python.backup.success',
            title='Backup complete',
            message=f'Processed {count} records',
            tags=['python', 'backup'],
        )
    except Exception as e:
        ApiAlerts.send(
            channel='cron',
            event='python.backup.failure',
            title='Backup FAILED',
            message=str(e),
            tags=['python', 'backup', 'failure'],
        )
        raise

scheduler.start()
The same shape works for Node.js with node-cron, Go with robfig/cron, Ruby with rufus-scheduler, and every other in-process scheduler. Each SDK has a send method with the same signature. See the SDK overview for the language-specific details.
The heartbeat pattern for silent failures
Everything above catches failures that happen during the job run. None of it catches a job that never ran at all. For that, you need a watchdog that lives outside the job.
The pattern:
- Your cron job posts a "started" event at the beginning and a "finished" event at the end (with success or failure status).
- A separate watchdog process checks at regular intervals: "did we receive the expected 'finished' event in the last N hours for job X?"
- If the answer is no, the watchdog posts its own alert: "Job X has not completed on schedule. Last seen N hours ago."
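The watchdog's core check is simple arithmetic. A minimal sketch, with the last-finished timestamp hardcoded to stand in for a real lookup of the most recent "finished" event:

```shell
# Hedged sketch of the watchdog check. LAST_FINISHED would come from wherever
# you record the last "finished" event; hardcoded to two hours ago here.
WINDOW_HOURS=24
NOW=$(date +%s)
LAST_FINISHED=$(( NOW - 7200 ))                  # pretend the job last finished 2h ago
AGE_HOURS=$(( (NOW - LAST_FINISHED) / 3600 ))
if [ "$AGE_HOURS" -ge "$WINDOW_HOURS" ]; then
  echo "OVERDUE: job last seen $AGE_HOURS hours ago"   # post the watchdog alert here
else
  echo "OK: job last seen $AGE_HOURS hours ago"
fi
```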
For most teams, the simplest implementation is to use a dedicated “dead man’s switch” service alongside API Alerts rather than building the watchdog yourself. Two options that pair well:
- Healthchecks.io. Free tier, minimal UI, HTTP pings. Your cron job pings Healthchecks.io on each successful run. If the ping does not arrive within the expected window, Healthchecks.io alerts you. Use it alongside API Alerts: Healthchecks.io catches “job never ran” and API Alerts catches “job ran and failed.” The two together cover both failure modes.
- A simple scheduled API Alerts query from an external machine. A second cron job on a different host runs every hour and queries the API Alerts event history. If the expected job event is missing from the last N hours, it posts a failure alert. This is homegrown and more work but avoids introducing a new SaaS dependency.
The Healthchecks.io path is almost always the right first answer. It takes five minutes to wire up and it is free for reasonable volumes. The API Alerts half handles the “something went wrong while running” case, Healthchecks.io handles the “never even tried to run” case, and together they cover every cron failure mode we know of.
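Wiring Healthchecks.io in is a one-line crontab change: ping the check URL only when the job succeeds (the UUID in the URL below is a placeholder; yours comes from the Healthchecks.io check page):

```
0 3 * * * /opt/scripts/backup.sh && curl -fsS -m 10 --retry 5 https://hc-ping.com/your-check-uuid > /dev/null
```

Because the ping is chained with `&&`, a failed or never-started backup sends no ping, and Healthchecks.io alerts you when the expected window passes.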
Related integrations
- GitHub Actions integration. For scheduled workflows that run in GitHub’s infrastructure.
- SDK overview. Common patterns shared by every language SDK, for code-level schedulers.
- curl reference. For shell wrappers and any cron platform that accepts a shell command.
Get started in 5 minutes
- Create a free workspace and grab an API key
- Pick your single most important scheduled job (probably a backup or a billing cycle)
- Wrap it using the snippet for your cron platform above
- Force a failure on purpose (rename a file, break an env var) and watch the alert arrive
- Install the API Alerts mobile app so the notification lands on your phone the next time it runs
For jobs where silent failure would be catastrophic (backups, billing, anything where “no output” is indistinguishable from “correct output nobody needed to look at”), add Healthchecks.io alongside API Alerts for the dead-man’s-switch half of the coverage.