Five minutes of setup, ten years of "I knew before my users did." An external service pings your site every minute from multiple regions; when the ping fails, you get a message before customers tweet about it. The cheapest reliability tool that exists.
Uptime monitoring and error tracking sound similar and aren't. Sentry watches your application code from the inside — when a request crashes, Sentry catches the exception and tells you. Uptime monitoring watches your site from the outside — when nobody can reach the server at all, Sentry can't tell you because Sentry can't reach the server either. The two are complementary; you want both. This tutorial covers the second one.
The mental model is simple: an external service makes an HTTPS request to your URL every minute (or every five, depending on plan), checks the response, and alerts you when the response stops looking right. "Right" can mean any of: HTTP 200, a specific string in the body, a response time under N milliseconds, a valid TLS certificate. The checks come from multiple geographic regions so a single-region network problem doesn't generate false alerts. The alert reaches you via the channel you picked: email, SMS, Slack, Discord, PagerDuty, push notification on a phone app.
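To make the mental model concrete, here is a minimal sketch of what a single check might do, assuming Node 18+ (for the global `fetch`). The names `evaluateCheck`, `probe`, and the shape of the `rules` object are hypothetical and chosen for illustration; they are not how UptimeRobot or any other service is implemented.

```javascript
// Hypothetical helper: decide whether one probe result counts as "up".
// `result` is what the probe observed; `rules` says what "right" means for this monitor.
function evaluateCheck(result, rules) {
  const failures = [];
  if (result.status !== rules.expectedStatus) {
    failures.push(`status ${result.status}, expected ${rules.expectedStatus}`);
  }
  if (rules.keyword && !result.body.includes(rules.keyword)) {
    failures.push(`keyword "${rules.keyword}" missing from body`);
  }
  if (rules.maxResponseMs != null && result.elapsedMs > rules.maxResponseMs) {
    failures.push(`took ${result.elapsedMs} ms, limit ${rules.maxResponseMs} ms`);
  }
  return { up: failures.length === 0, failures };
}

// Probe a URL once and evaluate the response.
async function probe(url, rules) {
  const started = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(10_000) });
    const body = await res.text();
    return evaluateCheck(
      { status: res.status, body, elapsedMs: Date.now() - started },
      rules
    );
  } catch (err) {
    // DNS failures, refused connections, TLS errors, and timeouts all land here.
    return { up: false, failures: [`request failed: ${err.message}`] };
  }
}
```

A hosted monitor runs something like `probe` on a schedule from several regions and only alerts when the failures agree, which is the part you are paying (or not paying) the service to handle for you.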
This tutorial picks UptimeRobot as the default — its free tier covers most personal and small-business needs (50 monitors, 5-minute intervals), and the UI is genuinely simple. Better Stack, Pingdom, StatusCake, and Hyperping all do the same things; the patterns transfer. We'll set up the monitor, design a sensible /health endpoint that distinguishes "the process is alive" from "the database is reachable," wire up Slack notifications, and (optionally) put up a public status page so users can self-serve when something's wrong.
A /health endpoint that's useful, not just res.send("ok").
If your app only runs on localhost:3000, deploy something first: uptime monitoring requires a public endpoint to ping. A Slack or Discord workspace is helpful but not required; email alerts work out of the box.
You can monitor your homepage URL directly — and for static sites, that's fine. For an app with a database and backing services, the homepage might return 200 even when the database is unreachable (cached HTML, static fallback, generic error page). A purpose-built /health endpoint distinguishes three levels:
The simplest level is a bare liveness check: app.get('/health', (req, res) => res.send('ok')). Good for catching crashed processes; useless for catching dependency failures.
For uptime monitoring, readiness is the right shape:
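app.get('/health', async (req, res) => {
  try {
    // Cheap round trips that prove each dependency answers.
    await db.query('SELECT 1');
    await redis.ping();
    res.json({ status: 'ok', db: 'ok', redis: 'ok' });
  } catch (err) {
    // 503 tells the monitor the process is up but a dependency is not.
    res.status(503).json({ status: 'degraded', error: err.message });
  }
});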
Now a 200 from /health means "process up and dependencies reachable" — a much stronger signal than monitoring the homepage.
Keep /health cheap and ungated. Don't require auth on it (the monitor can't authenticate); don't make it run expensive queries; don't put rate-limiting in front of it. The endpoint runs every minute from multiple monitors: if it's slow, your bill goes up; if it's gated, the monitor reports false outages.
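One way to keep the endpoint cheap even when a dependency hangs is to put a time limit on each check. The sketch below is an assumption-laden illustration, not part of any library: `withTimeout` and `checkDependencies` are hypothetical helpers, and the 2000 ms default is an arbitrary starting point.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run all checks in parallel. `checks` maps a name to a function returning a promise.
async function checkDependencies(checks, ms = 2000) {
  const names = Object.keys(checks);
  const settled = await Promise.allSettled(
    names.map((name) =>
      // Promise.resolve().then(...) turns a synchronous throw into a rejection.
      withTimeout(Promise.resolve().then(() => checks[name]()), ms, name)
    )
  );
  const report = {};
  settled.forEach((outcome, i) => {
    report[names[i]] = outcome.status === 'fulfilled' ? 'ok' : 'fail';
  });
  const healthy = settled.every((outcome) => outcome.status === 'fulfilled');
  return { healthy, report };
}

// Possible use inside the route (db and redis are the clients from the example above):
//   const { healthy, report } = await checkDependencies({
//     db: () => db.query('SELECT 1'),
//     redis: () => redis.ping(),
//   });
//   res.status(healthy ? 200 : 503).json({ status: healthy ? 'ok' : 'degraded', ...report });
```

Reporting a plain 'fail' per dependency, rather than the raw error message, keeps internal details out of a response that anyone on the internet can fetch.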
Sign up at uptimerobot.com. The free tier includes 50 monitors at 5-minute intervals. (The paid Solo tier adds 1-minute intervals and SMS for $7/month.)
Dashboard → + New Monitor:
Set the URL to https://mybrand.com/health.
Set the keyword to "ok" with type "Yes — alert when keyword does not exist." Now the monitor checks for the literal string in the response body, not just the HTTP status. Pages that return 200 with an error message stop fooling it.
Save. The monitor starts pinging within a minute; you'll see the response time graph populate.
UptimeRobot dashboard → My Settings → Alert Contacts → Add Alert Contact. Each "contact" is a destination for alerts:
Slack: open <workspace>.slack.com/apps → search "Incoming Webhooks" → pick the channel → copy URL. Alerts post into the channel with the monitor name and downtime duration.
After adding contacts, edit the monitor and tick which contacts should receive its alerts. Production monitors usually get all of email + Slack + SMS; staging monitors get only email.
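Before relying on the Slack contact, it can help to confirm that the webhook URL really posts into the channel you expect. The sketch below assumes Node 18+ and a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names and the message wording are made up for illustration; this is not the format UptimeRobot itself sends.

```javascript
// Build the JSON body for a Slack incoming webhook: an object with a `text` field.
function buildSlackMessage(monitorName, downSeconds) {
  const minutes = Math.floor(downSeconds / 60);
  const seconds = downSeconds % 60;
  return {
    text: `:red_circle: ${monitorName} has been down for ${minutes}m ${seconds}s`,
  };
}

// Post the message to the webhook URL using the global fetch.
async function sendSlackAlert(webhookUrl, monitorName, downSeconds) {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSlackMessage(monitorName, downSeconds)),
  });
  if (!res.ok) {
    throw new Error(`Slack webhook returned HTTP ${res.status}`);
  }
}

// Example call, assuming the URL is stored in an environment variable you define:
//   sendSlackAlert(process.env.SLACK_WEBHOOK_URL, 'mybrand /health', 125);
```

Treat the webhook URL like a password: anyone who has it can post into the channel, so keep it out of source control.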
An uptime monitor that alerts on every flicker becomes background noise. Three settings tame it:
The goal is "every alert is real, every real outage alerts." Drift in either direction kills the value of the monitor.
A status page (status.mybrand.com) shows your users which parts of your service are up. When something breaks, users check it instead of emailing support. When things are fine, the page is a trust signal.
For most projects with under 10K users, the free UptimeRobot status page is enough. Upgrade only when you have customers asking for an incident history.
HTTP uptime monitors only catch HTTP-facing failures. The other common silent failure: a cron job (nightly backup, weekly digest email, hourly cleanup) that's been broken for two weeks and nobody noticed.
The pattern: a service like Healthchecks.io (free for up to 20 checks) gives you a unique URL per cron job. The job pings the URL on every successful run. If pings stop arriving on the expected schedule, Healthchecks alerts you.
# In your cron job script, last line:
curl -fsS --retry 3 https://hc-ping.com/<your-unique-uuid> > /dev/null
# In crontab:
0 3 * * * /usr/local/bin/backup.sh && curl -fsS https://hc-ping.com/<uuid>
Healthchecks knows the schedule (you configure it) and pages you when a ping is overdue. Most useful for backups (where "did the backup run" is the question that only matters the day you need to restore).
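If the scheduled job is a Node script rather than a shell script, the same heartbeat pattern can be sketched as a small wrapper. `runWithHeartbeat` is a hypothetical helper, and the ping function is injectable so the logic can be exercised without network access. The one service-specific detail is that Healthchecks.io accepts a request to the ping URL with `/fail` appended as an explicit failure signal.

```javascript
// Run `job`, then report the outcome to a Healthchecks-style ping URL.
// `ping` defaults to a plain GET with a 10-second limit (Node 18+ global fetch).
async function runWithHeartbeat(
  job,
  pingUrl,
  ping = (url) => fetch(url, { signal: AbortSignal.timeout(10_000) })
) {
  try {
    await job();
  } catch (err) {
    // Signal the failure right away instead of waiting for the ping to be overdue.
    // A failed ping is ignored here so the original error is the one that surfaces.
    await ping(`${pingUrl}/fail`).catch(() => {});
    throw err;
  }
  // Success ping: only reached when the job finished without throwing.
  await ping(pingUrl);
}

// Example call, where runBackup is whatever async function does the real work:
//   runWithHeartbeat(runBackup, 'https://hc-ping.com/<your-unique-uuid>');
```

The important property is the same as in the crontab version: the success ping happens only after the work has actually succeeded, so a job that crashes halfway never looks healthy.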
Static sites on Vercel / Netlify / Cloudflare Pages. These platforms have ~100% uptime on the CDN; an outage of theirs is broader than your site. Adding uptime monitoring catches your-config-broke-the-build outages but those usually show up in deploy logs first.
Internal-only tools, no users, no revenue impact. If the only person affected by an outage is you, and you'll notice within the hour anyway, the monitor and its config maintenance aren't paying for themselves.
You can't be alerted in the next 24 hours anyway. Uptime monitors are most valuable when paired with an on-call rotation that responds to alerts. Without that, the alert just queues up unread. Set up the monitor when you're at the stage where you'd actually act on a 3 AM page.