I use Pingdom to check whether one of my sites is up and running; it emails me if the root index.html page doesn't behave. I just read the single most interesting piece of advice on using Pingdom to check your whole site:
I recommend creating an internal status page which automatically checks all the things you think are crucial, risky, and tractable to resolution if you were to know about them. (If an external API provider goes down and you already know your response is going to be “I wait until it comes back up”, then no sense disturbing your sleep about it, right?) For example, mine will fail to return properly if Nginx, Mongrel, the Delayed::Job workers, memcached, or Redis is having a bad day. You can then have your external monitoring poll that page and, if they don’t get the HTTP 200 all clear, send you an email.
Excellent advice IMHO!
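To make the idea concrete, here is a minimal sketch of such a status page. It is my own illustration, not the quoted author's setup: it uses only Python's standard library, the names `tcp_check`, `run_checks`, `CHECKS` and `StatusHandler` are made up for this example, and the hosts and ports are just the usual defaults for memcached and Redis. It only tests that each service accepts a TCP connection, which is a much weaker check than really talking to it. The page answers 200 when every check passes and 503 otherwise, so an external monitor only has to look at the status code.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


def tcp_check(host, port, timeout=2.0):
    """Return a callable that succeeds only if a TCP connection can be opened."""
    def check():
        with socket.create_connection((host, port), timeout=timeout):
            return True
    return check


# Hypothetical service list: these hosts and ports are common defaults,
# not values taken from the quoted advice. Adjust them to your own stack.
CHECKS = {
    "memcached": tcp_check("127.0.0.1", 11211),
    "redis": tcp_check("127.0.0.1", 6379),
}


def run_checks(checks):
    """Run every check and return (http_status, plain-text report)."""
    lines, all_ok = [], True
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # Any error (refused connection, timeout, ...) counts as a failure.
            ok = False
        all_ok = all_ok and ok
        lines.append(f"{name}: {'ok' if ok else 'FAIL'}")
    return (200 if all_ok else 503), "\n".join(lines) + "\n"


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = run_checks(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # Bound to localhost here; expose it however your monitoring can reach it.
    HTTPServer(("127.0.0.1", 8080), StatusHandler).serve_forever()
```

Point the external monitor at this URL instead of index.html, and keep the list limited to things you would actually get out of bed to fix, as the quote suggests.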
HTH,