No more fiddling with slow, complex UIs. Checkup loads everything it needs from your checkup.json file: endpoints, storage credentials, and health criteria.
monitoring  sysadmin  devops 
21 hours ago by horshacktest
sourcegraph/checkup: Distributed, lock-free, self-hosted health checks and status pages
yesterday by knokio
My Philosophy on Alerting - Google Docs
When you are auditing or writing alerting rules, consider these things to keep your oncall rotation happier:

Pages should be urgent, important, actionable, and real.
They should represent either ongoing or imminent problems with your service.
Err on the side of removing noisy alerts – over-monitoring is a harder problem to solve than under-monitoring.
You should almost always be able to classify the problem into one of: availability & basic functionality; latency; correctness (completeness, freshness and durability of data); and feature-specific problems.
Symptoms are a better way to capture more problems more comprehensively and robustly with less effort.
Include cause-based information in symptom-based pages or on dashboards, but avoid alerting directly on causes.
The further up your serving stack you go, the more distinct problems you catch in a single rule. But don't go so far you can't sufficiently distinguish what's going on.
If you want a quiet oncall rotation, it's imperative to have a system for dealing with things that need timely response, but are not imminently critical.
alerting  monitoring  sysadmin  notifications 
3 days ago by unclespeedo
