kubectl-debug - Debug your pod with a new container that has all the troubleshooting tools pre-installed
An out-of-tree solution for troubleshooting running pods, which lets you run a new container inside a running pod for debugging purposes. The new container joins the pid, network, user and ipc namespaces of the target container, so you can use arbitrary troubleshooting tools without pre-installing them in your production container image.
Kubernetes  error-handling  opensource  CLI  tools 
6 days ago by liqweed
Cockatiel - Resilience and transient-fault-handling JS library
A resilience and transient-fault-handling library that allows developers to express policies such as Backoff, Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback.
JS  error-handling  opensource  generic-toolkit 
9 days ago by liqweed
Thoughts on Error Handling in Rust [Totoroot]
rust  rustlang  error-handling 
19 days ago by willyh
Scaling in the presence of errors—don’t ignore... — programming is terrible
Although using things like replicated logs, message brokers, or even using unix pipes can allow you to build prototypes, clear demonstrations of how your software works—they do not free you from the burden of handling errors.

You can’t avoid error handling code, not at scale.

The secret to error handling at scale isn’t giving up, ignoring the problem, or even trying again: it is structuring a program for recovery, making errors stand out, and allowing other parts of the program to make decisions.

Techniques like fail-fast, crash-only-software, process supervision, but also things like clever use of version numbers, and occasionally the odd bit of statelessness or idempotence. What these all have in common is that they’re all methods of recovery.

Recovery is the secret to handling errors. Especially at scale.

Giving up early so other things have a chance, continuing on so other things can catch up, restarting from a clean state to try again, saving progress so that things do not have to be repeated.

That, or put it off for a while. Buy a lot of disks, hire a few SREs, and add another graph to the dashboard.

The problem with scale is that you can’t approach it with optimism. As the system grows, it needs redundancy, or to be able to function in the presence of partial errors or intermittent faults. Humans can only fill in so many gaps.

Staff turnover is the worst form of technical debt.

Writing robust software means building systems that can exist in a state of partial failure (like incomplete output), and writing resilient software means building systems that are always in a state of recovery (like restarting). Neither comes from engineering the happy path of your software.

When you ignore errors, you transform them into mysteries to solve. Something or someone else will have to handle them, and then have to recover from them—usually by hand, and almost always at great expense.

The problem with avoiding error handling in code is that you’re only avoiding automating it.

In other words, the trick to scaling in the presence of errors is building software around the notion of recovery. Automated recovery.

That, or burnout. Lots of burnout. You don’t get to ignore errors.
error-handling  bestpractices  sre  software-architecture  scalability  scaling 
29 days ago by cdzombak
Best Practices for Errors in Go · Justinas Stankevičius
Some of the pain described here around wrapped errors is eased in Go 1.13.
golang  error-handling  bestpractices 
7 weeks ago by cdzombak
Working with Errors in Go 1.13 - The Go Blog
Go 1.13 introduces new features to the errors and fmt standard library packages to simplify working with errors that contain other errors.
golang  error-handling  errors 
7 weeks ago by cdzombak
