Peter Bourgon · Go for Industrial Programming

cbrake · September 9, 2022, 3:45pm

A few quotes:

I’m speaking today about programming in an industrial context. By that I mean

in a startup or corporate environment;

within a team where engineers come and go;

on code that outlives any single engineer; and

serving highly mutable business requirements.

Incorrect or overly-complex designs for starting, stopping, and inspecting goroutines is the single biggest cause of frustration faced by new and intermediate Go programmers, in my experience.

I largely agree with what Charity told us earlier in the program. In particular, I agree that a core invariant of our distributed industrial systems is that there’s simply no cost-effective way to do comprehensive integration or smoke testing. Integration or test environments are largely a waste; more environments will not make things easier. For most of our systems, good observability is simply more important than good testing, because good observability enables smart organizations to focus on fast deployment and rollback, optimizing for mean time to recovery (MTTR) instead of mean time between failure (MTBF).

khem · September 11, 2022, 6:02pm

Yeah, fail fast and recover fast is a success mantra that industry thinking has adopted, and rolling releases pave the way for these kinds of designs. However, I do not agree that integration and test environments are a waste: observability platforms complement testing and integration, they do not replace them. It would be a mistake to ignore one for the other.

cbrake · September 12, 2022, 12:13pm

I’ve not used integration or test environments much, but most of the things I work on are relatively small scale and a development environment suffices. Ironically, Google scale is probably similar – there is no way to implement a test environment, so the only practical option is to do rolling deployments into the real thing. Integration/test environments are probably more popular in medium scale systems (financial, manufacturing, hardware vendors, etc).

The following has worked well for me:

development environments: where each developer can easily spin up the entire software stack on their own computer
unit testing: used for testing algorithms or complex code
e2e testing: where you spin up the entire stack (with perhaps parts of it stubbed out or simulated) in a test, and then tear it down at the end – all very quickly. This makes it less likely that your code has globals that make testing and running multiple instances difficult, and it helps ensure things shut down cleanly.
CI: basically automated unit/e2e testing. CI is critical as not many of us have the discipline or time to do all the needed testing manually.
automated deployments: whether it is dotfiles on your personal computer, Ansible to spin up a server, Nix/OS, CD, whatever …
feature flags: allow you to easily turn stuff on/off in the field.
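
To make the e2e item above concrete, here is a rough sketch in Go, assuming the "stack" is a single HTTP service. The names `app`, `startStack`, and `get` are hypothetical. All state lives in a struct rather than in package-level globals, so two instances can run side by side and each one is torn down through the function returned at startup.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// app is a hypothetical stand-in for the whole stack. Its state lives in
// the struct, not in globals, so several instances can coexist.
type app struct {
	mu   sync.Mutex
	hits int
}

// ServeHTTP counts requests and reports the count for this instance only.
func (a *app) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	a.mu.Lock()
	a.hits++
	n := a.hits
	a.mu.Unlock()
	fmt.Fprintf(w, "hits=%d", n)
}

// startStack starts a fresh instance on an ephemeral local port and
// returns its base URL plus a function that tears it down.
func startStack() (baseURL string, stop func()) {
	srv := httptest.NewServer(&app{})
	return srv.URL, srv.Close
}

// get fetches a URL and returns the response body as a string.
func get(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

func main() {
	urlA, stopA := startStack()
	defer stopA()
	urlB, stopB := startStack()
	defer stopB()

	if _, err := get(urlA); err != nil {
		panic(err)
	}
	a, err := get(urlA)
	if err != nil {
		panic(err)
	}
	b, err := get(urlB)
	if err != nil {
		panic(err)
	}
	fmt.Println(a, b) // prints "hits=2 hits=1": the instances share no state
}
```

In a real project the body of `main` would sit in a `_test.go` file, with `t.Cleanup(stop)` in place of the `defer` calls, so CI runs it on every change.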

Golden systems of any kind (build machines, servers, personal computers, etc.) are a big red flag, as they lead to systems that are not easily reproducible or scalable. My standard is to be able to rebuild any system in half an hour. If that is not possible, automation work is needed. We all now have low-cost supercomputers on our desks, and cloud/virtual servers are cheap and quick to start – there is rarely a need for a golden machine any more.