Life-cycle management is hard

SIOT client life-cycle management just got a bit better, which means one more problem SIOT client code does not need to worry about. This is another example of the value of test code (in this case, end-to-end test code).

How test code helped in this case:

  • the test code runs fast, so it was creating new nodes while the client was still initializing. A user manually creating rule nodes would never have exposed this problem. However, we do want first-class support for machine-generated config, so this is important.
  • even then, the problem was rare (1 out of 15 runs or so). Running the tests often (because it is easy) helped expose the problem. Without the tests, it would have been an extreme amount of work to debug a subtle race condition like this.

I had lived with this issue for a while, but experience has taught me that if you see a problem, it’s best to track down the cause early and fix it, as it might lead to more serious problems later.

One way to solve problems like this is to change all related data together in a database transaction. This works fine if you have a single database. However, in distributed systems, your data is distributed. You can use distributed transactions, but they are very expensive and complex. It is much simpler to use CRDTs and carefully manage life-cycle. It is possible; it just takes a little more thought.

Another way to solve problems like this is to have a single monolithic config and restart the entire app any time something changes, or to assume nothing ever changes in a system. Some problems with this thinking:

  • the real world is not static – it is always changing.
  • most of the value in modern systems is added over time – systems need to change to add value.
  • as the size of state/config grows, it is very inefficient to send the entire thing anytime something changes.

SIOT assumes granular data that can be edited anywhere and sent anytime, and the system just does the right thing. All you have to do is write a client that implements your custom logic, and SIOT manages when your client should run and the data it needs.
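As a rough sketch of that division of labor, the code below shows a tiny manager that starts a client the first time data for a node shows up, routes updates to it, and stops it when the node is deleted. Everything here (`Client`, `Manager`, `sumClient`, and their methods) is hypothetical and invented for illustration; SIOT's real client API is different and does considerably more. The client only holds custom logic, and the manager owns the life-cycle.

```go
package main

import (
	"fmt"
	"sync"
)

// Client is a hypothetical interface for this sketch, not SIOT's actual API.
type Client interface {
	Run()                 // blocks until Stop is called
	Stop()                // asks Run to return
	Update(value float64) // delivers a change to the running client
}

type entry struct {
	client Client
	done   chan struct{} // closed when Run returns
}

// Manager starts one client per node ID on first use and stops it on delete.
type Manager struct {
	mu        sync.Mutex
	clients   map[string]entry
	newClient func(id string) Client
}

func NewManager(newClient func(id string) Client) *Manager {
	return &Manager{clients: map[string]entry{}, newClient: newClient}
}

// Update routes a change to the client for node id, starting it if needed.
// The lock is held for the whole call so an update is never sent to a
// client that Delete has already removed.
func (m *Manager) Update(id string, value float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.clients[id]
	if !ok {
		e = entry{client: m.newClient(id), done: make(chan struct{})}
		m.clients[id] = e
		go func() {
			defer close(e.done)
			e.client.Run()
		}()
	}
	e.client.Update(value)
}

// Delete removes the client for node id, stops it, and waits for Run to exit.
func (m *Manager) Delete(id string) {
	m.mu.Lock()
	e, ok := m.clients[id]
	delete(m.clients, id)
	m.mu.Unlock()
	if ok {
		e.client.Stop()
		<-e.done
	}
}

// sumClient is the "custom logic": it adds up every value it receives.
type sumClient struct {
	updates chan float64
	stop    chan struct{}
	Total   float64 // only read after Run has returned
}

func newSumClient() *sumClient {
	return &sumClient{updates: make(chan float64), stop: make(chan struct{})}
}

func (c *sumClient) Run() {
	for {
		select {
		case v := <-c.updates:
			c.Total += v
		case <-c.stop:
			return
		}
	}
}

func (c *sumClient) Stop()            { close(c.stop) }
func (c *sumClient) Update(v float64) { c.updates <- v }

func main() {
	c := newSumClient()
	m := NewManager(func(id string) Client { return c })
	m.Update("node1", 1.5) // first update starts the client
	m.Update("node1", 2.5)
	m.Delete("node1") // stops the client and waits for it to exit
	fmt.Println(c.Total)
}
```

Note that the client never decides when it starts or stops, and an update that arrives before the client is fully up simply waits until `Run` is ready to receive it. That is the kind of timing detail a manager can handle once, so that each client does not have to.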