The hard parts of IoT Systems

By far the hardest part of IoT systems is that they are distributed systems, which typically include browser, cloud, and edge instances. These are all distinct, geographically separated computing systems connected by networks, and distributed systems are difficult to get right.

The second hardest part of IoT systems is lifecycle management. When you interact with the real world, things come and go: devices are added and removed, humans add or remove entities through configuration changes, and processes may add or remove entities automatically. Through all this, everything must keep working, processes must be created and cleaned up, and leaks cannot occur. Lifecycle management typically involves a good deal of concurrency, and like distributed systems, concurrency is hard.

Recently, lifecycle management in Simple IoT was improved while adding support for Shelly IoT devices. The Shelly management client periodically sends out mDNS requests looking for new devices, and each discovered device results in a new node. The SIOT Client Manager then instantiates a new client for each new node as it is created. This path was crashing, and the crash was recently fixed. Because the Client Manager manages the lifecycle for all clients, all clients (there are now 17 or so) benefit from the fix.
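To make the pattern concrete, here is a minimal sketch of a manager that owns the lifecycle of its clients. It is not the SIOT Client Manager: the `Client` interface, `Manager` type, and `demoClient` are hypothetical names invented for illustration. The idea it tries to show is that start, stop, and cleanup logic lives in one place, so each client only has to implement `Run`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Client is a hypothetical interface: Run blocks until ctx is cancelled.
type Client interface {
	Run(ctx context.Context)
}

// entry tracks what the manager needs to stop one client and wait for it.
type entry struct {
	cancel context.CancelFunc
	done   chan struct{}
}

// Manager starts one client per node ID and stops it when the node goes away.
type Manager struct {
	mu        sync.Mutex
	clients   map[string]entry
	newClient func(id string) Client
}

func NewManager(newClient func(id string) Client) *Manager {
	return &Manager{clients: make(map[string]entry), newClient: newClient}
}

// Add starts a client for id. A repeated Add for a running id is ignored.
func (m *Manager) Add(id string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.clients[id]; ok {
		return false
	}
	ctx, cancel := context.WithCancel(context.Background())
	e := entry{cancel: cancel, done: make(chan struct{})}
	m.clients[id] = e
	c := m.newClient(id)
	go func() {
		defer close(e.done) // signals that the goroutine has exited
		c.Run(ctx)
	}()
	return true
}

// Remove stops the client for id and waits for it to exit, so nothing leaks.
func (m *Manager) Remove(id string) bool {
	m.mu.Lock()
	e, ok := m.clients[id]
	delete(m.clients, id)
	m.mu.Unlock()
	if !ok {
		return false
	}
	e.cancel()
	<-e.done
	return true
}

// Count reports how many clients are currently managed.
func (m *Manager) Count() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.clients)
}

// Stop removes every client that was managed when Stop was called.
func (m *Manager) Stop() {
	m.mu.Lock()
	ids := make([]string, 0, len(m.clients))
	for id := range m.clients {
		ids = append(ids, id)
	}
	m.mu.Unlock()
	for _, id := range ids {
		m.Remove(id)
	}
}

// demoClient is a stand-in client that idles until it is told to stop.
type demoClient struct{ id string }

func (c demoClient) Run(ctx context.Context) {
	fmt.Println("client started:", c.id)
	<-ctx.Done()
	fmt.Println("client stopped:", c.id)
}

func main() {
	m := NewManager(func(id string) Client { return demoClient{id: id} })
	m.Add("shelly-1")
	m.Add("shelly-1") // the same node discovered again: ignored
	m.Add("shelly-2")
	m.Remove("shelly-1")
	m.Stop()
	fmt.Println("managed clients:", m.Count())
}
```

Because the manager waits for each goroutine to exit before `Remove` returns, a client author does not have to think about leaks or duplicate starts; a fix made in the manager applies to every client type.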

In SIOT, we try to do the hard things once:

  • Distributed: use NATS and simple/common/granular data structures.
  • Lifecycle: Everything is a Client and the Client Manager manages the lifecycle.

Abstractions are useful because they reduce the number of times we need to write and debug hard stuff. They establish patterns that are easier to read and reduce errors. But for abstractions to be useful, we must identify the hard parts and prioritize them. There are tradeoffs. For every feature you add to a system, would you rather:

  1. Write a little boilerplate?
  2. Deal with distributed, lifecycle, and concurrency issues?

As an example, in SIOT, we transfer and store most data using a simple point data structure. This requires some boilerplate to convert everything into points before sending it and to deserialize points back into useful structs. But the benefit is that the synchronization, data merging, and transport functions rarely have to change when we add new features. The hard part is stable.
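The tradeoff can be sketched in a few lines of Go. The `Point` type below is a simplified stand-in, not the actual SIOT definition, and `Sensor`, `ToPoints`, `FromPoints`, `Merge`, and `at` are hypothetical names. The merge rule (keep the newest point per type) is also an assumption made for illustration. What the sketch tries to show is that the per-feature code is boilerplate, while the merge function knows nothing about any particular feature.

```go
package main

import (
	"fmt"
	"time"
)

// Point is a simplified, hypothetical point: one typed value with a timestamp.
type Point struct {
	Type  string
	Value float64
	Text  string
	Time  time.Time
}

// Sensor is an example application struct that a feature might work with.
type Sensor struct {
	Description string
	Temperature float64
}

// ToPoints is the per-feature boilerplate: struct fields become points.
func (s Sensor) ToPoints(now time.Time) []Point {
	return []Point{
		{Type: "description", Text: s.Description, Time: now},
		{Type: "temperature", Value: s.Temperature, Time: now},
	}
}

// FromPoints is the reverse boilerplate: known point types fill the struct.
func (s *Sensor) FromPoints(pts []Point) {
	for _, p := range pts {
		switch p.Type {
		case "description":
			s.Description = p.Text
		case "temperature":
			s.Temperature = p.Value
		}
	}
}

// Merge is the generic part. It keeps the newest point per type (an assumed
// rule) and never needs to change when a new struct or field is added.
func Merge(stored map[string]Point, incoming []Point) {
	for _, p := range incoming {
		if cur, ok := stored[p.Type]; !ok || p.Time.After(cur.Time) {
			stored[p.Type] = p
		}
	}
}

// at is a small helper that builds a timestamp from seconds, for the demo.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	stored := map[string]Point{}
	Merge(stored, Sensor{Description: "boiler", Temperature: 61.5}.ToPoints(at(10)))

	// A newer temperature arrives; a stale description arrives late and loses.
	Merge(stored, []Point{
		{Type: "temperature", Value: 63, Time: at(20)},
		{Type: "description", Text: "old name", Time: at(5)},
	})

	pts := make([]Point, 0, len(stored))
	for _, p := range stored {
		pts = append(pts, p)
	}
	var s Sensor
	s.FromPoints(pts)
	fmt.Printf("%+v\n", s) // prints {Description:boiler Temperature:63}
}
```

Adding a new field to `Sensor` means touching `ToPoints` and `FromPoints`, which is tedious but easy to get right; `Merge` stays as it is.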

Another example is typed languages. These languages typically require a little more code and up-front work, but they prevent many run-time bugs and save a lot of time when it comes to refactoring. In this case, preventing run-time bugs and maintaining code are the hard parts.

Architectural patterns like the Elm architecture are a third example. In Elm, it is fairly tedious to pass data between components – it requires a lot of boilerplate. But the resulting simplicity in handling state is a huge benefit. Managing state in frontend applications is the hard part.

Many developers shun boilerplate and other distasteful aspects of programming, but compared to distributed and concurrency challenges, boilerplate is easy. Too often, I think, we view abstraction as a way to reduce typing (boilerplate) instead of a way to simplify the hard parts. There are often tradeoffs, so choose wisely.