Platform Thinking

What is the role of AI in your platform?

We hear much today about how AI is going to do our jobs better than we can.

AI is powerful – it has access to vast amounts of information and can do many things humans can’t.

It is really great at writing shell scripts, boilerplate in programs, figuring out how to use an API, summarizing public information on the Internet, etc.

But these tasks are not the differentiator.

The real value is going deep.

Information found only in books, which AI does not have access to.

Experience found only in people and encoded in platforms.

AI in a product can be useful for extracting information from YOUR data.

General AI tools are platforms in themselves, but probably not YOUR Platform, and probably not the creator of the primary value of YOUR Platform.

YOUR Platform is for extracting and leveraging the deep – the value people need.

The Platform Test

How do you know if you own YOUR Platform?

If a customer needs a new hardware interface or connector on a product, can you easily add that?

If a security vulnerability is found in a piece of software in your stack, can you fix it?

If a customer wants to use a new USB peripheral in a product, can you release a new SW version with that driver included?

Can you do all the above quickly, easily, and with confidence?

If so, you likely own YOUR Platform.

Here’s the thing – embedded systems today are general purpose, updatable, and expandable. That means people will do things with them that you never imagined.

A fundamental characteristic of a platform is that it will go places you never imagined.

Are you prepared for that?

Preparing for the future

Yesterday, we discussed a fundamental characteristic of a platform: it will go places you never imagined.

How do we prepare for this?

There are two ways:

  1. Try to predict where things will go in the future and build specific features into your platform now – just in case …
  2. Build the tooling, processes, and workflows so that you can easily and confidently add functionality when it is needed.

It is obvious which approach works better – when we try to predict the future (at least its specifics), we are more often wrong than right.

We can either try to build the future now, or prepare to meet it as it comes.

The line between these two is often difficult to discern.

What will you improve today?

Platforms are all about improvement – at the personal, team, and company levels.

One approach is to write down something you are going to improve each day and set aside a small block of time to work on it. Make this part of your personal process.

This does not have to be something big – clean your workspace, create a checklist for something you don’t enjoy doing, automate something, write some documentation to help others on the team, write a test for some troublesome code, refactor something, improve CI/CD, …

The internal improvements you do today, while not directly seen by your customers, will help you deliver something better tomorrow.

What will you improve in YOUR Platform today? Reply and I’ll compile a list and share it in a future post.

Do you own your deployment?

Saturday morning, I got a call from a customer – something was not working due to a bug we had deployed Friday (no, we don’t have very good tests 🤫).

The fix was easy, I tested it locally, and then tried to push it to a Git hosting service we are using, but the Git service was down.

Now what? Our Ansible deployment script pulled directly from git, built the program, and then deployed it.

While I could reverse engineer the build from the Ansible scripts and do it manually, that would have taken time and introduced the possibility of another error.

So I pushed the repo to my Gitea server, tweaked the repo line in the Ansible script, and deployed the update – not a big deal.

This raises a question, though – we don’t usually think of deployment as critical infrastructure (no big deal if it’s not working) – until you need to fix something quickly in production.

What if the deployment was wrapped up in some CI/CD workflow that only worked in vendor X’s cloud service?

Maybe simple deployments are actually better – a shell script that lives in the project repo that you can run anywhere. This could still be called by a CI process for the normal workflow.
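Here’s a minimal sketch of such a script, assuming a Go service built from a Git checkout – the repo URL, host, and service paths below are hypothetical placeholders, not anything from the incident above:

```shell
#!/bin/sh
# deploy.sh - minimal sketch of a deployment script that lives in the
# project repo and runs anywhere Git, Go, and SSH are available.
# REPO_URL, DEPLOY_HOST, and the /opt/app paths are hypothetical.
set -eu

REPO_URL="${REPO_URL:-https://git.example.com/acme/app.git}"
DEPLOY_HOST="${DEPLOY_HOST:-deploy@prod.example.com}"

deploy() {
    workdir=$(mktemp -d)
    trap 'rm -rf "$workdir"' EXIT

    # Build from a fresh clone so the deploy matches what is in Git.
    git clone --depth 1 "$REPO_URL" "$workdir/src"
    (cd "$workdir/src" && go build -o "$workdir/app" .)

    # Copy the binary over and restart the service.
    scp "$workdir/app" "$DEPLOY_HOST:/opt/app/app.new"
    ssh "$DEPLOY_HOST" 'mv /opt/app/app.new /opt/app/app && systemctl restart app'
}

# Only act when explicitly asked, so the script is safe to read or source.
if [ "${1:-}" = "deploy" ]; then
    deploy
fi
```

When the Git host is down, pointing REPO_URL at a mirror is a one-line change – and a CI job can call the same script for the normal workflow.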

All computing systems have the potential to fail – it does not matter how big vendor X is – their stuff can still fail.

Networks occasionally have problems.

DNS can have issues.

Systems get hacked.

No matter how many layers of complexity we pile on top, these risks remain.

In networked computer systems, the simplest path to resiliency is the ability to QUICKLY rebuild systems, whether that is your workstation, laptop, server, or deployment system.

When things go wrong …

What do we do?

Do we focus on who/what to blame?

Or do we figure out a path forward?

How are we going to prevent this problem in the future?

Not by shaming someone into paralysis, but by fixing the process.

By improving YOUR Platform.

The opportunity for the individual/organization who made the mistake to help fix the process is a graceful way out and preserves their dignity.

What is the difference between YOUR Platform and other platforms?

We all use other platforms – operating systems, cloud services, middle-ware, hardware modules, etc.

It is tempting when building a product to piggyback entirely on someone else’s platform (AWS, .NET, one of the hundreds of IoT platforms, etc.).

Society tells us – you can’t host your own service, deploy your own updates, design your own hardware, implement reliable systems, etc.

But at a small/medium scale, none of these things are very hard.

If you do them, you can simplify and optimize for your needs.

YOUR Platform is partly the ability to leverage other platforms, but also to build your own – where you are in control of the critical integration points.

The cost of updating dependencies, or not

As developers, we are often lazy when it comes to updating dependencies.

A short-term productivity hack is to not update them.

Leave our Yocto build at an old version.

Never touch go.mod or package.json – everything is working and I can keep focusing on coding features.

Don’t update our tools – we don’t have time.

… until things break, there is a security problem in a dependency, or we need a feature in a new version of something, etc.

And then things grind to a halt.

As Khem recently shared, “maintenance is costlier than development,” so even though development is important today, maintenance is more important – for tomorrow.

Part of YOUR Platform should be selecting technologies that can be updated regularly with little pain, and a process to do this.

It is the question of paying a little bit continuously, or a lot all at once later, and the latter is often so painful that in many cases it is impractical.
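A sketch of what “paying a little bit continuously” can look like in practice, assuming a Go project – the same routine applies to package.json or a Yocto layer with their own commands:

```shell
#!/bin/sh
# update-deps.sh - sketch of a routine, low-ceremony dependency update
# pass for a Go project. Run it weekly; the tests are what make it safe.
set -eu

update_deps() {
    branch="deps-$(date +%Y%m%d)"
    git switch -c "$branch"   # work on a branch so the diff is easy to review

    go get -u ./...           # update direct dependencies in go.mod
    go mod tidy               # drop anything no longer needed
    go test ./...             # fail the pass if anything broke

    git commit -am "update Go dependencies"
    echo "dependency updates on branch $branch - ready for review"
}

# Only act when explicitly asked.
if [ "${1:-}" = "run" ]; then
    update_deps
fi
```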

Investing in YOUR Platform compounds gains – neglecting technical debt compounds losses.

Platforms are for building systems

If you are building a one-off, non-connected device, you can get by without a platform.

This is why so many design shops don’t get platforms – they design something, then move on to the next project.

But if you are building a connected system, you have a much bigger problem to solve.

You now have a distributed system, and distributed systems are hard.

You are now living on the Internet with all its associated security concerns.

You have a system that has the potential to do so much more than it is doing now.

You have a system that has almost unlimited potential to be expanded.

You (potentially) have a platform.

Isn’t it risky to update your dependencies?

This is a common objection I hear when building industrial systems: “We want to lock things down to a super stable/tested LTS (Long Term Support) release and then stay on that release for a long time – it’s risky to update dependencies.”

Is it?

How often do you update your browser?

Your phone?

How often does Windows or macOS force you to update your computer OS?

Do you worry every time it updates?

I’ve run Arch Linux for years and update routinely without worrying.

I update to new versions of Gitea every time they come without a concern.

I routinely update to the latest HEAD of Zephyr on projects during development and have rarely had a problem.

The same with about every software component I use.

Yes, there are safety-critical control systems that have stringent testing requirements, but we’re talking about complex connected systems that are mainly concerned with moving data around.

Where security is a concern.

With rare exceptions, modern OSS projects get more stable with each release, and to a lesser extent with each Git commit.

They have defied the laws of entropy.

How? With OSS workflows, testing, continuous integration (CI), more real-world usage, more user feedback and contributions, etc.

With good CI, changes don’t get merged to main until they are tested pretty well.

Transparency, community, and OSS workflows are powerful – really the only practical way to build complex technology.

The next time you seek the cozy cocoon of an LTS release for a dependency in YOUR Platform, think about what you might be giving up … features, improvements, community connection, and likely also stability.

Does consistency matter?

If you have a single developer on a single project, then perhaps consistency does not matter too much.

However, if you want to scale, either products or developers, then consistency matters.

Why?

So that code does not get drastically reformatted every time someone makes a change, making Git diffs impossible to review.

So that any developer can easily understand and make changes in any part of the codebase.

So that new products can leverage previous efforts.

So that new developers can be more easily on-boarded.

So that we can see patterns and simplify systems.

So that our systems are tested.

So that documentation can be easily found.

Linus Torvalds is being lambasted for encouraging some consistency in Git commits.

But if you read his actual email, the request seems quite reasonable.

The Linux kernel has a well-defined coding style that all contributors are expected to follow.

Have we considered the impact this emphasis on consistency has had on the Linux kernel’s success?

The irony of all this is that consistency is usually done in the name of the “team” or “reuse”. But if we reflect a bit, we are mostly just helping ourselves.

We can read and understand our own code in 6 months.

We can find stuff.

We can more easily make changes and improvements.

We have tools helping us.

A little bit of consistency goes a long way in building YOUR Platform.

How can we be more consistent?

What does not work very well is long standards documents and endless code reviews where we shame people into compliance. There are better ways.

We now have tools that can proactively format our code, and linting tools that catch common mistakes. Use them. Even if no one else reads our code, they help us.

Write tests. Perhaps their greatest value is that they give us a new perspective on our work, which leads to consistency.

We can have CI hooks that check for various things – check out the Zephyr project if you want a good example of this.
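As a minimal sketch of the nudge-by-tooling idea – a Git pre-commit hook that checks formatting, assuming a Go codebase and gofmt (substitute clang-format, prettier, or whatever fits your stack):

```shell
#!/bin/sh
# .git/hooks/pre-commit - sketch: refuse commits containing unformatted
# Go files. Assumes gofmt; swap in your own formatter as needed.
set -eu

check_formatting() {
    # Only look at Go files staged for this commit.
    files=$(git diff --cached --name-only --diff-filter=ACM | grep '\.go$' || true)
    if [ -z "$files" ]; then
        return 0
    fi

    # gofmt -l lists files whose formatting differs from canonical.
    # $files is intentionally unquoted so each filename is a separate arg.
    unformatted=$(gofmt -l $files)
    if [ -n "$unformatted" ]; then
        echo "please run gofmt on:" >&2
        echo "$unformatted" >&2
        return 1
    fi
}

# Hooks always run inside a repo; this guard makes the script safe elsewhere.
if git rev-parse --git-dir >/dev/null 2>&1; then
    check_formatting
fi
```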

Being nudged by a CI tool to be consistent is a much better experience than being pulled over by the consistency police.

Does YOUR Platform have tooling that encourages consistency?