Platform Thinking

cbrake · August 14, 2024, 1:47pm

Three levels of YOUR Platform

We can look at platforms at three levels.

[Figure: platform levels]

Your personal platform is the know-how that you can personally reuse from task-to-task, project-to-project, and job-to-job. But it is not really useful to anyone other than you.

A product platform may be what a team uses to efficiently build, maintain, and produce variants of a product.

A company platform is when you are able to reuse know-how, technology, tools, and processes on one product and leverage that on the next product, and when an improvement on one product improves another.

All three are valuable, but the higher up the stack you can go, the higher the value.

cbrake · August 15, 2024, 8:04pm

Culture

What type of culture do platforms thrive in?

Where ideas are valued and evaluated on merit and truth.

Where initiative is rewarded.

Where mistakes result in process improvements, not scoldings.

Where openness and transparency are the standard.

Where anyone can improve anything in the system – there is no protecting turf.

Where OSS tools are used when possible.

Where things are continuously improved.

Platform cultures are organic.

We can learn a lot from Open Source projects here …

cbrake · August 16, 2024, 1:04pm

What is the #1 purpose of YOUR Platform?

The purpose of YOUR Platform is to reduce friction in delivering value to your customers.

We are in business to deliver value to our customers.

If we can do this better, we can deliver more value.

There are lots of ways we can do this.

Automation.

Better tools.

Additional features.

Quality/stability improvements.

Release more often.

Respond more quickly.

cbrake · August 19, 2024, 3:52pm

Hack-it, ship-it, forget-it …

Hack-it, ship-it, forget-it … is a race to the bottom where cost is the only thing that matters.

The initial delivery is what your customer gets – forever.

These types of products only depreciate.

And cannot be maintained.

Alternatively, the value of your product can increase over time.

Where you fix problems quickly.

And add features in a timely fashion as needs/markets change.

Where you overcome the laws of chaos and improve instead of decay.

Because you own YOUR Platform.

Which business do you want to be in?

cbrake · August 20, 2024, 2:41pm

How does your product value increase after the initial sale?

How does your product increase in value after the sale?

Software/Firmware/Cloud updates.

Software is soft for a reason – it is meant to be changed, improved.

And sometimes we install “apps” to add new functionality.

And with this improvement comes increased value.

It is kind of like your phone – with each new OS update, things generally get better.

Each app you install (potentially) provides some utility.

Why? Because modern phones are platforms.

Is your product a platform, or is it at least built and deployed using a platform?

Does its value increase over time?

At industrial product scale, you don’t necessarily need to support user-installed apps, but you need to at least be able to add functionality, fix things, and deploy these updates – efficiently.

How? With YOUR Platform.

cbrake · August 21, 2024, 7:54pm

What is the simplest form of automation?

The simplest form of automation is a checklist or playbook.

If there is something you need to do more than a couple of times that involves multiple steps, a checklist is a great place to start.

Then we no longer have to dread doing a multi-step task because it is easy – all we need to do is go through the documented steps.

Checklists are surprisingly effective at reducing friction and mistakes. Many of us do this personally for things like travelling as it is easy to forget something if we don’t.

And the beautiful thing about checklists is they can easily be turned into more advanced forms of automation (scripts, continuous delivery, etc) because you have already thought through and debugged the process, which is often the hard part and where we get stuck.

Checklists can turn into scripts, which can turn into continuous delivery.

Part of YOUR Platform.
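As a rough illustration of that progression, here is a minimal Python sketch of a checklist that is halfway to being a script. Everything in it is hypothetical (the `Step` type, `run_checklist`, and the release steps are made up for the example): each step starts out manual, and you attach an `action` to a step once you have automated it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One checklist item; `action` stays None until the step is automated."""
    description: str
    action: Optional[Callable[[], None]] = None


def run_checklist(steps, confirm=input):
    """Walk the steps in order and return the descriptions of the manual ones.

    Automated steps run their action. Manual steps wait for the operator
    to confirm via `confirm` (the built-in input() by default).
    """
    manual = []
    for number, step in enumerate(steps, start=1):
        if step.action is not None:
            print(f"[{number}] auto:   {step.description}")
            step.action()
        else:
            confirm(f"[{number}] manual: {step.description} (Enter when done) ")
            manual.append(step.description)
    return manual


# Hypothetical release checklist: only the middle step is automated so far.
release_log = []
RELEASE_CHECKLIST = [
    Step("Update the changelog"),
    Step("Run the test suite", action=lambda: release_log.append("tests run")),
    Step("Tag the release"),
]
```

Calling `run_checklist(RELEASE_CHECKLIST)` walks an operator through the list. The returned list of manual steps is a to-do list for further automation, and once it is empty the checklist can run unattended, for example from a CI job.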

cbrake · August 22, 2024, 2:03pm

Platform != People

The right people in your organization are absolutely essential – no question about that.

And dealing with “people issues” is a critical and essential skill.

But, if your organization is dependent on a star individual and you are in big trouble when he/she is gone, then you don’t have a platform.

We need great people, but if they can’t work in the context of YOUR Platform, then you will have trouble scaling and hit a major speed bump when they leave.

People come and go.

YOUR Platform is what gives your business the consistency and resiliency to keep going – smoothly.

cbrake · August 23, 2024, 1:17pm

The present or the future?

Planning has its place – we need a vision for where we are going.

But YOUR Platform is best focused on the present, not the future.

Because the future of a complex system is pretty hard to predict.

When we focus on improving our current efforts with refactoring, testing, automation, documentation, CI/CD, etc. in the simplest way possible …

We pave the way to the future, by reducing friction in the present.

cbrake · August 26, 2024, 3:40pm

What makes a good Yocto BSP?

As we evaluate technology to use in our platforms, Yocto Embedded Linux BSPs often come into the mix.

Recently we talked with Matt Madison, who maintains the meta-tegra BSP layer. It provides Yocto support for Nvidia’s embedded processors, which are becoming increasingly popular in edge AI applications.

We are using meta-tegra in several projects, and it has been a good experience.

What came out of our discussion is that user/community involvement is what makes a good BSP.

meta-freescale and meta-raspberrypi are other examples of community-oriented BSP layers that are very high quality.

Chip makers have different priorities and concerns than users. And the only way to understand your users is to get them involved. And the best way to get them involved is to work on an OSS project together.

We’ve run the experiment in the Yocto BSP space for 14 years, and the results are in.

When evaluating a complex technology like processors that rely on open-source software, if a supplier does not include their users in the development process, there is a good chance they don’t really understand your concerns.

cbrake · August 27, 2024, 2:09pm

Tracking upstream and why does it matter?

Yesterday, we discussed three Yocto BSP layers that are exceptional:

meta-tegra (69 contributors)
meta-freescale (174 contributors)
meta-raspberrypi (157 contributors)

The above three layers make an effort to keep up with upstream developments. This may mean regularly merging upstream, doing a build, fixing issues – keeping up.

As a result, they are always ready for the next release of the Linux kernel, Yocto, whatever.

A little bit of continuous effort is much easier than a monumental effort every four years.

Why does this matter for small teams/companies?

There are many reasons, but when working with complex open-source software, we’ll eventually need support/help.

And this help typically comes from the community around open-source projects.

And the community is focused on the current development, not a 4-year LTS release.

Additionally, the latest releases are where security problems get fixed, features implemented, and value added.

YOUR Platform benefits most from being where the value is being created.

cbrake · August 28, 2024, 1:47pm

Github and why does it matter?

For the last two days, we have been discussing aspects of several high-quality Yocto BSPs.

Another characteristic of these three BSPs is that they are all hosted on Github.

Yes, there are many other platforms you can host Git repositories on.

You can easily do your own Git hosting (I personally use Gitea for private Git repos).

But, for better or worse, if you want to engage with a world-wide community around a software project, Github is the easiest, lowest friction place to do that.

Most developers have a Github account, and this means they can easily create issue tickets, submit pull requests etc.

Some projects, like the Linux kernel, are so popular and well-established that it does not really matter how they interact with the community. They can be rude, use older methods of handling changes like patches on a mailing list, etc. and it does not matter. It might even be argued that these “barriers to entry” are helpful in weeding out the noise in a very large project. Perhaps this is true.

But for most of us, we are not at that scale. We don’t have that luxury. Even most downstream Linux trees are hosted on Github these days.

If users find it difficult to engage and interact with us, they won’t bother, especially the younger generation of developers.

Most developers are (rightly) focused on their projects – their work, not yours. Thus, it is an act of generosity if they take the time to interact with or contribute to a supplier’s project. They are doing it because they want to – not because they have to.

And if users can’t easily interact with a supplier, does that supplier really understand their concerns?

Github has set a new standard for transparency, tooling, and interaction. It is the lowest friction platform for social coding. And social matters today.

This all has implications for how users/customers interact with YOUR Platform (if public) and how you select technology for YOUR Platform.

cbrake · August 29, 2024, 1:45pm

Solving problems or symptoms?

The following quotes from the book “The One-Straw Revolution” caught my eye:

The more elaborate the countermeasures, the more complicated the problems become. … When a decision is made to cope with the symptoms of a problem, it is generally assumed that the corrective measures will solve the problem itself. They seldom do. Engineers cannot seem to get this through their heads. These countermeasures are all based on too narrow a definition of what is wrong.

Although Masanobu Fukuoka’s book is about agricultural systems, the concepts apply to systems in general.

With security, do we pile on additional layers of checking, detection, etc., or do we use a simpler and more secure technology to start with?

For reliability, do we implement elaborate distributed/redundant systems that check each other, or improve our testing of a single system so that it rarely fails?

When deployment mistakes are made, do we add more bureaucracy and red tape, or do we improve the process/tooling such that it is difficult to make the mistake in the first place?

When something is not working, do we add on layers of complexity to fix it, or try to simplify and identify the root cause?

One way to get to the root cause is to ask “why?” five times.

bradfa · August 30, 2024, 11:03am

At a previous small company that I worked for, we had a very solid “platform” for our hardware designs. It meant that we could mix and match different hardware modules, like puzzle pieces, to create unique and customized machine configurations. We had software to handle all this complexity, but since that was built on the same kind of modular platform expectation, it wasn’t all that bad to manage. This was WAY easier, cheaper, and faster than making numerous custom full machine configurations, and it meant that we could come out with something that looked like an entirely new product when in reality we had only changed one small and fairly simple module. It worked very well!

cbrake · August 30, 2024, 2:52pm

Love that example – thanks for sharing! Amazing what happens when we see patterns and simplify things around these patterns.

cbrake · August 30, 2024, 2:53pm

Lessons from a 1-year-old dog

As I was going through my morning routine with our 1.3-year-old dog, Reese, who is very energetic, it occurred to me how well short training sessions every morning are working.

A 12-hour session with her once a month would do virtually nothing.

Improvement is not really what my dog wants to be doing – she would rather chase squirrels, dig holes, jump on people – do something heroic.

Likewise, most of us don’t really want to clean our office, write a CI script, refactor some messy code – we want to design things, code a new feature, make a sale, build 1000 widgets, etc.

Improvement is hard and sometimes painful.

And the pain is proportional to the size of the improvement dose we are facing.

If we can break it down into small enough chunks, it is manageable, and actually enjoyable.

Platform improvements are rarely moments of deep inspiration, but rather just buckling down and doing what you know should be done.

Discipline.

And this sometimes works best if you schedule a time block every day – 25 minutes, start the timer, go …

You only have 25 minutes, get going, now!

And after our morning walk, Reese now walks toward where her leash is hanging – she is actually looking forward to these short, regular sessions.

Improvement is best continuous.

cbrake · September 2, 2024, 2:39pm

What is the best tool for private Git repos?

Several days ago, I made a case for using Github to engage users around your OSS project.

But is this the hammer you should use to drive every nail?

For private projects, I think Gitea is actually better in most respects.

It is very fast and clean, and most of the basic functions work as well as or better than Github.

And you can host all but the largest repos on a $5/mo Linode with unlimited users.

Groups and permissions are very flexible.

Branch protection works.

In all, it is very good!

I wrote an Ansible role that works very well for updating your Gitea instance. I’ve updated through ~70 Gitea versions with almost no problems.

Gitea may be a better option for YOUR internal Platform – more details to follow …

cbrake · September 3, 2024, 12:38pm

Seeing patterns

Patterns are important.

Seeing patterns allows us to simplify things by using common data structures, re-using code/design, finding solutions to common problems, etc.

Being able to see patterns also allows us to troubleshoot effectively.

But, to see patterns, we need to have history and data in a form that is easy to see and process.

For troubleshooting, this may be a common place to collect notes – say every time a system fails, we record the incident in a single shared Google Doc.

Tagging is also effective – we might tag a dataset for different types of events, and then be able to quickly filter on different combinations of tags.
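To make the tagging idea concrete, here is a small Python sketch. The `Incident` type, the helper names, and the sample records are all invented for illustration. Each incident carries a set of tags, and filtering on a combination of tags is just a subset check.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    date: str
    note: str
    tags: set = field(default_factory=set)


def with_tags(incidents, *required):
    """Return the incidents that carry every one of the required tags."""
    wanted = set(required)
    return [i for i in incidents if wanted <= i.tags]


def tag_counts(incidents):
    """Count how often each tag appears, to show where failures cluster."""
    counts = {}
    for incident in incidents:
        for tag in incident.tags:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


# Made-up sample log, only to exercise the helpers above.
INCIDENTS = [
    Incident("2024-07-02", "Unit rebooted during upload", {"power", "field"}),
    Incident("2024-07-19", "Modem failed to register", {"cellular", "field"}),
    Incident("2024-08-05", "Brownout on bench supply", {"power", "lab"}),
]
```

Here `with_tags(INCIDENTS, "power", "field")` returns only the first incident, and `tag_counts(INCIDENTS)` shows that `power` and `field` each appear twice. The same approach works on rows exported from a shared doc or spreadsheet, as long as the tags are recorded consistently.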

To see patterns in source code, we need to have source code organized such that it is easy to see and navigate through all of it – for most of us, this means a monorepo.

Transparency is critical – can we easily see things other people are working on, or is each person or team working in walled-off silos?

Patterns are not invented, but rather discovered.

Platforms are all about identifying and leveraging patterns.

And seeing patterns requires transparency, thoughtful organization of our assets, and rigorous logging of what happens over time.

Do these things, and patterns will naturally emerge, for YOUR Platform.

cbrake · September 4, 2024, 6:28pm

How to help yourself?

Have you ever gone back into a project you have not touched for 6 months and wondered – why did I make this change?

Or did you ever struggle to get all the dependencies installed to build a project you wrote?

Or did you ever wonder – will this change break something I’m not thinking of?

Or did you ever question – how do we actually deploy this now that it is updated?

Ironically, when you try really hard to help others use your work – documentation, tests, CI/CD – you are mostly helping your future self.

cbrake · September 5, 2024, 2:36pm

No Golden Machines

As humans, we can become attached to “golden” machines.

A very expensive bicycle set up just right, a well-tuned tool, a nicely configured workstation, a server that we have set up just right …

We like buying expensive things or the iterative process of tweaking things just right.

However, this generally does not move YOUR Platform forward.

Everything on this earth is ephemeral.

Laptops are damaged. Bicycles are stolen. Tools break. Servers crash.

So then we become obsessed with protecting our golden machines – locks, excessive security, redundancy, monitoring, etc.

The problem with being over-protective is that it is time-consuming and discourages us from using the thing in the first place.

We’re afraid to use it in case we might mess it up.

We’re afraid to change it because we don’t understand the history of tweaks.

What if instead we said: “NO GOLDEN MACHINES”

If our laptop gets run over, we can quickly set up a new one.

If the server crashes, we can quickly deploy a new one.

If we need to use a favorite editor on a different computer, it only takes 2 minutes to set it up.

If our bicycle is stolen, we buy another reasonably priced one and have the skills to set it up.

YOUR Platform is best built from machines that we can easily use, replicate, and scale, not golden machines that we have to protect.
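One way to get there is to keep machine setup in a repository and apply it with a script that is safe to run repeatedly. The Python sketch below is one hypothetical approach, not a description of any particular tool: it assumes your editor and shell config live in a "dotfiles" checkout and links them into a home directory. The function names and layout are made up for the example.

```python
import os
from pathlib import Path


def ensure_symlink(link: Path, target: Path) -> bool:
    """Make `link` point at `target`; return True only if something changed."""
    if link.is_symlink() and Path(os.readlink(link)) == target:
        return False  # already in the desired state, nothing to do
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link or file (directories not handled)
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target)
    return True


def apply_dotfiles(dotfiles: Path, home: Path, names) -> list:
    """Link each named file from the dotfiles checkout into `home`.

    Returns the names that had to be created or repaired.
    """
    return [name for name in names
            if ensure_symlink(home / name, dotfiles / name)]
```

Running `apply_dotfiles` on a fresh machine links everything, and running it again changes nothing and returns an empty list. That is the property that makes a machine replaceable: the setup lives in the script, not in a history of hand tweaks.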

cbrake · September 6, 2024, 3:11pm

What is the first thing you should implement?

When building a new system, what is the first thing you implement?

There are a lot of approaches – one is to implement a minimal proof of concept.

However, I think an easy way to update software in any part of the system (Linux edge devices, microcontrollers, cloud applications) is something that should be done as early as practical.

Requiring a technician to go on-site with a computer and specialized software does not count.

Neither does a multi-step manual process of building something by hand, scp’ing files to a server, restarting services, etc.

Deploying new software to any system should be easy and quick.

It should take a single click, or putting a single file on a USB disk and rebooting a device, or merging to main and letting continuous delivery (CD) deliver our new software to the cloud.
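For the USB case, the device side might look something like the Python sketch below. This is only an illustration under stated assumptions: the file names `update.bin` and `update.bin.sha256`, the staging directory, and the `stage_update` function are all hypothetical, and a checksum only catches corrupted copies, so a production updater would typically also verify a signature.

```python
import hashlib
import shutil
from pathlib import Path

UPDATE_NAME = "update.bin"            # hypothetical update file name
CHECKSUM_NAME = "update.bin.sha256"   # hypothetical sidecar with the hex digest


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_update(media: Path, staging: Path) -> bool:
    """Copy a verified update from removable media into a staging directory.

    Returns True if an update was staged, and False if none was found or
    the checksum did not match. Applying the staged file is left to the
    device's own boot or install logic.
    """
    update = media / UPDATE_NAME
    checksum = media / CHECKSUM_NAME
    if not (update.is_file() and checksum.is_file()):
        return False
    parts = checksum.read_text().split()
    if not parts or sha256_of(update) != parts[0].lower():
        return False
    staging.mkdir(parents=True, exist_ok=True)
    shutil.copy2(update, staging / UPDATE_NAME)
    return True
```

A boot script would call `stage_update` with the USB mount point and hand any staged file to the installer. The point is that the person in the field only copies two files and reboots.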

We may think nice update mechanisms are for our customers in production, and they certainly are.

But update tooling also helps us during development.

An easy way to update systems encourages us to do it more often, because it is easy.

We iterate faster.

Fixes and improvements are deployed instantly.

We test more systems during development.

We can easily adapt as we learn more about the problem we are trying to solve.

We can involve more non-technical people in development and testing.

What helps us scale in production also helps us scale during development.

Instead of doing update last, do it first – as an integral part of YOUR Platform.