Thoughts on documentation formats and tools (markdown)

Below is a draft of some things I’ve been thinking about:


title: Documentation Tools and Tradeoffs
author: Cliff Brake
date: 2020-07-10


Overview

There are many ways to create documentation. Microsoft Word is the standard in
the business world. Developers often use Markdown in README files. Online
systems like Google Docs can be handy for collaborating on a document. Content
management systems like WordPress fit well for blogs and websites. Sometimes a
raw text file makes sense.

This document explores the use of a documentation system in an organization
where more than one person needs to maintain or review a document.

Requirements

What are the requirements?

  1. Relatively easy to use
  2. Encourage collaboration
  3. Easy to track changes and review
  4. Support any organization structure
  5. Rich formatting

Options

We are going to examine three options in this document:

  1. Microsoft Word
  2. Google Docs
  3. Markdown stored in Git

Microsoft Word

Most people probably reach for MS Word these days, as we’ve been conditioned to
do so. It is the “expected” document format in the business world. It is a
powerful tool and delivers impressive results with very little work. However,
there are some drawbacks to using Word:

  • Most software developers these days don’t run Windows. This may not be a well
    known fact but if you observe what people are using at conferences during
    presentations, and wander through the offices of leading tech companies, you
    will see mostly Macs and a smaller number of Linux notebooks/workstations. This
    is not an arbitrary choice or fad – there are very good reasons for this
    which are beyond the scope of this document.
  • Even though Word can track changes, it is tedious to review them. It requires
    downloading the file, opening it in Word, and stepping through the changes.
    History can be reviewed, but it is usually too much effort to bother.
  • When someone commits a Word document to a Git repo, the changes are not easily
    visible in the commit. Again, if you want to see what changed, you need to
    check out and open the file. And, the history in Word is not necessarily tied
    to the Git commit history.
  • Multiple people can’t work on the same document at the same time.

Google Docs

Google Docs is a very impressive tool and excels in scenarios where a small
number of trusted people want to collaborate on a document. The ability for
multiple people to type into the same document and see each other’s edits in
real time is very neat. Edits are tracked, and it has all the normal features,
such as comments, that can be useful. However, Google Docs also has drawbacks:
a connection to the cloud is required to use it, and changes can’t be managed
outside the normal sequence of revisions.

Markdown

Markdown is a fairly simple markup language that lets you create nicely
formatted documents from a simple and readable text file. As with
everything, there are trade-offs. The simple/readable text format also means
there are limits to what can be done with it. However, the
functionality included in Markdown is adequate for most documentation tasks.

Some of Markdown’s drawbacks:

  • Is not WYSIWYG.
  • Is a markup language so you need to know the syntax.
  • Requires a little work in setting up a good editor, automatic formatters, etc.
  • Can’t do all the advanced formatting that other solutions can.
  • Image files cannot be embedded in the document source file – they must live
    in files next to the Markdown source.

However, there are also some significant advantages to Markdown stored in Git:

  • Because the source file syntax is simple, it is very easy to review changes in
    a commit diff.
  • The document/markup is very readable in a text editor – developers don’t have
    to leave their primary editor to work on documentation.
  • Syntax is easy to learn and use, and tools like Typora give you a more
    classic word processor experience if you want one.
  • Encourages collaboration.

Example Scenario

Consider a scenario in which the company handbook is maintained in Word,
Google Docs, or Markdown, and examine how a proposed change might go in each.

If the document were maintained entirely in Word, a proposed change would go
something like this:

  1. The employee emails a suggestion in the form of: “consider changing sentence
    ‘abc’ on page 23 to ‘xyz’”.
  2. The maintainer of the handbook makes sure she has the latest version.
  3. She opens the Word document and makes the change.
  4. She sends a copy of the Word document with tracking enabled for the CEO to
    review.
  5. The CEO sends back some suggestions.
  6. The maintainer has to coordinate with the original author.
  7. At this point, since everyone is busy, the task gets lost in the black hole
    of our overflowing inboxes and is never completed.

Consider the alternative flow if the document is stored in Google Docs:

  1. Employee has a suggestion, but cannot be given access to change the document
    in Google Docs directly, as there is no formal review process.
  2. So, he emails the suggested change, similar to the Word scenario.
  3. Maintainer makes the change and notifies CEO to review.
  4. Review process is much easier than in Word, as both CEO and maintainer have
    write access to the document.

Consider the alternative flow in Markdown stored in Git:

  1. Employee checks out the handbook source.
  2. She makes the change in a text editor (or tool like Typora) and pushes to a
    forked repo in her own namespace.
  3. In Gitea, a pull request is opened.
  4. The maintainer of the handbook reviews the pull request and tags the CEO in
    the pull request to see what he thinks.
  5. The CEO gets an email and reviews the pull request, making a few suggestions.
  6. The original author makes the changes and submits an updated pull request.
  7. The handbook maintainer approves the pull request and the change is instantly
    merged into the master branch.
  8. The CI (continuous integration) system sees a change in the master branch and
    generates an updated copy of the handbook and deploys it to the company
    web server (could be internal, or in the case of GitLab, external).
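
The "generate an updated copy" part of the last step could be sketched roughly as follows. This assumes the handbook is a directory of `.md` files and that Pandoc (mentioned later in this thread) is the renderer; the function names and file layout are hypothetical, and a real CI job would also handle checkout and deployment.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(sources, output):
    """Build the argument list for a pandoc run that combines Markdown files."""
    return ["pandoc", *[str(s) for s in sources], "-o", str(output)]


def build_handbook(source_dir, output):
    """Render every .md file in source_dir (sorted by name) into one output file.

    Returns the command that was run, or None if pandoc is not installed
    or there is nothing to build.
    """
    sources = sorted(Path(source_dir).glob("*.md"))
    if not sources or shutil.which("pandoc") is None:
        return None
    cmd = pandoc_command(sources, output)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pandoc fails
    return cmd
```

A CI job might call something like `build_handbook("handbook", "handbook.html")` after each merge to the master branch and then copy the result to the web server. Pandoc infers the output format from the file extension.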

Even though Markdown and Git are harder to use than Word or Google Docs for
actually making an edit, collaborating on the change is much easier. Even if the
maintainer takes a two week vacation in the middle of the process, when she gets
back, the pull request is still open, reminding everyone of the pending change
that needs to be completed. There are now a process and tools that facilitate
review and collaboration. Even though the editing process is a little harder,
there is much less friction in the overall process of contributing. Thus, more
people will contribute.

The difference between the push and pull model

The Markdown/Git flow described above is a “pull” model in which a person makes
a change, publishes the change, and then the maintainer of the upstream document
or project can easily pull the change in with the click of a button. This is
different from the “push” model where a person might try to push a change into a
document by emailing a suggested change, or might make the change directly in
Google Docs. The big advantage of the pull model is that it enables process,
tools, and workflow. A few more notes on the pull model:

  • The change is considered in the context of just that change and not mixed in
    with other changes. It can be easily reviewed in the pull request. Tools like
    Gitea, GitHub, etc. allow discussions to happen in the pull request. Pull
    requests can be updated after comments are processed, etc. Once everyone
    agrees on the change, it is easy to merge with a single button click.
  • Multiple changes can happen in parallel and each proceeds at its own
    pace. In Google Docs, a change is made and then recorded. You can revert to a
    version of a document, but it is not easy to isolate a change by itself and
    merge it in when desired – you can’t go back and pull out a single change.
  • The pull model allows any organizational structure. The maintainer is free to
    merge or reject any proposed change. Direct access to the master copy is never
    required by contributors.
  • Multiple levels can also be organized. For instance, an engineering manager
    could be given responsibility for the engineering section in the handbook. He
    might collect contributions from members of that department, and submit one
    pull request to the company maintainer of the handbook.

The pull model is one reason Open Source projects have been so successful. The
organizational structure of the Linux kernel is rather complex (with many
levels). There are many subsystem maintainers who collect changes from
contributors and then pass them upstream through multiple levels. This process is
all enabled by the Git distributed version-control system.

The difference between the push and pull models is that the pull model scales.

Why are collaboration and process important?

This brings us back to a more fundamental question – why is collaboration in a
company, project, or organization important? There are many reasons:

  1. As systems increase in complexity, no one person knows everything that needs
    to be done. Collaboration with experts is the only way to build complex systems.
  2. A company is a collection of people. If people have a voice, they will be
    much more motivated than those who are simply told what to do. The command
    and control model of yesteryear does not work today.
  3. In product development, no one person has all the knowledge needed. This is
    even more true of the complex operations of a company composed of
    humans. A company’s success depends on its ability to leverage the good ideas
    of all involved, and there needs to be a process for these ideas to surface.

The Choice

The choice of a documentation tool comes down to the following
questions:

  1. Do we want to optimize for ease of editing (the short term)?
  2. Or do we want to optimize for collaboration and the spread of ideas and
    information (the long term)?

A casual study of various successful companies and open source projects suggests
that reducing the friction of collaboration is critical to success – especially
in technology companies developing cutting edge/complex systems. GitLab’s
handbook is maintained in a Git repo and anyone can open a pull request. While
most of the contributions come from within the company, there have been outside
contributions. For more information on GitLab’s approach consider the following:

But Markdown/Git is too hard

Can non-programmers learn Markdown and Git? This is a good question, and I’m not
sure I have the answer yet, but I think they can. In many ways, learning a dozen
elements of the Markdown syntax is simpler than navigating the complex menu
trees of Word trying to figure things out. Git does have a learning curve, but
the essence of it is fairly simple. Once you understand the basic concepts and
operations, it is as natural as anything else.

Humans are inherently lazy, so most continue to just do what they have always
done. Additionally, not all people are intrinsically motivated to share and
collaborate. Probably the most important thing is to establish an organizational
culture where the people at the top set the example, and ask others to follow.
As much as we don’t like to compare ourselves to sheep, most of us resemble them
more than we like – we don’t like to be driven, but will gladly follow the lead
if it makes sense to us. Telling people to collaborate and use certain tools
will not work if the people at the top are not doing the same.

@khem pointed me to this repo:

A nice example of a team creating documentation using markdown and Pandoc.

We are doing something similar with the tmpdir handbook.

I have also been looking at different documentation systems lately and found reStructuredText quite appealing as well. E.g. see this

How is wxPython related to reStructuredText – do they do their docs in it?

Seems they are using it in a lot of places

Found Markdoc today

Seems interesting from authoring point of view

Yes, that is interesting. I’ve heard good things about Stripe’s documentation (Stripe created Markdoc for their own needs). Markdoc is written in TypeScript (which does not excite me), but that may be required to get the extensibility they want. The FAQ is also interesting and includes a comparison to AsciiDoc.