The $440 Million Software Error at Knight Capital

Reading this article reinforces how important CI and automated deployments are. Taking
the human element out of the deployment process can greatly reduce the risk of mistakes
like the one below, because deployment is repetitive yet detail-oriented work.

Manual deployment

In the week before go-live, a Knight engineer manually deployed the new RLP code in SMARS to its eight servers. However, the engineer made a mistake and did not copy the new code to one of the servers. Knight did not have a second engineer review the deployment, and neither was there an automated system to alert anyone to the discrepancy. Knight also had no written procedures requiring a supervisory review, all facts we shall return to later.


This is a fascinating story that highlights the importance of automation and good engineering practices. Any of the following might well have prevented this very costly error:

  • automated deployment (Ansible, etc.)
  • removing dead code
  • better testing
  • review
  • deployment playbooks

Process is important: it can free us from the drudgery of tedious work and prevent human mistakes.
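
To make the "automated system to alert anyone to the discrepancy" idea concrete, here is a minimal sketch in Python of a post-deployment consistency check. It assumes each server can report a SHA-256 digest of the code it is actually running. How those digests get collected (SSH, an agent, a health endpoint) is left out, and the host names and the `find_stale_servers` helper are hypothetical, not anything Knight used.

```python
import hashlib
from typing import Dict, List, Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a build artifact as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_stale_servers(
    expected_digest: str, reported: Dict[str, Optional[str]]
) -> List[str]:
    """Return the hosts whose reported digest differs from the release digest.

    A host that reported nothing (None) is treated as stale too, because
    "we could not verify it" should block a go-live just like a mismatch.
    """
    return sorted(
        host for host, digest in reported.items() if digest != expected_digest
    )


# Illustrative data only: eight hypothetical hosts, one left on the old build.
new_build = b"new release artifact"
old_build = b"previous release artifact"
expected = sha256_hex(new_build)

reported = {"server-%02d" % i: expected for i in range(1, 8)}
reported["server-08"] = sha256_hex(old_build)

stale = find_stale_servers(expected, reported)
if stale:
    print("Deployment incomplete, stale hosts:", ", ".join(stale))
else:
    print("All hosts are running the expected build.")
```

A check like this is cheap to run as the last step of a deployment playbook. The point is that the comparison is done by a script on every deploy, rather than relying on someone remembering to look at all eight machines.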