Machine Learning: A Manifesto for Good ML Coding Practices

Jon Reade
Mar 29, 2021

Why we need good coding practices in the world of ML

Introduction

This article examines why we need coding standards in the world of ML, and why machine learning needs to catch up with established software development practices.

Where Are We Now?

ML engineering is a branch of software engineering. At the development coal face, its only difference is that instead of explicitly defining rules in code to act on data, we use data to discover those rules, then apply those discovered rules to new, previously unseen data.
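To make that distinction concrete, here is a minimal sketch (using scikit-learn; the spam thresholds and training data are invented purely for illustration). The first function encodes a rule by hand; the second discovers an equivalent rule from labelled data, then applies it to unseen input.

```python
from sklearn.tree import DecisionTreeClassifier

# Traditional software engineering: the rule is written by hand.
def is_spam_rule_based(num_links: int, num_exclamations: int) -> bool:
    return num_links > 5 and num_exclamations > 3  # invented thresholds

# Machine learning: the rule is discovered from labelled examples...
X_train = [[1, 0], [2, 1], [8, 5], [10, 7]]  # [num_links, num_exclamations]
y_train = [0, 0, 1, 1]                       # 0 = not spam, 1 = spam
model = DecisionTreeClassifier().fit(X_train, y_train)

# ...then applied to new, previously unseen data.
prediction = model.predict([[9, 6]])
```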

But we are still coding to make that happen. Programming. Software engineering. That means all of us ML engineers are also software engineers, albeit software engineers with some very specialised knowledge around statistics, data ethics, bias, models, etc.

However, look at many examples of ML code on the internet and, as a seasoned non-ML software engineer, the first thing that strikes you is how badly written it is compared with non-ML code. Few good practices are on display.

Why Are Good ML Coding Practices Important?

So, why do we need to adopt the same good coding practices found in non-ML software engineering into ML?

Because by being better software engineers we become better ML Engineers.

How Did We Get Here?

As software engineers, we’ve been talking about and commercially using good coding practices for decades. They’re not rocket science. Just agreed working standards and practices, nothing more. So why have we failed to adopt them broadly in machine learning?

Here are just a few of the reasons we’ve found ourselves in this state in 2021:

  1. ML is still relatively immature as a software engineering discipline. We’ve only been employing the current breed of ML since the end of the “AI Winter” in 2012 (3). “Traditional” software and database developers tend not to be ML engineers; conversely, few ML engineers are also traditional software engineers, so they lack experience and discipline in those areas. Nor can it be expected. And because so few software engineers cross the boundary into machine learning, the good software engineering practices built up over decades are rarely brought into the ML world.
  2. The Jupyter notebook, used for a large amount of data science work, is insular by nature. It often exists in isolation as a purely exploratory object in the larger ML lifecycle, which means its code answers to no one but itself; it does not need to comply with any standards in that environment. But once we take it out of that environment to integrate with existing systems, this changes.
  3. Many exploratory examples of machine learning are presented as Jupyter notebooks and uploaded to GitHub for re-use. This encourages working code, however badly written, to be used as-is. It produces the results, right?
  4. Similar to point 3, ML tutorials are copied and pasted without the end-user critiquing the coding practices they employ. And why should tutorials be expected to comply with good practices? Short, educational sample pieces of code written to illustrate a point foster the thinking that “it’s not worth it”: so long as the problem and the technique used to solve it are relayed to the reader, the objective is met. This, after all, is not production code; it’s code for learning.
  5. Academic papers out on the internet are written by academics, who in general have no need to write solid code for production purposes. It is not their end goal. So why should they bulletproof their code when its aim is simply to demonstrate their work clearly to fellow researchers? They are exploring and creating short proof-of-concept examples that have no need for coding standards, and need no validation or verification code, because the assumption is that the data being fed into the ML model has been pre-cleansed and is fit for purpose (a contrast with production code, as the sketch after this list shows). The code will never see production and won’t be maintained.
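Production code cannot make that pre-cleansed assumption. Here is a minimal sketch of the kind of defensive validation research code typically omits; the column names, schema and bounds are invented for illustration.

```python
import pandas as pd

EXPECTED_COLUMNS = {"age", "income"}  # hypothetical model schema

def validate_features(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast on the data problems research code assumes away."""
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    if df[list(EXPECTED_COLUMNS)].isna().any().any():
        raise ValueError("Unexpected missing values in model features")
    if not df["age"].between(0, 120).all():
        raise ValueError("Age values outside plausible range")
    return df
```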

Finally, until recently the rise of MLOps was comparatively slow, confined mostly to tech innovators such as Amazon, Facebook and Google, and to early commercial adopters outside the tech sector.

But in the last four to five years, machine learning has rapidly permeated mainstream business, as the ability to predict, classify and automate accurately, swiftly and adaptively has been recognised as a commercial advantage. This has been driven partly by the commoditisation of ML into easier-to-use services and better toolsets, offering a simpler route to adoption and integration into existing systems.

The upshot: as ML is more widely integrated into non-ML software, making ML code interoperable with, and as readable as, non-ML code becomes essential, because the barriers between the two have blurred or broken down completely.

Engineering, Business and IT Benefits

Comprehension and readability for fellow engineers saves time. This is not to say a SQL developer or a DBA can write ML code, or vice versa; most of us can’t, and shouldn’t. But being able to understand the interface between each other’s code matters, and that means being able to understand each other’s intent. Consistent coding standards and naming practices promote this increasingly important capability, all the more so as ML code and services are integrated into non-ML systems.

The ability to move people between teams where they have the right skills, with minimal friction. There is another angle to this: resourcing. Software engineers are flexibly minded and constantly acquiring new skills; they are lifelong learners, and have to be, as technology changes at such a pace. And machine learning engineers are expensive and challenging to recruit.

Therefore, larger organisations with the foresight to invest in engineers and transition them from a non-ML software engineering role, through data engineering, into an ML engineering role will find that task easier if the newly trained ML engineer can quickly understand the machine learning code written by their new team. That gets the new ML engineer off to a flying start in becoming productive. It saves them time. Time is money.

Code reuse for data engineers and production implementers. Database developers and DBAs lost a large degree of their distinction from one another in the mid-2000s. Just as you’d want your SQL Server or Oracle developers to write code that not only they can understand, but that at the very least your pre-production DBAs can understand before it gets promoted into production, why shouldn’t the same principle apply to ML engineers? Code written in the research or development stage can then be re-used, instead of re-written, before it goes into production, so time is saved. Time is money.

Maintenance costs. If code is well written and written to a standard that is widely accepted and circulated, code written in 2021 will be easily understood in 2031. Code can last much longer than that in a production system. It’s just that ML code isn’t old enough. Yet.

Easily understood ML code promotes ease of fixing and ease of adaptation to new needs. Especially when the original developer left many years previously. It also helps to avoid errors through misunderstanding when an ML engineer new to your organisation sees your code for the first time.

If machine learning code is easier to adapt and maintain post-deployment, it saves time. Time is money.

Code reuse and readability. Code that can be understood easily can be re-used between members of your team on other related projects. Easy to read code also saves already burned-out developers from being interrupted mid-holiday about how their code works, or more often, doesn’t. It saves time. Time is money.

Integration into other code. As non-ML and ML code bases merge, the distinction between the two will become less clear in commercial environments. It is therefore essential that engineers on either side can understand each other’s code and, as far as possible, that a common coding standard is implemented between non-ML, ML, data and MLOps engineers.

Whilst no one is suggesting that C# code should adhere to the same standards as Python code, a degree of agreement between standards across the two code bases promotes understanding and removes some opportunities for error at that most critical of places, the coding interface. Fixing errors costs time. If you can save time…you know the rest.
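As a sketch of what agreement at the interface can look like, consider a Python prediction function whose contract a non-ML engineer can read at a glance, whether they consume it directly or behind a service boundary. All names, fields and the scoring logic here are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class CreditScoreRequest:
    # Explicit, named fields rather than a bare list of floats:
    # the consuming engineer sees exactly what the model expects.
    age_years: int
    annual_income_gbp: float

@dataclass
class CreditScoreResponse:
    approved: bool
    probability: float  # model confidence in [0, 1]

def score_applicant(request: CreditScoreRequest) -> CreditScoreResponse:
    """Score one credit application. Pure function: no hidden state."""
    # Stand-in for a real model call, purely for illustration.
    probability = min(1.0, request.annual_income_gbp / 100_000)
    return CreditScoreResponse(approved=probability > 0.5,
                               probability=probability)
```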

Diversion: A Lesson From History

Joseph Whitworth knew a thing or two about standards back in the 1840s. You may not know his name, but as an IT professional you’ll have heard of one of his employer’s customers: the Victorian researcher and inventor Charles Babbage, the father of computing.

Whitworth worked for Joseph Clement, whose workshop built Babbage’s Difference Engine. Incredible as it may seem today, until that time screw and bolt threads were made individually (!) and were largely incompatible with each other. Babbage employed Clement partly because the Difference Engine required the production of thousands of identical threaded bolts, which Clement, with his reputation for workmanship and accuracy, could provide (2), (5).

Clement’s employee, Whitworth, saw the cost and interoperability advantages of this, and later advocated the first standard system for screw threads. Thanks to that interoperability, his system became widely adopted, and his single, simple idea, standardisation, is still in use today (6). Without it, our visit to the DIY store for a replacement nut or bolt would be both frustrating and considerably more expensive.

Standards: What’s in it for me?

As a programmer and as a machine learning engineer, I want to make things work swiftly, and I want to understand code, be it written by others or myself.

If I come back to my own code months or years later, I want to understand it rapidly. I don’t want my own coding “bad grammar” to get in the way of re-using and building upon my previous work. Whether it’s my understanding of my own code, my understanding of someone else’s, or their understanding of mine, interoperability between people and across time is king. No one wants to be coding at 7pm on a Friday night in an empty office.

As an ML Architect, I mentor others by providing coding examples. I want to communicate my ideas. Well written code is the best way to communicate coding ideas. It demonstrates the clarity of your own thinking to others, in order that they can think clearly about the problem too, and have a well-written example to work with.

As soon as your own code achieves the standard of being a good educational aid to others, it becomes a scalable asset to them in learning ML, as they can easily understand your code. The problem itself is then the only challenge, not the code representing it. Clarity and simplicity are key.

The knock-on effects are reduced learning time, correctness and clarity of comprehension by the learner, and of course the capability of your code being re-used by someone else as a great example of how to do it properly. This in turn has productivity, efficiency and cost benefits. Your code has become a scalable asset, a reference example for others to use.

As an ML engineer and employee, I don’t just want people to find my web page. I want them to spend minutes reading the content and bookmarking it for future reference. It’s this page stickiness that propels pages to, and keeps them at, the top of search listings, not old-fashioned keyword SEO manipulation.

Website owners know that if they serve up pages that people spend time on, those people are likely to come back to their website again, and others are likely to find that page useful too. The page usefulness increases the value of the website. And the number of hits on your own page.

So creating code people can easily understand is beneficial, as your own page is more likely to get promoted and recommended to others. Which creates a flywheel effect, a virtuous circle in creating a name for yourself in the wider ML community. This in turn feeds into peer recognition, opportunity, interesting projects, career prospects and ultimately money as you become a recognised authority on your subject.

Solution

At least one company I’m aware of throws away Jupyter notebook code and starts again for production. This is something I have never seen in a non-ML software engineering environment. Whilst this approach ensures production-quality code reaches production, the downsides are numerous. Not only does it cost time, and therefore money, it is neither scalable nor efficient.

Worse still, re-writing ML code introduces risk through error.

This risk is magnified further: if the research coder’s intent is not clear before the code is re-written for production, the potential for misunderstanding that intent during the rewrite is greater. Errors may thus be introduced into, or replicated in, the production environment, where they become at least an order of magnitude costlier (1), (4) to rectify than if the code had been written properly in the development environment.

Agreeing standards across teams helps to eliminate this problem. By deciding on a standard together, teams make code readable, therefore reusable, and therefore closer to production-ready, removing the need for complete re-writes. Indeed, research code can form the backbone of the production code if it is well enough written.
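As a hedged illustration of what “well enough written” might mean, here is notebook-style exploration restructured so it could survive the journey to production largely intact. The feature names, target and model choice are assumptions made purely for this sketch.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Typical notebook style buries this logic in a cell full of globals.
# Written to a standard, the same work becomes importable as-is from
# both a research notebook and a production pipeline.

FEATURE_COLUMNS = ["age", "income"]  # hypothetical features
TARGET_COLUMN = "defaulted"          # hypothetical target

def train_default_model(training_data: pd.DataFrame) -> LogisticRegression:
    """Train the credit-default model on pre-validated training data."""
    features = training_data[FEATURE_COLUMNS]
    target = training_data[TARGET_COLUMN]
    return LogisticRegression().fit(features, target)
```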

Coding standards are akin to a car’s indicators: whilst not essential to driving, signalling your intent to those around you, so that they have a good idea what you are about to do, avoids misunderstandings and the negative consequences that follow.

A Final Thought

We hire ML engineers for their knowledge of ML. They are intelligent people, and the vast majority are not prima donnas. So, whilst we’re not asking them to write entire production pipelines in PySpark or data extracts in SQL, is it really too much to ask that the code for their machine learning models matches the agreed and accepted production practices used across teams, practices which benefit them as well as their employers?

It’s time to bring good coding practices into ML Engineering. Babbage, Clement and Whitworth would be proud of us.

References

(1) Steve McConnell, Code Complete, First Edition (out of print; find used via Alibris): https://www.alibris.com/search/books/isbn/9781556154843?qwork=7668259

(2) Joseph Whitworth, Babbage, Screw Thread Standardisation: https://www.boltscience.com/pages/screw2.htm

(3) Sebastian Schuchmann, The First AI Winter: https://towardsdatascience.com/history-of-the-first-ai-winter-6f8c2186f80b

(4) Dawson, Burrell, Rahim and Brewster, Integrating Software Assurance into the Software Development Lifecycle (SDLC): https://www.researchgate.net/publication/255965523_Integrating_Software_Assurance_into_the_Software_Development_Life_Cycle_SDLC

(5) The Science Museum Group, Clement, Joseph 1779–1844: https://collection.sciencemuseumgroup.org.uk/people/ap28821/clement-joseph

(6) Graces Guide, Joseph Whitworth, Screw Threads: https://www.gracesguide.co.uk/Joseph_Whitworth
