Last week, Google DeepMind announced that it had open-sourced Sonnet, a software library that draws on DeepMind’s internal best practices for quickly building neural network modules in TensorFlow. This is a great resource, leveraging the collective experience of its 250 highly skilled engineers, released to enable others to apply machine learning to their problems more effectively.

In fact, over the last few years, the world’s biggest tech companies (including Google, Facebook, Microsoft, IBM, Baidu, Amazon and more) and university research labs have open-sourced at least 2.5 million lines of machine learning platform code (see table below), which equates to over 650 person-years, or $80m, in development costs.

These toolkits are now freely available online, and many, such as TensorFlow and Paddle, are accompanied by significant volumes of training and example materials. As such, they can be viewed as an incredible initial investment for any company looking to use machine learning as an enabling technology for its product.

LOC and person-year estimates from openhub.net for a selection of the most popular machine learning frameworks. Lines of code are current, rather than as at the time of open sourcing.

This trend towards open sourcing seems set to continue, driven by researchers and engineers from academic backgrounds who push their employers to let them keep contributing back to the research community. This raises an interesting question about the deployment of talent: is it better for the ecosystem to have its best AI engineers pooled in a small number of organisations doing core research and open-sourcing key parts of their output, or to have this talent embedded as small teams across a much larger number of organisations, where they can work to solve specific commercial problems?

Many early stage companies looking to hire engineers to work on machine learning problems find it hard to compete with the likes of Google and Facebook, who have invested significant resources into creating the best possible environments for AI research. If this work were undertaken exclusively behind closed doors and for their employers’ gain, this ‘hoarding’ of talent could be seen as damaging to wider innovation.

However, by open-sourcing this work, these companies are actually accelerating the pace of broader innovation, by giving a significant head start to developers building their own businesses that apply these technologies. And companies are hiring for exactly this today: over 7,000 job postings worldwide currently listed on LinkedIn specifically mention one of the frameworks above.

In his recent piece “Firing on All Cylinders: The 2017 Big Data Landscape”, Matt Turck of FirstMark writes:

We’re witnessing the emergence of a new stack, where Big Data technologies are used to handle core data engineering challenges, and machine learning is used to extract value from the data (in the form of analytical insights, or actions). In other words: Big Data provides the pipes, and AI provides the smarts.

We have now reached an exciting point where the powerful components of this new “big data + AI” stack are available to be applied to a diverse range of real world problems.

Initial applications here show significant potential. High-profile examples, like DeepMind reducing Google’s already optimised data centre cooling costs by 40%, demonstrate clear and measurable impact, and the early results from companies I have met using these technologies across healthcare, agriculture, logistics and business processes give me confidence in their broad applicability. For example:

  • Kheiron uses computer vision to analyse medical images and automate radiology reporting tasks, saving up to 60% of a radiologist’s time.
  • Connecterra uses machine learning to analyse data from sensors attached to cattle and identify behavioural changes early, which can be acted on to improve milk yield by over 20%.
  • DigitalGenius uses machine learning to mine customer service data and provide automated response suggestions to customer support staff, increasing their capacity by roughly 30%.


Normal Yield Curve

If we think about the impact of AI on a problem as a Normal Yield Curve, where ‘Maturity’ is the effort spent applying AI to a problem, and ‘Yield’ is the measurable improvement against the current baseline, what is most exciting about our current position is that for so many problems today we are right at the start, in the period of rapid improvements. In ten years’ time the curves will have flattened and we will have to work hard to draw out small improvements, but for now there are still huge gains to play for across many new applications.

To be clear, building an AI- and machine-learning-driven company today is no small undertaking (Nathan Benaich’s recent presentation “So, you want to found an AI startup” shares some great thoughts on doing this), with real challenges in hiring suitable talent, accessing training data and gaining commercial validation. However, with a large and growing open source toolkit and a vast number of problems waiting to have machine learning applied to them, two of the key pieces are in place to act as a boost for the great entrepreneurs, researchers and engineers looking to do so.

This article first appeared on Medium