Machine Learning — Build vs. Buy (or Sell and Optimize)

When is Shoplifting OK?

When I go to the Amazon Go store in Seattle, I always try to shoplift. I have tried many times, and so far I have always failed. Amazon Go has “just walk out” technology, so simply putting an item in my bag or pocket does not let me take it without paying. I tried different ideas, such as picking two items in a single pick, getting closer to the shelf to hide the product I take from the cameras on the ceiling, or putting a product back in the wrong place and then picking it up again. My friends also tried picking up a drink, drinking it and placing the empty bottle back on the shelf (I wouldn’t do such a thing…). So far I couldn’t fool the system, and every time the receipt that I got after my visit was accurate. I could say that I did it to debug the system, but it was always to test the extent of the magic. For me, even after dozens of visits, going into the Amazon Go store and leaving 45 seconds later with my lunch is magic. I’ll use this example to demonstrate the process of building magical services such as Amazon Go.

Build or Buy?

A frequent discussion in company IT departments is “should we buy or should we build?” It is clear that off-the-shelf products are never perfect for a business-critical system, but they are often good enough for the needs of the company. External vendors try to solve a common problem (CRM, database, or ERP, for example) and provide enough easy customization points for each business to tweak the general-purpose system to its unique requirements. Machine learning systems are no different in this regard. For example, a recommendation engine is not easy to build, and many products offer recommendations out of the box. However, most companies that base their business on the quality of their recommendations build their recommendation engines themselves and spend a lot of money to do so.

How do you decide?

The primary criterion for this decision is the value of the new system to the company, and therefore the investment the business is willing to put into it. If you estimate the business value of each system and rank the systems by it, you will see a power-law curve of the kind that describes many phenomena in nature.

ML Systems Value and Cost Distribution

The majority of the systems should fall into the long tail: many systems that each make a small contribution to the business. The long tail is very important and should not be ignored. First, in aggregate it contributes a lot to the bottom line of the business. Second, it is the source of the few hits, as it is impossible to predict in advance which systems will eventually succeed. Lastly, it is easy to get started with these systems thanks to the availability of many cloud-based APIs.
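The long-tail shape described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the number of systems, the value of the top system, the `1/rank` curve, and the cut-off of five "hits" are made-up inputs, not figures from any real company. With these inputs, the many small systems in the tail add up to more than half of the total value.

```python
def power_law_values(n_systems, top_value=100.0, exponent=1.0):
    """Hypothetical value of the k-th ranked system: top_value / k**exponent."""
    return [top_value / (rank ** exponent) for rank in range(1, n_systems + 1)]


def split_hits_and_tail(values, n_hits):
    """Rank systems by value and split them into the few hits and the long tail."""
    ranked = sorted(values, reverse=True)
    return ranked[:n_hits], ranked[n_hits:]


# Assumed portfolio: 100 candidate ML systems, 5 of which we call "hits".
values = power_law_values(n_systems=100)
hits, tail = split_hits_and_tail(values, n_hits=5)

# Share of the total estimated value that comes from the long tail.
tail_share = sum(tail) / sum(values)

print(f"{len(hits)} hits, {len(tail)} long-tail systems")
print(f"long-tail share of total value: {tail_share:.0%}")
```

Changing `exponent` or `n_hits` shifts the balance between hits and tail, which is one way to check how sensitive a build-or-buy split is to your own value estimates.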

Split between the long tail (Buy) and the giants (Build)

There are many general-purpose APIs from the various cloud and SaaS providers, for tasks such as image classification, text processing, speech recognition, and machine translation. These APIs are hard to build, as they are based on a lot of science and a lot of data, which big companies such as Google, Amazon, IBM or Microsoft can provide. On the other hand, they are mostly general purpose and are not tuned to your specific data. For example, if you have a lot of legal documents, a natural language processing (NLP) API will be able to analyze these documents and find named entities. However, if you train machine learning models specifically on your text documents to find specific outputs such as precedents, you will find the results more useful. The same logic applies to other types of material (financial, hotel reviews, sports news, political discussion transcripts, etc.) and other types of data (images, videos, events, audio, etc.).

The simple advice is to start with the available APIs and try to use them for the new ML-based system. You will quickly get a sense of whether the system has a high potential to be useful and belongs on the left side of the chart (the “hits”). At this point, it will also be clear whether you need to build your own model (BYOM) and invest more in the system: more people (data engineers, data scientists, and developers), more data, and more time for them to iterate and spin the flywheel of the ML system.

Not all APIs come from general-purpose cloud providers. In various verticals, there are strong partners (independent software vendors, or ISVs, and system integrators, or SIs) that have built a specialization in these verticals and can provide a more specific solution to companies in that domain.

The additional split where vertical partners are operating (Sell)

The partners’ section is useful in multiple ways. First, it is an excellent place to find a more specific solution to a problem in your domain, such as a recommendation engine. More importantly, it can give you an extra incentive to invest in building your own models. Most of the systems that are built inside companies are used internally, but building them also opens an excellent opportunity to turn them into a profit center. There are many reasons to do so, from gaining a revenue source with very high margins to controlling the technology that is used by other companies in your domain. It might seem insane to take the secret sauce of your company, which was developed with much effort, and give it to your competitors. However, it is also a way to penetrate markets that you don’t have, generate income, and remove your competitors’ incentive to BYOM, as they can get the functionality from your API.

For example, Amazon is famous for its ability to turn internal systems into businesses. Examples include Amazon Web Services, which started as internal IT APIs and became a $20B business; Fulfilled-by-Amazon (FBA), which ships billions of items for 3rd party sellers; Alexa, with its tens of thousands of 3rd party Skills; and many others.

Many companies do not define themselves as technology companies. Most companies started as a real-world service such as marketing, logistics, retail, or insurance. All these businesses are hard to scale and to extend globally. Technology, and machine-learning-based APIs specifically, is much easier to scale up and offer globally, and it helps these companies add technology to their business model.

The last step in this evolution of ML systems that appears on our systems chart is the need to invest even more in the most successful systems, and optimize them dramatically.

The last split for the heavier investment (Optimize)

There are many opportunities for optimization in the domain of machine learning, but since this domain is relatively young, such optimizations are hard and should wait for proof that they are worth the investment.

You can even take it to the extreme, as I saw it at Amazon: “I don’t care how much it costs to build it, I mainly care about how much it costs to operate it.” The logic behind it is simple: if you built something successful, the investment in building it is negligible compared to the cost of operating it at scale for an extended period. Many factors work in your favor to optimize and dramatically reduce the cost of the large-scale operation. The economy of scale will reduce the cost of the hardware, as the cost per instance gets lower with scale, and the success of the service will attract more talent and budget.
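The build-versus-operate argument above is simple arithmetic, and a small sketch makes it concrete. The dollar amounts, time horizons, and 20% saving below are hypothetical values chosen only for illustration; they are not Amazon figures. The point is that the one-time build cost becomes a shrinking share of the lifetime cost the longer a successful system runs, so even a modest operating-cost optimization can outweigh the whole build investment.

```python
def lifetime_cost(build_cost, monthly_operating_cost, months):
    """One-time build cost plus the operating cost accumulated over `months`."""
    return build_cost + monthly_operating_cost * months


def build_share(build_cost, monthly_operating_cost, months):
    """Fraction of the lifetime cost that was spent on building the system."""
    return build_cost / lifetime_cost(build_cost, monthly_operating_cost, months)


def operating_savings(monthly_operating_cost, months, reduction):
    """Money saved by cutting the operating cost by `reduction` (0.2 means 20%)."""
    return monthly_operating_cost * months * reduction


# Hypothetical numbers, for illustration only.
BUILD_COST = 2_000_000
MONTHLY_OPERATING_COST = 500_000

for months in (6, 24, 60):
    share = build_share(BUILD_COST, MONTHLY_OPERATING_COST, months)
    print(f"after {months:2d} months, build cost is {share:.1%} of lifetime cost")

saved = operating_savings(MONTHLY_OPERATING_COST, months=60, reduction=0.2)
print(f"a 20% operating-cost cut over 60 months saves {saved:,.0f}")
```

With these assumed inputs, the saving from the 20% cut over five years is three times the original build cost, which is why the heavier "Optimize" investment is reserved for systems that have already proven they will run at scale for a long time.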

“Amazon Go” Use Case

Amazon Go is an excellent use case to discuss in this context. Amazon Go is a concept of a store without lines. Many people wonder why Amazon, the leader in e-commerce, is building a physical store concept. If you check the sales numbers, you can see that Amazon is growing both in volume and in market share, but online purchases are still only around 10% of overall purchases. Even if the growth in e-commerce (or m-commerce and VR-commerce) continues, the majority of the business will remain in physical stores for many years. A company such as Amazon that wants to keep growing must expand into physical stores, as customers will always want to buy there. It may be the need to touch, smell and see the product, or the satisfaction of getting the items immediately (on a lunch break, for example), but physical stores will be with us for a long time.

When Amazon decided to explore how to build a physical store that keeps the good part of shopping (see, touch, smell, take…) but avoids the bad part (waiting in line), they tried many different solutions, such as RFID tags. These are “Buy” technologies that are used in various logistics solutions and are easy to implement and try. However, RFID tags were not a scalable solution, as they need to be attached to every item in the store, including sandwiches and soda cans. Once you see the value of the system (multiplying Amazon’s business), and you see that off-the-shelf solutions are not working well enough, it is time to “Build” the needed technology. Amazon spent much effort building the machine learning technologies that can recognize all the items in the store and identify when each customer takes one of these items or returns it to the shelf. It required a lot of computer vision science and development, and also sensor fusion technology that coordinates multiple cameras (one sees the face, one the action, and one the item, for example) to decide who bought what.

When Amazon opened the first store based on “Amazon Go” technology in Seattle in 2016, it was open only as a “beta” for Amazon employees to test the system. A couple of days after the release of the video that described the service, I received a call from one of the largest convenience store operators in the world, asking to buy “Amazon Go” for their stores. I contacted the GM of the team and asked him for details that I could share with this customer. After laughing for a few minutes, he told me that he was busy building the technology for Amazon and using it in Amazon stores, and that he was not ready to sell it to others. Nevertheless, some of the technologies that were developed as part of the Amazon Go project are now available to “Sell” from Amazon Web Services, such as Kinesis Video (to stream the video from the store to be analyzed in the cloud) and Rekognition Video (video analytics including people tracking). Many other retailers with physical stores and operators of convenience stores are starting to build similar services, and some of them are using these AWS services. With such retail-specific APIs, they can build their “X Go” system in a matter of months instead of the few years that Amazon invested.

Now the “Optimize” phase starts, with the expansion of the technology to more stores across the US. It is one thing to have a store in the heart of Seattle next to Amazon HQ and a completely different one to operate thousands of these stores globally. The main optimization is embedding some of the computer vision logic into the cameras in the stores instead of streaming the video to be analyzed in the cloud. In the next chapters, we will discuss “ML Inference at the Edge” in more detail, as it is one of the most useful optimization options for machine learning at scale.


In this chapter, we discussed the decision on the level of investment that a machine learning project should get in an organization.

The journey of ML systems development within a company

I described the various stages of maturity of a company and its machine learning practices and projects. The beginning (1) is mostly at the API level, using what is available from the various cloud and SaaS providers. The more valuable systems will shift (2) to internal building (BYOM). Once the internal model is developed and working successfully with some internal customers, it is time to check whether it makes sense to offer it as an external service and profit from it (3). The central question is whether the model or system can be generalized to solve various customer problems. If not, the shift should be to optimize it (4) to allow it to grow internally and efficiently.
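The four numbered stages above can be read as a small decision procedure. The sketch below is one possible reading, not a formal method from this chapter: the `Stage` names, the `next_stage` function, and its three yes/no inputs are hypothetical simplifications of questions that in practice need business judgment.

```python
from enum import Enum


class Stage(Enum):
    BUY = 1       # (1) use available cloud/SaaS APIs
    BUILD = 2     # (2) build your own model (BYOM)
    SELL = 3      # (3) offer the model as an external service
    OPTIMIZE = 4  # (4) optimize it to grow internally and efficiently


def next_stage(stage, high_value, works_internally, generalizes):
    """Suggest the next stage for one ML system (a simplified, assumed reading).

    high_value:       the system showed high potential while using bought APIs
    works_internally: the internal model works for some internal customers
    generalizes:      the model can solve problems for various external customers
    """
    if stage is Stage.BUY:
        # Stay on bought APIs unless the system proves valuable enough.
        return Stage.BUILD if high_value else Stage.BUY
    if stage is Stage.BUILD:
        if not works_internally:
            return Stage.BUILD  # keep iterating on the internal model
        return Stage.SELL if generalizes else Stage.OPTIMIZE
    # Transitions out of SELL and OPTIMIZE are left unchanged in this sketch.
    return stage


# Example: a proven internal model that does not generalize moves to Optimize.
print(next_stage(Stage.BUILD, high_value=True, works_internally=True, generalizes=False))
```

Running the function over every system on the chart gives a first rough draft of which systems stay on bought APIs and which deserve deeper investment.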

Every company should draw its own chart of the machine learning systems it needs for its business, and use it to navigate the development of these systems: from the many low-hanging-fruit “Buy” APIs, to the few strategic “Build” projects, to new lines of business based on “Sell”-ing these unique models in its verticals, to the “Optimize”-d business-critical systems that support and drive the growth of the company.

Guy Ernest is the co-founder and CTO of @Aiola, an AI company serving large enterprise AI transformation in the cloud. Guy is an ML-Hero of AWS.