Deploying machine learning models: a simple guide
June 8, 2022
Serving and deploying Machine Learning models is a topic that can get complicated quite fast. At Amplemarket, my team and I like to keep things simple.
Let me show you how.
Serving trained machine learning models: A step-by-step guide
1. Understanding model serving in machine learning: Simplicity is key
At Amplemarket, we’re big fans of simplifying. We know that the less code we have to write, the fewer bugs we're likely to introduce. FastAPI allows us to take a trained model and create an API for it in fewer than 10 lines of code! That’s pretty incredible.
Let’s say you are building a model, trained on the Iris data set, that predicts the species of a plant given some measurements of it. Previously, we would have saved our model to a pickle file and sent it to our engineering team; perhaps we would even have scheduled a meeting with them to discuss how to use it. We would also have sent some documentation on how to use the model and how to run inference on new data.
But the likelihood that something will go wrong in one of those steps is greater than we’d like.
Every time the model gets updated, we would need another discussion about how the model has changed and what has improved, the documentation would become outdated, and so on.
To avoid dealing with that house of cards, we at Amplemarket have settled on a process that significantly reduces the complexity of deploying the machine learning models we develop.
As an example, the following snippet trains a classifier and saves it to the model.joblib file:
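A minimal sketch of such a training script, assuming scikit-learn's bundled copy of the Iris data set and a logistic regression as the classifier (the choice of classifier and the train/test split are assumptions for illustration):

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the Iris data set: four numeric features, three species.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Assumption: logistic regression stands in for whichever classifier you prefer.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")

# Persist the trained model so that the API can load it later.
joblib.dump(clf, "model.joblib")
```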
Such a simple setup to launch an API ensures that our Machine Learning team does not stop at creating these models, but is also responsible for making them available to our development team, or even directly to our users.
“You built it? You ship it”
2. Reduce the need for coordination by operationalizing machine learning models
We are a remote, distributed, and asynchronous team, so documentation is huge for us. We don’t want to have a meeting with 5 other departments every time we build or ship a new model. We want to document as best we can, with the least effort possible.
FastAPI generates interactive documentation for every application automatically (by default at /docs). These docs already tell a lot about the application: what endpoints it has, what type of queries the user can send, etc. We love FastAPI because it takes little work to enrich the documentation further and minimize the need for future coordination.
If we now turn back to our docs, we see that the markdown we've added has been rendered at the top of the page. Now, when someone needs to check some details about this model, all the information that person might need is neatly described - and we even provide a support email if they run into trouble!
But FastAPI doesn’t stop there.
With a couple more lines of code, and thanks to a small library called Pydantic, we can also add data validation to our model’s API. By doing so, API users will know what kind of data the API expects to receive and what kind of data it will respond with.
We start by creating two classes, one to handle requests, and the other for responses:
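A sketch of what these two classes could look like for the Iris example. The class and field names are assumptions chosen for illustration:

```python
from pydantic import BaseModel, ValidationError


class IrisRequest(BaseModel):
    """What the API expects to receive: four measurements in cm."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


class IrisResponse(BaseModel):
    """What the API responds with."""
    species: str


# Invalid input is rejected before it ever reaches the model.
try:
    IrisRequest(sepal_length="long", sepal_width=3.5,
                petal_length=1.4, petal_width=0.2)
except ValidationError as exc:
    print(exc)
```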
And we tweak our endpoint code:
As an added bonus, the model Config classes also help provide developers with an example request and response:
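A sketch of how such examples can be declared. This version assumes Pydantic v2, where the configuration lives in `model_config`; in Pydantic v1 the equivalent was a nested `Config` class with a `schema_extra` attribute. The example values are placeholders:

```python
from pydantic import BaseModel, ConfigDict


class IrisRequest(BaseModel):
    # Extra JSON Schema keys are merged into the generated schema,
    # which is what the docs page reads its examples from.
    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {"sepal_length": 5.1, "sepal_width": 3.5,
                 "petal_length": 1.4, "petal_width": 0.2}
            ]
        }
    )

    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


class IrisResponse(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={"examples": [{"species": "setosa"}]}
    )

    species: str
```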
With this documentation page at hand, our users know exactly what our API is, what it expects, and what it will reply with.
All of this comes without us having to invest much time and effort into ensuring the documentation has all the information that future developers might need. And notice how we didn’t have to write any extra documentation.
Our documentation is our code.
3. Best practices for serving machine learning models: Make it fast (enough!)
Python is far from the fastest language out there, nor does it claim to be. We don’t use Python for its speed, but for its ecosystem, especially as it relates to data science and its many needs.
Even with several claims that FastAPI is a very performant web framework, we know we’re not using the fastest web framework out there.
However, even if FastAPI were slower than it currently is, we would still be willing to trade that speed for time-to-market, documentation, and ease of use.
For deployment, Sebastián Ramírez offers a high-performance FastAPI/uvicorn Docker image with auto-tuning, which allows the app to scale according to the number of available CPU cores on the machine it's running on:
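The idea behind scaling with the core count can be sketched as a gunicorn configuration file, which is plain Python. This is an illustration and not the image's actual configuration: the `worker_count` helper, the minimum of two workers, and the `WORKERS_PER_CORE` variable as used here are assumptions made for this sketch:

```python
import multiprocessing
import os


def worker_count(cores: int, workers_per_core: float = 1.0, minimum: int = 2) -> int:
    """Scale the number of worker processes with the number of CPU cores."""
    return max(int(cores * workers_per_core), minimum)


# Module-level names below are settings that gunicorn reads from a config file,
# for example: gunicorn -c gunicorn_conf.py main:app
workers = worker_count(
    multiprocessing.cpu_count(),
    float(os.getenv("WORKERS_PER_CORE", "1")),  # assumed tuning knob
)
worker_class = "uvicorn.workers.UvicornWorker"  # run the ASGI app under gunicorn
bind = "0.0.0.0:80"
```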
If you’re running on Kubernetes or something like that, you probably don’t need this image - but if you have a simpler setup, this image will come in very handy!
4. Deploy your machine learning model with confidence
At Amplemarket we like to adopt best practices from Software Development and apply them to our Machine Learning projects. That means every model we develop and deploy is version controlled, tested, and continuously deployed.
Using CI/CD automation, we can continuously deploy our model and code to several targets (e.g., staging and production). This allows us to serve different versions of our model at different endpoints and to roll back with ease.
FastAPI also allows us to easily test our code. This is especially important when we are running inference. What if we receive a different set of numbers? Will we make a prediction? What if the user sends us a string, and it gets evaluated as a float? How do we account for that? Testing matters. And FastAPI allows us to do it with ease.
Closing thoughts
FastAPI has been particularly valuable when serving our Machine Learning models. Our development team is especially happy with the high level of documentation and data validation that our APIs offer, and our users get those benefits as well.
Thanks to FastAPI, we’ve been able to predict with confidence, and I hope you got some inspiration from this post to gain confidence as well.
Amplemarket is growing! If you are interested in solving some difficult problems, we have a bunch of open positions!