Deploy Machine Learning Model

Step-by-step tutorial for packaging and deploying a machine learning app to Google Cloud Run with Docker and Cloud Shell.

April 20, 2019 · 2 min read · From the archive · Original publication

Runtime: Cloud Run
App: GPT-2 Flask
Packaging: Docker
Config: 2GB / 2 CPU

Figure: Google Cloud Run deployment tutorial screenshot. Start Cloud Shell from the top-right icon.

You are a hobbyist machine learning developer. You come across tons of exciting news about artificial intelligence, you have followed online tutorials, and you have built something cool. Now you want to show your creation to the world.

If you have been in this situation, you know there are very few options for hosting such a project. That changed when Google announced Cloud Run.

Cloud Run Is the Game Changer

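Cloud Run is one of the most exciting additions to Google Cloud: a managed service that runs stateless containers and scales them with incoming traffic, down to zero when idle. In this article, we deploy an open-source, pre-trained deep learning model on Cloud Run.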

Getting Started with Google Cloud

If you do not have an active Google Cloud account, sign up first. Once your project is ready, start Cloud Shell from the icon at the top right of the console.

Into the Code

For this tutorial, I used an existing deep learning project from GitHub: gpt-2xy. It uses HuggingFace's PyTorch implementation of GPT-2.

If you only want to deploy, you can skip this section. Otherwise, the project has three parts, starting with a minimal web UI.

Figure: Minimal web user interface for GPT-2.

The model logic extends input text with GPT-2.

Figure: Model code for GPT-2 text extension, loading and testing the model with a simple text phrase.
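The original listing is not reproduced here, so the following is only a minimal sketch of what such model logic might look like, assuming a recent release of the Hugging Face `transformers` package and its public `gpt2` checkpoint. The names `load_gpt2` and `extend_text` and the sampling settings are illustrative assumptions, not the code from gpt-2xy.

```python
def load_gpt2(model_name="gpt2"):
    """Load a GPT-2 tokenizer and model (weights are downloaded on first use)."""
    # Imported lazily so extend_text can be exercised without transformers installed.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()  # inference only: disables dropout
    return tokenizer, model


def extend_text(prompt, tokenizer, model, max_new_tokens=40):
    """Return the prompt followed by a sampled continuation."""
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,        # assumed length; tune to taste
        do_sample=True,                       # sample instead of greedy decoding
        top_k=40,                             # assumed sampling setting
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

To try it locally, call `tokenizer, model = load_gpt2()` once and then `extend_text("Recently, tech giants", tokenizer, model)`. Because the continuation is sampled, the output differs from run to run.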

Finally, a Flask server serves both the user interface and the API.

Figure: Flask server code for the web interface and the generation endpoint.
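As a rough illustration of that structure, here is a hedged sketch of a Flask app with one route for the page and one for generation. The route paths, the JSON shape, and the helper names `resolve_port`, `create_app`, and `main` are assumptions made for this example, not the project's actual code. One detail that is specific to Cloud Run: the container is expected to listen on the port given in the `PORT` environment variable, which defaults to 8080.

```python
import os


def resolve_port(environ=None, default=8080):
    """Return the port to listen on; Cloud Run passes it in the PORT variable."""
    environ = os.environ if environ is None else environ
    return int(environ.get("PORT", default))


def create_app(generate):
    """Build a Flask app around a `generate(text) -> str` callable."""
    # Imported lazily so resolve_port stays usable without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Hypothetical placeholder; the real project serves its own UI page.
        return "<h1>GPT-2 demo</h1>"

    @app.route("/generate", methods=["POST"])  # hypothetical endpoint path
    def generate_endpoint():
        payload = request.get_json(silent=True) or {}
        return jsonify({"text": generate(payload.get("text", ""))})

    return app


def main():
    # Stand-in generator; swap in the real model call when wiring things up.
    app = create_app(lambda text: text + " ...")
    # Bind to all interfaces so the container can receive external traffic.
    app.run(host="0.0.0.0", port=resolve_port())
```

Calling `main()` starts the development server. Passing the generator in as a callable keeps the web layer independent of how the model is loaded.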

Requirements

You can test the project locally once these packages are installed:

PyTorch (the CPU version is fine)
transformers
flask

Then start the server from the project directory:

python main.py
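Before starting the server, it can help to confirm that the three packages are importable. The snippet below is a small sketch using only the standard library. The helper names `missing_packages` and `report` are made up for this example, and it assumes the usual import names (`torch` for PyTorch).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Summarize whether the tutorial's requirements are available."""
    # PyTorch is imported as `torch`; the other two use their package names.
    missing = missing_packages(["torch", "transformers", "flask"])
    if missing:
        return "Missing packages: " + ", ".join(missing)
    return "All requirements found; run: python main.py"
```

Printing `report()` shows either the packages still to install or a reminder of the command to run.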

Containerizing the Project

Containerizing with Docker

Next, build a Docker image so the project can be deployed to Cloud Run.

Figure: Dockerfile used to deploy the Flask app to Cloud Run.

Building in Cloud Shell

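You can build the image locally and push it to Google Cloud, but an image that bundles PyTorch and the model tends to be large, so if your internet connection is slow, building in Cloud Shell is more convenient. First, get your project ID: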

gcloud config list --format 'value(core.project)' 2>/dev/null

Docker Setup Steps

Then replace [PROJECT_ID] in the following commands:

git clone https://github.com/NaxAlpha/gpt-2xy
cd gpt-2xy
docker build -t gcr.io/[PROJECT_ID]/gpt-2xy .
gcloud auth configure-docker
docker push gcr.io/[PROJECT_ID]/gpt-2xy
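To avoid editing each command by hand, the placeholder substitution can be scripted. This is a minimal sketch, not part of the original project: the function names are invented for illustration, and `my-project` in the usage note is a hypothetical project ID.

```python
def image_name(project_id, repo="gpt-2xy"):
    """Return the Container Registry image path used in the tutorial."""
    return f"gcr.io/{project_id}/{repo}"


def build_and_push_commands(project_id):
    """Return the tutorial's shell commands with [PROJECT_ID] filled in."""
    # Reject empty values and the unreplaced placeholder itself.
    if not project_id or any(ch in project_id for ch in "[] \t"):
        raise ValueError("pass a real project ID, not the [PROJECT_ID] placeholder")
    image = image_name(project_id)
    return [
        "git clone https://github.com/NaxAlpha/gpt-2xy",
        "cd gpt-2xy",
        f"docker build -t {image} .",
        "gcloud auth configure-docker",
        f"docker push {image}",
    ]
```

For example, `print("\n".join(build_and_push_commands("my-project")))` prints the five commands ready to paste into Cloud Shell.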

Deploying to Cloud Run

From the top-left menu, go to Cloud Run.

Figure: Navigate to Cloud Run from the Google Cloud Console.

Then click Create Service.

Figure: The Create Service button starts the Cloud Run deployment flow.

Important Configuration

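Enable “Allow unauthenticated invocations” so the app is publicly reachable, then open the optional settings.

Set the memory to 2 GB; the default allocation is likely too small to hold the model. Setting the CPU count to 2 is also recommended because it speeds up generation.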

Figure: Memory, CPU, and other optional settings for the Cloud Run service.
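The same settings can also be expressed on the command line with `gcloud run deploy`. The sketch below only assembles the argument list so you can inspect it before running anything. The service name and the `us-central1` region are assumptions for illustration, and the function name `deploy_command` is invented here.

```python
import shlex


def deploy_command(project_id, service="gpt-2xy", region="us-central1"):
    """Assemble a `gcloud run deploy` argument list mirroring the console settings."""
    return [
        "gcloud", "run", "deploy", service,
        "--image", f"gcr.io/{project_id}/gpt-2xy",  # image pushed earlier
        "--platform", "managed",
        "--region", region,            # assumed region; pick the one you use
        "--memory", "2Gi",             # 2 GB of memory
        "--cpu", "2",                  # 2 CPUs for faster generation
        "--allow-unauthenticated",     # make the service publicly reachable
    ]


def deploy_command_line(project_id):
    """Return the command as a single shell-quoted string."""
    return shlex.join(deploy_command(project_id))
```

`deploy_command_line("my-project")` (with a hypothetical project ID) returns a string you can paste into Cloud Shell, or the list form can be passed to `subprocess.run`.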

Deployment Process

Click Create. After the deployment finishes, the app is ready.

Custom Domain (Optional)

You can also map a custom domain to the service. I deployed my own version at the time but later took it down because the project is old. The code is still usable if you want to deploy your own.

Figure: Demo of the deployed GPT-2 application. I typed “Recently, tech giants” and the rest was written by AI.

Conclusion

The full source code is still available in the gpt-2xy repository.

Update (19-07-2020)

Figure: Commands for using the prebuilt Docker image.