Deep Learning in the Cloud
A cost-focused comparison of cloud GPU options for deep learning across AWS, Paperspace, Colab, and Google Cloud preemptible machines.
- Free baseline: Colab T4
- Best value: T4 at $0.35/hr
- High-end: V100 at $0.80/hr
- Tradeoff: Preemptible
Cheap but High Quality
Google Colab is a free platform that provides GPUs for training. It is excellent for experiments, but not ideal for larger models such as BigGAN or large language models. The annoying part is having to copy checkpoints to Google Drive and restore them every time the runtime disappears.
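The checkpoint shuffle can be reduced to two small helpers. This is a minimal sketch, not a fixed recipe: it assumes Drive is already mounted at `/content/drive/MyDrive` (the usual Colab mount point), and the `step_000100.ckpt` naming scheme and the function names `backup_checkpoint` and `restore_latest` are made up for illustration.

```python
import shutil
from pathlib import Path

# Assumption: Drive is mounted here, e.g. via google.colab.drive.mount("/content/drive").
DRIVE_DIR = Path("/content/drive/MyDrive/checkpoints")


def backup_checkpoint(local_path, backup_dir=DRIVE_DIR):
    """Copy a local checkpoint file into the persistent backup directory."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(local_path).name
    shutil.copy2(local_path, target)
    return target


def restore_latest(backup_dir=DRIVE_DIR, workdir="."):
    """Copy the newest checkpoint back into workdir, or return None if there is none.

    Assumption: files are named step_<zero-padded number>.ckpt, so sorting
    by name also sorts by training step.
    """
    backup_dir = Path(backup_dir)
    if not backup_dir.is_dir():
        return None
    candidates = sorted(backup_dir.glob("step_*.ckpt"))
    if not candidates:
        return None
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / candidates[-1].name
    shutil.copy2(candidates[-1], target)
    return target
```

Calling `restore_latest()` at the top of a notebook and `backup_checkpoint(path)` after each save keeps the manual copying out of the training code itself.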
I had a GTX 1060 laptop, which was fine for prototyping models and training scripts. Once the scripts were ready, though, the laptop was not enough for bigger runs. That pushed me to compare cloud GPU options.
Amazon Web Services (AWS)
AWS was the obvious first place to look because it had the trust, maturity, and breadth of services. The problem was pricing. For the kind of training I wanted to do, GPU cost mattered more than the surrounding cloud ecosystem.
A p2.xlarge was about $0.90/hour, but K80 GPUs were weak for modern deep learning. A p3.2xlarge with V100 was around $3.00/hour, or roughly $1.20/hour as a spot instance.
Paperspace
My search continued because some models needed less compute than a V100, and even $1.20/hour was still expensive for hobby use.
Paperspace had lower-end GPUs at better prices: the P4000 (close to desktop GTX 1070 performance) for about $0.51/hour, the P5000 for around $0.78/hour, the P6000 for around $1.10/hour, and the V100 for around $2.30/hour.
I used Paperspace for a while, but the upfront storage cost was a drag because I only needed a few hours of training in a typical month. That kept the search going.
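To see why a fixed storage fee hurts at low usage, it helps to fold it into an effective hourly rate. The sketch below is only arithmetic: the function name `effective_hourly_cost` is made up, and the $5/month storage figure in the usage comment is a hypothetical placeholder, not Paperspace's actual price.

```python
def effective_hourly_cost(gpu_rate, monthly_fixed, hours_per_month):
    """Hourly GPU rate plus the fixed monthly fee spread over the hours actually used."""
    if hours_per_month <= 0:
        raise ValueError("hours_per_month must be positive")
    return gpu_rate + monthly_fixed / hours_per_month


# Hypothetical example: a $0.51/hour P4000 with a placeholder $5/month storage fee.
light_use = effective_hourly_cost(0.51, 5.0, 5)    # only 5 hours in the month
heavy_use = effective_hourly_cost(0.51, 5.0, 100)  # 100 hours in the month
```

With only a few hours of training per month, the fixed fee dominates the effective rate; with heavy use it nearly disappears.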
Google Cloud Platform (GCP)
Google Cloud was compelling because the preemptible prices were low enough for hobby deep learning work. A T4 at around $0.35/hour and a V100 at around $0.80/hour changed the economics.
The other useful difference from Colab was persistence. Preemptible machines can still shut down, but the machine data stays intact. As long as you save checkpoints regularly, interruption is annoying rather than destructive.
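Saving checkpoints regularly can look roughly like the loop below. This is a sketch under stated assumptions, not a real training script: the "model" is a placeholder loss value stored as JSON, `stop_after` exists only to simulate a preemption, and all function names are hypothetical. In a real run the JSON state would be replaced by the framework's own checkpoint format.

```python
import json
import os
from pathlib import Path


def save_checkpoint(state, path):
    """Write to a temp file, then rename, so a shutdown never leaves a half-written checkpoint."""
    path = Path(path)
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic when tmp and path are on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    path = Path(path)
    if path.exists():
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(total_steps, ckpt_path, save_every=100, stop_after=None):
    """Resume from the last checkpoint and save every `save_every` steps."""
    state = load_checkpoint(ckpt_path)
    for step in range(state["step"], total_steps):
        if stop_after is not None and step >= stop_after:
            return state  # simulated preemption: unsaved progress is lost
        state["loss"] = 1.0 / (step + 1)  # placeholder for a real training step
        state["step"] = step + 1
        if state["step"] % save_every == 0:
            save_checkpoint(state, ckpt_path)
    save_checkpoint(state, ckpt_path)  # final save when training completes
    return state
```

After an interruption, calling `train` again with the same `ckpt_path` picks up from the last saved step, so at most `save_every` steps of work are repeated.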
| Platform | GPU / machine | Approx. price at the time | My note |
|---|---|---|---|
| AWS | p2.xlarge / K80 | $0.90/hour | Cheap, but weak for deep learning. |
| AWS | p3.2xlarge / V100 | $3.00/hour | Powerful, but expensive. |
| AWS Spot | p3.2xlarge / V100 | $1.20/hour | Better, but still high for hobby use. |
| Paperspace | P4000 | $0.51/hour | Close to desktop GTX 1070 performance. |
| Paperspace | P5000 / P6000 | $0.78-$1.10/hour | Good mid-range options without spot pricing. |
| Google Cloud | T4 preemptible | $0.35/hour | The most attractive value for my workload. |
| Google Cloud | V100 preemptible | $0.80/hour | High-end compute at a much lower price. |
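The table translates directly into the cost of a training run. The sketch below uses the approximate prices quoted in this post (not current prices) and hypothetical helper names; it compares price only and ignores that the GPUs differ in speed, so a cheaper GPU may need more hours for the same job.

```python
# Approximate hourly prices in USD, as quoted in this post at the time of writing.
HOURLY_PRICE_USD = {
    "AWS p2.xlarge (K80)": 0.90,
    "AWS p3.2xlarge (V100)": 3.00,
    "AWS Spot p3.2xlarge (V100)": 1.20,
    "Paperspace P4000": 0.51,
    "Paperspace P5000": 0.78,
    "Paperspace P6000": 1.10,
    "Paperspace V100": 2.30,
    "GCP T4 preemptible": 0.35,
    "GCP V100 preemptible": 0.80,
}


def run_cost(option, hours):
    """Cost in USD of running `option` for `hours`, rounded to cents."""
    return round(HOURLY_PRICE_USD[option] * hours, 2)


def cheapest_first(hours):
    """All options as (name, cost) pairs, sorted from cheapest to most expensive."""
    costs = [(name, run_cost(name, hours)) for name in HOURLY_PRICE_USD]
    return sorted(costs, key=lambda item: item[1])


if __name__ == "__main__":
    for name, cost in cheapest_first(10):
        print(f"{name:<30} ${cost:>6.2f} for 10 hours")
```

For V100 time specifically, the quoted prices put the on-demand AWS rate at 3.75 times the GCP preemptible rate ($3.00 versus $0.80 per hour).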
Key Takeaway
Google Cloud offered the best cost/performance story for this workload at the time, especially with preemptible instances. Colab remained the easiest free baseline, but preemptible GCP machines made longer or heavier experiments more practical.