⚠️ Yak shaving risk. ⚠️ Outdated info risk.
tl;dr Google Cloud ML is probably excellent if you design your algorithm from the ground up for it. But if you have something that runs perfectly well on your laptop, and you wrote it to use, e.g., local files, modern versions of Python, or custom compiled code, then you are going to need to rewrite it substantially. NB: this is all outdated now.
The goal: get and analyse the MagnaTagATune dataset in Google Cloud, driving it from my macOS laptop, using TensorFlow.
I will roughly follow the least-nerview HOWTO, which, sadly, conveniently sidesteps many of my difficulties by having the input data be magically good.
There are too many ways to get the damn thing going.
There is plenty of documentation for all these things, but from page to page it’s often unclear what the hell is happening, and in particular whether you are spinning up VMs locally or in Google’s cloud.
One way is the purely cloud-based Cloud Shell, but this is clearly too fragile and restrictive for real usage unless you are in Mountain View.
Offline there is a Docker-based thing which, AFAICT, is a somewhat monolithic machine image that approximates the online APIs, or something.
Here is a surly Google help page which implies as much.
Or you can install a bunch of Python packages from the command line.
Do I need this Datalab nonsense? I can’t tell. I just want to run TensorFlow. Idea: keep installing stuff at random until eventually I have finished a deep-learning-based paper.
gcloud config set core/project learning-gamelan
gcloud config set compute/zone asia-east1-a
Cloud Datalab thingy
First you must install Docker.
Or must I? I’m so awash right here.
datalab create learning-gamelan
Then do all the actual work.
datalab delete learning-gamelan
Getting data into cloud datastore
Complicated and tiresome. Not too complicated if I really were fitting a million-parameter regression, but too complicated for one grad student doing a side project: I don’t have two weeks of coding time to spend bending my code to fit their data-ingestion workflow.
It’s not a bad workflow as such, I suppose; it’s just massive overhead for my small project.
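One way to soften the "written to use local files" problem is to route every file read through a single helper that accepts either a local path or a `gs://` URI, so the same call site works on the laptop and in the cloud. This is a minimal sketch, assuming the data sits in a Google Cloud Storage bucket and that the `google-cloud-storage` client library is installed for the cloud case; `split_gcs_uri` and `read_bytes` are hypothetical names, not part of any Google workflow.

```python
import os


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("not a gs:// URI: %r" % uri)
    bucket, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket or not blob_name:
        raise ValueError("expected gs://bucket/object, got %r" % uri)
    return bucket, blob_name


def read_bytes(path):
    """Read a whole file as bytes, from local disk or from Cloud Storage."""
    if path.startswith("gs://"):
        # Imported lazily so purely local runs need no Google libraries.
        from google.cloud import storage

        bucket_name, blob_name = split_gcs_uri(path)
        client = storage.Client()  # picks up Application Default Credentials
        return client.bucket(bucket_name).blob(blob_name).download_as_bytes()
    # Local case: plain file access, with ~ expanded for convenience.
    with open(os.path.expanduser(path), "rb") as f:
        return f.read()
```

This does nothing about the rest of the ingestion workflow; it only stops file paths from being the thing that forces a rewrite. TensorFlow's own `tf.io.gfile` module offers a similar local-or-`gs://` abstraction if the code already depends on TensorFlow.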
Now, port my Python 3 code to Python 2
This is an aspirational section; I won’t actually get here.
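For the record, the usual first step in such a port is to write code that runs unchanged on both versions rather than keep two copies. This is a minimal sketch of that style, assuming only the standard library is needed; `mean_tag_count` and `read_text` are hypothetical helpers. The `__future__` imports give Python 2.7 the Python 3 semantics for `print`, `/` and string literals, and `io.open` decodes text the same way on both. Python-3-only syntax such as f-strings or type annotations would still have to be removed by hand.

```python
# Must come first in the module: these give Python 2.7 the Python 3 behaviour.
from __future__ import absolute_import, division, print_function, unicode_literals

import io


def mean_tag_count(counts):
    """Average number of tags per clip.

    Thanks to the `division` import, `/` is true division on Python 2 too,
    so [1, 2] gives 1.5 rather than 1.
    """
    return sum(counts) / len(counts)


def read_text(path):
    """Read a UTF-8 text file and return unicode text on both versions."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    print(mean_tag_count([1, 2]))  # prints 1.5 under both Python 2.7 and 3
```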
Oh sod it, just give me a normal virtual machine
ARGH, their TensorFlow nonsense is melting my brain. I just want to run the sweet prototype from my laptop, only slightly faster.
Maybe I can rent a machine?
Oh wait! I don’t get GPUs, so my entire motivation for using this Google stuff (I had some free credits) is hereby obliterated.
Sod this, I’m going to rent myself a big-ass GPU from Amazon and get this finished.