The promise of these Google-designed chips is that they can run specific machine learning workflows significantly faster than the standard GPUs that most developers use today. For Google, one of the advantages of these TPUs is that they also use less power, something developers probably don’t care quite as much about, but that allows Google to offer this service at a lower cost.
The company first announced Cloud TPUs at its I/O developer conference nine months ago (and gave a limited number of developers and researchers access to them). Each Cloud TPU features four custom ASICs with 64 GB of high-bandwidth memory. According to Google, the peak performance of a single TPU board is 180 teraflops.
Developers who already use TensorFlow don’t have to make any major changes to their code to use this service. For the time being, though, Cloud TPUs aren’t quite available at the click of a button. “To manage access,” as Google says, developers have to request a Cloud TPU quota and describe what they want to do with the service. Once they get in, usage will be billed at $6.50 per Cloud TPU per hour. In comparison, access to standard Tesla P100 GPUs in the U.S. runs at $1.46 per hour, though the maximum FP16 performance there is about 21 teraflops.
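To put those prices side by side, here is a rough back-of-the-envelope sketch in Python that divides each hourly price by the quoted peak teraflops. The function name is made up for illustration, and the comparison assumes the two peak figures are directly comparable, which they may not be: peak numbers rarely reflect real workloads, and Google's 180-teraflop figure is not stated for a specific precision here.

```python
def cost_per_peak_teraflop_hour(price_per_hour, peak_teraflops):
    """Hourly price divided by quoted peak teraflops (illustrative helper)."""
    return price_per_hour / peak_teraflops


# Figures quoted in the article.
tpu_cost = cost_per_peak_teraflop_hour(6.50, 180)   # Cloud TPU
p100_cost = cost_per_peak_teraflop_hour(1.46, 21)   # Tesla P100, FP16 peak

print(f"Cloud TPU:  ${tpu_cost:.4f} per peak teraflop-hour")
print(f"Tesla P100: ${p100_cost:.4f} per peak teraflop-hour")
print(f"P100 costs about {p100_cost / tpu_cost:.1f}x as much per peak teraflop")
```

On paper, that works out to roughly 3.6 cents per peak teraflop-hour for the Cloud TPU versus about 7 cents for the P100, so the TPU looks about twice as cost-efficient at peak, with all the usual caveats about peak numbers.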
Google’s reputation for machine learning will surely drive a lot of new users to these Cloud TPUs. In the long run, though, what’s maybe just as important is that this gives Google Cloud a way to differentiate itself from the AWSes and Azures of this world. For the most part, after all, everybody now offers the same set of basic cloud computing services, and the advent of containers has made it easier than ever to move workloads from one platform to another. With the combination of TensorFlow and TPUs, Google can now offer a service that few will be able to match in the short term.