Accelerating AI inference workloads

Deploying AI models at scale demands high-performance inference. Google Cloud offers a range of Cloud Tensor Processing Units (TPUs) and NVIDIA-powered graphics processing unit (GPU) VMs. Join Debi Cabrera as she sits down with Alex Spiridonov, Group Product Manager, to discuss key considerations for choosing between TPUs and GPUs for your inference needs. Watch along to understand the cost implications, learn how to deploy and optimize your inference pipeline on Google Cloud, and more!
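
The serving stacks linked under Resources below (JetStream, MaxText, MaxDiffusion) are built on JAX, which runs the same code on Cloud TPUs and GPUs. As a rough illustration only (this sketch is not from the session, and the layer sizes and batch shape are arbitrary placeholders), here is how you might confirm which accelerators JAX sees on a Cloud TPU or GPU VM and time a jit-compiled forward pass:

```python
import time

import jax
import jax.numpy as jnp

# Show the accelerators JAX can see on this VM (TPU cores or GPUs).
print("Devices:", jax.devices())

# A stand-in "model": a single bfloat16 matmul. Production serving stacks
# such as JetStream add batching, KV caching, and quantization on top.
key = jax.random.PRNGKey(0)
weights = jax.random.normal(key, (4096, 4096), dtype=jnp.bfloat16)

@jax.jit
def forward(batch):
    return jnp.dot(batch, weights)

batch = jax.random.normal(key, (8, 4096), dtype=jnp.bfloat16)

forward(batch).block_until_ready()   # first call compiles (warm-up)
start = time.perf_counter()
forward(batch).block_until_ready()   # steady-state latency
print(f"One batch took {time.perf_counter() - start:.4f}s")
```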

Chapters:
0:00 – Meet Alex
2:52 – Balancing cost and efficiency
5:51 – TPU vs GPU for AI models
8:21 – Getting started with Google Cloud TPUs and GPUs
10:05 – Common challenges with inference optimization
12:10 – Available resources for AI inference workloads
13:13 – Wrap up

Resources:
Watch the full session here → https://goo.gle/3JC32qx
Check out Alex’s blog post → https://goo.gle/3wa2DZb
JetStream GitHub → https://goo.gle/49SoSRj
MaxDiffusion GitHub → https://goo.gle/4aQ1g11
MaxText GitHub → https://goo.gle/49SoYZb

Watch more Cloud Next 2024 → https://goo.gle/Next-24
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#GoogleCloudNext #GoogleGemini

Event: Google Cloud Next 2024
Speakers: Debi Cabrera, Alex Spiridonov
Products Mentioned: Cloud TPUs, Cloud GPUs

Date: April 30, 2024