VIMAGE: Accelerate, Simplify, Deploy

Scalable GPU-backed solution on Google Cloud.

VIMAGE is an early image-to-video AI application with over 10 million total downloads on Google Play, built primarily by a team of just five people (as of March 6, 2023). An overview of the app's capabilities is available in its promotional video.

The app works by decomposing an uploaded image into two layers, a foreground and a background, and then manipulating those layers to add effects such as parallax. Even though the technique behind this is simple, the results (as seen in the promotional video) look quite impressive.
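
As a rough illustration of the idea (not VIMAGE's actual pipeline), the sketch below takes the original image plus a foreground mask, inpaints the background behind the subject, and sways the background layer over time while the foreground stays fixed; the file names, the mask source, and the motion parameters are all assumptions.

```python
# Hypothetical parallax sketch: shift an inpainted background behind a fixed foreground.
import cv2
import numpy as np

image = cv2.imread("input.jpg")                     # the user's uploaded photo
mask = cv2.imread("foreground_mask.png", 0) > 127   # binary foreground mask from the depth step
h, w = mask.shape

foreground = image.copy()
# Fill the region hidden behind the subject so the layers can move independently.
background = cv2.inpaint(image, mask.astype(np.uint8) * 255, 3, cv2.INPAINT_TELEA)

writer = cv2.VideoWriter("parallax.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, (w, h))
for t in range(90):                                 # 3 seconds at 30 fps
    dx = int(8 * np.sin(2 * np.pi * t / 90))        # background sways left and right
    shift = np.float32([[1, 0, dx], [0, 1, 0]])
    shifted_bg = cv2.warpAffine(background, shift, (w, h), borderMode=cv2.BORDER_REPLICATE)
    frame = np.where(mask[..., None], foreground, shifted_bg)  # subject stays put
    writer.write(frame)
writer.release()
```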

Project Requirements

The core of the project consists of the following three parts:

  • Handling image I/O in a way that respects end-user privacy
  • Creating an accurate depth map of the input RGB image
  • Deciding which pixels fall within the background and which within the foreground (a sketch of the last two steps follows the figure below)
Figure: original image and its depth map; raw image and modified image.
Creating an accurate depth map from an RGB image is a very important step in background separation.
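
A minimal sketch of those two steps, assuming an off-the-shelf monocular depth model (MiDaS, loaded via torch.hub) stands in for whatever network the project actually uses, and an arbitrary quantile threshold splits foreground from background:

```python
# Hypothetical depth-map and foreground-mask sketch; the model choice and the
# "nearest ~30% of pixels are foreground" threshold are assumptions.
import cv2
import numpy as np
import torch

model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
model.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

image = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = model(transform(image))            # relative inverse depth, larger = closer
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=image.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()

threshold = np.quantile(depth, 0.70)                # nearest ~30% of pixels become foreground
foreground_mask = (depth >= threshold).astype(np.uint8) * 255
cv2.imwrite("foreground_mask.png", foreground_mask)
```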

In addition, there were other business restrictions in place:

  • Prioritize cost saving
  • Ensure scalability - the app had frequent usage spikes
  • Make the app faster
  • Add basic Firebase authentication (a token-verification sketch follows this list)
  • Host all of it on Google Cloud Platform to make use of credits
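
For the authentication item, here is a minimal sketch of server-side verification of a Firebase ID token with the firebase_admin SDK; the header handling and function name are hypothetical, not the project's actual code:

```python
# Hypothetical Firebase ID-token check; assumes the client sends "Authorization: Bearer <token>".
import firebase_admin
from firebase_admin import auth

firebase_admin.initialize_app()  # picks up default credentials when running on GCP

def authenticated_uid(headers: dict) -> str:
    """Return the caller's Firebase UID, or raise if the token is missing or invalid."""
    bearer = headers.get("Authorization", "")
    if not bearer.startswith("Bearer "):
        raise PermissionError("missing Firebase ID token")
    decoded = auth.verify_id_token(bearer.removeprefix("Bearer "))
    return decoded["uid"]
```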

Implementation

The most pressing constraint was the GPU requirement of the project's core layer. This, coupled with the scalability requirement (i.e., the need for a fully managed GPU platform), narrowed down our choice of compute providers significantly. At this stage, we had just three options:

  • Set up a lightweight manual server that allocates and deallocates GPU-powered VMs through the Google Compute Engine service.
  • Investigate the Cloud Run for Anthos service (which reportedly provides serverless GPUs).
  • Investigate whether Google Cloud Vertex AI could be used for our requirements.

To avoid a costly research phase, we consulted with a certified Google Cloud architect and, with their insight, decided to go with the Vertex AI service. The rough outline of the project was specified during a packed one-hour-long meeting:

  • Use a Vertex AI Endpoint to access the cheapest GPU VM (a T4 GPU instance backed by an N2-family CPU); a deployment sketch follows this list.
  • Leverage the scalability of Vertex AI Endpoints (VAIE). VAIE scales automatically by default but is constrained to keep at least one instance running at all times. This usually means a monthly bill of around $300, which is a non-starter for personal, ultra-low-use projects. Given our customer's ample budget, however, this price was of no concern.
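
A rough sketch of such a deployment with the google-cloud-aiplatform SDK; the project ID, region, model ID, machine type, and replica counts below are placeholders, not the values used in production:

```python
# Hypothetical Vertex AI Endpoint deployment sketch; all identifiers are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="europe-west4")

# An already-uploaded model in the Vertex AI Model Registry (ID is a placeholder).
model = aiplatform.Model("projects/my-gcp-project/locations/europe-west4/models/1234567890")

endpoint = model.deploy(
    machine_type="n1-standard-4",        # placeholder CPU host for the accelerator
    accelerator_type="NVIDIA_TESLA_T4",  # the cheapest GPU option mentioned above
    accelerator_count=1,
    min_replica_count=1,                 # VAIE will not scale below one replica
    max_replica_count=4,                 # extra replicas absorb usage spikes
)

# The backend then calls the endpoint like any other prediction service.
response = endpoint.predict(instances=[{"image_bytes": "<base64-encoded image>"}])
```

The minimum of one replica cannot be lowered, which is exactly where the always-on cost mentioned above comes from.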

However, VAIE also comes with a few drawbacks, most notably the operational overhead of managing its deployments (more on this in the Reflections below).

Reflections

The VIMAGE project was the first large-scale project I worked on that had active users from day one. Profiling the application backend, identifying bottlenecks, and applying optimizations were the easiest parts to complete. Basic security and authentication, to my surprise, were even easier to add. The most time-consuming part was babysitting the Vertex AI deployments. I still find it mind-boggling just how suboptimal the backend implementation was when it was first presented to me. With this project, I learned a very valuable lesson in designing successful applications: make it work first; you can make it optimal later.