A deep dive with Google AI Edge’s MediaPipe


Large language models (LLMs) are incredible tools that enable new ways for humans to interact with computers and devices. These models are frequently run on specialized server farms, with requests and responses ferried over an internet connection. Running models fully on-device is an appealing alternative: it can eliminate server costs, ensure a higher degree of user privacy, and even allow for offline usage. However, doing so is a true stress test for machine learning infrastructure: even “small” LLMs usually have billions of parameters and sizes measured in gigabytes (GB), which can easily exhaust a device’s memory and compute capabilities.

Earlier this year, Google AI Edge’s MediaPipe (a framework for efficient on-device pipelines) launched an experimental cross-platform LLM Inference API that uses device GPUs to run small LLMs on Android, iOS, and the web with maximal performance. At launch, it could run four openly available LLMs fully on-device: Gemma, Phi 2, Falcon, and Stable LM, ranging in size from 1 to 3 billion parameters.
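For context, here is roughly what running one of these models looks like with the web API. This is a minimal sketch: the CDN URL, model path, and sampling parameters are illustrative placeholders rather than canonical values.

```typescript
import {FilesetResolver, LlmInference} from '@mediapipe/tasks-genai';

// Load the WebAssembly assets for MediaPipe GenAI tasks.
// (The CDN URL here is illustrative; any host of the wasm bundle works.)
const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');

// Create the LLM Inference task from a model file served with the page.
// The model path and sampling parameters below are placeholder values.
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: {modelAssetPath: '/assets/gemma-2b-it-gpu-int4.bin'},
  maxTokens: 1000,   // combined prompt + response token budget
  topK: 40,
  temperature: 0.8,
});

// Generate a response for a prompt. generateResponse can also take a
// progress callback to stream partial results as they are produced.
const answer = await llm.generateResponse('Why run an LLM on-device?');
console.log(answer);
```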

At the time, these were also the largest models our system could run in the browser. To achieve such broad platform reach, our system first targeted mobile devices. We then extended it to run in the browser, preserving its speed but gaining complexity in the process, because the browser imposes additional restrictions on usage and memory. Loading larger models would have overrun several of these memory limits (discussed more below). In addition, our mitigation options were substantially limited by two key system requirements: (1) a single library that could adapt to many models, and (2) the ability to consume the single-file .tflite format used across many of our products.

Today, we are eager to share an update to our web API. It includes a web-specific redesign of our model loading system that addresses these challenges and enables us to run much larger models like Gemma 1.1 7B. At 7 billion parameters, this 8.6GB model file is several times larger than any model we’ve run in a browser before, and the quality improvement in its responses is correspondingly significant. Try it out for yourself in MediaPipe Studio!
