Scaling wearable foundation models


Wearable devices that measure physiological and behavioral signals have become commonplace. There is growing evidence that these devices can have a meaningful impact in promoting healthy behaviors, detecting diseases, and improving the design and implementation of treatments. These devices generate vast amounts of continuous, longitudinal, and multimodal data, but raw signals such as electrodermal activity or accelerometer readings are difficult for consumers and experts alike to interpret. To address this challenge, algorithms have been developed to convert sensor outputs into more meaningful representations.

Historically, algorithms for wearable sensors have relied on supervised, discriminative models (i.e., a class of models often used for classification) designed to detect specific events or activities (e.g., recognizing whether a user is running). This approach, however, faces several significant limitations. First, the limited volume and severe class imbalance of the labeled events mean that large amounts of potentially valuable unlabeled data are left unused. Second, supervised models are trained to do only one task (e.g., classification) and thus create representations that may not generalize to other tasks. Third, the training data can have limited heterogeneity, since it is frequently collected from small study populations (usually tens or hundreds of participants).
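As a concrete illustration of this traditional setup, the sketch below trains a small discriminative model that maps a fixed window of tri-axial accelerometer samples to a single activity label. The window length, activity set, and architecture are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the supervised, discriminative approach: one model, one task.
# Shapes and the activity set are hypothetical, chosen only for illustration.
import torch
import torch.nn as nn

NUM_ACTIVITIES = 4   # e.g., {still, walking, running, cycling} -- hypothetical label set
WINDOW = 250         # samples per window (e.g., 10 s at 25 Hz) -- hypothetical
CHANNELS = 3         # tri-axial accelerometer

class ActivityClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(CHANNELS, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        # The head is tied to a single classification task, so the learned
        # representation may not transfer to other downstream problems.
        self.head = nn.Linear(64, NUM_ACTIVITIES)

    def forward(self, x):                 # x: (batch, CHANNELS, WINDOW)
        return self.head(self.encoder(x))

model = ActivityClassifier()
logits = model(torch.randn(8, CHANNELS, WINDOW))   # (8, NUM_ACTIVITIES)
```

Training such a model requires labeled windows, which is exactly where the volume and class-imbalance limitations described above come in.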

Self-supervised learning (SSL) using generic pretext tasks (e.g., rearranging image patches akin to solving a jigsaw puzzle, or filling in missing parts of an image) can yield versatile representations that are useful for many types of downstream applications. SSL makes it possible to leverage a much larger proportion of the available data, without being biased toward the labeled regions of the data (e.g., a limited number of subjects with self-reported labels of exercise segments). These benefits have inspired efforts to apply similar training strategies to the large volumes of unlabeled data generated by wearable devices.
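One common family of pretext tasks for time series is masked reconstruction: hide random patches of an unlabeled sensor window and train the model to fill them back in. The sketch below illustrates the idea; the patch size, masking ratio, and encoder are illustrative assumptions rather than the architecture used for LSM.

```python
# Minimal sketch of a masked-reconstruction pretext task on unlabeled sensor windows.
# Patch size, masking ratio, and network sizes are hypothetical.
import torch
import torch.nn as nn

PATCH, N_PATCHES, MASK_RATIO = 25, 10, 0.4   # e.g., a 250-sample window split into 10 patches

encoder = nn.Sequential(nn.Linear(PATCH, 128), nn.ReLU(), nn.Linear(128, 128))
decoder = nn.Linear(128, PATCH)

def masked_reconstruction_loss(window):                      # window: (batch, N_PATCHES, PATCH)
    mask = torch.rand(window.shape[:2]) < MASK_RATIO         # choose patches to hide
    corrupted = window.masked_fill(mask.unsqueeze(-1), 0.0)  # zero out the hidden patches
    recon = decoder(encoder(corrupted))                      # predict the original values
    # The loss is computed only on the hidden patches, so the model must infer
    # them from surrounding context -- no human labels are required.
    return ((recon - window) ** 2)[mask].mean()

loss = masked_reconstruction_loss(torch.randn(8, N_PATCHES, PATCH))
loss.backward()
```

Because the supervisory signal comes from the data itself, every unlabeled hour of sensor recordings can contribute to training.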

Building on this, the empirical and theoretical success of scaling laws in neural models indicates that model performance improves predictably with increases in data, compute, and parameters. These results prompt a critical question: Do scaling laws apply to models trained on wearable sensor data? The answer to this question is not immediately obvious, as the sensor inputs capture information that is quite different from language, video or audio. Understanding how scaling manifests in this domain could not only shape model design but also enhance generalization across diverse tasks and datasets.
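Scaling behavior is typically summarized by a power law of the form loss(x) ≈ a * x^(-b) + c, where x is the amount of data, compute, or number of parameters. The sketch below fits such a curve to a handful of hypothetical measurements purely to illustrate the functional form; the numbers are made up and are not results from the paper.

```python
# Fit a saturating power law to hypothetical (model size, validation loss) pairs.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b, c):
    return a * np.power(x, -b) + c

model_params = np.array([1e6, 3e6, 1e7, 3e7, 1e8])       # hypothetical parameter counts
val_loss     = np.array([0.92, 0.80, 0.71, 0.65, 0.61])  # hypothetical validation losses

(a, b, c), _ = curve_fit(power_law, model_params, val_loss, p0=(10.0, 0.3, 0.5), maxfev=10000)
print(f"fitted exponent b = {b:.3f}; extrapolated loss at 1e9 params = {power_law(1e9, a, b, c):.3f}")
```

If wearable sensor data follows a similar law, the fitted exponent tells us how much additional data, compute, or model capacity buys in downstream performance.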

In “Scaling Wearable Foundation Models”, we investigate whether the principles driving the scaling of neural networks in domains like text and image data also extend to large-scale, multimodal wearable sensor data. We present the results of our scaling experiments on the largest wearable dataset published to date, consisting of over 40 million hours of de-identified multimodal sensor data from 165,000 users. We leverage this dataset to train a foundation model, which we refer to as the Large Sensor Model (LSM). We demonstrate the scaling properties of this dataset and model with respect to data, compute, and model parameters, showing performance gains of up to 38% over traditional imputation methods.
