April 2, 2024

Popular social media mobile apps extract data from photos on your phone, introducing both bias and errors

Written By: Jason Daley

Digital privacy and security engineers at the University of Wisconsin-Madison have found that the TikTok and Instagram mobile apps each extract personal and demographic data from user images, though to differing degrees, and can misclassify aspects of those images.

Led by Kassem Fawaz, an associate professor of electrical and computer engineering at UW-Madison, the researchers studied the social media platforms’ mobile apps to understand what types of information their machine learning vision models collect about users from their photographs—and importantly, whether the models accurately recognize demographic differences and age.

The team will present a paper on its findings at the IEEE Symposium on Security and Privacy in San Francisco in May 2024.

Many mobile applications use machine learning or AI systems called “vision models” to look at images on a user’s phone and extract data, which can be useful in facial recognition or in verifying a user’s age. Not too long ago, this process took place in the cloud; vision models would send user data to an offsite server to be processed.

“Nowadays, phones are fast enough that they can actually do the machine learning directly on the device, which not only saves money, but it also allows for more data to be used, and for different types of data to be produced,” says PhD student Jack West, who worked on the project with PhD student Shimaa Ahmed and Fawaz.

Because that processing now happens locally, on people’s devices, it also means researchers can look more closely at the AI models and the types of data they collect and process.

In its project, the team looked at Instagram and TikTok to determine what types of information their vision models collect and how those models process information. West created a custom operating system to track information put into the vision model and to collect the model’s output. The team did not try to extract or reverse-engineer the vision model itself, which would violate the apps’ terms of service.

“We opened the app and found where the input is happening and what the output is,” explains Fawaz. “We were basically watching the apps in action.”

They found that on TikTok, when users choose a photo from their phone’s camera app, the vision model automatically predicts the age and gender of the person or people in that image. Building on that observation, they ran a data set of more than 40,000 faces through the vision model and found that it made more mistakes classifying people under 18 than over 18. For people ages 0 to 2, the model often classified them as being between 12 and 18 years old.
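The per-age-group error analysis described above can be sketched as follows. This is a minimal illustration, not the researchers’ actual code: the age buckets, the `predict_age` callable, and the data format are all assumptions made for the example.

```python
from collections import defaultdict

# Illustrative age buckets; the study's exact bucketing may differ.
BUCKETS = [(0, 2), (3, 11), (12, 17), (18, 200)]

def bucket_of(age):
    """Map an exact age to its bucket, e.g. 1 -> (0, 2)."""
    for lo, hi in BUCKETS:
        if lo <= age <= hi:
            return (lo, hi)
    raise ValueError(f"age out of range: {age}")

def error_rates(samples, predict_age):
    """samples: (image, true_age) pairs; predict_age: the model under test.

    Returns, for each true-age bucket, the fraction of images whose
    predicted age fell into a different bucket than the true age."""
    errors, totals = defaultdict(int), defaultdict(int)
    for image, true_age in samples:
        b = bucket_of(true_age)
        totals[b] += 1
        if bucket_of(predict_age(image)) != b:
            errors[b] += 1
    return {b: errors[b] / totals[b] for b in totals}
```

A tally like this makes the article’s finding easy to read off: a high error rate in the (0, 2) bucket means very young children were routinely placed in an older bucket, such as 12 to 18.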

When they did a similar analysis of Instagram, the researchers found that its vision model categorized more than 500 different “concepts,” including age and gender, time of day, background images, and even what foods people were eating in the photographs.

“That’s a lot of information,” says Ahmed. “We found 11 of these concepts to be related to facial features, like hair color, having a beard, eyeglasses, jewelry, et cetera.”

To test the Instagram vision model, the researchers showed it a set of AI-generated images of people from four ethnicities, then looked to see if Instagram could correctly determine the 11 face-related characteristics. While Instagram was much better at classifying images by age than TikTok, it had its own set of issues. “It didn’t perform as well across all demographics, and seemed biased against one group,” says Ahmed.
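A fairness check like the one described, comparing how well the model recovers face-related concepts across demographic groups, can be sketched as below. The record format and exact-match scoring are illustrative assumptions; the researchers’ actual metric is not specified in the article.

```python
from collections import defaultdict

def per_group_accuracy(records):
    """records: (group, predicted_concepts, true_concepts) triples, where
    the concept sets hold face-related labels (e.g. beard, eyeglasses)
    present in an image.

    Returns each group's exact-match accuracy: the fraction of its images
    for which the model recovered the concept set perfectly. A group with
    markedly lower accuracy suggests bias against that group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, predicted, truth in records:
        totals[group] += 1
        if set(predicted) == set(truth):
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}
```

Comparing the per-group numbers side by side is what surfaces the kind of disparity Ahmed describes, where one demographic group scores noticeably worse than the others.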

So what, exactly, are the apps doing with this information? It’s not totally clear. “The moment you select a photo on Instagram, regardless of whether you discard it, the app analyzes the photo and grows a local cache of information,” says West. “The data is stored locally, on your device—and we have no evidence it was accessed or sent. But it’s there.”

If Instagram and TikTok are using the data for purposes like age or identity verification, the researchers believe the technology has room for improvement. Decreasing bias in these types of vision models, they say, can help ensure all users receive fair and accurate digital services in the future.

Other UW-Madison authors include Maggie Bartig and Professor Suman Banerjee. Other authors include Lea Thiemt of the Technical University of Munich.

The researchers acknowledge support from the DARPA GARD program under agreement number 885000; the National Science Foundation through awards CNS-1942014, CNS-2003129, and CNS-2247381; and the Wisconsin Alumni Research Foundation.

Featured image caption: PhD student Jack West (left) and Assistant Professor Kassem Fawaz (right) use Instagram’s vision model to categorize more than 500 different “concepts” in video feeds of their co-authors, Lea Thiemt (top of the screen) and Shimaa Ahmed (bottom). Credit: Joel Hallberg.