June 27, 2023

Ask an expert: Fawaz on data privacy, ChatGPT

Written By: Alex Holloway

Kassem Fawaz is an associate professor of electrical and computer engineering. Among other areas, he focuses on online privacy and security: how we interact with mobile devices, online applications, and tools that access and use the information we share with them. Fawaz looks at tech privacy policies and practices, and he analyzes how devices and apps behave to collect user data in everyday settings.

In this interview, Fawaz talks about a few touchstone issues of data privacy—from how familiar apps, websites and devices collect your personal data to how artificial intelligence programs like ChatGPT present new challenges in the data security space.

Q: We live in a world that seems to be more connected than ever. From “smart” devices in our homes to the apps and devices we use to go about our daily lives at work or at play, it’s difficult not to be connected to the internet in some way. What do you see as some of the—maybe lesser-known—threats to data security in our everyday lives?

A: Every time you interact with a digital service, data is being collected about you. Every time you browse the internet, data is being collected about you. Even if you don’t click on something on a web page—if you scroll down and stop to look at something—that is being recorded. That data is being tracked and packaged and sold over and over again. There are lots of data profiles about you that are created to push advertisements and personal recommendations that make all of these different companies money.

I think every time users access an online service, they have to wonder: Who’s paying for this service? Google Maps is a great product, so why is it free? Facebook is free. WhatsApp is free. Economics has to make sense, so who’s paying for that? In 99.9% of cases, you’re paying through your data for these things.

Q: One privacy topic that often comes up is technology like Siri or Alexa, and the worry that they’re always listening to us, even when they shouldn’t be. Do you think that’s an actual concern?

A: The jury is still out on that for now. There’s some anecdotal evidence where someone might say, “Well, I said something and then an ad showed up for it.” As of right now, nobody has found any evidence of it happening, but there are a lot of people looking into it. So we don’t know for certain.

What we do know is that these devices get activated by what’s called a trigger word or wake word, like “Hey Alexa” or “Hey Siri.” We know the devices are listening for those triggers, and when they detect one, they begin recording your speech. The privacy problems arise when the device gets triggered accidentally, when it mishears something and thinks it heard one of those trigger words. That happens because, at the end of the day, these devices are powered by machine learning engines or models, and those models can make mistakes.

Now, the interesting thing is that these models are inconsistent. To train these models you have to have the right data, and the datasets they use tend to over-represent the average young American speaker. So with someone like me, who doesn’t speak with a perfect American accent, the device might trigger more often. If you miss that in the testing data, the bias gets baked in for certain demographics, which can increase the accidental triggering rate for them. And if that happens, those users may suffer more privacy problems, because more of their speech is being recorded. For example, we’ve found some discrepancy with age: these systems work better with younger people than with other groups. So if a device triggers more with older people, that means more of their speech goes into the cloud than they might think.
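To make that mechanism concrete, here is a minimal, hypothetical sketch of a threshold-based wake-word detector. The scoring model, the threshold value and the audio windowing are assumptions made up for illustration, not Amazon’s or Apple’s actual implementation.

```python
# Hypothetical sketch of threshold-based wake-word detection.
# The scoring model, threshold, and audio handling are illustrative
# assumptions, not any vendor's actual implementation.

from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AudioWindow:
    samples: List[float]   # roughly one second of audio at a fixed sample rate
    transcript_hint: str   # only used to make the toy example below readable


def start_cloud_recording(window: AudioWindow) -> None:
    """Stand-in for the step where audio starts streaming to the cloud."""
    print(f"Recording triggered on: {window.transcript_hint!r}")


def wake_word_monitor(
    windows: Iterable[AudioWindow],
    score_fn: Callable[[AudioWindow], float],  # small on-device ML model
    threshold: float = 0.85,
) -> None:
    """Continuously score short audio windows; start recording on a hit."""
    for window in windows:
        score = score_fn(window)  # probability this window contains the wake word
        if score >= threshold:
            # False accepts happen here: a phrase that merely *sounds* like the
            # wake word can score above the threshold, so speech the user never
            # meant to share gets recorded.
            start_cloud_recording(window)


if __name__ == "__main__":
    # Toy scorer. A real detector is an acoustic ML model, which is exactly
    # where accent- and age-dependent errors can creep in.
    def toy_score(window: AudioWindow) -> float:
        text = window.transcript_hint.lower()
        if "alexa" in text:
            return 0.95   # genuine wake word
        if "alexis" in text:
            return 0.88   # sounds similar, scores just above the threshold
        return 0.10

    stream = [
        AudioWindow([], "turn on the kitchen lights"),
        AudioWindow([], "my friend Alexis called me"),   # accidental trigger
        AudioWindow([], "hey Alexa, what's the weather"),
    ]
    wake_word_monitor(stream, toy_score)
```

The trade-off behind accidental recordings lives in that threshold: lowering it catches more genuine wake words but also lets more sound-alike phrases through, and a model trained mostly on one demographic can sit at a different point on that trade-off for speakers it has rarely heard.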

Q: AI is having a moment in the spotlight, spearheaded by the popular ChatGPT program. In early April, Italy banned ChatGPT, prompted by concerns from the Italian Data Protection Authority. Do programs like ChatGPT prompt new data privacy concerns, in your eyes, or are they a continuation of the same challenges we’re already facing in that space?

A: The biggest issue with ChatGPT is the interface. It’s a human-like interface, very similar to the one you might use to chat with friends or colleagues. When you’re chatting with ChatGPT, it talks back to you like a human, in very humanlike English. Many people use it like a more sophisticated search engine, to get clean, condensed results they can use without having to dig through links.

In the old days—a few months ago!—to do some searches, you went to Google and typed in a query. You’d get all these results and you could see the source of the information. You could see that it might be a “.edu” or a “.gov” or some thread on Reddit. So in the back of your mind, you have some way to weigh the trustworthiness of the results.

Now, you can go to ChatGPT, and because it has this sort of interface, you’re more likely to share things in ways that you wouldn’t with Google. In the moment, you might not be thinking about how OpenAI uses what you put into ChatGPT to train its model, or that it might get leaked or someone might steal it. The second problem is that you don’t know where the model is pulling information from in response to your queries. It cannot tell you, “I got this piece of data from this URL,” because of the way it’s designed.

Q: So it’s not just about how the program’s user-facing interface looks, but that you can’t look behind the curtain to see what’s happening on the back end?

A: Exactly. Another problem with ChatGPT is the applications people are using it for. People use it to generate emails, texts and memos, to fix English—to do all these different things. That means that they’re sharing lots and lots of private data, and these are new dimensions to data security that simply did not exist a few years ago.

For instance, say you are a government employee or an employee at some company and you’re asked to solve some sensitive problem or something that has to do with proprietary information. If you ask that question to ChatGPT, now OpenAI has that question. Many people are using it to help with writing or cleaning code. They’re doing that for companies, which can be competitors to OpenAI, and they’re posting their code and asking ChatGPT for recommendations for particular tests without knowing what happens once the program has that information.

All of this can be very sensitive information, and people are more likely to believe anything ChatGPT says because the way it communicates is so humanlike. There’s a lot of research showing that when you have a system that communicates like a human, people are more prone to believing it, which is another problem.

Q: Do you see it as a problem we need to fix from the technological side? Or is it a matter of regulation and education and other societal efforts?

A: All of these things combined. You have to have technological safeguards. You have to have policy safeguards and education around the technology. We can’t just have a one-size-fits-all solution.

With technology safeguards, here is one example: Many instructors want to know whether a text was generated by ChatGPT, so they can tell if students are plagiarizing. That is very hard to detect, and many people will start selling solutions they claim can tell you whether a text was generated by ChatGPT or not. These systems will fail, because you can just paraphrase and change what the model gives you.

We have to take a multi-pronged approach. For example, the U.S. government can tell OpenAI that it needs to have certain safeguards in place around this technology. Then we have to have some education in our departments and in our coursework to teach students how to interact with these systems responsibly. We have to adapt our coursework to incorporate ChatGPT and similar tools.

For example, I teach one of the ECE capstone courses that covers programming on the Android platform. We have a set of minilabs for students to understand and get used to the Android APIs and programming environment. ChatGPT can solve these minilabs reasonably well. I cannot prevent the students from using the tool, but I can change the minilabs to have students generate boilerplate and template code using ChatGPT. Then, students can edit the generated code for correctness and incorporate it with other code modules. ECE is supporting us this summer to redesign some parts of this course to incorporate ChatGPT into it. We will explicitly ask students to prompt ChatGPT, generate code, fix it, interface it with other modules, and then upload the initial prompt, the initial code and the edited code.
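As a rough illustration of that prompt, generate, then edit loop, here is a minimal sketch using the OpenAI Python client. The prompt text, model name and output file names are assumptions made up for this example, not course material, and students could just as easily use the ChatGPT web interface and paste the results into their submission.

```python
# Hypothetical sketch of the "prompt -> generate -> edit" minilab workflow.
# The prompt, model choice, and file names are illustrative assumptions;
# only the OpenAI Python client calls reflect a real API.

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

PROMPT = (
    "Generate boilerplate Kotlin code for an Android Activity with a "
    "RecyclerView that displays a list of strings. Include the layout XML."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # any chat-capable model would do for boilerplate
    messages=[{"role": "user", "content": PROMPT}],
)
generated = response.choices[0].message.content or ""

# Keep the initial prompt and the raw generated code exactly as produced...
with open("minilab_initial_prompt.txt", "w") as f:
    f.write(PROMPT)
with open("minilab_generated_code.md", "w") as f:
    f.write(generated)

# ...then edit the code for correctness, interface it with the other modules,
# and submit the edited version alongside these two files.
print(generated)
```

The point of the exercise is the editing step: the generated boilerplate is only a starting point, and turning in the prompt, the initial code and the edited code together makes that process visible to the instructor.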

Featured image caption: ECE Associate Professor Kassem Fawaz, an expert on data privacy, talks to a student at UW-Madison.

