There’s probably not a single element of daily life that machine learning won’t influence in the coming decades.
“Machine learning for data-driven policy making and decision making is becoming ubiquitous, and it has a huge impact on society and people,” says Ramya Vinayak, who joined the University of Wisconsin-Madison’s Department of Electrical and Computer Engineering in August 2020 as an assistant professor. She adds, “It is imperative for us to understand when machine learning algorithms work and when they do not.”
Previously, she worked as a postdoctoral researcher at the University of Washington. She earned her bachelor’s degree from the Indian Institute of Technology in Madras and her master’s and PhD degrees from Caltech.
Vinayak’s research spans machine learning, statistical inference and crowdsourcing. She is interested both in developing a fundamental theoretical understanding of machine learning algorithms and in applying them, in collaboration with domain experts, to areas such as social sciences, medicine and public health.
At a high level, the role of machine learning algorithms in the decision-making process is to take data as input and provide inferences as output that can be used by domain experts. But a machine learning algorithm is only as good as the data used to train it. In many cases the images, tags and information in data sets contain the biases and errors of the people who produced them, passing along the same issues to the algorithms.
Data is collected in many different ways, from observational studies, where researchers have limited control over what data is available, to surveys, where they get to design the questions. Vinayak is intrigued by the distinct challenges and opportunities each collection method presents for learning from data.
She is interested in finding tools to make limited data more usable. For instance, she says, epidemiologists may consider whether it makes sense to offer free flu vaccines to a community based on previously observed data. However, they may have access to only limited information about a population, such as whether individuals caught the flu over the previous five years, which is not enough data to accurately predict any one person’s risk of getting sick.
At the same time, simply calculating the average share of a population that might contract the flu ignores important factors such as age, race and gender. However, Vinayak says that with the right statistical inference tools, it is possible to make informed decisions at the population level while still capturing the variation across diverse groups of people.
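To make the contrast concrete, here is a minimal sketch of the difference between a single pooled average and per-group rates. It is only an illustration, not Vinayak’s method: the function name `flu_rates`, the group labels and the toy records are all hypothetical, and real statistical inference tools for sparse data are considerably more sophisticated.

```python
from collections import defaultdict


def flu_rates(records):
    """Return (pooled_rate, rate_by_group).

    records: iterable of (group, years_with_flu, years_observed).
    All names and data here are hypothetical, for illustration only.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [sick years, observed years]
    for group, sick, observed in records:
        totals[group][0] += sick
        totals[group][1] += observed

    # Per-group rate: keeps the variation between groups visible.
    by_group = {g: s / n for g, (s, n) in totals.items()}

    # Pooled rate: one number for the whole population.
    all_sick = sum(s for s, _ in totals.values())
    all_observed = sum(n for _, n in totals.values())
    return all_sick / all_observed, by_group


# Made-up toy data: each person is observed for five years.
toy_records = [
    ("under_18", 2, 5), ("under_18", 1, 5),
    ("18_to_64", 0, 5), ("18_to_64", 1, 5),
    ("65_plus", 3, 5), ("65_plus", 2, 5),
]
overall, by_group = flu_rates(toy_records)
print(overall, by_group)
```

In this toy data the pooled rate hides the fact that one group is affected far more often than another, which is the kind of variation a single average cannot show.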
In many applications the data used to train algorithms is labeled via crowdsourcing, and those labels are not always accurate because they come from non-experts. Vinayak’s research has also focused on designing queries and algorithms that can obtain quality data from non-experts. For instance, she says, it’s easy for people to say whether a picture shows a cat or a dog. But they may not be able to label the breed of a dog in a picture.
However, adding tweaks to the system—like posting two images of dogs and asking whether the labeler thinks they are the same breed—could be a more manageable task and lead to higher-quality data. “You have to think about how to design these questions,” she says. “How can we leverage those and get the information that we want?”
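The “same breed or not?” idea can be sketched in a few lines of code. The example below is an assumption-laden illustration rather than Vinayak’s algorithm: it takes a majority vote on each pair of images and then merges pairs judged “same” into groups with a simple union-find structure. The function name `cluster_from_pairs` and the image identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_from_pairs(answers):
    """Group items using crowdsourced 'same or different?' answers.

    answers: list of (item_a, item_b, same) votes, where same is True/False.
    Hypothetical sketch: majority vote per pair, then union-find merging.
    """
    votes = defaultdict(Counter)
    items = set()
    for a, b, same in answers:
        items.update((a, b))
        if a == b:
            continue  # a pair of identical items carries no information
        votes[frozenset((a, b))][bool(same)] += 1

    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, tally in votes.items():
        if tally[True] > tally[False]:  # strict majority says "same"
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in items:
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())


# Made-up votes: three labelers compare img1 and img2, and one disagrees.
toy_answers = [
    ("img1", "img2", True), ("img1", "img2", True), ("img1", "img2", False),
    ("img2", "img3", False), ("img2", "img3", False),
    ("img3", "img4", True),
]
clusters = cluster_from_pairs(toy_answers)
print(clusters)
```

No labeler ever names a breed here. The groups emerge from the comparisons alone, and repeating a question across several labelers lets a majority vote absorb an occasional wrong answer.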
“In general, my research vision is to develop theoretically grounded machine learning tools to make reliable inferences using data that comes from people,” she says.
UW-Madison is a great place to pursue this research, Vinayak says, not only because of the machine learning expertise in the ECE and computer science departments, but also because there are so many other opportunities for collaboration. “People are looking at new research questions in all sorts of domains, including medicine, public health, public policy and social sciences,” she says. “The scope for building collaborations is large. The machine learning tools are out there, and they’re being used for a lot of applications that touch people’s lives. It is therefore important to ensure we have a solid understanding of the reliability and limitations of these algorithms.”