The sheer amount of data collected from edge devices such as mobile phones and self-driving vehicles is beginning to overwhelm traditional centralized data analytics regimes, in which data from the edge is continuously uploaded to a central server for processing. Excessive communication traffic from data upload, significant central server storage needs, energy expenditures from centralized learning of big data models, and privacy concerns from sharing raw data are becoming critical challenges in centralized systems. Fortunately, an important change is under way in today’s Internet of Things (IoT). Edge devices are gaining substantial processing power, and AI chips are rapidly entering the global market. As such, we now have the opportunity to process more of our data where it is created, i.e., at the edge. This decentralized data analytics paradigm is often called federated data analytics (FDA). FDA resolves many of the aforementioned drawbacks. By exploiting edge computation, one can parallelize inference, reduce storage and communication costs, achieve faster alerts and decisions, and protect privacy, among other benefits. Meanwhile, FDA, as an emerging technology, poses significant intellectual challenges.
To name a few: (1) The current FDA literature mainly focuses on deep neural networks trained via empirical risk minimization (ERM). This is understandable, as FDA has been predominantly explored within mobile applications. However, in engineering settings, the data collected is often highly correlated. In addition, statistical questions such as variable selection, uncertainty quantification, hypothesis testing, and the incorporation of domain expert knowledge remain unanswered in FDA. (2) In engineering applications, edge devices often collect local datasets that differ in both size and distribution. As such, personalization becomes critical, as no single model can perform well on all edge devices. (3) IoT systems can raise bias and fairness concerns. Devices with insufficient data, limited bandwidth, or unreliable internet connections are not favored by conventional training algorithms.
In this talk, I address some of these fundamental challenges. First, I will present a framework, GIFAIR, that imposes group and individual fairness in the FDA setting. By adding a regularization term, GIFAIR penalizes the spread in the loss across groups to drive the optimizer to a fair solution. Second, I will present a federated treatment of the Gaussian process that brings FDA beyond ERM and into correlated settings. Our model can naturally handle statistical heterogeneity and provide a personalized solution to each edge device. I will also present the first theoretical results for FDA beyond the ERM paradigm. The talk concludes with some interesting future directions.
Bio: Xubo Yue is a Ph.D. candidate in the Department of Industrial & Operations Engineering at the University of Michigan. His research focuses on federated and distributed data analytics. Currently, he is developing federated data analytics methods that rethink how both prescriptive and predictive analytics are done within IoT-enabled systems, particularly in manufacturing. He has received several best paper awards from the Institute for Operations Research and the Management Sciences (INFORMS), the Institute of Industrial and Systems Engineers (IISE), and other renowned organizations.