Session O-2P

Innovative and Interdisciplinary Uses of Data and Machine Learning

1:30 PM to 3:10 PM | CSE 305 | Moderated by Chetana Acharya


FairRL-FL: Reinforcement Learning for Fairer Federated Models
Presenter
  • Jack McFarland, Senior, Computer Science & Software Engineering, Mary Gates Scholar
Mentors
  • Afra Mashhadi, Computing & Software Systems (Bothell Campus), UWB
  • Ekin Ugurel, Civil and Environmental Engineering
Session
  • CSE 305
  • 1:30 PM to 3:10 PM


Bias in Machine Learning (ML) can lead to unfair treatment of certain groups, particularly in areas like healthcare and finance, where disparate outcomes can have life-altering consequences. New training techniques aim to improve fairness while preserving privacy. Federated Learning (FL) is one such approach, allowing models to be trained on data from many devices without centralizing it. Instead of sharing raw data, each device trains a local model and sends model updates (adjustments based on its local data) to a central server, which aggregates them into a global model. This protects privacy while enabling large-scale training, but differences in data quality, representation, or access across devices can reinforce bias, leading to models that work well for some groups but poorly for others. This project tests whether a debiasing system can effectively mitigate bias in FL without sacrificing model performance. To tackle this, I'm adapting a Reinforcement Learning (RL) system, where an agent learns by interacting with an environment and receiving rewards for beneficial actions. The agent evaluates fairness using feedback from client devices and adjusts the central model’s weights before redistributing it for further training. Using fairness metrics and accuracy as its reward signal, the agent continuously refines its strategy, learning how to mitigate bias while preserving performance. I'm solely responsible for designing, building, testing, and analyzing this system, though I've benefited greatly from the guidance of my mentor, Dr. Afra Mashhadi, insights from her graduate students, and tools developed in prior research. Results from prior work suggest this method can reduce bias while maintaining strong model accuracy, highlighting its potential for improving fairness in FL systems. 
If successful, this approach could be applied in areas like medical diagnostics, risk assessment in insurance, and hiring algorithms, where biased models can lead to significant real-world harm.
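The abstract does not give implementation details, but the core idea, federated averaging whose aggregation weights are adjusted using per-client fairness feedback, can be sketched in toy Python. The `fairness_adjusted_mix` heuristic below is an assumption that stands in for the learned RL policy; all function names and signals are illustrative, not taken from the project.

```python
import numpy as np

def fedavg(client_weights, mix):
    """Aggregate client model weights as a convex combination."""
    mix = np.asarray(mix, dtype=float)
    mix = mix / mix.sum()  # normalize to a convex combination
    return sum(m * w for m, w in zip(mix, client_weights))

def fairness_adjusted_mix(accuracies, fairness_gaps, alpha=1.0):
    """Upweight clients the global model serves poorly.

    `accuracies` and `fairness_gaps` are hypothetical per-client feedback
    signals; a real RL agent would learn this mapping from reward instead.
    """
    score = np.asarray(accuracies) - alpha * np.asarray(fairness_gaps)
    # Lower score -> underserved client -> larger aggregation weight.
    raw = np.exp(-score)
    return raw / raw.sum()

# Toy example: three clients with one-dimensional "model weights".
clients = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
mix = fairness_adjusted_mix([0.9, 0.8, 0.5], [0.05, 0.10, 0.30])
global_w = fedavg(clients, mix)
```

Here the third client, with the worst accuracy and largest fairness gap, receives the largest aggregation weight, pulling the global model toward it.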


Through Science Comes Art!
Presenter
  • G Alvarado, Senior, Computer Science, Pacific Lutheran University
Mentor
  • Renzhi Cao, Computer Science & Engineering
Session
  • CSE 305
  • 1:30 PM to 3:10 PM


Welcome all young and old to the future of movie magic! 2D animation remains a powerful storytelling medium, yet its resource-intensive nature has made it increasingly rare in today’s industry. What if we could change that? What if artificial intelligence (AI) could work with, rather than against, artists, making 2D animation more accessible? Could a small studio implement this and revive this beloved genre? Join international award-winning filmmaker G Alvarado as we explore cutting-edge image generation and video interpolation AI models, along with an enhanced 2D animation pipeline that preserves artistic integrity using custom-trained models. Early findings suggest that this approach can significantly reduce production time, transforming what once took years into mere months. Come all far and near to see our research results in action and peek behind the curtain. For once you do, you will find that through science comes art, and through innovation, a new era of storytelling begins!


From Silos to Solutions: Secure Synthetic Data Generation on the NAIRR
Presenter
  • Shane R (Shane) Menzies, Senior, Computer Science and Systems
Mentors
  • Martine De Cock, School of Engineering and Technology (Tacoma Campus), UW Tacoma
  • Sikha Pentyala, School of Engineering and Technology (Tacoma Campus), UW Tacoma
Session
  • CSE 305
  • 1:30 PM to 3:10 PM


Data is the fuel driving AI innovation. Much of the most valuable data is, however, siloed in research centers, hospitals, banks, etc. The onerous processes researchers must go through to access each silo cause a substantial underutilization of AI in many of the most important domains, including healthcare and genomics. AI researchers cannot train models for personalized medicine if they cannot get their hands on enough relevant patient data. One way to provide broader access for research while also retaining the privacy of the original data is with synthetic data generation (SDG), which uses machine learning to generate a set of synthetic data similar enough to the real data to retain its value for research while also anonymizing it. While in some cases a single data custodian (such as a hospital) alone may have enough data to train a generative model, usually, datasets from multiple custodians need to be combined to reach a cumulative size that enables meaningful AI research. The latter is, for example, often the case for rare diseases, with each clinical site having data for only a small number of patients, which is insufficient to train high-quality synthetic data generators. The goal of my research is to generate synthetic genomics data of patients with Neurofibromatosis type 1, a rare genetic condition that causes changes in skin pigment and tumors on nerve tissue. Thanks to our Privacy-Preserving Machine Learning Lab’s inclusion in the National Artificial Intelligence Research Resource (NAIRR) Pilot and our collaboration with Sage Bionetworks, I have access to the TACC Frontera supercomputer at the University of Texas and multiple sets of NF1 patient data. Results of my work on the NAIRR include an empirical evaluation of cross-silo federated SDG algorithms in terms of quality of the generated NF1 data, computational cost, and level of privacy protection.
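As a minimal illustration of the SDG idea described above (fit a generative model to real records, then sample synthetic ones that preserve the distribution without exposing any individual), the toy Python below uses a multivariate Gaussian as a stand-in generator. The actual project uses far richer cross-silo federated generators with formal privacy protections; everything here is an assumption for illustration only.

```python
import numpy as np

def fit_gaussian_generator(real_data):
    """Fit a multivariate Gaussian to the real records (a simple stand-in
    for the richer generative models used in practice)."""
    mean = real_data.mean(axis=0)
    cov = np.cov(real_data, rowvar=False)
    return mean, cov

def sample_synthetic(mean, cov, n, rng=None):
    """Draw synthetic records that match the fitted distribution,
    not any individual real record."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return rng.multivariate_normal(mean, cov, size=n)

# Toy "silo" of real records: 200 patients x 3 features.
rng = np.random.default_rng(42)
real = rng.normal(loc=[0.0, 5.0, -2.0], scale=[1.0, 2.0, 0.5], size=(200, 3))
mean, cov = fit_gaussian_generator(real)
synthetic = sample_synthetic(mean, cov, n=500)
```

In a cross-silo setting like the one the abstract describes, the model would be fit jointly across custodians without pooling raw records, which is precisely what federated SDG algorithms arrange.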


Robust Robotic Behavior Cloning via Learned Dynamics, Neighbor Combination, and Action Chunks with Confidence
Presenter
  • Quinn Pfeifer, Senior, Computer Science
Mentor
  • Siddhartha Srinivasa, Computer Science & Engineering
Session
  • CSE 305
  • 1:30 PM to 3:10 PM


The most compelling challenges in robotic behavior cloning arise when agents must perform precise, complex tasks - especially those that challenge even the best human expert demonstrators. The key question is as follows: how can we best utilize human-collected demonstration data in such domains? There are many ways to tackle the issue of robust, data-efficient robotic behavior cloning; we explore three: leveraging learned system dynamics to generate synthetic corrective data under an assumed Lipschitz continuity, exploiting local structure by utilizing a cloud of distance-aware neighboring data points and their predicted actions, and ensembling past predicted action trajectories conditioned on their confidence to produce outlier-robust actions and even predict when an agent needs guidance and correction from a human expert. The first of the three has already been published as a series of works under the acronym CCIL and has shown large improvements in both simulated imitation tasks and real-world robotic fine manipulation, with particular promise in low-data regimes. The latter two are ongoing research projects; the first, utilizing local neighborhood information, has shown promising results on simulated tasks, and work to transfer this algorithm to the real world is currently under development. The final of the three has shown promise on real-world robotic tasks as an out-of-distribution detector and confidence measurement tool, and research is underway to apply this information toward robustness and corrective data collection. The projects and their findings all contribute toward the common goal of optimizing data usage in a robotic behavior-cloning paradigm, opening the door for robots to complete increasingly complex and data-scarce tasks performed by humans.
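The neighbor-combination idea can be sketched in miniature: predict an action for a query state by distance-weighting the actions of its nearest demonstration states. The toy Python below is only an illustration of the concept; the inverse-distance weighting and all names are assumptions, not the project's actual algorithm.

```python
import numpy as np

def neighbor_action(query, demo_states, demo_actions, k=3, eps=1e-8):
    """Combine the actions of the k nearest demonstration states,
    weighted by inverse distance (a simple stand-in for the
    distance-aware neighbor combination described above)."""
    dists = np.linalg.norm(demo_states - query, axis=1)
    idx = np.argsort(dists)[:k]        # k nearest demonstration states
    w = 1.0 / (dists[idx] + eps)       # closer neighbors weigh more
    w /= w.sum()
    return w @ demo_actions[idx]

# Toy 1-D task: states on a line, scripted expert acts as action = 2 * state.
states = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
actions = 2.0 * states
pred = neighbor_action(np.array([0.52]), states, actions, k=3)
```

For the query state 0.52 the prediction lands near the expert's 1.04, interpolated from the three nearest demonstrations; the same distances could also serve as a crude confidence signal.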


Transcribing in Context: Evaluating Biases in English Phoneme Transcription
Presenters
  • Aruna Srivastava, Senior, Computer Science
  • Alexander Le (Alex) Metzger, Senior, Mathematics, Computer Science
  • Ruslan Mukhamedvaleev, Junior, Computer Science, University of Washington
Mentors
  • Jian Zhu, Linguistics, University of British Columbia
  • S. M. Farhan Samir, Computer Science & Engineering
Session
  • CSE 305
  • 1:30 PM to 3:10 PM


Speech technology is often evaluated under idealized conditions that privilege certain speaker profiles: native English speakers in optimal acoustic environments. This approach overlooks the reality that English, as a global lingua franca, is spoken by billions of non-native speakers. Similarly, speakers with speech disorders face potential exclusion. Accurate phonemic transcription is crucial both for analyzing speech patterns in post-stroke aphasia and for Computer-Assisted Pronunciation Training (CAPT). We evaluate automatic phonemic transcription under realistic conditions, including varied noise levels, L2 accents, and speech variations. We find that standard models perform suboptimally under realistic conditions, and that applying vocabulary refinement and data augmentation improves error rates by 12-28 percentage points. To demonstrate the viability of our phonemic transcription models, we develop Machine Aided Pronunciation Learning via Entertainment (MAPLE). MAPLE maintains real-time performance on consumer devices, demonstrating the practical applicability of robust socioculturally-aware phonemic transcription in educational environments.
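Error rates in phonemic transcription are conventionally computed as a phoneme error rate: the edit distance between reference and hypothesized phoneme sequences, normalized by the reference length. The Python below is a minimal standard-practice implementation, not code from the project; the example phoneme symbols are ARPAbet-style assumptions.

```python
def phoneme_error_rate(ref, hyp):
    """Levenshtein edit distance between phoneme sequences,
    normalized by reference length."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                               # all deletions
    for j in range(n + 1):
        d[0][j] = j                               # all insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n] / max(m, 1)

# "cat" /k ae t/ transcribed as /k ae d/: one substitution in three phonemes.
per = phoneme_error_rate(["k", "ae", "t"], ["k", "ae", "d"])
```

A 12-28 percentage-point improvement, as reported above, means this quantity drops by 0.12-0.28 in absolute terms.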


Enhancing Particle Behavior Analysis through Deep Learning in Biological Multiple Particle Tracking
Presenter
  • Ali Toghani, Senior, Computer Science, UW Honors Program
Mentor
  • Elizabeth Nance, Chemical Engineering
Session
  • CSE 305
  • 1:30 PM to 3:10 PM


Multiple Particle Tracking (MPT) is a powerful technique for studying microscopic particles, such as viruses and nanoparticles, by tracking individual displacement and movement. One application of MPT is measuring microstructural changes in the brain extracellular environment (ECM) during development, aging, and disease progression. MPT of nanoparticle probes generates thousands of trajectories, from which geometric features, diffusion coefficients, and viscosities can be extracted. This vast array of trajectories presents an opportunity for deep learning models to uncover meaningful insights. However, before deep learning models can be trained on MPT data and make predictions from it, the data must be curated into a form those models can use. To enable this, I have created a database and developed a data architecture that makes MPT data usable within deep learning models. Building upon this foundation, I am currently working on a self-supervised deep learning model that combines an equivariant graph neural network, an equivariant transformer, and Explainable AI methods. The current iteration of this model can predict a masked point of a trajectory with a 34% error rate. The goal is to reduce this error to 10% and, more importantly, to differentiate between healthy and pathological trajectories. To achieve this, we will use Saliency Maps, an Explainable AI method, to understand how the model distinguishes between these two datasets. This approach will provide insights into which parts of the trajectory the model finds most relevant. My hypothesis is that the model can effectively learn to distinguish between healthy and pathological trajectories based on their properties with an error rate of 10%. I will verify my model by modifying the trained model’s output layer to explicitly classify trajectories as healthy or pathological. By fine-tuning this model, we will evaluate performance using error metrics, which I will further validate using Saliency Map visualizations.
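The masked-point evaluation described above can be illustrated with a toy baseline: hide one point of a trajectory, predict it from its neighbors, and report a relative error. The midpoint predictor and the span-normalized metric below are assumptions, standing in for the actual self-supervised model and its unspecified error metric.

```python
import numpy as np

def masked_point_error(traj, mask_idx):
    """Hide one trajectory point, predict it from its two neighbors
    (a midpoint baseline in place of the learned model), and report
    the prediction error relative to the trajectory's overall span."""
    pred = 0.5 * (traj[mask_idx - 1] + traj[mask_idx + 1])
    true = traj[mask_idx]
    span = np.linalg.norm(traj[-1] - traj[0]) + 1e-12  # normalizing scale
    return np.linalg.norm(pred - true) / span

# Toy 2-D particle trajectory: cumulative sum of small noisy steps.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(0.1, 0.02, size=(20, 2)), axis=0)
err = masked_point_error(traj, mask_idx=10)
```

A learned model would replace the midpoint predictor, and averaging this error over many trajectories and mask positions yields an aggregate figure comparable in spirit to the 34% quoted above.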


The University of Washington is committed to providing access and accommodation in its services, programs, and activities. To make a request connected to a disability or health condition contact the Office of Undergraduate Research at undergradresearch@uw.edu or the Disability Services Office at least ten days in advance.