For his senior thesis, Gabe Wu studied a new family of methods for estimating loss in neural networks
Fourth-year computer science concentrators at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) have the option to write senior theses. Often taken for course credit through “CS91R: Supervised Reading and Research,” their theses seek to contribute to the general understanding of some problems within computer science.
Deduction-Projection Estimators for Understanding Neural Networks
Gabe Wu, A.B. '25, Computer Science
Advisor: Sitan Chen
• Please give a brief summary of your project.
For my project, I analyzed a new family of methods that may be useful for analyzing neural networks, which I call "Deduction-Projection Estimators." Essentially, this is a family of methods for estimating the loss of a neural network that, unlike traditional methods, don't rely on random sampling. Instead, they involve producing a deductive account of why the neural net gets the loss that it does. Their deductive nature allows DPEs to deal with an AI model on the level of underlying reasons instead of observed behaviors, similar to how neuroscience attempts to explain phenomena in humans on a more fundamental level than psychology. This makes them useful for estimating certain quantities that traditional sampling-based methods fail at. For example, if an AI model performs a catastrophic action very rarely (say, on one in a billion inputs), then estimating this failure rate using a sampling-based method would require running the model at least a billion times. In contrast, a DPE can estimate this probability much more efficiently by extrapolating a distribution that approximates the model's internal reasoning.
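The arithmetic behind the one-in-a-billion example can be sketched in a few lines of Python. This is only an illustration of why sampling struggles with rare events, not a Deduction-Projection Estimator and not code from the thesis. It assumes each run of the model is an independent trial that fails with probability p, and the function names are hypothetical.

```python
import math


def prob_at_least_one_failure(p: float, n: int) -> float:
    """Chance that n independent runs include at least one failure.

    Computes 1 - (1 - p)**n using log1p/expm1 so tiny p stays accurate.
    """
    return -math.expm1(n * math.log1p(-p))


def samples_for_confidence(p: float, confidence: float) -> int:
    """Smallest n for which at least one failure appears with the given confidence."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))


if __name__ == "__main__":
    p = 1e-9  # assumed failure rate: one in a billion inputs
    print(prob_at_least_one_failure(p, 10**9))  # roughly 0.63
    print(samples_for_confidence(p, 0.95))      # roughly 3 billion runs
```

Under these assumptions, even a billion runs would surface a single failure only about 63 percent of the time, and pinning down the rate itself would take many more runs.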
• How did you come up with this idea for your final project?
Most of the ideas in my thesis come from the research I've done at the Alignment Research Center in Berkeley, Calif. ARC is interested in deductive estimators in general as a tool that can be used to prevent worst-case behaviors from AI systems.
• What real-world challenge does this project address?
Deduction-Projection Estimators (DPEs) may be used to solve problems in AI alignment – that is, how do we get AI systems to continue to act as their developers intend, even when they become much smarter than humans? We currently do not know how to solve the AI alignment problem, but we urgently need a solution given the recent pace of progress in AI capabilities.
• What was the timeline of your project?
Most of my research was done over the past summer when I was working at the Alignment Research Center. I spent the past two semesters writing up the research and running additional experiments.
• What part of the project proved the most challenging?
The ideation part was the most challenging – designing DPEs that beat out traditional baselines on tasks. Many of my original ideas failed.
• What did you learn, or what skills did you gain, through this project?
I learned a lot about how to do machine learning research: how to write ML training infrastructure, how to analyze experimental results, how to proceed when experiments fail, and how to write and submit a paper to a conference.
• What part of the project did you enjoy the most?
My favorite part was writing the entire project up at the end! I appreciate how the challenge of communicating a complex idea to a broader audience forced me to think through my logic more carefully and make my thoughts more precise.
Topics: Academics, AI / Machine Learning, Computer Science
Press Contact
Matt Goisman | mgoisman@g.harvard.edu