SEAS teams win Best Paper, honorable mentions at international conference on Human-Computer Interaction

This week, a group of computer scientists from the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) flew to sunny Honolulu, Hawaiʻi for the Association for Computing Machinery (ACM) CHI Conference on Human Factors in Computing Systems, one of the most important international conferences on Human-Computer Interaction.

There they presented papers on a range of topics, including the use of chatbots in LGBTQ+ communities, algorithms for social services, AI for text summarization, and more. Among the hundreds of papers selected for the conference, 40 received awards for Best Paper, including research led by Priyan Vaithilingam, a PhD candidate in Assistant Professor Elena Glassman's Variation Lab.

The paper, primarily advised by Microsoft researchers Jeevana Priya Inala and Chenglong Wang during Vaithilingam's summer internship, describes a new program that allows users to create and edit data visualizations using natural language commands.

“A lot of people who aren't experts in data visualization are still required to create and edit visualizations for their work,” said Vaithilingam. “Current software tools such as Tableau and Microsoft’s Power BI and Charticulator have reduced the effort it takes to create visualizations, but those tools provide a one-size-fits-all solution to millions of users with different needs and workflows. Our goal was to provide personalized tools to users.”

The program, named DynaVis, uses large language models (LLMs), including GPT-4, to generate personalized user interfaces by automatically creating widgets on the fly based on the user’s needs. This AI-generated dynamic UI can be added to any existing software tool.

Imagine, for example, that you are creating a chart in Tableau to track sales over time and want to tweak the labels on the x-axis. Today, you would need to know exactly where to click in a complicated editing window to make the change.

With DynaVis, the user can either type a direct command, such as “rotate x-axis labels 45 degrees,” or simply ask for a widget to perform the edit, for example: “give me a slider to control x-axis label angles.”
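A minimal sketch of the idea behind that slider request, assuming a design in which the LLM returns a small widget specification that the host tool renders and binds to the chart's state. The JSON spec, field names, and `apply_widget_value` helper below are all hypothetical illustrations, not DynaVis's actual interface:

```python
# Hypothetical sketch: an LLM turns "give me a slider to control x-axis
# label angles" into a widget spec; the host tool wires the widget's value
# into the chart configuration. All names here are illustrative.

import json

# What an LLM might plausibly return for the slider request above.
llm_widget_spec = json.loads("""
{
  "widget": "slider",
  "label": "X-axis label angle",
  "min": 0,
  "max": 90,
  "step": 5,
  "binds_to": "encoding.x.axis.labelAngle"
}
""")

def apply_widget_value(chart_config: dict, spec: dict, value) -> dict:
    """Write the widget's current value into the chart config at the bound path."""
    node = chart_config
    *path, leaf = spec["binds_to"].split(".")
    for key in path:
        node = node.setdefault(key, {})  # create intermediate levels as needed
    node[leaf] = value
    return chart_config

chart = {"mark": "line", "encoding": {"x": {"field": "date"}}}
apply_widget_value(chart, llm_widget_spec, 45)
print(chart["encoding"]["x"]["axis"]["labelAngle"])  # 45
```

The key design point is that the generated widget is data, not code: the host application stays in control of rendering, and the LLM only proposes what control to show and which property it should edit.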

“Instead of a static user interface, users can now have dynamic and contextual UI for their tasks,” said Glassman. “This research could allow us to build software that is malleable and contextual truly from the ground up — re-defining user experience completely.”

Two other papers by members of Glassman’s team received honorable mentions at the conference. 

In “ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing,” the research team, which included Glassman, former Variation Lab postdoctoral fellow Ian Arawjo, PhD students Chelse Swoopes and Priyan Vaithilingam, and Martin Wattenberg, Gordon McKay Professor of Computer Science at SEAS, described a new program to support people testing large language models (LLMs).

“Evaluating the outputs of large language models is challenging, requiring making — and making sense of — many responses,” said Arawjo, who is currently an Assistant Professor at the University of Montréal. “But tools that go beyond basic prompting tend to require advanced knowledge that most users interested in using LLMs for general tasks don’t have.”

To address that gap, the researchers developed an open-source tool called ChainForge and deployed it to get feedback from real users. The team used that feedback — in addition to feedback from in-lab user studies — to iterate and refine the program.  

“Our goal is to support people in testing large language model behavior on the tasks that matter the most to them,” said Arawjo.

As ChainForge demonstrates, LLMs can generate so many responses that the output feels like a wall of text, yet those responses hold real value for system designers and even end users. In “Supporting Sensemaking of Large Language Model Outputs at Scale,” authors Glassman, postdoctoral fellow Katy Ilonka Gero, PhD students Chelse Swoopes and Ziwei Gu, and collaborator Jonathan Kummerfeld of the University of Sydney describe a new program that can present many LLM responses at once in a way that is easier for the human brain to parse.

“Large language models are capable of generating multiple different responses to a single prompt, yet little effort has been expended to help end-users or system designers make use of this capability,” said Swoopes. “For instance, users may want to select the best option from among many responses or compose their own response by combining a few different LLM responses or audit a model by looking at the variety of possible responses.”

The researchers began by interviewing designers, model characterizers, and model auditors and found that the methods currently available for comparing dozens of responses were slow and painstaking. 

To address that challenge, the researchers designed and implemented text analysis algorithms and rendering techniques to capture possible variations and consistencies across LLM responses. These algorithms highlight similar text across responses, aligning and graying out redundancies so users can easily compare them.
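The general technique of separating shared text from variation can be sketched with standard sequence matching. This is not the authors' implementation, just an illustration using Python's `difflib`; the `shared_and_unique` helper and the sample responses are invented for the example:

```python
# Illustrative sketch: split one LLM response into spans it shares with
# another response (candidates for graying out) and spans unique to it
# (the variation a reader should focus on). Not the paper's actual algorithm.

from difflib import SequenceMatcher

def shared_and_unique(a: str, b: str):
    """Return (shared, unique): spans of `a` that also appear in `b`, and the rest."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared, unique = [], []
    pos = 0
    for block in matcher.get_matching_blocks():
        if block.a > pos:
            unique.append(a[pos:block.a])   # text before the next match is unique to `a`
        if block.size:
            shared.append(a[block.a:block.a + block.size])
        pos = block.a + block.size
    return shared, unique

r1 = "The capital of France is Paris, a major European city."
r2 = "The capital of France is Paris, known for the Eiffel Tower."
shared, unique = shared_and_unique(r1, r2)
print("".join(shared))   # text both responses agree on
print("".join(unique))   # text specific to r1
```

A real interface would render the shared spans in gray across every response and align them vertically, so that scanning dozens of responses reduces to scanning their differences.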

“As LLMs are increasingly adopted, supporting end-users, system developers, and system examiners in making sense of LLM responses is becoming an increasingly important area of study,” said Swoopes.

Topics: AI / Machine Learning, Computer Science

Press Contact

Leah Burrows | 617-496-1351 |