Nora Graves has always looked at language as a fascinating puzzle to piece together.
Growing up in Wayne, Pennsylvania, she developed a passion for linguistic problem-solving — first in English, then in German, inspired by her grandmother’s fluency in the language and by a babysitter’s move to Austria.
Graves also worked on coding projects with her father, a computer scientist. At Princeton, she found a way to combine the two. She graduated in May with a major in computer science and minors in linguistics and German. “Language is probably more computational than people realize,” she said.
Her interdisciplinary senior thesis compared two automated approaches to adding accent marks, or diacritics, to text — in this case in the Yorùbá language of Nigeria. She wondered which approach worked better: today’s large language models or statistical methods that have largely fallen out of use.
She was co-advised by Laura Kalin, an associate professor of linguistics and associate director of the Program in Linguistics, and Christiane Fellbaum, a lecturer with the rank of professor in linguistics who helped develop the groundbreaking WordNet, a critical step in the modern AI revolution.
From “undecided” to “interdisciplinary”
Graves arrived at Princeton undecided on a major, so like any puzzler, she started from the edges. She had studied German since middle school and continued at Princeton, including participating in the Princeton in Vienna program the summer before her sophomore year.
In addition to pursuing courses in computer science and linguistics, she translated work from German to English for a literary translation class with the novelist and translator Jenny McPhee, a lecturer in creative writing and the Lewis Center for the Arts.
“I really loved the puzzle aspect, learning more about languages and how they work,” Graves said.
For her junior independent work, she built a data visualizer to analyze sounds in children’s alphabet books, advised by Brian Kernighan, Princeton’s William O. Baker *39 Professor of Computer Science.
Her father, a Class of 1993 alum, had also been Kernighan’s student.
Statistical models vs. LLMs
Much digitized Yorùbá text lacks diacritics because typing them on an English keyboard can be inconvenient or even impossible. This makes the language harder for non-native speakers and machine learning systems to understand, contributing to a “technology gap,” Graves said.
A missing diacritic for a tonal pitch in Yorùbá, she said, can “change the word for a type of vegetable to a season of the year.”
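To make the ambiguity concrete, here is a minimal Python sketch of what happens when diacritics are dropped from text. It is an illustration only, not code from Graves’ thesis: the function name `strip_diacritics` is hypothetical, and the sample strings are simply forms that differ only in their tone marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (tone marks, underdots) from text.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


# The language name itself loses both of its tone marks.
print(strip_diacritics("Yorùbá"))

# Two forms that differ only in the final tone mark collapse to one string,
# so the original word can no longer be recovered from the text alone.
print(strip_diacritics("ọkọ́") == strip_diacritics("ọkọ̀"))
```

Diacritic restoration is the reverse problem: given only the stripped string, a model has to use context to decide which marked form was intended.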
In earlier studies, statistical methods for automatically adding diacritic marks to text had performed as well as LLMs, and sometimes better. But as AI and deep learning advanced, the statistical methods fell out of favor in the field of “diacritic restoration.”
In her thesis research, Graves ran 43,470 sentences of Yorùbá text through both a syllable-based statistical model and a sequence-to-sequence LLM, repeating the test several times. The statistical model proved to be more accurate, while the LLM had a higher error and hallucination rate.
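The article does not say how accuracy was scored. One common and simple choice, assumed here purely for illustration, is word-level accuracy: the share of words a model restores exactly as they appear in a correctly marked reference sentence. The function name `word_accuracy` and the sample strings below are hypothetical.

```python
import unicodedata


def word_accuracy(predicted: str, reference: str) -> float:
    """Fraction of words restored exactly, compared position by position.

    Assumed metric for illustration; the thesis may have used another one.
    """
    # Normalize so composed and decomposed spellings of a letter compare equal.
    pred_words = unicodedata.normalize("NFC", predicted).split()
    ref_words = unicodedata.normalize("NFC", reference).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    matches = sum(p == r for p, r in zip(pred_words, ref_words))
    # Dividing by the longer length penalizes added or dropped words,
    # a rough stand-in for hallucinated output.
    return matches / max(len(pred_words), len(ref_words))


reference = "ọkọ̀ ọkọ̀"
model_output = "ọkọ̀ ọkọ́"  # second word carries the wrong tone mark
print(word_accuracy(model_output, reference))  # 0.5
```

Averaging a score like this over every test sentence, once per model, would give the kind of head-to-head comparison described above. Positional matching is a simplification; a fuller evaluation would align words before scoring.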

Accurate diacritic restoration is essential, Fellbaum said, because “once you have an unambiguous way of representing the speech in writing,” applications including machine translation, audio transcription and question answering become possible. “It’s super important that we explore those data sets, and find and create more.”
Kalin said Graves was undeterred by the project’s challenges and taught herself the coding skills she needed to train the large language model on Yorùbá text. Graves would explain the computational dimensions of the work to Kalin at their weekly meetings.
Kalin called Graves’ project “a perfect example of how a Princeton student can have an idea and cobble together the support needed to bring this idea to life” — in this case co-advisers from two distinct disciplines. “Her interests are in computer science and linguistics, so she was clearly always making connections between the two.”
“I think it’s really wonderful how Princeton makes it possible for students to find their own path this way,” she said. “You don’t have to just do one thing; you can have a new idea, and Princeton will then help you scaffold that.”
Graves said she was grateful for Princeton’s support in her cross-disciplinary scholarship and for its strong programs in both linguistics and computer science. “This is the place where you can find the best of both worlds,” she said.
She plans to spend the summer refining her research, aiming to eliminate hallucinations in the large language model and see whether it might then prevail over the statistical method. She will start work as a project manager in September and hopes to one day pursue a master’s program in computational linguistics.




