Research

Contextualized Language Models

With the recent success of contextualized language models (LMs) in NLP, there has been a surge of interest in applying such models to the study of historical language change. Yet, while these models have been successfully applied to research on lexical semantic change, i.e., changes in word meaning over time, it remains unclear how they can be leveraged for historical linguistic research more broadly, e.g., for investigations into syntactic change, i.e., changes involving more functional language structures. My recent research therefore aims to shed light on the usability of LMs for historical linguistic research. For one, I have developed DiaSense (Beck 2020), a system that models word sense changes over time via pre-trained BERT embeddings. For another, together with my colleagues (Sevastjanova et al. 2021, Kalouli et al. 2022), I have investigated the limitations of LMs in terms of what they really learn, i.e., which kinds of linguistic information are captured by their contextualized embeddings. This in turn helps determine whether LMs such as BERT can be employed for investigations into syntactic change. More precisely, we investigated whether LMs are able to model the functional nature of function words adequately by examining how function words are contextualized, i.e., captured in the embeddings, and how (or whether) functionality is learned during training. To this end, we have also developed a visualization system, LMFingerprints (Sevastjanova et al. 2022; image below taken from the paper), to facilitate our investigations.

Image: LMFingerprints


Dative Subjects: Historical Change Visualized

My thesis 'Dative Subjects: Historical Change Visualized' investigates the historical development of dative subjects in Icelandic using corpus and visual analytic methods in order to shed light on the complex interrelation between case marking, word order, lexical semantics, and grammatical relations in the diachrony of the language. Moreover, I provide a theoretical analysis of my empirical findings by developing a novel linking theory based on Lexical Mapping Theory, the linking theory established within the Lexical-Functional Grammar framework.

Image: Linking scheme for the experiencer predicate finnast 'find, feel'


Evaluation Metrics for Visual Analytics in Linguistics

Currently, I work in the DFG-funded research project 'Evaluation Metrics for Visual Analytics in Linguistics', which is embedded in the collaborative research center SFB-TRR 161 'Quantitative Methods for Visual Computing'. The overall aim of this project is to evaluate whether visual analytics is a methodology that can yield improved results for linguistic research and to establish metrics for the evaluation of visual analytics within linguistics. To do so, we develop novel visualization techniques for the analysis of linguistic data and conduct linguistically motivated case studies using the visualizations.

Our most recent innovation is the HistoBankVis visualization system (Schätzle et al. 2017, Schätzle et al. 2019), which was developed for the investigation of diachronic interactions contained in multidimensional linguistic data. For more general updates on the research conducted within SFB-TRR 161, visit our Visual Computing Blog, where I am a regular author.

Image: HistoBankVis

Visual Analysis of Language Change and Use Patterns

I was part of a DFG-funded project on the visualization of language change and language use. The project aimed to combine new visualization methods from computer science with methods from historical and computational linguistics, with the goal of advancing the state of the art in linguistic data analysis on a qualitative as well as a quantitative level. The project focused on a diverse set of data and phenomena related to language change, language genealogy, and variation in language across time in order to gain insight into interrelations between complex data sets across time and across languages.

One of the visualizations developed within this project is the glyph visualization (Butt et al. 2014, Schätzle & Sacha 2016) shown below.
Image: Glyph visualization