Agent-based models of language change
In my PhD project (started in 2019), I use agent-based models to study the social, cognitive and language-specific factors that drive language change. I apply my models to case studies from the languages of the world. I am currently studying the relationship between morphological simplification and Papuan-Austronesian language contact in Eastern Indonesia. In addition to agent-based models, I draw upon techniques from deep learning and bioinformatics. My research is supported by an FWO PhD Fellowship for the project "Identifying drivers of language change using neural agent-based models". I was previously funded by the Flanders AI Programme.
In a master’s course project (2016-2017), I developed a more language-specific agent-based model of the historical change of the genitive in Germanic languages [pdf] [code], using the Icelandic saga corpus and modelling language contact between Scandinavian and Middle Low German.
Computational historical linguistics
In my MSc thesis (supervisors: Gerhard Jäger and Jelle Zuidema) and a subsequent journal article, I applied the machine learning paradigm to reconstruct language ancestry. I proposed the task of word prediction: a machine learning model trained on pairs of words in two languages learns the sound correspondences between those languages and should then be able to predict unseen words. I used two neural network models, an encoder-decoder and a structured perceptron. The output of word prediction can then be used for several tasks in historical linguistics, such as phylogenetic tree reconstruction, identification of sound correspondences and cognate detection.
- Journal of Language Modelling paper: Word prediction in computational historical linguistics
- MSc thesis
- Interactive notebook on GitHub, thesis source code on Bitbucket
- Blog post on ILLC CLC lab web page
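As a rough illustration of the word prediction task described above, here is a minimal Python sketch. It is not the encoder-decoder or structured perceptron from the thesis: it swaps them for a simple count-based mapping, assumes the words in each pair are already aligned and of equal length, and uses invented strings rather than real language data. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def learn_correspondences(pairs):
    """Map each source sound to the target sound it most often aligns with.

    Assumption: both words in a pair have equal length and are pre-aligned.
    """
    counts = defaultdict(Counter)
    for source, target in pairs:
        for s, t in zip(source, target):
            counts[s][t] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def predict(word, mapping):
    """Predict the target-language form; unseen sounds are copied unchanged."""
    return "".join(mapping.get(sound, sound) for sound in word)


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def prediction_error(predicted, actual):
    """Length-normalised edit distance: 0.0 is a perfect prediction."""
    return edit_distance(predicted, actual) / max(len(predicted), len(actual), 1)


# Invented toy data for two hypothetical languages (p -> f, t -> d).
train_pairs = [("pata", "fada"), ("tapa", "dafa"), ("pita", "fida")]
mapping = learn_correspondences(train_pairs)

predicted = predict("tipa", mapping)
print(predicted)                            # difa
print(prediction_error(predicted, "difa"))  # 0.0
print(prediction_error(predicted, "kolu"))  # 1.0
```

In this toy setting, a low prediction error for a word pair can be read as weak evidence that the two words are cognate, and the learned mapping is a crude list of sound correspondences. Averaging the error over many words would give a distance between two languages, which is the kind of signal a tree reconstruction method could build on.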
In my BSc thesis (supervisors: Alexis Dimitriadis and Martin Everaert) [pdf], I used Bayesian inference to create a kinship tree of Dutch dialects, using data from the Reeks Nederlandse Dialectatlassen. The words were aligned and converted to phonetic features so that they could be processed by a Bayesian inference algorithm.
Other projects: Linguistic data collection and processing
Crowdsourcing, asking lay people to perform a task, can be a powerful tool to collect data on language in use. At the Dutch Language Institute (INT), I developed the platform Taalradar (language radar) to ask speakers about their attitudes towards neologisms and to chart Dutch language variation.
Furthermore, I was involved in developing tools to process linguistic data. I developed deep neural models for linguistic enrichment (POS tagging and lemmatization) of historical text. I developed the web interface for DiaMaNT, a diachronic semantic lexicon of the Dutch language. I was involved in the software development of CLARIAH Chaining Search, a Python library and Jupyter notebook that facilitates combined search in lexica, corpora and treebanks, as well as the combination and processing of the results. I contributed to TREC OpenSearch, a platform developed at the University of Amsterdam, among other institutions, to evaluate academic search engine algorithms with real users.