**Virtual Master Class in Quantitative Linguistics**

**University of Tartu, Estonia**

**May 4 - May 7, 2020**

The goal of this course is to familiarize students with a range of regression techniques for the analysis of a single response variable (e.g., reaction time, pupil dilation, pitch, or accuracy) that is modeled as a function of one or more predictors. Modeling techniques will be introduced conceptually, with emphasis on worked examples of their application. Basic knowledge of regression (the lm function in R) is presupposed.

The first session addresses collinearity in multiple regression: how should data with strongly correlated predictors be analysed? The second session provides worked examples of how to deal with collinearity in data analysis. The third session introduces the generalized additive model (GAM), which relaxes the assumption that the functional relation between the response and one or more predictors is linear. It is ideal for modeling wiggly curves and wiggly (hyper)surfaces.
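To give a sense of why strongly correlated predictors are a problem, the following minimal sketch computes the variance inflation factor (VIF) for the two-predictor case, where VIF = 1 / (1 - r²) and r is the correlation between the predictors. It is written in plain Python purely for illustration (the course itself works in R), and the simulated data, noise levels, and function names are assumptions of this sketch, not course material.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


random.seed(1)  # fixed seed so the simulated (hypothetical) data are reproducible
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
# A near-copy of x1: strongly collinear with it.
x2_collinear = [a + random.gauss(0, 0.1) for a in x1]
# An unrelated predictor for comparison.
x2_independent = [random.gauss(0, 1) for _ in range(n)]

vif_collinear = vif_two_predictors(x1, x2_collinear)
vif_independent = vif_two_predictors(x1, x2_independent)
print(f"VIF, collinear pair:   {vif_collinear:.1f}")
print(f"VIF, independent pair: {vif_independent:.3f}")
```

A large VIF means that the sampling variance of a coefficient is inflated by that factor relative to the uncorrelated case, which is one reason why coefficient estimates for strongly correlated predictors are unstable and hard to interpret.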

The fourth session illustrates how GAMs can be used by working through the analysis of a dataset with auditory lexical decision latencies. The fifth session discusses model criticism and introduces tools for dealing with model residuals that are not identically and independently distributed. This is followed by a session in which GAMs are used to analyse tone contours in Mandarin Chinese, and a session illustrating how GAMs can be used in dialectometry. The example that will be worked out concerns geographical variation in deretroflexion in Taiwan Mandarin.

The final session in the statistics series introduces quantile regression with GAMs (QGAMs). QGAMs make it possible to establish whether the effects of predictors differ across the distribution of the response variable. Thus, quantile regression can tease apart which factors dominate for, e.g., short acoustic durations or short reaction times, and which are specifically influential for long durations or long reaction times. QGAMs are also very useful for datasets where GAMs cannot be used because the residuals resist correction to normality.
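The idea underlying quantile regression can be sketched with the pinball (check) loss: the constant that minimises this loss for a given τ is the τ-th sample quantile. The plain-Python sketch below fits only a constant to simulated, right-skewed data; a QGAM replaces the constant with smooth functions of predictors. The data and function names are assumptions made for illustration.

```python
import random


def pinball_loss(y, q_hat, tau):
    """Mean pinball loss of the constant prediction q_hat for quantile tau."""
    total = 0.0
    for value in y:
        diff = value - q_hat
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y)


def best_constant(y, tau):
    """Observed value that minimises the pinball loss (brute-force search)."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, c, tau))


random.seed(2)  # fixed seed; the skewed "durations" are simulated, not real data
y = [random.expovariate(1.0) for _ in range(201)]

q10 = best_constant(y, 0.1)
q50 = best_constant(y, 0.5)
q90 = best_constant(y, 0.9)
print(f"tau=0.1: {q10:.3f}  tau=0.5: {q50:.3f}  tau=0.9: {q90:.3f}")
```

Because each value of τ has its own loss function, a predictor can receive a different estimated effect at the short end of the distribution than at the long end, which is what makes the comparison across quantiles informative.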

**References**

(preprints available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications.html)

**Collinearity**

Tomaschek, F., Hendrix, P., and Baayen, R. H. (2018). Strategies for addressing collinearity in multivariate linguistic data. *Journal of Phonetics*, 71, 249-267.

**Generalized Additive Model**

Baayen, R. H., and Divjak, D. (2017). Ordinal GAMMs: A New Window on Human Ratings. In Makarova, A., Dickey, S. M., and Divjak, D. (Eds.) *Each Venture a New Beginning. Studies in Honor of Laura A. Janda*. Bloomington, Slavica, 39-56.

Baayen, R. H., and Linke, M. (in press). An introduction to the generalized additive model. In Gries, S. Th. and M. Paquot (Eds.) *A practical handbook of corpus linguistics*. Springer, Berlin.

Baayen, R. H., Vasishth, S., Kliegl, R., and Bates, D. (2017). The cave of shadows. Addressing the human factor with generalized additive mixed models. *Journal of Memory and Language*, 206-234.

Chuang, Y.-Y., Fon, J., and Baayen, R. H. (2020). Analyzing phonetic data with generalized additive mixed models. PsyArXiv, February 28, 1-27.

Naive Discriminative Learning (NDL) and Linear Discriminative Learning (LDL) are computational implementations of central ideas of discriminative linguistics, a theory of language that is under development at the quantitative linguistics lab of the University of Tübingen. Rather than grounding language in a compositional calculus defined over phonemes and morphemes, the theory takes discrimination, not composition, to be fundamental to language and language processing. Discrimination is achieved through error-driven learning, with constant recalibration as experience accumulates over the lifetime. Mathematically, the core of the theory is equivalent to multivariate multiple linear regression.
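The link between error-driven learning and linear regression can be illustrated with a delta-rule (Widrow-Hoff style) update, one standard form of error-driven learning: after each learning event, the weights from the active cues are adjusted in proportion to the prediction error. The plain-Python sketch below uses a made-up toy set of three cues and two outcomes; it is an illustration under these assumptions and is not claimed to reproduce the internals of the ndl or WpmWithLdl packages.

```python
def predict(cues, weights):
    """Predicted outcome vector: the cue vector times the weight matrix."""
    n_outcomes = len(weights[0])
    return [
        sum(c * weights[i][j] for i, c in enumerate(cues))
        for j in range(n_outcomes)
    ]


def delta_rule_update(cues, outcomes, weights, rate):
    """One error-driven update: move weights of active cues towards the target."""
    predicted = predict(cues, weights)
    for i, c in enumerate(cues):
        if c == 0:
            continue  # absent cues are not updated
        for j, target in enumerate(outcomes):
            weights[i][j] += rate * c * (target - predicted[j])


# Hypothetical learning events: (cue vector, outcome vector).
# The third cue occurs in both events and so cannot discriminate between outcomes.
events = [
    ([1, 0, 1], [1, 0]),
    ([0, 1, 1], [0, 1]),
]

weights = [[0.0, 0.0] for _ in range(3)]  # 3 cues x 2 outcomes, starting at zero
for _ in range(200):  # repeated exposure to the same events
    for cues, outcomes in events:
        delta_rule_update(cues, outcomes, weights, rate=0.1)

predictions = [predict(cues, weights) for cues, _ in events]
print("weights:", [[round(w, 3) for w in row] for row in weights])
print("predictions:", [[round(p, 3) for p in row] for row in predictions])
```

With repeated exposure, the predictions approach the targets, and the weights settle on a solution of the corresponding linear system of cues and outcomes. The shared cue ends up with equal weights to both outcomes, so the discrimination is carried by the two distinctive cues.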

The first session of this part of the course introduces the basic concepts of discriminative learning, and illustrates how the ndl and WpmWithLdl packages can be used for building computational models. The second session illustrates how the WpmWithLdl package can be used to understand the semantics of auditory nonwords (starting with the audio files of these nonwords as input). The third session of the course will illustrate, step by step, how one can model the case inflection of Estonian nouns without needing to define morphemes, stems, and inflectional classes.

**References**

(preprints available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications.html)

Baayen, R. H., Milin, P., Filipovic Durdevic, D., Hendrix, P., and Marelli, M. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. *Psychological Review*, 118, 438-482.

Baayen, R. H., Chuang, Y.-Y., and Blevins, J. P. (2018). Inflectional morphology with linear mappings. *The Mental Lexicon*, 13 (2), 232-270.

Baayen, R. H., Chuang, Y.-Y., Shafaei-Bajestan, E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. *Complexity*, 2019, 1-39.

Baayen, R. H., and Smolka, E. (2020). Modeling morphological priming in German with naive discriminative learning. *Frontiers in Communication*, section Language Sciences, 1-40.

Chuang, Y.-Y., Lõo, K., Blevins, J. P., and Baayen, R. H. (in press). Estonian case inflection made simple. A case study in Word and Paradigm morphology with Linear Discriminative Learning. In Körtvélyessy, L., and Štekauer, P. (Eds.) *Complex Words: Advances in Morphology*, 1-19.

Chuang, Y.-Y., Vollmer, M.-L., Shafaei-Bajestan, E., Gahl, S., Hendrix, P., and Baayen, R. H. (in press). The processing of pseudoword form and meaning in production and comprehension: A computational modeling approach using Linear Discriminative Learning. *Behavior Research Methods*.

Tomaschek, F., Plag, I., Ernestus, M., and Baayen, R. H. (2019). Modeling the duration of word-final s in English with Naive Discriminative Learning. *Journal of Linguistics*, 1-38.