IWCLUL 2019
The Fifth International Workshop on Computational Linguistics for Uralic Languages is organised by ACL SIGUR and the University of Tartu on January 7-8, 2019, in Tartu, Estonia.
The final proceedings version will be available in the ACL SIGUR section of the ACL Anthology.
Program
7.1.2019
8:30 Registration / Check-in / Coffee (self-paid)
9:00 Welcome opening speech
9:15 Invited speech: Mans Hulden. Linguistics with Black Boxes
10:15 Miikka Silfverberg and Francis Tyers. Data-Driven Morphological Analysis of Nominal Morphology for Uralic Languages
10:45 Stig-Arne Grönroos, Sami Virpioja and Mikko Kurimo. North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
11:15 Poster boasters
11:45 Lunch (self-paid)
13:00 Linda Wiechetek, Sjur Nørstebø Moshagen and Thomas Omma. Is this the end? Two-step tokenization of sentence boundaries
13:30 Johannes Huber and Myra Spiliopoulou. Learning multilingual topics through aspect extraction from monolingual texts
14:00 Valts Ernštreits. Electronical resources for Livonian
14:30 Kristian Kankainen. The use of Extract Morphology for Automatic Generation of Language Technology for Votic
15:00 Posters and demos / Coffee (self-paid)
Noémi Ligeti-Nagy, Andrea Dömötör and Noémi Vadász. What does the Nom say? An algorithm for case disambiguation in Hungarian
Frankie Robertson. A Contrastive Evaluation of Word Sense Disambiguation Systems for Finnish
Kadri Muischnek and Liisi Torga. Elliptical Constructions in Estonian UD Treebank
Kimmo Kettunen. FiST – towards a free Semantic Tagger of modern standard Finnish
Niko Partanen and Michael Rießler. An OCR system for the Unified Northern Alphabet
Joshua Wilbur. ELAN as a search engine for Freiburg-style tagged corpora
Tommi A Pirinen. Neural and rule-based Finnish NLP models - expectations, experiments and experiences
Timofey Arkhangelskiy, Anne Ferger and Hanna Hedeland. Uralic multimedia corpora: ISO/TEI corpus data in the project INEL
Timofey Arkhangelskiy. Corpora of social media in minority Uralic languages
19:00 Social dinner (self-paid)
8.1.2019
10:00 ACL SIGUR "business" meeting
11:00 Tutorials and hands-on sessions
Tutorials:
Common voice + DeepSpeech
Invited speaker
Mans Hulden, University of Colorado in Boulder
Abstract:
I will discuss specific experiments with neural models and sound embeddings that reveal new information about the organization of sound systems in human languages, give us insight into the limits of complexity of word-formation, give us models of why and when irregular forms (surely an inefficiency in a communication system) can persist over long periods of time, and reveal what the boundaries of pattern learning are (how much information do we minimally need to learn a grammatical aspect of language such as its word inflection or sentence formation).
Submission
Via EasyChair: https://easychair.org/conferences/?conf=iwclul2019
Important dates
- 29th June 2018: Call for papers announced
- 1st October 2018: 2nd call for papers
- 21st November 2018 (extended from 12th November): Paper submission deadline
- 6th December 2018: Paper notification
- 21st December 2018: Camera-ready deadline
- 7th–8th January 2019: Workshop held in Tartu
Call for papers
The purpose of the conference series International Workshop on Computational Linguistics for Uralic Languages is to bring together researchers who work on computational approaches to these languages. We accept long and short papers as well as tutorial proposals on the following languages: Finnish, Hungarian, Estonian, Võro, the Sámi languages, Komi (Zyrian, Permyak), Mordvin (Erzya, Moksha), Mari (Hill, Meadow), Udmurt, Nenets (Tundra, Forest), Enets, Nganasan, Selkup, Mansi, Khanty, Veps, Karelian (Olonets), Karelian, Ingrian (Izhorian), Votic, Livonian, Ludic, and other related languages.
All Uralic languages exhibit rich morphological structure, which makes processing them challenging for state-of-the-art computational linguistic approaches. The majority also suffer from a lack of resources, and many are endangered.
Research papers should present original, substantial and unpublished research, and may describe work-in-progress systems, frameworks, standards and evaluation schemes. Demos and tutorials will present systems and standards towards the goal of interoperability and unification of different projects, applications and research groups. Appropriate topics include (but are not limited to):
- Parsers, analysers and processing pipelines of Uralic languages
- Lexical databases, electronic dictionaries
- Finished end-user applications aimed at Uralic languages, such as spelling or grammar checkers, machine translation or speech processing
- Evaluation methods and gold standards, tagged corpora, treebanks
- Reports on language-independent or unsupervised methods as applied to Uralic languages
- Surveys and review articles on subjects related to computational linguistics for one or more Uralic languages
- Any work that aims at combining efforts and reducing duplication of work
- How to elicit activity from the language community, agitation campaigns, games with a purpose
To maximise the possibility of reproducibility, replication and reuse, we particularly encourage submissions which present free/open-source language resources and make use of free/open-source software. One of the aims of this gathering is to avoid unnecessary duplicated work in the field of Uralistics by establishing connections and interoperability standards between researchers and research groups working at different sites. We have also identified a serious lack of gold standards and evaluation metrics for all Uralic languages, including those with national support; any work towards better resources in these fields will be greatly appreciated.
In this year’s edition, we encourage people to present comparative evaluations of different NLP methods as applied to Uralic languages. With all the buzz around neural and deep-learning methods: are they applicable to Uralic languages, which in general have very little training data (even monolingual data) and also richer morphology than the more widely treated Indo-European languages?
Submission of papers
Language of submission: Submissions should be made in English or Russian, with an obligatory abstract in at least one Uralic language.
Double submission: To maximise the impact of work in the field of computational linguistics for the Uralic languages, we are open to the possibility of double submission, or submission of work which has been partially published elsewhere. Any double submission should, however, be reported to the programme committee at the time of submission. In the event of double acceptance, the authors should choose in which venue to publish.
Publication venue: Proceedings of the workshop will be published open-access in the ACL Anthology, SIG proceedings for SIGUR.
Conflicts of interest: The reviewing process will be anonymous (double-blind peer review).
Submission Guidelines: Submit via EasyChair. The LaTeX templates are here: https://github.com/acl-sigur/iwclul-latex/releases/tag/iwclul-2019. You may also submit a PDF generated from a Word document or another LaTeX template, but if the paper is accepted you will need to format the camera-ready version according to the guidelines. There is no hard page limit, but for the benefit of reviewers please keep papers to approximately 5-20 pages, depending on the page layout.
Organisers
Programme committee
- Tommi Pirinen, University of Hamburg
- Francis Tyers, Indiana University and Higher School of Economics
- Eszter Simon, Research Institute for Linguistics, Hungarian Academy of Sciences
- Anna Volkova, School of Linguistics, National Research University, Higher School of Economics, Moscow
- Heiki-Jaan Kaalep, University of Tartu
- Lene Antonsen, University of Tromsø
- Trond Trosterud, University of Tromsø
- Thierry Poibeau, LaTTiCe-CNRS
- Veronika Vincze, Hungarian Academy of Sciences, Research Group on Artificial Intelligence
- Kadri Muischnek, University of Tartu
- Csilla Horvath, Research Institute for Linguistics, Hungarian Academy of Sciences
- Filip Ginter, University of Turku
- Mark Fišel, University of Tartu
- Kaili Müürisep, University of Tartu
- Michael Rießler, Albert-Ludwigs-Universität Freiburg
- Jeremy Bradley, Ludwig Maximilian University of Munich
Travel
Participants from outside the Schengen area may require a visa to visit Estonia. If you require an invitation letter confirming your participation, please get in contact with the local organisers.
Local organisers
Anneli Vainumäe, Heiki-Jaan Kaalep (firstname dot lastname att ut dot ee)
Contact
Organisers can be reached via the Google group: iwclul@googlegroups.com. Local organisers should be contacted directly.