IWCLUL 2019

The Fifth International Workshop on Computational Linguistics for Uralic Languages is organised by ACL SIGUR and the University of Tartu on January 7-8, 2019, in Tartu, Estonia.

The final version of the proceedings will be available in the ACL SIGUR section of the ACL Anthology.

Program

    7.1.2019
    
     8:30 Registration / Check-in / Coffee (self-paid)
     9:00 Welcome opening speech
     9:15 Invited speech: Mans Hulden. Linguistics with Black Boxes
    10:15 Miikka Silfverberg and Francis Tyers. Data-Driven Morphological Analysis of Nominal Morphology for Uralic Languages
    10:45 Stig-Arne Grönroos, Sami Virpioja and Mikko Kurimo. North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
    11:15 Poster boasters
    11:45 Lunch (self-paid)
    13:00 Linda Wiechetek, Sjur Nørstebø Moshagen and Thomas Omma. Is this the end? Two-step tokenization of sentence boundaries
    13:30 Johannes Huber and Myra Spiliopoulou. Learning multilingual topics through aspect extraction from monolingual texts
    14:00 Valts Ernštreits. Electronical resources for Livonian
    14:30 Kristian Kankainen. The use of Extract Morphology for Automatic Generation of Language Technology for Votic
    15:00 Posters and demos / Coffee (self-paid)
              Noémi Ligeti-Nagy, Andrea Dömötör and Noémi Vadász. What does the Nom say? An algorithm for case disambiguation in Hungarian
              Frankie Robertson. A Contrastive Evaluation of Word Sense Disambiguation Systems for Finnish
              Kadri Muischnek and Liisi Torga. Elliptical Constructions in Estonian UD Treebank
              Kimmo Kettunen. FiST – towards a free Semantic Tagger of modern standard Finnish
              Niko Partanen and Michael Rießler. An OCR system for the Unified Northern Alphabet
              Joshua Wilbur. ELAN as a search engine for Freiburg-style tagged corpora
              Tommi A Pirinen. Neural and rule-based Finnish NLP models - expectations, experiments and experiences
              Timofey Arkhangelskiy, Anne Ferger and Hanna Hedeland. Uralic multimedia corpora: ISO/TEI corpus data in the project INEL
              Timofey Arkhangelskiy. Corpora of social media in minority Uralic languages
                 
    19:00 Social dinner (self-paid)  

    8.1.2019

    10:00 ACL SIGUR "business" meeting
    11:00 Tutorials and hands-on sessions

    Tutorials:
            Common Voice + DeepSpeech

Invited speaker

Mans Hulden, University of Colorado in Boulder

Title: Linguistics with Black Boxes

Abstract:
Neural networks have in a short time brought about previously unimaginable advances in computational linguistics and natural language processing. The main criticism against them from a linguistic point of view is that neural models - while fine for "language engineering tasks" - are thought of as being black boxes, and that their parameter opacity prevents us from discovering new facts about the nature of language itself, or specific languages. In this talk I will challenge that assumption and argue that there are ways to uncover new facts about language, even with a black box learner.

I will discuss specific experiments with neural models and sound embeddings that reveal new information about the organization of sound systems in human languages, give us insight into the limits of complexity of word-formation, give us models of why and when irregular forms - surely an inefficiency in a communication system - can persist over long periods of time, and reveal what the boundaries of pattern learning are (how much information do we minimally need to learn a grammatical aspect of a language, such as its word inflection or sentence formation).

Submission

Via Easychair: https://easychair.org/conferences/?conf=iwclul2019

Important dates

  • 29 June 2018: Call for papers announced
  • 1st October 2018: 2nd call for papers
  • 21st November 2018: Paper submission deadline (extended from 12th November)
  • 6th December 2018: Paper notification
  • 21st December 2018: Camera-ready deadline
  • 7th–8th January 2019: Workshop held in Tartu

Call for papers

The purpose of the conference series International Workshop on Computational Linguistics for Uralic Languages is to bring together researchers working on computational approaches to these languages. We accept long and short papers as well as tutorial proposals dealing with the following languages: Finnish, Hungarian, Estonian, Võro, the Sámi languages, Komi (Zyrian, Permyak), Mordvin (Erzya, Moksha), Mari (Hill, Meadow), Udmurt, Nenets (Tundra, Forest), Enets, Nganasan, Selkup, Mansi, Khanty, Veps, Karelian (Olonets), Karelian, Ingrian (Izhorian), Votic, Livonian, Ludic, and other related languages.

All Uralic languages exhibit rich morphological structure, which makes processing them challenging for state-of-the-art computational linguistic approaches. The majority also suffer from a lack of resources, and many are endangered.

Research papers should present original, substantial and unpublished research, and may describe work-in-progress systems, frameworks, standards and evaluation schemes. Demos and tutorials will present systems and standards towards the goal of interoperability and unification of different projects, applications and research groups. Appropriate topics include (but are not limited to):

  • Parsers, analysers and processing pipelines of Uralic languages
  • Lexical databases, electronic dictionaries
  • Finished end-user applications aimed at Uralic languages, such as spelling or grammar checkers, machine translation or speech processing
  • Evaluation methods and gold standards, tagged corpora, treebanks
  • Reports on language-independent or unsupervised methods as applied to Uralic languages
  • Surveys and review articles on subjects related to computational linguistics for one or more Uralic languages
  • Any work that aims at combining efforts and reducing duplication of work
  • How to elicit activity from the language community, agitation campaigns, games with a purpose

To maximise the possibility of reproducibility, replication and reuse, we particularly encourage submissions which present free/open-source language resources and make use of free/open-source software. One of the aims of this gathering is to avoid unnecessary duplicated work in the field of Uralistics by establishing connections and interoperability standards between researchers and research groups working at different sites. We have also identified a serious lack of gold standards and evaluation metrics for all Uralic languages, including those with national support. Any work towards better resources in these fields will be greatly appreciated.

In this year’s edition, we encourage people to present comparative evaluations of different NLP methods as applied to Uralic languages. With all the buzz around neural and deep-learning methods, are they applicable to Uralic languages, which in general have very little training data (even monolingual data) and richer morphology than the more widely treated Indo-European languages?

Submission of papers

Language of submission: Submissions should be made in English or Russian, with an obligatory abstract in at least one Uralic language.

Double submission: To maximise the impact of work in the field of computational linguistics for the Uralic languages, we are open to the possibility of double submission, or submission of work which has been partially published elsewhere. Any double submission should, however, be reported to the programme committee at the time of submission. In the event of double acceptance, the authors should choose in which venue to publish.

Publication venue: Proceedings of the workshop will be published open-access in the ACL Anthology, in the SIG proceedings for SIGUR.

Conflicts of interest: The reviewing process will be anonymous (double-blind peer review).

Submission Guidelines: Submit via EasyChair. The LaTeX templates are here: https://github.com/acl-sigur/iwclul-latex/releases/tag/iwclul-2019. You may also submit a PDF generated from a Word document or another LaTeX template, but if the paper is accepted you will need to format the camera-ready version according to the guidelines. There are no hard limits on page count, but for the benefit of reviewers please keep it to approximately 5-20 pages, depending on the page layout.

Organisers

Programme committee

  • Tommi Pirinen, University of Hamburg
  • Francis Tyers, Indiana University and Higher School of Economics
  • Eszter Simon, Research Institute for Linguistics, Hungarian Academy of Sciences
  • Anna Volkova, School of Linguistics, National Research University, Higher School of Economics, Moscow
  • Heiki-Jaan Kaalep, University of Tartu
  • Lene Antonsen, University of Tromsø
  • Trond Trosterud, University of Tromsø
  • Thierry Poibeau, LaTTiCe-CNRS
  • Veronika Vincze, Hungarian Academy of Sciences, Research Group on Artificial Intelligence
  • Kadri Muischnek, University of Tartu
  • Csilla Horvath, Research Institute for Linguistics, Hungarian Academy of Sciences
  • Filip Ginter, University of Turku
  • Mark Fišel, University of Tartu
  • Kaili Müürisep, University of Tartu
  • Michael Rießler, Albert-Ludwigs-Universität Freiburg
  • Jeremy Bradley, Ludwig Maximilian University of Munich

Travel

Participants from outside the Schengen area may require a visa to visit Estonia. If you require an invitation letter confirming your participation, please get in contact with the local organisers.

Local organisers

Anneli Vainumäe, Heiki-Jaan Kaalep (firstname dot lastname att ut dot ee)

Contact

The organisers can be reached via the Google group: iwclul@googlegroups.com. Local organisers should be contacted directly.
