SANTI-morf dictionaries


  • Prihantoro, Universitas Diponegoro



SANTI-morf, dictionary, Indonesian, corpus, morphology


This article describes the structure of the dictionaries used in SANTI-morf (Sistem Analisis Teks Indonesia – morfologi), a multi-module pipeline system built with NooJ (Silberztein, 2003, 2016) that annotates an Indonesian corpus at the morpheme level. Together with other SANTI-morf components, the dictionaries enable the system to tokenize each word in an Indonesian corpus into morphemes (e.g., cliticized and non-cliticized roots, affixes, reduplications) and to associate each morpheme with its corresponding tag. Each dictionary entry is encoded with a tag composed of morphological analysis (MA) labels, which in most cases are combined with system implementation (SI) labels. MA labels encode formal and functional morphological criteria (e.g., root part-of-speech (POS) labels) and are typically used for searching the annotated corpus. SI labels serve the internal operation of the system and are mostly of interest to developers rather than end users. They include morphotactic and morphophonemic constraint labels, which are processed when the monomorphemic dictionary entries work together with SANTI-morf grammars (rules).
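
As a rough illustration of the entry structure described above, the following Python sketch models a monomorphemic entry whose tag combines MA labels with optional SI labels. It is not the SANTI-morf or NooJ dictionary format: the class name `MorphemeEntry`, the helper `search_by_ma`, the `+` separator, and every label name (`ROOT`, `NASAL_ASSIM`, and so on) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphemeEntry:
    """One monomorphemic entry (hypothetical model, not the NooJ format)."""
    form: str                 # the morpheme itself
    ma_labels: tuple          # morphological analysis labels (user-facing)
    si_labels: tuple = ()     # system implementation labels (developer-facing)

    def tag(self, include_si=True):
        """Join the labels into one tag string; SI labels can be hidden."""
        labels = self.ma_labels + (self.si_labels if include_si else ())
        return "+".join(labels)


def search_by_ma(entries, label):
    """Return the forms whose MA labels contain the given label."""
    return [e.form for e in entries if label in e.ma_labels]


# Illustrative entries; all label names are assumptions made for this sketch.
ENTRIES = [
    MorphemeEntry("baca", ("ROOT", "VERB")),                       # MA labels only
    MorphemeEntry("meN", ("PREFIX", "ACTIVE"), ("NASAL_ASSIM",)),  # MA + SI labels
    MorphemeEntry("kan", ("SUFFIX", "CAUSATIVE"), ("AFTER_ROOT",)),
]

if __name__ == "__main__":
    for entry in ENTRIES:
        print(entry.form, entry.tag(), entry.tag(include_si=False))
    print(search_by_ma(ENTRIES, "ROOT"))
```

Keeping the two label groups in separate fields mirrors the distinction drawn above: a corpus search only needs to consult the MA labels, while the SI labels remain available to whatever rule component consumes them.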

Author Biography

    Prihantoro is an associate professor of corpus linguistics in the Department of Linguistics, Universitas Diponegoro, Indonesia. He earned his Ph.D. from Lancaster University and manages some corpora on CQPweb Lancaster. He is the author of SANTI-morf (a morphological annotation system for Indonesian) and Buku Referensi Pengantar Linguistik Korpus (an introductory reference book on corpus linguistics, written in Indonesian). He can be reached by email or through his website.


