STUDIA UNIVERSITATIS



	STUDIA UNIVERSITATIS BABEŞ-BOLYAI: issue article summary


	STUDIA PHILOLOGIA - Issue no. 3 / 2020

	Article:	AUTOMATED CLASSIFICATION OF VARIANTS OF NORWEGIAN BY MEANS OF TEXT MINING OF UNANNOTATED TEXT / AUTOMATISERT KLASSIFIKASJON AV NORSKE MÅLFORMER VHA. DATAUTVINNING AV UANNOTERT TEKST.
	Authors:	FARTEIN THORSEN ØVERLAND.


	DOI: 10.24193/subbphilo.2020.3.08
	Published Online: 2020-09-30
	Published Print: 2020-09-30
	pp. 107-124

	Abstract: This article presents a model for automatically classifying different variants of modern Norwegian (bokmål and nynorsk, ranging from 1930 to 2011) by means of data mining of unannotated text. The model is built in the Orange visual programming interface and is a modification of an example model presented by the Orange project, whose original purpose was the semantic classification of fairy tale types in the Aarne-Thompson-Uther Index. The core modules of the model are Bag-of-Words and Logistic Regression. The model is trained on four different translations of the Gospel of John and cross-validated with various random texts. The model proves to be very sound for classifying Norwegian language variation, yielding correct classification in 100% of the realistic tests.

	Keywords: Language Variation, Text Mining, Orange Data Mining, Text Clustering, Text Classification, Bag-of-Words, Logistic Regression, Predictive Model, Norwegian Language, Nynorsk, Bokmål
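	To illustrate how the two core modules named in the abstract fit together, the following is a minimal Python sketch of a Bag-of-Words representation feeding a logistic regression classifier. It is not the author's Orange workflow: the training sentences are hypothetical toy examples rather than the Gospel of John translations, and the function names, learning rate, and epoch count are assumptions chosen for illustration.

```python
import math

# Hypothetical toy sentences (0 = bokmål, 1 = nynorsk); NOT the article's corpus.
TRAIN = [
    ("jeg er ikke fra byen", 0),
    ("hun kommer ikke i dag", 0),
    ("jeg vet ikke hva det er", 0),
    ("eg er ikkje frå byen", 1),
    ("ho kjem ikkje i dag", 1),
    ("eg veit ikkje kva det er", 1),
]


def tokenize(text):
    """Lowercase and split on whitespace; no annotation or lemmatization."""
    return text.lower().split()


def build_vocab(texts):
    """Map every distinct token in the training texts to a column index."""
    words = sorted({tok for text in texts for tok in tokenize(text)})
    return {word: i for i, word in enumerate(words)}


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=300, lr=0.5):
    """Fit weights and bias by plain stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


VOCAB = build_vocab([text for text, _ in TRAIN])
X_TRAIN = [bag_of_words(text, VOCAB) for text, _ in TRAIN]
Y_TRAIN = [label for _, label in TRAIN]
W, B = train_logreg(X_TRAIN, Y_TRAIN)


def predict_proba(text):
    """Estimated probability that the text is nynorsk."""
    x = bag_of_words(text, VOCAB)
    return sigmoid(sum(wj * xj for wj, xj in zip(W, x)) + B)


def classify(text):
    return "nynorsk" if predict_proba(text) >= 0.5 else "bokmål"


if __name__ == "__main__":
    for sample in ("eg kjem ikkje", "jeg kommer ikke"):
        print(sample, "->", classify(sample))
```

	In this sketch the classifier separates the two variants purely on word counts, for example `ikke` versus `ikkje`, which is why no linguistic annotation of the text is needed. A realistic setup would use far more training text and a held-out test set.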



