Article abstract, STUDIA UNIVERSITATIS BABEŞ-BOLYAI issue


 
       
         
    STUDIA INFORMATICA - Issue no. 2, 2013  
         
  Article:   TEXT REPRESENTATION AND GENERAL TOPIC ANNOTATION BASED ON LATENT DIRICHLET ALLOCATION.

Authors:  DIANA INKPEN.
 
       
         
  Abstract:  

We propose a low-dimensional text representation method for topic classification. A Latent Dirichlet Allocation (LDA) model is built on a large amount of unlabelled data in order to extract potential topic clusters. Each document is then represented as a distribution over these clusters. We experiment with two datasets. We collected the first dataset from the FriendFeed social network and manually annotated part of it with 10 general classes. The second dataset is a standard text classification benchmark: the R8 subset of Reuters-21578, annotated with 8 classes. We show that classification based on the LDA representation alone leads to acceptable results, while combining a bag-of-words representation with the LDA representation leads to further improvements. We also propose a multi-level LDA representation that captures topic cluster distributions ranging from generic to more specific ones.

2010 Mathematics Subject Classification. 62Fxx Parametric inference, 62Pxx Applications.

1998 CR Categories and Descriptors. I.2.7 [Natural Language Processing]: Text analysis; H.3.1 [Content Analysis and Indexing]: Linguistic processing.

Key words and phrases. automatic text classification, topic detection, latent Dirichlet allocation.

This paper was presented at the International Conference KEPT2013: Knowledge Engineering Principles and Techniques, organized by Babeș-Bolyai University, Cluj-Napoca, July 5-7, 2013.
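
As a rough illustration of the representation described in the abstract, the sketch below fits a small LDA model with collapsed Gibbs sampling, represents each document by its topic distribution, and concatenates that with bag-of-words counts. It is not the author's implementation: the sampler, the hyperparameters `alpha` and `beta`, the function names, and the toy documents are all assumptions made for illustration. Unlike the paper's setup, the toy model is fitted on the same documents it represents rather than on a separate large unlabelled collection.

```python
import random
from collections import Counter


def fit_lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the paper's code).

    docs is a list of token lists. Returns (vocab, theta), where theta[d]
    is the estimated topic distribution of document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word w assigned to topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed document-topic distributions; each row sums to 1.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, theta


def bag_of_words(doc, vocab):
    """Raw term counts of doc over a fixed vocabulary."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def combined_features(docs, num_topics):
    """Concatenate bag-of-words counts with the LDA topic distribution."""
    vocab, theta = fit_lda_gibbs(docs, num_topics)
    return [bag_of_words(doc, vocab) + theta[d] for d, doc in enumerate(docs)]


# Hypothetical toy corpus, not taken from FriendFeed or Reuters-21578.
toy_docs = [
    "stocks market trade shares".split(),
    "market shares profit trade".split(),
    "football match goal team".split(),
    "team goal match win".split(),
]
features = combined_features(toy_docs, num_topics=2)
```

Each row of `features` could then be passed to any standard classifier. Using only the last `num_topics` entries of a row corresponds to the low-dimensional LDA-only representation, while the full row corresponds to the combined bag-of-words and LDA representation.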

 
         
     
         
         