The STUDIA UNIVERSITATIS BABEŞ-BOLYAI issue article summary

 
       
         
    STUDIA INFORMATICA - Issue no. 2 / 2017  
         
  Article:   A HYBRID APPROACH FOR SCHOLARLY INFORMATION EXTRACTION.

Authors:  ZALÁN BODÓ, LEHEL CSATÓ.
 
       
         
  Abstract:  
DOI: 10.24193/subbi.2017.2.01

Published Online: 2017-12-15
Published Print: 2017-12-15
pp. 5-16

Metadata extraction from documents forms an essential part of web or desktop search systems. Similarly, digital libraries that index scholarly literature need to find and extract the title, the list of authors, and other publication-related information from an article. We present a hybrid approach for metadata extraction, combining classification and clustering to extract the desired information without the need for a conventional labelled dataset for training. An important asset of the proposed method is that the resulting clustering parameters can be used in other problems, e.g. document layout analysis.

Keywords: information extraction, metadata, machine learning.

2010 Mathematics Subject Classification. 62H30, 68P20.
 
         
     
         
         