STUDIA UNIVERSITATIS BABEŞ-BOLYAI: issue article summary
STUDIA INFORMATICA - Issue no. 2 / 2017
Article: A HYBRID APPROACH FOR SCHOLARLY INFORMATION EXTRACTION.
Authors: ZALÁN BODÓ, LEHEL CSATÓ.
DOI: 10.24193/subbi.2017.2.01
Published Online: 2017-12-15
Published Print: 2017-12-15
pp. 5-16

Abstract: Metadata extraction from documents forms an essential part of web or desktop search systems. Similarly, digital libraries that index scholarly literature need to find and extract the title, the list of authors and other publication-related information from an article. We present a hybrid approach for metadata extraction, combining classification and clustering to extract the desired information without the need for a conventional labelled dataset for training. An important asset of the proposed method is that the resulting clustering parameters can be used in other problems, e.g. document layout analysis.

Keywords: information extraction, metadata, machine learning.
2010 Mathematics Subject Classification: 62H30, 68P20.