Information extraction from the web by matching visual presentation patterns

Minárik, Matej; Burget, Radek

Full metadata record

DC field	Value	Language
dc.contributor.author	Minárik, Matej
dc.contributor.author	Burget, Radek
dc.contributor.editor	Steinberger, Josef
dc.contributor.editor	Zíma, Martin
dc.contributor.editor	Fiala, Dalibor
dc.contributor.editor	Dostal, Martin
dc.contributor.editor	Nykl, Michal
dc.date.accessioned	2017-10-09T12:39:34Z
dc.date.available	2017-10-09T12:39:34Z
dc.date.issued	2017
dc.identifier.citation	STEINBERGER, Josef ed.; ZÍMA, Martin ed.; FIALA, Dalibor ed.; DOSTAL, Martin ed.; NYKL, Michal ed. Data a znalosti 2017: sborník konference, Plzeň, Hotel Angelo 5. - 6. října 2017. 1. vyd. Plzeň: Západočeská univerzita v Plzni, 2017, s. 227-231. ISBN 978-80-261-0720-0.	cs
dc.identifier.isbn	978-80-261-0720-0
dc.identifier.uri	https://www.zcu.cz/export/sites/zcu/pracoviste/vyd/online/DataAZnalosti2017.pdf
dc.identifier.uri	http://hdl.handle.net/11025/26368
dc.format	5 s.	cs
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	Západočeská univerzita v Plzni	cs
dc.rights	© Západočeská univerzita v Plzni	cs
dc.subject	integrace webových dat	cs
dc.subject	extrakce informací	cs
dc.subject	strukturovaná extrakce záznamů	cs
dc.subject	segmentace stránek	cs
dc.subject	klasifikace obsahu	cs
dc.subject	mapování ontologií	cs
dc.title	Information extraction from the web by matching visual presentation patterns	en
dc.type	konferenční příspěvek	cs
dc.type	conferenceObject	en
dc.rights.access	openAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	There is a large amount of data available on the Web. Data are often represented as text, enriched with tables, lists, images or other visual structures. These data are usually coded in HTML without any additional semantics, which makes them nigh impossible to automatically process and extract. There are approaches based on top-down document segmentation according to visual information and layout. We present a bottom-up approach which starts with the smallest consistent elements and matches the visual relationships among these elements to a pre-defined ontological structure of extracted records. This method considers not only the visual attributes of a particular segment, but also its position amongst other segments.	en
dc.subject.translated	web data integration	en
dc.subject.translated	information extraction	en
dc.subject.translated	structured record extraction	en
dc.subject.translated	page segmentation	en
dc.subject.translated	content classification	en
dc.subject.translated	ontology mapping	en
dc.type.status	Peer-reviewed	en
Appears in collections:	Data a znalosti 2017

Files attached to this record:

File	Description	Size	Format
Minarik.pdf	Full text	376.7 kB	Adobe PDF

Use this identifier to cite or link to this record: http://hdl.handle.net/11025/26368

All records in DSpace are protected by copyright, all rights reserved.
