Abstract:
The web is a large source of valuable data. Today, this data is provided not only by professional publishers but also by everyone, in the form of user-generated content. A large part of such content is located in web forums. As platforms for sharing knowledge, they are easily accessible to everyone. However, their sheer number makes it hard to find discussions on a specific topic. Automatic systems can filter and point to relevant information. Unfortunately, the content is presented in a human-readable layout and is not intended to be processed by automatic systems. Therefore, it is necessary to separate the content of a web forum discussion from its layout before any further information mining. This paper presents FODEX, a system for automatic forum data extraction. It extracts data from any forum and maps it to a unified data schema.
Published in: 2012 26th International Conference on Advanced Information Networking and Applications Workshops
Date of Conference: 26-29 March 2012
Date Added to IEEE Xplore: 19 April 2012