Using fuzzy logic for improved feature selection from web pages

Web pages are generally rich in structure: they are written in HTML markup, which defines the layout shown to visitors. However, this markup has rarely been exploited for selecting meaningful features that characterise web pages.

In a new paper, accepted for publication in IEEE Transactions on Fuzzy Systems, we introduce a fuzzy logic approach that exploits this HTML markup to improve the selection of features from web pages. This novel approach to web page representation has been tested on clustering, where it achieves competitive results that improve on traditional representation techniques. It imitates the way in which humans skim through the content of web pages, e.g., by focusing mainly on highlighted content or on content located at a preferential position. Our approach assigns a higher level of importance to those parts of a web page that are in some way emphasised by web developers.
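To give a rough feel for the idea, the sketch below scores each term of a page by combining two cues mentioned above: whether the term is emphasised in the markup, and how early it appears. It is an illustration only and not the method from the paper: the set of emphasis tags, the linear position membership, the single max-based rule, and the names `TermCollector`, `early_position` and `term_importance` are all assumptions made for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical choice of tags treated as "emphasis"; not taken from the paper.
EMPHASIS_TAGS = {"title", "h1", "h2", "h3", "b", "strong", "em", "i", "u"}


class TermCollector(HTMLParser):
    """Records every word together with a flag: is it inside an emphasis tag?"""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of currently open emphasis tags
        self.terms = []  # (word, emphasised) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Naive whitespace tokenisation; a real system would normalise further.
        for word in data.lower().split():
            self.terms.append((word, self.depth > 0))


def early_position(index, total):
    """Assumed membership in 'near the top': 1.0 for the first term, 0.0 for the last."""
    if total <= 1:
        return 1.0
    return 1.0 - index / (total - 1)


def term_importance(html):
    """Return a dict mapping each term to an importance degree in [0, 1]."""
    parser = TermCollector()
    parser.feed(html)
    parser.close()
    total = len(parser.terms)
    scores = defaultdict(float)
    for index, (word, emphasised) in enumerate(parser.terms):
        # Assumed rule: important if emphasised OR early (fuzzy OR taken as max).
        degree = max(1.0 if emphasised else 0.0, early_position(index, total))
        # Keep the strongest evidence seen across all occurrences of the term.
        scores[word] = max(scores[word], degree)
    return dict(scores)
```

Calling `term_importance(page_source)` on the HTML of a page yields one degree per term, which could then be used to rank or filter candidate features. A fuller system would use richer membership functions and a proper rule base.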

For more details on the approach and experimentation, please refer to the paper.

