DOI: 10.5555/2856151.2856175
Research article

Using learning-based filters to detect rule-based filtering obsolescence

Published: 12 April 2000

Abstract

For years, Caisse des Dépôts et Consignations has produced information filtering applications. To be operational, these applications require high filtering performance, which is achieved with rule-based filters. With this technique, an administrator has to tune a set of rules for each topic. However, filters become obsolete over time. Their performance decreases for two reasons: the diachronic polysemy of terms, which causes a loss of precision, and the diachronic polymorphism of concepts, which causes a loss of recall.
To help the administrator maintain these filters, we have developed a method that automatically detects filtering obsolescence. It consists of building a learning-based control filter from a set of documents that the rule-based filter has already categorised as relevant or not relevant. The idea is to supervise the rule-based filter by differentially comparing its outcomes with those of the control filter.
This method has several advantages. It is simple to implement, since the rule-based filter itself supplies the training set used for learning. Thus, both the construction and the use of the control filter are fully automatic. Automatic detection of obsolescence gives learning-based filtering a rich application with interesting prospects.
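
The detection loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say which learning algorithm the control filter uses, so the small naive Bayes classifier below is only a stand-in, and the keyword rule, the class and function names, and the sample documents are all hypothetical. The sketch trains the control filter on labels produced by the rule-based filter, then lists the new documents on which the two filters disagree.

```python
import math
from collections import Counter


def rule_filter(doc):
    """Hypothetical hand-tuned rule: a document is relevant if it contains 'merger'."""
    return "merger" in doc.lower().split()


class ControlFilter:
    """Tiny multinomial naive Bayes, an assumed stand-in for the learning-based filter."""

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc.lower().split())
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, words, label):
        total = sum(self.counts[label].values())
        score = math.log(self.n_docs[label] / sum(self.n_docs.values()))
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing over the training vocabulary
                score += math.log((self.counts[label][w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        words = doc.lower().split()
        return self._log_score(words, True) > self._log_score(words, False)


def disagreements(docs, control):
    """Differential comparison: keep the documents on which the two filters differ."""
    return [d for d in docs if rule_filter(d) != control.predict(d)]


# Training set: past documents, labelled automatically by the rule-based filter.
history = [
    "bank announces merger with rival insurer",
    "merger talks between bank and insurer collapse",
    "insurer confirms merger and takeover bid",
    "football club wins the national cup",
    "weather forecast predicts rain this weekend",
    "new museum opens in the city centre",
]
control = ControlFilter().fit(history, [rule_filter(d) for d in history])

# New documents: the rule misses the second one and over-selects the third.
new_docs = [
    "bank confirms merger",
    "bank and insurer agree takeover bid",
    "football club merger splits the city",
    "weather forecast predicts rain",
]
flagged = disagreements(new_docs, control)
rate = len(flagged) / len(new_docs)
```

In this toy setting, a disagreement where only the control filter selects the document hints at a loss of recall, and one where only the rule selects it hints at a loss of precision. A disagreement rate that rises over time would be the signal shown to the administrator; any alert threshold is a tuning choice that the abstract does not specify.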


Published In

RIAO '00: Content-Based Multimedia Information Access - Volume 2
April 2000
859 pages

Publisher

LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE

Paris, France
