データベースからの決定木構成における数理的問題

Bibliographic Information

Other Title
  • Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules

Search this article

Description

データベースからの決定木の構成において、数値属性の取り扱いは非常に難しいとされていた。実際、有名なエントロピーを用いた決定木構成法について、発案者のQuinlan自身、多くの数値属性があるデータに対しては効率を保証できないことを指摘している。この問題に対する解決法として、最適化問題として数理モデル化した二次元関連ルールを分岐法則に使う方法を提案し、効率的な決定木の構成法を、プロトタイプシステムをデータマイニングシステムSONAR(System for Optimized Numeric Association Rules)のサブシステムとして実現した。ここでは、数理的側面からの理論的裏付けと実験結果を報告する。
We propose an extension of an entropy-based heuristic of Quinlan[Q93] for constructing a decision tree from a large database with many numleric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient, if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family R of grid-regions in the plane associated with the pair of attributes. For R ∈e R, the data can be split into two classes: data inside R and data outside R. We compute the region R_<opt> R that minimizes the entropy of the splitting, and add the splitting associated with R_<opt> (for each pair of strongly correlated attributes) to the set of candidate tests in Quinlan's entropy-based heuristic. We give efficient, algorithms for cases in which R is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm for the first case has been implemented as a subsystem of SONAR(System for Optimized Numeric Association Rules) developed by the authors. Tests show that our approach can create small-sized decision trees.

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 8 1-8, 1996-07-26

    Information Processing Society of Japan (IPSJ)

References(15)*help

See more

Details 詳細情報について

  • CRID
    1572261552106249472
  • NII Article ID
    110002936355
  • NII Book ID
    AN10505667
  • ISSN
    09196072
  • Text Lang
    en
  • Data Source
    • CiNii Articles

Report a problem

Back to top