Statistical modeling of large distribution sets

Yasushi Sakurai, Yasuko Matsubara, Masatoshi Yoshikawa

doi:10.1145/1811136.1811144

In this paper we deal with a ubiquitous problem in data management: hierarchical model estimation for large distribution sets. This particular problem arises in many applications. Classification, top-k query processing, clustering and outlier detection are just a few possible applications. Our aim is to continuously and incrementally estimate the model parameters of 'typical' distributions that describe the characteristics of a database. Our approach to model estimation can handle arbitrary types of data (e.g., categorical and numerical data) in databases, incrementally, quickly, and with little resource consumption. Moreover, this paper proposes not only incremental algorithms for model fitting, but also a modeling framework in which the learning approach recognizes hierarchical groups, each of whose distributions has similar characteristics, and separately updates the model parameters of each group without scanning all the distributions in the database. Thus, it can provide a response, i.e., the parameters of typical distribution models, with an arbitrary level of granularity, at any time. Just as importantly, we demonstrate the utility of our approach by showing how it can be applied to two specific problems that arise in the context of data management.

Statistical modeling of large distribution sets

説明

収録刊行物

詳細情報詳細情報について

書き出し

問題の指摘

Statistical modeling of large distribution sets

説明

収録刊行物

詳細情報 詳細情報について

書き出し

問題の指摘

詳細情報詳細情報について