Pub/Sub環境におけるkNNデータモニタリングの分散処理のためのクエリ割当てアルゴリズム

鶴岡, 翔平, 西尾, 俊哉, 天方, 大地, 原, 隆浩

近年，パブリッシュ/サブスクライブ（Pub/Sub）モデルに基づいてデータを配信するアプリケーションが多く存在し，ユーザは自身の興味のあるデータのみを取得する．また，スマートフォンやSNSの普及により，データやクエリ（サブスクリプション）の数が増加しており，単一のワーカですべての処理を行うとリアルタイム性を保持することが困難になっている．本稿では，Pub/Subモデルにおいて，複数のワーカにクエリを分散して処理を分担することにより，各クエリに対して，クエリのキーワードを含むデータの中でクエリの指定位置に最も近いk個のデータ（kNNデータ）を効率的にモニタリングする問題に取り組む．単純にクエリを分散させると，各ワーカの処理量に偏りが生まれ，その結果，新たなデータが生成されたとき，処理量の大きいワーカの処理時間が長くなり，全体の処理時間も長くなる．この問題を解決するため，各ワーカの処理時間が均一かつ全体の処理時間が短くなるようにクエリを分散するアルゴリズムを提案する．実データを用いた実験により，提案手法がベースライン手法と比べて数倍から数十倍高速にkNNデータを更新できることを確認した．

Recently, in many applications, data objects have been generated based on Publish/Subscribe (Pub/Sub) model, and users receive only their preferable data objects. Due to the prevalence of smartphones and SNS, a large numbers of data objects and queries (subscriptions) have been published and registered, respectively. This makes a single worker difficult to monitor the result of each query. In this paper, we address the problem of monitoring k nearest neighbor data objects which contain the query keywords in a distributed Pub/Sub system. A straightforward approach, which assigns the queries to workers randomly, cannot balance the workloads of the workers. To solve this, we propose an algorithm that assigns queries to balance the workload of each worker and to minimize the total amount of workload. Our experiments using real datasets verify that our proposed algorithm can update the kNN results more than several times faster compared with baseline algorithms

Pub/Sub環境におけるkNNデータモニタリングの分散処理のためのクエリ割当てアルゴリズム

Bibliographic Information

Search this article

Abstract

Journal

Keywords

Details 詳細情報について

Export

Report a problem