Virtual Database Integration by Data Virtualization for the Purpose of Database Division

齋藤 和広 (Kazuhiro Saito), 米田 信之 (Nobuyuki Maita), 渡辺 泰之 (Yasuyuki Watanabe), 黒川 茂莉 (Mori Kurokawa), 村松 茂樹 (Shigeki Muramatsu), 小林 亜令 (Arei Kobayashi)

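In databases (DBs) that require continual addition of large-scale data, such as those for the operations and management facilities of telecommunication infrastructure, growth in scale makes resource design more complex and raises maintenance costs. There is therefore a need for virtual database integration that divides a single DB into multiple DBs without reducing convenience for users. Data virtualization is one technology for virtually integrating multiple DBs: it integrates the schema information of multiple DBs and allows each DB to be accessed with SQL queries in the same way as a single DB. However, because queries are processed after retrieving data from each DB over the network, communication becomes a bottleneck when handling large-scale data and query processing performance degrades. In this paper, we quantitatively compare and evaluate, from the standpoint of query processing performance, the effectiveness of three methods that can resolve the communication bottleneck in data virtualization: query pushdown, data placement analysis, and intermediate data relocation. In the evaluation, we used operational logs of telecommunication infrastructure (real data) and measured query execution time for queries that measure communication quality. The results show that, in DB division by data virtualization, the communication bottleneck can be largely resolved by applying not only query pushdown but also, in combination, data placement analysis using query logs and data relocation at query execution time. In particular, for 11 of the 12 queries, data virtualization before applying the three methods was on average 2010 times slower than the pre-division environment because of the communication bottleneck, whereas applying the three methods improved this to an average slowdown of 1.8 times.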

As the scale of an integrated DB grows, for example one holding data that increases day by day such as facility logs for telecommunication infrastructure, its resource planning becomes more complex and its maintenance cost rises. Virtual database integration technology is therefore needed to divide such a large DB into multiple DBs without inconveniencing users. Data virtualization is one such technology: it integrates only the schemas of multiple DBs and provides users with an integrated SQL interface, as if they were a single DB. However, because the data virtualization system accesses the data in each DB over the network, communication between the DBs and the data virtualization system becomes a bottleneck and degrades query performance. In this paper, to resolve this bottleneck, we evaluate three methods with respect to query performance: query pushdown, data placement analysis, and relocation of intermediate data. In the evaluation, we use telecommunication logs as real data and measure the execution time of analysis queries for network quality in a three-DB environment integrated by data virtualization with the three methods implemented. The results show that, when data virtualization is used for virtual database integration in order to divide a single DB in a real environment, the network bottleneck can be resolved by applying not only query pushdown but also data placement analysis using query logs and relocation of intermediate data. In particular, for 11 of the 12 queries, the original data virtualization without the three methods is on average 2010 times slower than a single-DB environment because of the network bottleneck, whereas data virtualization with the three methods reduces this to 1.8 times slower on average.
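
To make the idea of query pushdown concrete, the following is a minimal sketch and not the authors' implementation. It assumes three in-memory SQLite databases standing in for the divided DBs, and a hypothetical `logs` table with `cell_id` and `latency_ms` columns; the table, the column names, and the function names are all illustrative assumptions. Each function returns the average together with the number of rows that would have crossed the network, which serves here as a rough proxy for communication cost.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite DB standing in for one divided source DB."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, chosen only for illustration.
    conn.execute("CREATE TABLE logs (cell_id INTEGER, latency_ms REAL)")
    conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    return conn


def avg_without_pushdown(sources, cell_id):
    """Fetch every row from every source, then filter and aggregate centrally."""
    transferred = 0
    total, count = 0.0, 0
    for conn in sources:
        for cid, latency in conn.execute("SELECT cell_id, latency_ms FROM logs"):
            transferred += 1  # every row is shipped to the integration layer
            if cid == cell_id:
                total += latency
                count += 1
    return (total / count if count else None), transferred


def avg_with_pushdown(sources, cell_id):
    """Push the filter and a partial aggregate to each source, then combine."""
    transferred = 0
    total, count = 0.0, 0
    for conn in sources:
        partial_sum, partial_count = conn.execute(
            "SELECT COALESCE(SUM(latency_ms), 0.0), COUNT(*) "
            "FROM logs WHERE cell_id = ?",
            (cell_id,),
        ).fetchone()
        transferred += 1  # only one partial-result row leaves each source
        total += partial_sum
        count += partial_count
    return (total / count if count else None), transferred


# Three toy sources, mirroring a three-DB setup.
sources = [
    make_source([(1, 10.0), (2, 50.0), (1, 20.0)]),
    make_source([(1, 30.0), (3, 70.0)]),
    make_source([(2, 40.0), (1, 40.0)]),
]

print(avg_without_pushdown(sources, 1))
print(avg_with_pushdown(sources, 1))
```

Both functions compute the same average, but the pushdown variant transfers one partial-result row per source instead of every stored row. On tables of realistic size, this difference in transferred volume is the kind of effect that pushdown targets.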
