API-Based Software Birthmarking Method Using Fuzzy Hashing

  • KANG Dongwoo
    College of Information and Communication Engineering, Sungkyunkwan University
  • LEE Donghoon
    College of Information and Communication Engineering, Sungkyunkwan University
  • KIM Jiye
    College of Information and Communication Engineering, Sungkyunkwan University
  • CHOI Younsung
    Department of Cyber Security, Howon University
  • WON Dongho
    College of Information and Communication Engineering, Sungkyunkwan University

Bibliographic Information

Publication date
2016
DOI
  • 10.1587/transinf.2015edp7379
Publisher
The Institute of Electronics, Information and Communication Engineers (IEICE)

Abstract

Software birthmarking has conventionally been studied in areas such as software piracy, code theft, and copyright infringement. The most recent API-based software birthmarking method (Han et al., 2014) extracts API call sequences from the entire code sections of a program and converts them into a birthmark using a cryptographic hash function (MD5). That work reported that applications of different types can be categorized through pre-filtering based on the numbers and names of DLLs/APIs. However, the method has three drawbacks: similarity cannot be measured because of the cryptographic hash function, false negatives occur, and it is difficult to categorize applications by function using only the numbers and names of DLLs/APIs. In this paper, we propose an API-based software birthmarking method using fuzzy hashing. For the native code of a program, our technique extracts API call sequences from segmented procedures and then generates the birthmark from them with a fuzzy hash function. Unlike a conventional cryptographic hash function, a fuzzy hash can be used to measure the similarity of data. Our fuzzy-hash-based method achieved a high reduction ratio (about 41% on average) compared with an original birthmark generated from the API call sequences alone. In our experiments with a threshold ε of 0.35, the results show that our method is an effective birthmarking system for measuring software similarity. Moreover, our correlation analysis of the top 50 API call frequencies shows that it is difficult to categorize applications by function using only the numbers and names of DLLs/APIs. Compared with prior work, our method significantly improves the resilience and credibility properties.
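The contrast drawn above between a cryptographic hash and a fuzzy hash can be illustrated with a small Python sketch. This is not the authors' implementation. The token-level chunking below is a simplified context-triggered piecewise hash in the spirit of ssdeep, not ssdeep itself. The following are all hypothetical choices made for illustration: the function names, the `block_size` parameter, the mean-of-best-match aggregation over procedures, and the rule that applies ε as a copy/independent cutoff. The Win32 API names in the demo are sample input only.

```python
import hashlib
from typing import List, Sequence

# Base64 alphabet: each chunk hash becomes one digest character (ssdeep also uses base64 digits).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _h(data: str) -> int:
    """Deterministic 32-bit hash of a string (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(data.encode()).digest()[:4], "big")


def md5_birthmark(api_calls: Sequence[str]) -> str:
    """Cryptographic birthmark: any change to the sequence yields an unrelated digest."""
    return hashlib.md5("|".join(api_calls).encode()).hexdigest()


def fuzzy_birthmark(api_calls: Sequence[str], block_size: int = 3) -> str:
    """Simplified context-triggered piecewise hash over API-call tokens (hypothetical).

    A chunk ends after any call whose hash hits the trigger value, so chunk
    boundaries depend only on local content and an edit disturbs only nearby chunks.
    """
    digest, chunk = [], []
    for call in api_calls:
        chunk.append(call)
        if _h(call) % block_size == block_size - 1:  # context trigger
            digest.append(B64[_h("|".join(chunk)) % 64])
            chunk = []
    if chunk:  # emit the unterminated tail chunk
        digest.append(B64[_h("|".join(chunk)) % 64])
    return "".join(digest)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(d1: str, d2: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical digests."""
    longest = max(len(d1), len(d2))
    return 1.0 if longest == 0 else 1.0 - edit_distance(d1, d2) / longest


def program_similarity(p: List[str], q: List[str]) -> float:
    """Hypothetical aggregation: mean best-match similarity of p's procedure digests against q's."""
    if not p or not q:
        return 0.0
    return sum(max(similarity(a, b) for b in q) for a in p) / len(p)


def verdict(sim: float, eps: float = 0.35) -> str:
    """Hypothetical rule: 'copy' if sim >= 1 - eps, 'independent' if sim <= eps."""
    if sim >= 1 - eps:
        return "copy"
    if sim <= eps:
        return "independent"
    return "inconclusive"


if __name__ == "__main__":
    orig = ["GetModuleHandleW", "LoadLibraryW", "GetProcAddress", "CreateFileW",
            "ReadFile", "CloseHandle", "RegOpenKeyExW", "RegQueryValueExW"]
    mod = list(orig)
    mod[4] = "WriteFile"  # one API call substituted
    print(md5_birthmark(orig), md5_birthmark(mod))  # unrelated digests
    fo, fm = fuzzy_birthmark(orig), fuzzy_birthmark(mod)
    sim = similarity(fo, fm)
    print(fo, fm, round(sim, 2), verdict(sim))
```

A chunk boundary depends only on the API call that closes it. As a result, substituting a single call changes the fuzzy digest by an edit distance of at most 2, while the two MD5 digests share nothing. This locality is what lets a fuzzy hash support similarity measurement where a cryptographic hash cannot.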
