Memory-Based Hardware Architectures to Detect ClamAV Virus Signatures with Restricted Regular Expression Features

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal

5 Scopus Citations

Detail(s)

Original language: English
Article number: 7115115
Pages (from-to): 1225-1238
Journal / Publication: IEEE Transactions on Computers
Volume: 65
Issue number: 4
Publication status: Published - 1 Apr 2016

Abstract

We aim to implement a single-chip hardware detection engine for virus scanning. Our study is based on the ClamAV virus database, which contains 88.9K strings and 9.6K extended hex-signatures with restricted regular expression (regex) features. We have previously presented cost-effective hardware architectures to detect the 88.9K strings and the 3.2K regex patterns that are composed of multiple string segments. In this paper, we present hardware architectures to detect the remaining 6.4K regex patterns. Our method is based on the information reduction approach: we transform the byte-oriented matching problem into a token-based matching problem. A regex pattern contains one or more segments, and a segment may be subdivided into multiple non-trivial tokens. In general, a token is associated with only one or a few regexes. Dedicated hardware units convert the input byte-stream into a token-stream, in which the number of tokens is much smaller than the number of bytes. The token-stream is processed by an NFA-based aggregation unit to determine whether any segment is present. Detected segments are further processed by a scoreboard to determine whether any multi-segment pattern is present. As a proof of concept, our method is implemented on a Virtex-6 FPGA and consumes 1.84 MB of on-chip memory.
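
The three stages named in the abstract (byte-stream to token-stream, NFA-based segment aggregation, scoreboard for multi-segment patterns) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the names `TOKENS`, `SEGMENTS`, `PATTERNS`, `tokenize`, `aggregate`, `scoreboard`, and `scan` are hypothetical, the toy signature is invented, and the matching rules (tokens within a segment separated by a bounded byte gap, segments accepted in order of their end offsets) are simplifying assumptions.

```python
# Toy software model of a token-based matching pipeline (all data hypothetical).
TOKENS = [b"\xde\xad", b"\xbe\xef", b"MZ\x90"]   # token_id -> byte string
SEGMENTS = {0: ([0, 1], 2),                      # seg_id -> (token ids, max gap in bytes)
            1: ([2], 0)}
PATTERNS = {"Toy.Sig": [0, 1]}                   # name -> ordered segment ids


def tokenize(data, tokens):
    """Stage 1: turn the byte stream into (start, end, token_id) events."""
    events = []
    for tid, tok in enumerate(tokens):
        pos = data.find(tok)
        while pos != -1:
            events.append((pos, pos + len(tok), tid))
            pos = data.find(tok, pos + 1)
    events.sort(key=lambda e: (e[1], e[0]))      # order by end offset
    return events


def aggregate(events, segments):
    """Stage 2: NFA-style aggregation of tokens into segment detections."""
    active = set()   # threads: (seg_id, next_token_index, end_of_last_token)
    found = []       # detections: (seg_id, end_offset)
    for start, end, tid in events:
        spawned = set()
        for seg_id, (toks, _gap) in segments.items():
            if toks[0] == tid:                   # start a new thread
                spawned.add((seg_id, 1, end))
        for seg_id, idx, last_end in active:     # try to advance old threads
            toks, max_gap = segments[seg_id]
            if toks[idx] == tid and 0 <= start - last_end <= max_gap:
                spawned.add((seg_id, idx + 1, end))
        for seg_id, idx, last_end in sorted(spawned):
            if idx == len(segments[seg_id][0]):
                found.append((seg_id, last_end)) # whole segment seen
            else:
                active.add((seg_id, idx, last_end))
        # Old threads are never pruned here; a real design would bound this state.
    return found


def scoreboard(segment_events, patterns):
    """Stage 3: report patterns whose segments were all seen in order."""
    progress = {name: 0 for name in patterns}
    hits = []
    for seg_id, _end in segment_events:
        for name, segs in patterns.items():
            i = progress[name]
            if i < len(segs) and segs[i] == seg_id:
                progress[name] = i + 1
                if i + 1 == len(segs):
                    hits.append(name)
    return hits


def scan(data):
    """Run all three stages over one input buffer."""
    return scoreboard(aggregate(tokenize(data, TOKENS), SEGMENTS), PATTERNS)
```

For instance, `scan(b"xx\xde\xadQ\xbe\xefyyyyMZ\x90zz")` reports the toy signature, while widening the gap between the first two tokens beyond two bytes does not. The point of the sketch is the data reduction: the later stages see only a handful of token and segment events rather than every input byte.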

Research Area(s)

  • Hardware architecture, regular expression matching, string matching, virus detection