MEMORY-Based Hardware Architectures to Detect ClamAV Virus Signatures with Restricted Regular Expression Features

Nga Lam Or, Xing Wang, Derek Pao

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

10 Citations (Scopus)

Abstract

We aim to implement a single-chip hardware detection engine for virus scanning. Our study is based on the ClamAV virus database, which contains 88.9K strings and 9.6K extended hex-signatures with restricted regular expression (regex) features. We have previously presented cost-effective hardware architectures to detect the 88.9K strings and the 3.2K regex patterns that are composed of multiple string segments. In this paper, we present hardware architectures to detect the remaining 6.4K regex patterns. Our method is based on the information reduction approach: we transform the byte-oriented matching problem into a token-based matching problem. A regex pattern contains one or more segments, and a segment may be subdivided into multiple non-trivial tokens. In general, a token is associated with only one or a few regexes. Dedicated hardware units convert the input byte-stream into a token-stream, in which the number of tokens is much smaller than the number of bytes. The token-stream is processed by an NFA-based aggregation unit to determine whether any segment is present. Detected segments are further processed by a scoreboard to determine whether any multi-segment pattern is present. As a proof of concept, our method is implemented on a Virtex-6 FPGA and consumes 1.84 MB of on-chip memory.
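
The three-stage flow described in the abstract (byte-stream to token-stream, token aggregation into segments, scoreboard over segments) can be illustrated with a small software sketch. This is only a toy analogy under stated assumptions, not the hardware design from the paper: the token table, the gap rule, and the names `TOKENS`, `SEGMENTS`, `PATTERNS`, `tokenize`, `aggregate`, `scoreboard` and `scan` are all hypothetical, and the byte strings are made up rather than taken from ClamAV.

```python
# Hypothetical toy database (not real ClamAV data): token id -> byte string.
TOKENS = {0: b"\xde\xad", 1: b"\xbe\xef", 2: b"\xca\xfe"}

# A segment is an ordered list of (token_id, max_gap) pairs. max_gap is the
# largest number of bytes allowed between the previous token's end and this
# token's start; it is ignored for the first token of a segment.
SEGMENTS = {"segA": [(0, 0), (1, 2)], "segB": [(2, 0)]}

# A pattern is an ordered list of segment names.
PATTERNS = {"toy_virus": ["segA", "segB"]}


def tokenize(data: bytes):
    """Reduce the byte stream to a sparse list of (end_offset, token_id)."""
    events = []
    for tid, pat in TOKENS.items():
        start = data.find(pat)
        while start != -1:
            events.append((start + len(pat), tid))
            start = data.find(pat, start + 1)
    return sorted(events)


def aggregate(events):
    """NFA-style aggregation: several partial segment matches stay active."""
    active = set()   # states: (segment name, next token index, last end offset)
    found = []       # completed segments: (end_offset, segment name)
    for end, tid in events:
        start = end - len(TOKENS[tid])
        new_states = set()
        # Try to advance every partial match that expects this token.
        for seg, idx, last_end in active:
            want, max_gap = SEGMENTS[seg][idx]
            if want == tid and 0 <= start - last_end <= max_gap:
                new_states.add((seg, idx + 1, end))
        # A token may also begin a fresh match of any segment it starts.
        for seg, spec in SEGMENTS.items():
            if spec[0][0] == tid:
                new_states.add((seg, 1, end))
        for seg, idx, last_end in new_states:
            if idx == len(SEGMENTS[seg]):
                found.append((end, seg))
            else:
                active.add((seg, idx, last_end))
    return found


def scoreboard(found):
    """Report patterns whose segments were all detected in order."""
    progress = {name: 0 for name in PATTERNS}
    hits = []
    for _, seg in found:
        for name, segs in PATTERNS.items():
            done = progress[name]
            if done < len(segs) and segs[done] == seg:
                progress[name] = done + 1
                if progress[name] == len(segs):
                    hits.append(name)
    return hits


def scan(data: bytes):
    return scoreboard(aggregate(tokenize(data)))
```

For instance, `scan(b"\x00\xde\xad\x11\xbe\xef\x22\x22\xca\xfe")` reports the toy pattern, while widening the gap between the first two tokens beyond two bytes does not. The sketch never expires stale partial matches and ignores distance constraints between segments, both of which a real engine would need to handle.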
Original language: English
Article number: 7115115
Pages (from-to): 1225-1238
Journal: IEEE Transactions on Computers
Volume: 65
Issue number: 4
Publication status: Published - 1 Apr 2016

Research Keywords

  • Hardware architecture
  • regular expression matching
  • string matching
  • virus detection
