Safeguard Against Unicode-based Internet Identity Frauds

Project: Research




The widespread adoption of Unicode has greatly advanced the internationalization of software systems in general and of Internet applications in particular. However, because the Unicode character set contains many visually or semantically similar characters, Unicode-based obfuscation can cause severe security problems such as Internet identity fraud. Fraudulent website addresses, email addresses, and usernames can be registered that look very similar to, or exactly the same as, well-known ones and may therefore mislead readers. In this project the researchers will investigate a series of countermeasures against Unicode-based obfuscation, and therefore against Unicode-based Internet identity fraud. They will propose a scheme that uses different colours to visually distinguish characters from different languages, and they hope that it will become part of the Unicode standard. They will also propose a scheme for evaluating text similarity in both visual and semantic terms, at the character, word, and document levels, and will construct the similar Unicode character index (SUCI) as another standard for quick reference. Based on these two standards, they will build prototype systems and API packages for use in various applications, with the aim of achieving a substantial impact on Unicode-based language processing in general and on information security in particular.
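As a rough illustration of the two ideas described above (telling apart characters from different scripts, and looking up visually similar characters), the following Python sketch may help. It is not the project's method: the table SAMPLE_LOOKALIKES is a hypothetical, hand-picked stand-in for an index such as SUCI, the function names are invented for this example, and the script label is only approximated from the first word of each character's Unicode name.

```python
import unicodedata

# Hypothetical, hand-picked sample of look-alike characters.
# This is an assumption for illustration only, NOT the project's SUCI.
SAMPLE_LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def rough_script(ch):
    """Approximate a character's script by the first word of its Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def scripts_used(text):
    """Return the set of rough script labels for the letters in text."""
    return {rough_script(ch) for ch in text if ch.isalpha()}


def skeleton(text):
    """Replace each character found in the sample table with its look-alike."""
    return "".join(SAMPLE_LOOKALIKES.get(ch, ch) for ch in text)


def looks_like(candidate, known):
    """True if candidate differs from known but collapses to the same skeleton."""
    return candidate != known and skeleton(candidate) == skeleton(known)


# "paypal" with its first "a" replaced by CYRILLIC SMALL LETTER A (U+0430).
spoof = "p\u0430ypal"
mixed = len(scripts_used(spoof)) > 1    # letters come from more than one script
similar = looks_like(spoof, "paypal")   # visually collides with the known name
```

A string that mixes scripts, as spoof does here, is the kind of case the colour scheme would make visible to a reader, while the skeleton comparison hints at how a similar-character index could be queried at the character level.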


Project number: 9041247
Effective start/end date: 1/09/07 to 8/09/10