
Text style analysis using trace ratio criterion patch alignment embedding

  • Peng Tang
  • Mingbo Zhao
  • Tommy W.S. Chow

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

This paper proposes an effective algorithm for extracting cues of text style. When processing a document collection, the documents are first converted to a high-dimensional data set with the assistance of a group of style markers. The Trace Ratio Criterion Patch Alignment Embedding (TR-PAE) is then employed to obtain a lower-dimensional representation in a textual space. One advantage of TR-PAE is that inter-class separability and intra-class compactness are well characterized by a specially designed intrinsic graph and penalty graph, both based on a discriminative patch alignment strategy. Another advantage is that the method is based on the trace ratio criterion, which directly represents the average between-class distance and the average within-class distance in the low-dimensional space. To evaluate the proposed algorithm, three corpora were designed and collected from existing popular corpora and from real-life data covering diverse topics and genres. Extensive simulations were conducted to illustrate the feasibility and effectiveness of the implementation. The simulations demonstrate that the proposed method is able to extract deeply hidden style information from the given documents and to provide reliable text style analysis results efficiently. © 2014 Elsevier B.V.
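
To make the trace ratio idea in the abstract concrete, the sketch below scores a candidate projection by dividing the between-class scatter of the projected points by their within-class scatter. It is not the authors' TR-PAE: plain class scatter stands in for the intrinsic and penalty graphs, and the names `trace_ratio`, `docs`, and `labels`, along with the toy two-feature data, are assumptions made purely for illustration.

```python
from collections import defaultdict


def project(x, W):
    """Project vector x onto each direction in W (a list of direction vectors)."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def trace_ratio(X, y, W):
    """Between-class scatter divided by within-class scatter after projection.

    Assumption: ordinary class scatter is used here as a stand-in for the
    paper's penalty graph (numerator) and intrinsic graph (denominator).
    """
    Z = [project(x, W) for x in X]
    groups = defaultdict(list)
    for z, label in zip(Z, y):
        groups[label].append(z)

    overall = mean(Z)
    between = 0.0
    within = 0.0
    for members in groups.values():
        centre = mean(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(z, centre) for z in members)

    # A projection that collapses every class to a point has zero within-class scatter.
    return float("inf") if within == 0 else between / within


# Hypothetical toy data: two style-marker values per document, two authors.
docs = [[0.0, 0.0], [1.0, 2.0], [4.0, 0.0], [5.0, 2.0]]
labels = ["A", "A", "B", "B"]

score_first = trace_ratio(docs, labels, [[1.0, 0.0]])   # keep only the first marker
score_second = trace_ratio(docs, labels, [[0.0, 1.0]])  # keep only the second marker
print(score_first, score_second)
```

In this toy data the first marker separates the two authors while the second does not, so the first projection receives the higher score. A trace ratio method searches for the projection that maximizes this kind of score rather than evaluating hand-picked directions.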
Original language: English
Pages (from-to): 201-212
Journal: Neurocomputing
Volume: 136
DOIs
Publication status: Published - 20 Jul 2014

Research Keywords

  • Style markers
  • Text clustering
  • Text style analysis
  • Trace ratio criterion patch alignment embedding
