HisTrace: Building a Search Engine of Historical Events

Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45) > 32_Refereed conference paper (with ISBN/ISSN)

1 Scopus Citations

Detail(s)

Original language: English
Title of host publication: Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
Pages: 1155-1156
Publication status: Published - 2008

Conference

Title: 17th International Conference on World Wide Web 2008, WWW'08
Place: China
City: Beijing
Period: 21 - 25 April 2008

Abstract

In this paper, we describe an experimental search engine built on our Chinese web archive, which has been collected since 2001. The original data set contains nearly 3 billion Chinese web pages crawled over the past 5 years. From this collection, 430 million "article-like" pages are selected and then partitioned into 68 million sets of similar pages. The title and publication date of each page are determined, and an index is built. When searching, the system returns related pages in chronological order. This way, a user interested in news reports or commentaries on a particular past event can conveniently find a rich set of highly related pages.
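
The pipeline outlined in the abstract (group similar pages, index them, return matches in date order) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Page` record, the function names, the exact-match fingerprint standing in for replica detection, and the whitespace-style tokenization (real Chinese text would need word segmentation).

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    """Hypothetical record for an archived page with extracted title and date."""
    url: str
    title: str
    published: date
    text: str


def tokenize(text):
    """Lowercase word tokens; a placeholder for real Chinese word segmentation."""
    return re.findall(r"\w+", text.lower())


def fingerprint(text):
    """Hash of the normalized word sequence (assumed stand-in for replica detection)."""
    return hashlib.sha256(" ".join(tokenize(text)).encode("utf-8")).hexdigest()


def group_replicas(pages):
    """Partition pages into sets whose normalized text is identical."""
    groups = defaultdict(list)
    for page in pages:
        groups[fingerprint(page.text)].append(page)
    return list(groups.values())


def build_index(groups):
    """Inverted index from term to the ids of groups containing it."""
    index = defaultdict(set)
    for gid, group in enumerate(groups):
        rep = group[0]  # one representative page per group
        for term in tokenize(rep.title + " " + rep.text):
            index[term].add(gid)
    return index


def search(query, groups, index):
    """Return the earliest page of every group matching all query terms, oldest first."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    earliest = [min(groups[g], key=lambda p: p.published) for g in hits]
    return sorted(earliest, key=lambda p: p.published)
```

In this sketch each group of similar pages contributes a single result, so a query returns one entry per distinct article, ordered by publication date rather than by relevance score.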

Research Area(s)

  • Replica detection, Text mining, Web archive

Citation Format(s)

HisTrace: Building a Search Engine of Historical Events. / Huang, Lian'en; Zhu, Jonathan J. H.; Li, Xiaoming.

Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. 2008. p. 1155-1156.
