Dash: A novel search engine for database-generated dynamic web pages

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

2 Scopus Citations

Author(s)

  • Ken C.K. Lee
  • Kanchan Bankar
  • Baihua Zheng
  • Chi-Yin Chow
  • Honggang Wang

Detail(s)

Original language: English
Title of host publication: Proceedings - International Conference on Distributed Computing Systems
Pages: 435-444
Publication status: Published - 2012

Conference

Title: 32nd IEEE International Conference on Distributed Computing Systems, ICDCS 2012
Place: China
City: Macau
Period: 18-21 June 2012

Abstract

Database-generated dynamic web pages (db-pages, in short), whose contents are created on the fly by web applications and databases, are now prominent on the web. However, many of them cannot be searched by existing search engines. Accordingly, we develop a novel search engine named Dash, which stands for Db-pAge SearcH, to support db-page search. Dash determines the db-pages that a target web application and its database can possibly generate by exploring the application code and the related database content, and it supports keyword search on those db-pages. In this paper, we present its system design and focus on the efficiency issue. To minimize the costs of collecting, maintaining, indexing, and searching a massive number of db-pages that may have overlapping contents, Dash derives and indexes db-page fragments in place of db-pages. Each db-page fragment carries a disjoint part of a db-page. To efficiently compute and index db-page fragments from huge datasets, Dash is equipped with MapReduce-based algorithms for database crawling and db-page fragment indexing. In addition, Dash has a top-k search algorithm that efficiently assembles db-page fragments into the db-pages relevant to the search keywords and returns the k most relevant ones. The performance of Dash is evaluated via extensive experimentation. © 2012 IEEE.
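The abstract does not spell out Dash's algorithms, so the following is only a toy Python sketch of the general idea under stated assumptions: shared page content is indexed once as fragments (via a map/reduce pass simulated in a single process), and fragment-level matches are assembled into page scores for a naive top-k query. All data, function names (`build_fragment_index`, `top_k_pages`), the conjunctive matching rule, and the term-frequency scoring are hypothetical illustrations, not the paper's method; Dash's actual Hadoop-based crawling, indexing, and top-k algorithms are not reproduced here.

```python
import heapq
import re
from collections import defaultdict

# Hypothetical toy fragments: fragment id -> text. In Dash, fragments are derived
# from application code and database content; here they are hard-coded.
FRAGMENTS = {
    "hdr": "online bookstore",
    "b1": "database systems textbook",
    "b2": "distributed computing handbook",
    "rev1": "great database reference",
}

# Hypothetical db-pages, each assembled from fragment ids. The shared "hdr"
# fragment is stored and indexed once rather than once per page.
PAGES = {
    "book?id=1": ["hdr", "b1", "rev1"],
    "book?id=2": ["hdr", "b2"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_fragment(frag_id, text):
    """Map phase: emit (term, (fragment id, term frequency)) pairs."""
    counts = defaultdict(int)
    for term in tokenize(text):
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (frag_id, tf)


def reduce_term(term, postings):
    """Reduce phase: turn all postings for one term into a sorted posting list."""
    return term, sorted(postings)


def build_fragment_index(fragments):
    """Simulate MapReduce in-process: map, shuffle (group by term), reduce."""
    shuffled = defaultdict(list)
    for frag_id, text in fragments.items():
        for term, posting in map_fragment(frag_id, text):
            shuffled[term].append(posting)
    return dict(reduce_term(t, p) for t, p in shuffled.items())


def top_k_pages(index, pages, keywords, k):
    """Assemble fragment matches into page scores and return the top k pages.

    A page qualifies only if every query term occurs in at least one of its
    fragments (possibly different ones). Score = summed term frequency.
    Returns (page, score) pairs, highest score first, ties broken by page name.
    """
    # Reverse mapping: fragment id -> pages that include that fragment.
    frag_to_pages = defaultdict(set)
    for page, frag_ids in pages.items():
        for frag_id in frag_ids:
            frag_to_pages[frag_id].add(page)

    required = {t for kw in keywords for t in tokenize(kw)}
    scores = defaultdict(int)
    matched = defaultdict(set)  # page -> query terms found on that page
    for term in required:
        for frag_id, tf in index.get(term, []):
            for page in frag_to_pages[frag_id]:
                scores[page] += tf
                matched[page].add(term)

    candidates = [p for p in scores if matched[p] == required]
    best = heapq.nsmallest(k, candidates, key=lambda p: (-scores[p], p))
    return [(p, scores[p]) for p in best]
```

Because the shared header fragment is indexed once, a query for `online` reaches both pages through a single posting. A query for `bookstore handbook` matches `book?id=2` even though no single fragment contains both words, which illustrates why fragment matches must be assembled into pages before ranking.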

Research Area(s)

  • Database crawling, Database-generated dynamic web pages, Hadoop and performance, Indexing, MapReduce, Search engine, Top-k search

Citation Format(s)

Dash: A novel search engine for database-generated dynamic web pages. / Lee, Ken C.K.; Bankar, Kanchan; Zheng, Baihua et al.
Proceedings - International Conference on Distributed Computing Systems. 2012. p. 435-444 6258016.