Extracting a largest redundancy-free XML storage structure from an acyclic hypergraph in polynomial time

Research output: Journal Publications and Reviews (RGC: 21, 22, 62)21_Publication in refereed journalpeer-review

5 Scopus Citations
View graph of relations

Author(s)

  • Wai Yin Mok
  • Joseph Fong
  • David W. Embley

Related Research Unit(s)

Detail(s)

Original languageEnglish
Pages (from-to)804-824
Journal / PublicationInformation Systems
Volume35
Issue number7
Publication statusPublished - Nov 2010

Abstract

Given a hypergraph and a set of embedded functional dependencies, we investigate the problem of determining the conditions under which we can efficiently generate redundancy-free XML storage structures with as few scheme trees as possible. Redundancy-free XML structures guarantee both economy in storage space and the absence of update anomalies, and having the least number of scheme trees requires the fewest number of joins to navigate among the data elements. We know that the general problem is intractable. The problem may still be intractable even when the hypergraph is acyclic and each hyperedge is in BoyceCodd normal form (BCNF). As we show here, however, given an acyclic hypergraph with each hyperedge in BCNF, a polynomial-time algorithm exists that generates a largest possible redundancy-free XML storage structure. Successively generating largest possible scheme trees from among hyperedges not already included in generated scheme trees constitutes a reasonable heuristic for finding the fewest possible scheme trees. For many practical cases, this heuristic finds the set of redundancy-free XML storage structures with the fewest number of scheme trees. In addition to a correctness proof and a complexity analysis showing that the algorithm is polynomial, we also give experimental results over randomly generated but appropriately constrained hypergraphs showing empirically that the algorithm is indeed polynomial. © 2010 Elsevier B.V. All rights reserved.

Research Area(s)

  • Acyclic hypergraphs, Large XML storage structures, XML data redundancy, XML-Schema generation

Citation Format(s)

Extracting a largest redundancy-free XML storage structure from an acyclic hypergraph in polynomial time. / Mok, Wai Yin; Fong, Joseph; Embley, David W.

In: Information Systems, Vol. 35, No. 7, 11.2010, p. 804-824.

Research output: Journal Publications and Reviews (RGC: 21, 22, 62)21_Publication in refereed journalpeer-review