Refereed Papers
Track: XML II
Paper Title:
Utility-driven Load Shedding for XML Stream Processing
Authors:
- Mingzhu Wei(Worcester Polytechnic Institute)
- Elke A. Rundensteiner(Worcester Polytechnic Institute)
- Murali Mani(Worcester Polytechnic Institute)
Abstract:
Because of the high volume and unpredictable arrival rate, stream
processing systems may not always be able to keep up with the input
data streams--- resulting in buffer overflow and uncontrolled loss
of data. Load shedding, the prevalent strategy for solving this
overflow problem, has so far only been considered for relational
stream processing, but not for XML. Shedding applied to XML stream
processing brings new opportunities and challenges due to complex
nested nature of XML structures. In this paper, we tackle this
unsolved XML shedding problem using a three-pronged approach. First,
we develop an XQuery preference model that enables users to specify
the relative importance of preserving different subpatterns in the
XML result structure. This transforms shedding into the problem of
rewriting the user query into shed queries that return approximate
query answers with utility as measured by the given user preference
model. Second, we develop a cost model to compare the performance of
alternate shed queries. Third, we develop two shedding algorithms,
OptShed and FastShed. OptShed guarantees to find an optimal solution
however at the cost of exponential complexity. FastShed, as
confirmed by our experiments, achieves a close-to-optimal result in
a wide range of test cases. Finally we describe the in-automaton
shedding mechanism for XQuery stream engines. The experiments show
that our proposed utility-driven shedding solutions consistently
achieve higher utility results compared to the existing relational
shedding techniques.
Inquiries can be sent to: