skip to main content
10.1145/1600193.1600197acmconferencesArticle/Chapter ViewAbstractPublication PagesdocengConference Proceedingsconference-collections
research-article

Efficient change control of XML documents

Published: 16 September 2009 Publication History

Abstract

XML-based documents play a major role in modern information architectures and their corresponding workflows. In this context, the ability to identify and represent differences between two versions of a document is essential. Several approaches to finding the differences between XML documents have already been proposed. Typically, they are based on tree-to-tree correction, or sequence alignment. Most of these algorithms, however, are too slow and do not support the subsequent merging of changes. In this paper, we present a differencing algorithm tailored to ordered XML documents, called DocTreeDiff. It relies on our context-oriented XML versioning model which allows for document merging, presented in earlier work. An empiric evaluation demonstrates the efficiency of our approach as well as the high quality of the generated deltas.

References

[1]
L. Bergroth, H. Hakonen, and T. Raita. A survey of longest common subsequence algorithms. In SPIRE '00: Proceedings of the 7th International Symposium on String Processing Information Retrieval, page 39, Washington, DC, USA, 2000. IEEE Computer Society.
[2]
M. Bernard, L. Boyer, A. Habrard, and M. Sebban. Learning probabilistic models of tree edit distance. Pattern Recogn., 41(8):2611--2629, 2008.
[3]
P. Bille. A survey on tree edit distance and related problems. Theor. Comput. Sci., 337(1-3):217--239, 2005.
[4]
M. Brauer, R. Weir, and M. McRae. OpenDocument v1.1 specification, 2007.
[5]
S. S. Chawathe and H. Garcia--Molina. Meaningful change detection in structured data. SIGMOD Rec., 26(2):26--37, 1997.
[6]
S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In SIGMOD '96: Proceedings of the 1996 ACM SIGMOD conference on Management of data, pages 493--504, New York, NY, USA, 1996. ACM.
[7]
W. Chen. New algorithm for ordered tree-to-tree correction problem. Journal of Algorithms, 40(2):135 -- 158, 2001.
[8]
G. Cobéna, S. Abiteboul, and A. Marian. Detecting Changes in XML Documents. In Proceedings of the 18th International Conference on Data Engineering, 26 February -- 1 March 2002, San Jose, CA, pages 41--52. IEEE Computer Society, 2002.
[9]
S. DeRose and J. Clark. XML path language (XPath) version 1.0. W3C recommendation, W3C, Nov. 1999. http://www.w3.org/TR/1999/REC-xpath-19991116.
[10]
R. L. Fontaine. Merging xml files: a new approach providing intelligent merge of xml data sets. In Proceedings of XML Europe 2002, 2002.
[11]
K.-S. Huang, C.-B. Yang, K.-T. Tseng, H.-Y. Ann, and Y.-H. Peng. Efficient algorithms for finding interleaving relationship between sequences. Information Processing Letters, 105(5):188 -- 193, 2008.
[12]
J. Jansson and A. Lingas. A fast algorithm for optimal alignment between similar ordered trees. In CPM '01: Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, pages 232--240, London, UK, 2001. Springer-Verlag.
[13]
T. Jiang, L. Wang, and K. Zhang. Alignment of trees -- an alternative to tree edit. Theoretical Computer Science, 143(1):137 -- 148, 1995.
[14]
T. Lindholm. A three-way merge for xml documents. In DocEng '04: Proceedings of the 2004 ACM symposium on Document engineering, pages 1--10, New York, NY, USA, 2004. ACM.
[15]
T. Lindholm, J. Kangasharju, and S. Tarkoma. Fast and simple xml tree differencing by sequence alignment. In DocEng '06: Proceedings of the 2006 ACM symposium on Document engineering, pages 75--84, New York, NY, USA, 2006. ACM.
[16]
E. W. Myers. An o(nd) difference algorithm and its variations. Algorithmica, 1:251--266, 1986.
[17]
J. Paoli, I. Valet-Harper, A. Farquhar, and I. Sebestyen. ECMA-376 Office Open XML File Formats, 2006.
[18]
S. Pemberton. XHTML™ 1.0 the extensible hypertext markup language (second edition). W3C recommendation, W3C, Aug. 2002. http://www.w3.org/TR/2002/REC-xhtml1-20020801.
[19]
M. O. Rabin. Fingerprinting by random polynomials. Technical Report TR-CSE-03-01, Center for Research in Computing Technology, Harvard University, 1981.
[20]
S. Rönnau and U. M. Borghoff. Versioning xml-based office documents. Multimedia Tools and Applications, 43(3):253--274, 2009.
[21]
S. Rönnau, C. Pauli, and U. M. Borghoff. Merging changes in xml documents using reliable context fingerprints. In DocEng '08: Proceeding of the eighth ACM symposium on Document engineering, pages 52--61, New York, NY, USA, 2008. ACM.
[22]
S. Rönnau, J. Scheffczyk, and U. M. Borghoff. Towards xml version control of office documents. In DocEng '05: Proceedings of the 2005 ACM symposium on Document engineering, pages 10--19, New York, NY, USA, 2005. ACM.
[23]
L. A. Rosado, A. P. Márquez, and J. M. Gil. Managing branch versioning in versioned/temporal xml documents. In XSym 2007: Proceedings of 5th International XML Database Symposium, pages 107--121, 2007.
[24]
S. M. Selkow. The tree-to-tree editing problem. Inf. Process. Lett., 6(6):184--186, 1977.
[25]
K.-C. Tai. The tree-to-tree correction problem. J. ACM, 26(3):422--433, 1979.
[26]
G. Valiente. An efficient bottom-up distance between trees. In SPIRE, pages 212--219, 2001.
[27]
R. A. Wagner and M. J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168--173, 1974.
[28]
K. Zhang. Efficient parallel algorithms for tree editing problems. In CPM '96: Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching, pages 361--372, London, UK, 1996. Springer-Verlag.
[29]
K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18(6):1245--1262, 1989.
[30]
K. Zhang, J. T.-L. Wang, and D. Shasha. On the editing distance between undirected acyclic graphs and related problems. In CPM '95: Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, pages 395--407, 1995.

Cited By

View all
  • (2021)Analysis of XML Data Integrity Using Multiple Digest SchemesInformation and Communication Technology for Competitive Strategies (ICTCS 2020)10.1007/978-981-16-0882-7_16(203-214)Online publication date: 6-Jul-2021
  • (2019)The Next Millennium Document FormatProceedings of the ACM Symposium on Document Engineering 201910.1145/3342558.3345419(1-4)Online publication date: 23-Sep-2019
  • (2019)Multi-layered edits for meaningful interpretation of textual differencesProceedings of the ACM Symposium on Document Engineering 201910.1145/3342558.3345406(1-4)Online publication date: 23-Sep-2019
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
DocEng '09: Proceedings of the 9th ACM symposium on Document engineering
September 2009
264 pages
ISBN:9781605585758
DOI:10.1145/1600193
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 September 2009

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. XML diff
  2. XML merge
  3. office documents
  4. tree-to-tree correction
  5. version control

Qualifiers

  • Research-article

Conference

DocEng '09
DocEng '09: ACM Symposium on Document Engineering
September 16 - 18, 2009
Munich, Germany

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Analysis of XML Data Integrity Using Multiple Digest SchemesInformation and Communication Technology for Competitive Strategies (ICTCS 2020)10.1007/978-981-16-0882-7_16(203-214)Online publication date: 6-Jul-2021
  • (2019)The Next Millennium Document FormatProceedings of the ACM Symposium on Document Engineering 201910.1145/3342558.3345419(1-4)Online publication date: 23-Sep-2019
  • (2019)Multi-layered edits for meaningful interpretation of textual differencesProceedings of the ACM Symposium on Document Engineering 201910.1145/3342558.3345406(1-4)Online publication date: 23-Sep-2019
  • (2016)Measuring the quality of diff algorithmsComputer Standards & Interfaces10.1016/j.csi.2015.12.00546:C(52-65)Online publication date: 1-May-2016
  • (2016)UI Tags: Confidentiality in Office Open XMLCyber Security10.1007/978-3-319-28313-5_2(19-33)Online publication date: 8-Jan-2016
  • (2016)Bridging the gap between tracking and detecting changes in XMLSoftware—Practice & Experience10.1002/spe.230546:2(227-250)Online publication date: 1-Feb-2016
  • (2014)TeiCoPhiLib: A Library of Components for the Domain of Collaborative PhilologyJournal of the Text Encoding Initiative10.4000/jtei.1285Online publication date: 28-Dec-2014
  • (2014)Interoperable Document CollaborationProceedings of the 2nd International Workshop on (Document) Changes: modeling, detection, storage and visualization10.1145/2723147.2723155(1-4)Online publication date: 16-Sep-2014
  • (2014)Fine-grained change detection in structured text documentsProceedings of the 2014 ACM symposium on Document engineering10.1145/2644866.2644880(87-96)Online publication date: 16-Sep-2014
  • (2013)Document changesProceedings of the 2013 ACM symposium on Document engineering10.1145/2494266.2494322(281-282)Online publication date: 10-Sep-2013
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media