Abstract
XML-based documents play a major role in modern information architectures and their corresponding workflows. In this context, the ability to identify and represent differences between two versions of a document is essential, as well as the merging of document versions resulting from parallel editing processes.
Different approaches try to meet these challenges using operational transformation or document annotation. In both approaches, the changes are tracked during editing, which requires corresponding editing applications. In the context of software development, however, a state-based approach is common. Here, versions are compared and merged using external tools, called diff and patch. This allows the users for editing documents without being tightened to editing tools. Approaches exist that are able to compare XML documents, but lack a corresponding merge capability.
In this article, we present a comprehensive framework that allows for comparing and merging of XML documents using a state-based approach. Its design is based on an analysis of XML documents and their modification patterns. The framework is built on top of a context-oriented delta model. We present a diff algorithm that appears to be highly efficient in terms of speed and delta quality. The patch algorithm is able to merge document versions efficiently and reliably. The efficiency and the reliability of our approach are verified using a competitive test scenario.
Similar content being viewed by others
References
Brauer M, Weir R, McRae M (2007) OpenDocument v1.1 specification. URL http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.pdf
Paoli J, Valet-Harper I, Farquhar A, Sebestyen I (2006) ECMA-376 office open XML file formats. URL http://www.ecma-international.org/publications/standards/Ecma-376.htm
Pemberton S (2002) XHTML™1.0 the extensible hypertext markup language (second edition). http://www.w3.org/TR/2002/REC-xhtml1-20020801
Selkow SM (1977) The tree-to-tree editing problem. Inf Process Lett 6(6):184–186 doi:10.1016/0020-0190(77)90064-3
Rönnau S, Pauli C, Borghoff UM (2008) Merging changes in XML documents using reliable context fingerprints. In: DocEng ’08: Proc of the 8th ACM symp on document engineering. ACM, New York, pp 52–61. doi:http://doi.acm.org/10.1145/1410140.1410151
Rönnau S, Borghoff UM (2009) Versioning XML-based office documents. Multimed Tools Appl 43(3):253–274. doi:http://dx.doi.org/10.1007/s11042-009-0271-2
Rönnau S, Philipp G, Borghoff UM (2009) Efficient change control of XML documents. In: DocEng ’09: Proc of the 9th ACM symp on document engineering. ACM, New York, pp 3–12
Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J ACM 21(1):168–173. doi:http://doi.acm.org/10.1145/321796.321811
Bergroth L, Hakonen H, Raita T (2000) A survey of longest common subsequence algorithms. In: SPIRE ’00: Proc of the 7th int symp on string processing information retrieval. IEEE Computer Society, Washington, p 39
Tai KC (1979) The tree-to-tree correction problem. J ACM 26(3):422–433. doi:http://doi.acm.org/10.1145/322139.322143
Bille P (2005) A survey on tree edit distance and related problems. Theor Comput Sci 337(1-3):217–239. doi:http://dx.doi.org/10.1016/j.tcs.2004.12.030
Chawathe SS, Garcia-Molina H (1997) Meaningful change detection in structured data. SIGMOD Rec 26(2):26–37. doi:http://doi.acm.org/10.1145/253262.253266
DeRose S, Clark J XML path language (XPath) version 1.0 (1999). http://www.w3.org/TR/1999/REC-xpath-19991116
Chen W (2001) New algorithm for ordered tree-to-tree correction problem. J Algorithms 40(2):135–158. doi:10.1006/jagm.2001.1170
Chawathe SS, Rajaraman A, Garcia-Molina H, Widom J (1996) Change detection in hierarchically structured information. In: SIGMOD ’96: Proc of the 1996 ACM int conf on management of data. ACM, New York, pp 493–504. doi:http://doi.acm.org/10.1145/233269.233366
Fetterly D, Manasse M, Najork M, Wiener J (2003) A large-scale study of the evolution of web pages. In: WWW ’03: Proc of the 12th int conf on world wide web. ACM, New York, pp 669–678. doi:http://doi.acm.org/10.1145/775152.775246
Ntoulas A, Cho J, Olston C (2004) What’s new on the web?: the evolution of the web from a search engine perspective. In: WWW ’04: Proc of the 13th int conf on world wide web. ACM, New York, pp 1–12. doi:http://doi.acm.org/10.1145/988672.988674
Adar E, Teevan J, Dumais ST, Elsas JL (2009) The web changes everything: understanding the dynamics of web content. In: WSDM ’09: Proc of the 2nd ACM int conf on web search and data mining. ACM, New York, pp 282–291. doi:http://doi.acm.org/10.1145/1498759.1498837
Davison W (1990) Unified context diff tools. vol 14, Issue 70 of comp.sources.misc
Mouat A (2002) XML diff and patch utilities. Master’s thesis, Heriot-Watt University, Edinburgh Scotland
Myers EW (1986) An o(nd) difference algorithm and its variations. Algorithmica 1:251–266
Zhang K (1996) Efficient parallel algorithms for tree editing problems. In: CPM ’96: Proc of the 7th ann symp on combinatorial pattern matching. Springer, London, pp 361–372
Khanna S, Kunal K, Pierce BC (2007) A formal investigation of diff3. In: Prasad A (ed) Foundations of software technology and theoretical computer science (FSTTCS)
Zhang K, Shasha D (1989) Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput 18(6):1245–1262. doi:http://dx.doi.org/10.1137/0218082
Rönnau S, Scheffczyk J, Borghoff UM (2005) Towards XML version control of office documents. In: DocEng ’05: Proc of the 2005 ACM symp on document engineering. ACM, New York, pp 10–19. doi:http://doi.acm.org/10.1145/1096601.1096606
Cobéna G, Abiteboul S, Marian A (2002) Detecting changes in XML documents. In: Proc of the 18th int conf on data engineering. IEEE Computer Society, Los Alamitos, pp 41–52
Lindholm T, Kangasharju J, Tarkoma S (2006) Fast and simple XML tree differencing by sequence alignment. In: DocEng ’06: Proc of the 2006 ACM symp on document engineering. ACM, New York, pp 75–84. doi:http://doi.acm.org/10.1145/1166160.1166183
Lindholm T (2004) A three-way merge for XML documents. In: DocEng ’04: Proc of the 2004 ACM symp on document engineering. ACM, New York, pp 1–10. doi:http://doi.acm.org/10.1145/1030397.1030399
Fontaine RL (2002) Merging XML files: a new approach providing intelligent merge of XML data sets. In: Proc of XML Europe 2002
Rosado LA, Márquez AP, Gil JM (2007) Managing branch versioning in versioned/temporal XML documents. In: Barbosa D, Bonifati A, Bellahsene Z, Hunt E, Unland R (eds) XSym. Lecture notes in computer science, vol 4704. Springer, Berlin, pp 107–121
Sun C, Ellis C (1998) Operational transformation in real-time group editors: issues, algorithms, and achievements. In: CSCW ’98: Proc of the 1998 ACM conf on computer supported cooperative work. ACM, New York, pp 59–68. doi:http://doi.acm.org/10.1145/289444.289469
Molli P, Skaf-Molli H, Oster G, Jourdain S (2002) Sams: synchronous, asynchronous, multi-synchronous environments. pp 80–84. doi:10.1109/CSCWD.2002.1047653
Wang D, Mah A (2009) Google wave operational transformation. http://www.waveprotocol.org/whitepapers/operational-transform
Ignat CL, Oster G (2008) Peer-to-peer collaboration over XML documents. In: CDVE ’08: Proc of the 5th int conf on cooperative design, visualization, and engineering. Springer, Berlin/Heidelberg, pp 66–73
Sun C, Sosič R (1999) Optimal locking integrated with operational transformation in distributed real-time group editors. In: PODC ’99: Proc. of the 18th ann ACM symp on principles of distributed computing. ACM, New York, pp 43–52. doi:http://doi.acm.org/10.1145/301308.301322
Mens T (2002) A state-of-the-art survey on software merging. IEEE Trans Softw Eng 28(5):449–462
Gropengießer F, Hose K, Sattler KU (2009) An extended transaction model for cooperative authoring of XML data. Comput Sci Res Dev 24(1–2):85–100
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Rönnau, S., Borghoff, U.M. XCC: change control of XML documents. Comput Sci Res Dev 27, 95–111 (2012). https://doi.org/10.1007/s00450-010-0140-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00450-010-0140-2