Abstract
The ability to reliably merge independent updates of a document is a crucial prerequisite for efficient collaboration in office work. However, merge support for common office document standards like OpenDocument or OfficeOpenXML is still in its infancy. In this paper, we present a consistent versioning model for XML documents in general, including merge support. This is achieved by using context-aware fingerprints that identify edit operations and allow for conflict detection. We show how to extract tracked changes from office documents and map them onto our delta model. Experimental results indicate that our fingerprinting technique is efficient and reliable.

Notes
In terms of GNU diff, v and v′ would be called a hunk.
At least OpenOffice uses such an internal representation. Since Microsoft Office is closed-source, we can only guess.
During the re-implementation of our merge procedure, we were able to increase the speed by a factor of over 50 compared to the first version presented in [24].
By default, we avoid a neighborhood search if the fingerprint matches completely.
The apparent discrepancy with the more than 700 edit operations in the performance evaluation arises because our approach glues the tracked changes on the ODF level together into a significantly lower number of edit operations on the delta level.
ODF documents can contain so-called soft-page-breaks, which mark a page break at a given position to avoid orphan and widow lines in the document view. An edit operation that tries to change a paragraph containing such a soft-page-break would therefore be reported as a conflict. To avoid this, all soft-page-breaks can be extracted before delta application without breaking the document content (see [4]). We omitted this step so as not to distort our test results.
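The extraction of soft-page-breaks mentioned in the last note could look roughly like the following. This is only a minimal sketch, not the procedure from [4] or the authors' tool-set: it assumes the ODF content has already been unpacked and parsed, it relies on the `text:soft-page-break` element from the OpenDocument text namespace, and the function name `strip_soft_page_breaks` is a hypothetical helper introduced for illustration.

```python
import xml.etree.ElementTree as ET

# OpenDocument text namespace; soft page breaks are empty elements in it.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SOFT_BREAK = f"{{{TEXT_NS}}}soft-page-break"


def strip_soft_page_breaks(root):
    """Remove every text:soft-page-break element below root, in place.

    Hypothetical helper (not part of the paper). Text that follows a
    removed element (its tail) is reattached to the preceding sibling or
    to the parent, so no document content is lost. Returns the number of
    removed elements.
    """
    removed = 0
    # Snapshot the element list first because the tree is modified below.
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag != SOFT_BREAK:
                previous = child
                continue
            tail = child.tail or ""
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
            removed += 1
    return removed


# Small invented paragraph that is split by a soft page break.
sample = (
    f'<text:p xmlns:text="{TEXT_NS}">first part '
    '<text:soft-page-break/>second part</text:p>'
)
paragraph = ET.fromstring(sample)
count = strip_soft_page_breaks(paragraph)
```

After the call, the paragraph carries its full text without the layout marker, so an edit operation addressing that paragraph would no longer see a changed context merely because of pagination.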
References
Balasubramaniam S, Pierce BC (1998) What is a file synchronizer? In: 4th annual ACM/IEEE int. conference on mobile computing and networking (MobiCom ’98), Dallas, 25–30 October 1998
Boyer J (2001) Canonical XML version 1.0
Boyer JM (2008) Interactive office documents: a new face for web 2.0 applications. In: DocEng ’08: proceedings of the 8th ACM symposium on document engineering. ACM, New York, pp 8–17. doi: http://doi.acm.org/10.1145/1410140.1410145
Brauer M, Weir R, McRae M (2007) OpenDocument v1.1 specification. http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.pdf
Chamberlin D, Florescu D, Melton J, Robie J, Siméon J (2008) XQuery update facility 1.0. http://www.w3.org/TR/xquery-update-10
Chawathe SS, Garcia-Molina H (1997) Meaningful change detection in structured data. SIGMOD Rec 26(2):26–37. doi: http://doi.acm.org/10.1145/253262.253266
Clark J, deRose S (1999) XML path language (XPath). Tech. rep., World Wide Web Consortium, http://www.w3.org/TR/xpath
Cobéna G, Abiteboul S, Marian A (2002) Detecting changes in XML documents. In: Proceedings of the 18th international conference on data engineering. 26 February–1 March 2002, San Jose, CA. IEEE Computer Society, Los Alamitos, pp 41–52
Fayzullin M, Subrahmanian VS (2004) An algebra for powerpoint sources. Multimedia Tools Appl 24(3):273–301. doi: http://dx.doi.org/10.1023/B:MTAP.0000039422.87260.52
Fontaine RL (2002) Merging xml files: a new approach providing intelligent merge of xml data sets. In: Proceedings of XML Europe 2002. Barcelona, 20–23 May 2002
FSF (2002) Comparing and merging files. Free Software Foundation, Boston
Ignat CL, Norrie MC (2006) Flexible collaboration over xml documents. In: CDVE, pp 267–274
Khanna S, Kunal K, Pierce BC (2007) A formal investigation of diff3. In: Arvind V, Prasad S (eds) Foundations of software technology and theoretical computer science. Springer, New York
Lam F, Lam N, Wong R (2002) Efficient synchronization for mobile xml data. In: CIKM ’02: proceedings of the eleventh international conference on information and knowledge management. ACM, New York, pp 153–160. doi: http://doi.acm.org/10.1145/584792.584820
Lindholm T (2004) A three-way merge for xml documents. In: DocEng ’04: proceedings of the 2004 ACM symposium on document engineering. ACM, New York, pp 1–10. doi: http://doi.acm.org/10.1145/1030397.1030399
Lindholm T, Kangasharju J, Tarkoma S (2005) A hybrid approach to optimistic file system directory tree synchronization. In: Kumar V, Zaslavsky AB, Cetintemel U, Labrinidis A (eds) MobiDE. ACM, New York, pp 49–56
Lindholm T, Kangasharju J, Tarkoma S (2006) Fast and simple xml tree differencing by sequence alignment. In: DocEng ’06: proceedings of the 2006 ACM symposium on document engineering. ACM, New York, pp 75–84. doi: http://doi.acm.org/10.1145/1166160.1166183
Marian A, Abiteboul S, Cobéna G, Mignet L (2001) Change-centric management of versions in an XML warehouse. VLDB J 581–590
Maruyama H, Tamura K, Uramoto N (2000) Digest values for dom (domhash)
Mens T (2002) A state-of-the-art survey on software merging. IEEE Trans Softw Eng 28(5):449–462
Neuwirth CM, Chandhok R, Kaufer DS, Erion P, Morris J, Miller D (1992) Flexible diff-ing in a collaborative writing system. In: CSCW ’92: proceedings of the 1992 ACM conference on computer-supported cooperative work. ACM, New York, pp 147–154. doi:10.1145/143457.143473
Paoli J, Valet-Harper I, Farquhar A, Sebestyen I (2006) ECMA-376 office open XML file formats. http://www.ecma-international.org/publications/standards/Ecma-376.htm
Rönnau S, Scheffczyk J, Borghoff UM (2005) Towards xml version control of office documents. In: DocEng ’05: proceedings of the 2005 ACM symposium on document engineering. ACM, New York, pp 10–19. doi:10.1145/1096601.1096606
Rönnau S, Pauli C, Borghoff UM (2008) Merging changes in xml documents using reliable context fingerprints. In: DocEng ’08: proceedings of the 8th ACM symposium on document engineering. ACM, New York, pp 52–61. doi:10.1145/1410140.1410151
Rosado LA, Márquez AP, Gil JM (2007) Managing branch versioning in versioned/temporal xml documents. In: Barbosa D, Bonifati A, Bellahsene Z, Hunt E, Unland R (eds) XSym, Lecture notes in computer science, vol 4704. Springer, New York, pp 107–121
Tatarinov I, Ives ZG, Halevy AY, Weld DS (2001) Updating xml. In: SIGMOD ’01: proceedings of the 2001 ACM SIGMOD international conference on management of data. ACM, New York, pp 413–424. doi:10.1145/375663.375720
Acknowledgements
The authors would like to thank their students Geraint Philipp and Maik Teupel, who showed exceptional enthusiasm when implementing parts of the tool-set presented in this paper.
Cite this article
Rönnau, S., Borghoff, U.M. Versioning XML-based office documents. Multimed Tools Appl 43, 253–274 (2009). https://doi.org/10.1007/s11042-009-0271-2
DOI: https://doi.org/10.1007/s11042-009-0271-2