DOI: 10.1145/1096601.1096606
Article

Towards XML version control of office documents

Published: 02 November 2005

Abstract

Office applications such as OpenOffice and Microsoft Office are widely used to edit the majority of today's business documents: office documents. Version control systems usually treat office documents as binary objects, which severely hinders collaborative work. Since XML has become a de facto standard for office applications, we focus on versioning office documents with structured XML version control approaches. This enables state-of-the-art version control for office documents.

A basic prerequisite to XML version control is a diff algorithm that detects structural changes between XML documents. In this paper, we evaluate state-of-the-art XML diff algorithms with respect to their suitability for OpenOffice XML documents and the future OASIS office document standard. It turns out that, due to the specific XML office format, a careful examination of the diff algorithm characteristics is necessary. Therefore, we identify important features for XML diff approaches to handle office documents. We have implemented a first OpenOffice versioning API that can be used in version control systems as a replacement for the line-based or binary diffs currently in use.
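
The abstract contrasts structural XML diffs with the line-based or binary diffs that version control systems use today. The Python sketch below is only an illustration under stated assumptions; it is not the paper's OpenOffice versioning API and not any of the algorithms the paper evaluates. It assumes a document is a ZIP package containing a `content.xml` part (as OpenOffice and OpenDocument files are), parses that part, and compares two trees node by node, reporting insert, delete, update, and replace operations on child-index paths. The names `read_content_xml`, `tree_diff`, and `make_package` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Hypothetical helpers for illustration only; not the paper's versioning API.
Op = tuple[str, str, str]  # (operation, child-index path, detail)


def read_content_xml(package: bytes) -> ET.Element:
    """Parse the content.xml part of a ZIP-based office document package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return ET.fromstring(zf.read("content.xml"))


def tree_diff(old: ET.Element, new: ET.Element, path: str = "/") -> list[Op]:
    """Naive structural diff that matches children by position only.

    Tags keep ElementTree's "{namespace}local" form; tail text is ignored.
    """
    if old.tag != new.tag:
        return [("replace", path, new.tag)]
    ops: list[Op] = []
    if old.attrib != new.attrib:
        ops.append(("update-attrib", path, str(new.attrib)))
    if (old.text or "").strip() != (new.text or "").strip():
        ops.append(("update-text", path, (new.text or "").strip()))
    old_kids, new_kids = list(old), list(new)
    for i in range(max(len(old_kids), len(new_kids))):
        child = f"{path.rstrip('/')}/{i}"
        if i >= len(new_kids):
            ops.append(("delete", child, old_kids[i].tag))
        elif i >= len(old_kids):
            ops.append(("insert", child, new_kids[i].tag))
        else:
            ops.extend(tree_diff(old_kids[i], new_kids[i], child))
    return ops


def make_package(content_xml: str) -> bytes:
    """Build an in-memory ZIP holding only content.xml (test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", content_xml)
    return buf.getvalue()


if __name__ == "__main__":
    old = read_content_xml(make_package("<doc><p>Hello</p><p>World</p></doc>"))
    new = read_content_xml(make_package("<doc><p>Hello</p><p>XML world</p><p>New</p></doc>"))
    print(tree_diff(old, new))
    # [('update-text', '/1', 'XML world'), ('insert', '/2', 'p')]
```

Matching children purely by position keeps the sketch short but is naive: inserting one paragraph near the start of a document shifts every later sibling, so the sketch reports a cascade of text updates instead of a single insert, and moved content is never recognized. Dedicated XML diff algorithms use more elaborate node matching, which matters for office documents, where paragraphs are frequently inserted and rearranged.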

    Published In

    DocEng '05: Proceedings of the 2005 ACM symposium on Document engineering
    November 2005
    252 pages
    ISBN:1595932402
    DOI:10.1145/1096601
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. XML diffing
    2. office applications
    3. version control

    Qualifiers

    • Article

    Conference

    DocEng05: ACM Symposium on Document Engineering
    November 2 - 4, 2005
    Bristol, United Kingdom

    Acceptance Rates

    Overall Acceptance Rate 194 of 564 submissions, 34%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 16
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 20 Feb 2025

    Cited By

    • (2022) Towards Creative Version Control. Proceedings of the ACM on Human-Computer Interaction 6(CSCW2), pp. 1-25. DOI: 10.1145/3555756. Online publication date: 11-Nov-2022.
    • (2022) Semantics to the rescue of document-based XML diff: A JATS case study. Software: Practice and Experience 52(6), pp. 1496-1516. DOI: 10.1002/spe.3074. Online publication date: 12-Feb-2022.
    • (2020) Change Detection on JATS Academic Articles. Proceedings of the ACM Symposium on Document Engineering 2020, pp. 1-10. DOI: 10.1145/3395027.3419581. Online publication date: 29-Sep-2020.
    • (2017) DEX. Proceedings of the 2017 ACM International Conference on Management of Data, pp. 171-186. DOI: 10.1145/3035918.3064056. Online publication date: 9-May-2017.
    • (2016) Bridging the gap between tracking and detecting changes in XML. Software: Practice and Experience 46(2), pp. 227-250. DOI: 10.1002/spe.2305. Online publication date: 1-Feb-2016.
    • (2014) Fine-grained change detection in structured text documents. Proceedings of the 2014 ACM symposium on Document engineering, pp. 87-96. DOI: 10.1145/2644866.2644880. Online publication date: 16-Sep-2014.
    • (2014) Temporal and multi-versioned XML documents. Information Processing and Management: an International Journal 50(1), pp. 113-131. DOI: 10.1016/j.ipm.2013.08.003. Online publication date: 1-Jan-2014.
    • (2013) RWS-Diff. Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pp. 339-348. DOI: 10.1145/2505515.2505763. Online publication date: 27-Oct-2013.
    • (2013) Introduction to the universal delta model. Proceedings of the 2013 ACM symposium on Document engineering, pp. 47-56. DOI: 10.1145/2494266.2494284. Online publication date: 10-Sep-2013.
    • (2013) Improving the reuse of computational models through version control. Bioinformatics 29(6), pp. 742-748. DOI: 10.1093/bioinformatics/btt018. Online publication date: 1-Mar-2013.