DOI: 10.1145/585058.585077

Mapping and displaying structural transformations between XML and PDF

Published: 08 November 2002

Abstract

Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving.

Until recently PDF has been a totally display-based document representation, relying on the underlying PostScript semantics of PDF. Early versions of PDF had no mechanism for retaining any form of abstract document structure but recent releases have now introduced an internal structure tree to create the so called 'Tagged PDF'.

This paper describes the development of a plugin for Adobe Acrobat which creates a two-window display. In one window is shown an XML document original and in the other its Tagged PDF counterpart is seen, with an internal structure tree that, in some sense, matches the one seen in XML. If a component is highlighted in either window then the corresponding structured item, with any attendant text, is also highlighted in the other window.

Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small screen devices and enabling them to be read out in correct reading order, via speech synthesiser software, for the visually impaired. By tracing structure transformation from source document to destination one can implement the repair of damaged PDF structure or the adaptation of an existing structure tree to an incrementally updated document.
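
The two-window correspondence described in the abstract can be illustrated with a minimal sketch. It is not the plugin's method and does not use the Acrobat API: `StructNode` is a hypothetical stand-in for a Tagged PDF structure-tree node, and the sketch assumes the simplest possible case, in which both trees have the same shape so that nodes can be paired by their child-index path. The tag names (`article`, `H1`, `P`) are illustrative only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class StructNode:
    """Hypothetical stand-in for one node of a Tagged PDF structure tree."""
    tag: str
    text: str = ""
    kids: list = field(default_factory=list)


def xml_paths(elem, path=()):
    """Yield (path, element) in document order; path is a tuple of child indices."""
    yield path, elem
    for i, child in enumerate(elem):
        yield from xml_paths(child, path + (i,))


def struct_paths(node, path=()):
    """Same traversal for the assumed structure-tree model."""
    yield path, node
    for i, kid in enumerate(node.kids):
        yield from struct_paths(kid, path + (i,))


def build_mapping(xml_root, struct_root):
    """Pair nodes that sit at the same child-index path in both trees.

    Assumption: the trees are isomorphic. Nodes without a counterpart
    at the same path are simply left out of the mapping.
    """
    pdf_by_path = dict(struct_paths(struct_root))
    return {
        path: (elem, pdf_by_path[path])
        for path, elem in xml_paths(xml_root)
        if path in pdf_by_path
    }


# Illustrative data: an XML source and a structure tree of the same shape.
xml_root = ET.fromstring(
    "<article><title>Intro</title><para>Hello</para></article>"
)
struct_root = StructNode(
    "Document",
    kids=[StructNode("H1", "Intro"), StructNode("P", "Hello")],
)

mapping = build_mapping(xml_root, struct_root)

# "Highlighting" the second child in one view looks up its counterpart.
elem, node = mapping[(1,)]
print(elem.tag, "->", node.tag)
```

Keying both trees by path gives a lookup that works in either direction, which is what synchronized highlighting needs. Trees that differ in shape, as the abstract's "in some sense, matches" suggests they may, would need a tree-matching step rather than this positional pairing.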

Published In

DocEng '02: Proceedings of the 2002 ACM symposium on Document engineering
November 2002
168 pages
ISBN:1581135947
DOI:10.1145/585058

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. PDF
  2. XML
  3. document structure transformation

Qualifiers

  • Article

Conference

DocEng02

Acceptance Rates

DocEng '02 Paper Acceptance Rate 21 of 46 submissions, 46%;
Overall Acceptance Rate 194 of 564 submissions, 34%

Cited By

  • (2015) Towards automated assessment of students' preliminary thesis submissions. 2015 13th International Conference on Emerging eLearning Technologies and Applications (ICETA), pp. 1-6. DOI: 10.1109/ICETA.2015.7558513. Online publication date: Nov-2015.
  • (2010) Lessons from the dragon. Proceedings of the 10th ACM symposium on Document engineering, pp. 65-68. DOI: 10.1145/1860559.1860573. Online publication date: 21-Sep-2010.
  • (2009) Engineering Information Into Open Documents. Open Information Management, pp. 9-19. DOI: 10.4018/978-1-60566-246-6.ch002. Online publication date: 2009.
  • (2008) Tracking sub-page components in document workflows. Proceedings of the eighth ACM symposium on Document engineering, pp. 86-89. DOI: 10.1145/1410140.1410156. Online publication date: 16-Sep-2008.
  • (2008) Development of the XML Digital Library from the Parliament of Andalucía for Intelligent Structured Retrieval. Foundations of Intelligent Systems, pp. 417-423. DOI: 10.1007/978-3-540-68123-6_45. Online publication date: 2008.
  • (2007) The Mars project. Proceedings of the 2007 ACM symposium on Document engineering, pp. 161-170. DOI: 10.1145/1284420.1284461. Online publication date: 28-Aug-2007.
  • (2005) Enhancing composite digital documents using XML-based standoff markup. Proceedings of the 2005 ACM symposium on Document engineering, pp. 177-186. DOI: 10.1145/1096601.1096647. Online publication date: 2-Nov-2005.
  • (2004) Creating structured PDF files using XML templates. Proceedings of the 2004 ACM symposium on Document engineering, pp. 99-108. DOI: 10.1145/1030397.1030418. Online publication date: 28-Oct-2004.