OpenOffice.org XML integration for Zope/CMF

This presentation shows how to make use of the OpenOffice.org XML file format from Python and Zope, and how OpenOffice.org can be used as a full-featured web authoring tool with Zope.

Charleroi, Belgium - 27. June 2003

-- Simon Eisenmann, struktur AG
   mailto:simon@struktur.de

Overview

  • The OpenOffice.org document file format
  • Available XML/XSLT processors from a python view
  • The xml.openoffice.org website
  • Current solutions to integrate OpenOffice.org with Zope
  • OpenOffice.org integration with Zope in the future
-- everything mentioned here about OpenOffice.org
   applies to StarOffice 6 as well

First Question

How many of you use Microsoft Office?

Second Question

How many of you use OpenOffice.org or StarOffice?

Third Question

How many of you think you would switch from Microsoft Office to OpenOffice.org?

The OpenOffice.org document file format

  • All OpenOffice.org applications use the same XML-based file format
  • All applications (except Math) use the same format as defined in the specification
  • The Math component uses the OpenOffice.org package structure and format, but uses MathML inside the package

Core Requirements (these items are absolutely required)

  1. The file format must be capable of being used as an office program's native file format. The format must be "non-lossy" and must support (at least) the full capability of a StarOffice/OpenOffice document. The format is likely to be used for document interchange but that use alone is not enough.
  2. Structured content should make use of XML's structuring capabilities and be represented in terms of XML elements and attributes.
  3. The file format must be fully documented and have no "secret" features.
  4. OpenOffice must be the reference implementation for this file format.

Core Goals (these items are highly desired)

  1. The file format should be developed in such a way that it will be accepted by the community and can be placed under community control for future development and format evolution.
  2. The file formats should be suitable for all office types: text processing, spreadsheet, presentation, drawing, charting, and math.
  3. The file formats should reuse portions of each other as much as possible (so for example a spreadsheet table definition can work also as a text processing table definition).

Package Format

  • Well-known ZIP file format
  • XML-based manifest (describes the package content)
  • A single document is split into several streams
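Because the package is an ordinary ZIP file, Python's standard zipfile module is enough to look inside it. The following is a minimal sketch, assuming the manifest namespace URI used by OpenOffice.org 1.x packages (http://openoffice.org/2001/manifest); the function name read_manifest is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: manifest namespace used by OpenOffice.org 1.x packages.
MANIFEST_NS = "http://openoffice.org/2001/manifest"

def read_manifest(path_or_file):
    """List the package streams and map each manifest entry to its MIME type."""
    with zipfile.ZipFile(path_or_file) as package:
        names = package.namelist()
        manifest = ET.fromstring(package.read("META-INF/manifest.xml"))
    media_types = {}
    for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        media_types[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return names, media_types
```

For a real document you would call read_manifest("example.sxw") (a hypothetical file name); the manifest entry for "/" carries the MIME type of the document itself.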

Package content streams

File                   Summary
meta.xml               information about the document (author, time of last save, ...)
styles.xml             styles that are used in the document
content.xml            main document content (text, tables, graphical elements)
settings.xml           document and view settings
META-INF/manifest.xml  provides additional information about the other files (such as MIME type or encryption method)
Pictures/              directory containing images (in their native, binary formats)
Dialogs/               directory containing dialogs used by document macros
Basic/                 directory containing StarBasic macros
obj.../                directories containing embedded objects
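Each stream can be read on its own. As a sketch, assuming the OpenOffice.org 1.x namespace URIs for the office and meta prefixes and the Dublin Core URI for dc, the following pulls a few fields out of meta.xml. The field selection and the read_meta name are illustrative, not part of any existing API; that dc:creator and dc:date hold the last author and the time of last save is also an assumption based on OpenOffice.org 1.x files.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed OpenOffice.org 1.x namespace URIs for meta.xml.
NS = {
    "office": "http://openoffice.org/2000/office",
    "meta": "http://openoffice.org/2000/meta",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative selection of fields, as paths relative to <office:document-meta>.
FIELDS = {
    "title": "office:meta/dc:title",
    "author": "office:meta/dc:creator",        # assumed: person who last saved
    "last_saved": "office:meta/dc:date",       # assumed: time of last save
    "initial_creator": "office:meta/meta:initial-creator",
}

def read_meta(path_or_file):
    """Return a dict of selected metadata fields; missing ones map to None."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("meta.xml"))
    result = {}
    for key, path in FIELDS.items():
        element = root.find(path, NS)
        result[key] = element.text if element is not None else None
    return result
```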

The DTD

  • Very well-documented XML format
  • Split into many small parts

http://xml.openoffice.org/source/browse/xml/xmloff/dtd/

Enhancing the format

Alien attributes, i.e. attributes not defined in the OpenOffice.org DTD, will be preserved if they are attached to <style:properties> elements in style definitions. All other alien content will be discarded by the OpenOffice.org import filters. Since you can attach styles to arbitrary text ranges, you can use this mechanism to attach your information to arbitrary text ranges, too.

Note: The above mechanism seems to only work in Writer. The issue is under investigation.
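As a sketch of the alien-attribute mechanism described above (which, per the note, may only survive in Writer), the following adds a namespaced attribute to the <style:properties> element of a named style. The style namespace URI is assumed from OpenOffice.org 1.x; the http://example.com/my-annotations namespace and the tag_style function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed OpenOffice.org 1.x style namespace.
STYLE_NS = "http://openoffice.org/2000/style"
# Hypothetical namespace for our own data; any URI you control would do.
MY_NS = "http://example.com/my-annotations"

ET.register_namespace("style", STYLE_NS)
ET.register_namespace("my", MY_NS)

def tag_style(styles_xml, style_name, key, value):
    """Attach my:<key>="<value>" to the <style:properties> of a named style.

    Returns the re-serialized XML; if no style has that name, the
    document is returned unchanged apart from serialization.
    """
    root = ET.fromstring(styles_xml)
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        if style.get(f"{{{STYLE_NS}}}name") == style_name:
            props = style.find(f"{{{STYLE_NS}}}properties")
            if props is None:
                props = ET.SubElement(style, f"{{{STYLE_NS}}}properties")
            props.set(f"{{{MY_NS}}}{key}", value)
    return ET.tostring(root, encoding="unicode")
```

ElementTree does not keep the original prefixes of namespaces it has not been told about, so the output may use generated prefixes such as ns0. The XML stays equivalent, but whether a given OpenOffice.org version accepts it should be tested.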

It is planned that you will also be able to put additional files with your own content into the packages. However, this does not work yet.

Available XML/XSLT processors from a python view

The following XSLT processors can be used from Python through their Python bindings.

The Microsoft Office Problem

  • Lots of users use Microsoft's Office suite
  • Microsoft Office's binary file formats can't be used for XML transformation
  • XML support in MS Office 2003 will only be available for enterprise customers
  • MS Office documents have to be converted into usable formats

The OpenOffice.org API (UNO)

  • All OpenOffice.org functionality is available through the API
  • Python module (pyUNO)
  • Can be used to convert Microsoft Office Documents to OpenOffice.org
  • Still hard to install
  • With the API, OpenOffice.org can be used as a server-side processing tool
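A conversion through pyUNO might look like the sketch below. It assumes an OpenOffice.org instance already running with a UNO socket listener on localhost port 2002 (for example started with soffice -accept="socket,host=localhost,port=2002;urp;"), and it assumes the OpenOffice.org 1.x export filter names "StarOffice XML (Writer)", "(Calc)" and "(Impress)"; other versions use different filter names. The TARGET_FILTERS table and the target_for and convert functions are hypothetical; only target_for runs without an OpenOffice.org installation.

```python
import os

# Assumed OpenOffice.org 1.x export filters for the native XML formats.
TARGET_FILTERS = {
    ".doc": ("StarOffice XML (Writer)", ".sxw"),
    ".xls": ("StarOffice XML (Calc)", ".sxc"),
    ".ppt": ("StarOffice XML (Impress)", ".sxi"),
}

def target_for(path):
    """Pick the export filter and output path for a Microsoft Office file."""
    base, ext = os.path.splitext(path)
    filter_name, new_ext = TARGET_FILTERS[ext.lower()]
    return filter_name, base + new_ext

def convert(path, host="localhost", port=2002):
    """Convert one file through a running OpenOffice.org via pyUNO."""
    import uno  # only importable from a Python with pyUNO available
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name, p.Value = name, value
        return p

    filter_name, out_path = target_for(path)
    local = uno.getComponentContext()
    resolver = local.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local)
    ctx = resolver.resolve(
        f"uno:socket,host={host},port={port};urp;StarOffice.ComponentContext")
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)
    # Load the document invisibly, then store it with the native XML filter.
    doc = desktop.loadComponentFromURL(
        uno.systemPathToFileUrl(os.path.abspath(path)), "_blank", 0,
        (prop("Hidden", True),))
    try:
        doc.storeToURL(uno.systemPathToFileUrl(os.path.abspath(out_path)),
                       (prop("FilterName", filter_name),))
    finally:
        doc.dispose()
    return out_path
```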

The xml.openoffice.org website

  • Home of all OpenOffice.org XML-related material
  • Lots of documentation
  • Public CVS repository

http://xml.openoffice.org

Filters #1

Type                                Summary
DocBook filter                      Allows you to load and save DocBook files with OpenOffice.org through XSLT. The filter is still in alpha state and does not support all DocBook elements.
Eric Bellot's DocBook converter     This is another approach to translating between OpenOffice.org XML and DocBook. Eric Bellot uses Python and XSLT to perform the transformation. Includes examples and a description.
HTML (+ WML, Palm compatible HTML)  An elaborate XSLT transformation (sx2ml) which renders OpenOffice.org documents as HTML. There is (limited) built-in support for WML as well.
Writer2LaTeX                        Henrik Just's Writer2LaTeX converter is a command line utility which converts OpenOffice.org documents to LaTeX. It is written in Java.

Filters #2

Type             Summary
Flat XML Filter  The 'flat' XML filter lets you read and write office documents as plain XML files, i.e. without ZIP packages
OOO2txt          Frederic Labbe's OOO2txt tool generates a plain text representation of OpenOffice.org documents
sxw2html         Dicky Wahyu Purnomo's sxw to HTML converter
sxw2txt          Dicky Wahyu Purnomo's sxw to ASCII converter
libwpd           William Lachance's libwpd and WordPerfect filter for OpenOffice.org Writer
OOo to HTML      Steve Slaven's StarOffice/OpenOffice.org to HTML converter; uses xsltproc and ImageMagick

http://xml.openoffice.org/filters.html

sx2ml XSLT

  • One of the most complete XSLT filters available
  • Produces XHTML including a CSS style sheet
  • Has only minor problems with complex tables
  • Developed for use with Java XML processors, but most functions work with Python as well
  • Active development and public CVS
  • Sun Microsystems Inc. actively pushes development
  • GNU Lesser General Public License Version 2.1 or Sun Industry Standards Source License Version 1.1

http://xml.openoffice.org/sx2ml/

A simple Zope example

  • OpenOffice.org documents can be used in lots of different ways

http://www.zopelabs.com/cookbook/1043777422

Current solutions to integrate OpenOffice.org with Zope

There are a couple of Zope products available that rely on OpenOffice.org:

  • Archetypes
  • ZOODocument
  • CMFOODocument

Archetypes

Archetypes is a framework for creating content types from schema definitions

  • PyUNO (OpenOffice.org Python API) integration
  • Uses OpenOffice.org to convert Microsoft Office documents to HTML

http://sf.net/projects/archetypes

ZOODocument

Allows uploading OpenOffice.org documents to Zope. Uploaded documents are rendered to HTML using XSL transformations.

  • Native Zope product
  • OpenOffice.org to HTML transformation using XSL
  • Relies on the XMLTransform product to access XML processors

http://www.zope.org/Members/philikon/ZooDocument, http://www.zope.org/Members/arielpartners/XMLTransform

CMFOODocument

Allows uploading OpenOffice.org documents to CMF/Plone/icoya. Uploaded documents are rendered to HTML using the sx2ml XSLT stylesheets.

  • CMF product
  • OpenOffice.org to HTML transformation using the sx2ml stylesheets from xml.openoffice.org
  • Extracts, scales and stores included images in the ZODB
  • Relies on the libxml2/libxslt Python bindings
  • Dublin Core metadata support

http://www.zope.org/Members/longsleep/CMFOODocument, http://www.icoya.de/support/download_area/zope/CMFOODocument
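The image handling listed for CMFOODocument starts with pulling the files out of the package's Pictures/ directory. The sketch below covers only that extraction step with the standard zipfile module; scaling the images and storing them in the ZODB are left out, and extract_pictures is a hypothetical name, not CMFOODocument's code.

```python
import os
import zipfile

def extract_pictures(path_or_file):
    """Return {file name: raw image bytes} for every file under Pictures/."""
    images = {}
    with zipfile.ZipFile(path_or_file) as package:
        for name in package.namelist():
            # Skip directory entries; images keep their native binary format.
            if name.startswith("Pictures/") and not name.endswith("/"):
                images[os.path.basename(name)] = package.read(name)
    return images
```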

Using OpenOffice.org as content authoring tool

  • CMFOODocument plugs into CMF/Plone/icoya
  • ExternalEditor works with OpenOffice.org (note: neither OpenOffice.org nor its quicklaunch may be running in the background)
  • WebDAV integration also works pretty well

Test CMFOODocument on http://demo.plone.org

OpenOffice.org + Zope in the future

  • CMFOODocument is currently implemented using Archetypes
  • The upcoming release of CMFOODocument will provide an ooo2html transformer written in Python
  • New products can easily make use of OpenOffice.org documents
  • sx2ml is getting more complete day by day.
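To illustrate what an OpenOffice.org-to-HTML transformer in Python has to do, here is a deliberately tiny hand-written version using ElementTree instead of XSLT. It is neither sx2ml nor the announced ooo2html transformer; it assumes the OpenOffice.org 1.x office and text namespace URIs and converts only headings (text:h) and paragraphs (text:p) placed directly in office:body.

```python
import html
import xml.etree.ElementTree as ET

# Assumed OpenOffice.org 1.x namespace URIs.
OFFICE_NS = "http://openoffice.org/2000/office"
TEXT_NS = "http://openoffice.org/2000/text"

def content_to_html(content_xml):
    """Render top-level headings and paragraphs of content.xml as HTML.

    Lists, tables, images and styles are skipped, and inline markup
    (spans, links) is flattened to plain text.
    """
    root = ET.fromstring(content_xml)
    body = root.find(f"{{{OFFICE_NS}}}body")
    parts = []
    for element in (body if body is not None else []):
        if element.tag == f"{{{TEXT_NS}}}h":
            text = html.escape("".join(element.itertext()))
            # text:level is 1-based; clamp it to the HTML range h1..h6.
            level = min(max(int(element.get(f"{{{TEXT_NS}}}level", "1")), 1), 6)
            parts.append(f"<h{level}>{text}</h{level}>")
        elif element.tag == f"{{{TEXT_NS}}}p":
            text = html.escape("".join(element.itertext()))
            parts.append(f"<p>{text}</p>")
    return "<body>\n" + "\n".join(parts) + "\n</body>"
```

Real documents need far more (lists, tables, styles, images, inline formatting), which is why complete stylesheets such as sx2ml remain the practical route.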

Questions?

If you have a question, please feel free to ask or email me at simon@struktur.de

Thank you.

Copyright Simon Eisenmann http://longsleep.org/ - Licensed under a Creative Commons License

powered by icoya