OpenOffice.org XML integration for Zope/CMF

Charleroi, Belgium - 27. June 2003 -- Simon Eisenmann, struktur AG mailto:simon@struktur.de
- Overview
- First Question
- Second Question
- Third Question
- The OpenOffice.org document file format
- Core Requirements (these items are absolutely required)
- Core Goals (these items are highly desired)
- Package Format
- Package content streams
- The DTD
- Enhancing the format
- Available XML/XSLT processors from a python view
- The Microsoft Office Problem
- The OpenOffice.org API (UNO)
- The xml.openoffice.org website
- Filters #1
- Filters #2
- sx2ml XSLT
- A simple Zope example
- Current solutions to integrate OpenOffice.org with Zope
- Archetypes
- ZOODocument
- CMFOODocument
- Using OpenOffice.org as content authoring tool
- OpenOffice.org + Zope in the future
- Questions?
Overview
- The OpenOffice.org document file format
- Available XML/XSLT processors from a python view
- The xml.openoffice.org website
- Current solutions to integrate OpenOffice.org with Zope
- OpenOffice.org integration with Zope in the future
-- everything said here about OpenOffice.org applies to StarOffice 6 as well
First Question
How many of you use Microsoft Office?
Second Question
How many of you use OpenOffice.org or StarOffice?
Third Question
How many of you think you would switch from Microsoft Office to OpenOffice.org?
The OpenOffice.org document file format
- All OpenOffice.org applications use the same XML-based file format
- All applications (except Math) use the format as defined in the specification
- The Math component uses the OpenOffice.org package structure and format, but uses MathML inside the package
Core Requirements (these items are absolutely required)
- The file format must be capable of being used as an office program's native file format. The format must be "non-lossy" and must support (at least) the full capability of a StarOffice/OpenOffice document. The format is likely to be used for document interchange but that use alone is not enough.
- Structured content should make use of XML's structuring capabilities and be represented in terms of XML elements and attributes.
- The file format must be fully documented and have no "secret" features.
- OpenOffice must be the reference implementation for this file format.
Core Goals (these items are highly desired)
- The file format should be developed in such a way that it will be accepted by the community and can be placed under community control for future development and format evolution.
- The file formats should be suitable for all office types: text processing, spreadsheet, presentation, drawing, charting, and math.
- The file formats should reuse portions of each other as much as possible (so for example a spreadsheet table definition can work also as a text processing table definition).
Package Format
- Well known ZIP file format
- XML-based manifest (describes package content)
- A single document is split into several streams
Package content streams
| File | Summary |
|---|---|
| meta.xml | information about the document (author, time of last save, ...) |
| styles.xml | styles that are used in the document |
| content.xml | main document content (text, tables, graphical elements) |
| settings.xml | document and view settings |
| META-INF/manifest.xml | provides additional information about the other files (such as MIME type or encryption method) |
| Pictures/ | directory containing images (in their native, binary formats) |
| Dialogs/ | directory containing dialogs used by document macros |
| Basic/ | directory containing StarBasic macros |
| obj.../ | directories containing embedded objects |
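
Because the package is an ordinary ZIP file, the streams listed above can be inspected with Python's standard library alone. The following is a minimal sketch, not code from any of the products mentioned in this talk. The helper names and the tiny demo package are made up for illustration, and the manifest namespace in the demo is an assumption (the parsing itself ignores namespaces).

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(qname):
    """Strip the '{namespace}' prefix ElementTree puts in front of names."""
    return qname.rsplit("}", 1)[-1]


def list_streams(package):
    """Return the names of all streams stored in the ZIP package."""
    with zipfile.ZipFile(package) as zf:
        return zf.namelist()


def read_manifest(package):
    """Map each path listed in META-INF/manifest.xml to its media type."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    entries = {}
    for elem in root.iter():
        if local_name(elem.tag) != "file-entry":
            continue
        attrs = {local_name(k): v for k, v in elem.attrib.items()}
        entries[attrs.get("full-path")] = attrs.get("media-type")
    return entries


def build_demo_package():
    """Build a tiny in-memory stand-in for a real .sxw file (illustration only)."""
    manifest = (
        '<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">'
        '<manifest:file-entry manifest:media-type="application/vnd.sun.xml.writer"'
        ' manifest:full-path="/"/>'
        '<manifest:file-entry manifest:media-type="text/xml"'
        ' manifest:full-path="content.xml"/>'
        '</manifest:manifest>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", "<placeholder/>")
        zf.writestr("META-INF/manifest.xml", manifest)
    return buf


demo = build_demo_package()
print(list_streams(demo))
print(read_manifest(demo))
```

With a real document, pass the path of the .sxw file instead of the in-memory demo package.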
The DTD
- Very well documented XML format
- Split into many small parts
Enhancing the format
Alien attributes, i.e. attributes not defined in the OpenOffice.org DTD, will be preserved if they are attached to <style:properties> elements in style definitions. All other alien content will be discarded by the OpenOffice.org import filters. Since you can attach styles to arbitrary text ranges, you can use this mechanism to attach your information to arbitrary text ranges, too.
Note: The above mechanism seems to only work in Writer. The issue is under investigation.
It is planned that you can also put additional files with your own content into the packages. However, this doesn't work yet.
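
The alien-attribute mechanism described above can be sketched with the standard library. This is only an illustration under assumptions: the `cms` namespace URI, the attribute name, and the sample style fragment are hypothetical, and the `style` and `office` namespace URIs are the ones used by the OpenOffice.org 1.x format as far as I know. As noted above, the attributes are only expected to survive a round trip in Writer.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "http://openoffice.org/2000/office"
STYLE_NS = "http://openoffice.org/2000/style"
ALIEN_NS = "http://example.org/my-cms"  # hypothetical namespace for our own data

# Keep readable prefixes when the tree is serialized again.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("style", STYLE_NS)
ET.register_namespace("cms", ALIEN_NS)


def tag_style(styles_xml, style_name, key, value):
    """Attach an alien attribute to the <style:properties> of one named style."""
    root = ET.fromstring(styles_xml)
    for style in root.iter("{%s}style" % STYLE_NS):
        if style.get("{%s}name" % STYLE_NS) != style_name:
            continue
        props = style.find("{%s}properties" % STYLE_NS)
        if props is None:
            props = ET.SubElement(style, "{%s}properties" % STYLE_NS)
        props.set("{%s}%s" % (ALIEN_NS, key), value)
    return ET.tostring(root, encoding="unicode")


# Made-up fragment standing in for the style section of a real document.
SAMPLE = (
    '<office:automatic-styles xmlns:office="%s" xmlns:style="%s">'
    '<style:style style:name="T1" style:family="text">'
    '<style:properties/></style:style>'
    '</office:automatic-styles>' % (OFFICE_NS, STYLE_NS)
)

print(tag_style(SAMPLE, "T1", "workflow-state", "draft"))
```

Any text range that uses style T1 then carries the extra information with it.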
Available XML/XSLT processors from a python view
The following XSLT processors can be used from Python through their Python bindings.
- 4Suite http://www.4suite.org
- Libxml2/Libxslt http://www.xmlsoft.org
- Sablotron http://www.gingerall.org/charlie/ga/xml/p_sab.xml
- Pyana http://pyana.sourceforge.net
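
As an illustration of what using one of these processors looks like, here is a minimal sketch for the libxml2/libxslt bindings. It assumes those bindings are installed. The function name and the file paths are placeholders, and the imports are done inside the function so that the module still loads where the bindings are missing.

```python
def transform_with_libxslt(xml_path, xsl_path):
    """Apply the stylesheet at xsl_path to the document at xml_path.

    Returns the serialized result of the transformation.
    """
    # Imported lazily: these are the C-library bindings, not part of the
    # Python standard library.
    import libxml2
    import libxslt

    style_doc = libxml2.parseFile(xsl_path)
    style = libxslt.parseStylesheetDoc(style_doc)
    doc = libxml2.parseFile(xml_path)
    result = style.applyStylesheet(doc, None)  # None: no stylesheet parameters
    try:
        return style.saveResultToString(result)
    finally:
        # The bindings wrap C structures, so they are released explicitly.
        style.freeStylesheet()  # also frees style_doc
        doc.freeDoc()
        result.freeDoc()


# Hypothetical usage, with content.xml unpacked from a package beforehand:
# html = transform_with_libxslt("content.xml", "my_stylesheet.xsl")
```

The other processors follow the same pattern (parse the stylesheet, parse the input, apply), but each has its own API.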
The Microsoft Office Problem
- Lots of users use Microsoft's Office suite
- Microsoft's Office format cannot be used for XML transformation
- XML support in MS Office 2003 will be available only to enterprise customers
- MS Office Documents have to be converted into usable formats
The OpenOffice.org API (UNO)
- All OpenOffice.org functionality is available through the API
- Python module (pyUNO)
- Can be used to convert Microsoft Office Documents to OpenOffice.org
- Still hard to install
- With the API, OpenOffice.org can be used as a server-side processing tool
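
A server-side conversion through PyUNO might look roughly like the sketch below. It makes several assumptions: OpenOffice.org is already running and listening on a socket (for example started with `soffice "-accept=socket,host=localhost,port=2002;urp;"`), the Python interpreter can import the `uno` module, and the function name, host, and port are placeholders chosen for illustration.

```python
import os


def convert_to_sxw(src_path, dst_path, host="localhost", port=2002):
    """Load src_path in a running OpenOffice.org and store it as Writer XML."""
    # Imported lazily: the uno module only exists in a PyUNO-enabled Python.
    import uno
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name = name
        p.Value = value
        return p

    # Connect to the running office instance.
    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(
        "uno:socket,host=%s,port=%d;urp;StarOffice.ComponentContext"
        % (host, port))
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    # Load the source document without opening a window, then store it
    # through the native Writer XML filter.
    src_url = uno.systemPathToFileUrl(os.path.abspath(src_path))
    dst_url = uno.systemPathToFileUrl(os.path.abspath(dst_path))
    doc = desktop.loadComponentFromURL(
        src_url, "_blank", 0, (prop("Hidden", True),))
    try:
        doc.storeToURL(
            dst_url, (prop("FilterName", "StarOffice XML (Writer)"),))
    finally:
        doc.dispose()


# Hypothetical usage:
# convert_to_sxw("report.doc", "report.sxw")
```

The resulting .sxw file can then be fed to any of the XSLT-based tools discussed in this talk.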
The xml.openoffice.org website
- Home of all OpenOffice.org XML-related material
- Lots of documentation
- Public CVS repository
Filters #1
| Type | Summary |
|---|---|
| DocBook filter | Allows you to load and save DocBook files with OpenOffice.org through XSLT. The filter is still in alpha state and does not support all DocBook elements. |
| Eric Bellot's DocBook converter | This is another approach to translating between OpenOffice.org XML and DocBook. Eric Bellot uses Python and XSLT to perform the transformation. Comes with examples and a description. |
| HTML (+ WML, Palm-compatible HTML) | An elaborate XSLT transformation which renders OpenOffice.org documents in XHTML. There is (limited) built-in support for WML as well. (sx2ml) |
| Writer2LaTeX | Henrik Just's Writer2LaTeX converter is a command line utility which converts OpenOffice.org documents to LaTeX. It is written in Java. |
Filters #2
| Type | Summary |
|---|---|
| Flat XML Filter | The 'flat' XML filter lets you read and write office documents in plain XML files, i.e. without ZIP packages |
| OOO2txt | Frederic Labbe's OOO2txt tool generates a plain text representation of OpenOffice.org documents |
| sxw2html | Dicky Wahyu Purnomo's sxw to HTML converter |
| sxw2txt | Dicky Wahyu Purnomo's sxw to ASCII converter |
| libwpd | William Lachance's libwpd and WordPerfect filter for OpenOffice.org Writer |
| OOo to HTML | Steve Slaven's StarOffice/OpenOffice.org to HTML converter. Uses xsltproc and ImageMagick. |
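
To give a feeling for how little is needed for the simplest kind of filter, here is a minimal plain-text extraction sketch. It is not the implementation of any tool listed above. It assumes the OpenOffice.org 1.x `text` namespace URI and only looks at headings and paragraphs, so nested paragraphs (for example in footnotes) are not handled specially. The demo package is made up.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "http://openoffice.org/2000/office"
TEXT_NS = "http://openoffice.org/2000/text"


def package_to_text(package):
    """Return the headings and paragraphs of content.xml, one per line."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    wanted = {"{%s}h" % TEXT_NS, "{%s}p" % TEXT_NS}
    lines = []
    for elem in root.iter():
        if elem.tag in wanted:
            # itertext() also collects text inside spans and other children.
            lines.append("".join(elem.itertext()))
    return "\n".join(lines)


def build_demo_package():
    """Build a tiny in-memory stand-in for a real .sxw file (illustration only)."""
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        '<office:body>'
        '<text:h text:level="1">Title</text:h>'
        '<text:p>Hello <text:span>world</text:span></text:p>'
        '</office:body>'
        '</office:document-content>' % (OFFICE_NS, TEXT_NS)
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", content)
    return buf


print(package_to_text(build_demo_package()))
```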
sx2ml XSLT
- One of the most complete available XSLT filters
- Produces XHTML including a CSS style sheet
- Has only minor problems with complex tables
- Developed for use with Java XML processors, but most functions work with Python as well
- Active development and public CVS
- Sun Microsystems Inc. actively pushes development
- GNU Lesser General Public License Version 2.1 or Sun Industry Standards Source License Version 1.1
A simple Zope example
- OpenOffice.org documents can be used in a lot of different ways
Current solutions to integrate OpenOffice.org with Zope
A couple of Zope products that rely on OpenOffice.org are available:
- Archetypes
- ZOODocument
- CMFOODocument
Archetypes
Archetypes is a content-type creating framework based on schema definitions
- PyUNO (OpenOffice.org Python API) integration
- Uses OpenOffice.org to convert Microsoft Office documents to HTML
ZOODocument
Allows uploading OpenOffice.org documents to Zope. The uploaded documents are rendered to HTML using XSL transformations.
- Native Zope product
- OpenOffice.org to HTML transformation using XSL
- Relies on the XMLTransform product to access XML processors
http://www.zope.org/Members/philikon/ZooDocument, http://www.zope.org/Members/arielpartners/XMLTransform
CMFOODocument
Allows uploading OpenOffice.org documents to CMF/Plone/icoya. The uploaded documents are rendered to HTML using the sx2ml XSLT stylesheets.
- CMF product
- OpenOffice.org to HTML transformation using the sx2ml stylesheets from xml.openoffice.org
- Extracts, scales and stores included images in the ZODB
- Relies on the libxml2/libxslt Python bindings
- Dublin Core metadata support
http://www.zope.org/Members/longsleep/CMFOODocument http://www.icoya.de/support/download_area/zope/CMFOODocument
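
The metadata and image handling mentioned above builds on two facts about the package: meta.xml carries Dublin Core elements, and images live under Pictures/. The sketch below shows only that extraction step with the standard library. It is not CMFOODocument's actual code; scaling and ZODB storage are left out, and the helper names and demo package are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core(package):
    """Return the Dublin Core fields found in meta.xml as a dict."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    prefix = "{%s}" % DC_NS
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            fields[elem.tag[len(prefix):]] = (elem.text or "").strip()
    return fields


def picture_names(package):
    """Return the names of all image streams stored under Pictures/."""
    with zipfile.ZipFile(package) as zf:
        return [name for name in zf.namelist()
                if name.startswith("Pictures/") and not name.endswith("/")]


def build_demo_package():
    """Build a tiny in-memory stand-in for a real .sxw file (illustration only)."""
    meta = (
        '<office:document-meta'
        ' xmlns:office="http://openoffice.org/2000/office"'
        ' xmlns:meta="http://openoffice.org/2000/meta"'
        ' xmlns:dc="%s">'
        '<office:meta>'
        '<dc:title>Demo</dc:title>'
        '<dc:creator>Jane Doe</dc:creator>'
        '<meta:creation-date>2003-06-27T10:00:00</meta:creation-date>'
        '</office:meta>'
        '</office:document-meta>' % DC_NS
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("meta.xml", meta)
        zf.writestr("Pictures/logo.png", b"not really a PNG")
    return buf


demo = build_demo_package()
print(dublin_core(demo))
print(picture_names(demo))
```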
Using OpenOffice.org as content authoring tool
- CMFOODocument plugs into CMF/Plone/icoya
- ExternalEditor works with OpenOffice.org (NOTE: OpenOffice.org or its quicklaunch must not be running in the background)
- WebDAV integration also works pretty well
Test CMFOODocument on http://demo.plone.org
OpenOffice.org + Zope in the future
- CMFOODocument is currently implemented using Archetypes
- The upcoming release of CMFOODocument will provide an ooo2html transformer in Python
- New products can easily make use of OpenOffice.org documents
- sx2ml is getting more complete day by day.
Questions?
If you have a question, please feel free to ask, or email me at simon@struktur.de
Thank you.

