download the exporter
 

Free/Open code for generating XML from OpenOffice.org Calc spreadsheets

Background

XML is used a lot by developers, but not so much by your average person. So, quite often you'll find yourself needing to get data that was sent to you by others into an XML format. When it comes to any type of tabular data or repeating record, it is almost certain that the data will come to you in the form of a spreadsheet.

OpenOffice.org Calc has built-in capabilities to allow you to transform your spreadsheets (or any other OO.o document, for that matter) by writing custom XSLT filters. While this is powerful, it is also time consuming for many tasks where you just want to get a basic export of the data. Writing a custom exporter is fine when you are targeting a specific (and fixed) output format, but it is overkill when you simply need to get arbitrary data into a well-formed XML document for further use.

The Simple XML Exporter for OpenOffice.org Calc uses the built-in XML transformation capabilities of the application to provide a general XML exporter to allow a quick and easy way to get your data in XML format. This data can then be further manipulated in any text editor to allow you to provide any tweaks necessary to get it into your desired output format.

How it works

The exporter installs into OpenOffice.org (version 3 and above) and appears as an option in the File>Export dialog box from within the Calc application. The exporter will create an output file with the contents of your workbook - including all sheets that have content. Each sheet is included in the exported file within <ooo_sheet> </ooo_sheet> tags. The name of the sheet is included as an attribute to the <ooo_sheet> - to help you keep track of what sheet generated which section of output.

Each row on the sheet (after the first) is exported as a data row in your XML file, using the values from the first row in the column as the tag name. Since XML tag names have more restrictions than cell values can have - the exporter will attempt to use your tag name when it can, and will try to generate a valid name when it cannot. Quite often, the first row of a sheet already contains column names, so this is an easy way to set tag names for your output XML.

How to install / use

To install, make sure you are running version 3 (or above) of OpenOffice - as the exporter uses some XSLT features not available in version 2 of OpenOffice.

  1. Download the Simple XML Exporter JAR file (MD5: b6a22d870dd334b8a2b84b965d653fc3) and remember where on your system you save the file.
    [If prompted, make sure to SAVE the file and not RUN/OPEN it. You may need to right-click the link and choose to save the file to prevent Java (if installed on your system) from attempting to run the file.]
  2. Install the JAR file from within OpenOffice by going to the Tools menu and choosing XML Filter Settings...
    This will bring up the XML Filter Settings dialog box
  3. Click the 'Open Package' button
  4. Select the JAR file from the location on your system to which you saved the file.
  5. Close the XML Filter Settings dialog.

To use the exporter, simply go to the File menu and choose the Export option.
The exporter will only be visible from within Calc since it works specifically on spreadsheet data.

Choose 'XML File' from the 'File Format' drop-down box, and enter a filename and location to save your file.

That's it.

FAQ

For the most part, things are pretty straight-forward and (hopefully) easy to use. There are a couple small items that need to be noted - mostly due to the way in which the XSL transformation interface accesses the "internal" structure of the document, which can cause a few minor irregularities.

Can I limit the export to selected cells?
No. The 'Selection' checkbox in the default export dialog has no effect on the exported data when using this filter. The XML file will be generated for all worksheets and all active columns/rows in the sheet.
How do I know what names are valid for tag/header values?
You can name your header rows anything you wish, however the names may be modified when generating the XML in order to ensure a well-formed XML document.
The values pulled from the first row will be modified in this way when converting them into tag names:
  • leading and trailing space will be removed
  • any non-alphanumeric characters (including space) will be converted to underscores
  • if the name does not begin with an alphabetic character, the name will be prepended with 'COL_NUM_' followed by the header value (where NUM is the number of the column)
  • if the header cell is blank, an auto-generated tag name of 'COL_NUM' will be used (where NUM is the number of the column)
Why do I sometimes get empty rows/sheets in the output XML?
This seems to sometimes happen in spreadsheets where there was once data, but it is no longer there. Occasionally, the application will 'hold onto' the empty rows in memory, and they get processed (like all others) by the exporter.
Why does the row 'num' attribute sometimes not match the actual row number?
If you have empty rows in your sheet, they may be completely skipped in the export, causing the numbering to be off on subsequent rows.
What if I want to modify or improve on this, where is the source code?
The JAR file is simply a ZIP file with some specific files that tell OO.o how to handle the contents. You can either open the JAR with a Zip tool, or edit the XML filter properties (after installing) to locate the XSLT file that contains the export transformation code.
If you do have improvements, I encourage you to send them to me so that they can (possibly) be incorporated here and others can benefit from the improvements as well.
Is there any way to use a different tag name for the row (instead of 'ooo_row')?
You can modify the values used by editing the XSLT file that gets unpacked on your system after installing the exporter. The ooo_row value was used to be easy to find/replace later in a text editor. It is unlikely the exported XML is exactly what you will end up needing and you'll likely want to do a replace on the row (and sheet) tagnames in a text editor later.
Will this work in OpenOffice.org 2.4, or other office suites based on the OpenOffice.org codebase?
The exporter uses some XSLT features not available in OpenOffice.org 2.x - so the exporter will not function. Other office suites that are based on OpenOffice code may or may not work, depending on what OpenOffice.org and XSLT features they implement. Additionally, they would need to use the ODS file format or the internal document structure would not be formatted in a way to transform properly.
Is there any kind of warranty in case this messes up my data?
Since this is an export filter, your original data should remain untouched, however it is a script running in the OpenOffice.org environment and is susceptible to any bugs within the application itself. The export transformation code is available inside the package and should be reviewed if you are concerned about its operation. (The long way of saying 'no')

Contact

Questions, comments, or any other feedback?
 
Email me at:
oooexport(at)digitalimprint(dot)com