download the exporter
 

Export your OpenOffice.org Calc or LibreOffice Calc spreadsheet to XML

Background

XML is used a lot by developers, but not so much by your average person. So, quite often you'll find yourself needing to get data that was sent to you by others into an XML format. When it comes to any type of tabular data or repeating record, it is almost certain that the data will come to you in the form of a spreadsheet.

OpenOffice.org and LibreOffice have built-in capabilities to allow you to transform your spreadsheets (or any other office documents, for that matter) by writing custom XSLT filters. While this is powerful, it is also time consuming and not suitable for many tasks where you just want to get a basic export of the data. Writing a custom exporter is fine when you are targeting a specific (and fixed) output format, but it is overkill when you simply need to get arbitrary data into a well-formed XML document for further use.

This Simple XML Exporter for uses the built-in XML transformation capabilities of the application to provide a simple way to save your spreadsheet as XML. This data can then be further manipulated in any text editor to allow you to provide any tweaks necessary to get it into your desired output format.

How it works

The exporter installs into OpenOffice.org or LibreOffice and appears as an option in the File>Export dialog box from within the Calc application. The exporter will create an output file with the contents of your workbook - including all sheets that have content. Each sheet is included in the exported file within <ooo_sheet> </ooo_sheet> tags. The name of the sheet is included as an attribute to the <ooo_sheet> - to help you keep track of what sheet generated which section of output.

Each row on the sheet (after the first) is exported as a data row in your XML file, using the values from the first row in the column as the tag name. Since XML tag names have more restrictions than cell values can have - the exporter will attempt to use your tag name when it can, and will try to generate a valid name when it cannot. Quite often, the first row of a sheet already contains column names, so this is an easy way to set tag names for your output XML.

Note to LibreOffice users: Due to differences in the XSLT processor available within LibreOffice, not all XSL features are supported. This means the LibreOffice exporter cannot currently use the column header as the XML tag name. Instead, tags are named numerically, and can be renamed in a text editor (eg. <column_1>).

How to install / use

Choose the exporter that is right for you depending on your needs and which application suite you use.

LibreOffice Users:

Note: The XSL processor used in LibreOffice does not support XSLT 2.0 features. This means that the code used to generate tag names from the header row does not work. Instead, an XML file will be generated with numerically named tags (eg. <column_1>) which can be renamed by a simple search/replace in your text editor application afterward.

XML Exporter for LibreOffice (MD5: 05d7c3999ccdb077bf55018faba3ee6e)


OpenOffice.org v. 3.0 Users:

Note: If you have a large speadsheet, the exporter may take a long time (or even hang) when generating the output file due to the way it must process the source to determine tag names. You can use the alternate exporter on larger spreadsheets for MUCH better performance. You can install both exporters on the same system at the same time. The alternate exporter will not name the tags in the output with the text from the header row, but will name them numerically (eg. <column_1>) which can be renamed by a simple search/replace in your text editor application afterward.

XML Exporter for OpenOffice.org v3 and higher (MD5: a6029000ab931ce658803e3fe7252380)

Large Spreadsheet XML Exporter for OpenOffice.org v3 and higher (MD5: 05d7c3999ccdb077bf55018faba3ee6e)


OpenOffice.org v. 2.0 Users:

Note: Older versions of OpenOffice.org do not support all the features used by the standard exporter. You can try the alternate exporter, which does not use these features. This means that the code used to generate tag names from the header row does not work. Instead, an XML file will be generated with numerically named tags (eg. <column_1>) which can be renamed by a simple search/replace in your text editor application afterward.

XML Exporter for OpenOffice.org v2 (MD5: 05d7c3999ccdb077bf55018faba3ee6e)


How to install
  1. Download the correct exporter (see above) and remember where on your system you save the JAR file.
    [If prompted, make sure to SAVE the file and not RUN/OPEN it. You may need to right-click the link and choose to save the file to prevent Java (if installed on your system) from attempting to run the file.]
  2. Install the JAR file from within OpenOffice by going to the Tools menu and choosing XML Filter Settings...
    This will bring up the XML Filter Settings dialog box
  3. Click the 'Open Package' button
  4. Select the JAR file from the location on your system to which you saved the file.
  5. Close the XML Filter Settings dialog.

To use the exporter, simply go to the File menu and choose the Export option.
The exporter will only be visible from within Calc since it works specifically on spreadsheet data.

Choose 'XML File' from the 'File Format' drop-down box, and enter a filename and location to save your file.

That's it.

Note: The exporter for LibreOffice, the large spreadsheet exporter for OpenOffice.org, and the exporter for legacy versions of OpenOffice.org will show up as 'XML File (numbered cols)' in the 'File Format' drop-down.

FAQ

For the most part, things are pretty straight-forward and (hopefully) easy to use. There are a couple small items that need to be noted - mostly due to the way in which the XSL transformation interface accesses the "internal" structure of the document, which can cause a few minor irregularities.

I'm getting errors on my Ubuntu (or other Linux system) box. Why?
The exporter for OpenOffice.org requires packages not installed by default on some systems.
Use your package manager to install the following packages:
openoffice.org-java-common
openoffice-dev This one may not be necessary on your system. Install it if installing the previous does not fix your problem.
Can I limit the export to selected cells?
No. The 'Selection' checkbox in the default export dialog has no effect on the exported data when using this filter. The XML file will be generated for all worksheets and all active columns/rows in the sheet.
How do I know what names are valid for tag/header values?
Note: This only applies to the standard exporter for OpenOffice.org v3 - all other versions of the exporter auto-generate column names numerically.
You can name your header rows anything you wish, however the names may be modified when generating the XML in order to ensure a well-formed XML document.
The values pulled from the first row will be modified in this way when converting them into tag names:
  • leading and trailing space will be removed
  • any non-alphanumeric characters (including space) will be converted to underscores
  • if the name does not begin with an alphabetic character, the name will be prepended with 'COL_NUM_' followed by the header value (where NUM is the number of the column)
  • if the header cell is blank, an auto-generated tag name of 'COL_NUM' will be used (where NUM is the number of the column)
Why do I sometimes get empty rows/sheets in the output XML?
This seems to sometimes happen in spreadsheets where there was once data, but it is no longer there. Occasionally, the application will 'hold onto' the empty rows in memory, and they get processed (like all others) by the exporter.
Why does the row 'num' attribute sometimes not match the actual row number?
If you have empty rows in your sheet, they may be completely skipped in the export, causing the numbering to be off on subsequent rows.
What if I want to modify or improve on this, where is the source code?
The JAR file is simply a ZIP file with some specific files that tell OO.o how to handle the contents. You can either open the JAR with a Zip tool, or edit the XML filter properties (after installing) to locate the XSLT file that contains the export transformation code.
If you do have improvements, I encourage you to send them to me so that they can (possibly) be incorporated here and others can benefit from the improvements as well.
Is there any way to use a different tag name for the row (instead of 'ooo_row')?
You can modify the values used by editing the XSLT file that gets unpacked on your system after installing the exporter. The ooo_row value was used to be easy to find/replace later in a text editor. It is unlikely the exported XML is exactly what you will end up needing and you'll likely want to do a replace on the row (and sheet) tagnames in a text editor later.
Why are there different exporters for the different suites (and versions)?
The exporter uses some XSLT features not available in older versions of OpenOffice.org Calc or in current versions of LibreOffice. In order to allow the basic functionality to work, the alternate versions of the exporter forego the tag naming to allow the data to still be exported. Typically, most people take the output data into a text editor after generation anyway, so renaming the main columns is a suitable workaround compared to not having an exporter at all on those platforms.
Is there any kind of warranty in case this messes up my data?
Since this is an export filter, your original data should remain untouched, however it is a script running in the OpenOffice.org environment and is susceptible to any bugs within the application itself. The export transformation code is available inside the package and should be reviewed if you are concerned about its operation. (The long way of saying 'no')
How do I know if I have the most current version of the exporter?
As of version 2 of the exporter, the version number will appear in the output XML.
(eg. <ooo_calc_export scriptVersion="2.2.0" ...)
 
If reporting any problems, please include the full version number in your email.

Contact

Questions, comments, or any other feedback?
 
Email me at:
oooexport(at)digitalimprint(dot)com