Free/Open code for generating XML from OpenOffice.org Calc spreadsheets
Background
XML is used a lot by developers, but not so much by your average person. So,
quite often you'll find yourself needing to get data that was sent to you
by others into an XML format. When it comes to any type of tabular data
or repeating record, it is almost certain that the data will come to
you in the form of a spreadsheet.
OpenOffice.org Calc has built-in capabilities to allow you to transform
your spreadsheets (or any other OO.o document, for that matter) by
writing custom XSLT filters. While this is powerful,
it is also time consuming for many tasks where you just want to get a
basic export of the data. Writing a custom exporter is fine when you
are targeting a specific (and fixed) output format, but it is overkill
when you simply need to get arbitrary data into a well-formed XML
document for further use.
The Simple XML Exporter for OpenOffice.org Calc uses the built-in
XML transformation capabilities of the application to provide a
general XML exporter to allow a quick and easy way to get your data
in XML format. This data can then be further manipulated in any
text editor to allow you to provide any tweaks necessary to get it
into your desired output format.
How it works
The exporter installs into OpenOffice.org (version 3 and above) and
appears as an option in the File>Export dialog box from within the
Calc application. The exporter will create an output file with the
contents of your workbook - including all sheets that have content.
Each sheet is included in the exported file within <ooo_sheet>
</ooo_sheet> tags. The name of the sheet is included as an
attribute to the <ooo_sheet> - to help you keep track of
what sheet generated which section of output.
Each row on the sheet (after the first) is exported as a data row
in your XML file, using the values from the first
row in the column as the tag name. Since XML tag names have more
restrictions than cell values can have - the exporter will attempt to
use your tag name when it can, and will try to generate a valid name
when it cannot. Quite often, the first row of a sheet already contains
column names, so this is an easy way to set tag names for your output
XML.
How to install / use
To install, make sure you are running version 3 (or above) of OpenOffice -
as the exporter uses some XSLT features not available in version 2
of OpenOffice.
-
Download the Simple XML Exporter JAR file
(MD5: b6a22d870dd334b8a2b84b965d653fc3) and
remember where on your system you save the file.
[If prompted, make sure to SAVE the file and not RUN/OPEN it. You may
need to right-click the link and choose to save the file to prevent
Java (if installed on your system) from attempting to run the file.]
-
Install the JAR file from within OpenOffice by going to the
Tools menu and choosing
XML Filter Settings...
This will bring up the XML Filter Settings dialog box
-
Click the 'Open Package' button
-
Select the JAR file from the location on your system to which you
saved the file.
-
Close the XML Filter Settings dialog.
To use the exporter, simply go to the File menu
and choose the Export option.
The exporter will only be visible from within Calc
since it works specifically on spreadsheet data.
Choose 'XML File' from the 'File Format' drop-down box, and
enter a filename and location to save your file.
That's it.
FAQ
For the most part, things are pretty straight-forward and
(hopefully) easy to use. There are a couple small items that need to
be noted - mostly due to the way in which the XSL transformation
interface accesses the "internal" structure of the document,
which can cause a few minor irregularities.
- Can I limit the export to selected cells?
-
No. The 'Selection' checkbox in the default export dialog has no effect on
the exported data when using this filter. The XML file will be generated for all worksheets
and all active columns/rows in the sheet.
- How do I know what names are valid for tag/header values?
-
You can name your header rows anything you wish, however the names
may be modified when generating the XML in order to ensure a
well-formed XML document.
The values pulled from the first row will be modified in this way
when converting them into tag names:
- leading and trailing space will be removed
- any non-alphanumeric characters (including space) will be converted to underscores
- if the name does not begin with an alphabetic character, the name will be prepended with 'COL_NUM_' followed by the header value (where NUM is the number of the column)
- if the header cell is blank, an auto-generated tag name of 'COL_NUM' will be used (where NUM is the number of the column)
- Why do I sometimes get empty rows/sheets in the output XML?
-
This seems to sometimes happen in spreadsheets where there was
once data, but it is no longer there. Occasionally, the application
will 'hold onto' the empty rows in memory, and they get processed
(like all others) by the exporter.
- Why does the row 'num' attribute sometimes not match the actual row number?
-
If you have empty rows in your sheet, they may be completely skipped
in the export, causing the numbering to be off on subsequent rows.
- What if I want to modify or improve on this, where is the source code?
-
The JAR file is simply a ZIP file with some specific files that tell
OO.o how to handle the contents. You can either open the JAR with
a Zip tool, or edit the XML filter properties (after installing) to
locate the XSLT file that contains the export transformation code.
If you do have improvements, I encourage you to send them to me so that
they can (possibly) be incorporated here and others can benefit from
the improvements as well.
- Is there any way to use a different tag name for the row (instead of 'ooo_row')?
-
You can modify the values used by editing the XSLT file that gets
unpacked on your system after installing the exporter. The
ooo_row value was used to be easy to find/replace later in a text
editor. It is unlikely the exported XML is exactly what you
will end up needing and you'll likely want to do a replace on the row
(and sheet) tagnames in a text editor later.
- Will this work in OpenOffice.org 2.4, or other office suites based on the OpenOffice.org codebase?
-
The exporter uses some XSLT features not available in OpenOffice.org 2.x -
so the exporter will not function. Other office suites that are based
on OpenOffice code may or may not
work, depending on what OpenOffice.org and XSLT features they implement.
Additionally, they would need to use the ODS file format or the
internal document structure would not be formatted in a way to
transform properly.
- Is there any kind of warranty in case this messes up my data?
-
Since this is an export filter, your original data should remain
untouched, however it is a script running in the OpenOffice.org
environment and is susceptible to any bugs within the application
itself. The export transformation code is available inside the
package and should be reviewed if you are concerned about its
operation. (The long way of saying 'no')
Contact
Questions, comments, or any other feedback?
Email me at:
oooexport(at)digitalimprint(dot)com