A solution to transform an XLSX file into Content XML (CXML)


At the 2022 Balisage markup conference, Gayanthika Udeshani, Associate Architect at Typefi, presented her paper “Getting useful XML out of Microsoft Excel.” Gayanthika spoke about converting the XLSX format using XML technologies, and her talk drew strong engagement from attendees.

“Congratulations on presenting your excellent work in such detail. It exhausts me just to think of the amount of work this must have taken!” — Balisage attendee.

The Typefi XSLT Colombo team (L to R): Anupama, Nimesh, Mahesha, Himasha, and Gayanthika.

Transformations using XProc: a great match with XSLT

Gayanthika first attended Balisage in 2021, which inspired her to submit a paper for the 2022 conference. “I wanted to work on something new for Typefi and thought about a transformation from Excel to CXML that utilised XSLT. I knew there were some limitations with the existing solution which used Java, so I wanted to focus on those and try to resolve them,” explained Gayanthika. “This solution will now be used as a separate converter to convert Excel files into CXML, so I’m pretty proud of that.”

Gayanthika was introduced to XProc (XML pipeline) by fellow XSLT team member Max Zhaloba, and learning it was a great experience for her. She focused on XProc, hoping to include some interesting outputs in her paper.

“It was an outstanding presentation with some of the 69 attendees, legends of the XML world.” — Damian Gibbs.

“This was not an individual effort,” shared Gayanthika. “I want to thank the XSLT team, Anupama Wimalasooryia, Nimesh Kottege, Mahesha Muthumala, and Himasha Maduwanthi (who handled the workload while I was busy with the research paper), and Max Zhaloba for the XSLT and XProc guidance. Thank you again to Anupama, and to Damian Gibbs and Kate Prentice, for reviewing my paper, and to Typefi for the motivation and freedom to pursue this research.”


Balisage 2022 Paper:

Getting Useful XML out of Microsoft Excel

Gayanthika’s paper presents a solution that transforms a Microsoft Excel Open XML Spreadsheet (XLSX) file into Content XML (CXML), a shallow-structured XML format used at Typefi. The solution is built with XSLT and XProc, an XML-based programming language for processing documents in pipelines.

Gayanthika’s solution has three main research areas:

  1. Reading the Excel file content with an XProc pipeline.
  2. Mapping XLSX table information to CALS table elements and attributes using XSLT functions.
  3. Transforming Excel charts and embedded images: the key chart information is read from chart.xml, converted to a Scalable Vector Graphics (SVG) file, and then referenced as an image in the output XML.
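For readers curious about what the first two steps involve, here is a minimal Python sketch of the same idea. It is not the XProc/XSLT implementation described in the paper. It relies on the fact that an XLSX file is a ZIP package of SpreadsheetML parts: it reads the shared-string table and one worksheet with the standard library, then emits a bare CALS `table`/`tgroup`/`colspec`/`tbody`/`row`/`entry` structure. The function names are hypothetical, the worksheet path is passed in directly rather than resolved through the workbook relationships, and merged cells, styles, number formats, and charts are ignored.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def _text(el: ET.Element) -> str:
    """Join the text of a plain string (<t>) or of rich-text runs (<r><t>)."""
    parts = el.findall("m:t", NS) + el.findall("m:r/m:t", NS)
    return "".join(t.text or "" for t in parts)


def read_shared_strings(zf: zipfile.ZipFile) -> list:
    """Return the shared-string table that cells with t="s" index into."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []  # workbooks without text cells may omit this part
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    return [_text(si) for si in root.findall("m:si", NS)]


def column_index(ref: str) -> int:
    """Convert a cell reference such as "C7" to a 1-based column number."""
    n = 0
    for ch in re.match(r"[A-Z]+", ref).group(0):
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def cell_text(c: ET.Element, strings: list) -> str:
    """Return a cell's stored value as text (no number formatting applied)."""
    kind = c.get("t")
    if kind == "inlineStr":
        is_el = c.find("m:is", NS)
        return _text(is_el) if is_el is not None else ""
    v = c.find("m:v", NS)
    if v is None or v.text is None:
        return ""
    return strings[int(v.text)] if kind == "s" else v.text


def sheet_to_cals(zf: zipfile.ZipFile, sheet_path: str) -> ET.Element:
    """Map one worksheet's cells onto a minimal CALS <table> (hypothetical helper)."""
    strings = read_shared_strings(zf)
    sheet = ET.fromstring(zf.read(sheet_path))
    rows = []  # one {column number: text} dict per <row>; absent rows are not padded
    for row in sheet.findall("m:sheetData/m:row", NS):
        cells = {}
        for c in row.findall("m:c", NS):
            ref = c.get("r")  # optional in the schema; fall back to position
            col = column_index(ref) if ref else max(cells, default=0) + 1
            cells[col] = cell_text(c, strings)
        rows.append(cells)
    ncols = max((max(r) for r in rows if r), default=0)

    table = ET.Element("table")
    tgroup = ET.SubElement(table, "tgroup", cols=str(ncols))
    for i in range(1, ncols + 1):
        ET.SubElement(tgroup, "colspec", colname=f"c{i}", colnum=str(i))
    tbody = ET.SubElement(tgroup, "tbody")
    for cells in rows:
        row_el = ET.SubElement(tbody, "row")
        for i in range(1, ncols + 1):
            entry = ET.SubElement(row_el, "entry", colname=f"c{i}")
            entry.text = cells.get(i, "")  # empty cells become empty entries
    return table
```

For example, opening a workbook with `zipfile.ZipFile("report.xlsx")` (a placeholder file name) and passing it to `sheet_to_cals(zf, "xl/worksheets/sheet1.xml")` returns an `Element` that `ET.tostring(table, encoding="unicode")` can serialize. A fuller mapping would also need to translate `mergeCell` ranges into the CALS `namest`/`nameend` and `morerows` attributes.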

To learn more about Gayanthika’s solution, read her paper and Balisage presentation, both available below.

Read Paper (HTML)
Download Presentation (PDF)



Gayanthika Udeshani

Associate Architect | Sri Lanka

Gayanthika is an Associate Architect at Typefi, where she leads the XSLT team and provides support for other software solutions. Gayanthika holds a Master of Science (MSc) in Software Architecture from the University of Moratuwa, Sri Lanka, and a Project Management Professional (PMP) certification.