Working with mathematics


In this presentation from the 2022 Typefi User Conference, Damian Gibbs, Typefi Solutions Consultant, explores two common methods of publishing equations—EPS images and MathML—and explains how each is used in the publishing production workflow.

Damian also discusses the strengths and weaknesses of each method and delves into how the upcoming end of support for PostScript Type 1 fonts will affect mathematical publishing workflows. He also provides some resources to help you with your mathematical publishing.

“Fortunately with Typefi, EPS and MathML is not a one or the other choice…Typefi can extract the MathML from the EPS when composing the pages and use MathTools to create the equations instead of placing the images.”



00:00 Intro
00:54 The history of mathematical notation
02:33 The language of mathematical notation
05:25 Mathematical fonts
06:36 Mathematical markup languages
07:47 Mathematical authoring tools
09:40 Publishing equations
11:20 EPS equations
13:57 MathML
14:40 Mathematical publishing workflows
16:29 MathJax
16:58 PostScript Type 1 fonts
18:23 Summary & resources

Intro (00:00)

DAMIAN: All right. Hi. Okay, so as you all know, my name’s Damian Gibbs, solutions consultant at Typefi.

I mostly work with organisations who use structured content as inputs such as NISO and ISO STS and DITA. However, at Typefi I never work on one project at a time and I also work with customers who use Microsoft Word as an input. And then there are also some other really fun types of input, such as CSV, straight text files and JSON.

Types of organisations I work with are quite varied from technical documentation through to finance and meteorological reports and newspapers and also with general publishers. So we have quite a wide scope of what I work with. So there’s never a dull day and there’s always something fun and interesting to learn.

So today, as Chandi mentioned, I’m gonna be talking about publishing with mathematics and Typefi.

The history of mathematical notation (00:54)

So mathematical notation, a little bit of a history lesson, has been part of literature almost since the maths was first published in a written form.

And one of the earliest examples I could find was in Genesis chapter 43, verse 34, and it talks about multiplication. So I’m just gonna paraphrase. It says, “when portions were served from Joseph’s table, Benjamin’s portion was five times as much as anyone else’s. So they feasted and drank freely from his serving.”

As the years passed, mathematical notation was elevated to the status of kings and royalty, sorry, to kings and royalty. So we have Henry the 5th, Henry the 8th, and James the 6th. But let’s not go on a tangent.

So just a short history of the evolution of mathematical notation, with thanks to Stephen Wolfram. Here we can see how the writing of equations has evolved over the centuries. This is a snippet of Isaac Newton’s manuscript from Principia and it shows some early algebraic notation.

Later in October, 1675, we see for the first time the use of the integral sign by Leibniz. Euler in the 1700s, a little while later, systematically started using notation and followed much of what Leibniz had done. And Euler appears to be one of the first regular users of Greek and Roman symbols to indicate variables.

And just for interest, here’s a page from the father of modern computing, Alan Turing’s notebook, showing his use also of the integral sign.

Finally, here we are today. The huge advancements thanks to Turing and his team on equations set digitally in Adobe InDesign.

The language of mathematical notation (02:33)

So mathematics, like a natural language, involves strings of text notation defined by grammar and is context free. In other words, a representation of the notation should be the same irrespective of the final output.

So mathematical notation evolved naturally like any other language. And here when we look at this equation, we can see that the equation expressions are made up of a number of components which are familiar to natural language.

So the approximation symbol behaves like an adjective. Operations are represented by symbols and behave similarly to verbs. Variables are shown by letters and are sometimes associated with nouns. Numbers, well of course they’re in a league of their own. And then we have conjunctions such as and, or, or not and, which I always find really confusing.

In a keynote address by Stephen Wolfram back in 2000, he noted that there were already an additional two and a half thousand commonly used symbols over and above our natural normal language that appear as ordinary text.

So just to give a correlation, a fun fact is that if you want to pass the HSK proficiency test, you would need to know about 2,600 Mandarin characters. And to read a Chinese newspaper, you’ll need to know about three and a half thousand characters.

So we know that general language can benefit from a little bit of creative licence and interpretation. However, for mathematical notation, accuracy is imperative. A change of a character or even the formatting, whether it’s italic or a different font, superscript, can change the meaning or render the equation nonsensical.

Fortunately, errors around formatting have been mitigated by structured markup, which is used in both authoring and publishing environments. And I’ll touch on this in the next couple of slides.

So mathematics in some respects is not unlike general language. There can be different methods to express the idea or the concept, but unlike general language, where phrases are linear, mathematic concepts are often expressed in a matrix within a line of text.

So here if we consider the simple fraction, a half. In general language, we say the cup is half full. But if we had to express the same mathematical notation over a number of lines, we obviously still say that the cup is half full, but as we can see, the one and the two are starting to crash into each other.

And to remedy this we need to take some action. For equations which are placed on their own line or paragraph, this is not an issue as it’s trivial just to adjust the spacing above and below the paragraph. These are some of the typographical design decisions which need to be made when publishing, irrespective of the final format.

Mathematical fonts (05:25)

So the wonderful authoring tools we have to create and insert mathematical notation rely on some building blocks. And there are two key building blocks that we rely on.

So the one is fonts. The range of fonts suitable for mathematical notation is limited in comparison to general language. Bear in mind that a font family should have good coverage of the symbols needed to render the vast number of equations correctly.

So remember, there are approximately three and a half thousand symbols, which should be covered. And of course that is a considerable amount of effort to build a font to cover all those characters. To overcome some of these limitations around the number of characters, authoring tools use different fonts, or sometimes use different fonts for different parts of the equation.

More recently, font families such as STIX have a very good coverage. For example, the font STIX General has a coverage of 3,302 characters.

And just as a side note, STIX is an acronym for Scientific and Technical Information Exchange. And it arose out of a person called Arie de Ruiter’s proposal in 1995 to have an open source font that would cover all alphabetic, symbolic and special characters used in any facet of scientific publishing.

Mathematical markup languages (06:36)

So the next building block is a markup language. So here there are a number of types and variants with which you might be familiar, such as LaTeX and Mathematica, which is used by Wolfram Alpha.

But for simple equations, no markup is really needed as these can be accomplished with natural language and formatting, either making the font italic or using one of the symbols in the font. However, the crucial difference is that the equation will not have semantic markup or be context free.

More recently, MathML has evolved as a widely adopted structured format for mathematical notation and is now used in preparation of print and digital publishing products. Those who are familiar with XML will immediately recognise the structure and semantics once you understand the tags, and it becomes human readable, well, to a point.

And this is the cost—as much as it covers clarity, the parsability between systems and avoiding ambiguity, the markup is really very verbose and it makes it quite difficult for humans to type equations in directly.
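As a rough illustration of that verbosity, the following Python sketch builds presentation MathML for the fraction one half and compares its length with the equivalent LaTeX. The function name `fraction_mathml` is hypothetical and chosen only for this example; the element names (`math`, `mfrac`, `mn`) are standard presentation MathML.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C for MathML.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def fraction_mathml(numerator: str, denominator: str) -> str:
    """Build presentation MathML for a simple numeric fraction."""
    math = ET.Element("math", xmlns=MATHML_NS)
    mfrac = ET.SubElement(math, "mfrac")
    # The first child of <mfrac> is the numerator, the second the denominator.
    ET.SubElement(mfrac, "mn").text = numerator
    ET.SubElement(mfrac, "mn").text = denominator
    return ET.tostring(math, encoding="unicode")


half_mathml = fraction_mathml("1", "2")
half_latex = r"\frac{1}{2}"

print(half_mathml)
print(len(half_mathml), "characters of MathML versus", len(half_latex), "of LaTeX")
```

Even for a fraction this small, the markup is several times longer than what an author would type in LaTeX, which is why an authoring tool normally sits between the author and the MathML.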

Mathematical authoring tools (07:47)

And this is where the authoring tools come in. So they provide a convenient interface to the markup, allowing the author and editors to insert and update equations easily. I won’t go into the details of creating equations as far as to say it’s a lot of point and clicking.

So here we have Microsoft Word, which has a built in equation editor, and Wiris have one called MathType, which is a plugin for Word and expands on Word’s capabilities. eXtyles is another popular and powerful add in for Word and is compatible with MathType.

So what you should be looking at or be aware of between all these different tools is what options are available and how these affect the markup of the equations when supplying their contents to an automated publishing workflow such as Typefi.

So for Word, for example, the equations can be inserted as inline or as display, so display meaning the equation is in its own paragraph. And Word offers templates, which can be used to build common equations. And you can also create your own library of equations. There’s no export option to generate equations as images. And the equation information is stored within the DOCX file. In the equation setup, you can see that the font used to render the equation in Word is Cambria Math.

The Wiris MathType plugin is more sophisticated and uses a similar method for building equations, where the user can insert equations using preset templates or build their own. There are a myriad of options for this one, so it is highly configurable, which I’m sure power users find really useful.

A notable difference is that multiple fonts are used to style the equation in the authoring environment. With MathType, equations are exported as image files when preparing for publishing. So this is typically as EPS and we’ll dig into what an EPS is in a minute.

Publishing equations (09:40)

So now we’ve created the equations, and so onto the publishing part.

There are two streams for publishing equations. You can save them as images from the source file, as you do with MathType, and let them travel along with the contents to be published in the final output. So this is typically as an EPS or Encapsulated PostScript file, which is a vector based image file, which allows equations to be scaled up without pixelating.

The opposite, or another image file format, is a pixel based image, such as a GIF, which is also sometimes used. But the drawback is that when it’s scaled up, you’ll lose resolution and you’ll have the appearance of pixelation.

The other stream is to send MathML markup as part of the content and let the final publishing environment or tool render the equation according to the markup.

So this screenshot is of the same equation using the exact same MathML composed in InDesign using the MathTools plugin by movemen. The plugin is highly configurable and builds equations based on the structure of the underlying MathML. So styling of the equation can be done with whatever fonts you have in your system, but it does come preconfigured with the STIX fonts, which are installed as part of the plugin.

And of course with images there is a limited scope for including accessibility in your equation. So possibly the only option for equations as images would be to write alt tags. With equations structured with markup, there’s an increased opportunity for accessibility. So equations could be read by screen readers. You could scale fonts up for low vision and manage colour for colour blindness.

EPS equations (11:20)

So let’s have a quick look inside an EPS.

If we crack it open in a text editor, we can see a lot of information. The header section contains metadata about the file, such as what application created the equation and the fonts used, and a little further down we can start seeing some MathML, which is included in the EPS file. And then towards the bottom, from about line 31, is all the PostScript code which draws the equation with the help of the fonts.

So just a note here that the font information there is only a name. It does not stipulate a version or a font type. And you’ll see why this is important in a little bit.
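To show what that header metadata looks like to a program, here is a minimal Python sketch that reads the structured comments at the top of an EPS file. The sample header, the creator name `ExampleEquationEditor` and the function `read_dsc_header` are all made up for illustration; `%%Creator:` and `%%DocumentFonts:` are standard PostScript Document Structuring Conventions comments, but which comments a particular authoring tool actually writes is an assumption here.

```python
import re

# A made-up EPS header, loosely modelled on the kind of metadata described above.
SAMPLE_EPS_HEADER = """\
%!PS-Adobe-3.0 EPSF-3.0
%%Creator: ExampleEquationEditor
%%BoundingBox: 0 0 120 40
%%DocumentFonts: Times-Roman Symbol
%%EndComments
/Times-Roman findfont 12 scalefont setfont
"""


def read_dsc_header(eps_text: str) -> dict:
    """Collect '%%Key: value' comments up to the %%EndComments marker."""
    header = {}
    for line in eps_text.splitlines():
        if line.startswith("%%EndComments"):
            break
        match = re.match(r"%%(\w+):\s*(.*)", line)
        if match:
            header[match.group(1)] = match.group(2).strip()
    return header


header = read_dsc_header(SAMPLE_EPS_HEADER)
fonts = header.get("DocumentFonts", "").split()

print("Created by:", header.get("Creator"))
print("Fonts referenced by name only:", fonts)
```

Note that the list contains nothing but names: there is no version number and no indication of whether a font is Type 1 or OpenType.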

Fonts are also software, and each version has differences in the way it behaves. For example, how characters are mapped, or the kerning tables, which define the space between the different characters.

So when the EPS is placed in the rendering software, for example, InDesign, the system will look for the font of the same name available on the host system. This makes EPS images for equations quite fickle and tricky to debug sometimes.

Even software upgrades on your operating system can cause a hiccup. Some of the authoring tools use system fonts and during the upgrade, these fonts are sometimes upgraded as part of your system upgrade without you knowing, and the new font could have different mappings or slightly different kerning tables.

So when this happens, if there’s an incorrect mapping or a glyph is missing, a default font is used by the system, which will probably have the incorrect character. Or as a final fallback, a little square block is shown, which will indicate a missing character.

So here is an example of a scenario in InDesign where we have a font conflict. So two equations, one using Symbol and the other one using Symbol MT. Although the top equation looks okay on screen, this is the one with the missing font and exporting to PDF can give unexpected results. The complication is that the file name for these two different fonts is exactly the same in the system.

So here’s a closer look at the font information for the two equations. They both refer to Symbol fonts, but installing both with the same file name on the same system is quite tricky.

So here is the Symbol font on the authoring system. And at the bottom we have the same font made by the same foundry on the publishing system, but it has a different version number. And so it could render the contents incorrectly. There are methods of mitigating this, but even so, variations of these types of issues still occur regularly.

MathML (13:57)

So just a brief background on MathML.

MathML is a semantic markup language for describing mathematical notation. Its intent is to capture both the structure and the content, and it was first released in 1998 by the W3C. The goal of MathML is to enable mathematics to be served, received, and processed in the same manner as HTML enables this functionality for text.

So in the short time since 2015, it has gained wide support on both sides of the publishing equation. MathML has two flavours. There’s presentation markup and content markup. And for MathML workflows using InDesign and MathTools, the presentation format is used.
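The difference between the two flavours can be seen in a small Python sketch. Both strings below describe x squared: the first uses presentation elements (`msup`, `mi`, `mn`), which say how the expression is laid out, and the second uses content elements (`apply`, `power`, `ci`, `cn`), which say what it means. The helper `element_names` is a hypothetical name used only for this example.

```python
import xml.etree.ElementTree as ET

# Presentation markup: a base "x" with a superscript "2".
PRESENTATION = "<math><msup><mi>x</mi><mn>2</mn></msup></math>"

# Content markup: apply the "power" operator to the identifier x and the number 2.
CONTENT = "<math><apply><power/><ci>x</ci><cn>2</cn></apply></math>"


def element_names(mathml: str) -> list:
    """Return the element names in document order."""
    return [element.tag for element in ET.fromstring(mathml).iter()]


presentation_tags = element_names(PRESENTATION)
content_tags = element_names(CONTENT)

print("Presentation:", presentation_tags)
print("Content:     ", content_tags)
```

A layout engine only needs the first form to decide where each glyph goes, which fits the point above that page composition works from the presentation format.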

Mathematical publishing workflows (14:40)

Okay, so just to summarise a general workflow using EPS equations and MathML.

For EPS equations, content is authored in Word with MathType, and then at the time of publishing, the equations are exported as EPS, packaged along with the content and then passed onto the publishing environment. And here the fonts need to be in sync between the authoring and the publishing environment. And this is crucial, especially when you’re converting EPS equations to other formats such as JPEG and PNG for HTML type publications.

For pure MathML workflows, on the authoring side MathML is included in the source document, if you’re using Word’s equation editor, and passed through to the publishing environment inside the documents. So there’s no exporting of images. On the publishing side, MathML is supported in Adobe InDesign via the MathTools plugin.

Once the pages are composed or you’re repurposing the content for EPUB and HTML, the images are then exported from InDesign along with the help of MathTools.

So when publishing in HTML, some browsers support MathML natively, and for those that don’t, there are JavaScript libraries that will also render MathML on the fly, such as MathJax.

Also, equations have structured markup which allows for diff compare using a tool such as DeltaXML for redlining documents.

Fortunately with Typefi, EPS and MathML is not a one or the other choice as the EPS equations from MathType include MathML. So Typefi can extract the MathML from the EPS when composing the pages and use MathTools to create the equations instead of placing the images.

However, just a caveat for this is that it’s not really practical to try and have both options in a single automated workflow.
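The idea of pulling MathML out of an EPS file can be sketched in a few lines of Python. This is not how Typefi does it; it is only an illustration under a stated assumption, namely that the MathML sits in single-percent comment lines inside the file. The sample file contents and the function name `extract_mathml` are hypothetical, and the way a real authoring tool embeds the markup may differ.

```python
import re
import xml.etree.ElementTree as ET

# A made-up EPS file with MathML carried in comment lines (assumed layout).
SAMPLE_EPS = """\
%!PS-Adobe-3.0 EPSF-3.0
%%Creator: ExampleEquationEditor
%%EndComments
%<math xmlns='http://www.w3.org/1998/Math/MathML'>
% <mfrac><mn>1</mn><mn>2</mn></mfrac>
%</math>
/Times-Roman findfont 12 scalefont setfont
"""


def extract_mathml(eps_text: str):
    """Return the first <math> element found in comment lines, or None."""
    comment_lines = []
    for line in eps_text.splitlines():
        # Keep plain '%' comments; skip '%%' structuring comments and the '%!' header.
        if line.startswith("%") and not line.startswith(("%%", "%!")):
            comment_lines.append(line[1:].strip())
    joined = "".join(comment_lines)
    match = re.search(r"<math\b.*?</math>", joined, flags=re.DOTALL)
    if match is None:
        return None
    # Parsing raises ParseError if the extracted markup is not well-formed XML.
    ET.fromstring(match.group(0))
    return match.group(0)


mathml = extract_mathml(SAMPLE_EPS)
print(mathml)
```

Once the markup has been recovered this way, the PostScript drawing code and its font references are no longer needed to render the equation.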

MathJax (16:29)

So just a note here about MathML in the browser.

The JavaScript library MathJax provides really good coverage for MathML. Here is the same MathML that we used in the previous examples placed in an HTML document and rendered with the default setting for MathJax.

MathJax also has some really interesting accessibility features which can be configured and while MathML isn’t supported in some browsers, MathJax can be used in all the major browsers.
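For readers who want to try this, here is a small Python sketch that writes an HTML page containing MathML and a MathJax script tag. The CDN address and the `mml-chtml` component name are assumptions based on MathJax version 3 and should be checked against the current MathJax documentation; the function name `html_page` is hypothetical.

```python
# Assumed MathJax 3 CDN location for the MathML-input, HTML-output component.
MATHJAX_URL = "https://cdn.jsdelivr.net/npm/mathjax@3/es5/mml-chtml.js"

HALF = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mfrac><mn>1</mn><mn>2</mn></mfrac>"
    "</math>"
)


def html_page(mathml: str, title: str = "Equation") -> str:
    """Wrap a MathML fragment in a minimal HTML page that loads MathJax."""
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>{title}</title>
  <script async src="{MATHJAX_URL}"></script>
</head>
<body>
  <p>The cup is {mathml} full.</p>
</body>
</html>
"""


page = html_page(HALF)
print(page)
```

Saving the printed output as an `.html` file and opening it in a browser should show the inline fraction, whether or not that browser renders MathML on its own.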

PostScript Type 1 fonts (16:58)

So as you might be aware, Adobe are stopping support for legacy PostScript Type 1 fonts, which will affect InDesign 2023 onwards, but 2022 and previous versions remain unchanged.

So if you were around in 1984 when Type 1 fonts were first created, consider yourself part of a legacy.

PostScript Type 1 fonts are sometimes used with EPS equations, especially the older equations. Unfortunately, you cannot just replace a Type 1 font with the modern OpenType equivalent and then expect to reuse the EPS equation, as the font definitions probably won’t be the same. The glyph tables might not necessarily match and the result, as we saw earlier, would be missing or different characters.

So just to be clear, it’s not that the EPS format is disappearing, although it is quite an old and useful format, it’s only the fonts which some of the EPS equations files refer to.

Technically this is quite challenging as you can’t know which EPS files refer to old outdated fonts just by looking at the metadata because, as we saw earlier, all it contains is the font name. It does not have the font type or the version. So to continue using EPS equations, the source document will need to be reopened on a system that is configured to use the newer OpenType version of the fonts, the equations checked and then re-exported, or you can start using MathML.

Summary & resources (18:23)

So in closing, just some key points to consider when working with mathematical notation.

Consider which fonts will be used in the authoring and what will be used in the final output, and the coverage of the fonts for the required characters.

How will you manage backlist and legacy documents with legacy fonts in future publishing: by rebuilding the EPS equations, or by extracting the MathML and rendering the equations at the time of publishing?

How will you want to publish content in different contexts and ensure a level of accessibility?

Also, consider if you want to have the ability to do granular redlining with tools such as DeltaXML. How will you manage the styling of mathematical notation in different outputs? Do you want to set up your publishing platform to format on the fly, or regenerate EPS files from source content for each context?

And as you move documents through different publishing workflows, how do you ensure the integrity of the equations?

And finally, new digital products are continually evolving and emerging and they will all benefit from structured content.

So here are some resources which will just help you with your decisions for creating equations in the future.

Thank you for your time. And are there any questions?

Well, that was lucky *laughs*. Thank you.

Damian Gibbs

Solutions Consultant | Typefi

Damian started out as an apprentice typesetter over 20 years ago at a leading South African educational publisher, and from the start was curious about opportunities that digital technologies bring to publishing. He transitioned to general market publishing and eventually became a service provider to local and offshore publishers covering a diverse range of publishing markets, all requiring varying workflows and output requirements.

Damian has extensive experience working with publishers to use evolving technologies and innovative digital publishing products to improve workflows, and to transition from pure print to digital outputs such as web, e-books, and CMS publishing.