A guide to structuring content for automation


Learn how to make magic happen with intelligent content! Join Marie Gollentz, Typefi Project Manager, as she offers a high-level practical guide to developing and implementing a content strategy.

If you’re considering how content management systems, automation, or XML might fit into your publishing processes, or if you’re simply wondering how to get your content in order, this is the perfect place to start.

What you’ll learn

  • Why it’s important to develop ‘intelligent content’ that separates form from function;
  • How to analyse and structure content, develop style naming conventions, and create editorial style guides;
  • How to build design style guides and templates;
  • And more!

You’ll also see a demo of Typefi’s integration with Adobe Experience Manager Guides (formerly XML Documentation for Adobe Experience Manager*). This is just one of the many ways that structured content can be used to automatically create highly designed, professional publications fast!

This webinar was hosted by the Center for Information-Development Management (CIDM), which facilitates the sharing of information about current trends, best practices, and development within the global information industry.

If you’d like to share your thoughts about this webinar or if you have any questions, don’t hesitate to drop us a line—we’d love to talk publishing with you!

*Adobe Experience Manager Guides was formerly branded as XML Documentation for Adobe Experience Manager. The name changed on 11 May 2022. Please note that certain references within the documentation may still use the prior branding but remain applicable to the current offering.

Transcript


00:00 Introduction to Comtech Services and CIDM

  • Training
  • Events
  • Webinars
05:49 Presentation: A guide to structuring content for automation
07:28 Modern content is intelligent content
11:32 The Institutes: Doing more, faster
14:05 Developing a content strategy

  • Set goals
  • Analyse your content
  • Decide what you want to do with your content
19:36 Creating structured content

  • Determine the level of granularity
  • Organise content into types
24:20 Developing naming conventions, style guides, and templates

  • Style naming conventions
  • Editorial style guides
  • Design style guides
  • Style templates
28:22 XML-based authoring
31:45 Content Management Systems
34:25 Automated publishing
35:45 Demo: Automated publishing with Typefi and Adobe Experience Manager Guides

  • Marketing brochures
  • Disclosure documents
43:21 Q&A session

  • Semantic analysis
  • Validating source XML against a DTD or schema
  • Using Adobe InDesign for automated layout
  • Enabling collaboration across business units
  • Industries that use Typefi
  • Integration with XML authoring tools
  • Content reuse
  • Integration with Content Management Systems
  • Developing transforms
  • How long does it take to set up a Typefi workflow?
  • Transforming proprietary DTDs
1:07:20 Typefi at ConVEx
1:09:18 Q&A continued

  • Typefi and PDF accessibility
1:11:32 Conclusion

Introduction to Comtech Services and CIDM (00:00)

KATHY: Welcome everybody. My name’s Kathy Madison. I am going to be the moderator today for the webinar, Structuring Your Content For Automation.

I am with Comtech Services and CIDM, and for those that don’t know, they are one and the same. Comtech Services is the consulting company that started many, many years ago under JoAnn Hackos and is now run by Dawn Stevens, and we provide consulting on everything that makes your content better. We run training sessions on all of those topics as well.

So that’s what we’re known for at Comtech. CIDM is our membership organisation, where we provide content to the community as a whole. We offer networking opportunities, we hold round table discussions with our members, we have a newsletter, and there are various membership levels you can join.

As for my role, I wear both hats. On the Comtech side, I’m a consultant, so I work on projects: mostly process maturity models, user studies, benchmark studies, and some information modelling, but I do a little bit of everything.

On the CIDM side, I am our member liaison. Part of my job is to recruit new members, so if you’re interested, please let us know.


All right. So on the Comtech side, as I mentioned, we do training. Right now, all of our training is online, as you would expect. We have several classes coming up that are relatively new.

Developing Your Content Strategy is a new one, and it’s more of a workshop, with a lot more interactive sessions where you actually create a content strategy for your organisation.

Then Editing Essentials for Writers is a brand new course for writers and editors, and that’s happening in October.

All of these classes kick off in October, and our DITA Basics and Advanced Reuse DITA classes typically run every quarter. So, if you can’t make the October sessions, they’ll be running again next year.

We’ve also listed some January workshops, but I just wanted to highlight these for now. You can see all of our upcoming workshops on our workshop page.


I mentioned that on the CIDM side we put on events, and one of the biggest events we have going on right now is ConVEx.

That’s online and virtual, and it’s not your typical event. We have pre-recorded a lot of presentations, and then you’ll have interactive sessions with the presenters. We call those Candid Conversations. The pre-recorded presentations are available now.

If you’re thinking about coming, which we hope you do, we encourage you to go ahead and get registered as soon as possible so you can start watching those recordings.

We’re also going to do live presentations and some panel sessions, and those will be recorded and put on the platform for you to watch for a year after ConVEx.

And I also want to point out, we have a Slack workspace dedicated to ConVEx, and there’s been a lot of chatter. People are posting comments about the presentations that they’re watching, and they’re also having some fun posting comments about their pets and crazy puns, and there’s a DITA user group if you’re using DITA.

So, there’s some fun things on the Slack, and again, we encourage you to register as soon as possible so you can participate in that.

Our speakers today, a team from Typefi, are also participating in ConVEx, so we encourage you to visit with them there if you are attending.

Our other premier conference is called Best Practices, and that’s geared towards managers of content development teams. It will also be virtual. We’ve been so focused on ConVEx that we haven’t finalised the agenda for that one yet, but we will get it up very soon.


Also under the CIDM hat, we put on webinars like this one. We have a couple of other webinars coming up in the next month or so: the folks from Oxygen, the Syncro Soft team, will be presenting, and the folks from Stilo will be talking about conversion. You can see all of our upcoming webinars on the CIDM website.

Before I turn it over to today’s speaker, I want to go over some logistics. We are recording this, and you will get a link to the recording a day or so after the webinar.

If you have questions for Marie, our speaker, type them into the questions dialogue, and we’ll field them at the end of her presentation. If you have any audio or other technical issues, you can put them in the chat and we’ll have a look. But again, please use the questions dialogue box for your questions.

With that, I am very thrilled to have Marie Gollentz from Typefi here. It’s the first time we’ve done a webinar with Typefi, and we’re excited to have them as part of the CIDM community. So I’m going to go ahead and turn it over to Marie.

A Guide to Structuring Content for Automation (05:49)

MARIE: Good morning, afternoon or evening. Welcome to this webinar, A Guide to Structuring Content for Automation.

My name is Marie. I’m a Senior Solutions Consultant at Typefi, and it’s a pleasure to spend the next hour together to talk about how you can start developing and implementing a content strategy and how you can benefit from it.

We’ll start with a high-level practical guide, outlining a few steps you can follow. And then I’ll do a quick demonstration of how you can create really beautiful content from the Adobe Experience Manager Guides solution using InDesign.

Then we’ll follow up with some questions and answers. You can put your questions in the chat bar, so feel free to ask questions as you go along. And if they’re unanswered within the presentation, we’ll definitely address those questions in the Q and A section.

I’m saying ‘we’ because two of my Typefi colleagues are with me today: Chris Hausler, Director of Business Development, and Caleb Clauset, Vice President of Product. They will be here to answer all your questions after my presentation.

Modern content is intelligent content (07:28)

So first, why are we gathering today to speak about content strategy and structured content? Well, modern digital content is intelligent content. In the digital era, content is not limited to one purpose, technology, or output.

A few facts here. In three years, nearly two-thirds of the global population will have internet access. By that time, there will be more than three networked devices per person, on average. In the US today, consumers already spend over eight hours a day engaging with digital content.

And when it comes to content formats, just in the last year, 250 billion PDFs were opened.

Visual content is the new black. Short form videos and static images are the best performing content formats across every geography and industry.

In terms of content type, customers prefer product-focused content over all the other types of content more typically associated with content marketing.

So, as we have more people connected, more devices, more content formats, and more content types, modern content needs to be intelligent content.

Intelligent content is structurally rich and semantically categorised, and is therefore automatically discoverable, reusable, reconfigurable, and adaptable. You see, that’s plenty of -able.

If we take a step back and look at the traditional approach to content, content is static and linear, and is managed at a document level. The authoring combines both the style and the content, and it happens at the document level too.

Most of the time, the production workflow is print-centric. As a result, we face inefficiencies. For example, content can be siloed, duplicated, or even inaccurate. The manual workflow can be very lengthy, with repetitive or even redundant effort.

In this print-first production process, all other outputs are afterthoughts.

If we take a modern approach to intelligent content, then content is dynamic and modular, and it is managed at the component level. It is independent of format and context, and it is organised by topic.

And when content is intelligent, we can implement multi-format, digital production workflows.

A few benefits of that: my content is up to date, structured, and reusable; I can implement an automated single-source publishing workflow; and my digital production process lets me generate multiple output formats simultaneously.

The Institutes: Doing more, faster (11:32)

To make all that a bit more real, I would like to talk a little about The Institutes. They changed their content strategy over a decade ago, and have since been able to do more, faster.

Originally, they had a print-centric workflow. They produced three print publications (a textbook, a course guide, and review notes), and they could create and revise about four courses in a given year.

The process was very lengthy, very manual, and very tedious, and it was supported by a small production team.

Their business requirements at that point, about a decade ago, were to embrace online learning; write content in a granular way, as learning objects; reuse these learning objects across any course; and automatically lay out the different InDesign documents, as well as flashcards and online learning modules.

So, they implemented a new solution based on topic-based authoring with DITA XML. We’ll discuss DITA in a bit more detail in a moment.

They had all their learning objects and digital assets stored in an XML-based CMS. They were able then to automate their print and online publishing, using Typefi and Adobe InDesign, and they implemented a digital workflow to manage all that content development.

Over five years, they’ve been able to increase their output from four new products a year to over 80. That was between 2009 and 2013.

Developing a content strategy (14:05)

So the question is, how can you get there? Well, let’s start with implementing a semantic and structured approach to content.

First, we need to define a content strategy. A content strategy is everything that has to do with planning, creating, delivering, and managing your content. As you can see, there are several aspects of it, and we will only scratch the surface today.

Set some goals

Let’s start with setting high level goals.

Do you want to improve quality?

If you had more time and money, how would you make the experience better for your audience?

Do you want to reduce production time? If you increase your efficiency, you’ll be able to spend more time on crafting new content, for example.

Do you want to reduce costs? Who wouldn’t?

Do you want to offer more? Again, by making your process more efficient, you might be able to add more content and create more content for your audiences in different formats, or add new ways of distribution.

You have to be prepared to revisit and refine your goals and objectives regularly during the development of your strategy. As you gain a clear understanding of your content, you will be better placed to define realistic targets and outcomes.

Analyse your content

First, analyse the content you have.

Where is your content coming from? Who’s creating it? Are you writing it in-house? Do you have freelancers? How much control do you have over the authors?

Is the content always new? Is it, or can it be, reused from previous versions of your publications?

What formats is your content in? Is it Microsoft Word, Adobe InDesign, PDFs, databases, or even paper?

Who is your audience, your customers—other organisations, government agencies, regulatory agencies? Are you dealing with different countries or regions with different language requirements? Do the regions where you’re publishing have accessibility requirements?

What formats do your audiences want—print only, electronic formats such as EPUB or PDF? Do they want your content in XML, in accessible formats, such as DAISY? Do they want to access it online or do they want it as a mobile app?

What additional types of content do you want to produce? Would you like to add interactive features or videos if you were creating an electronic version of a publication, for example?

What is your current workflow for publishing? That includes authoring, editing, reviewing, and the composition of your content. How does this current workflow manage output into multiple media? Does it add significant time to the process? How do you deal with changes to content after production?

What do you want to do with your content?

Once you’ve answered these questions about your current content, you can move on to what you want to do with it.

Sometimes just being able to organise all your stuff in a systematic, managed way is very beneficial. Maybe you want to be able to search all your content so you can find it and do something with it.

Transforming content might be another thing you want to do with your content. You want to convert existing content to another format or another context for a different audience.

You might want to track—literally tracking the content through the various steps of your workflow, or in a broader sense, keeping track of different versions of your content.

And if you’re already tracking your content, you might as well run reports on it: where things have been used, what stage of the process they’re in, who’s working on them, and so on.

And finally, do you want to be able to reuse the same content in different formats, different versions, or different contexts for different audience needs? Would you like to offer customised or print-on-demand publications to your customers, for example?

Creating structured content (19:36)

Once you have answered these questions, you can start developing semantically rich, structured content. And to do that, start with breaking down your content.

Determine the level of granularity you want

The term ‘granularity’ refers to the size of the things you’re managing—the greater the granularity, the deeper the level of detail. A high level of granularity would, for example, increase your options to reuse your content.

So, think of output files and layout files: these might have a lower level of granularity. Text files, by contrast, have a higher level, and raw data and image files have an even greater level of granularity.

One way to determine your level of granularity is to decide on the smallest reusable chunk of content you want to use. The size of that smallest reusable chunk will inform your content management strategy as well. Are you going to manage files, things smaller than files, or maybe both?

There are two essential attributes of the smallest reusable chunk.

First, it has to make sense on its own. Second, it can be put into another publication and still work semantically. We’ll come back to that in a second.

Organise content into types

To get started, a good approach is to organise your documents into types. Find samples of all the different publications in all their variations, look at them, and then separate the presentation from the meaning.

So, there is a difference between what something looks like and what it actually is. Formatting relates to the presentation of the content, while semantic meaning is about what the content is.

The example I have on screen is 23-955b, which is formatted in red and bold, but semantically, 23-955b is a part number.

Formatting can change depending on context, but the semantic meaning shouldn’t.
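To make the part-number example concrete, here is a minimal Python sketch, assuming a hypothetical <partnum> element and two made-up output contexts. The content carries only the semantic tag; each context supplies its own formatting, so the formatting can change while the meaning stays the same.

```python
import xml.etree.ElementTree as ET

# Semantic markup: the tag records what the content *is* (a part number),
# not how it looks. The <partnum> element name is a hypothetical example.
SOURCE = "<p>Replace the filter with part <partnum>23-955b</partnum> before use.</p>"

# Formatting lives outside the content, in one style map per output context.
# These contexts and style values are illustrative assumptions.
STYLES = {
    "print": {"partnum": "red, bold"},
    "web": {"partnum": "monospace, underlined"},
}


def describe(xml_text: str, context: str) -> list[str]:
    """Pair each semantically tagged element with the formatting its context applies."""
    root = ET.fromstring(xml_text)
    styles = STYLES[context]
    return [f"{el.text}: {styles[el.tag]}" for el in root.iter() if el.tag in styles]


print(describe(SOURCE, "print"))  # ['23-955b: red, bold']
print(describe(SOURCE, "web"))    # ['23-955b: monospace, underlined']
```

Adding a new output context means adding a style map, not editing the content.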

So, speaking of semantics, identify the semantic components of each type of document.

  • Start at the book level, where you might have front and back matter, a TOC, and chapters.
  • Inside each chapter, you might have an opener, a body, and a closer.
  • Further in, at the sub-chapter level, you might have text, figures, tables, and boxes.
  • At the block level, you might be able to separate heads, paragraphs, and captions.
  • Even at the inline level, you might find special formatting within paragraphs.

Once you’ve identified these components, categorise them by level, as we just did: for example, book, chapter, sub-chapter, block, and inline.

Then compare across your document types to determine the similarities and differences.

After that, make a list of all the components. Making a list of them will help you see everything that you produce, and where and how it is used.
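The list of components can start as a simple inventory. The sketch below is a hypothetical Python example: the two document types and their component names are invented for illustration. It compares the types level by level to show what they share and what is unique to each, and builds the master list across all types.

```python
# Hypothetical component inventory for two document types, grouped by level.
INVENTORY = {
    "user_guide": {
        "book": {"front_matter", "toc", "chapter", "back_matter"},
        "block": {"head", "paragraph", "caption"},
        "inline": {"part_number", "ui_label"},
    },
    "brochure": {
        "book": {"cover", "chapter", "back_page"},
        "block": {"head", "paragraph", "caption", "pull_quote"},
        "inline": {"part_number"},
    },
}


def compare(a: str, b: str) -> dict[str, dict[str, list[str]]]:
    """For each level, list components shared by both types and unique to each."""
    report = {}
    for level in sorted(INVENTORY[a].keys() | INVENTORY[b].keys()):
        ca = INVENTORY[a].get(level, set())
        cb = INVENTORY[b].get(level, set())
        report[level] = {
            "shared": sorted(ca & cb),
            f"only_{a}": sorted(ca - cb),
            f"only_{b}": sorted(cb - ca),
        }
    return report


def master_list() -> list[str]:
    """Every component produced across all document types."""
    return sorted({c for levels in INVENTORY.values() for comps in levels.values() for c in comps})


for level, result in compare("user_guide", "brochure").items():
    print(level, result)
print(master_list())
```

Shared components are good candidates for a single style used across document types; unique ones show where the types genuinely differ.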

Developing naming conventions, style guides, and templates (24:20)

Once you’ve done that, you’re ready to define naming conventions, styles and templates.

Style naming conventions

Style naming conventions help to ease communication between stakeholders. If we’re all using the same word for the same thing, we’ll understand each other better and keep things consistent.

  • It can be used by writers and designers.
  • It should be semantic, not format based.
  • It should be consistent across all document types.

And then, a few tips for your naming convention: go from big to small, and keep names to a reasonable length, because you want to be able to read the names you’ve created in the tool you’re using. To separate the terms in your names, use underscores, camel case, or hyphens, for example.
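The naming tips can also be written down as automatic checks. This Python sketch assumes one possible house convention: lowercase terms ordered big to small, joined by underscores, a 32-character limit, and a short list of appearance words to avoid. The limit and word list are arbitrary assumptions; camel case or hyphens would work just as well with a different pattern.

```python
import re

# Assumed house rules: semantic terms from the biggest container to the
# smallest, lowercase, joined by underscores, short enough to read in a
# styles panel. The 32-character limit is an arbitrary choice.
MAX_LENGTH = 32
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z][a-z0-9]*)*")
# Words that describe appearance rather than meaning (illustrative, not exhaustive).
FORMAT_WORDS = {"bold", "italic", "red", "blue", "large", "small", "indented"}


def make_style_name(*terms: str) -> str:
    """Build a name from big to small, e.g. ('chapter', 'box', 'head') -> 'chapter_box_head'."""
    return "_".join(term.strip().lower() for term in terms)


def style_name_problems(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use lowercase terms separated by underscores")
    if FORMAT_WORDS & set(name.split("_")):
        problems.append("describes appearance, not meaning")
    return problems


print(make_style_name("Chapter", "Box", "Head"))  # chapter_box_head
print(style_name_problems("chapter_box_head"))     # []
print(style_name_problems("red_bold_text"))        # ['describes appearance, not meaning']
```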

Editorial style guides

Now, there’s a difference between editorial style guides and design style guides.

Editorial style guides provide guidelines for the way documents are written. They’re not typically concerned with what the documents will look like when they are published.

These guides are for writers and editors. They define the voice of your document, indicate what types of content are allowed, and specify the wording of specific pieces of content.

You can use a style naming convention in your editorial style guide, depending on your workflow.

Design style guides

Now, design style guides are a little different—they’re meant for designers and compositors, and define the formatting specifications for the document. You can use a design style guide to build style templates in whatever platforms you’re using to author and publish.

Style templates

Once you have your style naming convention and your style guides, you’re ready to create a style template.

A style template is where all the information you’ve discovered about your content so far is put together, so you can practically apply it.

After doing your content analysis, you should have a list of all the different content components you create. You can use your semantic naming convention to name all those components, and then use your design style guide to apply formatting to those components.

So, once you’ve developed a content strategy, conducted a content analysis, come up with a semantic style naming convention, and created style guides and style templates, you might want to consider going further with XML, CMS, and automation.

XML-based authoring (28:22)

I’m going to start with XML-based authoring.

XML stands for eXtensible Markup Language. It is the markup language that underpins nearly all publishing workflows worldwide.

It’s often hidden behind the user-friendly interfaces we’re used to, and that includes Microsoft Word, Adobe Creative Cloud, and so on, but it’s there behind the scenes, making everything work.

It’s just a text file containing content and tags, and the tags say what the content is.

On the slide right now, ‘What is XML’ is my title. ‘What is XML’ is the content, and in an XML text file, it would be surrounded by an opening <title> tag and a closing </title> tag.
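As a small illustration, here is that slide title as XML, read with Python’s standard xml.etree.ElementTree module. The <slide> and <point> element names are made up, the kind of custom tags you might define yourself; software can pick out the title because the tag says what it is.

```python
import xml.etree.ElementTree as ET

# The slide as XML: the <title> tags say what the text is.
# <slide> and <point> are hypothetical, self-defined tag names.
xml_text = """<slide>
  <title>What is XML</title>
  <point>Content plus tags</point>
  <point>Tags say what the content is</point>
</slide>"""

root = ET.fromstring(xml_text)
print(root.find("title").text)                  # What is XML
print([p.text for p in root.findall("point")])  # ['Content plus tags', 'Tags say what the content is']
```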

You can define your own tags, or you can use a standardised set, for example, DITA.

Just a note on DITA, because we’ll see more of it later. For those who might not know it, DITA stands for Darwin Information Typing Architecture, and it’s traditionally used for technical communication.

It’s topic-based in structure, so essentially you divide your content up by topic and map the topics together to create publications.
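To show how maps and topics fit together, here is a Python sketch that uses real DITA element names (map, topicref, topic, title, body, p) but hypothetical file names and text, held in memory instead of on disk. Two maps reuse the same safety topic, which is the kind of reuse topic-based authoring enables. It is a toy resolver, not a DITA processor.

```python
import xml.etree.ElementTree as ET

# In-memory stand-ins for three topic files; names and text are hypothetical.
TOPICS = {
    "intro.dita": '<topic id="intro"><title>Introduction</title>'
                  '<body><p>What the product does.</p></body></topic>',
    "safety.dita": '<topic id="safety"><title>Safety</title>'
                   '<body><p>Disconnect power first.</p></body></topic>',
    "install.dita": '<topic id="install"><title>Installing</title>'
                    '<body><p>Unpack and mount the unit.</p></body></topic>',
}

# Each map picks topics, and their order, for one publication.
QUICK_START = """<map><title>Quick Start</title>
  <topicref href="intro.dita"/>
  <topicref href="safety.dita"/>
  <topicref href="install.dita"/>
</map>"""

SAFETY_CARD = """<map><title>Safety Card</title>
  <topicref href="safety.dita"/>
</map>"""


def outline(map_xml: str) -> list[str]:
    """Resolve each topicref in document order and list the topic titles."""
    root = ET.fromstring(map_xml)
    lines = [root.findtext("title")]
    for ref in root.iter("topicref"):
        topic = ET.fromstring(TOPICS[ref.get("href")])
        lines.append("  " + topic.findtext("title"))
    return lines


print(outline(QUICK_START))  # ['Quick Start', '  Introduction', '  Safety', '  Installing']
print(outline(SAFETY_CARD))  # ['Safety Card', '  Safety']
```

Editing safety.dita once would update both publications the next time they are built.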

There are all kinds of applications available that generate, read and manipulate XML content, so there’s plenty of things you can do with XML and with your content once it is in the XML format.

You can validate your content against a DTD or a schema to enforce consistency.
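In practice, validation is done by a validating XML parser or editor against the actual DTD or schema. To show the idea, this Python sketch hand-codes a tiny content model as a dictionary of allowed child elements. It is a simplified stand-in, not a DTD processor: it checks only which elements may contain which, not order, required elements, or attributes. The element names are assumptions.

```python
import xml.etree.ElementTree as ET

# A simplified content model in the spirit of a DTD: which child elements
# each element may contain. Real DTDs and schemas also constrain order,
# cardinality, attributes and text; this sketch checks only parent/child pairs.
ALLOWED_CHILDREN = {
    "topic": {"title", "body"},
    "title": set(),
    "body": {"p", "ul"},
    "ul": {"li"},
    "li": set(),
    "p": set(),
}


def structural_errors(xml_text: str) -> list[str]:
    """Return a message for every unknown element or disallowed parent/child pair."""
    errors = []
    for parent in ET.fromstring(xml_text).iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            errors.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return errors


good = "<topic><title>Fit the filter</title><body><p>Turn it clockwise.</p></body></topic>"
bad = "<topic><body><title>Misplaced</title></body></topic>"
print(structural_errors(good))  # []
print(structural_errors(bad))   # ['<title> is not allowed inside <body>']
```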

An XML transformation tool can convert tags to comply with a different schema, which means you can move content from one environment to another without copying and pasting.
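XML-to-XML transformations are typically written in XSLT. As a simpler illustration of the same idea, this Python sketch renames elements from a hypothetical in-house schema to DITA-style names. Real transforms usually also restructure content, which a plain rename map cannot do.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an in-house schema to DITA-style element names.
TAG_MAP = {"article": "topic", "heading": "title", "content": "body", "para": "p"}


def rename_tags(xml_text: str, tag_map: dict[str, str]) -> str:
    """Return the document with every mapped element renamed; text is untouched."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = tag_map.get(el.tag, el.tag)  # unmapped tags keep their name
    return ET.tostring(root, encoding="unicode")


source = "<article><heading>Filters</heading><content><para>Change monthly.</para></content></article>"
print(rename_tags(source, TAG_MAP))
# <topic><title>Filters</title><body><p>Change monthly.</p></body></topic>
```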

Raw XML isn’t meant for end users, so it must be transformed into usable output formats: for example, you might take your XML and export it to a PDF or a web page.
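Producing an end-user format follows the same pattern: walk the semantic elements and emit presentation markup for each. This Python sketch, assuming a simple topic with a title and paragraphs, turns the XML into an HTML fragment. PDF output would need a composition engine such as InDesign, which is beyond a sketch.

```python
import html
import xml.etree.ElementTree as ET


def topic_to_html(xml_text: str) -> str:
    """Render a simple topic as an HTML fragment: title as <h1>, paragraphs as <p>.

    Only the direct text of each <p> is used; inline markup inside paragraphs
    is ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    parts = [f"<h1>{html.escape(root.findtext('title', default=''))}</h1>"]
    for p in root.iter("p"):
        parts.append(f"<p>{html.escape(p.text or '')}</p>")
    return "\n".join(parts)


topic = "<topic><title>Fit &amp; test</title><body><p>Check that 2 &lt; 3.</p></body></topic>"
print(topic_to_html(topic))
# <h1>Fit &amp; test</h1>
# <p>Check that 2 &lt; 3.</p>
```

Escaping the text again on output matters: the XML parser decodes &amp; and &lt;, so they must be re-encoded for HTML.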

And if your content is in XML and you have an XML content management system, you can actually manage and manipulate content down to the tag level.

Content Management Systems (31:45)

Content management is another thing you can look into.

First, what type of content management?

To start with, you want to define the level at which you want to manage your content. As I said earlier, do you want to manage output files or layout files, text files, image files, text, or data?

According to the level of granularity and the smallest reusable chunk you’ve determined, you’ll choose the level you want to manage your content at.

Then you want to define what you want to be able to do with your content. As we said earlier, do you want to organise it, search it, transform it, track it, report on it, reuse it?

Because there are as many answers as organisations, you’ll find many options for content management systems.

The very simplest content management technique, which you’re probably already doing, is getting your files and folders organised.

One broad class of systems is digital asset management or DAM systems. These generally focus on file management and offer functionalities such as file check-in and checkout, versioning, and the ability to add metadata.
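To make those DAM functions concrete, here is a toy in-memory asset store in Python. The class and method names are hypothetical and not taken from any real DAM product; the sketch only shows check-out locking, check-in creating a new version, and searching by metadata.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    name: str
    versions: list[bytes] = field(default_factory=list)    # every checked-in revision
    metadata: dict[str, str] = field(default_factory=dict)  # searchable descriptive fields
    checked_out_by: Optional[str] = None                    # lock holder, if any


class AssetStore:
    """Toy in-memory DAM: check-out locks an asset, check-in adds a new version."""

    def __init__(self) -> None:
        self.assets: dict[str, Asset] = {}

    def add(self, name: str, data: bytes, **metadata: str) -> None:
        self.assets[name] = Asset(name, [data], dict(metadata))

    def check_out(self, name: str, user: str) -> bytes:
        asset = self.assets[name]
        if asset.checked_out_by not in (None, user):
            raise PermissionError(f"{name} is checked out by {asset.checked_out_by}")
        asset.checked_out_by = user
        return asset.versions[-1]  # hand back the latest version to edit

    def check_in(self, name: str, user: str, data: bytes) -> int:
        asset = self.assets[name]
        if asset.checked_out_by != user:
            raise PermissionError(f"{user} has not checked out {name}")
        asset.versions.append(data)
        asset.checked_out_by = None  # release the lock
        return len(asset.versions)   # the new version number

    def find(self, **criteria: str) -> list[str]:
        """Names of assets whose metadata matches every given key/value pair."""
        return [a.name for a in self.assets.values()
                if all(a.metadata.get(k) == v for k, v in criteria.items())]


store = AssetStore()
store.add("cover.jpg", b"v1", product="filter-kit", status="draft")
store.check_out("cover.jpg", "marie")
print(store.check_in("cover.jpg", "marie", b"v2"))  # 2
print(store.find(product="filter-kit"))             # ['cover.jpg']
```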

The last broad class is the Enterprise CMS. These are the systems most people think of when they think of a CMS. In addition to everything a DAM system does, an Enterprise CMS often has workflow modules that let you automate some aspects of workflow management.

Again, there’s a wide variety available, so after following the steps we’ve covered so far, you’ll be able to say, “This is what we have, this is what we want to do with it, and hence, this is the content management system we need.”

Automated publishing (34:25)

Once you have your content in the XML format and nicely managed in a CMS, you might want to consider automation.

There are several different kinds of automation that can be introduced to a publishing workflow, and you can do some of them or all of them, depending on your needs.

  • Content aggregation is taking chunks of content and putting them together to make particular publications.
  • Automated composition refers to actually automating the activity of laying out pages, whether for print or electronic delivery.
  • Transformation means taking a content source and transforming it in some way to be suitable for a particular output channel.

In single source publishing, you literally have one source content file that can be run through the automation process to produce all the different outputs you need.
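Here is a minimal Python sketch of single-source publishing, with content aggregation and transformation as separate steps. The chunk IDs, content, and the two output formats are hypothetical; a real workflow would add automated composition for print or PDF.

```python
import html

# One source: semantic chunks keyed by ID (content is hypothetical).
SOURCE = {
    "title": "Filter Kit",
    "intro": "Keeps water clean for six months.",
    "warning": "Disconnect power before fitting.",
}
HTML_TAGS = {"title": "h1"}  # every other chunk becomes a paragraph


def aggregate(chunk_ids: list[str]) -> list[tuple[str, str]]:
    """Content aggregation: pick the chunks one publication needs, in order."""
    return [(cid, SOURCE[cid]) for cid in chunk_ids]


def to_text(chunks: list[tuple[str, str]]) -> str:
    return "\n".join(text.upper() if cid == "title" else text for cid, text in chunks)


def to_html(chunks: list[tuple[str, str]]) -> str:
    parts = []
    for cid, text in chunks:
        tag = HTML_TAGS.get(cid, "p")
        parts.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "".join(parts)


RENDERERS = {"txt": to_text, "html": to_html}


def publish(chunk_ids: list[str]) -> dict[str, str]:
    """Transformation: render the same aggregated chunks to every output format."""
    chunks = aggregate(chunk_ids)
    return {fmt: render(chunks) for fmt, render in RENDERERS.items()}


outputs = publish(["title", "warning"])
print(outputs["html"])  # <h1>Filter Kit</h1><p>Disconnect power before fitting.</p>
```

Adding an output channel means adding one renderer; the source chunks stay untouched.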

Demo: Automated publishing with Adobe Experience Manager Guides and Typefi (35:45)

I will now show you an example of a solution that combines XML, CMS and automation.

Typefi integrates with the Adobe Experience Manager Guides solution. This solution uses DITA for its source content and is a component CMS.

Within Adobe Experience Manager, Typefi works with the Assets area. Within Assets, we have several demo examples, and I’ll first show you the marketing documents.

Marketing brochures

So, within our content folder for brochures, we can dive in and look at our source content in DITA format. Using the editor mode, we can look at the underlying source of our content in the DITA format.

All of these content components are going to be packaged up and sent through to Typefi from that DITA map.

Once the DITA map is selected, you have a Typefi Run Job button added to the interface, and in the Typefi overlay you can choose the workflow.

This is the next part of the workflow here that allows us to take the packaged content from Adobe Experience Manager Guides, send that full package of DITA content and any assets over to Typefi, put it together and compose a fully editable InDesign document.

Once the workflow is attached, we can click Run, and this will immediately trigger the job running over in Typefi. If we look at Typefi, we should see that running right here.

And once it completes, we have the final output. Using InDesign and Typefi, we’ve produced a fully editable InDesign document and a PDF, and we also have the fonts and the links: all the different assets we can package if needed.

If we go back to AEM after the workflow has run on Typefi, the InDesign file and the PDF have been returned to Experience Manager, where you can download them, view them, or edit them further.

Let’s take a look at the PDF we generated.

Here we have the final PDF where we can see beautiful graphics, nice placement of content, a few tables, the assembly instructions and our back page.

But there is a problem here. If you look back at the front page, the caption at the top is hard to read because of the background image, so we want to make some edits to that content.

Going back to AEM, I can download the actual InDesign file and open it in InDesign, and here we can see that indeed, there is a problem. So, we can apply a background effect on a case-by-case basis, clear this up, and export a brand new PDF with that small modification.

Opening both side by side, yep, the new PDF is much more pleasing and much more readable. And this is the advantage of being able to generate live, editable InDesign documents out of AEM.

Disclosure documents

Now, let’s look at another example. So, I go back into AEM, into the top-level assets, and then into my disclosure folder.

In there, I have plenty of content, I have many DITA files, and I’m looking again for my DITA map to assemble the multiple components for my publication.

Again, I can select this file, open the Typefi Run Job overlay, and attach a different workflow, because I’m using a different InDesign template for this different kind of content. So, I attach this other workflow and run it.

Again, if we switch over to Typefi, we can see this content running through.

Once the workflow is completed, we again have the different final outputs, and again, we’ll download the PDF and take a look at it.

Here we can see this is for medical insurance, where we have different languages, English and Spanish. We have our sidebar boxes, various numbered lists, and tables. All of this content is generated and rendered on the fly from that DITA source in a matter of seconds.

So, you can see how we can author and manage different types of content within AEM, and then we can publish these different types of content from AEM through Typefi into InDesign to create different outputs.

The PDF document can be used for review and the InDesign document can be edited for further detail.

So, with this solution, you get the full power of Adobe Experience Manager to author and manage your component-based content, combined with Typefi’s modular templates and workflows and Adobe InDesign’s creative layouts and designs.

Q&A session (43:21)

And that’s it for the demo, and I’m now opening the floor to questions and answers, and to Caleb and Chris.

KATHY: Great, thank you very much, Marie. That was very informative.

I forgot to mention when we first kicked this off that we’ve got a very global team here today. I’m based in Colorado, Marie’s in France, Chris is in the Philadelphia, Pennsylvania area, and Caleb is down in North Carolina. So it’s fun to see all these global teams come together.

So if you do have questions, type them into the question dialogue box, and we’ll field them. As Marie said, we thought this would take maybe an hour, but we’ve allocated an hour and a half, so keep the questions coming, and Chris and Caleb will field them.

Semantic analysis

Okay, the very first questions came at the very beginning, when you were talking about semantics. Moran wants to know: what do you consider the source document when you speak of semantics? What is the content source? And do you mean human assessment when analysing the semantics?

CALEB: Semantics is really just trying to get to sort of a neutral unbiased view of what the content means or what it is. So, in that sense, it can be, you know, whoever’s defining the semantics can have some subjectivity in what they use to define that.

But we were trying to, as Marie’s example, the colours red or bold, those are attributes, those are appearances. That’s nothing to do with the semantics.

We see this when we look at the switch in markup in HTML from using bold and italic tags to using strong or emphasis as the equivalent, where strong doesn’t necessarily mean bold, it could be a colour, it could be something else, but it’s just trying to note some sort of emphasis or something behind the text that is a more neutral meaning as opposed to an apparent or visual sort of meaning.

Especially when we’re starting to think about how content is consumed through alternate means, whether it’s through voice or whether it’s through touch, when you’re dealing with accessibility, semantics are more important.
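As a rough illustration of that separation, here is a minimal Python sketch (not something shown in the webinar) in which the source text carries only a semantic `<strong>` element and each output channel decides how to express it. The channel names and rendering rules are hypothetical; the voice rule is loosely modelled on SSML’s `<emphasis>` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-channel rules: the source only says "strong";
# each channel decides how that emphasis is expressed.
RENDERERS = {
    "print": lambda text: f"**{text}**",                   # e.g. bold on the page
    "voice": lambda text: f"<emphasis>{text}</emphasis>",  # e.g. stressed speech
    "plain": lambda text: text.upper(),                    # e.g. plain-text fallback
}

def render(paragraph: ET.Element, channel: str) -> str:
    """Render a <p>, applying the channel's rule to <strong> children only."""
    rule = RENDERERS[channel]
    parts = [paragraph.text or ""]
    for child in paragraph:
        inner = "".join(child.itertext())
        parts.append(rule(inner) if child.tag == "strong" else inner)
        parts.append(child.tail or "")
    return "".join(parts)

source = ET.fromstring("<p>Do <strong>not</strong> exceed the limit.</p>")
for channel in RENDERERS:
    print(channel, "->", render(source, channel))
```

The source markup never changes; only the per-channel table does, which is the point of marking up meaning rather than appearance.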

Validating source XML against a DTD or schema

KATHY: Okay. Can you talk a little bit more about how we can verify XML against the DTDs and schemas? I would think that’s out of the scope of Typefi. Can you talk a little bit about that?

CALEB: Well, when you have XML that’s written to a DTD or to a schema, you can use that DTD or schema itself to validate the content and ensure that it’s structurally valid.

There’s a whole other set of ways of looking at content from a business rules perspective, where you’d use a tool called Schematron. And this is where you can create rules above and beyond what’s defined within that schema.

So the schema might say that these are the allowed tags, but your Schematron might say that you have to have an H1 tag before an H2 tag, or that you must have a paragraph tag between an H1 and an H2, and so forth.

That’s the nice thing about schemas and DTDs: the document type definition or schema that defines the structure is the very thing you use to do the validation.
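Schematron rules are written as XPath assertions in Schematron’s own XML vocabulary. Purely to illustrate the kind of business rule described above, here is a minimal Python sketch that checks the two example rules over the children of a hypothetical `<body>` element. It is not Schematron, and the element names are assumptions.

```python
import xml.etree.ElementTree as ET

def check_heading_rules(body: ET.Element) -> list[str]:
    """Report violations of two example business rules among <body>'s children.

    Rule 1: an <h2> must not appear before the first <h1>.
    Rule 2: an <h2> must not directly follow an <h1>; a <p> must come between.
    """
    errors = []
    seen_h1 = False
    previous_tag = None
    for index, child in enumerate(body):
        if child.tag == "h1":
            seen_h1 = True
        elif child.tag == "h2":
            if not seen_h1:
                errors.append(f"child {index}: <h2> appears before any <h1>")
            elif previous_tag == "h1":
                errors.append(f"child {index}: <h2> directly follows <h1>")
        previous_tag = child.tag
    return errors

# This could be valid against a schema that allows h1, h2 and p in any
# order, yet it breaks both business rules.
doc = ET.fromstring("<body><h2>Scope</h2><h1>Policy</h1><h2>Terms</h2></body>")
for problem in check_heading_rules(doc):
    print(problem)
```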

KATHY: Right, but that’s something that you add on to Adobe Experience Manager, that’s not really part of…

CALEB: Well, when you’re working with Experience Manager with DITA, it’s automatically validating your content as you author. If you load your own DITA content into Adobe Experience Manager Guides, it will flag any content that does not conform to the schema, if you have invalid markup in some fashion.

Most XML tools have some sort of live or real time validation, or the ability to tap a button and have it go out and check against that schema or DTD on the fly so that you can ensure that your content is going to work.

It’s really critical that your content be structurally sound and valid because if it’s not, then as you push that content through systems like Adobe Experience Manager Guides or Typefi or FO transformations, things will not work because the content doesn’t validate, it’s not structurally sound.

KATHY: Right. I guess when I think of the Schematron side of things, I’m thinking more of complying with style guides.

CALEB: Absolutely.

KATHY: Which aren’t critical to the actual publishing aspect of it. Okay.

Using Adobe InDesign for automated layout

So, let’s see. “Access to InDesign documents is possibly only for someone able to work with InDesign.” Well, yes. Right. I mean, that’s the whole goal.

CALEB: Yeah. I think the interesting thing here, though, is that when you integrate solutions like InDesign Server into XML authoring workflows, you have the XML as the source, and then you push that against an InDesign template to produce the PDF.

So, the author, they don’t necessarily care about the InDesign side of things, they just want to see the output. So they start with XML, it runs through InDesign, it produces the PDF output and they get that PDF back.

The nice thing about InDesign as a solution, or as part of the solution, is that it is far more accessible than solutions like FrameMaker or FO. There’s a much wider audience of people who know how to use InDesign and can design templates that work within these systems than there are people who understand FrameMaker or can author FO transformations.

That reduces the cost, reduces the time, and so forth.

Enabling collaboration across business units

KATHY: Right, and I also see this whole collaboration across departments. When you think of your tech pubs, maybe they are less interested in InDesign, but your marketing teams may be more interested in InDesign, and here’s a way they can share content from a single source.

CHRIS: Yeah. You stole my thought exactly in terms of having those two groups be able to work collaboratively together, where the tech docs people can concentrate on getting the good information and have it well structured and have it all make sense on all the different platforms, but the marketing people can team up with them really nicely to be able to provide that consistent branding that customers want to see throughout their journey.

And certainly, new customers are always looking at those tech docs before they become a customer. If they see a consistency in both the technical documentation and the marketing content that does create a higher level of comfort for them, and it creates that one voice.

So, it all makes sense throughout their entire journey, and that’s what you want to focus on: the customer’s journey.

Industries that use Typefi

KATHY: Absolutely, and that is a perfect lead-in to these related questions. So, who are your typical authors that are applying Typefi? And then, you know, what industries have you found mostly adopted the solution? Or markets?

CHRIS: The authors are the subject matter experts on whatever it is. So in the case of The Institutes, it’s the subject matter experts on insurance, the people that are already doing that. Same thing with customers like Billabong or Rhino-Rack: they’re the subject matter experts who are authoring, and they’re going to be using those structured authoring solutions.

It could be something like Adobe Experience Manager, or it could be tools like Stilo and Oxygen, where they work in the formats they’re used to.

So that’s really who is authoring, and the industries are extremely varied. We have extremely varied customers, anybody from insurance companies to manufacturers, to educational publishers, journal publishers—that’s who uses Typefi as one part of their publishing solutions.

Integration with a range of XML authoring tools

KATHY: Right. So I’m going to interject my own question here. You mentioned tools like Oxygen, so you can also do a transform directly from Oxygen? Like you showed in Adobe Experience Manager?

CALEB: What we have with Experience Manager is we have a direct integration, so you can actually publish directly from within Experience Manager through Typefi.

With Oxygen or any other native XML application, we can take that XML, and you can use the Typefi web interface and say, “I just want to run this XML against Typefi.”

There has to be a transformation to go from your incoming source XML into the Content XML schema that Typefi uses, which is optimised for composing with InDesign.

For example, in some XML you have a sort of output class, and we might map that to a paragraph style when it comes to InDesign. So there is some remapping that has to go on.
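The remapping idea can be sketched in a few lines. This is a minimal Python illustration, not Typefi’s transform and not its Content XML schema: it assumes DITA-style `<p>` elements carrying an `outputclass` attribute and a hypothetical table of InDesign paragraph style names. XML transforms like this are commonly written in XSLT; Python is used here only to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: outputclass value -> InDesign paragraph style name.
# A real project would use whatever style names its InDesign template defines.
STYLE_MAP = {
    "intro": "Body Intro",
    "disclaimer": "Disclaimer Text",
}
DEFAULT_STYLE = "Body Text"  # used when outputclass is missing or unmapped

def paragraph_styles(topic: ET.Element) -> list[tuple[str, str]]:
    """Pair the text of each <p> in the topic with the style it should get."""
    pairs = []
    for p in topic.iter("p"):
        style = STYLE_MAP.get(p.get("outputclass", ""), DEFAULT_STYLE)
        pairs.append((style, "".join(p.itertext()).strip()))
    return pairs

topic = ET.fromstring(
    "<body>"
    '<p outputclass="intro">This plan covers outpatient care.</p>'
    "<p>Coverage begins on the first of the month.</p>"
    '<p outputclass="disclaimer">Terms and conditions apply.</p>'
    "</body>"
)
for style, text in paragraph_styles(topic):
    print(f"{style}: {text}")
```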

When you start talking about scholarly publishing, one of our customers is the New England Journal of Medicine, and they use the JATS standard. It’s extremely detailed for how you describe the authors, and every component of an author name is a separate XML tag.

All that gets essentially flattened into a string for composing into print, because you’re not necessarily worried about first name, last name, affiliations, and all the other different pieces that make up who they are as discrete style elements.

So, we can flatten that and push it through into InDesign much more efficiently and quickly as a string.

There’s always going to be a transform in there, but yeah, we work with dozens of different XML standards as well as proprietary formats.
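To make the flattening step concrete, here is a minimal Python sketch that collapses JATS author markup (`<contrib contrib-type="author">` containing `<name>` with `<surname>` and `<given-names>`) into a single byline string. It is an illustration under simplifying assumptions, not how Typefi or any particular publisher does it: it ignores affiliations, degrees, suffixes, and the `name-style` attribute that real JATS content may carry.

```python
import xml.etree.ElementTree as ET

def flatten_authors(contrib_group: ET.Element) -> str:
    """Collapse JATS author <contrib> elements into a single byline string."""
    names = []
    for contrib in contrib_group.findall("contrib[@contrib-type='author']"):
        given = contrib.findtext("name/given-names", default="").strip()
        surname = contrib.findtext("name/surname", default="").strip()
        names.append(" ".join(part for part in (given, surname) if part))
    if len(names) <= 2:
        return " and ".join(names)
    # Three or more authors: serial comma before the final name.
    return ", ".join(names[:-1]) + ", and " + names[-1]

group = ET.fromstring(
    "<contrib-group>"
    '<contrib contrib-type="author"><name><surname>Smith</surname>'
    "<given-names>Jane A.</given-names></name></contrib>"
    '<contrib contrib-type="author"><name><surname>Lee</surname>'
    "<given-names>Min</given-names></name></contrib>"
    '<contrib contrib-type="author"><name><surname>Garcia</surname>'
    "<given-names>Luis</given-names></name></contrib>"
    "</contrib-group>"
)
print(flatten_authors(group))
```

The discrete name parts stay in the XML for other uses; only the print path receives the flattened string.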

Content reuse

KATHY: What about content reuse, how do you plan for that?

CALEB: I think in many cases, people haven’t planned for content reuse.

You start off, you think, well, this is the way it’s going to be used, and then down the road, someone comes back and says, “Oh, we want to use this somewhere else.”

This is something we have continually seen over the years. Especially if you’re not XML-first, thinking about the way your content is structured and about its semantics, you can get yourself in trouble because you haven’t marked up the content in a way that makes it easy to reuse.

The worst case scenario that I can recall from my time with Typefi was with an educational publisher where they were converting a back catalogue of content that was originally designed for print, and then they wanted to make sure they had XML because they knew they would reuse it sometime.

But they went through all of their highlight boxes or shaded boxes within this textbook, and they were all just tagged as ‘special feature’, because they were all special features, but there was no distinction between Special Feature A versus Special Feature B.

So, when it came back out to repurpose and reuse that content, all of these special boxes, which had different themes or aspects that they were trying to highlight, all ended up looking the same. There was no differentiation there.

They realised their XML wasn’t detailed enough, and so they had to go back in and fix all of that XML.

We also see this when you start thinking about hot topics or issues that might be sensitive to certain demographics, where you want to rephrase something or change the way it is presented depending on the audience. You need to make sure you have enough markup around that.

Even reuse across US English and UK English, with different spellings and so forth, comes into play, and it can be a bit of a challenge to anticipate all the avenues of reuse.

Generally speaking, we try to guide customers into thinking about the immediate future and the way they’re using their content.

And just trying to make sure that you have as semantic a set of markup underneath it as possible, and then sort out the intricacies of that reuse when you actually get to that point of reusing it.
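One common way to prepare for variant reuse is to flag the differing passages with a profiling attribute and filter at publish time. DITA provides profiling attributes such as `audience`, `product`, and `otherprops`, and real DITA publishing applies filters through a DITAVAL file. The following minimal Python sketch only imitates that idea; it assumes, hypothetically, that US and UK variants are flagged with `otherprops="en-US"` and `otherprops="en-GB"`.

```python
import xml.etree.ElementTree as ET

def filter_variant(root: ET.Element, keep: str, attr: str = "otherprops") -> ET.Element:
    """Drop elements profiled for other variants; unprofiled content is shared.

    An element whose `attr` value (a space-separated list) does not include
    `keep` is removed with its children. Removing an element also drops its
    tail text, so this sketch assumes block-level elements such as <p>.
    """
    # Snapshot the tree first so removals don't disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get(attr)
            if values is not None and keep not in values.split():
                parent.remove(child)
    return root

source = ET.fromstring(
    "<body>"
    "<p>Choose a finish.</p>"
    '<p otherprops="en-US">Available colors: black and silver.</p>'
    '<p otherprops="en-GB">Available colours: black and silver.</p>'
    "</body>"
)
uk = filter_variant(source, "en-GB")
print([p.text for p in uk.iter("p")])
```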

Integration with a range of Content Management Systems

KATHY: Right. I should have asked this question when we were talking about the authoring environments. But they’re also asking about other CCMSs.

CALEB: Yeah, absolutely. If we’re just talking about Typefi, we have a web services interface, a RESTful API, so we can tie into and integrate with any other CMS. We’ve done work with Documentum, SharePoint and other systems.

It just depends; sometimes you have to have a middle layer between the CMS or CCMS and Typefi to handle that management.

That’s like what we were showing with the AEM integration. We have an add-on to Adobe Experience Manager that uses our API to talk to Typefi and return a list of the projects or workflows that are available. So you can target one of those and send your content directly to it.

That was some custom code to glue everything together, but yes, you can absolutely use any of a number of different CCMSs with Typefi.

Developing transforms

KATHY: Okay, and what about the transforms that you said are going into InDesign? Are there out-of-the-box styles that you’ve created? Do you do custom transforms? Or do your customers create their own transforms? Talk a little bit about that procedure.

CALEB: Sure, sure. I’ve been with Typefi now for 15 years, and over those 15 years, we’ve developed a number of core libraries.

We have libraries around DITA content. We have a library around OpenOffice XML and libraries around JATS and BITS and ISO STS XML. All of those are available as sort of stock starting points.

And because our experience is that no one uses an XML standard the same way (everyone has customised it, tweaked it, or has their own rules around the way they use it), there may be an overlay transform to handle those little differences.
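The layering idea (a stock library plus a customer-specific overlay that only handles the differences) can be illustrated with Python’s `collections.ChainMap`. This is a minimal sketch of the concept only, assuming a transform can be reduced to an element-to-style lookup table; real transforms do much more, and every name below is hypothetical.

```python
from collections import ChainMap

# Hypothetical stock library for a standard vocabulary: element -> style.
BASE_RULES = {
    "title": "Heading 1",
    "p": "Body Text",
    "note": "Note",
}

# Hypothetical customer overlay: only the differences from the stock library.
CUSTOMER_OVERLAY = {
    "note": "Callout Box",      # this customer styles notes differently
    "warranty": "Legal Small",  # a custom element the stock library doesn't cover
}

# ChainMap searches the overlay first and falls back to the stock rules.
rules = ChainMap(CUSTOMER_OVERLAY, BASE_RULES)

def style_for(element_name: str) -> str:
    """Return the style for an element, or 'Unmapped' if no layer defines one."""
    return rules.get(element_name, "Unmapped")

for name in ("title", "note", "warranty", "fig"):
    print(name, "->", style_for(name))
```

In XSLT, a comparable layering is commonly achieved with `xsl:import`, where templates in the importing stylesheet take precedence over the imported ones.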

We have a number of customers where they may start with us building the transform for them and then at a certain point, they get comfortable with doing the transform themselves, and so they take ownership and management of their own transform.

We have other customers that have built the transform directly themselves. The Institutes, which were the subject of Marie’s talk, built their DITA transform themselves.

They worked with a consultancy in the Philly area called Jacquette, and Jacquette are the ones that built that transform and worked with Typefi, and we worked sort of hand in hand over a period of weeks to create and refine and develop that transform.

How long does it take to set up a Typefi workflow?

KATHY: Okay. So what about the setup time? Assuming that you’ve already got your raw sources in XML, how long does it take to set up these workflows? Talk a little bit about that.

CALEB: The cop-out answer is, it depends. But there’s a lot of truth to that.

It depends on the complexity of the output and what you’re attempting to achieve on the InDesign side of things, from a layout perspective.

It depends on the depth or the breadth of the source XML and how much variance there might be.

When we’re dealing with DITA like in the disclaimer example that Marie showed, you’ve got basic lists and some tables and headings and hyperlinks and so forth. It’s a subset of what’s possible within DITA.

So, if that’s our starting point, and that’s the kind of content you want to publish, then you’re really talking days, total, to get a transform built, to build the InDesign template, and to go live.

If you’re dealing with something like designing a product marketing brochure or a travel guide or something that’s much more complex, both from a markup perspective as well as from a layout perspective, you might be looking at weeks.

It sort of depends on where you fall within that gamut.

I would typically say that if we don’t have a transform library, say it’s a brand-new XML format we haven’t worked with before, we would generally look at a month to wrap our heads around that transform and build the implementation on it.

Once we get the transform built, though, the InDesign template and the training, what you would see as a customer, take days. Even with our most complex customers, we have done start-to-finish, end-to-end training on how to use and work with the system within a week.

The heavy lifting is done offsite beforehand.

KATHY: Great. And I assume when you talk about those workflows, that covers B2C content as well as technical content, I mean, I think it’s…

CALEB: Yeah.

Transforming proprietary DTDs

KATHY: All right. What about transforming proprietary DTDs? Are they handled differently?

CALEB: There’s no real difference there. Obviously, we would need access to the DTD if we’re building a transform.

But from a systems perspective, we have fully documented what the Content XML schema is on Typefi’s side of things. Our API is published.

So if you want to kick the tyres on Typefi and build your own transform from your own DTD or source content, it’s possible to arrange that kind of engagement, where you take it on yourselves.

But yeah, we would approach a proprietary DTD the same way we would approach any other XML implementation.

KATHY: Good. And then I think we’re good at this point. Nope. There’s a follow-up question, specifically about the SCHEMA ST4 DTD.

CALEB: No, I don’t think we’ve actually done any projects with the ST4 DTD.

Typefi at ConVEx (1:07:20)

KATHY: Okay. All right. Marie, did you have anything more there? Yep. You did. I thought so. So she’s going to do a little wrap-up here, and if you have other questions, feel free to type them in, but better yet, come to ConVEx, where the full Typefi team will be. Go ahead, Marie.

MARIE: Yeah. Well, as you can see here, Chandi, Chris and Caleb will be taking part in, from what I can read, a live panel or a test kitchen. I really want to know what a test kitchen is. And also candid conversations. So join them for that.

KATHY: Yeah. Our test kitchen concept started a couple of years ago at our conferences where it’s the idea that it’s supposed to be a little bit more hands-on, a little bit of a showcase specifically for our exhibitors that provide tools. So that’s the idea of what the test kitchens are.

Then the candid conversation is all about watching the presentations ahead of time at your leisure, and then just having one-on-one conversations about the topic, about the presentation.

The live panel is just what you think: we have several different folks on a panel, typically four, and they’re all experienced experts in whatever they’re talking about.

Like Chandi with publishing. We have our own Brianna Stevens who teaches workshops on how to do transforms mostly to PDF and HTML, and so she’ll be on that panel as well. And I believe the Miramo folks are going to be on there. I can’t remember who the fourth person is on the publishing panel. That’s what the panels are about.

Q&A continued (1:09:18)

All right, another question came in. They always do: just when we think we’re wrapping up, another question comes in.

Typefi and PDF accessibility

Are your PDFs accessible, and do they meet PDF/UA standards?

CALEB: Yeah, so PDF/UA stands for Universal Accessibility. It’s a relatively new standard.

The basic answer to this is from an accessibility perspective, yes. That’s one of the beauties of the way that Typefi builds documents in that automated fashion.

When you build a document in InDesign, the creation order of the content affects the tag order of the PDF that InDesign exports. And so, if you are not very careful in the way you construct an InDesign file upfront, your tags can end up jumbled, which affects the accessibility.

That’s the beauty of the automation, is that when we automate to InDesign, we are creating things in a very specific order, which inherently makes it more accessible.

Now, there are some deficiencies in Adobe’s implementation of their tagged PDF export, and you’re not going to get 100% Section 508 compliant PDFs out of InDesign, but you will get very close to 508 compliance, and much closer with Typefi working with you on that.

In that sense, you know, we’re dependent on the host application being Adobe InDesign, and so we can maximise and optimise the output to get as close as possible.

So yes, you’ll almost get there, and it requires very minor edits within Acrobat to get it to that final compliance level.

Conclusion (1:11:32)

KATHY: All right. Well, I think we’re gonna wrap it up. Caleb, Chris and Marie, thanks so much for enlightening us on the whole automation process here. We look forward to chatting with you virtually at ConVEx or doing another webinar in the future.

CHRIS: Thanks everybody for attending.

CALEB: Thank you so much.

MARIE: Thank you for your time.

Your Typefi presenters

Marie Gollentz

Project Manager Consultant

Based in Paris, Marie was previously a Senior Solutions Consultant and is now Project Manager at Typefi. Prior to joining Typefi, she held a number of positions in the publishing industry in London, including at the publisher of Research Fortnight and the London School of Business and Finance.

Marie holds a Master’s degree in European Political Sciences from the Autonomous University of Barcelona and a Bachelor’s degree in Political Sciences from Sciences Po Strasbourg. She is trilingual in English, French and Spanish.

Caleb Clauset

Vice President Product

Caleb drives the vision and strategy for Typefi’s products, and cultivates strategic partnerships with developers to extend Typefi’s core capabilities.

He is an award-winning designer and Adobe Certified Expert in InDesign with over a decade’s experience designing, developing and implementing publishing technology. He holds a Master of Graphic Design from North Carolina State University and a Bachelor of Science in Architecture from the University of Michigan, Ann Arbor.

Chris Hausler

Business Development Director

Chris has over 20 years’ experience in improving organisational processes through technology, including a decade working with publishers to define and deliver solutions that dramatically improve the way they publish content. He is always excited about the opportunities offered by new technologies.

Chris holds a Bachelor of Arts (Economics) from the University of Richmond, and a Master of Business Administration from Widener University.