
Tuesday 14 February 2012

XML Editor Review

Tim Berners-Lee's dictum, "If it's not on the Web, it doesn't exist," may now be supplemented with, "And if your business is not creating XML content, it may soon cease to exist."
XML is changing the way content is created: for print publishing, for the Web, and for other new distribution channels like ubiquitous PDAs and WAP-based 3G cell phones, all of them linked by ever-faster Internet connections. In recent years, XML has also become a leading method for moving content. RSS feeds syndicate stories from blogger sites and deliver audio and video from new podcasting sites, and Web Services communicate via XML-RPC, SOAP, and other behind-the-scenes protocols that have increasingly displaced CORBA, DCOM, and even EDI as data-exchange standards.
XML is returning to its roots as a markup language for creating and structuring content. From high-end publishing tools like Adobe FrameMaker, InDesign, and QuarkXPress to Web-content tools like Adobe GoLive and Macromedia Dreamweaver, and on to Microsoft Office 12 and OpenOffice (not to mention Google's stunning entry into document creation with its acquisition of Writely), XML export and import, and even native XML creation and storage, are becoming commonplace. Yet creating and delivering XML content is, by our estimate, at least three times more difficult than HTML, which in turn is much harder than writing a Word document. So the success of the new content tools depends on making structured authoring, or guided writing, seem much easier than HTML: "As easy as Word" is a popular marketing slogan.
In the high-end print publishing industry, XML content is advancing on several fronts. Microsoft Windows Vista will offer its XML Paper Specification (XPS) to the print industry as a rival to Adobe's Portable Document Format (PDF). Best-of-class desktop publisher QuarkXPress has lost significant market share to Adobe InDesign and hopes to recover by supporting the XML-based Job Definition Format (JDF). IBM has moved decades of SGML-based documentation to DITA (Darwin Information Typing Architecture), and it donated the DITA XML technology to OASIS, which has made it a standard. Adobe has announced that it is moving its technical documentation to DITA, starting with its Creative Suite docs. Finally, as markets become global, companies localizing their content will be doing it with XML standards like Translation Memory eXchange (TMX) and XML Localization Interchange File Format (XLIFF).
To get a handle on the tools that make this all happen, we reviewed 12 tools commonly known as XML Editors. We found some terminology confusion in the market, because some of these "Editors" are clearly aimed at the authors, writers, and editors who create the core content, while others are for the application developers who build the structure and styles within which those authors do their guided writing. We will call the two XML Author Editors and XML Developer Editors (the latter usually the heart of an IDE, or Integrated Development Environment). And as you'll see, we found that one product suite does both.
Our XML Editors were selected from nearly 80 listed on the CMS Review site, and we are confident they are the best XML content-creation tools on the market. We looked far and wide to find American, Austrian, Canadian, Dutch, Irish, Japanese, Romanian, and Russian software development teams. Collectively, these tools are in use by millions of content creators around the world working in XML.

The Three Layers of Content Management
Separating presentation (layout, style, format) from content is a CMS cliché. With HTML (or better, XHTML), it is done by moving the formatting into cascading style sheets (CSS). That takes more work than writing straight HTML with inline formatting, but the result is content that is much easier to manage and reuse.

The three layers of content: the core text (XML), the structure (XSD), and the style layer (XSL).
XML extends this separation further, so that three basic things result: 1) the XML document containing the pure content itself, 2) a DTD or XML Schema Definition (XSD) that specifies the allowable structural elements and their attributes and is used to validate the XML document, and 3) XSL style sheets (XSLT transformations, possibly producing XSL-FO for print) that convert the core content document for multiple output channels with different presentations.
The three layers correspond to different types of markup. In the content core, the markup is semantic, with meaningful tags like <name> and <price>, the kind of properties RDF uses to drive the Semantic Web. In the structural layer, markup is syntactic, with tags like <div> and <span>. In the outer layer, markup is stylistic or presentational, with tags like <font>, <b>, and <i>.
The layers also correspond to different professional skill sets - designers on the outside, architects building the middle layer, and content authors in the core. XSD, XML, and XSL are three components like the red, green, and blue signals of component television. Separating CSS from XHTML is, like S-Video, good but not HDTV.
And finally, to make the purpose of XML content-creation tools easier to grasp, here's a railroad metaphor: the Developer Editor tools build the structure rail on one side (the content model) and the style rail on the other (the presentation model), and together the two rails keep the author/editor on track with guided writing.
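To make the separation concrete, here is a small sketch in Python using only the standard library's xml.etree.ElementTree. It is an illustration under our own assumptions, not how any reviewed tool works: the catalog, product, name, and price elements are hypothetical, the structural (schema) layer is left out, and because Python's standard library has no XSLT processor, two ordinary functions stand in for the XSL style sheets.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content layer: purely semantic markup (hypothetical product catalog).
CONTENT = """<catalog>
  <product><name>Widget</name><price currency="USD">9.99</price></product>
  <product><name>Gadget</name><price currency="USD">24.50</price></product>
</catalog>"""

def to_html(root):
    """Style layer for the Web channel (a Python stand-in for an XSLT sheet)."""
    items = []
    for product in root.iter("product"):
        price = product.find("price")
        items.append(
            f"<li><b>{escape(product.findtext('name'))}</b>: "
            f"{escape(price.text)} {escape(price.get('currency'))}</li>"
        )
    return "<ul>" + "".join(items) + "</ul>"

def to_text(root):
    """Style layer for a plain-text channel such as a phone or print feed."""
    return "\n".join(
        f"{p.findtext('name')} ... {p.findtext('price')}" for p in root.iter("product")
    )

root = ET.fromstring(CONTENT)  # the content is parsed once...
print(to_html(root))           # ...and presented differently per channel
print(to_text(root))
```

Adding a third channel means writing a third function (in practice, a third XSLT sheet), while the content document stays untouched.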
Author or Developer? Both
Since almost every major content-creation tool (e.g., Word) can now export documents in XML format, why do you need a dedicated XML Author Editor and XML Developer Editor? Because creating well-formed XML documents is not enough. Your organization needs XML documents that are structured properly. That means documents validated against your content model: the schemas and DTDs that describe allowable content elements and attributes. An XML Author tool creates valid content from the start, instead of starting from Word files, whose XML output must be painfully converted into XML that is valid against your schemas. Many organizations create Word templates for authors, then parse the Word XML to match their schemas. XML Author Editors eliminate this time-wasting step.
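The difference between well-formed and valid can be shown in a few lines of Python. This is a sketch under stated assumptions: ElementTree only checks well-formedness, so the CONTENT_MODEL dictionary below is a hypothetical, much-simplified stand-in for a real DTD or XSD (it checks which children are allowed, not their order or number). A production pipeline would use a schema-aware validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified content model: element -> allowed child elements.
CONTENT_MODEL = {
    "topic": {"title", "shortdesc", "body"},
    "title": set(),
    "shortdesc": set(),
    "body": {"p", "ul"},
    "ul": {"li"},
    "li": set(),
    "p": set(),
}

def check_model(root, model):
    """Return a list of content-model violations (an empty list means 'valid')."""
    problems = []
    for parent in root.iter():
        allowed = model.get(parent.tag)
        if allowed is None:
            problems.append(f"<{parent.tag}> is not declared")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return problems

# Well-formed (every tag closed and properly nested) but not valid:
# a <shortdesc> has been placed inside <body>.
doc = "<topic><title>Install</title><body><shortdesc>Oops</shortdesc><p>Run setup.</p></body></topic>"
root = ET.fromstring(doc)  # parses without error: the document is well-formed
print(check_model(root, CONTENT_MODEL))
```

The document parses without complaint, yet the content-model check reports the misplaced element; that second kind of error is the one an XML Author Editor is designed to catch while you type.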
Second, your XML will need to be re-purposed to feed ever more types of output channels as you exploit multiple opportunities to publish your content. The XML Developer Editor will let your application developers design and build the several XSLT transformations that will be needed behind the scenes to publish your content. So now that we've framed the whys and wherefores of XML tools, it's time to take some out for a spin.
Test Method
We downloaded trial versions of all 12 tools and installed them on a new dedicated test platform at CMS Labs. We read the online documentation (much of it PDFs, which we downloaded and printed out). Overall, the available online training was good, though video training from Altova, Stylus Studio, and SyncRO Soft really stood out. When companies offered us personal online training (with screen-sharing sessions), we accepted.
We joined online company forums where they were offered and read some posts to get a sense of the user communities. We also joined independent mailing lists of user groups. When we encountered problems, we first Googled the error message or situation, then sent questions to vendor support. We found that the company is usually a user's last resort for help.
Once all the tools were installed, we created a test set of XML documents, XML schemas, and XSLT style sheets. We took them from CM Pros' initiative on Design Patterns for reporting Best Practices, and we simplified one of them to make a DITA XML test document.
We recommend you follow a similar methodology, working with test documents drawn from your own content. If you have no structured content, you have three options:
  1. You can hire a consultant to structure your content (or learn to do it yourself). This can be a long and difficult process, but in the end a rewarding one.
  2. All the XML Developer Editor tools can extract structure from a collection of similar instance documents. They infer a content model (the allowable elements; you may need to fine-tune their order and frequency), then give you a schema document you can use to create more documents structured consistently with your existing content.
  3. You can work within proposed industry-standard content models like the new DITA structures for topics, concepts, tasks, and references.
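Option 2 can be sketched roughly as follows. The infer_dtd function is a hypothetical, deliberately naive illustration rather than the algorithm any reviewed product uses: it records which child elements, and whether any text, appear under each element across the sample documents, then emits a loose DTD. As the list notes, order and frequency are not inferred; every element-content model comes out as a repeatable choice for a person to tighten.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def infer_dtd(documents):
    """Infer a loose DTD from sample instance documents (order/frequency ignored)."""
    children = defaultdict(set)   # tag -> child tags seen under it
    has_text = defaultdict(bool)  # tag -> non-whitespace text seen directly in it
    order = []                    # tags in first-seen order, for stable output
    for doc in documents:
        for elem in ET.fromstring(doc).iter():
            if elem.tag not in order:
                order.append(elem.tag)
            if elem.text and elem.text.strip():
                has_text[elem.tag] = True
            for child in elem:
                children[elem.tag].add(child.tag)
                # Text after a child (its tail) belongs to the parent: mixed content.
                if child.tail and child.tail.strip():
                    has_text[elem.tag] = True
    lines = []
    for tag in order:
        kids = sorted(children[tag])
        if kids and has_text[tag]:
            model = "(#PCDATA|" + "|".join(kids) + ")*"  # mixed content
        elif kids:
            model = "(" + "|".join(kids) + ")*"          # element content
        elif has_text[tag]:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {tag} {model}>")
    return "\n".join(lines)

samples = [
    "<task><title>Install</title><steps><step>Unpack</step><step>Run</step></steps></task>",
    "<task><title>Upgrade</title><shortdesc>Keep <b>data</b>.</shortdesc>"
    "<steps><step>Back up</step></steps></task>",
]
print(infer_dtd(samples))
```

For these two samples the output includes <!ELEMENT task (shortdesc|steps|title)*>, which a developer would then tighten to a sequence such as (title, shortdesc?, steps).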

Feature Evaluation
Given the space limitations of this print article, we can cover only a small number of the key features that distinguish these tools. We plan to create an online report that will provide more details, including many screenshots of key features, a glossary of terms, links to documentation on all these XML Editors, and mention of dozens of other XML Editors that did not make the cut for EContent but might fit your niche.
For a sophisticated developer, any good text editor can be used to create XML documents, XML schemas, and XSLT style sheets. Among the most popular are HomeSite and UltraEdit (Windows), BBEdit (Mac), Emacs (Unix), and jEdit (cross-platform Java), our favorite tool. These typically offer, often through plug-ins, validation, tag completion, elements in context, and so on. And jEdit recognized the ditabase.dtd properly from its DOCTYPE declaration, whereas, surprisingly, the XML Developer Editors we reviewed could not.
Here are the key features we focused on in our examination of these tools:
IDE. An XML Developer Editor at the center of an Integrated Development Environment is a must-have tool for any organization moving its content to XML. Since all content is now moving to XML, at least one person in your organization needs an IDE. We judge the IDE tools by how many different aspects of XML they support. They should not only work with XML schemas and DTDs but also provide full tools for creating and debugging those schemas. Perhaps even more important, they should let you design, debug, and deploy the many different style sheet transformations (XSLT) that repurpose your content for multichannel publishing. The best of them are part of a coordinated suite of tools covering the many other aspects of XML development, an alphabet soup of acronyms.
Altova’s source code view can be switched to a Grid/Structure view, a WYSIWYG (Authentic) view, or a browser view.
WYSIWYG. A simple XML Author tool should enforce the structural and styling constraints of XML schemas and XSL transformations without revealing the complex machinery behind the scenes. It should provide a comfortable and familiar interface for content creators used to working in standard what-you-see-is-what-you-get word processors. If the ratio of content creators to developers and designers is high, you will need many more XML Author Editors than XML Developer Editors.
Validation. Guided writing, or structured authoring, works best when the writer is constantly helped to do the right thing. The ideal is continuous, real-time validation against the content-model rules in the schema. Stricter still is an editor that makes errors impossible. So validation comes in a range of settings: it can be turned off for experts, run on request, report only warnings, or actually correct or prevent errors. Some tools allow start and end tags to be inserted only as a matched pair.
Elements in context. When adding structural elements, the best tools display a context-dependent list of the "allowed" or "available" elements that can be added at the current insertion point in the document. This can appear in a separate window pane or a floating palette, in a drop-down menu when you type an open tag, or when you right-click at the insertion point. Some Editors let you reduce this assistance, or turn it off completely for power users.
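A minimal sketch of an "elements in context" list, assuming a hypothetical, simplified content model (the ALLOWED and AT_MOST_ONCE tables below are illustrative, not taken from the DITA DTDs). It ignores sequence order, which a real editor would also consider when deciding what is allowed at the exact insertion point.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified content model: parent -> children it may contain.
ALLOWED = {
    "topic": ["title", "shortdesc", "body"],
    "body": ["p", "ul", "note"],
    "ul": ["li"],
}
# Children that may appear at most once inside their parent.
AT_MOST_ONCE = {"title", "shortdesc", "body"}

def available_elements(parent):
    """List elements that may still be added as children of `parent`.

    Sequence order is ignored; a real editor would also look at where the
    insertion point sits among the existing siblings.
    """
    present = {child.tag for child in parent}
    return [
        tag for tag in ALLOWED.get(parent.tag, [])
        if not (tag in AT_MOST_ONCE and tag in present)
    ]

topic = ET.fromstring("<topic><title>Install</title><body><p>Run setup.</p></body></topic>")
print(available_elements(topic))               # ['shortdesc']
print(available_elements(topic.find("body")))  # ['p', 'ul', 'note']
```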
Tags-on view. Though they disrupt the pure WYSIWYG look, optional visual representations of the start and end tags for structural elements are very helpful.
Structure, or Tree, view. A hierarchical view of the document, which expands and contracts elements like an outline tool, letting you move around quickly in large documents. More powerful Editors let you move structural elements in this view and synchronize changes with other views.
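A structure view is, at heart, a recursive walk over the element tree. The sketch below, assuming Python's standard library, produces an indented outline and can collapse everything below a chosen depth, much as an outline tool does; the outline function and its "collapsed" marker are our own illustration.

```python
import xml.etree.ElementTree as ET

def outline(elem, depth=0, max_depth=None):
    """Return outline lines for `elem`, collapsing children below max_depth."""
    lines = ["  " * depth + elem.tag]
    if len(elem) and max_depth is not None and depth >= max_depth:
        # Show that children exist without expanding them.
        lines[-1] += f" (+{len(elem)} collapsed)"
        return lines
    for child in elem:
        lines.extend(outline(child, depth + 1, max_depth))
    return lines

doc = ET.fromstring(
    "<topic><title>T</title><body><p>a</p><ul><li>x</li><li>y</li></ul></body></topic>"
)
print("\n".join(outline(doc)))               # fully expanded tree
print("\n".join(outline(doc, max_depth=1)))  # body shown collapsed
```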
Grid view. An arrangement of your content in something like spreadsheet cells. Cells can be moved, preserving their internal structure.
Drag/drop structure. The best Editors allow selection of the whole structural element, then drag-and-drop of the element - only to locations that are valid for the specific element, of course.
In Arbortext Editor we have selected a short description element in our DITA document and are dragging it. The red "stop/no parking" symbol indicates that shortdesc is not allowed inside the paragraph element. Arbortext will let you drop an invalid element if it thinks it can make it valid after you drop it, in which case the symbol is a blue "plus" sign.
Source-code view. Some pure WYSIWYG Editors can also show the source code, for changes that are best made while looking at the XML itself. Top tools highlight the syntax in your choice of colors so authors can easily distinguish the markup from the content.
Tag auto-completion. When typing tags yourself, auto-completion is one way of enforcing correct structure.
Line indenting. Source code that is hard to scan quickly is nearly useless. The best tools indent child elements to reveal the document structure, either as you type or on demand. But watch out for tools that add white space where you don't want it, particularly in mixed content (text interleaved with inline elements), where white space is significant. Good tools "round-trip" your code cleanly.
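The mixed-content hazard is easy to reproduce. The sketch below uses ElementTree.indent (available from Python 3.9) purely as an example of naive pretty-printing; it is not what any reviewed editor does. Because the function cannot tell that <p> holds mixed content, it replaces the single significant space between the two inline elements with a newline plus indentation, and adds line breaks around them.

```python
import xml.etree.ElementTree as ET

# Mixed content: inline elements inside <p>, separated by one meaningful space.
original = "<p><b>Warning</b> <i>read this</i></p>"
root = ET.fromstring(original)
before = "".join(root.itertext())

ET.indent(root)  # naive pretty-printing (Python 3.9+)
after = "".join(root.itertext())

print(repr(before))  # 'Warning read this'
print(repr(after))   # the text content has changed
print(ET.tostring(root, encoding="unicode"))
```

A schema-aware editor has the advantage here: it knows from the content model which elements allow mixed content and can leave their white space alone.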
XSLT processor. Built-in XSLT processing lets you preview the results of your transforms for each publishing output channel.
XPath/XQuery. These let you quickly find elements anywhere in a document that share a property and may be targets for special repurposing in your XSLT outputs. Some tools fill in the XPath for your current position: as you click in the document, its XPath appears automatically in the search box.
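For readers new to XPath, here is a small sketch using the subset of XPath that Python's ElementTree supports. The audience attribute and the sample document are illustrative. The first query finds every paragraph that shares a property; the path_to helper (our own, hypothetical) builds the kind of positional path some editors show for the insertion point, and that path is then resolved back to the same element.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<topic><body>"
    "<p audience='novice'>Click OK.</p>"
    "<p audience='expert'>Edit the registry.</p>"
    "<p audience='expert'>Rebuild the index.</p>"
    "</body></topic>"
)

# Query: every paragraph aimed at experts (ElementTree's limited XPath).
experts = doc.findall(".//p[@audience='expert']")
print([p.text for p in experts])

def path_to(root, target):
    """Build a positional path such as /topic/body[1]/p[3] for `target`."""
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")  # XPath positions start at 1
        node = parent
    return "/" + "/".join([root.tag] + steps[::-1])

path = path_to(doc, experts[-1])
print(path)  # /topic/body[1]/p[3]
# Resolve the path relative to the root element to get the same node back.
print(doc.find("./" + path.split("/", 2)[2]) is experts[-1])  # True
```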
Native XML databases. Databases such as Berkeley DB XML and eXist store your XML as is, and many content management systems do as well.
Specific schemas. Support for well-known XML "vocabularies" like DocBook, DITA, etc. Support also for different schema languages, like DTD, XSD (XML Schema Definition), RELAX NG, NRL (Namespace Routing Language), etc.
DITA. The Darwin Information Typing Architecture standardizes schema components that can be specialized to a variety of technical documentation needs.
DocBook. While DITA accomplishes most of what the DocBook standard (originally SGML, now primarily XML) provides, some tools continue to support DocBook.
Package size. The amount of code downloaded gives you some idea of the amount of work put into these great tools.
Communities online. Judging a book by its cover is a bad idea, but listening to what users are saying about these tools is a solid criterion for judgement.
Google PageRank. Another objective measure of a company is the Google PageRank of its Web site, shown on a logarithmic scale from zero to ten. Because the scale is logarithmic, each additional step up in rank is commonly estimated to mean roughly ten times the importance of the site.
Templates. A library of "stationery" resources, XML documents that authors can start with, knowing they are valid for certain schemas and support specific output style sheets.
Developer tools. All our Developer tools can infer schemas from well-formed XML documents. Some offer visual schema designers. Some provide schema management to assist in creating compound XML documents that contain elements from diverse schemas and namespaces, e.g., Dublin Core, SVG (Scalable Vector Graphics), MathML, etc. Some visual XSLT style sheet designers are synchronized with their code view. Look for new XSLT 2.0 support.
The visual design view above is synchronized with the source code view.
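Mixed-namespace documents are what schema management has to untangle. As a rough sketch, the code below only inventories the namespaces in a compound document; a real tool would go on to associate a schema with each namespace. The Dublin Core, SVG, and MathML namespace URIs are the published ones; the article wrapper and its content are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A compound document mixing three vocabularies (illustrative content).
DOC = """<article xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:svg="http://www.w3.org/2000/svg"
         xmlns:m="http://www.w3.org/1998/Math/MathML">
  <dc:title>Orbits</dc:title>
  <dc:creator>A. Writer</dc:creator>
  <svg:svg width="10" height="10"><svg:circle r="4"/></svg:svg>
  <m:math><m:mi>x</m:mi></m:math>
</article>"""

def namespaces_used(root):
    """Count elements per namespace URI ('' for elements in no namespace)."""
    counts = Counter()
    for elem in root.iter():
        # ElementTree stores namespaced tags as '{uri}localname'.
        uri = elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""
        counts[uri] += 1
    return counts

for uri, n in namespaces_used(ET.fromstring(DOC)).items():
    print(uri or "(no namespace)", n)
```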
Spell check. This can range from simple checking with a customizable word list to dynamic word and phrase completion that ensures writers use terminology consistently throughout the organization.
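A terminology check of the kind described can be sketched as a scan of each element's text against a house word list. The PREFERRED table below is hypothetical, and the exact-match, case-insensitive lookup is only a starting point; the tools reviewed offer richer checking.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical house word list: discouraged term -> preferred term.
PREFERRED = {"e-mail": "email", "log-in": "log in", "web site": "website"}

def terminology_issues(root, preferred=PREFERRED):
    """Return (element tag, text found, preferred term) for each discouraged term."""
    issues = []
    for elem in root.iter():
        # Text owned by this element: its leading text plus the tails after its children.
        own_text = (elem.text or "") + "".join(child.tail or "" for child in elem)
        for term, better in preferred.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, own_text, flags=re.IGNORECASE):
                issues.append((elem.tag, match.group(0), better))
    return issues

doc = ET.fromstring(
    "<topic><p>Send an e-mail from the Web site.</p>"
    "<p>Use the <b>log-in</b> page.</p></topic>"
)
for issue in terminology_issues(doc):
    print(issue)
```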
Multilingual. Unicode support for integration with translation tools.
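Unicode support matters because translated text has to survive every save and export. As a sketch, assuming Python's ElementTree and a TMX-like fragment (not a complete TMX document), the same German segment is serialized once as UTF-8 and once as pure ASCII, where non-ASCII characters become numeric character references; both parse back to identical text.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A TMX-like translation unit variant (illustrative fragment, not full TMX).
tuv = ET.Element("tuv", {XML_LANG: "de"})
seg = ET.SubElement(tuv, "seg")
seg.text = "Größe übernehmen"

utf8_bytes = ET.tostring(tuv, encoding="utf-8")      # non-ASCII stored as UTF-8 bytes
ascii_bytes = ET.tostring(tuv, encoding="us-ascii")  # non-ASCII written as &#NNN; references
print(ascii_bytes.decode("ascii"))

# Either serialization parses back to the same Unicode text.
for data in (utf8_bytes, ascii_bytes):
    assert ET.fromstring(data).find("seg").text == "Größe übernehmen"
```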
There are other important features you should evaluate when considering the selection of an XML tool; however, we didn't include these in our matrix because all 12 tools we evaluated have them.
