======Augmenting the MSAA API Enhancing Accessibility and Multi-Platform Development (presented at CSUN 2007)
This session introduces a new accessibility API that complements Microsoft's earlier work on MSAA and fills critical gaps in the MSAA offering. The new API, called IAccessible2, was built out of necessity to produce a usable and accessible OpenDocument Format (ODF) based office suite for the Commonwealth of Massachusetts. IAccessible2 is an engineered accessibility interface that allows application developers to leverage their investment in MSAA while also giving assistive technologies (ATs) access to rich document applications such as the IBM Workplace productivity editors and web browsers such as Firefox. The additional functionality includes support for rich text, tables, spreadsheets, Web 2.0 applications, and other large mainstream applications.
It is important to note that IAccessible2 has been harmonized with the Linux accessibility APIs to allow for efficient multi-platform development. This was done through close collaboration with accessibility architects who have intimate knowledge of the Linux accessibility APIs.
Equally important is that IAccessible2 was a joint effort with the leading AT vendors. These vendors saw the need for an enhancement to MSAA and worked closely with IBM architects and engineers during the definition of the API as well as the implementation of the API in the IBM Workplace productivity editors and the screen readers.
The Microsoft Active Accessibility API (MSAA) was released in 1997. It was a long-needed advance in the state of the art at that time, allowing application vendors and AT vendors to provide engineered solutions to the problem of accessibility on the Windows platform. Over the years it has become clear that additional engineered interfaces are needed. Notable deficiencies in MSAA include significant shortcomings in its support for text controls, tables, hyperlinks, and relations between accessible objects.
To provide the missing function, AT vendors were forced to use OS hooks, heuristics, reverse engineering, Off Screen Models (OSMs), and many other non-engineered solutions. This resulted in very brittle code and in solutions created uniquely on an application-by-application basis. Solutions often had to be reworked in response to updates of the applications and of the OS software.
Beyond these deficiencies, a more recent requirement is support for AJAX web-based applications, i.e. the ability to dynamically add new roles, states, and relations.
After the release of MSAA, the state of the art continued to advance on other platforms, first on Java in 1998, when the Java Accessibility API was developed in a joint effort by accessibility architects from IBM and Sun Microsystems. That work was further improved by OpenOffice.org, when the UNO Accessibility API was developed, and by the GNOME community, when the Linux Accessibility Toolkit (ATK) and its companion Assistive Technology Service Provider Interface (AT-SPI) were created.
Any improvement would be influenced by those post-MSAA developments. Another major requirement for IBM and many other vendors is the ability to efficiently create applications for multiple platforms. Therefore any Windows based accessibility solution would have to be harmonized with solutions for other platforms.
Another important requirement for a software interface is that it must be standardized so that any number of applications can be assured of support by the AT vendors and AT vendors can be assured that they will support any number of new applications with minimal effort.
Of the several reference APIs examined, the Linux ATK has several advantages: it encompasses the features of the others, it is general purpose across multiple application types, and it is still evolving in an open environment. It also includes several advanced features to support new kinds of applications.
A detailed comparison of MSAA to ATK revealed missing interfaces, events, roles, and states. The roles unique to ATK reflected greater support for rich document content. The states unique to ATK reflected support for richer controls and richer interactions with those controls.
The AtkAction interface adds support for multiple actions on a single control, and also provides a method to reveal the keyboard equivalent. Though multiple actions on a single control may seem excessive, most of us have experience with a browser back button that can either single step through prior pages or display a history. By supporting multiple accessible actions on a single control, the rich semantics of this kind of control can be revealed to the assistive technology user.
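The back button example can be modelled in a few lines. The following is only an illustrative sketch in Python, not the ATK or IAccessible2 API itself: the class names `Action` and `AccessibleControl`, the methods `n_actions`, `do_action`, and `key_binding`, and the key binding `Alt+Left` are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One action a control offers (hypothetical model, not a real API type)."""
    name: str                    # short action name, e.g. "click"
    description: str             # human-readable explanation for the AT user
    key_binding: Optional[str]   # keyboard equivalent, or None if there is none
    run: Callable[[], str]       # what the control does when the action fires


@dataclass
class AccessibleControl:
    """A control that can expose more than one action."""
    name: str
    actions: List[Action] = field(default_factory=list)

    def n_actions(self) -> int:
        return len(self.actions)

    def do_action(self, index: int) -> str:
        return self.actions[index].run()

    def key_binding(self, index: int) -> Optional[str]:
        return self.actions[index].key_binding


# A browser back button with two distinct actions.
back = AccessibleControl(
    name="Back",
    actions=[
        Action("click", "Go back one page", "Alt+Left",
               lambda: "went back one page"),
        Action("show history", "Display the list of prior pages", None,
               lambda: "history shown"),
    ],
)

# An assistive technology can enumerate every action and its key binding.
for i in range(back.n_actions()):
    print(i, back.actions[i].name, back.key_binding(i))
```

With only a single default action, the history behaviour of the button would be invisible to the AT; enumerating actions by index makes both behaviours discoverable.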
The AtkImage interface adds a means to reveal information about an image in a control or in rich document content. While many controls only supplement or replace textual content with a graphic, there are some that reveal semantic information in addition to the text. The implementation of AtkImage provides a way for the application author to reveal the semantic meaning of the image.
Many applications and rich content documents present content in the form of a table. The AtkTable interface provides an engineered interface that clearly reveals the table context of a specific cell, for example the table caption and summary, the row and column numbers, their headers, and their descriptions. It also provides a streamlined means to determine row, column, and cell selection.
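As a rough illustration of what "table context of a cell" means in practice, here is a small Python model. It is a sketch under stated assumptions, not the AtkTable interface: `AccessibleTable`, `cell_context`, `selected_rows`, and the sample sales data are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AccessibleTable:
    """Hypothetical table model exposing per-cell context and selection."""
    caption: str
    summary: str
    column_headers: List[str]
    row_headers: List[str]
    cells: List[List[str]]
    selected: Set[Tuple[int, int]] = field(default_factory=set)  # (row, col)

    def cell_context(self, row: int, col: int) -> Dict[str, object]:
        """Everything an AT needs to announce one cell meaningfully."""
        return {
            "caption": self.caption,
            "summary": self.summary,
            "row": row,
            "column": col,
            "row_header": self.row_headers[row],
            "column_header": self.column_headers[col],
            "value": self.cells[row][col],
        }

    def selected_rows(self) -> List[int]:
        """Rows in which every cell is selected."""
        n_cols = len(self.column_headers)
        return [
            r for r in range(len(self.cells))
            if all((r, c) in self.selected for c in range(n_cols))
        ]


sales = AccessibleTable(
    caption="Quarterly sales",
    summary="Units sold per region and quarter",
    column_headers=["Q1", "Q2"],
    row_headers=["North", "South"],
    cells=[["10", "12"], ["7", "9"]],
    selected={(1, 0), (1, 1)},  # the whole "South" row is selected
)

context = sales.cell_context(0, 1)
print(context["row_header"], context["column_header"], context["value"])
print(sales.selected_rows())
```

A screen reader working from such an interface can announce the row and column headers along with the cell value, instead of guessing the headers from screen positions.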
The AtkText interface provides engineered access to rich text contents, layout, selection, attributes, and caret position without having to rely on hooking text drawing calls in the operating system. AtkHypertext and AtkHyperlink are added to expose substrings that perform document navigation or other actions. These interfaces, and some thoughtful structuring, enable applications to expose the rich semantic experience of complex interactive documents.
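The combination of text, attributes, caret position, and hyperlinks can be sketched as follows. This is an illustrative Python model only: `AccessibleText`, `Hyperlink`, `attributes_at`, `link_at`, the attribute name `underline`, and the placeholder URL are assumptions for this example and are not taken from ATK or IAccessible2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Hyperlink:
    """A substring of the text that triggers navigation (hypothetical model)."""
    start: int    # offset of the first character of the link
    end: int      # offset one past the last character of the link
    target: str   # where the link leads


@dataclass
class AccessibleText:
    """Hypothetical text object exposing caret, attributes, and links."""
    text: str
    caret_offset: int
    attribute_runs: List[Tuple[int, int, Dict[str, str]]]  # (start, end, attrs)
    links: List[Hyperlink]

    def attributes_at(self, offset: int) -> Dict[str, str]:
        """Attributes of the run covering this offset, or {} if unstyled."""
        for start, end, attrs in self.attribute_runs:
            if start <= offset < end:
                return attrs
        return {}

    def link_at(self, offset: int) -> Optional[Hyperlink]:
        """The hyperlink covering this offset, if any."""
        for link in self.links:
            if link.start <= offset < link.end:
                return link
        return None


content = "See the IAccessible2 page for details."
link_start = content.index("IAccessible2")
link_end = link_start + len("IAccessible2")

paragraph = AccessibleText(
    text=content,
    caret_offset=link_start + 2,  # caret sits inside the linked word
    attribute_runs=[(link_start, link_end, {"underline": "single"})],
    links=[Hyperlink(link_start, link_end, "https://example.org/ia2")],
)

link = paragraph.link_at(paragraph.caret_offset)
if link is not None:
    print("caret is on link:", paragraph.text[link.start:link.end])
```

Because offsets, attributes, and links are all queried from the application, the AT does not need to reconstruct them by intercepting text drawing calls.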
A semantic relationship between two or more controls is exposed by AtkRelation. An AtkRelation has a type and a description that convey the relationship to the user. The relation is set on one accessible object and holds a reference to another accessible object, which together define the parties to the relationship. The “labelled by” and “label for” relations are the easiest to understand. Because the “other control” only has to be an accessible object, the user interface is free to use non-text objects such as check boxes and radio buttons as labels for other input controls. This also removes the placement restrictions on labels that were previously needed to hint to the AT that a specific control is a label for another.
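A minimal sketch of the paired label relations follows, written in Python purely for illustration. The names `Accessible`, `Relation`, `link_label`, and `label_of` are hypothetical and do not come from ATK or IAccessible2; only the relation type strings "labelled by" and "label for" are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; the two objects refer to each other
class Accessible:
    """Hypothetical accessible object that can carry relations."""
    name: str
    role: str
    relations: List["Relation"] = field(default_factory=list)


@dataclass(eq=False)
class Relation:
    """A typed, one-directional reference to another accessible object."""
    relation_type: str   # e.g. "labelled by" or "label for"
    target: Accessible


def link_label(label: Accessible, control: Accessible) -> None:
    """Set both directions so either object can be used as a starting point."""
    label.relations.append(Relation("label for", control))
    control.relations.append(Relation("labelled by", label))


def label_of(control: Accessible) -> Optional[Accessible]:
    """Return the object that labels this control, if one is declared."""
    for relation in control.relations:
        if relation.relation_type == "labelled by":
            return relation.target
    return None


# A check box acts as the label for a text field; no text label is required,
# and the check box does not have to sit in any particular position.
checkbox = Accessible(name="Send copy to", role="check box")
address = Accessible(name="", role="text field")
link_label(checkbox, address)

labelling_object = label_of(address)
print(labelling_object.name if labelling_object else "(unlabelled)")
```

The AT resolves the label by following the declared relation rather than by inspecting which text happens to be drawn to the left of or above the field.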
New kinds of applications are enabled through the ATK features that allow an application to create and expose custom roles, states, and relations.
Windows application authors and assistive technology vendors are already familiar with MSAA and its COM implementation. For the information it provides, MSAA is reliable and works well. An important feature of the new interface is therefore that the implementation and semantics of current applications and ATs should not change. In fact, it builds on the existing COM foundation and exposes only new interfaces, roles, states, and events.
These design foundations guided the specification of the current set of IAccessible2 interfaces that are based on the difference between MSAA and ATK. Application vendors can incrementally add IAccessible2 interfaces as needed to provide improved accessibility to their application. And AT vendors can keep much of their current design, only adding a check for these new interfaces before using their legacy heuristic code.
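The "check for the new interface first, then fall back" pattern on the AT side can be sketched as below. This is a simplified Python model, not COM code: `query_interface` stands in loosely for an interface lookup, and `LegacyAccessible`, `RichAccessible`, `TextInterface`, `read_text`, and `guess_from_screen` are hypothetical names invented for this example.

```python
from typing import Callable, Dict, Optional, Tuple


class TextInterface:
    """Hypothetical engineered text interface offered by newer applications."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_text(self) -> str:
        return self._text


class LegacyAccessible:
    """An object that only supports the original interface set."""

    def __init__(self, name: str) -> None:
        self.name = name

    def query_interface(self, interface_name: str) -> Optional[TextInterface]:
        return None  # no additional interfaces available


class RichAccessible(LegacyAccessible):
    """An object that has incrementally added one new interface."""

    def __init__(self, name: str, text: str) -> None:
        super().__init__(name)
        self._interfaces: Dict[str, TextInterface] = {"text": TextInterface(text)}

    def query_interface(self, interface_name: str) -> Optional[TextInterface]:
        return self._interfaces.get(interface_name)


def read_text(
    obj: LegacyAccessible,
    heuristic: Callable[[LegacyAccessible], str],
) -> Tuple[str, str]:
    """Prefer the engineered interface; use legacy heuristics only if absent."""
    text_interface = obj.query_interface("text")
    if text_interface is not None:
        return ("engineered", text_interface.get_text())
    return ("heuristic", heuristic(obj))


def guess_from_screen(obj: LegacyAccessible) -> str:
    """Stand-in for existing heuristic code such as an off-screen model."""
    return "best guess for " + obj.name


old_editor = LegacyAccessible("old editor")
new_editor = RichAccessible("new editor", "Hello, world")

print(read_text(old_editor, guess_from_screen))
print(read_text(new_editor, guess_from_screen))
```

The existing heuristic path stays in place for applications that have not adopted the new interfaces, so neither side has to change everything at once.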
After the problem was understood, the new interface, IAccessible2, was designed. It was recognized that a key factor in the success of IAccessible2 would be close collaboration among the interface designers, the application developers implementing IAccessible2, and the AT vendors using IAccessible2.
On the application side, IBM's Workplace Productivity Editors needed to provide the leading ATs access to their significantly advanced function which includes the creation of word processing documents, spreadsheets, and presentations. These are very complex applications and a significant accessibility challenge. The IAccessible2 interface was very well suited to this challenge and the IBM development team has implemented the IAccessible2 interfaces in menus, dialogs, and client areas.
In addition, AT vendors saw the advantage of IAccessible2, i.e. the means to create a robust solution usable across many different applications. The leading AT vendors have added IAccessible2 support to their screen readers. These vendors and the accessibility engineers for the IBM productivity editors were deeply involved in the definition of the IAccessible2 interfaces and in their implementation in the productivity editors. IAccessible2 is very well proven as a result.
Note that IAccessible2 is complementary to MSAA. The AT vendors continued to use their existing MSAA code. Their rework consisted of replacing fragile non-engineered solutions with engineered IAccessible2 solutions.
For the IAccessible2 API to be attractive to the widest audience, it must be standardized, and that standardization must be open for review and improvement by any interested party. IBM has donated the IAccessible2 specification to the Linux Foundation (LF). Participation in this group is open to all interested parties. Note that IAccessible2 is a specification, not source code, and is considered an open standard rather than open source. That means it can be used by both proprietary vendors and open source communities.
The IBM donation includes the following:
Further work is underway to include the following:
A significant result of the comparison between MSAA and ATK is that adding the IAccessible2 events and interfaces to MSAA yields application accessibility semantics that are very similar on both Windows and Linux. This is important to applications that run on both platforms and need to be accessible on both platforms. Some examples of applications that will benefit from this similarity are IBM's Workplace Productivity Editors, Firefox, and Eclipse.