Posts by author mdcurran

Microsoft provides grant to improve NVDA

Microsoft has recently decided to support NV Access and the NVDA screen reader project by providing funding and support that will allow NV Access to improve NVDA in two key areas:

  • Allowing NVDA to support Microsoft's UI Automation API in Windows 7 (the next Microsoft Windows Operating System); and
  • Improving NVDA's support for Microsoft Internet Explorer and other related applications.

Please visit the NV Access site for a more detailed blog post.

NVDA Weekly Update (December 18 2008)

This past week we have been busy preparing for the upcoming NVDA 0.6p3 release, which we hope will be out in the next week or so. Most of the work has been fixing bugs that we and our users have noticed while checking that everything works as it should for the release.

I have fixed a bug introduced some time after 0.6p2 where NVDA would not detect that a new page had been loaded in Internet Explorer. This code broke when we moved to a new version of the Python COM Types library, which seems to have changed the way it handles COM events.

Users of Windows Mail or Outlook Express who read their emails in plain text will be happy: when a message is opened, NVDA now makes sure that the user is actually placed on the message text so that they can read it with the arrow keys. This is actually a bug in the email software, but NVDA has a pretty simple piece of code to make this a little nicer for the user.

Jamie has been fixing some bugs in NVDA's Audiologic synth driver. As we do not have access to all the synthesizers and braille displays we support, it is rather hard to make sure that these drivers will always work. We therefore rely on users to let us know if something does break, and to provide suitable information so that we can try to fix the problem.

I have spent a bit of time fixing some issues with text selection in NVDA. In some situations in Windows, when the user moves away from a text control that has selected text, the selection is unselected at the same time the focus moves. NVDA should now hopefully no longer announce this selection change as the focus moves, as it is not very useful to the user. Users may notice this fix especially on the Internet Explorer address bar, and in Thunderbird 3 when moving from an email address field after an address auto-completes.

I have also possibly fixed a small bug in NVDA's support for Outlook Express where it would play its error sound when the user is sending an email.

Jamie has fixed a bug where NVDA would freeze, or read too much, or play its error sound, when bringing up the Bookmarks dialog in Mozilla Firefox. The issue was that NVDA was going into an endless loop when collecting the text to speak in the dialog. Technically this is a bug in Firefox, but he was able to change the code for other advantages, which also indirectly fixed this problem. He has also improved the rule for when NVDA should respond to alert events and speak them. NVDA will no longer speak an alert event if the focus is already inside the alert, as in that case NVDA must already have spoken it.

Finally, Peter has continued to be an extremely important asset to the NVDA project, updating NVDA's language and documentation translations on behalf of our translators, providing support to current and prospective translators, and of course also fixing bugs and adding new features throughout NVDA's core and application-specific code. I personally would like to extend my thanks for his time and effort.

NVDA weekly update (December 10 2008)

Many different things have been worked on in NVDA in the past week. Much of the work has been to improve the user experience, though a few bugs have been fixed as well.

Probably one of the most important things for this week is that we have finally been able to stop crashes in Firefox in Windows Vista while using NVDA. We have known about this problem for a rather long time, but have not ever been able to reproduce it ourselves. However, due to some unknown reason, my wife's computer suddenly started doing this, and although it was very annoying for her, it was great for me as I finally had a chance at investigating the bug.

It took me about 20 minutes to fix, after I was able to run Firefox on her computer with a debug build of our virtualBuffer library. In the end the bug was a coding error probably on my part, although it is extremely surprising that Firefox wasn't crashing on everyone's systems. It seems that some installs of Windows Vista are just that little bit more secure than others.

Technically the problem was that our virtualBuffer library was trying to deallocate some memory it really wasn't allowed to deallocate. Specifically, it was trying to free a string literal in C++. Again, although this is a pretty bad error on a programmer's part, it is strange that it hasn't caused many more problems.

This fix has stopped crashes with Firefox in Windows Vista, but we are still aware of a few other crashes.

As NVDA is always in rapid development, its structure can change quite a bit over time. This has started to become an issue when users try to install a new copy of NVDA over the top of an old one. To try to fix some of the issues caused by this, I have now made the installer detect and uninstall any previous copy of NVDA before installing the new copy. As this meant I was playing with a lot of complex installer code, I also spent some time cleaning up that code, hopefully making it a little more readable. I also changed the publisher on the installer to nvda-project.org, as before it was Michael Curran (which isn't very correct).

Michel Such provided a patch for the installer, which makes the temporary copy of NVDA started by the installer use the user's existing NVDA configuration file. This means that when installing a newer copy of NVDA, it will now read the installer with your chosen voice preferences etc.

Thanks to Simone Dal Maso and others, the NVDA Excel editor dialog now works with localized formulas. This means that just like when editing formulas in Excel with its own formula bar, you are able to read and write them in the language your computer is set to, rather than just English.

Jamie spent quite a large amount of time deep in NVDA's menu handling code. He originally started trying to solve one particular problem where NVDA does not announce the focus when exiting a context menu, but in the end he has completely refactored the code, making it much easier to understand, and has also made the following noticeable changes:

  • fix: Improved the detection of the focus when leaving context menus. Previously, NVDA often didn't react at all when leaving a context menu.
  • fix: NVDA now announces when the context menu is activated in the Start menu.
  • fix: The classic Start menu is now announced as Start menu instead of Application menu.

A small bug in NVDA's sayAll code has been fixed. This bug caused NVDA to play the error sound on a rare occasion where NVDA was performing a sayAll, but then it needed to speak something else. I certainly noticed this bug when reading a large document, and then pausing for a few minutes and going back to it.

When arrowing around web content in a virtualBuffer, NVDA always tries to scroll the object you are reading onto the screen, so that sighted people can see what you are looking at, and of course so you can interact with it, perhaps with the mouse. However, sometimes for some unknown reason, NVDA gets a little bit out of sync with the web browser, and it's possible you may move to some text that represents an object that no longer actually exists. This was causing NVDA to throw errors on particular pages, which was rather annoying to the user. Jamie has stopped these errors from being shown to the user, so the experience will be a bit nicer. We are still not sure whether this is just hiding a much deeper underlying problem, but for now, this works.

When the selection in an edit field or document changes, NVDA detects this and speaks the change. However sometimes, such as in a select all, NVDA may see that the entire content has been selected, and try to speak all of it in one go. Aleksey and Jamie have worked on this problem (Ticket #249), and now NVDA only announces how many characters were selected, rather than the actual content, if the number of characters is 512 or higher. Apart from the fact that the user probably doesn't want to hear the entire document when they select it, this also avoids problems with some speech synthesizers that cannot handle large amounts of text handed to them in one go.
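As a rough illustration of the rule described above (not NVDA's actual code), the decision could be sketched in Python like this. The function name, the constant name and the spoken wording are assumptions; only the 512-character threshold comes from the post.

```python
# Hypothetical sketch: only the 512-character threshold is taken from the post.
SELECTION_SPEAK_LIMIT = 512


def describe_selection(selected_text: str) -> str:
    """Return what should be spoken for a newly selected run of text."""
    if len(selected_text) >= SELECTION_SPEAK_LIMIT:
        # Too long to read out: announce only the size of the selection.
        return f"selected {len(selected_text)} characters"
    return f"selected {selected_text}"
```

A short selection is read out in full, while a select-all on a large document is reduced to a character count.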

Peter has fixed some issues with progress bars in software such as Nokia PC Suite, which uses the Qt toolkit.

Finally, Jamie has worked on the code that handles the reading of content in alerts and dialog boxes. Specifically users will notice that NVDA is much more successful in reading the text in its own installer (specifically the welcome page) and alerts in Firefox no longer cause a lot of pointless info to be spoken. The text announcement in Dialogs is still not as good as we'd like (some info can be repeated) but double speaking is always better than not speaking at all.

Weekly update (December 2nd 2008)

The last few weeks have been rather busy for us due to work specific to NV Access so we have got a little behind in weekly blog posts. However, in this post I will try and cover the significant happenings in NVDA from mid November up to now.

Firstly, I have finally implemented support for group position information for IAccessible2 objects (ticket #77). NVDA has always been able to report group position info e.g. "x of y" or "level z" on MSAA objects, but we had never gotten around to supporting it for IAccessible2. At the same time as implementing support for this, I slightly redesigned how NVDA handles position information in general. This does not really change anything for users, but does make it easier to manage these properties in the speech code.

Although Jamie has already mentioned this in another blog post, NVDA has now been updated to run with Python 2.6. There are some pretty interesting new features in Python 2.6, and we are quite glad that we can now start thinking about using them. NVDA no longer will run with Python 2.5, so if you do run from source, please make sure to read the updated dependencies.txt file.

When redesigning how NVDA handles position info, we decided to properly handle the announcement of levels in tree views much more generically. This means that all tree views will now announce the level of the current item first if it changes, rather than this only happening for SysTreeView32 tree views.

Jamie has spent some time working on braille support, to make it much faster when moving the focus around the operating system and applications. Specifically he has implemented ticket #201, which allows NVDA to remember common objects in the ancestors of the new and old focus objects in a focus change. As this greatly cuts down the amount of information NVDA has to fetch in order to represent the objects in braille, there are some very noticeable speed improvements.
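The idea behind this can be sketched roughly as follows. This is only an illustration under assumptions: the function name is made up, and ancestors are represented as plain lists ordered from the desktop down to the focus, which is not necessarily how NVDA stores them.

```python
def split_common_ancestors(old_ancestors, new_ancestors):
    """Split new_ancestors into (shared with the old focus, newly entered).

    Both lists are assumed to be ordered from the outermost object
    (the desktop) down to the focused object.
    """
    common = 0
    for old, new in zip(old_ancestors, new_ancestors):
        if old != new:
            break
        common += 1
    # Only the second part needs to be fetched and rendered again.
    return new_ancestors[:common], new_ancestors[common:]
```

When the focus moves between two controls in the same window, most of the ancestry is shared, so only the last one or two objects need to be looked at again.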

I made a few small improvements to NVDA's support for speaking typed words and characters. Speaking is now better when both typed words and typed characters are on (the word is now announced before the character that denoted its end, rather than after). NVDA no longer announces an asterisk when pressing non-alphanumeric keys, such as tab and enter, in a password field. And finally, NVDA will no longer speak a word you have typed if, while typing the word, you move the focus to something like a menu or another application. Previously NVDA could unexpectedly start announcing a word you typed minutes ago just because you pressed space or enter.
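The word handling described above could be sketched like this. The class and method names are hypothetical, and the sketch assumes both typed words and typed characters are switched on; it is not NVDA's actual implementation.

```python
class TypedWordTracker:
    """Hypothetical sketch of tracking the word currently being typed."""

    def __init__(self):
        self._chars = []

    def type_char(self, ch):
        """Return the list of items to speak, in order, for one typed character."""
        spoken = []
        if ch.isalnum():
            self._chars.append(ch)
        elif self._chars:
            # The word is announced before the character that ended it.
            spoken.append("".join(self._chars))
            self._chars = []
        spoken.append(ch)
        return spoken

    def focus_changed(self):
        """Forget a half-typed word when the focus moves elsewhere."""
        self._chars = []
```

Clearing the pending word on a focus change is what stops a stale word from being spoken much later.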

Jamie has managed to fix a very long standing bug in NVDA's command console support that was stopping certain bits of new text from being announced. This had been reported to us quite a long time ago, and funnily enough, to fix it was as easy as removing one small line of code. This fix hopefully may make NVDA a little nicer to use with some console-based text adventure games.

As some people have been reporting crashes with Firefox when using NVDA, we have been racking our brains over what could be causing them. So far we haven't had too much luck, but along the way we did manage to find some issues with NVDA and Unicode file paths. Jamie and Peter have made a few changes which *may* make NVDA better when it's run from a directory with non-ASCII characters, though we are not too sure how many issues this really does fix.

If people do get crashes with Firefox when running NVDA, it is very important that they send their crash reports to Mozilla via the Mozilla Crash Reporter if it appears. It is also helpful if they send the crash IDs to the NVDA developers, so we can research whether NVDA caused the crash, and if so, how. If you wish to find the IDs for crashes you have experienced in Firefox, please type the URL about:crashes into your Firefox location bar, and you will see a page with all the IDs listed.

Jamie has fixed some issues where reading virtual buffers with the review cursor would not seem to announce the same information for fields such as links etc as would be announced if you used the actual caret (with the arrow keys). The review cursor hopefully now acts exactly the same as the caret, at least as far as user experience goes.

Peter has added support for some edit controls in Pinnacle TV software.

For a while now, it has been clear that there should be a way to copy text from NVDA's review cursor. A situation that needs this feature would be, perhaps, a DOS console where you need to copy a URL. Sighted users can just highlight the text with the mouse and then copy it. Because of this, Jamie has added two new scripts to NVDA. NVDA+f9 marks the current position of the review cursor, and NVDA+f10 copies the text from the previously marked position up to the new current position of the review cursor.

Also, there are a lot of applications which like to let NVDA know that the focus has changed, even when they are not the currently active application. This is annoying, as NVDA then announces the focus change to the user, but when the user acts upon the change, they find that the focus never actually changed. Jamie has fixed this problem by making sure that NVDA does not handle focus changes outside the active application, or in a system control (such as the start menu etc).

Finally, we have been making some changes to do with whether actual Python source code is included in our compiled snapshots and releases. This is to try to get around the issue where a person or organisation wishes to use NVDA, but can't deal with the GNU General Public License (GPL). This may have caused some problems for users using snapshots. If you find you can no longer run NVDA after installing a new snapshot, please uninstall any old copies of NVDA, including the newly installed one, delete the NVDA directory from the Program Files directory (or wherever you installed it) if it still exists, and then re-install NVDA. We aim to make this transition a little easier for users in the near future.

Mozilla Foundation grant allows for employment of NVDA full-time developer

Thanks to the generosity and support of the Mozilla Foundation, NV Access has been able to hire James Teh as a full-time developer to work on NVDA. The Mozilla Foundation has taken a keen interest in NVDA, as one of NVDA's goals is to provide excellent support for Mozilla products, such as Firefox and Thunderbird.

The grant (which provides NV Access with US$80,000 over 2008) allows NV Access to employ James Teh (Jamie) full-time to work on improving and maintaining NVDA, with a major focus on Mozilla products. The grant will also be used to cover overheads for the running of NV Access, which apart from general administration, also includes project promotion and the seeking of further funding.

NV Access and Mozilla worked together to draw up a list of grant goals for NVDA, which both organizations see as the most important things that should be achieved to make the project a success. Although the grant will be reviewed before the end of this year, all the goals listed are to be completed within a three year timeline.

Jamie will hopefully be starting work in the next month or so, once all the admin has been organized. I for one am very excited to have Jamie join the project on a much more full-time basis, and I know he is also very excited to be able to put all his working time to open-source projects that hopefully can improve the lives of people in the community in some way.

On behalf of NV Access, and the other developers of NVDA I would like to thank the Mozilla Foundation for its support over the last year. Together we can make sure that blind users will always have both a free choice when it comes to access to applications on the Microsoft Windows Operating System, and also a choice to move forward with the rest of the community, to use free and open-source products (such as Firefox and Thunderbird).

New virtualBuffers now in NVDA, and fun with lines

Since my last blog post on the web access grant, a lot has happened in regards to this topic. A few features talked about in previous postings for the storage code have been implemented, but most importantly, I have taken that next step of actually completely integrating the storage code into NVDA, giving myself and other NVDA developers the choice of using the new code to interact with Gecko 1.9 documents (such as Firefox 3 and Thunderbird 3).

In the last post I mentioned that Jamie and I had talked about allowing arbitrary properties on nodes in the buffer, rather than locking it down to just role, value, states, keyboardShortcut and contains. Well, this has been achieved, so now functions such as addTagNodeToBuffer and findBufferFieldIDByProperties take an array of attributes (name and value pairs), or sometimes name and multiValue pairs, if using findBufferFieldIDByProperties to search on multiple values of a given property. There are still particular properties of a node which have their own specific member variables (such as ID, and some other new ones that are generic to nodes or tagNodes).
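To give a feel for a search on name and multiValue pairs, here is a small Python sketch. It is not the storage module's real interface (that is written in C++); the function name, the node layout as dictionaries and the argument shapes are assumptions made for illustration.

```python
def find_field_id_by_properties(nodes, wanted):
    """Return the ID of the first node matching every wanted property.

    `nodes` is a list of dicts with "id" and "attributes" keys.
    `wanted` maps an attribute name to the set of values that are acceptable.
    """
    for node in nodes:
        attrs = node["attributes"]
        if all(attrs.get(name) in values for name, values in wanted.items()):
            return node["id"]
    return None
```

For example, searching for `{"role": {"heading"}, "level": {"1", "2"}}` would find the first heading that is at level 1 or level 2.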

The coding that has probably taken up most of my time throughout the last month or so is the code that manages lines in the storage module. A virtualBuffer's job is to render a representation of a document in a flat layout, meaning that every character in the buffer has an index, from 0 to the length of the buffer - every character has an ordered place. But it also has to have an idea of what lines are, as in it must allow the user (through the AT) to arrow up and down through the buffer by lines of information that are not too long, but also not so short that the user has to press keys more than they need to. Working out exactly how to implement this was hard; what it uses now is my third go at implementing it.

NVDA itself has pretty good text management through its TextInfo classes, so all the storage module had to do to communicate line placement was to allow the querying of line offsets using a particular offset as reference, with the getBufferLineOffsets function.

My first attempt was to simply scan back from the offset, looking for a line feed character, and then do the same forward. This works ok if the only information stored in the buffer is basic text broken up by line feeds. However, if for some reason the AT wanted to somehow tweak where line breaks occurred (perhaps for ease of reading), it would have to insert its own line feeds along with the original information.
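That first attempt amounts to something like the following Python sketch. The real code lives in the C++ storage module; the function name here is made up, and the choice to include the trailing line feed in the line is an assumption.

```python
def line_offsets_by_linefeed(text, offset):
    """Return (start, end) of the line containing offset, split on line feeds.

    The end offset is exclusive and includes the trailing line feed, if any.
    """
    # Scan backwards for the previous line feed; the line starts just after it.
    start = text.rfind("\n", 0, offset) + 1
    # Scan forwards for the next line feed; the line ends just after it.
    end = text.find("\n", offset)
    end = len(text) if end == -1 else end + 1
    return start, end
```

With the text "one\ntwo\nthree" and an offset inside "two", this gives the range covering "two\n".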

This way of doing things was ok for testing. In fact, with some initial tests in NVDA, I had NVDA place a line feed at the end of each node it inserted, plus I also had it scan each block of text it added and insert line feeds within the text, to break it up into reasonably sized chunks.

Mutating the text in this way is bad for two reasons. First, when the user navigates the text, they will see a line feed character at the end of each line, even if the line was only broken due to line length rules, not because a paragraph actually ended. The other major problem is that because Mozilla Gecko provides text with embedded objects, whose events depend on the text offsets staying the same as what Gecko has internally, things could get out of sync pretty quickly.

I then designed a way for the AT or backend, when adding the text to the storage buffer, to provide a list of offsets where lines should be broken. These would be soft line breaks that did not actually appear in the text, but the buffer would know about them and, when asked for line offsets, could take them into account.

I was happy with this approach for quite a while, as it meant that (1) we were not mutating the text at all and Gecko events would be happy, and (2) users could arrow to the end of a line in the middle of a paragraph and not see line feeds that shouldn't be there.

There were two major problems with this approach. The first was that the AT or backend needed to know the user's chosen maximum line length at render time. The second was that although individual text blocks would not contain lines longer than the chosen length, there was nothing stopping two text blocks (say part of a paragraph and then some links) from together adding up to much more than the chosen length. Of course this wouldn't be a problem if a line break was enforced at the end of all nodes (as in many popular Windows screen readers), but if NVDA was to support a screen layout, then this problem could be quite evident.

Eventually I decided on the third approach. This was to allow getBufferLineOffsets to receive a maximum line length int, and also an int that indicates whether a screen layout is to be used, and then have it calculate the offsets itself by a set of steps. To accommodate this, tag nodes in the buffer needed a new member variable, and addTagNodeToBuffer needed to be able to receive it. This is an int that indicates whether the tag node is a block element or not, as in, whether the buffer should assume that this node enforces the start and end of lines at its edges.

So, the steps that getBufferLineOffsets takes are:

  • Set some initial line offsets to the start and end of the buffer.
  • Locate the deepest node at the given offset.
  • Move up the node's ancestors until it locates a tagNode that is marked as being a block element. If one is found, the line offsets are set to this node's start and end offsets. Also record the start and end of any tagNodes passed along the way in a possibleLineBreaks set.
  • From the given offset, do a traversal search both backwards and forwards in the tree, locating the closest block elements. If one is closer than the ancestor block element, set the line offsets to this node's offsets. Again, while traversing, save the start and end offsets of any tagNodes in the possibleLineBreaks set.
  • Scan the text between the line offsets found so far, looking for both line feeds and beginnings of words. If a line feed is found before the given offset, and it's the closest one to the offset, then it becomes the line's start offset. The same goes for a line feed on or after the given offset: if it's the closest, it becomes the end offset. The beginning-of-word offsets are saved in the possibleLineBreaks set.
  • Finally, the distance from line start to line end is counted up to make sure it doesn't exceed the maximum line length the user requested. If it does, then the line start is brought forward to an offset either at the maximum line length, or before it (using the possibleLineBreaks set as an indication of where it's safe to break), and then the line length is counted up from there again. Of course this never passes the original given offset, and the line end will not end up being before, or too far after, the given offset.

Note that if the user chooses not to use a screen layout, then rather than searching for block elements, it just uses any tagNode, meaning lines will seem to always break at the end of links and other fields etc.
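The final length-limiting step can be sketched in Python as below. This is a simplified illustration, not the C++ code: it assumes the hard line boundaries (from block elements and line feeds) have already been found, and that possible_breaks already holds the candidate offsets. The function name and parameters are hypothetical.

```python
def wrap_line(block_start, block_end, possible_breaks, offset, max_len):
    """Return (start, end) of the soft-wrapped line containing offset.

    Assumes block_start <= offset < block_end and max_len > 0.
    Soft breaks are always worked out from block_start onwards, so two
    offsets on the same line always get the same answer.
    """
    breaks = sorted(b for b in set(possible_breaks) if block_start < b < block_end)
    start = block_start
    while True:
        limit = start + max_len
        if block_end <= limit:
            end = block_end
        else:
            # Prefer the last safe break within the limit; else cut at the limit.
            candidates = [b for b in breaks if start < b <= limit]
            end = candidates[-1] if candidates else limit
        if offset < end or end == block_end:
            return start, end
        start = end
```

For "the quick brown fox jumps" with word starts at 4, 10, 16 and 20 and a maximum length of 10, an offset inside "brown" yields the line covering "brown fox ".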

A rather complex set of actions, however in C++ they really do not take too much time at all. I didn't really like this approach at first, as it has a danger of being non-symmetrical: there could be a chance that asking for two different offsets that should be on the same line gives back two different lines, due to the fact that a maximum line length has to be checked. However, I found that as long as I always calculated all soft line breaks between clear block line breaks, even those before the given offset, this would never be a problem.

Around the same time I was improving the line offset code, I started re-writing NVDA to use the new virtualBuffer code. At this point in time, the new virtualBuffers for Gecko 1.9 applications have improved quite a lot in comparison to the old virtualBuffers NVDA was using before the grant. Although the backend rendering code is still in Python, the technique of using embedded objects in text with IAccessible2 and so forth proved to be a rendering time improvement of over 50%. This means that when NVDA loads a document in Firefox 3, it now takes just under half the time it used to.

As the low-level management of nodes and text is all maintained in C++, it is much more accurate, and we no longer have large chunks of documents mysteriously not being rendered, or complaints from the virtualBuffers that some ID doesn't exist, and other fun things we used to have.

We have been waiting for a long time to be able to convert NVDA's virtualBuffer interface code to use the TextInfo classes I spent a lot of time on last year. As we needed to make NVDA work with the C++ virtualBuffer storage module, and because we needed to improve NVDA's rendering patterns for embedded objects and such, I made sure the new virtualBuffers were designed around the TextInfo classes. This means users of NVDA now have the ability to select text within virtualBuffers, and also copy that text to the clipboard if they wish. They can also choose to read the buffers as a screen layout, or as a more conventional node per line layout. It's taken a little while, but I've also now added the quick key navigation (as in press h to jump to a heading, l for a list etc) to the new virtualBuffers; it's great to see that the findBufferFieldIDByProperties function actually works like I'd hoped.

At the moment we're still in discussion on the development list about how particular fields such as links etc should be spoken: should the word link be spoken before or after the text etc. Though I think we've come to a pretty good agreement on most of the fields.

The new virtualBuffers (at least in Gecko 1.9 applications) can be interacted with in regards to activating links, and toggling a pass-through mode on and off to interact with edit fields, combo boxes etc. The one thing that still makes the new virtualBuffers incomplete is that they have no support for events: if content changes dynamically in the document, the buffers do not pick up this change. This code will be added when I re-write the rendering code in C++, as it needs to be very fast, and for best accuracy it should really be in-process, so that things don't start disappearing before NVDA's process gets around to actually asking Gecko for them. In the meantime I've added a key stroke to tell NVDA to manually re-render the current document, so most websites can be tested well enough.

Over the next little while it's probably going to be more work on NVDA and virtualBuffers in general, to make sure that the user experience is the best it can be. Once this is ok, my next task will be to re-write the rendering code for Gecko virtualBuffers in C++. This should speed up load times by an estimated factor of three. After that, the fun work will begin on trying to integrate all of the virtualBuffer C++ code so that the rendering code is injected into the Gecko application, and rendering takes place in-process, which by estimates should speed up load times by a factor of twelve or so.

More work on Web Access grant

Since my last post on the web access grant, the virtual buffer library code has undergone quite a few changes, both for code readability, and for making sure it will really work the way it should, in all situations.

The first change is that the wm module (that manages all the window messages) has now been removed. So rather than having individual window messages for each API call that must cross a process boundary, we only now have one window message which takes a pointer to the internal function, and a pointer to a struct of arguments, as its wParam and lParam arguments. This means that only one window message needs to be registered, plus it means one less code change when adding new functions to the API.
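The shape of this change can be shown with a plain-Python analogue. In the real library the two arguments are raw pointers carried in wParam and lParam across a process boundary; here they are simply a Python callable and a dictionary, and all names are made up for illustration.

```python
# Hypothetical message ID standing in for the single registered window message.
MSG_EXECUTE = 1


def window_proc(message, w_param, l_param):
    """Handle the one generic message: w_param is the function, l_param its arguments."""
    if message == MSG_EXECUTE:
        function, args = w_param, l_param
        return function(args)
    return None  # Any other message is not ours to handle.


def get_text_length(args):
    """Example internal function taking its arguments as one struct-like object."""
    return len(args["text"])
```

Adding a new API function then only means writing the function itself; the message handling stays untouched.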

The next change is that rather than client and internal functions using the storage buffer directly, they use a buffer container instead. This buffer container is a pointer to a struct that contains a handle to the window being virtualized, a handle to the current backend dll being used for this buffer, a pointer to the storage buffer, and a pointer to a win32 Critical Section, which is used to serialize access to the storage buffer. These changes make it much more feasible to have multiple buffers for the same window, and make sure that the storage buffer cannot be read from while it's being written to etc. The latter of course depends on any backends also properly using that critical section when accessing the storage buffer.
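A loose Python analogue of such a container is sketched below, with a threading.Lock standing in for the win32 Critical Section and a list of strings standing in for the storage buffer. The class, field and method names are assumptions, not the library's.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class BufferContainer:
    """Hypothetical stand-in for the buffer container struct."""
    window_handle: int
    backend_handle: int
    storage: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def append_text(self, text):
        # Writers take the lock so readers never see a half-written buffer.
        with self.lock:
            self.storage.append(text)

    def read_all(self):
        with self.lock:
            return "".join(self.storage)
```

Because each container owns its own storage and lock, two containers can refer to the same window handle without interfering with each other.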

Previously, nodes were only used to represent tags, as in actual nodes with properties, not just text. The only way that text manifested itself was if a node was wider than 0 offsets and it had no children, or as a gap of more than 0 offsets between two sibling nodes. This was very difficult to manage when inserting and removing text, so now text has its own node type.

As Firefox 3 accessibility has a *very* different way of handling text and child nodes (through its embedded object approach) compared to Firefox 2 or other web browsers, Jamie and I had a lot of long phone calls, and a lot of code restructuring was done, to make sure that we can handle the two very different approaches as efficiently as possible.

The main problem was that the virtual buffer library was very ID-centric, meaning that all our API functions took node IDs as arguments. However, in Firefox 3, text itself does not have a unique ID, so this approach just doesn't work at all. So the calls for adding and removing nodes have now been changed to take actual nodes as arguments, not just IDs. Also, the call to add a node now returns the node as it's added, which allows the backend to gain a reference to the created node for later use. To avoid code duplication, much of the adding/removal code has been broken down into much smaller reusable functions, which make the code easier to read, and probably even more efficient.

Two extra functions have been added which handle the merging and splitting of text nodes. When removing a node which is flanked by two text nodes, it's probably best to merge the two text nodes together, so as not to cause fragmentation over a long period of time. Otherwise, if there were a lot of arbitrary removals, we could end up with a whole bunch of one-character wide text nodes all over the place. The reason for the splitting function is that if Firefox instructs us to add a node to a parent at offset n, offset n may actually be right in the middle of a text node, so before adding the node, we need to split the text node in two at this position, and then add the new node directly after the first text node. Note that a function has not yet been written to take a parent and an offset from Firefox and calculate where exactly in its children the new node must be added; this will get written along with the Gecko backend, as it's quite specific to Gecko.
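The splitting and merging can be sketched in Python on a simple list of children, where text nodes are plain strings and anything else is a tag node. This representation and the function names are assumptions for illustration; the real functions are in the C++ storage module.

```python
def split_text_node(children, index, position):
    """Split the text node at children[index] in two at character position."""
    text = children[index]
    children[index:index + 1] = [text[:position], text[position:]]


def remove_node(children, index):
    """Remove children[index], merging its neighbours if both are text nodes."""
    del children[index]
    if (0 < index < len(children)
            and isinstance(children[index - 1], str)
            and isinstance(children[index], str)):
        # Avoid leaving two adjacent text fragments behind.
        children[index - 1:index + 1] = [children[index - 1] + children[index]]
```

Starting from `["hello world"]`, splitting at position 6 and inserting a link node at index 1 gives `["hello ", link, "world"]`; removing the link again merges the text back into one node.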

Many other little fixes and tweaks have been made to the code, making sure that it handles different situations properly.

All through the writing of this code, there has been a testing program that tests different actions to perform on the storage buffer, to make sure we don't break anything.

Lately, I have branched NVDA trunk to a virtual buffer testing branch, which has allowed me to pull apart NVDA's old virtual buffers, and start writing a test one using the storage module from the virtual buffer library. It's very basic, only printing about three or so lines to a virtual buffer, with a few links and headings etc, but this is really just to enable me to start writing the necessary code in NVDA that will be used to navigate the new virtual buffer. I must say it's quite nice to finally be able to navigate around the new virtual buffer. It does prove that this code is actually going somewhere.

One thing that Jamie and I were talking about on the phone last night was the use of hard-coded properties in the virtual buffer library, such as role, value, states, contains and shortcut. Up until this point, the thinking has always been that backends will convert their own role and states values to virtual buffer library-specific ones, so that NVDA only has to deal with one set. However, this seems to create a lot of work for the backends, plus rather large mappings need to be written for all the different accessibility APIs. I think we have agreed that the backends will now just use API-specific values for roles and states, and NVDA itself will do any conversions after fetching them from the virtual buffer.

We are also worried about exactly what properties a node should have. Obviously a node needs an ID, as that is what makes it unique, but as far as role, value, states, contains and shortcut are concerned, these are really quite arbitrary to a virtual buffer, and specific to the API used. Our thought is that perhaps rather than having hard-coded properties, we will allow arbitrary properties instead, meaning that when a node is created, the backend that created it can specify a string of name=value pairs, which denote the properties. The property names should also probably have namespaces for the different accessibility APIs, though there may be some properties which are not specific to any API.

All this needs to be thought out a little more, but what we are starting to realize is that the virtual buffer library should only act as a pipeline for information, at the same time tweaking the syntax and structure (i.e. converting a hierarchical model into a flat model), but not in any way changing the actual content. For example, the virtual buffer library should not have any idea about what a 'role' is; it should only know that nodes have properties. It's up to the actual accessibility API to enforce the semantics.

The advantage of this is that we don't need to keep changing the virtual buffer library interface when we want to add some other property, whether it's to do with live regions, or something to do with tables. Instead, the backend just needs to make sure it adds that particular property to the node, and of course NVDA needs to know to use that property on the other end, when it's reading from the buffer.
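As a rough illustration of what arbitrary, namespaced properties could look like, here is a small Python sketch. Nothing about the format has been decided: the `;` separator between pairs, the `::` namespace separator, the function name and the example property names and values are all assumptions invented for this example.

```python
def parse_properties(spec, pair_sep=";", ns_sep="::"):
    """Parse a string of name=value pairs into a dictionary.

    Keys are (namespace, name) tuples. Properties that are not specific
    to any accessibility API get a namespace of None.
    """
    properties = {}
    for pair in spec.split(pair_sep):
        if not pair:
            continue  # tolerate a trailing separator
        name, _, value = pair.partition("=")
        namespace, found, local_name = name.partition(ns_sep)
        if not found:
            namespace, local_name = None, name
        properties[(namespace, local_name)] = value
    return properties


# Made-up example: two API-specific properties and one generic one.
spec = "msaa::role=30;msaa::states=1048576;shortcut=Alt+D"
properties = parse_properties(spec)
print(properties[("msaa", "role")])
print(properties[(None, "shortcut")])
```

The point is that the parser never needs to know what a role or a state is. A backend could add, say, a live region property tomorrow, and only the backend and NVDA would need to agree on its meaning.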

Although work is slow, I think we are certainly making progress, and at the very least I am certainly learning a lot. This is new ground, for us at least, and I think we'll get there.

A Server for building and testing NVDA

Over the last week or so, I have been busily setting up the new testing/development server that was kindly donated to NV Access. This server is going to be used as a server for the organization: hosting a virtual private network, allowing for the collaboration of business-related work and access to NV Access's printer/copier/scanner/fax (bought through an Australian government grant). But more importantly it will be used as a testing/development server. Once it's finally set up, this server will be able to automatically build daily snapshots of NVDA (if the source code changes), and also hopefully run some automated tests on the snapshots, to make sure that new changes don't break anything that previously worked.

Hardware-wise, the server has a Pentium 4, 3.20 GHz processor, and 1 gig of ram. Previously it did have 2 gigs, but the second chip seemed to be not very healthy, and after taxing the memory quite a bit the other day, we were getting all sorts of fun errors, so for now that chip has been removed. It has two hard drives, one 40 gig for the main Operating System and applications, and an 80 gig for files and Virtual machines for building/testing. It also has no shortage of USB sockets, in fact I already took out one card that had four on it. It has a floppy drive, sound card, and two network cards (one for access to the internet and my home network, and one purely for NV Access, to access the printer, and any other NV Access specific devices).

I have installed Ubuntu Linux 7.10 as the server's Operating System. We chose this OS because a) both Jamie and I are used to using Debian Linux (which Ubuntu is based on) and b) Ubuntu's accessibility seems to be growing all the time. As NV Access, this is something we'd like to keep an eye on.

For testing and building NVDA, we are going to run MS Windows inside VMWare. For those who don't know, VMWare is software that emulates an entire computer system, so you can run one operating system inside another. There is a free version of VMWare for Linux, which suits our needs. I have successfully installed it and it's running quite nicely on the server now.

Although the server is pretty much all set up, we're still a little way off from complete automated building/testing of NVDA. One thing that needs to be completed is the re-writing of the build scripts Jamie currently uses to build NVDA snapshots on his laptop. The plan is that we'd no longer like to keep compiled copies of eSpeak, charHook, keyHook and the virtual buffer library in subversion, but instead build them along with the snapshots, and have them all included in the snapshots, and also as a separate download so that people can still run from source. However the most important part holding us back is getting access to the MS Windows Operating System, so we can install it in the virtual machines on the server. I have tested virtual machines with some other free operating systems such as Ubuntu and Gentoo Linux so I know they work, but over the next little while NV Access would like to investigate how to acquire licensed copies of the needed Windows versions. We plan to build the snapshots in Windows XP, though we would much like to be able to test NVDA with Windows 2000, Windows Vista, possibly Windows 98/ME. Of course testing MS Office would be also very useful.

Many thanks go to the donor of the server. Already we are seeing just how useful it is, and we believe it really will change the way NVDA development happens in the future.

One other advantage of the server running Ubuntu is that I'm able to run Orca (a Gnome X-Windows screen reader for Linux / Solaris). The Orca and NVDA projects have many things in common (as they are both free and open-source screen readers), and even though they are written for two entirely different operating systems, there should be much the projects can learn from each other, both in coding and user experience. I recently tested Orca with Firefox 3, and it is very clear that Orca's web support is coming along in leaps and bounds.

Other than server stuff, I of course have been working on the new virtual buffer library for NVDA. Work is slow but I'm definitely getting there. Design decisions need to be made very carefully as we need to make sure the code is as efficient as possible, but also make sure it will be compatible with lots of different web content.

Virtual buffer Library code started

As the web access solution in NVDA will allow users to use both object navigation and a flat model approach, we have to start writing a replacement for NVDA's current virtual buffer code.

At the time of this blog entry, quite a bit of the code has already been written: namely the storage module, whose job is to allow the storage and retrieval of text and fields. Part of the client and management code has also already been written.

To access the source code, you can grab it from http://svn.nvda-project.org/nvda/virtualBufferLibrary/

As the virtual buffer library must be fast and light-weight, it is being written in C++. It also will be written so that parts of it (such as the rendering/updating of the buffer) will be executed in-process. This means that, for instance, when virtualizing some web content from Firefox, some of the code will be executed within Firefox itself.

The basic idea of a virtual buffer is that a screen reader wants to access a flat model of some content in a particular window. It is the job of the virtual buffer to render and update the flat model inside the window, and allow the screen reader to query for certain information about the flat model.

In order to facilitate the execution of code in-process, the library first needs to be able to inject itself into another process, and then be able to send information to and from the code that was injected.

The virtual buffer library itself only manages the window, and allows the reading from storage. However particular backend libraries must be also written which know how to work with particular object models in Mozilla Gecko etc, in order to render and update content, storing it in storage so that the virtual buffer can read and interact with it.

The library is split up in to a few distinct parts: dllMain, client, internal, wm, storage.

DllMain is very small, but its job is to initialize common variables etc, that must exist for any instance of the library, whether it be in or out of process. For now this is just a handle to the opened dll, and also it needs to initialize all the window message values.

Wm keeps and manages all the window messages that will be used to communicate information between processes. As the library will be injecting itself into processes and then intercepting window messages from any window the client tells it to, it is necessary that the library's own window messages be values that don't clash with any other messages already used for that particular window. The win32 API call RegisterWindowMessage is useful for this task as it can assign unique values to the library's window messages. However, this does mean that each time the dll is loaded into a process, all the window message values have to be initialized. RegisterWindowMessage (given the same string argument) will always give back the same message value, at least until the system is rebooted.

Wm also contains quite a few struct types which are used to hold arguments needed for a window message. When a message is sent with SendMessage, it is necessary to allocate either system memory, or virtual memory in the process the message is being sent to, use this memory to hold the struct, and then give SendMessage a pointer to this memory.

Client contains all the high-level functions that will be used by a screen reader to create and read from the virtual buffer. It can create a buffer, destroy a buffer, get text between two offsets, find out a field ID at a particular offset, get an XML representation of the text and fields between two offsets, find particular text, and even find a field given certain properties.

These client functions all pretty much just send a window message to the given window, passing a buffer handle, and perhaps a pointer to system allocated memory containing further arguments for the message.

Client also contains some functions which can prepare and unprepare a window. These functions are what manage the injection of code into a process, indirectly intercepting a chosen window. They do this by temporarily setting a window message filter hook on the thread that owns the window, then sending the window a message (to make sure the hook gets called at least once), and then unregistering the hook. Because the hook function the library uses is actually part of the library, Windows automatically loads the library into the process that owns the window. However, it is then up to this hook function to intercept the window, and make sure that Windows can't unload the library from the process when the hook is unregistered.

Internal contains the hook function, and a window procedure. The hook function's job is to pass on any window messages it receives, ignoring them, unless it's the window message that client sent in prepare window. If it is this message, the hook function intercepts the window that message was for by retrieving the window's current window procedure, saving it as a property on the window for later use, and setting the window's window procedure to the window procedure in the library. It finally finds out the file path of the library, and uses the win32 API call LoadLibrary to up the reference count of the library, so that Windows won't automatically unload it once the hook function is unregistered. This all means that from then on, any message for that window will travel through the library's own window procedure.

The hook function will also have to load a backend library, and instruct it to render the current content, and also register any events it might need so that it can continue to update the content.

The library's window procedure will pass on any message it receives to the old window procedure, unless it's one of the library's own window messages. If it is one of these messages, then the window procedure will perform the appropriate action and return the result, e.g. ask storage for the text between two offsets, and return the result.

However, if the message is for unpreparing the window, then the window procedure has to restore the window's old window procedure, and call FreeLibrary, allowing Windows to finally unload the library from this process. It would also have to do this if it detected that the window was going to be destroyed.
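The forwarding behaviour of the library's window procedure can be modelled without any Windows code at all. The following Python sketch is only a model: the real thing is a win32 window procedure in C++ that returns an integer result, and the function names, the message constant for the library and the idea of returning a string directly are all assumptions for illustration.

```python
# Hypothetical message number. On Windows, RegisterWindowMessage hands out
# these values at run time; this constant is just a stand-in.
WM_VBUF_GET_TEXT = 0xC001
WM_PAINT = 0x000F  # an ordinary Windows message the library does not handle


def make_library_wndproc(old_wndproc, handlers):
    """Build a window procedure that handles the library's own messages
    and forwards everything else to the window's original procedure."""
    def wndproc(hwnd, msg, wparam, lparam):
        handler = handlers.get(msg)
        if handler is None:
            # Not one of ours: pass it on untouched.
            return old_wndproc(hwnd, msg, wparam, lparam)
        return handler(hwnd, wparam, lparam)
    return wndproc


forwarded = []


def original_wndproc(hwnd, msg, wparam, lparam):
    """Stand-in for the application's own window procedure."""
    forwarded.append(msg)
    return 0


buffer_text = "Some rendered page text"

library_wndproc = make_library_wndproc(
    original_wndproc,
    {WM_VBUF_GET_TEXT: lambda hwnd, start, end: buffer_text[start:end]},
)

print(library_wndproc(1, WM_VBUF_GET_TEXT, 5, 13))  # answered by the library
print(library_wndproc(1, WM_PAINT, 0, 0))           # forwarded to the original
```

Unpreparing the window would then simply mean putting `original_wndproc` back in place, which is the step where the real library also calls FreeLibrary.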

Storage is the code that manages the actual text and fields. It stores fields as a tree of nodes (with next, previous, parent, firstChild and lastChild relationships). Each node has a given ID, and also contains properties such as a role, value, states etc. It also contains start and end offsets, which are used to work out what text goes with what field. The text is all stored in one large C++ string.

There is also a map of IDs to nodes, to make it easy to locate a node for any given ID.

There are functions to perform all the tasks that the client needs to perform (such as getting text between two offsets, finding text, finding fields, getting an XML representation for text and fields between two offsets).
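For readers who find code easier to follow than prose, here is a very small Python model of the storage idea. The real storage module is C++ and keeps next, previous, firstChild and lastChild links; this sketch uses a plain list of children instead, and all class, method and property names in it are assumptions for illustration only.

```python
class Node:
    """A field covering the text between start (inclusive) and end (exclusive)."""

    def __init__(self, node_id, start, end, **properties):
        self.id = node_id
        self.start = start
        self.end = end
        self.properties = properties
        self.parent = None
        self.children = []


class Storage:
    def __init__(self, text):
        self.text = text          # all text lives in one large string
        self.root = None
        self.nodes_by_id = {}     # map of IDs to nodes for quick lookup

    def add_node(self, node, parent=None):
        if parent is None:
            self.root = node
        else:
            node.parent = parent
            parent.children.append(node)
        self.nodes_by_id[node.id] = node
        return node

    def get_text(self, start, end):
        return self.text[start:end]

    def field_id_at(self, offset):
        """Return the ID of the deepest field covering offset, or None."""
        node = self.root
        if node is None or not node.start <= offset < node.end:
            return None
        while True:
            for child in node.children:
                if child.start <= offset < child.end:
                    node = child
                    break
            else:
                return node.id


storage = Storage("Home page Welcome")
document = storage.add_node(Node(1, 0, 17, role="document"))
storage.add_node(Node(2, 0, 9, role="link"), parent=document)
storage.add_node(Node(3, 10, 17, role="heading"), parent=document)

print(storage.get_text(10, 17))
print(storage.field_id_at(3))
print(storage.nodes_by_id[3].properties["role"])
```

Here the offsets alone are enough to work out which field a piece of text belongs to, and the ID map means a node can be found without walking the tree.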

No backends have been written yet. But their job will be to render and update the content, using storage to store the rendered content.

This description is a very rough idea of how the library will work. Please look at the actual source code for more detail.

There is still much code to write, and many things to decide upon.

Research with Voice Over, and design decisions, for Web Access Solution

After my last blog entry on the Web Access Solution, I received the Mac laptop NV Access had hired for a week, so I could test out the way Voice Over handled access to the web.

On the whole, I found that using Voice Over, I was able to perform any task I needed to, though the keyboard commands took me quite a while to get used to. Though it was a nice feeling to be able to unpack the laptop, turn it on, and press command f5 and have the system start talking, allowing me access to 99% of the operating system.

For navigation, Voice Over takes an approach which is sort of a mix between Gnopernicus/Virgo/NVDA (tree-based object navigation) and Jaws/Hal/Window Eyes (flat screen model). Voice Over allows you to navigate by object, though its tree-structure is very minimal. It's more as if the order of objects is governed by where they appear on the screen, rather than where they are logically positioned.

When I used Safari, I noticed that Voice Over does not use the virtual buffer flat-model approach to web content like many Windows screen readers, but just continues to allow the user to use its operating system wide object navigation. Once you type in the URL, and then locate the html content, you can either tell Voice Over to read all the objects inside the html content object, or you can enter the html content object and then navigate around the objects within.

It was nice to be able to quickly get an idea of the structure of the page using object navigation, though I did feel yet again that the tree-structure was quite minimalistic. I also found it a little hard to review bits of information on a page, as you could only really move between paragraphs and other elements, rather than also being able to move easily between lines and characters etc. There is a mode you can switch into to review an object by character, though it is quite fiddly to do.

Having had a play with Voice Over on the web, we have decided that there are advantages and disadvantages to both object navigation and a flat model approach. We now plan to make sure that NVDA's web access solution uses not one or the other, but both in parallel.

The idea will be that when you go to a web page, the content will be loaded into a flat representation, but also you will be able to use NVDA's object navigation at the same time. In fact, each time you move within the flat model, your position in object navigation will be updated. And the same goes for moving with object navigation: your position in the flat model will be updated.

This means that users can use whatever approach they like to read the page. Some highly structured information might be best navigated by object, but some textual information might be best read in a flat model.

There will most probably also be a setting in NVDA to say whether you in fact want the flat model at all. Some users may only want to use object navigation, and in that case, they shouldn't have to be affected by the rendering of a flat model they never intend to use.

I must admit I was a little surprised at Voice Over's object navigation. Users of Voice Over have been singing its praises for quite a while, saying that Voice Over takes a very different approach to web access. Although I personally totally agree that object navigation is very useful, in truth object navigation has been around in Gnopernicus and Virgo4 for many years. Plus, NVDA has had the ability to navigate a web page (at least in Firefox) by object navigation for over a year, though it seems to me that many users of Windows don't find this useful.

So, hopefully with NVDA having both, users can choose which way is best for them.

First Work on Web Access Grant

NV Access (the supporting organisation of NVDA) has just received a grant from the Mozilla Foundation. This grant enables us to implement a web access solution to allow NVDA to work with web content in Mozilla Gecko windows (in such programs as Mozilla Firefox and Mozilla Thunderbird). NVDA does have support for Gecko already, but there are many problems with the current solution, such as slowness in loading pages, errors in rendering, and trouble keeping pages containing JavaScript up to date.

Over the last two weeks, we have finally started on the research for the web access solution. Jamie and I started with a phone call where we talked about where exactly the project should go. We planned out a few very basic options, such as simply fixing up the current virtual buffers by making them faster and more accurate, or going for a completely different approach.

The different approach could be one where the web content is navigated using an object-oriented idea. This would mean moving among objects on a particular level, and then moving into an object and navigating the next level down. When navigating to an object, NVDA would speak the object plus any objects inside. This sort of means that rather than rendering the entire document, only the specific object being navigated to would be rendered. Though this fixes a few problems with the current implementation, it does also have its drawbacks in that it may take extra time to navigate from object to object.

Having a good understanding of how other screen readers handle web content is quite important when developing our own solution, so I spent some time looking at various screen readers (namely Jaws, Window Eyes, System Access, Orca and Hal). I looked at how they show their web content and how they let users interact with form fields. I tested with Firefox when possible, and with Internet Explorer when not.

My findings showed me that some screen readers use two modes, one for arrowing around a flat view of the page, and one where arrowing goes straight through to the application, enabling the user to interact with forms. However, some other screen readers seem to integrate these two modes into one, where arrowing on to a field then automatically allows further arrow presses to go to the form field if it supports them, and tabbing away from the field is the means of getting out. The advantage of this method is that the user does not have to worry about two modes, getting rid of the need to remember just how to toggle the modes. The disadvantage is that there is no overall logic that can be easily applied to the arrow keys when arrowing around a web page. Sometimes the arrows move you around the page, sometimes they move within a form field.

Another variation I found was the way that semantic info such as link, heading, list etc was presented to the user. Some info such as lists in some screen readers appeared as physical text that you had to arrow around, some info in some screen readers was spoken but never shown at all, and some info was shown in the buffer, but only took up the space of one character. Also, whether a field type was spoken before or after the actual content varied depending on the field type and the screen reader.

Although this quick look at screen readers hasn't completely set our minds to what is best for NVDA as far as navigation logic and speaking order goes, it has helped us formulate more of an idea as to what questions we will soon be asking screen reader users, as to how NVDA should look and feel when it comes to web access.

One screen reader we haven't yet been able to test is Voice Over (the built-in screen reader in Mac OS X 10.4 and above). Many users of this screen reader report great things about how it takes a more object-oriented approach and gives a very usable access solution to users viewing web pages in Safari (the default Mac web browser).

Before we can make a better decision as to how NVDA's web access solution should allow users to navigate, we do need to try out Voice Over properly, so NV Access has hired a Mac PowerBook for the next week, enabling me to test Voice Over with Safari and other applications, to get a good feel for how things are done the Mac way.

Probably the most important piece of work that I have done so far is extending a small C++ library I wrote called Gecko Walker. Now known as MSAA Walker, this software is used to traverse a tree of MSAA objects, logging their name, role, and value to a file. I originally wrote this library before starting the grant to time how long it took to completely traverse an MSAA tree produced from a Mozilla Gecko window with a particular web page loaded.

The code in its original state worked out of process, meaning that it executed from where the user started it. All MSAA objects being retrieved had to be pulled across process boundaries. This is the same way that NVDA currently works with MSAA objects, and probably is also the easiest way to code. I first timed a particular web page just using NVDA itself to render it, and it took a total of 36 seconds. This was a very large page, being a quite verbose article from Wikipedia. I then timed how long it took to traverse the same page with MSAA Walker. This only took a total of 12 seconds.

These findings so far show that there is at least a 3-fold speed-up when traversing a document using pure C++ as opposed to a higher level language such as Python. This is not to say that Python is bad, it's just probably that in pure C++ there is much less baggage to be carried around when performing a repetitive task on many objects.

Since starting on the grant I have extended MSAA Walker so that it can also execute in-process. This means that it can inject itself into the process containing the MSAA objects, and run inside. When retrieving MSAA objects, they no longer have to cross process boundaries, in theory speeding up the traversal.

Once I made the changes, I again ran MSAA Walker on the particular Wikipedia article, and this time instead of running in 12 seconds, or 36 seconds, it ran in a total of 0.8 seconds! When I extended the library I also allowed it to count all the MSAA objects it logged so I could make sure that the same number of objects were being traversed. And sure enough, with both out of process and in-process, 5403 DOM nodes were counted.
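To put the three timings side by side, here is a tiny Python calculation. It uses only the figures quoted above (36, 12 and 0.8 seconds, and 5403 nodes). The labels and variable names are made up for the example, and the per-node figure is simply the total time divided by the node count, not something that was measured separately.

```python
# Timings in seconds for traversing the same Wikipedia article.
timings = {
    "NVDA (Python, out of process)": 36.0,
    "MSAA Walker (C++, out of process)": 12.0,
    "MSAA Walker (C++, in-process)": 0.8,
}
node_count = 5403


def speedup(before, after):
    """How many times faster `after` is compared to `before`."""
    return before / after


python_to_cpp = speedup(timings["NVDA (Python, out of process)"],
                        timings["MSAA Walker (C++, out of process)"])
out_to_in = speedup(timings["MSAA Walker (C++, out of process)"],
                    timings["MSAA Walker (C++, in-process)"])
overall = speedup(timings["NVDA (Python, out of process)"],
                  timings["MSAA Walker (C++, in-process)"])

print(f"Python to C++, both out of process: {python_to_cpp:.0f}x")
print(f"C++ out of process to in-process:   {out_to_in:.0f}x")
print(f"Overall:                            {overall:.0f}x")

for label, seconds in timings.items():
    per_node_ms = seconds / node_count * 1000
    print(f"{label}: {per_node_ms:.2f} ms per node")
```

So moving to C++ gives roughly a 3-fold gain, moving in-process gives roughly a further 15-fold gain, and the two together come to about 45 times faster than the original NVDA rendering.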

Jamie and I had heard from people that in-process would give us a speed-up, but like any information, it is important to test for yourself, especially when a lot of information to do with in-process execution comes from programmers of commercial screen readers, where the code is not available for testing. We were very surprised by the size of the speed-up (12 seconds down to 0.8 seconds is roughly 15 times faster), and after testing a few other large pages, which also yielded times similar to 0.8 seconds, we became quite a bit more certain about sticking with a virtual buffer approach. However, we are not yet at all ready to completely drop non-virtual-buffer ideas, or to think about coding, as we would still like to fully test Voice Over's web content support, and also get some answers from users as to how they like things presented.

Over the next week I plan to test Voice Over, and also have further talks with Jamie about the questions we'll be asking, and perhaps look more closely at useful methods of sharing a large buffer of information between processes. I have already been researching memory mapped files, and other means of sharing memory.