I decided that if I'm going to be developing for Unix, I should update my Ubuntu distro from 9.04 to 10.10 (it is now three versions, or a year and a half, old). Our internet is inordinately slow here, so it takes a couple of hours to download the updates (plural, because Ubuntu does not allow you to upgrade straight from 9.04 to 10.10; you have to do it in three steps). An intelligent updater would put an ordering on the files, download them in the order it needed them, and install as it downloads. Then by the time the download completed, the installation would be complete too.
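The download-while-installing idea amounts to a producer/consumer pipeline. Here is a minimal sketch, where `download` and `install` are hypothetical callables standing in for the real package operations, and the package list is assumed to already be in dependency order:

```python
import queue
import threading

def pipelined_update(packages, download, install):
    """Download packages in dependency order and install each one
    as soon as it arrives, instead of waiting for all downloads."""
    q = queue.Queue()

    def downloader():
        for pkg in packages:   # assumed already topologically sorted
            q.put(download(pkg))
        q.put(None)            # sentinel: no more packages coming

    t = threading.Thread(target=downloader)
    t.start()
    installed = []
    while (item := q.get()) is not None:
        installed.append(install(item))
    t.join()
    return installed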
In addition, the installer asked me if I wanted to overwrite a file called "menu.lst". Little did I know that it was the GRUB bootloader menu, and that by overwriting it I would lose the reference to my Windows installation. If the file had had a sensible name, or if the installer had mentioned that it was the GRUB bootloader configuration, I might have noticed. But instead I accidentally told GRUB to forget about Windows. It is a very good thing that I have smart neighbors.
Usability Development
Sunday, January 30, 2011
The Insane Problem
I am an undergraduate at a private technical school, majoring (in part) in computer science. When I started my undergraduate career, software engineering was among my list of majors; however, that changed after I took a few software engineering courses. Software engineering is too soft for my more scientific tastes. Even though I myself do not find software engineering (especially for a corporation) to be rewarding, I recognize that software engineering is important and not particularly easy.
When I first think of software engineering, I imagine companies and clients. I imagine software developers in cubes, exasperated over irrational clients who have no concept of what computers can and cannot do, and who have no idea what they actually want. I imagine lousy software products written in outdated languages, completely unusable (despite usability testing), and delivered with cost and schedule overruns. This is why I am not a software engineer.
However, at heart, I believe I am a software engineer of a different kind. I like the idea of open software; the problem is that it is almost universally unusable. I believe the world could be much more productive if there were free, usable software. Though it may shock other computer science students, I use Windows (XP and 7) almost exclusively. I do not use Macs, and I most certainly do not use Unix.
Unix has essentially unlimited untapped power; the problem is that that power seems destined to remain forever locked away. I have tried to use Unix, and I find it infuriatingly frustrating to accomplish even the simplest task. It always involves several Google searches and typing arcane commands that seem to have no semantic meaning. If I want to do something that I can do in Windows (say, play a game, use my favorite program, or watch a movie), it always involves more work than I am willing to put in. It is harder than it is in Windows, so why should I use Unix?
This is a troubling view from someone such as myself who does not give up easily. What it means is that the general public will never end up using Unix. If I can't learn it in a few minutes, most people would never even consider trying. I realize that not being able to use my favorite software (M$ products, such as Visual Studio and Office) is no fault of Unix or Unix developers, but the lack of an alternative is a problem.
This post isn't actually about Unix, however. It is about great ideas. Before I get to the real point, I'll give an introductory example. A few years ago I was trying to figure out how to synchronize files between my desktop and laptop. It seemed like a relatively simple and common problem: lots of people have two computers, and doesn't everyone want to have all their data everywhere (securely) if possible? The need seemed obvious to me.
So I began scouring the internet for solutions. I only considered freeware, because while shareware may be a great product, it can never revolutionize computing. It will never be widely adopted as long as free (albeit terrible) products exist. (I am ignoring the glaring counterexamples of M$ Windows and Office, which as I explained above have no alternative in my opinion.) Even if other people do adopt shareware solutions to their problems, that is not how I will solve my problems. I'm an engineer; I solve my own problems.
So more about the backup example. The only freeware solutions that I could find were either completely unusable or inflexible and slow. The power of Unix (and open source in general) is flexibility. The backup programs invariably required scanning the non-server computer file by file for changes, and then syncing. Today's hard drives hold terabytes of data, and it is completely unreasonable and impractical to scan the entire drive for changes. I envision a program that scans the hard drive once. After specifying which directories should be synchronized, the program would keep a list of files in those directories that have changed since the last synchronization. Then when the computers are next able to synchronize, there is no need to search the entire hard drive doing file-by-file diffs. There is a short list of files that have changed and need to be transferred.
As a computer science major I am sort of obsessed with optimizations like this. It doesn't make sense for a program to search the entire hard drive (which frequently took more than half an hour) when it has almost all of the information it needs to start transferring files as soon as it detects the other computer's presence. If I start composing a document on one computer and then want to work on the other computer instead, I do not want to wait 45 minutes to synchronize a 10KB text file. That is outrageous. These are the kinds of problems that I want to solve.
Although the backup example is outrageous, a far more egregious failure has occurred with a new technology I am starting to learn: LaTeX. If you haven't heard of it, LaTeX is a document preparation system built on top of TeX, the typesetting system envisioned, designed, and implemented by Donald Knuth (a veritable demigod of computer science majors everywhere). Knuth set out with the monstrous task of comprehensively summarizing (yes, that's what he thought too, I'm sure) all of computer science. By the time he was done with the third book, he got so frustrated with the existing typesetting technology that he recursively began an equally incomprehensibly large task: developing his own language, TeX, so that he would have a way to write mathematical formulas in his books. It took him over a decade to develop TeX, and academia today owes him more than could ever be repaid. It is almost painful to imagine the state of digital mathematics without TeX. Simply put, it couldn't exist.
After devoting those years of his life to TeX, Knuth went back to The Art of Computer Programming. However, 22 years after TeX was completed, editor support remains abysmal. It is a disgrace to the unimaginable amount of work that went into TeX. Given its widespread use, there should be myriad usable LaTeX editors, but I have not found a single one. The only current solutions require you to type code into a typical text editor, compile the document (at least a 10-15 second process for a small document), and then open a PDF to examine your work. The only reasonable solution I can see is that LaTeX editing should be live. Much like the idea behind the backup utility, I believe that the compiler should compute only differential changes, rather than being forced to recompile the entire document. Other languages' compilers and IDEs (Visual Studio for C, for example) support such differential compilation. It is shocking to me that such a thing does not exist for TeX.
I want to honor Knuth's accomplishment, while at the same time making LaTeX more accessible to everyone, by developing an open-source live editor for LaTeX (there is a closed-source solution. ONE).
LaTeX compilers work by reading the file several times and incrementally building an index that is ultimately converted to PDF output. This is a classic ordering problem for language theorists who care about such things. The basic idea is that if code that changes the entire document's margins can appear as the last line of the file, I can't lay out anything else until I have that line, because I have to know what the page margins are before I can figure out how many words go on one line. The same problem occurs in languages like Java that do not require a special ordering on classes and functions. Compilers must read the file once to obtain the function headers, then go back and process the bodies given the header information. C solves this problem a different way, by requiring that function dependencies are linear or that function headers are declared at the beginning of the file (either explicitly or by including a header file). Thus this is not a new problem. I believe I can achieve the same results as C (reading the file only once) but with the flexibility of Java (the ability to write in any order, not restricted to a total ordering of commands). LaTeX commands should fall into a few categories, so if I divide up lines of code based on their type and compile those separate files individually, I can store the index and only update what changes.
For example, if the first line of the file sets the page size (A4 vs letter, etc) and the last line of the file sets the margins, these two lines would be identified in the editor as "page layout" meta commands that would be stored (via XML or some other mechanism) into a virtual file that could be compiled first. If a change was made that added body text, the editor would realize this and would not update the virtual file containing meta commands. Then the meta commands would not need to be recompiled (eliminating at least one iteration of the LaTeX compiler). The compiler would then be called only for the next iteration, and the results would be integrated into the existing meta index.
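The classification idea above can be sketched as a simple partition into virtual files, followed by a diff that tells the editor which categories actually need recompiling. The command list here is illustrative only, not a real taxonomy of LaTeX:

```python
# Commands treated as "page layout" metadata in this sketch
# (an assumed, far-from-exhaustive list).
META_COMMANDS = ("\\documentclass", "\\usepackage", "\\geometry",
                 "\\pagestyle", "\\setlength")

def classify(line):
    """Label a source line as layout metadata or document body."""
    return "meta" if line.lstrip().startswith(META_COMMANDS) else "body"

def split_source(lines):
    """Partition the source into per-category virtual files."""
    buckets = {"meta": [], "body": []}
    for line in lines:
        buckets[classify(line)].append(line)
    return buckets

def changed_buckets(old, new):
    """Return which virtual files differ and so need recompiling."""
    return {name for name in old if old[name] != new[name]}
```

If an edit touches only body text, `changed_buckets` reports only `body`, and the meta pass (and its compiler iteration) can be skipped entirely.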
It seems straightforward enough to me, however there are some serious developmental roadblocks. First, I barely know how to write LaTeX, much less the language semantics that I would need to start classifying LaTeX code into disjoint categories. I do not know XML, and I am not familiar with how it is used to hierarchically store text. Such a program would necessarily have a UI (otherwise it isn't usable!) and I have no experience writing UIs for Windows or Unix. The program needs to be platform independent (otherwise, what is the point?) but I have no experience developing for Unix.
This task would almost certainly require a team. But that in itself is problematic. I have no idea how one goes about starting an open-source collaboration such as this. In short, this is a truly monumental undertaking for one person. I suspect this may be partially to blame for the dearth of good programs. Well-intentioned people such as myself are overwhelmed by the immensity of such a task. The amount of knowledge I need compared to what I have is staggering. However, it would be an insult to Knuth not to start on such a noble task. Knuth did exactly what I intend to do. Twice. He knew little about typesetting when he set out; only that the existing solutions were not adequate for his needs. This is exactly the situation I find myself in now. So like him, I must endure. The journey of a thousand miles starts with the first step, so now I will outline how I intend to start.
I need Unix development experience, open-source collaboration experience, and LaTeX experience, so I think a good starting point would be to become a contributor to the MiKTeX project, an open-source TeX/LaTeX distribution and editor (I am currently using MiKTeX; it is quite the opposite of what I am envisioning).
I will probably have more time this summer, but I believe I can actually make a difference in this area, and I intend to.