New research could enable computer programming based on screen shots, not just code.
By Larry Hardesty, MIT News Office
Until the 1980s, using a computer program meant memorizing a lot of commands and typing them in a line at a time, only to get lines of text back.
The graphical user interface, or GUI, changed that. By representing programs, program functions, and data as two-dimensional images such as icons, buttons, and windows, the GUI made intuitive and spatial what had been memory-intensive and laborious.
Graphic: Christine Daniloff
But while the GUI made things easier for computer users, it didn't make them any easier for computer programmers. Underlying GUI components is a lot of computer code, and usually, building or customizing a program, or getting different programs to work together, still means manipulating that code. Researchers in MIT's Computer Science and Artificial Intelligence Lab hope to change that, with a system that allows people to write programs using screen shots of GUIs. Ultimately, the system could allow casual computer users to create their own programs without having to master a programming language.
The system, designed by associate professor Rob Miller, grad student Tsung-Hsiang Chang, and the University of Maryland's Tom Yeh, is called Sikuli, which means "God's eye" in the language of Mexico's Huichol Indians. In a paper that won the best-student-paper award at the Association for Computing Machinery's User Interface Software and Technology conference last year, the researchers showed how Sikuli could aid in the construction of "scripts," short programs that combine or extend the functionality of other programs.
Using the system requires some familiarity with the common scripting language Python. But it requires no knowledge of the code underlying the programs whose functionality is being combined or extended. When the programmer wants to invoke the functionality of one of those programs, she simply draws a box around the associated GUI, clicks the mouse to capture a screen shot, and inserts the screen shot directly into a line of Python code.
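One way to picture what a screen shot inside a line of code means: the script treats the captured image as a pattern and searches the current screen for it. The sketch below is not Sikuli's implementation. It is a minimal exact-match search over small grids of pixel values, and the names `find_on_screen`, `screen`, and `button` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of pixel values (toy stand-in for a bitmap)


def find_on_screen(screen: Image, pattern: Image) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the top-left corner of the first exact match, or None."""
    ph, pw = len(pattern), len(pattern[0])
    sh, sw = len(screen), len(screen[0])
    # Slide the pattern over every position where it fits entirely on screen.
    for r in range(sh - ph + 1):
        for c in range(sw - pw + 1):
            # Compare the pattern row by row against the screen slice.
            if all(screen[r + i][c:c + pw] == pattern[i] for i in range(ph)):
                return (r, c)
    return None


# Hypothetical 4x5 "screen" containing a 2x2 "button" image.
screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 8, 0],
    [0, 0, 9, 7, 0],
    [0, 0, 0, 0, 0],
]
button = [
    [7, 8],
    [9, 7],
]

print(find_on_screen(screen, button))  # (1, 2)
```

A real system would have to tolerate small visual differences rather than demand an exact pixel match; the exact comparison here only keeps the idea easy to follow.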
Suppose, for instance, that a Python programmer wants to write a script that automatically sends a message to her cell phone when the bus she takes to work rounds a particular corner. If the transportation authority maintains a web site that depicts the bus's progress as a moving pin on a Google map, the programmer can specify that the message should be sent when the pin enters a particular map region. Instead of using arcane terminology to describe the pin, or specifying the geographical coordinates of the map region's boundaries, the programmer can simply plug screen shots into the script: when this (the pin) gets here (the corner), send me a text.
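The bus example reduces to a simple loop: look at each fresh view of the map, check whether the pin lies inside the chosen region, and send the message once it does. The sketch below assumes the pin's position has already been located in each frame. `Region`, `watch_for_arrival`, and the `notify` callback are hypothetical names, and no real messaging service or Sikuli call is involved.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[int, int]  # (x, y) in screen pixels


@dataclass(frozen=True)
class Region:
    """A rectangular area of the screen, such as the street corner on the map."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, p: Point) -> bool:
        x, y = p
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def watch_for_arrival(pin_positions: Iterable[Optional[Point]],
                      corner: Region,
                      notify: Callable[[str], None]) -> bool:
    """Call notify once, the first time the pin appears inside corner."""
    for pos in pin_positions:
        # None means the pin was not visible in that frame.
        if pos is not None and corner.contains(pos):
            notify("Bus is at the corner")
            return True
    return False


# Hypothetical data: successive pin positions, one per map refresh.
sent = []
corner = Region(left=100, top=50, width=20, height=20)
frames = [(10, 60), None, (90, 60), (105, 55), (130, 55)]

arrived = watch_for_arrival(frames, corner, sent.append)
print(arrived, sent)  # True ['Bus is at the corner']
```

Passing the notifier in as a callback keeps the "when this gets here" test separate from the "send me a text" action, so either half could be swapped out without touching the other.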