Making Web applications more efficient

By Larry Hardesty, MIT News Office | 01 Sep 2012

Most major websites these days maintain huge databases. Shopping sites have databases of inventory and customer ratings, travel sites have databases of seat availability on flights, and social-networking sites have databases of photos and comments. Almost any transaction on any of these sites requires multiple database queries, which can slow response time.

This week, at the 38th International Conference on Very Large Data Bases, the premier database conference, researchers from MIT's Computer Science and Artificial Intelligence Laboratory presented a new system that automatically streamlines websites' database access patterns, making the sites up to three times as fast. And where other systems that promise similar speedups require the mastery of special-purpose programming languages, the MIT system, called Pyxis, works with the types of languages already favored by Web developers.

Alvin Cheung, a graduate student in the department of electrical engineering and computer science (EECS), is first author on the paper. He's joined by his advisor, EECS professor Sam Madden, and by Owen Arden and Andrew Myers of Cornell University's department of computer science.

A Web-services transaction typically involves both data retrieval (say, the flights on a given route with available seats) and computation (say, whether the difference in flight times would allow the traveler to make a connection). Typically, data is stored on one server, and the computation, or "application logic," is executed on another. The application server and the database might have to exchange information multiple times only to conclude that, no, a given itinerary won't work.

But if a few frequently used chunks of application logic could run on the database server instead, it would save time, by limiting the number of cross-server transactions, and bandwidth, since the sole remaining transaction could be sending the single bit of information "no."
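To make the round-trip argument concrete, here is a toy Python sketch. It is not Pyxis and not the researchers' code; the flight data, the `DatabaseServer` class, and every function name are hypothetical, and the "network" is just a counter. It only illustrates why running the connection check next to the data cuts the number of cross-server exchanges.

```python
from dataclasses import dataclass

# Hypothetical flight table, invented for illustration:
# (origin, destination, depart_hour, arrive_hour, seats_left)
FLIGHTS = [
    ("BOS", "ORD", 8, 10, 5),
    ("BOS", "ORD", 12, 14, 0),
    ("ORD", "SFO", 11, 15, 2),
    ("ORD", "SFO", 10, 14, 3),
]


@dataclass
class DatabaseServer:
    """Toy stand-in for a remote database that counts round trips."""
    round_trips: int = 0

    def _available(self, origin, dest):
        # Local lookup on the database side; no network cost is modeled.
        return [f for f in FLIGHTS
                if f[0] == origin and f[1] == dest and f[4] > 0]

    def query_available(self, origin, dest):
        # Each call models one exchange with the application server.
        self.round_trips += 1
        return self._available(origin, dest)

    def can_connect(self, origin, via, dest, min_layover):
        # Application logic placed on the database server: one exchange,
        # and only a single yes/no answer travels back.
        self.round_trips += 1
        return any(second[2] - first[3] >= min_layover
                   for first in self._available(origin, via)
                   for second in self._available(via, dest))


def can_connect_app_side(db, origin, via, dest, min_layover):
    # Application logic on the application server: one query per leg,
    # with full result sets shipped across the network each time.
    first_legs = db.query_available(origin, via)
    second_legs = db.query_available(via, dest)
    return any(second[2] - first[3] >= min_layover
               for first in first_legs
               for second in second_legs)


app_side_db = DatabaseServer()
app_side_answer = can_connect_app_side(app_side_db, "BOS", "ORD", "SFO", 2)

db_side_db = DatabaseServer()
db_side_answer = db_side_db.can_connect("BOS", "ORD", "SFO", 2)

print(app_side_answer, app_side_db.round_trips)
print(db_side_answer, db_side_db.round_trips)
```

Both versions reach the same answer for this itinerary, but the application-side version needs two exchanges and ships whole result sets, while the database-side version needs one exchange and returns a single yes-or-no value. In a real transaction with many dependent queries, the gap would be correspondingly larger.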

Application logic and database queries, however, are generally written in very different languages, which are optimized to handle different types of operations, so moving code to the database can require not only rewriting it, but also rethinking the way it's implemented. And it's difficult to split a program in two without introducing bugs, for instance by losing track of which server needs to modify which variable at which point.