Finding old versions of web pages could become far simpler thanks to a “time-travelling” web browsing technology, New Scientist’s Paul Marks writes.
Bookmarking a page takes you to its current version – but earlier ones are harder to find. One option is to visit a resource like the Internet Archive’s Wayback Machine.
There, you key in the URL of the site you want and are confronted with a matrix of years and dates for old pages that have been cached. Or, if you want to check how a Wikipedia page has evolved, you can hit the “history” tab on a page of interest and scroll through in an attempt to find the version of the page on the day you’re interested in.
It’s a lot of hassle. But it shouldn’t be, says Herbert Van de Sompel, a computer scientist at Los Alamos.
“Today we treat the web like a library in which you have to know how to go and search for things. We’ve a better way.”
That “better way” is a system that gives browsers a “time-travel” mode, allowing users to find web pages from particular dates and times without having to navigate through archives.
Called Memento, the system Van de Sompel is developing alongside colleagues from Old Dominion University in Norfolk, Virginia, harnesses a function of the hypertext transfer protocol (HTTP) – the protocol that underpins the world wide web by defining how browsers request web pages and how servers deliver them.
One of HTTP’s standard functions is called content negotiation.
This allows one URL to deliver different versions of the same content, depending on the settings of the browser that contacts the URL: for instance, a browser set up for French may retrieve an HTML page in French, while a browser set up for US English accessing the same URL may receive an English version.
“Your browser does this negotiation all the time, but you don’t notice it,” says Van de Sompel.
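To make the idea concrete, here is a minimal sketch of language negotiation as a server might perform it. It assumes the browser states its preferences in the standard Accept-Language request header; the function names and the tiny PAGES table are hypothetical, invented for illustration, and real servers handle more edge cases.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not wanted"
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal q-values keep the order the browser sent.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_variant(header: str, variants: dict[str, str], default: str) -> str:
    """Return the stored variant that best matches the browser's preferences."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in variants:
            return variants[tag]
        primary = tag.split("-")[0]  # "fr-fr" falls back to "fr"
        if primary in variants:
            return variants[primary]
    return variants[default]


# Hypothetical variants of one page, all served from the same URL.
PAGES = {"en": "<p>Hello</p>", "fr": "<p>Bonjour</p>"}

print(choose_variant("fr-FR,fr;q=0.9,en;q=0.5", PAGES, default="en"))
```

The same URL yields the French page for a French-configured browser and the English page otherwise, which is the behaviour the user never notices.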
But HTTP content negotiation is not limited to arbitrating between media formats and languages – it can cope with any data type. So the team are adding another dimension to page requests: date and time.
“In addition to language and media type, we negotiate in time. So Memento asks the server not for today’s version of this page, but how it looked one year ago, for instance,” says Van de Sompel.
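On the browser side, negotiating in time amounts to adding one more preference to the request. The sketch below assumes the header is called Accept-Datetime, the name used in the Memento specification published later as RFC 7089; the article itself does not name a header. The function name and example URL are hypothetical, and the code only builds the request without sending it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from urllib.request import Request


def build_timed_request(url: str, when: datetime) -> Request:
    """Return a request asking for `url` as it looked at `when`."""
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # HTTP dates are always expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    # Assumption: the server looks for an "Accept-Datetime" header.
    return Request(url, headers={"Accept-Datetime": stamp})


# Hypothetical usage: ask for a page as it looked one year before a given moment.
now = datetime(2009, 11, 15, 12, 0, tzinfo=timezone.utc)
request = build_timed_request("http://example.com/", now - timedelta(days=365))
# urllib stores header names capitalised as "Accept-datetime";
# HTTP header names are case-insensitive, so this is harmless.
print(request.get_header("Accept-datetime"))
```

A server that does not understand the header would simply ignore it and return today's version, so the request degrades gracefully.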
Browsing the past
Memento comprises both server and browser software. On a server running the open-source Apache web system, just four lines of extra code are needed to build in date-and-time negotiation. On the browser, a drop-down menu will let users enter the date and time for which they want to view a page.
So far, the team has developed a Memento plug-in for the open-source Firefox browser, plus a “hacked” version of Firefox with built-in Memento capability.
Web pages need no extra features: the web server just needs to intercept the date-time requests of users. A demonstration of what Memento can do is available for any browser.
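What "intercepting" a date-time request could look like on the server is sketched below. This is not the team's Apache code, which the article does not show. It assumes the server keeps time-stamped snapshots of a page, that the requested time arrives as an HTTP date string, and that the right answer is the newest snapshot at or before that time; the function name and sample data are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def select_snapshot(snapshots, accept_datetime):
    """Pick a (timestamp, body) pair from `snapshots`, sorted oldest first.

    Returns the newest snapshot at or before the requested time. With no
    (or an unreadable) requested time, returns the current version.
    """
    if not snapshots:
        raise LookupError("no stored versions of this page")
    if not accept_datetime:
        return snapshots[-1]
    try:
        wanted = parsedate_to_datetime(accept_datetime)
    except (TypeError, ValueError):
        # Assumption: fall back to the current page on a malformed date.
        return snapshots[-1]
    if wanted.tzinfo is None:
        wanted = wanted.replace(tzinfo=timezone.utc)
    times = [stamp for stamp, _ in snapshots]
    index = bisect_right(times, wanted)
    # A date older than every snapshot gets the earliest one we have.
    return snapshots[max(index - 1, 0)]


# Hypothetical archive of one page.
ARCHIVE = [
    (datetime(2008, 1, 1, tzinfo=timezone.utc), "version from 2008"),
    (datetime(2009, 1, 1, tzinfo=timezone.utc), "version from 2009"),
    (datetime(2010, 1, 1, tzinfo=timezone.utc), "version from 2010"),
]

stamp, body = select_snapshot(ARCHIVE, "Sun, 15 Nov 2009 12:00:00 GMT")
print(body)
```

Choosing the snapshot at or before the requested time is one possible policy; a server could equally return the snapshot closest in time in either direction.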
Of course, the whole idea requires website owners to store many more time-stamped versions of their pages than they do now, but the team think Memento will encourage them to do this.
“I would love to see Memento supported,” says Van de Sompel. “It would be such fun to set our browsers back in time and just browse the past.”
Jakob Voss, a developer with the Common Library Network in Göttingen, Germany, is an early Memento user – and he is already advocating use of Memento for sites with frequently updated pages like Wikipedia.
“Memento is only a proof of concept but it looks very promising and could be a great enhancement to the web. There is little support in today’s browsers for digging into archives, especially those with dynamic content management systems like wikis and weblogs,” Voss says.
“Tracking versions, and the provenance of web information, is becoming more and more important and Memento could help manage this complex task.”
He’s not alone in that view. Ian Jacobs, a spokesman for the World Wide Web Consortium in Boston, Massachusetts, agrees that “URL persistence” is a valuable aim – and that users should be able to browse the latest version of a page or one on a given date.
“The browser should allow the user to choose,” says Jacobs.
Van de Sompel is presenting the Memento technology today at a meeting of the National Digital Information Infrastructure and Preservation Program at the Library of Congress in Washington DC.