Xerox PARC Research Projects

"It Grows on Its Own Like an Ecosystem"

The Internet Ecologies Area at Xerox's Palo Alto Research Center (PARC) is using multiple snapshots from the Internet Archive, stored on disk as "the Web in a box," as a kind of test tube for understanding the Web. "We see the Web as an 'information ecology,' where we study the relationships between people and information," says PARC researcher Jim Pitkow.

PARC "benefited greatly" from access to the Archive's crawls, says Pitkow's colleague and Stanford physics professor Bernardo Huberman. According to Pitkow, access to the snapshots "is great for researchers because it lets them fuse traditional tools and techniques with new tools that haven't existed before."

Huberman describes a PARC study that produced a mathematical "law of surfing," which says that Web traffic follows predictable, regular patterns. For example, in a manifestation of the "winner take all" principle, it turns out that just a few Web sites get most of the traffic. The researchers were also able to show how deeply people delve into a typical Web site: on average, it's about a page and a half. Huberman has also studied Internet congestion as a social dilemma, where people weigh the costs and benefits of putting up with slow traffic versus waiting until the network is less crowded.
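The "law of surfing" comes from a 1998 paper in Science by Huberman, Pitkow and colleagues. It models the number of pages a user visits within a site (the depth of a visit) with an inverse Gaussian distribution. The Python sketch below evaluates that density. It is an illustration only. The mean `mu` and shape `lam` values are hypothetical parameters chosen for the example, not figures fitted by the PARC researchers, and the helper names are made up for this sketch.

```python
import math


def surfing_depth_pdf(depth: float, mu: float, lam: float) -> float:
    """Inverse Gaussian density for the number of pages visited in one site.

    mu  -- mean depth (pages per visit)
    lam -- shape parameter; larger values concentrate visits near the mean
    """
    if depth <= 0:
        return 0.0  # a visit cannot have zero or negative depth
    coeff = math.sqrt(lam / (2 * math.pi * depth ** 3))
    return coeff * math.exp(-lam * (depth - mu) ** 2 / (2 * mu ** 2 * depth))


def surfing_depth_variance(mu: float, lam: float) -> float:
    """Variance of the inverse Gaussian: mu^3 / lam."""
    return mu ** 3 / lam


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    mu, lam = 3.0, 5.0
    for depth in range(1, 11):
        print(f"{depth:2d} pages: {surfing_depth_pdf(depth, mu, lam):.4f}")
    print("variance:", surfing_depth_variance(mu, lam))
```

The density is strongly skewed. Most visits are short, but a long tail of users follows many links deep into a site. This pattern is what made the regularity surprising to the researchers.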

In a study of the Web's topology, a Stanford graduate student working on PARC's Internet Ecologies project found that Web sites are, on average, only about four clicks away from one another: hard evidence that the world is smaller than it seems, on the Web at least.
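A figure like "four clicks apart" is a shortest-path statistic on the Web's link graph. Each site is a node and each hyperlink from one site to another is a directed edge. The statistic counts the fewest clicks needed to get from one site to another. The Python sketch below computes that average over all reachable pairs using breadth-first search. The tiny link graph and its `.example` domains are hypothetical. This is not necessarily the exact procedure the Stanford study used, since a crawl-sized graph would call for sampling start sites rather than searching from every one.

```python
from collections import deque
from typing import Dict, List

# Hypothetical site-level link graph: each key links to the sites in its list.
links: Dict[str, List[str]] = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["d.example"],
    "d.example": ["a.example"],
}


def clicks_from(graph: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: fewest clicks from start to every reachable site."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for neighbor in graph.get(site, []):
            if neighbor not in dist:  # first visit is the shortest path in BFS
                dist[neighbor] = dist[site] + 1
                queue.append(neighbor)
    return dist


def average_clicks(graph: Dict[str, List[str]]) -> float:
    """Mean click distance over ordered pairs (a, b), a != b, with b reachable from a.

    Only sites that appear as keys are used as starting points.
    """
    total, pairs = 0, 0
    for site in graph:
        for other, d in clicks_from(graph, site).items():
            if other != site:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def max_clicks(graph: Dict[str, List[str]]) -> int:
    """Largest click distance between any reachable pair (the diameter)."""
    return max((max(clicks_from(graph, s).values()) for s in graph), default=0)


if __name__ == "__main__":
    print("average clicks:", average_clicks(links))
    print("maximum clicks:", max_clicks(links))
```

The distinction between the average and the maximum matters. A graph can have a small average path length while a few pairs of sites are much farther apart, or not connected at all.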

Research on this scale and of this complexity opens the way to new thinking in fields ranging from graph theory to sociology. Pitkow compares what is happening to the Einstein-era move beyond the limits of Newtonian physics into quantum mechanics: "The Web," he says, "requires a whole new form of understanding."