Several statistical laws have been proposed to help characterize strong regularities in the use and nature of the Web, including connectivity patterns and user surfing behavior [1, 2, 6, 7, 12]. One of them, the Universal Law of Web Surfing [7], maintains that surfing patterns for users are quite regular, manifesting a preference for shorter surfing sessions over longer ones. Here, we assess whether these regularities extend to surfing on the mobile Web, specifically through Wireless Application Protocol (WAP) portals, where many aspects of the user environment are quite different from the conventional Web. We report our analysis of a large sample of mobile Web user data (3.7 million sessions involving 0.5 million users of a European portal) to determine whether the same laws apply to mobile Web surfing on WAP phones.
The results confirm the generality of the Universal Law of Web Surfing. As WAP is a precursor of future 3G telecom systems (from which we will be able to access Web content from mobile devices with restricted interface features), these results have important implications for our understanding of future mobile Web surfing behavior, as well as how to design Web-content-access interfaces.
Today, most mobile phone users readily access Internet-provided services from their handsets. But this mobile Web revolution has had a slow start, largely due to such problems as unstable handsets, limited content, low bandwidth, and high cost to the end user. However, significant device-, bandwidth-, and charging-model improvements have led to a dramatic increase in use over the past 12 to 18 months. Here, we consider the underlying characteristics of the mobile Web and its usage patterns, comparing them to those of the regular Web.
The mobile Web represents a fundamentally different information medium from the traditional Web in terms of access devices used, content availability, bandwidth, and cost to the end user. These differences suggest there may be little to learn about mobile Web usage from observations of regular Web use. In fact, the reverse is the case; irrespective of these differences, the same Universal Law of Web Surfing applies in both Web and mobile Web surfing.
One obvious difference between the mobile Web and the regular Web concerns the devices employed by end users. There are fundamental differences between the WAP phones designed to access the mobile Web and PCs designed to access the regular Web. Screen real estate is an obvious difference, with PCs offering display sizes many orders of magnitude larger than a typical mobile phone screen. In addition, there is much greater diversity in the capabilities of mobile phones in terms of display resolution, color capability, operating system features, and browser functionality compared to the standardized world of PC-based Internet access. Mobile phones also have limited input capability; their numeric keypads allow only minimal text entry compared to the mouse/keyboard entry on PCs. In addition to these device differences, the content base of the mobile Web is much more limited in scope and diversity. Moreover, mobile Web users must also contend with slow download times and incremental billing costs [8].
These differences have led users to access information on the mobile Web differently from the way they access it on the regular Web. For example, mobile users generally adopt a browsing model of information access [1, 2, 4, 6, 12], locating information and services through the menu hierarchies of operator portals, a type of access for which mobile phones are well adapted. In contrast, the primary mode of access on the Web today is via search engines. On mobile handsets, however, input and output limitations make it difficult for users to specify queries or sift through long lists of results.
Recent studies [1, 2, 4, 6, 12] have identified several universal laws that appear to characterize the structure of the Web and its usage patterns. For example, [2] found that, though the Web remains essentially uncontrolled and dynamic, it reflects well-defined large-scale properties. In particular, it is characterized by a scale-free link distribution, and its highly connected structure results in a surprisingly small "diameter" (approximately 19 links or clicks), so that even seemingly isolated information sources are separated by only a small number of clicks.
Web pages were shown in [6] to be distributed among sites according to a universal power law, meaning that many sites have only a few pages, while a few have hundreds of thousands of pages. The Web, according to [1], is an example of a small-world network (see [12]), a property that search engines can leverage. Indeed, many of these properties are exploited by Google and other global search engines [4, 12].
In addition to these topological characteristics of the Web, fundamental usage properties govern the way Web users access information. For example, [7] found that Web-user surfing patterns reflect strong statistical regularities that can be described through a universal power law that explains the Zipf-like distributions in page hits commonly observed at Web sites. These regularities have been characterized by an inverse Gaussian distribution of surfing behavior that helps determine the probability a user will click through a succession of pages (search to a given depth) in a given surfing session. This analysis shows that most users are likely to have only very short sessions in terms of the number of links they select, that most users select only two or three links in a session, and that few users are likely to surf any deeper than these few links.
A key question is whether these regularities also exist in mobile Web usage, in light of the fundamental differences with respect to devices, content, and infrastructure. Here, we focus on the surfing behavior of mobile Web users by analyzing a large corpus of surfing data (more than 3.75 million sessions by almost 421,000 users in a large European mobile portal). Worth noting is that previous analysis (such as [7]) of Web searching often relied on much smaller samples of usage data involving at most thousands of users.
Following the methodology of [7], we start by deriving the probability, $P(L)$, that a user will select $L$ links in a WAP portal. We assume that each visited portal page (initial page) holds some value for the user and that by clicking on a link from that page, the user proceeds to another page (the next page) that also holds some value for the user. The value of the next page is in some way related to the value of the initial page; that is, the value of the next page, $V_L$, is the value of the previous (initial) one, $V_{L-1}$, plus or minus a random term, as in

$$V_L = V_{L-1} + \varepsilon_L \qquad (1)$$

where the values $\varepsilon_L$ are independent and identically distributed Gaussian random variables. A particular sequence of page valuations is the realization of a random process and is thus different for each user. Within this formulation, an individual surfs until the expected cost of continuing is perceived to be greater than the discounted expected value of the information that might be found in the future. This trade-off can be thought of as similar to an option in financial markets for which it is well known that a threshold value exists for exercising the option [5]. Even if the current value of a page is negative, it may be worthwhile to proceed if there is some likelihood that a collection of high-value pages may be found later. If the value is sufficiently negative, however, the risk associated with continuing to surf may rise to an unacceptable level; that is, when $V_L$ falls below some threshold value, it is optimal for the person surfing the Web to stop.
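The stopping rule can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes the walk $V_L = V_{L-1} + \varepsilon_L$ starts at value 0, and the threshold, drift, and step size are hypothetical values chosen only so that sessions terminate.

```python
import random


def simulate_session(threshold=-2.0, drift=-0.3, sigma=1.0,
                     max_clicks=10_000, rng=random):
    """Count the links followed until the page value drops below `threshold`.

    Assumptions (not from the article): the walk starts at value 0 and each
    click adds a Gaussian step with mean `drift` and std. deviation `sigma`.
    """
    value = 0.0
    for clicks in range(1, max_clicks + 1):
        value += rng.gauss(drift, sigma)   # V_L = V_{L-1} + eps_L
        if value < threshold:              # stopping rule described in the text
            return clicks
    return max_clicks                      # safety cap, not part of the model


if __name__ == "__main__":
    rng = random.Random(0)                 # fixed seed for repeatability
    lengths = [simulate_session(rng=rng) for _ in range(10_000)]
    mean = sum(lengths) / len(lengths)
    median = sorted(lengths)[len(lengths) // 2]
    print(f"mean depth {mean:.2f}, median depth {median}")
```

Running the script prints the mean and median simulated depth, which gives a quick feel for how a threshold on a random walk produces many short sessions and a few long ones.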
The number of links a user follows before the page value reaches the stopping threshold is a random variable $L$. For the random walk of Equation 1, the probability distribution of first passage times to a threshold is given asymptotically by the two-parameter inverse Gaussian distribution [9], as in

$$P(L) = \sqrt{\frac{\lambda}{2\pi L^3}} \exp\left[-\frac{\lambda (L-\mu)^2}{2\mu^2 L}\right] \qquad (2)$$

with mean $\mu$ and variance $\mu^3/\lambda$, where $\lambda$ is a scale parameter. This distribution has two characteristics worth stressing in the context of surfing patterns. First, it has a very long tail, extending much farther than that of a normal distribution with comparable mean and variance. It implies a finite probability for events that would be unlikely if described by a normal distribution. Consequently, large deviations from the average number of user clicks computed in a session will be observed. Second, because of the asymmetry of the distribution function, the typical behavior of users will not be the same as the average behavior. Thus, because the mode is lower than the mean, care must be exercised with available data on the average number of clicks, as the average overestimates the typical depth surfed. This distribution was shown in [7] to accurately characterize Web surfing patterns, producing the so-called Universal Law of Web Surfing.
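The gap between typical and average depth can be checked numerically. The sketch below evaluates the inverse Gaussian density $P(L) = \sqrt{\lambda/(2\pi L^3)}\,\exp[-\lambda(L-\mu)^2/(2\mu^2 L)]$ and its mode, using the standard closed-form expression for the mode of this distribution. The parameter values are the ones reported in the Figure 1 caption; the function names are illustrative.

```python
import math


def inverse_gaussian_pdf(L, mu, lam):
    """Density of the inverse Gaussian distribution at depth L (0 for L <= 0)."""
    if L <= 0:
        return 0.0
    return (math.sqrt(lam / (2 * math.pi * L ** 3))
            * math.exp(-lam * (L - mu) ** 2 / (2 * mu ** 2 * L)))


def inverse_gaussian_mode(mu, lam):
    """Location of the density's peak (standard closed-form result)."""
    r = 3 * mu / (2 * lam)
    return mu * (math.sqrt(1 + r * r) - r)


if __name__ == "__main__":
    mu, lam = 3.6576, 2.69                 # values from the Figure 1 caption
    mode = inverse_gaussian_mode(mu, lam)
    print(f"mean = {mu:.4f}, mode = {mode:.4f}")
    for depth in (1, 2, 3, 5, 10, 20):
        print(depth, f"{inverse_gaussian_pdf(depth, mu, lam):.4f}")
```

The mode lies well below the mean for these parameters, which is the asymmetry the text warns about when interpreting average click counts.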
To test the generality of Equation 2 we applied it to our corpus of mobile Web surfing sessions. This data corresponds to four weeks of browsing activity, during September 2002, for the almost 421,000 unique European users across more than 3.75 million sessions in the portal. In the analysis, we discarded all sessions of length one, because they were not representative of a true browsing session.¹ This reduced our sample to 3,006,385 sessions by 350,635 users. The measured cumulative distribution function (CDF) of depth $L$ for the four weeks is plotted in Figure 1; superimposed on the plot is the predicted function from the inverse Gaussian distribution. Using the chi-squared goodness-of-fit test, we found a fit of 0.99, accounting for 99% of the data.
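A comparison of this kind can be sketched in a few lines. The article does not say how the distribution parameters were estimated, so the sketch assumes the standard maximum-likelihood estimates for the inverse Gaussian, and it reports the largest gap between the empirical and model CDFs rather than the chi-squared statistic used in the article. The session depths are made-up illustrative values, not the portal data.

```python
import math


def fit_inverse_gaussian(depths):
    """Maximum-likelihood estimates (mu, lam); fails if all depths are equal."""
    n = len(depths)
    mu = sum(depths) / n
    lam = n / sum(1.0 / x - 1.0 / mu for x in depths)
    return mu, lam


def _std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_gaussian_cdf(x, mu, lam):
    """Closed-form CDF of the inverse Gaussian distribution."""
    if x <= 0:
        return 0.0
    a = math.sqrt(lam / x)
    return (_std_normal_cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _std_normal_cdf(-a * (x / mu + 1.0)))


def max_cdf_gap(depths, mu, lam):
    """Largest absolute difference between empirical and model CDF."""
    n = len(depths)
    gap = 0.0
    for x in sorted(set(depths)):
        empirical = sum(1 for d in depths if d <= x) / n
        gap = max(gap, abs(empirical - inverse_gaussian_cdf(x, mu, lam)))
    return gap


if __name__ == "__main__":
    depths = [1, 1, 2, 2, 2, 3, 3, 4, 6, 11]   # hypothetical session depths
    mu, lam = fit_inverse_gaussian(depths)
    print(f"mu = {mu:.3f}, lambda = {lam:.3f}")
    print(f"max CDF gap = {max_cdf_gap(depths, mu, lam):.3f}")
```

Because click depth is an integer while the inverse Gaussian is continuous, a production analysis would also need to decide how to bin the model, which this sketch leaves out.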
Performing a similar experiment in 1998, [7] found similar distributions for users surfing a variety of Web sites, as well as for users surfing within a large Web site. In each case, the fit between the CDF and the inverse Gaussian distribution was found to be significant. As distributions of user hits on the regular Web and on the mobile Web are almost identical, we found that, despite the device and other differences, users surf in similar ways in either context. Each community tends to favor short regular browsing sessions.
An interesting implication of this surfing law is obtained by taking the logarithms on both sides of Equation 2 (with $\mu$ the mean and $\lambda$ the scale parameter of the distribution) to yield

$$\log P(L) = -\frac{3}{2}\log L - \frac{\lambda (L-\mu)^2}{2\mu^2 L} + \log\sqrt{\frac{\lambda}{2\pi}} \qquad (3)$$

As a log-log plot, the equation presents as a straight line with a slope of approximately $-3/2$ for small values of $L$ and large values of the variance. As $L$ gets larger, the second term provides a downward correction. Thus, Equation 3 implies that, up to a constant given by the third term, the probability of finding a group surfing at a given level scales inversely in proportion to its depth, $P(L) \sim L^{-3/2}$. We verified this Pareto scaling by plotting the available data on a logarithmic scale; Figure 2 shows that the inverse proportionality holds over a range of depths. Although Figure 2 is essentially a Pareto plot, it can equally be read as a Zipf distribution or a power law, as all three contain the same information stated in different ways [3, 11]. A similar experiment in [7] found patterns similar to those demonstrated for WAP users, further strengthening the hypothesis that mobile Web and regular Web users browse the same way.
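To see where the $-3/2$ approximation holds, the local slope of the log-log curve can be computed directly. Differentiating $\log P(L) = -\tfrac{3}{2}\log L - \lambda(L-\mu)^2/(2\mu^2 L) + \log\sqrt{\lambda/(2\pi)}$ with respect to $\log L$ gives $-\tfrac{3}{2} - \lambda(L^2-\mu^2)/(2\mu^2 L)$. The sketch below evaluates both expressions; the parameter values come from the Figure 1 caption and the function names are illustrative.

```python
import math


def log_pdf(L, mu, lam):
    """Log of the inverse Gaussian density, written as three separate terms."""
    return (-1.5 * math.log(L)
            - lam * (L - mu) ** 2 / (2 * mu ** 2 * L)
            + math.log(math.sqrt(lam / (2 * math.pi))))


def loglog_slope(L, mu, lam):
    """d log P / d log L: the -3/2 term plus the correction from the second term."""
    return -1.5 - lam * (L * L - mu * mu) / (2 * mu * mu * L)


if __name__ == "__main__":
    mu, lam = 3.6576, 2.69                 # values from the Figure 1 caption
    for depth in (1, 2, 4, 8, 16, 32):
        print(depth, f"slope = {loglog_slope(depth, mu, lam):.3f}")
```

The correction vanishes at $L = \mu$ and is proportional to $\lambda$, so it shrinks as the variance $\mu^3/\lambda$ grows; for $L > \mu$ it is negative, which is the downward correction described in the text.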
These results indicate that the surfing patterns of WAP portals reflect the same strong regularities found in Web surfing, and that the Universal Law of Web Surfing [2, 7] is also the Universal Law of Mobile Web Surfing. Even though WAP and Web both offer ostensibly different paradigms of information access, users tend to surf and browse for information in essentially the same way in both contexts.
This law demonstrates the regularity with which users surf the Web and is useful for predicting mobile behavior. For example, there should be attendant benefits to using this model to aid user surfing, bootstrapping intelligent systems that aid users surfing WAP pages [10] by aggressively promoting links to aid user navigation or prefetching pages to combat slow download times [8]. As outlined in [7], this law may also be used in conjunction with spreading activation to predict expected usage of WAP sites; that is, it may be possible to reorganize the structure of a WAP site to obtain a desired usage pattern by motivating the appropriate behaviors.
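As one illustration of how the law might inform prefetching, the sketch below estimates the probability that a session continues past the next click, given the depth already reached, and compares it with a cutoff. This is only a hypothetical decision rule built on the inverse Gaussian CDF, not a method described in the article; the cutoff `min_probability` and the function names are assumptions, and the parameters are those reported in the Figure 1 caption.

```python
import math


def _std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_gaussian_cdf(x, mu, lam):
    """Closed-form CDF of the inverse Gaussian distribution."""
    if x <= 0:
        return 0.0
    a = math.sqrt(lam / x)
    return (_std_normal_cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _std_normal_cdf(-a * (x / mu + 1.0)))


def continuation_probability(depth, mu, lam):
    """P(session depth > depth + 1 | session depth > depth)."""
    survive_now = 1.0 - inverse_gaussian_cdf(depth, mu, lam)
    survive_next = 1.0 - inverse_gaussian_cdf(depth + 1, mu, lam)
    return survive_next / survive_now if survive_now > 0 else 0.0


def should_prefetch(depth, mu, lam, min_probability=0.5):
    """Hypothetical rule: prefetch only if the user is likely to keep going."""
    return continuation_probability(depth, mu, lam) >= min_probability


if __name__ == "__main__":
    mu, lam = 3.6576, 2.69                 # values from the Figure 1 caption
    for depth in range(1, 8):
        p = continuation_probability(depth, mu, lam)
        print(depth, f"{p:.3f}", should_prefetch(depth, mu, lam))
```

A real portal would weigh this probability against bandwidth and billing costs, which the sketch deliberately ignores.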
The Universal Law of Web Surfing [2, 7] has proved to be the Universal Law of Mobile Web Surfing as well, affording more tools to help overcome the shortcomings of the mobile Web and aid development of novel techniques for user navigation in the mobile domain. These findings also augur well for understanding surfing in future 3G systems that will share similar device properties with WAP devices while offering more Web-like surfing opportunities to even casual cell phone users.
1. Adamic, L. The small world Web. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries (Paris, Sept. 22-24). Springer-Verlag, 1999, 443-452.
2. Albert, R., Jeong, H., and Barabasi, A. Diameter of the World-Wide Web. Nature 401 (Sept. 9, 1999), 130-131.
3. Bak, P. How Nature Works: The Science of Self-Organized Criticality. Springer-Verlag, New York, 1996.
4. Brin, S. and Page, L. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the Seventh International World Wide Web Conference (Brisbane, Australia, Apr. 14-18). Elsevier Science, 1998, 107-117.
5. Dixit, A. and Pindyck, R. Investment Under Uncertainty. Princeton University Press, Princeton, NJ, 1994.
6. Huberman, B. and Adamic, L. Growth dynamics of the World-Wide Web. Nature 401 (Sept. 9, 1999), 131.
7. Huberman, B., Pirolli, P., Pitkow, J., and Lukose, R. Strong regularities in World Wide Web surfing. Science 280, 5360 (Apr. 3, 1998), 95-97.
8. Ramsay, M. and Nielsen, J. WAP usability deja vu: 1994 all over again. Nielsen Report (2000); Nielsen Norman Group, www.nngroup.com.
9. Seshadri, V. The Inverse Gaussian Distribution. Clarendon Press, Oxford, U.K., 1993.
10. Smyth, B. and Cotter, P. The plight of the navigator: Solving the navigation problem for wireless portals. In Proceedings of the Second International Conference on Adaptive Hypermedia and Adaptive Web Systems (Malaga, Spain, May 29-31). Springer-Verlag, 2002, 328-337.
11. Troll, G. and beim Graben, P. Zipf's law is not a consequence of the central limit theorem. Physical Review E 57, 2 (1998), 1347-1355.
12. Watts, D. and Strogatz, S. Collective dynamics of 'small-world' networks. Nature 393 (June 4, 1998), 440-442.
¹ Sessions of length 1 correspond to a user opening a browser, then closing it (accounting for 76.73% of such sessions) or to a user visiting bookmarked pages (accounting for the other 23.27% of such sessions). These sessions tell us little about surfing behavior. This lack of information also means that for the remainder of our calculations a session said to have contained t clicks actually contains t + 1 clicks.
Figure 1. Cumulative distribution function for WAP users as a function of the number of surfing clicks. The observed data was collected in a four-week period in September 2002 from a sample of 350,635 users representing 3,006,385 sessions. The inverse Gaussian distribution has a mean of $\mu = 3.6576$ and a scale parameter of $\lambda = 2.69$.
Figure 2. Frequency distribution of surfing on log-log scales. We used the same data set as in
©2006 ACM 0001-0782/06/0300 $5.00