

Dissecting Web 2.0 Examples: Chapter 3 - Web 2.0 Architectures

by James Governor, Duane Nickull, Dion Hinchcliffe

This excerpt is from Web 2.0 Architectures. This fascinating book puts substance behind Web 2.0. Using several high-profile Web 2.0 companies as examples, authors Duane Nickull, Dion Hinchcliffe, and James Governor have distilled the core patterns of Web 2.0 coupled with an abstract model and reference architecture. The result is a base of knowledge that developers, business people, futurists, and entrepreneurs can understand and use as a source of ideas and inspiration.

Table of Contents

DoubleClick and Google AdSense
Applicable Web 2.0 Patterns
Advertising in Context
A Peek at the Future of Online Advertising
Ofoto and Flickr
Applicable Web 2.0 Patterns
Collaboration and Tagging
Akamai and BitTorrent
Applicable Web 2.0 Patterns
Alternate Solutions to Bandwidth
MP3.com and Napster
Applicable Web 2.0 Patterns
Shifting Patterns and Costs of Music Distribution
MP3.com and Napster Infrastructures
Britannica Online and Wikipedia
Applicable Web 2.0 Patterns
From a Scholarly to a Collaborative Model
Personal Websites and Blogs
Applicable Web 2.0 Patterns
Shifting to Blogs and Beyond
Screen Scraping and Web Services
Applicable Web 2.0 Patterns
Intent and Interaction
Content Management Systems and Wikis
Applicable Web 2.0 Patterns
Participation and Relevance
Directories (Taxonomy) and Tagging (Folksonomy)
Applicable Web 2.0 Patterns
Supporting Dynamic Information Publishing and Finding
More Hints for Defining Web 2.0
Reductionism

“Web 1.0 was about connecting computers and making technology more efficient for computers. Web 2.0 is about connecting people and making technology efficient for people.”

--Dan Zambonini

So, what actually changed between the emergence of Web 1.0 and Web 2.0? In this chapter, we’ll compare Web 1.0 companies and technologies with Web 2.0 companies and technologies to begin developing the design patterns that distinguish them. We use the term “Web 1.0” to refer to the Web as it was understood in the period of around 1995–2000, though obviously it’s not a simple matter of dates. To help us get started, Figure 3.1, “Tim’s list of Web 1.0 versus Web 2.0 examples” shows again the list of Web 1.0 and Web 2.0 examples that Tim O’Reilly and others compiled during an initial brainstorming session to get a “feel” for what Web 2.0 was.

Figure 3.1. Tim’s list of Web 1.0 versus Web 2.0 examples

Note

It’s important to note that some of the Web 1.0 companies included in Figure 3.1, “Tim’s list of Web 1.0 versus Web 2.0 examples” have evolved substantially since Tim made his original comparison. For the latest information, definitely visit each company’s website. Tim’s choosing them as examples actually speaks to their success in that earlier age.

Although several of the companies we use as examples in this chapter are large enterprises or corporations, the patterns themselves don’t apply only to enterprises. In fact, the value of patterns is that you can remove them from an enterprise context and reuse them in other applications. For example, Service-Oriented Architecture (SOA) is a pattern of exposing capabilities to potential end users. Whether the SOA pattern is used by online gamers to access the states of each other’s joysticks or by large enterprises to reach into their customer relationship management (CRM) systems and provide users of their websites with rich interactive experiences, the core pattern is the same when abstracted to a high enough level. In both cases, a service offers some functionality or capability that another entity consumes.

DoubleClick and Google AdSense

Before we compare these two companies, we must point out that DoubleClick has vastly enhanced its platform since it was formed; so much so, in fact, that Google acquired DoubleClick in 2007 to further broaden its media advertising ambitions.[32] Therefore, instead of specifically illustrating DoubleClick’s original ad model, we’ll illustrate the generic pattern of banner ad impression sales that many online advertising companies used in the late 1990s.

Applicable Web 2.0 Patterns

Watch for illustrations of the following patterns in this discussion:

  • Software as a Service (SaaS)

  • Mashup

  • Rich User Experience

  • Semantic Web Grounding

  • Asynchronous Particle Update

You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.

Advertising in Context

Banner ad placement originally operated on a simplistic model whereby advertisers purchased banner ads in lots (typically of 1,000 or more), and the banners were then placed on websites. The placement of these banner ads was often billed based solely on impressions, regardless of whether anyone actually clicked on the banners. This online advertising model clearly had room for improvement.

Initially, one of the main issues facing advertisers was the lack of any guarantee that the ads were effective; however, this problem was mitigated by the use of tracking software and new business models that charged based on the number of click-throughs. Another issue concerned the fact that some larger companies offering such services asked webmasters to place code in their sites and then served up ads whenever someone issued a request for a page containing that code. It was therefore quite possible that ads aimed at golfers, for example, might appear on fishing or other websites not concerned with golf. The placement pattern looked a lot like Figure 3.2, “Basic pattern of banner ad placement”.
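
To see why billing on click-throughs changed the economics, consider a quick back-of-the-envelope comparison. All of the rates and figures below are invented for illustration and are not meant to reflect real prices:

    # Hypothetical numbers: what an advertiser pays under impression-only billing
    # versus billing only on actual click-throughs.
    impressions = 100_000
    cpm = 5.00                   # cost per 1,000 impressions (invented rate)
    click_through_rate = 0.004   # 0.4% of viewers click (invented)
    cost_per_click = 0.40        # invented per-click rate

    impression_cost = impressions / 1000 * cpm
    clicks = impressions * click_through_rate
    click_billed_cost = clicks * cost_per_click

    print(f"impression-billed: ${impression_cost:.2f} for {clicks:.0f} clicks "
          f"(${impression_cost / clicks:.2f} per actual click)")
    print(f"click-billed:      ${click_billed_cost:.2f} for the same {clicks:.0f} clicks")

Under impression-only billing, the advertiser in this made-up scenario pays the same $500 whether the banner produces 400 clicks or none at all; click-through billing ties the spend directly to measurable responses.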

Figure 3.2. Basic pattern of banner ad placement

In contrast, Google AdSense is a paid ad service that serves contextually specific ads on web pages and tracks the number of clicks on each ad by visitors to those pages. This form of ad delivery uses a simple yet effective pattern of contextual targeting. Rather than just advertising blindly, AdSense attempts to quantify the context of a user’s experience based on a keyword score within the web pages containing the ads. AdSense then cross-references the keywords with a list of potential target ads that might be of interest to the user of that web resource. As a result, visitors to a web page on golfing will typically see golf-related advertisements rather than completely random content. AdSense also lets web page owners filter out competitors’ ads. For example, a golf club manufacturer could block competing companies’ ads from being displayed on its website. This is a highly useful pattern for preventing competitors from targeting a website owner’s customers. Figure 3.3, “Contextual serving of ads based on user profile patterns” shows an example of this pattern.
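
The following sketch illustrates the general contextual pattern in miniature. It is not Google's algorithm; the ad inventory, keyword lists, and scoring rule are all invented for illustration:

    from collections import Counter
    import re

    # Hypothetical ad inventory: each ad lists the keywords it targets.
    ADS = [
        {"id": "golf-clubs", "advertiser": "AcmeGolf",
         "keywords": {"golf", "club", "swing"}},
        {"id": "fly-rods", "advertiser": "ReelFish",
         "keywords": {"fishing", "rod", "river"}},
    ]

    def choose_ad(page_text, blocked_advertisers=()):
        """Score each ad by how often its keywords appear in the page text."""
        words = Counter(re.findall(r"[a-z]+", page_text.lower()))
        best, best_score = None, 0
        for ad in ADS:
            if ad["advertiser"] in blocked_advertisers:
                continue  # the site owner has filtered out this competitor
            score = sum(words[keyword] for keyword in ad["keywords"])
            if score > best_score:
                best, best_score = ad, score
        return best  # None means no contextually relevant ad was found

    page = "Improve your golf swing by choosing the right club for each shot."
    print(choose_ad(page)["id"])                              # golf-clubs
    print(choose_ad(page, blocked_advertisers={"AcmeGolf"}))  # None

Serving the golf ad on a golf page, and serving nothing rather than an irrelevant ad when the obvious match is blocked, is the essence of the contextual pattern described above.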

Figure 3.3. Contextual serving of ads based on user profile patterns

Also attracting website owners to AdSense is the fact that ad revenues are split between Google and the website owner. Other banner ad companies also use this revenue model, but AdSense users have a better chance of increasing their revenues because the ads on their sites are contextually specialized for their audiences, so users are more likely to click on them. Given the fact that net revenue from a website must be calculated once the costs of hosting are subtracted, it makes more business sense to go with a contextual pattern such as that offered by AdSense than with a non-contextual pattern.

A Peek at the Future of Online Advertising

Serving contextually specific information based on a single site visit is only one aspect of how online advertising is changing. With the evolution of the Internet and some underlying technologies, the science of targeted advertising is reaching new heights. Dr. Usama Fayyad, chief data officer and senior vice president of Research & Strategic Data Solutions at Yahoo!, stated in the March 2007 issue of Business 2.0 magazine, “I know more about your intent than any 1,000 keywords you could type.”[33] He knows this because of his Yahoo! research into the click-stream consciousness of web users. Dr. Fayyad is an actual rocket scientist who worked at NASA’s Jet Propulsion Laboratory before moving to Yahoo! to manage the roughly 12 terabytes of user data—more than the entire content of the Library of Congress—that Yahoo! collects every day.

Yahoo! tracks user behavior with a multitude of technologies, including cookies, user account activity, bounce rates, and searches. The major search engine vendors have acquired the ability to build comprehensive user profiles based not just on contextual information from a single web page, but on many aspects of a user’s behavior. For instance, Yahoo! and Google have created services that consumers can use to help them build successful businesses and/or websites. Along those lines, Yahoo!’s acquisition of Overture let people target search terms based on available inventory. Overture’s tools can tell an advertiser how many people search for a specific term in a given month, as well as suggesting similar terms.

Another trend in Web 2.0 advertising is the move away from traditional graphic banner ads and toward text and video media. Bandwidth-light text has an advantage thanks in part to the increasing use of cell phones as people’s primary devices for connecting to the Internet. Jupiter Research reported a trend of growth in all three categories (text, graphical banners, and video), implying either that there are more advertisers or that advertisers are continuing to pull financial resources away from traditional media such as television, magazines, and newspapers.[34] This phenomenon must be somewhat scary to the incumbent media giants, especially when coupled with the recent history of small upstart Internet companies becoming the largest media sources within just a few years (YouTube and MySpace are good examples).

A third trend concerns the delivery of ever more targeted content. Engaging users in a context in which they’re open to an ad’s content requires walking a narrow line. Many bloggers make a few dollars a month on Google AdSense, but some deliver a more immersive experience, even using text ads within RSS feeds and carrying ads into their readers’ aggregators. This can be effective, but if consumers are bombarded with ads, the entire mechanism starts to undermine itself, as the human mind begins to filter out too-frequent advertisements.

Web 2.0 hasn’t done much about the pressing question of how society will continue to react to ads that are often perceived as intrusive and unwelcome. We’re referring to one of the most hated words since the dawn of the Internet: spam.

Email spam is probably the most despised form of advertising on the Internet. Despite numerous mechanisms (such as spam filters and legislation) to control it, spam is still rampant. Like spam, banner ads also permeate many corners of the Internet and are common on many web pages. Do banner ads drive people away, or do they provide value? Users have expressed time and again that sites uncluttered with commercial messages are more attractive. Google was widely heralded as setting a new model for search engines with a simple, noncommercial interface, although web historians could point out that AltaVista was just as clean in its lack of commercialism. Consumers and users flocked to Google.com when it launched: it provided information they wanted without bombarding them with advertising. Similar models have evolved from companies such as Flickr (discussed in the next section), although the old-world ways of commercial ads still permeate much of the Internet landscape (even to pervasive levels within newer presences such as YouTube and MySpace).

Some have even organized communities to fight advertising. The most notable is Adbusters, an organization based in Vancouver, Canada. Adbusters is a global network of artists, activists, writers, pranksters, students, educators, and entrepreneurs who want a new social activism movement piggybacked on the information age. Their goal is simple: anarchy. Their aim is to topple existing power structures and forge a major shift in the way we’ll live in the 21st century. In a similar vein, Canadian film producer Jill Sharpe released a documentary called Culture Jam,[35] a stab back at our mainstream media and advertising agencies. “Culture jamming” is a form of public activism that is generally in opposition to commercialism.[36]

Despite these challenges, Google AdSense delivers a service that many people use and serves ad content that can be mashed into most websites. Google has provided many other value-add services that make it easy for anyone to become an advertiser and get some value for her budget. Google’s stock price continues to rise as a reflection of its perceived strength and dominant position in the new advertising industry.

Ofoto and Flickr

Ofoto began life as an online photography service based in Berkeley, California. The service provided three basic features: members could upload and store their digital photos, share them with others online, and order prints.

Ofoto later added a 35mm online film processing service and an online frame store, as well as some other services, but its core pattern still embraced a model of static publishing. In May 2001, Eastman Kodak purchased Ofoto, and the service was rebranded in 2005 as the Kodak EasyShare Gallery.

Flickr is another photo-sharing platform, but it was built with the online community in mind, rather than the idea of selling prints. Flickr made it simple for people to tag or comment on each other’s images, and for developers to incorporate Flickr into their own applications. Flickr is properly a community platform and is justifiably seen as one of the exemplars of the Web 2.0 movement. The site’s design and even the dropped e in the company name are now firmly established in Web 2.0’s vernacular.

Applicable Web 2.0 Patterns

This comparison involves the following patterns:

  • Software as a Service (SaaS)

  • Participation-Collaboration

  • Mashup

  • Rich User Experience

  • The Synchronized Web

  • Collaborative Tagging

  • Declarative Living and Tag Gardening

  • Persistent Rights Management

You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.

Collaboration and Tagging

Flickr is often used as an information source for other Web 2.0 platforms or mechanisms. It offers simple application programming interfaces (APIs) for accessing its content, enabling third parties to present images in new contexts and to access and use Flickr’s services in their own mashups or other applications. Bloggers commonly use it as an online photo repository that they can easily connect to their own sites, but the APIs offer much more opportunity than that. Programmers can create applications that can perform almost any function available on the Flickr website. The list of possible operations is vast and covers most of the normal graphical user interface’s capabilities.

Note

Flickr also lets developers choose which tools they want to use to access its services. It supports requests via a REST-like interface, XML-RPC, and SOAP, with responses available in all three of those formats as well as JSON and serialized PHP. For more, see http://www.flickr.com/services/api/.
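
As a small illustration of the REST-like interface, the sketch below calls the published flickr.photos.search method and asks for a JSON response. You would need to substitute a real API key, and you should check Flickr's services page for the current endpoint and parameters:

    import json
    import urllib.parse
    import urllib.request

    API_KEY = "YOUR_API_KEY"  # placeholder; request a key from Flickr

    def flickr_search(tags, per_page=5):
        """Call Flickr's REST-style endpoint and return the parsed JSON response."""
        params = urllib.parse.urlencode({
            "method": "flickr.photos.search",
            "api_key": API_KEY,
            "tags": tags,
            "per_page": per_page,
            "format": "json",
            "nojsoncallback": 1,   # plain JSON rather than a JSONP callback
        })
        url = "https://api.flickr.com/services/rest/?" + params
        with urllib.request.urlopen(url) as response:
            return json.loads(response.read().decode("utf-8"))

    results = flickr_search("sunset")
    for photo in results["photos"]["photo"]:
        print(photo["id"], photo["title"])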

Developers can easily repurpose Flickr’s core content in mashups, thanks to its open architecture and collaborative nature. A mashup combines information or computing resources from multiple services into a single new application. Often, in the resulting view two or more applications appear to be working together. A classic example of a mashup would be to overlay Google Maps with Craigslist housing/rental listings or listings of items for sale in the displayed region.

Flickr’s API and support for mashups are part of a larger goal: encouraging collaboration on the site, drawing in more users who can then make each others’ content more valuable. Flickr’s value lies partly in its large catalog of photos, but also in the metadata users provide to help themselves navigate that huge collection.

When owners originally upload their digital assets to Flickr, they can use keyword tags to categorize their work. In theory, they do this to make it easier to search for and locate digital photos. However, having users tag their photos themselves only starts to solve search problems. A single view of keywords won’t work reliably, because people think independently and are likely to assign different keywords to the same images. Allowing other people to provide their own tags builds a much richer and more useful indexing system, often called a folksonomy.

A folksonomy (as opposed to a top-down taxonomy) is built over time via contributions by multiple humans or agents interacting with a resource. Those humans or agents apply tags—natural-language words or phrases—that they feel accurately label what the resource represents. The tags are then available for others to view, sharing clues about the resource. The theory behind folksonomies is that because they include a large number of perspectives, the resulting set of tags will align with most people’s views of the resources in question.[37] The tags may even be in disparate languages, making them globally useful.

Consider an example. Say you upload a photo of an automobile and tag it as such. Even though a human would understand that someone looking for “automobile” might find photos tagged with “car” relevant, if the system used only simple text matching, someone searching for “vehicle,” “car,” or “transportation” might not find your image: comparing the character string “automobile” against a search string such as “car” won’t produce a positive match. By letting others add their own tags to resources, Flickr increases the number of tags for each photo and thereby increases the likelihood that searchers will find what they’re looking for. In addition to tagging your “automobile” photo with related words such as “vehicle,” “car,” or “transportation,” viewers might also use tags that are tangentially relevant (perhaps you thought the automobile was the core subject of the photo, but someone else might notice the nice “sunset” in the background and use that tag).
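
A toy sketch makes the point. The photo IDs and tags below are invented; the “search engine” is nothing more than exact string matching against whatever tags have accumulated:

    # Hypothetical photo index: each photo accumulates tags from many users.
    photos = {
        "photo-123": {"automobile"},   # the owner's original tag
    }

    def search(term):
        """Naive search: a query matches only if some tag is exactly the same string."""
        return [photo_id for photo_id, tags in photos.items() if term in tags]

    print(search("car"))       # [] -- "car" != "automobile", so nothing is found

    # Other viewers add their own tags over time, and the folksonomy grows...
    photos["photo-123"].update({"car", "vehicle", "transportation", "sunset"})

    print(search("car"))       # ['photo-123'] -- the same search now succeeds
    print(search("sunset"))    # ['photo-123'] -- even a tangential tag helps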

With this in mind, how would you tag the photo in Figure 3.4, “How would you tag this photo?”?

Figure 3.4. How would you tag this photo?

We might tag the photo in Figure 3.4, “How would you tag this photo?” with the following keywords: “mountain,” “bike,” “Duane,” “Nickull,” “1996,” “dual,” and “slalom.” With Flickr, others can tag the photo with additional meaningful keywords, such as “cycling,” “competition,” “race,” “bicycle,” and “off-road,” making subsequent searches more fruitful. Semantic tagging may require more thought, but as a general rule, the more minds there are adding more tags, the better the folksonomy will turn out.

Flickr has also built an interface that lets people visiting the site see the most popular tags. This is implemented as a tag cloud, an example of which appears in Figure 3.5, “Flickr tag cloud, from http://www.flickr.com/photos/tags/”.

Figure 3.5. Flickr tag cloud, from http://www.flickr.com/photos/tags/

The tag cloud illustrates the value of a bidirectional visibility relationship between resources and tags. If you’re viewing a resource, you can find the tags with which the resource has been tagged. The more times a tag has been applied to a resource, the larger it appears in the tag cloud. You can also click on a tag to see what other assets are tagged with the same term.
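
Rendering a tag cloud from tag counts is straightforward. The sketch below uses a simple linear scale between a minimum and maximum font size; the counts are invented, and real implementations often use a logarithmic scale so that a few very popular tags don't dwarf everything else:

    # Hypothetical tag frequencies harvested from many users' tagging activity.
    tag_counts = {"sunset": 950, "wedding": 600, "cat": 1400, "car": 120, "slalom": 15}

    def tag_cloud(counts, min_px=12, max_px=36):
        """Map each tag's frequency to a font size: the more uses, the larger the text."""
        lowest, highest = min(counts.values()), max(counts.values())
        spread = (highest - lowest) or 1
        return {
            tag: round(min_px + (count - lowest) / spread * (max_px - min_px))
            for tag, count in counts.items()
        }

    for tag, size in sorted(tag_cloud(tag_counts).items()):
        print(f"{tag}: {size}px")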

Another advancement Flickr offers is the ability to categorize photos into sets, or groups of photos that fall under the same metadata categories or headings. Flickr’s sets represent a form of categorical metadata rather than a physical hierarchy: a set can contain any number of photos, and it can exist even when it contains none. Photos, in turn, can exist independently of any set, or belong to one set or to many. These sets demonstrate capabilities far beyond those of traditional photo albums, where a photo cannot appear in more than one album without making physical copies of it.
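
A minimal model of this kind of set membership shows why it differs from a physical album; the set names and photo IDs are invented:

    # Sets are just named collections of photo IDs, so membership is pure metadata:
    # one photo can appear in any number of sets without being copied.
    sets = {
        "Mountain Biking": {"photo-001", "photo-002"},
        "1996 Season":     {"photo-002"},
        "Portraits":       set(),        # a set can exist with no photos at all
    }

    def sets_containing(photo_id):
        return [name for name, members in sets.items() if photo_id in members]

    print(sets_containing("photo-002"))   # ['Mountain Biking', '1996 Season']
    print(sets_containing("photo-003"))   # [] -- a photo need not belong to any set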

Akamai and BitTorrent

Both Akamai and BitTorrent address the challenge of distributing large volumes of information across huge networks, striving to minimize bandwidth consumption and delays that users might notice. Their approaches to solving these problems, however, are very different.

Applicable Web 2.0 Patterns

This comparison discusses the following patterns:

  • Service-Oriented Architecture

  • Software as a Service

  • Participation-Collaboration

  • The Synchronized Web

You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.

Alternate Solutions to Bandwidth

Akamai and BitTorrent both avoid the issue of a single host trying to supply bandwidth-intensive content to a potentially global audience. A single server starts to slow down as it reaches its maximum capacity, and the network in the immediate vicinity of the host server suffers as well because it is carrying increased traffic.

Again, the incumbent in this case (Akamai) has significantly changed its mechanics and infrastructure since the original brainstorming session at which the comparison of Web 1.0 and Web 2.0 was made (as depicted in Figure 3.1, “Tim’s list of Web 1.0 versus Web 2.0 examples”). Still, understanding the patterns and advantages of each system is a good idea for budding Web 2.0 entrepreneurs. You shouldn’t view Akamai as antiquated: it is performing tremendously well financially, far outstripping many of the Web 2.0 companies mentioned in this book, and it has been one of NASDAQ’s top-performing stocks, reporting 47% growth and revenues of $636 million in 2007. With 26,000 servers, Akamai is also a huge Internet infrastructure asset.

Akamai’s original approach was to sell customers a distributed content-caching service. Its aim was simply to resolve bandwidth issues, and it solved that problem very well. If a customer like CNN News decided to host a video of a newscast, the content on the CNN server would be pulled through the Akamai network. The centrally located CNN server bank would rewrite the URIs of the video and other bandwidth-intensive content, turning them into URLs for copies of those resources that were easier for the requesting client to reach, often because they were hosted in physically closer locations. The client’s browser would load the HTML template, which would tell it to hit the Akamai network for the additional resources it required to complete the content-rendering process. At the time of this writing, end users see no direct indication that Akamai.com is being used (although streaming videos do require modification of URLs).
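
The sketch below captures the general idea of that rewriting step. The hostnames and the rule for deciding which assets are “heavy” are invented; this is not Akamai's actual scheme:

    import re

    ORIGIN_HOST = "www.example-news.com"         # invented origin server
    EDGE_HOST = "media.edge-cache.example.net"   # invented edge/CDN hostname
    HEAVY_EXTENSIONS = (".jpg", ".png", ".mp4", ".flv")

    def rewrite_heavy_urls(html):
        """Point bandwidth-intensive assets at the edge network; leave the page itself alone."""
        def rewrite(match):
            url = match.group(0)
            if url.lower().endswith(HEAVY_EXTENSIONS):
                return url.replace(ORIGIN_HOST, EDGE_HOST, 1)
            return url
        return re.sub(r"https?://[^\s\"']+", rewrite, html)

    template = ('<img src="http://www.example-news.com/video/newscast-still.jpg">'
                ' <a href="http://www.example-news.com/about.html">About</a>')
    print(rewrite_heavy_urls(template))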

Figure 3.6, “Overview of Akamai core pattern (courtesy of Akamai)” shows Akamai’s core architecture (as analyzed when used in Figure 3.1, “Tim’s list of Web 1.0 versus Web 2.0 examples”).

Figure 3.6. Overview of Akamai core pattern (courtesy of Akamai)

Pulling richer media (the larger files) from a system closer to the end user improves the user experience because it results in faster-loading content and streams that are more reliable and less susceptible to changes in routing or bandwidth capabilities between the source and target. Note that the Akamai EdgeComputing infrastructure is federated worldwide and users can pull files as required. Although Akamai is best known for handling HTML, graphics, and video content, it also offers accelerators for business applications such as WebSphere and SAP and has a new suite to accelerate AJAX applications.

BitTorrent is also a technology for distributing large amounts of data widely, without the original distributor incurring all the costs associated with hardware, hosting, and bandwidth resources. However, as illustrated in Figure 3.7, “BitTorrent’s pattern of P2P distribution”, it uses a peer-to-peer (P2P) architecture quite different from Akamai’s. Instead of the distributor alone servicing each recipient, in BitTorrent the recipients also supply data to newer recipients, significantly reducing the cost and burden on any one source, providing redundancy against system problems, and reducing dependence on the original distributor. This encompasses the concept of a “web of participation,” often touted as one of the key changes in Web 2.0.

Figure 3.7. BitTorrent’s pattern of P2P distribution

BitTorrent enables this pattern by getting its users to download and install a client application that acts as a peer node to regulate upstream and downstream caching of content. The viral-like propagation of files provides newer clients with several places from which they can retrieve files, making their download experiences smoother and faster than if they all downloaded from a single web server. Each person participates in such a way that the costs of keeping the network up and running are shared, mitigating bottlenecks in network traffic. It’s a classic architecture of participation and so qualifies for Web 2.0 status, even if BitTorrent is not strictly a “web app.”

The BitTorrent protocol is open to anyone who wants to implement it. Using the protocol, each connected peer should be able to prepare, request, and transmit files over the network. To use the BitTorrent protocol to share a file, the owner of the file must first create a “torrent” file. The usual convention is to append .torrent to the end of the filename. Every *.torrent file must specify the URL of the tracker via an “announce” element. The file also contains an “info” section holding a (suggested) name for the file, its length, and its metadata. BitTorrent clients use the Secure Hash Algorithm 1 (SHA-1) to publish checksums of the file’s pieces, letting any client verify that the content it has downloaded is intact and complete.
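
The sketch below mirrors the structure just described: split the file into fixed-size pieces, record the SHA-1 hash of each piece along with the tracker's “announce” URL, and let a downloader verify every piece it receives. Real .torrent files are bencoded and store the hashes as concatenated binary digests; a plain dictionary is used here to keep the example short:

    import hashlib
    import os

    PIECE_LENGTH = 256 * 1024   # 256 KiB pieces; a common (but configurable) choice

    def make_torrent_metadata(path, announce_url):
        """Compute per-piece SHA-1 hashes so downloaders can verify each piece."""
        piece_hashes = []
        with open(path, "rb") as f:
            while True:
                piece = f.read(PIECE_LENGTH)
                if not piece:
                    break
                piece_hashes.append(hashlib.sha1(piece).hexdigest())
        return {
            "announce": announce_url,               # where the tracker lives
            "info": {
                "name": os.path.basename(path),     # suggested filename
                "length": os.path.getsize(path),
                "piece length": PIECE_LENGTH,
                "pieces": piece_hashes,
            },
        }

    def verify_piece(metadata, index, data):
        """A client recomputes the hash and compares it with the published one."""
        return hashlib.sha1(data).hexdigest() == metadata["info"]["pieces"][index]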

Decentralization has always been a hallmark of the Internet, appearing in many different guises that come (and sometimes go) in waves. Architecturally, this pattern represents a great way to guard against points of failure or slowdowns, as it is both self-scaling and self-healing.[38] A very elegant architectural trait of peer to peer in particular is that the more people there are interested in a file, the more it will propagate, resulting in more copies being available for download to help meet the demand.

MP3.com and Napster

By the time the first Web 2.0 conversations started, the first incarnations of MP3.com and Napster were both effectively history. Neither of them was particularly well liked by the music industry, for reasons that feed into Web 2.0 but aren’t critical to the comparison between them. Their business stories share a common thread of major shift in the way music is distributed, but the way they went about actually transferring music files was very different, mirroring the Akamai/BitTorrent story in many ways.

Applicable Web 2.0 Patterns

Some of the technical patterns illustrated by this comparison are:

  • Service-Oriented Architecture

  • Software as a Service

  • Participation-Collaboration

  • The Synchronized Web

  • Collaborative Tagging

  • Declarative Living and Tag Gardening

  • Persistent Rights Management

You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.

Shifting Patterns and Costs of Music Distribution

The music industry has historically been composed of three main groups: those who create music (writing, recording, or producing it); those who consume it; and those who are part of the conventional recording and music distribution industry, who sit in the middle (see Figure 3.8, “Conventional music industry model”).

Figure 3.8. Conventional music industry model

Historically, music publishing and distribution has been done via physical media, from 78s to CDs. If you abstract the pattern of this entire process, you can easily see that the storage of music on physical media is grossly inefficient (see Figure 3.9, “Distribution pattern for audio content”).

Figure 3.9. Distribution pattern for audio content

Figure 3.9, “Distribution pattern for audio content” contains two “Digital Signal” points in the sequence. Persisting the music to some form of physical storage medium is unnecessary for people who are capable of working directly with the digital signal. If the signal is digital at the source, can travel from the source to its ultimate target in digital form, and is consumed as a digital signal, why would it make sense to use a non-digital storage medium (such as CD, vinyl, or tape) as an intermediate step? The shift to digital MP3 files has made the middle steps unnecessary. Figure 3.10, “The electronic music distribution pattern” depicts a simpler model that has many advantages, except to those whose business models depend on physical distribution.

Figure 3.10. The electronic music distribution pattern

For instance, this new pattern is better for the environment, because it does not involve turning petroleum products into records and CDs, or transporting physical goods thousands of miles. It satisfies people’s cravings for instant gratification, and it lets consumers store the music on physical media if they want, by burning CDs or recording digital audio tapes.

The old model also had one massive stumbling block: it arguably suppressed a large percentage of artists. For a conventional record company to sign a new artist, it must make a substantial investment in that artist. This covers costs associated with such things as recording the music, building the die for pressing it into physical media, and printing CD case covers, as well as the costs associated with manufacturing and distributing the media. The initial costs are substantial: even an artist who perhaps produces only 250,000 CDs may cost a record company $500,000 to sign initially. This doesn’t include the costs of promoting the artist or making music videos. Estimates vary significantly, but it’s our opinion that as a result, the conventional industry signs only one out of every 10,000 artists or so. If a higher percentage were signed, it might dilute each artist’s visibility and ability to perform. After all, there are only so many venues and only so many people willing to go to live shows.

An industry size issue compounds this problem. If the global market were flooded with product, each artist could expect to capture a certain portion of that market. For argument’s sake, let’s assume that each artist garners 1,000 CD sales on average. Increasing the total number of artists would cause each artist’s share of the market to decrease. For the companies managing the physical inventory, it’s counterproductive to have too much product available in the marketplace. As more products came to market, the dilution factor would impact sales of existing music to the point where it might jeopardize the record company’s ability to recoup its initial investment in each artist.

Note

Love’s Manifesto, a speech given by Courtney Love during a music conference, illuminates several of the problems inherent in the music industry today and is a brilliant exposé of what is wrong with the industry as a whole (pun intended) and the realities faced by artists. You can read the speech online at http://www.indie-music.com/modules.php?name=News&file=article&sid=820.

Producers and online distributors of digital music benefit from two major cost reductions. In addition to not having to deal with physical inventory and all its costs, they also offload the cost of recording music onto the artists, minimizing some of the risk associated with distributing the work of previously unsigned bands. These companies often adopt a more “hands off” approach. Unlike conventional record companies, online MP3 retailers can easily acquire huge libraries of thousands of new, previously unsigned artists. They don’t need to censor whose music they can publish based on their perceptions of the marketplace, because adding tracks to their labels poses minimal risk. (They do still face some of the same legal issues as their conventional predecessors, though—notably, those associated with copyright.)

This approach also has significant benefits for many artists. Instead of having to convince a record company that they’ll sell enough music to make the initial outlay worthwhile, new independent artists can go directly to the market and build their own followings, demonstrating to record companies why they’re worth signing. AFI, for example, was the first MySpace band to receive more than 500,000 listens in one day. Self-promotion and building up their own followings allows clever artists to avoid record companies while still achieving some success.

In this model, artists become responsible for creating their own music. Once they have content, they may publish their music via companies such as Napster and MP3.com. Imagine a fictional company called OurCo. OurCo can assimilate the best of both the old and the new distribution models and act as a private label distribution engine, as depicted in Figure 3.11, “The best of the old and the new distribution models”.

Figure 3.11. The best of the old and the new distribution models

Figure 3.12. Music creation workflow

MP3.com and Napster Infrastructures

When analyzing P2P infrastructures, we must recognize the sophistication of the current file-sharing infrastructures. The concepts of a web of participation and collaboration form the backbone of how resources flow and stream in Web 2.0. Napster is a prime example of how P2P networks can become popular in a short time and—in stark contrast to MP3.com—can embrace the concepts of participation and collaboration among users.

MP3.com, which got its start because its founder realized that many people were searching for “mp3,” was originally launched as a website where members could share their MP3 files with one another.

Note

The original MP3.com ceased to operate at the end of 2003. CNET now operates the domain name, supplying artist information and other metadata regarding audio files.

The first iteration of MP3.com featured charts defined by genre and geographical area, as well as statistical data for artists indicating which of their songs were more popular. Artists could subscribe to a free account, a Gold account, or a Platinum account, each providing additional features and stats. Though there was no charge for downloading music from MP3.com, people did have to sign up with an email address, and online advertisements were commonplace across the site. Although MP3.com hosted songs from known artists, the vast majority of the playlist comprised songs by unsigned or independent musicians and producers. Eventually MP3.com launched “Pay for Play,” which was a major upset to the established music industry. The idea was that each artist would receive payments based on the number of listens or downloads from the MP3.com site.

The original technical model that MP3.com employed was a typical client/server pattern using a set of centralized servers, as shown in Figure 3.13, “Typical client/server architecture model”.

Figure 3.13. Typical client/server architecture model

MP3.com engineers eventually changed to a new model (perhaps due to scalability issues) that used a set of federated servers acting as proxies for the main server. This variation of the original architectural pattern—depicted in Figure 3.14, “Load-balanced client/server pattern with proxies” using load balancing and clusters of servers—was a great way to distribute resources and balance loads, but it still burdened MP3.com with the expense of hosting files. (Note that in a P2P system, clients are referred to as “nodes,” as they are no longer mere receivers of content: each node in a P2P network is capable of acting as both client and server.)

Figure 3.14. Load-balanced client/server pattern with proxies

In Figure 3.14, “Load-balanced client/server pattern with proxies”, all nodes first communicate with the load-balancing server to find out where to resolve or retrieve the resources they require. The load-balancing server replies based on its knowledge of which proxies are in a position to serve the requested resources. Based on that information, each node then makes a direct request to the appropriate proxy server. This pattern is common in many web architectures today.
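
In miniature, that two-step interaction (ask the balancer, then fetch directly) looks like the sketch below. The proxy names, their contents, and the least-loaded selection rule are all invented for illustration:

    # Hypothetical registry kept by the load-balancing server: which proxy holds
    # which files, and how busy each proxy currently is.
    PROXIES = {
        "proxy-west.example.net": {"files": {"track01.mp3", "track02.mp3"}, "load": 3},
        "proxy-east.example.net": {"files": {"track02.mp3", "track03.mp3"}, "load": 1},
    }

    def locate(resource):
        """Step 1: the client asks the balancer which proxy should serve the resource."""
        candidates = [name for name, info in PROXIES.items() if resource in info["files"]]
        if not candidates:
            return None
        return min(candidates, key=lambda name: PROXIES[name]["load"])  # least loaded wins

    def fetch(resource):
        """Step 2: the client contacts that proxy directly for the content."""
        proxy = locate(resource)
        if proxy is None:
            raise LookupError(resource + " is not available on any proxy")
        return "GET http://" + proxy + "/" + resource

    print(fetch("track02.mp3"))   # served by the least-loaded proxy that holds it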

Napster took a different path. Rather than maintaining the overhead of a direct client/server infrastructure, Napster revolutionized the industry by introducing the concept of a shared, decentralized P2P architecture. It worked quite differently from the typical client/server model but was very similar conceptually to the BitTorrent model. One key central component remained: keeping lists of all of the peers for easy searching. This component not only created scalability issues, but also exposed the company to the legal liability that ultimately did it in.

Napster also introduced a pattern of “Opting Out, Not Opting In.” As soon as you downloaded and installed the Napster client software, you became, by default, part of a massive P2P network of music file sharers. Unless you specifically opted out, you remained part of the network. This allowed Napster to grow at an exponential rate. It also landed several Napster users in legal trouble, as they did not fully understand the consequences of installing the software.

P2P architectures can generally be classified into two main types. The first is a pure P2P architecture, in which each node acts as both a client and a server. There is no central server or DNS-type node to coordinate traffic; all traffic is routed based on each node’s knowledge of other nodes and on the protocols used. BitTorrent, for example, can operate in this mode. This type of network architecture (also referred to as an ad hoc architecture) works when nodes are configured to act as both servers and clients. It is conceptually similar to how mobile radios work, except that it uses point-to-point (unicast) communication rather than broadcast. Figure 3.15, “Ad hoc P2P network” depicts this type of network.

Figure 3.15. Ad hoc P2P network

In this pure-play P2P network, no central authority determines or orchestrates the actions of the other nodes. By comparison, a centrally orchestrated P2P network includes a central authority that takes care of orchestration and essentially acts as a traffic cop, as shown in Figure 3.16, “Centrally orchestrated P2P network”.

Figure 3.16. Centrally orchestrated P2P network

The control node in Figure 3.16, “Centrally orchestrated P2P network” keeps track of the status and libraries of each peer node to help orchestrate where other nodes can find the information they seek. Peers themselves store the information and can act as both clients and servers. Each node is responsible for updating the central authority regarding its status and resources.
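
A stripped-down sketch of that orchestration follows. Peers register their libraries with a central directory, a searching peer asks the directory who has a title, and the transfer itself happens directly between peers; all names and addresses are invented:

    class Directory:
        """Central authority: tracks which peer holds which titles (it stores no files)."""
        def __init__(self):
            self.index = {}                 # title -> set of peer addresses

        def register(self, peer):
            for title in peer.library:
                self.index.setdefault(title, set()).add(peer.address)

        def who_has(self, title):
            return self.index.get(title, set())

    class Peer:
        """Each node acts as both client and server for its own library."""
        def __init__(self, address, library):
            self.address, self.library = address, set(library)

        def serve(self, title):
            return f"{self.address} sending '{title}'"   # stand-in for a real transfer

        def download(self, title, directory, peers_by_address):
            sources = directory.who_has(title)
            if not sources:
                return None
            source = peers_by_address[next(iter(sources))]
            return source.serve(title)                   # peer to peer, not via the directory

    directory = Directory()
    alice = Peer("10.0.0.5", ["songA.mp3"])
    bob = Peer("10.0.0.9", [])
    peers = {p.address: p for p in (alice, bob)}
    for p in peers.values():
        directory.register(p)

    print(bob.download("songA.mp3", directory, peers))   # fetched directly from alice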

Napster itself was a sort of hybrid P2P system, allowing direct P2P traffic and maintaining some control over resources to facilitate resource location. Figure 3.17, “Conceptual view of Napster’s mostly P2P architecture” shows the classic Napster architecture.

Figure 3.17. Conceptual view of Napster’s mostly P2P architecture

Napster central directories tracked the titles of content in each P2P node. When users signed up for and downloaded Napster, they ended up with the P2P node software running on their own machines. This software pushed information to the Napster domain. Each node searching for content first communicated with the IP Sprayer/Redirector via the Napster domain. The IP Sprayer/Redirector maintained knowledge of the state of the entire network via the directory servers and redirected nodes to nodes that were able to fulfill its requests. Napster, and other companies such as LimeWire, are based on hybrid P2P patterns because they also allow direct node-to-node ad hoc connections for some types of communication.

Both Napster and MP3.com, despite now being defunct, revolutionized the music industry. MySpace.com has since added a new dimension into the mix: social networking. Social networking layered on top of the music distribution model continues to evolve, creating new opportunities for musicians and fans.

Britannica Online and Wikipedia

A disruptive technology can do more than cost a business money. Sometimes the disruption extends so deep that the virtues of the business’s past become problems, and techniques that would previously have been vices suddenly become virtues. The emergence of Wikipedia and its overshadowing of the Encyclopedia Britannica is one case where the rules changed decisively in favor of an upstart challenger.

Applicable Web 2.0 Patterns

The collaborative encyclopedia approach ushered in by Wikipedia capitalizes on several Web 2.0 patterns:

  • Software as a Service

  • Participation-Collaboration

  • Rich User Experience

  • The Synchronized Web

  • Collaborative Tagging

You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.

From a Scholarly to a Collaborative Model

The Encyclopedia Britannica was originally published in 1768 as a three-volume set, emerging from the intellectual churn of Edinburgh. It grew quickly, reaching 21 volumes by 1801, and over the next two centuries, it solidified its reputation as a comprehensive reference to the world. Producing the printed tomes was a complex and expensive enterprise, requiring editors to judge how long to leave an edition in print, how much to change between editions, what new material to cover, and who should cover it.

The possibility of an electronic edition was in many ways a relief at first. The Encyclopedia Britannica took huge strides during the computer revolution to survive a changing world. In the mid-1990s, the static book publisher tried bundling an Encyclopedia Britannica CD with some PCs. That experiment was short-lived, as it soon became obvious that any publishing effort in the new digital age had to be dynamic. The company then migrated its entire encyclopedia set to the Web, where it was free of many of the edition-by-edition obstacles to updating that had limited its print and CD editions.

Although this was a daring move, and Britannica continues to sell its content online, the model behind the encyclopedia’s creation now faced a major challenge from newcomer Wikipedia. Whereas Encyclopedia Britannica had relied upon experts and editors to create its entries, Wikipedia threw the doors open to anyone who wanted to contribute. While it seemed obvious to many that an encyclopedia created by volunteers—many of them non-experts, many of them anonymous, and some of them actually out to cause trouble—just had to be a terrible idea, Wikipedia has thrived nonetheless. Even Wikipedia’s founders didn’t quite know what they were getting into—Wikipedia was originally supposed to feed into a much more formal, peer-reviewed Nupedia.

In Wikipedia, rather than one authority (typically a committee of scholars) centrally defining all subjects and content, people all over the world who are interested in a certain topic can collaborate asynchronously to create a living, breathing work. Wikipedia combines the collaborative aspects of wiki sites (websites that let visitors add, remove, edit, and change content) with the presentation of authoritative content built on rich hyperlinks between subjects to facilitate ultra-fast cross-references of facts and claims.

Wikipedia does have editors, but everyone is welcome to edit. Volunteers emerge over time, editing and re-editing articles that interest them. Consistency and quality improve as more people participate, though the content isn’t always perfect when first published. Anonymous visitors often make edits to correct typos or other minor errors. Defending the site against vandals (or just people with agendas) can be a challenge, especially on controversial topics, but so far the site seems to have held up. Wikipedia’s openness allows it to cover nearly anything, which has created some complications as editors deleted pages they didn’t consider worthy of inclusion. It’s always a conversation.

The shift from a top-down editorial approach to a bottom-up approach is a painful reversal for people who expect only expert advice when they look up something—and perhaps an even harder reversal for people who’ve built their careers on being experts or editors. Businesses facing this kind of competition need to study whether their business models are sustainable, and whether it is possible to incorporate the bottom-up approach into their own work.

Personal Websites and Blogs

The term blog is short for weblog, a personal log (or diary) that is published on the Internet. In many cases, blogs are what personal websites were initially meant to be. Many early website gurus preached the idea that online content should always be fresh and new to keep traffic coming back. That concept holds just as true now as it did then—the content has just shifted form.

Applicable Web 2.0 Patterns

Many blogs embrace a variety of the core patterns discussed in Chapter 7, Specific Patterns of Web 2.0, such as:

  • Participation-Collaboration

  • Collaborative Tagging

  • Declarative Living and Tag Gardening

  • Software as a Service

  • Asynchronous Particle Update (the pattern behind AJAX)

  • The Synchronized Web

  • Structured Information (Microformats)

Shifting to Blogs and Beyond

Static personal websites were, like most websites, intended to be sources of information about specific subjects. The goal of a website was to pass information from its steward to its consumers. Some consumers might visit certain websites (personal or otherwise) only once to retrieve the information they sought; however, certain groups of users might wish to visit again to receive updated information.

In some ways, active blogs are simply personal websites that are regularly updated, though most blog platforms support features that illustrate different patterns of use. There are no hard rules for how frequently either a blog or a personal website should be updated, and neither can be classified neatly in a general sense, so it probably isn’t possible to identify clear differences as patterns. However, a few key points do differentiate blogs:

  • Blogs are built from posts—often short posts—which are usually displayed in reverse chronological order (newest first) on an organizing front page. Many blogs also support some kind of archive for older posts.

  • Personal websites and blogs are both published in HTML. Blog publishing, however, usually uses a slightly different model from traditional HTML website publishing. Most blog platforms don’t require authors to write HTML, letting them simply enter text for the blog in an online form. Blog hosting also generally requires users to know less about the underlying infrastructure than classic HTML publishing does. Blogs’ ease of use makes them attractive to Internet users who want a web presence but have not yet bothered to learn about HTML, scripts, HTTP, FTP, and other technologies.

  • Blogs often include some aspects of social networking. Mechanisms such as a blogroll (a list of other blogs to which the blog owner wishes to link from his blog) create mini-communities of like-minded individuals. A blogroll is a great example of the Declarative Living pattern documented in Chapter 7, Specific Patterns of Web 2.0. Comment threads can also help create small communities around websites.

  • Blogs support mechanisms for publishing information that can be retrieved via multiple patterns (like Search and Retrieve, Push, or Direct Request). Instead of readers having to request the page via HTTP GETs, they can subscribe to feeds (including Atom and RSS) to receive new posts in a different form and on a schedule more convenient to them.

Standard blog software (e.g., Blogger or WordPress) has evolved well beyond simple tools for presenting posts. The software allows readers to add their own content, tag content, create blogrolls, and host discussion forums. Some blog management software lets readers register to receive notifications when there are updates to various sections of a blog. The syndication functionality of RSS (or Atom) has become a core element of many blogs. Many blogs are updated on a daily basis, yet readers might not want to have to reload the blog page over and over until a new post is made (most blog authors do not post exact schedules listing the times their blogs are updated). It is much more efficient if the reader can register interest and then receive a notification whenever new content is published. RSS also describes the content so that readers can decide whether they want to view the actual blog.
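
A minimal RSS reader illustrates the subscription side of this. The feed URL below is a placeholder (it will not resolve), the code understands only plain RSS 2.0 <item> elements, and a real aggregator would also remember seen items between runs and poll on a schedule:

    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "https://example-blog.invalid/feed.rss"   # placeholder feed URL

    def fetch_new_items(feed_url, seen_guids):
        """Download the RSS feed and return only the items we haven't shown before."""
        with urllib.request.urlopen(feed_url) as response:
            root = ET.fromstring(response.read())
        new_items = []
        for item in root.iter("item"):                    # RSS 2.0: channel/item elements
            guid = item.findtext("guid") or item.findtext("link")
            if guid and guid not in seen_guids:
                seen_guids.add(guid)
                new_items.append((item.findtext("title"), item.findtext("link")))
        return new_items

    seen = set()
    for title, link in fetch_new_items(FEED_URL, seen):
        print(title, "->", link)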

Blogs are also moving away from pure text and graphics. All kinds of blog mutations are cropping up, including mobile blogs (known as moblogs), video blogs, and even group blogs.

Developers are adding tools that emphasize patterns of social interactions surrounding blogs. MyBlogLog.com has software that uses an AJAX widget to place the details of readers of a blog on the blog page itself so that you can see who else has been reading a specific blog. Figure 3.18, “Screenshot of MyBlogLog.com blog widget” shows the latest readers of the Technoracle blog at the time of this writing.[39]

Figure 3.18. Screenshot of MyBlogLog.com blog widget

Most blog software also offers the ability to socially network with like-minded bloggers by adding them to your blogroll. Having your blog appear on other people’s blogrolls helps to elevate your blog’s status in search engines, as well as in blog directories such as Technorati that track blog popularity. It also makes a statement about your personality and your stance on a variety of subjects. Figure 3.19, “Example of a blogroll from Technoracle.blogspot.com” shows an example of a blogroll.

Figure 3.19. Example of a blogroll from Technoracle.blogspot.com

A blogroll is a good example of the Declarative Living and Tag Gardening pattern, as the list of fellow bloggers in some ways tags the person who posts it. By making a statement regarding whose blogs they encourage their readers to read, blog owners are declaring something about themselves. Blog readers can learn more about blog writers by looking at who they have on their blogrolls. For example, in Figure 3.19, “Example of a blogroll from Technoracle.blogspot.com”, knowing that John Lydon is in fact Johnny Rotten, the singer for the Sex Pistols, may imply to a reader that the owner of Technoracle has a somewhat disruptive personality and will try to speak the truth, even if it’s unpopular.

Blogs lowered the technical barrier for getting a personal presence on the Internet, making it much easier for many more people to join the conversation. Blogs have also changed the patterns of dissemination of information. Rather than simply reading a news story on a particular topic, interested readers can also find related blogs and find out what the average person thinks about that topic. Blogs represent a new kind of media and offer an alternative source for people who want more than news headlines.

More recently, blogs have evolved beyond their basic form. Blogs have become one of many components in social networking systems like MySpace and Facebook: one component in pages people use to connect with others, not merely to present their own ideas. Going in a different direction, Twitter has stripped blogging down to a 140-character minimalist approach, encouraging people to post tiny bits of information on a regular basis and providing tools for following people’s feeds.

Screen Scraping and Web Services

Even in the early days of the Web, developers looked for ways to combine information from multiple sites. Back then, this meant screen scraping—writing code to dig through loosely structured HTML and extract the vital pieces—which was often a troublesome process. As Web 2.0 emerged, more and more of that information became available through web services, which presented it in a much more structured and more readily usable form.

Applicable Web 2.0 Patterns

These two types of content grabbing illustrate the following patterns:

  • Service-Oriented Architecture

  • Collaborative Tagging

You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.

Intent and Interaction

In the earliest days of the Web, screen scraping often meant capturing information from the text-based interfaces of terminal applications to repurpose it for use in web applications, but the same technology was quickly turned to websites themselves. HTML is, after all, a text-based format, if a loosely (and sometimes even chaotically) structured one. Web services, on the other hand, are protocols and standards from various standards bodies that allow programmatic access to resources in a predictable way. XML made the web services revolution possible by making it easy to create structured, labeled, and portable data.
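
The contrast is easy to see in code. The HTML snippet, regular expression, and XML payload below are invented; the point is that the scraper depends on presentation details that can change at any time, while the service response labels the data it carries:

    import re
    import xml.etree.ElementTree as ET

    # Screen scraping: dig the fact out of presentation-oriented HTML.
    html = '<div class="price-box"><span class="label">Price:</span> <b>$23.99</b></div>'
    match = re.search(r"Price:</span>\s*<b>\$([\d.]+)</b>", html)
    price_scraped = float(match.group(1)) if match else None   # breaks if the markup changes

    # Web service: the same fact arrives as labeled, structured data.
    xml_payload = "<product><title>Example Book</title><price currency='USD'>23.99</price></product>"
    product = ET.fromstring(xml_payload)
    price_from_service = float(product.findtext("price"))

    print(price_scraped, price_from_service)   # 23.99 23.99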

Note

There is no specific standardized definition of web services that explains the exact set of protocols and specifications that make up the stack, but there is a set that is generally accepted. It’s important to examine the web services architecture document from the W3C to get a feel for what is meant by “web services.” When this book refers to “web services,” it doesn’t specifically mean SOAP over HTTP, although this is one popular implementation. RESTful services available via the Web are just as relevant.

One major difference between the two types of interactions is intent. Most owners of resources that have been screen-scraped did not intend to allow their content to be repurposed. Many were, of course, probably open to others using their resources; otherwise, they probably wouldn’t have posted the content on the Internet. However, designing resources for automated consumption, rather than human consumption, requires planning ahead and implementing a different, or even parallel, infrastructure.

A classic example of the shift from screen scraping to services is Amazon.com. Amazon provides a tremendous amount of information about books in a reasonably structured (though sometimes changing) HTML format. It even contains a key piece of information, the Amazon sales rank, that isn’t available anywhere else. As a result, many developers have written programs that scrape the Amazon site.

Rather than fighting this trend, Amazon realized that it had an opportunity. Its network of Amazon Associates (people and companies that help Amazon sell goods in exchange for a commission) could use the information that others were scraping from the site. Amazon set out to build services to make it easier for its associates to get to this information—the beginning of a process that has led Amazon to offer a variety of web services that go far beyond its product information.

Most web services work falls under the Service-Oriented Architecture (SOA) pattern described in Chapter 7, Specific Patterns of Web 2.0. SOA itself doesn’t depend on the web services family of technologies and standards, nor is it limited to the enterprise realm where SOA is most ubiquitous. Web services are built on a set of standards and technologies that support programmatic sharing of information. These usually include XML as a foundation, though JSON has proven popular lately for lightweight sharing. Many web services are built using SOAP and the Web Services Description Language (WSDL), though others take a RESTful approach. Additional useful specifications include the SOAP processing model,[40] the XML Infoset[41] (the abstract model behind XML), and the OASIS Reference Model for SOA (the abstract model behind services deployed across multiple domains of ownership).

While web services and SOA are often thought of as technologies used inside enterprises rather than publicly on the Internet, the reality is that there is a wide spectrum of uses in both public and private environments. Open public services are typically simpler, while services used internally, or for purposes more specific than information broadcast and consumption, often support a richer set of capabilities. Web services now include protocol support for expressing policies, reliable messaging, secure messaging, security contexts, domains of trust, and several other key features. Web services have also spawned an industry of protocols and architectural models that build on services, such as Business Process Management (BPM), composite services, and service aggregation. The broader variety of web services standards has been documented in many other books, including Web Services Architecture and Its Specifications by Luis Felipe Cabrera and Chris Kurt (Microsoft Press).

Content Management Systems and Wikis

As the Web evolved from the playground of hobbyists into the domain of commercial users, the difficulty of maintaining sites capable of displaying massive amounts of information escalated rapidly. Content management systems (CMSs) such as Vignette leaped into the gap to help companies manage their sites. While CMSs remain a common component of websites today, the model they use is often one of outward publication: a specific author or organization creates content, and that content is then published to readers (who may be able to comment on it). Wikis take a different approach, using the same system to both create and publish information, thereby allowing readers to become writers and editors.

Applicable Web 2.0 Patterns

The patterns illustrated in this discussion focus on collaboration:

  • Participation-Collaboration

  • Collaborative Tagging

You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.

Participation and Relevance

Publishing is often a unilateral action whereby content is made available and further modifications to the content are minimal. Those who consume the content participate only as readers.

Wikis may look like ordinary websites presenting content, but the presence of an edit button indicates a fundamental change. Users can modify the content by adding comments (much like blog comments), build new works from it (mashups), and, in some cases, create specialized versions of the original. Their participation gives the content wider relevance, because collective intelligence generally produces a more balanced result than the input of one or two minds.

The phrases “web of participation” and “harnessing collective intelligence” are often used to explain Web 2.0. Imagine you owned a software company and you had user manuals for your software. If you employed a static publishing methodology, you would write the manuals and publish them based on a series of presumptions about, for example, the level of technical knowledge of your users and their semantic interpretations of certain terms (i.e., you assume they will interpret the terms the same way you did when you wrote the manuals).

A different way to publish the help manuals would be to use some form of website—not necessarily a wiki, but something enabling feedback—that lets people post comments about your software directly in your online user manuals. Trusting users to apply their intelligence and participate in improving the manuals can be a very effective way to build documentation full of information you might never have written yourself. The collective knowledge of your experienced users can be instrumental in helping new users of your software. For an example of this pattern in use, visit http://livedocs.adobe.com and see how Adobe Systems trusts its users to contribute to published software manuals.

Directories (Taxonomy) and Tagging (Folksonomy)

Directories are built by small groups of experts to help people find information they want. Tagging lets people create their own classifications.

Applicable Web 2.0 Patterns

The following patterns are illustrated in this discussion:

  • Participation-Collaboration

  • Collaborative Tagging

  • Declarative Living and Tag Gardening

  • Semantic Web Grounding

  • Rich User Experience

You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.

Supporting Dynamic Information Publishing and Finding

Directory structures create hierarchies of resource descriptions to help users navigate to the information they seek. The terms used to divide the hierarchy create a taxonomy of subjects (metadata keywords) that searchers can use as guideposts to find what they’re looking for. Library card catalogs are the classic example, though taxonomies come in many forms. Within a book, tables of contents and especially indexes often describe taxonomies.

Navigation mechanisms within websites also often describe taxonomies, with layers of menus and links in place of tables of contents and a full-text search option in place of an index. These resources can help users within a site, but users’ larger problem on the Web has often been one of finding the site they want to visit. As the number of sites grew exponentially in the early days of the Web, the availability of an incredible amount of information was often obscured by the difficulty of finding what you wanted. The scramble for domain names turned into a gold rush, and advertisers rushed to include website addresses in their contact information—but many people arrived on the Web looking for information on a particular subject, not a particular advertiser.

The answer, at least at the beginning, was directories. Directory creators developed taxonomic classification systems for websites, helping users find their way to roughly the right place. Online directories usually started with a classification system with around 8 to 12 top-level subjects. Each subject was further classified until the directory browser got to a level where most of the content was very specialized. The Yahoo! directory was probably the most used directory in the late 1990s, looking much like Figure 3.20, “The Yahoo! directory”. (You can still find it at http://dir.yahoo.com.)

Figure 3.20. The Yahoo! directory

Each category, of course, had further subcategories. Clicking on “Regional,” for example, presented users with the screen in Figure 3.21, “Subcategories under the Regional category”.

Figure 3.21. Subcategories under the Regional category

Similarly, clicking on “Countries” in the subcategory listing shown in Figure 3.21, “Subcategories under the Regional category” yielded an alphabetical list of countries, which could be further decomposed into province/state, city, community, and so on, until you reached a very small subset of specific results.

Directories have numerous problems. First and foremost, it is very difficult for a small group—even a small group of directory specialists—to develop terms and structures that readers will consistently understand. Additionally, there is the challenge of placing information in the directory. When web resource owners add pages to the Yahoo! directory, they navigate to the nodes where they think the pages belong and add their resources there. However, other people won’t necessarily go to the same place when looking for that content.

Say, for example, you had a rental car company based in Vancouver, British Columbia, Canada. Would you navigate to the node under Regional→Countries→Canada→Provinces→British Columbia→Cities→Vancouver, and then add your content? Or would you instead add it under Recreation & Sports→Travel→Transportation→Commuting, or perhaps Business & Economy→Shopping and Services→Automotive→Rentals? Taxonomists have solved this problem by creating polyhierarchies, where an item can be classified under more than one node in the tree. However, many Internet directories are still implemented as monohierarchies, where only one node can be used to classify any specific object. While polyhierarchies are more flexible, they can also be confusing to implement.
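
The distinction is easy to see in a sketch. Assuming a toy directory modeled in Python as a mapping from category paths to listings (the categories echo the Yahoo! example above, and the rental company name is invented), a monohierarchy files the listing under exactly one path, while a polyhierarchy lets several paths reference the same entry.

    # Monohierarchy: the listing lives under exactly one category path.
    mono = {
        ("Regional", "Canada", "British Columbia", "Vancouver"): ["Acme Car Rentals"],
    }

    # Polyhierarchy: several category paths point at the same listing,
    # so users can find it whichever way they navigate.
    listing = {"name": "Acme Car Rentals", "url": "http://rentals.example.com"}
    poly = {
        ("Regional", "Canada", "British Columbia", "Vancouver"): [listing],
        ("Recreation & Sports", "Travel", "Transportation"): [listing],
        ("Business & Economy", "Shopping and Services", "Automotive", "Rentals"): [listing],
    }

The flexibility has a cost: every additional path is another editorial decision to make and keep consistent, which is part of why many directories stayed monohierarchical.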

Another problem concerns terminology. Although terms such as “vehicles for hire” and “automobiles for lease” describe your rental car company equally well, users searching for those terms will not be led to your website if your listing was filed under different wording. Adding non-English-speaking users to the mix presents a whole new crop of problems. Taxonomists can solve these problems too, using synonyms and other tools; it just requires an ever-greater investment in taxonomy development and infrastructure.

Hierarchical taxonomies are far from the only approach to helping users find data, however. More and more users simply perform searches. Searches work well for textual content but often turn up false matches and don’t apply easily to pictures and multimedia. As was demonstrated in our earlier discussion of Flickr, tagging offers a much more flexible approach—one that grows along with a library of content.

Sites such as Slashdot.org have implemented this type of functionality to let readers place semantic tags alongside content. Figure 3.22, “Screenshot from Slashdot.org showing user tags (the tags appear in the oval)” shows an example of the tagging beta on a typical Slashdot.org web page. The tags appear just below the article.

Figure 3.22. Screenshot from Slashdot.org showing user tags (the tags appear in the oval)

The most effective tagging systems are those created by lots of people who want to make it easier for themselves (rather than others) to find information. This might seem counterintuitive, but if a large number of people apply their own terms to a few items, reinforcing classification patterns emerge more rapidly than they do if a few people try to categorize a large number of items in the hopes of helping other people find them. For those who want to extract and build on folksonomies, selfish tagging can be tremendously useful, because people are often willing to share their knowledge about things in return for an immediate search benefit to themselves.

Delicious, which acts as a gigantic bookmark store, expects its users to create tags for their own searching convenience. As items prove popular, the number of tags for those items grows and they become easier to find. It may also be useful for the content creators to provide an initial set of tags that operate primarily as seed tags—that is, a way of encouraging other users to add their own tags.
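
A folksonomy emerging from selfish tagging can be sketched in a few lines of Python. Each hypothetical user tags a single bookmark for their own retrieval; simply counting the tags across users surfaces the reinforcing classification patterns described above.

    # Tags applied to one URL by different users, each for their own benefit.
    from collections import Counter

    user_tags = {
        "alice": ["photography", "sharing", "web2.0"],
        "bob":   ["photos", "web2.0"],
        "carol": ["photography", "web2.0", "community"],
    }

    # Aggregating everyone's selfish tags yields a consensus classification.
    tag_counts = Counter(tag for tags in user_tags.values() for tag in tags)
    print(tag_counts.most_common(3))
    # e.g. [('web2.0', 3), ('photography', 2), ('sharing', 1)]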

More Hints for Defining Web 2.0

Tim’s examples illustrate the foundations of Web 2.0, but that isn’t the end of the conversation. Another way to look at these concepts is through a meme (pronounced “meem”) map. A meme map is an abstract artifact for showing concepts and their relationships. These maps are, by convention, ambiguous. For example, if two concepts are connected via a line, you can’t readily determine what type of relationship exists between them in tightly defined ontological terms. Figure 3.23, “Meme map for Web 2.0” depicts the meme map for Web 2.0, as shown on the O’Reilly Radar website.

Figure 3.23. Meme map for Web 2.0

This map shows a lot of concepts and suggests that there are “aspects” and “patterns” of Web 2.0, but it doesn’t offer a single definition of Web 2.0. The logic captured in the meme map is less than absolute, yet it declares some of the core concepts inherent in Web 2.0. This meme map, along with the Web 2.0 examples discussed earlier in the chapter, was part of the conversation that yielded the patterns outlined in Chapter 7, Specific Patterns of Web 2.0. Concepts such as “Trust your users” are primary tenets of the Participation-Collaboration and Collaborative Tagging patterns. “Software that gets better the more people use it” is a key property of the Collaborative Tagging pattern (a.k.a. folksonomy). “Software above the level of a single device” is represented by the Software as a Service and Mashup patterns.

Reductionism

Figure 3.24, “Reductionist view of Web 2.0” shows a reductionist view of Web 2.0. Reductionism holds that complex things can always be reduced to simpler, more fundamental things, and that the whole is nothing more than the sum of those simpler parts. The Web 2.0 meme map, by contrast, is a largely holistic analysis. Holism, the opposite of reductionism, says that the properties of any given system cannot be described as the mere sum of its parts.

Figure 3.24. Reductionist view of Web 2.0

In a small but important way, this division captures an essential aspect of the debates that surround Web 2.0 and the next generation of the Web in general: there is one set of thinkers attempting to explain what’s happening on the Web by exploring its fundamental precepts, and another set seeking to explain it in terms of the things we’re actually seeing happen on the Web (online software as a service, self-organizing communities, Wikipedia, BitTorrent, Salesforce, Amazon Web Services, etc.). Neither view is complete, of course, though combining them could help.

In the next part of the book, we’ll delve into more detail, some of it technical, and try to distill core patterns that will be applicable in a range of scenarios.



[32] See http://www.doubleclick.com/us/about_doubleclick/press_releases/default.asp?p=572.

[33] See http://money.cnn.com/magazines/business2/business2_archive/2007/03/01/8401043/index.htm.

[34] See http://www.jupiterresearch.com/bin/item.pl/research:concept/87/id=99415/.

[35] See http://www.culturejamthefilm.com.

[36] See http://en.wikipedia.org/wiki/Culture_jamming.

[37] Flickr’s tagging is effective, but it represents only one style of folksonomy: a narrow folksonomy. For more information on different styles of folksonomy, see http://www.personalinfocloud.com/2005/02/explaining_and_.html.

[38] Cloud computing, in which developers trust their programs to run as services on others’ hardware, may seem like a return to centralization (“All those programs run on Amazon S3 and EC2....”). The story is more complicated than that, however, as cloud computing providers have the opportunity to give their customers the illusion of centralization and the easy configuration that comes with it, while supporting a decentralized infrastructure underneath.

[39] See http://technoracle.blogspot.com.

[40] See http://www.w3.org/TR/soap12-part1/#msgexchngmdl.

[41] See http://www.w3.org/TR/xml-infoset/.


Copyright © 2009 O'Reilly Media, Inc.