Internet Systems and Programming - User Agents

User Agents

Various programs were used on the Internet before the WWW, using protocols for email (e.g., SMTP), ftp, gopher, wais, nntp, telnet, rsh, ssh, finger, whois, snmp, and irc. The conventions for the WWW were made general enough to incorporate many of these protocols. "User agent" is the generic WWW term for programs that retrieve WWW content.

The first programs to access WWW files were called browsers. Browsers function as clients of web servers. The browser requests a document from a server, the server delivers the document, and the browser renders it for display. The first widely successful browser was Mosaic, developed by Marc Andreessen and others at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. Since then, browsers have been expanded to include related software for processing email, authoring HTML files, and other purposes, and other user-agent programs have been developed to access WWW files.
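The request and response exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular browser is implemented: the helper names build_request and parse_response, the "demo-agent/0.1" user-agent string, and the canned response text are all assumptions made for the example, and no network connection is opened.

```python
def build_request(host, path, user_agent="demo-agent/0.1"):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",  # hypothetical agent name
        "Connection: close",
        "",  # the two empty strings produce the blank line
        "",  # that terminates the request headers
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# Canned response, invented for illustration; a real server's reply
# would normally carry more headers than this.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)

request_text = build_request("cse.unl.edu", "/~reich/inet")
status, headers, body = parse_response(SAMPLE_RESPONSE)
print(request_text)
print(status, headers, body)
```

A real user agent would send the request text over a TCP connection to the server and then render the returned body; here the two halves are kept separate so each can be inspected on its own.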

The most popular commercial browser today is Microsoft's Internet Explorer. Microsoft was found to have used illegal monopolistic practices in winning browser market share from Netscape's Navigator and Communicator products. AOL transferred the Netscape software code to the non-profit Mozilla Foundation, which recently released the Firefox browser and Thunderbird email client. Other browsers include Opera and Apple's Safari. Mozilla and Opera recently announced plans for "user agents with initial implementations of jointly-developed specifications" to counter "a rising threat of single-vendor solutions."

The w3schools website has an excellent resource about browsers, including some valuable statistics for developers of websites:

Microsoft IE 6 is the most popular browser, with over 65% share. Microsoft's share (including IE 5) is down to less than 70% from a high of 88% in 2003.
Mozilla's share, including Firefox, is up to 23%. Other browsers, including Opera and Netscape Navigator, have less than 2% share.
Microsoft's OS share is about 90%, with XP holding 60% share. Linux and Mac have about 3% each.
More than 1/3 of computers have a display resolution of only 800x600 (or less), and more than 1/3 have a color depth of 16 bits or less.
Approximately 90% of computers have JavaScript enabled.

User-agents have been developed for purposes other than traditional browsing from a computer. For example, robotic programs are used to collect information from the WWW, and there are user-agents for cell phones and audio interfaces (e.g., to serve vision-impaired users). Some predict that household appliances and wearable computers will soon be commonly accessing the WWW.

Browser Features

Modern browsers offer many features related to browsing, including:

visual presentation of HTML documents
interpretation of stylistic elements (e.g., CSS or XSL)
visual presentation of other formats including PDF, XML, and MathML
interpretation of programming elements (e.g., JavaScript or Java applets)
access histories that allow forward and backward movement
favorite links organized in lists and folders
alternate views of data (e.g., the document source code)
customizable elements (e.g., default font)
operations such as saving or mailing documents

Browsers frequently include support for operations beyond viewing (or rendering) documents. For example, browsers may support ftp, newsgroup reading, email, chatting, messaging, and group meetings.

URLs, URNs, and URIs

Documents (or, more generally, resources) on the WWW are uniquely identified by a short character string called a Uniform Resource Identifier or URI. The URI convention is intended to provide a standard semantic form (i.e., uniform) for naming (i.e., giving an identifier to) instances of various types of resources. The original convention was called Uniform Resource Locator or URL. This convention was expanded to include Uniform Resource Names or URNs, for which there is an institutional commitment to persistence and availability, independent of location. A URI is a locator (i.e., a URL), a name (i.e., a URN), or both. For a discussion of the terminology, see the W3C pages "Naming and Addressing: URIs, URLs, ..." and "URIs, URLs, and URNs: Clarifications and Recommendations."

Various operations can be performed on a resource (depending on its type), such as access, update, replace, etc.

Here are a few example URIs:

http://cse.unl.edu/~reich/inet
mailto:reich@cse.unl.edu
ftp://ftp.cse.unl.edu

The URI syntax (and URN syntax) specifics can be retrieved from the W3C site, but the general approach is:

<scheme>:<scheme-specific-part>

where the scheme-specific-part depends on the scheme being used. Commonly, schemes use the following approach:

<scheme>://<authority><path>?<query>

So, in the URI http://cse.unl.edu/~reich/inet, the scheme is "http" for Hypertext Transfer Protocol, the authority is "cse.unl.edu", which is a server identified by its fully qualified host name (FQHN), and the path is "/~reich/inet", indicating the file location on the server.
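As a rough illustration of the scheme, authority, path, and query breakdown, the sketch below uses urlsplit from Python's standard urllib.parse module on the example URIs. The describe helper and the "?topic=agents" query string are assumptions added for the example; they do not come from the course site.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Return the generic components of a URI as a dictionary."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # empty for schemes without "//"
        "path": parts.path,
        "query": parts.query,
    }


examples = [
    "http://cse.unl.edu/~reich/inet",
    "mailto:reich@cse.unl.edu",
    "ftp://ftp.cse.unl.edu",
    # Hypothetical query string, added only to show the <query> part.
    "http://cse.unl.edu/~reich/inet?topic=agents",
]

for uri in examples:
    print(uri, describe(uri))
```

Note that the mailto URI has no authority: everything after the colon is its scheme-specific part, which urlsplit reports as the path.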

An FQHN is either a fully qualified domain name (FQDN), that is, a completely specified domain name ending in a top-level domain (TLD) such as com or edu, or a numeric Internet Protocol (IP) address, that is, a 32-bit value commonly written as four octets, e.g., 129.93.165.2. Domain names are translated to IP addresses by name servers, which implement the Domain Name System (DNS). There is a hierarchy of name servers for the Internet. Both domain names and IP addresses are unique (with some exceptions), allowing packets to be routed across the Internet to the proper recipient. A program on cse named nslookup can be used to interrogate a name server.
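To make the "32-bit value written as four octets" idea concrete, here is a small sketch that converts between the dotted form and the underlying integer. The function names dotted_to_int and int_to_dotted are made up for this example; the standard ipaddress module is used only to cross-check the result.

```python
import ipaddress


def dotted_to_int(dotted):
    """Pack a dotted-quad IPv4 address into a single 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift left one octet, append the next
    return value


def int_to_dotted(value):
    """Unpack a 32-bit integer into dotted-quad notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


address = "129.93.165.2"
as_int = dotted_to_int(address)
print(as_int)                 # 2170397954
print(int_to_dotted(as_int))  # 129.93.165.2

# Cross-check against the standard library's own conversion.
assert as_int == int(ipaddress.IPv4Address(address))
```

Translating a domain name to such an address is a separate step that requires querying a name server, which is what nslookup does; this sketch covers only the numeric representation.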