The HTML Sourcebook Supporting Documents: Internet/Web Glossary

A Glossary of Internet and Web Terminology
by Ian Graham
Copyright © 1997 Ian S. Graham

agent: A program that can travel over the Internet and access remote resources on behalf of a user. A proper agent should be able to run on remote machines and travel freely from machine to machine. The Java language shows promise for permitting safe agents, since the Java interpreter is designed to keep Java applets from harming the computers they contact.

anchor: The location of a hypertext link in a document.
An anchor can be either the start or the destination of a hypertext link.

anonymous FTP: Computers can run an anonymous FTP server, which lets anyone log in to the computer under the username "anonymous" and access public resources. In general, when you log in as user "anonymous", you (or your browser) use your e-mail address for the password string.

applet: A program or mini-application that can be downloaded over a network and activated on the user's computer. To do this safely, you must have a secure way of running applets. The Java language is designed to support safe applets.

archie: A system that automatically generates and maintains a database of the contents of anonymous FTP servers. An archie server accesses information from FTP servers and archives the directory listings. An archie client can access these databases and search for programs or files matching a particular name.

archive file: A single file that contains a collection of different files and/or directories. Archive files are often used to transport collections of files across the Internet, since you can transport a large collection in a single archive file. UNIX archives have the extension .tar (for Tape ARchive). PKZIP is often used to create archives on DOS computers (suffix .zip), while StuffIt is often used to create Macintosh archives (suffixes .sea or .sit). PKZIP and StuffIt archives are also compressed.

ASCII: American Standard Code for Information Interchange. This is a 7-bit character code capable of representing 128 characters. Several of these characters are special control characters used in communications control, and are not printable.

attribute: A quantity that defines a special property of an HTML element. Attributes are specified within an element start tag.
For example, <IMG SRC="file.gif"> means that the element IMG has an attribute SRC, which is assigned the indicated value.

browser: Any program used to view material prepared for the World Wide Web. Mosaic, Netscape, and Lynx are some examples. Browsers are able to interpret URLs and HTML markup and also understand Internet protocols such as HTTP, FTP, and Gopher.

CERN: Conseil Européen pour la Recherche Nucléaire; a large particle-accelerator laboratory located near Geneva, on the French-Swiss border. The World Wide Web originated here, largely due to the efforts of Tim Berners-Lee.

CGI: Common Gateway Interface, the specification for how an HTTP server should communicate with server gateway programs. Most servers support the CGI specification.

character reference: A way, within an SGML language such as HTML, of referencing a character using a simple string of numbers reflecting the position of the character in the current character set. For example, the character reference &#233; is the reference for e with an acute accent (é) within the ISO Latin-1 character set. Note, however, that a character reference will produce a different character if a different character set is in use. For a character-set-independent representation, see entity reference.

CJK: Chinese/Japanese/Korean, often used in the discussion of character sets and of the issues important to these language/character set groups.

client: Any program used to extract information from a server. For example, a browser such as Mosaic is a client that can access data from HTTP (and other) servers. All browsers are Web clients.

compressed: Many files on the Internet are compressed; this reduces the space taken up by a file and makes transmission over the Internet faster. The client must then have software able to decompress the file.

cookie: A small quantity of data exchanged between (and then stored on) a client and a server, and usually hidden from the user.
An example is the Netscape cookie mechanism, discussed in Chapters 7 and 8.

CRLF: The combination of carriage-return (CR) and line-feed (LF) characters. This combination is used by several Internet protocols, including HTTP, to denote the end of a line.

CSO: Computing Services Office, a system that lets users search for student and/or faculty names at a school or university. It is one approach to creating "white pages" for Internet e-mail addresses.

CWIS: Campus Wide Information Systems, electronic systems for distributing campus information, which first became common with university-based Gopher servers.

dial-up connection: The action of using a telephone and modem to connect to a remote computer. Dial-up connections are slow compared with direct connections or ISDN.

domain name: A symbolic name for a computer that can be translated by a nameserver into the computer's formal numeric Internet address (IP address). Domain names let users reference Internet sites without having to know the numerical address.

download: Transfer of a file from a remote computer to a local computer.

DTD: Document Type Definition. An SGML document type definition is a specific description of a markup language. This description is written as a plain text file, often with the filename extension .dtd. The HyperText Markup Language (HTML) has its own Document Type Definition file, often called html.dtd.

e-mail: Electronic mail.

element (HTML): The basic unit of an HTML document. HTML documents use start and stop tags to define structural elements in the document. These elements are arranged hierarchically to define the overall document structure. The name of the element is given by the tag, and indicates the meaning associated with the block. Some elements are empty, since they don't affect a block of text. Elements that have content are also often called containers.

end tag: A markup tag that denotes the end of an element.

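As a rough illustration of the element (HTML) and end tag entries above, the following Python sketch uses the standard library's html.parser module to list the start and end tags found in a small fragment. The sample markup, the class name TagLister, and the function name list_tags are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Records each start and end tag seen (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the start tag
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))


def list_tags(markup):
    """Return the tag events found in a piece of markup, in document order."""
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.events


# Made-up sample: one container element (P) and one empty element (IMG)
SAMPLE = '<P>Some text <IMG SRC="file.gif"></P>'

for kind, name, attributes in list_tags(SAMPLE):
    print(kind, name, attributes)
```

In this sketch the P element is a container with both a start tag and an end tag, while the empty IMG element produces only a start tag. Note that this particular parser reports element and attribute names in lowercase.
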
entity reference: A way, within an SGML language such as HTML, of referencing a character using a simple string of ASCII characters. For example, the entity reference &eacute; is the reference for e with an acute accent (é). See also character reference.

FAQ: Frequently Asked Questions. On the Internet, a FAQ is a document that answers the most frequently asked questions on a particular topic. Most newsgroups have FAQs that are frequently posted to the newsgroup.

firewall: A firewall is used to separate a local network from the outside world. In general a local network is connected to the outside world by a "gateway" computer. This gateway machine can be converted into a firewall by installing special software that does not let unauthorized TCP/IP packets pass from inside to outside and vice versa. You can give users on the local network, "inside" the firewall, access to the outside world by using the SOCKS package or by installing a proxy server on the firewall machine.

fragment identifier: A text string, included using a NAME attribute in an A element, that labels the anchored location in a document; thus the word fragment, since it references a document fragment.

FTP: File Transfer Protocol, an Internet client-server protocol for transferring files between computers.

GIF: Graphics Interchange Format, a format for storing image files. It is the most common format for inline images in HTML documents. The other common format is JPEG.

Gopher: A protocol for information delivery used in distributed information systems. Gopher clients give you access to this information. Gopher is a menu-based delivery system and does not have hypertext capabilities. Gopher has been largely supplanted by HTTP.

header: The leading part of a data message. HTTP messages are sent with an HTTP header preceding the actual communicated data.

helper: A program launched or used by a browser (such as Netscape Navigator) to process files that the browser cannot handle internally.
Most users have helper applications to play sound or movie files, to uncompress compressed files, or to unstuff archives.

hits: In database searches, the number of documents that resulted from the search; for servers, the number of document requests received by a server.

home page: The introductory page for a World Wide Web site. A home page usually provides an introduction to the site, along with hypertext links to local resources.

HTML: HyperText Markup Language, a markup language defined by an SGML Document Type Definition (DTD). To a document writer, HTML is simply a collection of tags used to mark blocks of text and assign them special meanings.

hyperlink: See hypertext link.

hypertext: Any document that contains hypertext links to other documents. HTML documents are almost always hypertext documents.

hypertext link: A hypertext relationship between two anchors, leading from the head anchor to the tail anchor. On the Web, this is usually a link from one hypertext document to another. Linking points are associated with anchors.

inline image: An image that is merged with the displayed text. Placing images in this manner is often described as "inlining" the images.

I18N: See internationalization.

IANA: Internet Assigned Numbers Authority, the agency that registers names and numbers for common use on the Internet. General information is found at: ftp://ftp.isi.edu/in-notes/iana/

IETF: Internet Engineering Task Force, a collection of task forces at work on developing standards for Internet protocols and architectures. There are IETF groups working on such issues as URLs, HTTP, and HTML.

Internet: You mean you don't know? The Internet is the worldwide network of computers communicating via the TCP/IP protocols.

Internationalization: Software development aimed at providing software that can serve a multilingual, internationalized audience. Often abbreviated as I18N (the initial I, the 18 letters that follow, and the final N).

Internet provider: A company from which users purchase Internet connectivity. This could be either a dedicated connection (for example, a telephone connection that stays open twenty-four hours a day) or a dial-up connection. Usually users run software such as PPP or SLIP to allow Internet connectivity across the line.

Internet resources: The collection of data, documents, and databases available on the Internet.

Intranet: An Intranet is a collection of services, built on Internet communications technology, designed to support business operations and applications. Basically just another buzzword, like enterprise computing and mission-critical applications.

IP Address: The numerical Internet Protocol address of a computer on the Internet. Every computer on the Internet has a unique numerical address.

ISO: International Organization for Standardization, an international organization responsible for setting international standards, such as the ISO Latin-1 character set.

ISO 10646: A multibyte character set proposed by the ISO as a universal character set for the characters and symbols used by all the world's languages. The most important 2-byte subset of this character set, known as the basic multilingual plane, is equivalent to the Unicode character set.

ISO Latin-1: An 8-bit character code developed by the International Organization for Standardization. An 8-bit code contains 256 different characters. In the ISO Latin-1 code, the first 128 characters are equivalent to the 128 characters of the US-ASCII character set (also called the ISO 646 character set). The remaining 128 characters consist of control characters and a large collection of accented and other characters commonly used in European languages.

Java: A programming language, developed by Sun Microsystems, designed specifically for use in applet and agent applications. Java programs can only run under a Java interpreter, which is designed to eliminate the risk of a rogue Java applet damaging the local computer.

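The relationship between ASCII and ISO Latin-1 described in the ISO Latin-1 entry above can be checked with a few lines of Python. This is only a small sketch using Python's built-in "ascii" and "latin-1" codecs; the variable and function names are made up for illustration.

```python
# The first 128 code points decode identically under ASCII and ISO Latin-1
low_half = bytes(range(128))
same_low_half = low_half.decode("ascii") == low_half.decode("latin-1")

# Position 233 in ISO Latin-1 is e with an acute accent
e_acute = bytes([233]).decode("latin-1")


def is_ascii(data):
    """Return True if every byte fits in the 7-bit ASCII range."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


print(same_low_half, e_acute, is_ascii(bytes([233])))
```

Byte value 233 is valid in the 8-bit ISO Latin-1 code but falls outside the 7-bit ASCII range, which is why the last check fails for it.
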
JavaScript: A scripting language developed by Netscape. JavaScript program listings can be included within an HTML document, and are then executed by the Web browser when the document is loaded. A similar scripting language, known as VBScript, has been developed by Microsoft.

JPEG: Joint Photographic Experts Group, an image format. In general JPEG allows for higher quality images than GIF. Some older browsers cannot display JPEG images inline, and must instead display them using helper programs.

Kerberos: A network authentication system, based on the key distribution model. It allows machines communicating over networks to prove their identity to each other through a trusted third party. It also prevents eavesdropping and replay attacks (recording and replaying encryption information "snooped" off the network) through support for a variety of data encryption schemes.

LAN: Local Area Network.

link: See hypertext link.

linux: A freeware clone of UNIX for 386-based PCs. Linux consists of the linux kernel (core operating system), originally written by Linus Torvalds, along with utility programs developed by the Free Software Foundation and by others. Since PC hardware is inexpensive and linux is essentially free, the combination of the two is a practical way of developing inexpensive and reliable HTTP service.

Listserv: An automated electronic mailing list, managed by a listserv program. Listservs are commonly used by discussion groups.

Lynx: A popular character-mode (text-only) World Wide Web browser.

MIME: Multipurpose Internet Mail Extensions, a scheme that lets electronic mail messages contain mixed media (sound, video, image, and text). The World Wide Web uses MIME content-types to specify the type of data contained in a file or being sent from an HTTP server to a client.

Mosaic: A graphical browser for the World Wide Web, developed at NCSA. There are several commercial browsers based on Mosaic.

MPEG: Moving Picture Experts Group, a common video file compression method.

multimedia: A mixture of media (text, audio, and video) under the control of a computer. The World Wide Web is a form of multimedia.

name token: In SGML, this is a character string composed of the ASCII letters a-z or A-Z, the numerals 0-9, a dash (-) or a period (.), and that must begin with a letter. Name tokens are usually case-insensitive. Many attribute values are defined as name tokens.

nameserver: A computer (and a program on the computer) that translates domain names into the proper numeric IP address (or vice versa).

NCSA: National Center for Supercomputing Applications. The NCSA is situated at the Urbana-Champaign campus of the University of Illinois. The NCSA software development team developed the Mosaic and NCSA HTTPD server programs.

NNTP: Network News Transfer Protocol, used for communicating USENET articles across the Internet.

packet: A small package of data. TCP/IP breaks messages up into packets, and sends each packet independently to the message destination. The protocol ensures that there is no error in transmission and that the entire message arrives.

Partial URL: A location scheme containing only partial information about the resource location. To access the resource, the client must construct a full URL, based on the partial URL. It does so by assuming that all the information not found in the partial URL is the same as that used when the client accessed the document containing the partial URL reference. A partial URL is often called a relative URL, since the location of the linked resource is determined relative to the location of the document containing the partial URL.

PEM: Privacy Enhanced Mail, a special mail protocol that provides encryption of mail message content.

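The way a client builds a full URL from a partial one, as described in the Partial URL entry above, can be sketched with the urljoin function from Python's standard urllib.parse module. The base URL and the partial URLs below are made-up examples, and the helper name resolve is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical URL of the document that contains the partial URLs
BASE = "http://www.example.com/docs/guide/intro.html"


def resolve(partial, base=BASE):
    """Build a full URL from a partial URL, relative to the containing document."""
    return urljoin(base, partial)


for partial in ("chapter2.html", "../images/logo.gif", "/index.html", "#summary"):
    print(partial, "->", resolve(partial))
```

In each case the parts missing from the partial URL (protocol, domain name, and some or all of the path) are taken from the URL of the containing document.
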
perl: Practical Extraction and Report Language, a scripting language created by Larry Wall. Because powerful data and text manipulation programs can be written quickly and easily using perl, it has become a popular language for writing CGI applications.

PGP: Pretty Good Privacy, a publicly available encryption scheme that uses the "public key" approach: messages are encrypted using a "public" key, but can only be decrypted by a "private" key, retained by the intended recipient of the message.

plug-in: A program module that adds inline functionality to a Web browser (or, in general, any other program). On the Web, plug-ins let Web browsers display data such as VRML scenes, real-time video, or multimedia data inline with the HTML document. Plug-ins, when available, are accessed through HTML EMBED or OBJECT elements.

port number: Every Internet application communicates at a particular port number specific to the application. For example, FTP, HTTP, Gopher, and telnet are all assigned unique port numbers so that the computer knows what to do when contacted at a particular port. There are accepted standard numbers for these ports so that computers know which port to connect to for a particular service. For example, Gopher servers generally "talk" at port 70, while HTTP servers generally "talk" at port 80. These default values can be overridden in a URL.

PPP: Point-to-Point Protocol, a communications protocol that turns a dial-up telephone connection into a point-to-point Internet connection. This is commonly used to run WWW browsers over a phone line.

provider: See Internet provider.

protocol: In computer networks, a protocol is simply an agreed convention for inter-computer communication. Thus the TCP/IP protocol defines how messages are passed on the Internet, while the FTP protocol, which is built using the TCP/IP protocol, defines how FTP messages should be sent and received.

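To illustrate the port number entry above, here is a minimal Python sketch showing how a default port applies unless a URL overrides it. It uses urlsplit from the standard urllib.parse module; the host names, the DEFAULT_PORTS table, and the function name effective_port are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Conventional default ports: Gopher 70 and HTTP 80 are named in the entry
# above; FTP (21) and telnet (23) are the usual well-known values
DEFAULT_PORTS = {"gopher": 70, "http": 80, "ftp": 21, "telnet": 23}


def effective_port(url):
    """Return the port a client would contact for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://www.example.com/index.html"))
print(effective_port("http://www.example.com:8080/index.html"))
print(effective_port("gopher://gopher.example.org/"))
```

The second URL names port 8080 explicitly, so the default of 80 is not used for it.
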
proxy server: A server that acts as an intermediary between a user's computer and the computer the user wants to access. If a user makes a request for a resource from computer "A," this request is directed to the proxy server, which makes the request, gets the response from computer "A," and then forwards the response to the client. Proxy servers are useful for accessing World Wide Web resources from inside a firewall.

RFC: Request For Comments, a document, written by groups or individuals involved in Internet development, that describes agreed-upon standards or proposes new standards for Internet protocols. For example, the rules for electronic mail message composition are specified in the document RFC 822.

robots: On the World Wide Web, a robot is a program that autonomously searches through trees of hypertext documents, retrieving files for indexing (or other purposes). Also called a worm.

router: A computer that determines, on a local basis, which route packets will take en route to their destination.

RSA: A common, commercial public-key encryption technology, owned by RSA Data Security Inc. RSA Inc. also holds several patents on public-key encryption in general, so that popular publicly available encryption tools, such as PGP and PEM, rely on technology covered by RSA patents. PGP and PEM therefore cannot be used in commercial products without licensing approval from RSA Inc.

server: A program, running on a networked computer, that responds to requests from client programs running on other networked computers. The server and client communicate using a client-server protocol.

SGML: Standard Generalized Markup Language, a standard for describing markup languages. HTML is defined as an instance of SGML.

shell: The UNIX shell is the program that interprets the commands typed at the terminal. A shell can also be used to run simple script programs called shell scripts. There are several different shells, with slightly different commands and syntax.
The most common are the Bourne shell (sh), the C shell (csh), and the Korn shell (ksh). The DOS command-line interpreter can be thought of as a shell.

SLIP: Serial Line Internet Protocol, a communications protocol that can turn a dial-up telephone connection into an Internet connection. SLIP can be used to run Web browsers over a phone line, but is less stable than a PPP connection.

SMTP: Simple Mail Transfer Protocol, the standard by which electronic mail messages are communicated over the Internet.

SOCKS: A software package that allows hosts inside a firewall to communicate with the outside world. To allow access to the outside world, a secure network can run a SOCKS server on its gateway/firewall machine; all networking software inside the network must be configured to talk to the SOCKS server. SOCKS is a proxy server without the special caching capabilities of a caching HTTP proxy server.

SSL: Secure Sockets Layer, a technology developed by Netscape Communications Inc. for encrypting data sent between clients and servers. SSL is the basis for Netscape's secure communication technologies.

start tag (HTML): A markup tag that denotes the start of an element.

tag (HTML): HTML marks documents using tags. A tag is simply typed text surrounded by the less than and greater than signs, for example: <TITLE>. An end tag has a slash in front of the tag name; for example </TITLE>.

tar: Tape archiver, a program (and file format) commonly used on UNIX systems for archiving and transporting large collections of files and/or directories.

TCP/IP: Transmission Control Protocol/Internet Protocol, the basic communication protocol that is the foundation of the Internet. All the other protocols, such as HTTP, FTP, and Gopher, are built on top of TCP/IP.

telnet: A terminal emulation protocol that allows you to make a terminal connection to other computers on the Internet.
This requires that you run a telnet client on your computer and connect to a telnet server on the other machine.

TIA: The Internet Adapter, a program run under a dial-in UNIX account that supports a SLIP-like connection between a dial-up computer and the dial-in site. TIA is useful if you have a UNIX account with a company that does not provide PPP or SLIP service.

TIFF: Tagged Image File Format, a graphic file format developed by Aldus Corporation. TIFF is the standard format of many graphics and desktop publishing programs.

tn3270: A variant of telnet that emulates the behavior of IBM model 3270 display terminals.

Unicode: A 2-byte character set, developed as a universal character set for international use. The current version of Unicode (version 2) is equivalent to the basic multilingual plane subset of the ISO 10646 character set. Internationalized HTML uses Unicode as its base character set.

UNIX: An operating system, commonly used on the backbone machines on the Internet. Most Web servers are run under the UNIX operating system.

URC: Uniform Resource Characteristics, an as-yet unspecified format for representing aggregate information about a resource or collection of resources.

URI: Uniform Resource Identifier, the generic term for a coded string that identifies a (typically Internet) resource. There are currently two practical examples of URIs, namely Uniform Resource Locators (URLs) and partial URLs.

URL: Uniform Resource Locator, the scheme used to address Internet resources on the World Wide Web. A URL specifies the protocol, domain name/IP address, port number, path, and resource details needed to access a resource from a particular machine. Partial URLs are an associated scheme that specifies a location relative to the location of a document or resource containing the URL reference.

URN: Uniform Resource Names are as yet undefined, but are the holy grail of addressing, since any file would retain the same URN regardless of which computer the file resided on.
URNs would be universal identifiers for Internet resources, regardless of the resource origins.

USENET: The Internet's worldwide bulletin board system, consisting of over 6,000 topical discussion groups, called newsgroups. The newsgroups related to the World Wide Web were mentioned at the end of Chapter 1. USENET postings are distributed around the world using the NNTP protocol.

VBScript: A document scripting language developed by Microsoft, also known as Visual Basic Script. See JavaScript for a description of document scripting languages.

viewer: A program launched by a browser to view files that the browser cannot handle internally and that are accessed by standard hypertext anchors. Thus you have viewers for JPEG images, sound files, and MPEG movies. Viewers are also often called helpers or helper applications. Viewers are distinct from plug-ins, since they work separately from the browser.

visit: When you access a World Wide Web document you are said to be visiting the site.

W3C: World Wide Web Consortium, an academic and industrial consortium devoted to the development of Web standards and technologies.

whitespace: Any combination of space or tab characters that separates two characters or two character strings.

WAIS: Wide Area Information Servers, a system and protocol for Internet-accessible databases. The WAIS protocol is based on the Z39.50 protocol.

WAN: Wide Area Network.

worms: A worm is a computer program that can make copies of itself. Alternatively, on the Web, worm is a synonym for robot.

WWW: The World Wide Web. Also called the Web or W3.

Z39.50: A protocol for communicating search information and search results, allowing remote searching of databases. Many library systems support the Z39.50 protocol.