What is the Internet?
The Internet is a global system of interconnected computer networks that use the standard Internet Protocol Suite (TCP/IP) to serve billions of users worldwide. It is a network of networks that consists of millions of private, public, academic, business, and government networks, of local to global scope, that are linked by a broad array of electronic, wireless and optical networking technologies. The Internet carries a vast range of information resources and services, such as the inter-linked hypertext documents of the World Wide Web (WWW) and the infrastructure to support electronic mail.
Most traditional communications media including telephone, music, film, and television are reshaped or redefined by the Internet, giving birth to new services such as Voice over Internet Protocol (VoIP) and IPTV. Newspaper, book and other print publishing are adapting to Web site technology, or are reshaped into blogging and web feeds. The Internet has enabled or accelerated new forms of human interactions through instant messaging, Internet forums, and social networking. Online shopping has boomed both for major retail outlets and small artisans and traders. Business-to-business and financial services on the Internet affect supply chains across entire industries.
The origins of the Internet reach back to research of the 1960s, commissioned by the United States government in collaboration with private commercial interests to build robust, fault-tolerant, and distributed computer networks. The funding of a new U.S. backbone by the National Science Foundation in the 1980s, as well as private funding for other commercial backbones, led to worldwide participation in the development of new networking technologies, and the merger of many networks. The commercialization of what was by the 1990s an international network resulted in its popularization and incorporation into virtually every aspect of modern human life. As of 2009, an estimated quarter of Earth's population used the services of the Internet.
The Internet has no centralized governance in either technological implementation or policies for access and usage; each constituent network sets its own standards. Only the overarching definitions of the two principal name spaces in the Internet, the Internet Protocol address space and the Domain Name System, are directed by a maintainer organization, the Internet Corporation for Assigned Names and Numbers (ICANN). The technical underpinning and standardization of the core protocols (IPv4 and IPv6) is an activity of the Internet Engineering Task Force (IETF), a non-profit organization of loosely affiliated international participants that anyone may associate with by contributing technical expertise.
Terminology
Internet is a short form of the technical term internetwork,[1] the result of interconnecting computer networks with special gateways or routers. The Internet is also often referred to as the Net.
The term the Internet, when referring to the entire global system of IP networks, has been treated as a proper noun and written with an initial capital letter. In the media and popular culture a trend has also developed to regard it as a generic term or common noun and thus write it as "the internet", without capitalization. Some guides specify that the word should be capitalized as a noun but not capitalized as an adjective.
Depiction of the Internet as a cloud in network diagrams
The terms Internet and World Wide Web are often used in everyday speech without much distinction. However, the Internet and the World Wide Web are not one and the same. The Internet is a global data communications system. It is a hardware and software infrastructure that provides connectivity between computers. In contrast, the Web is one of the services communicated via the Internet. It is a collection of interconnected documents and other resources, linked by hyperlinks and URLs.
In many technical illustrations when the precise location or interrelation of Internet resources is not important, extended networks such as the Internet are often depicted as a cloud. The verbal image has been formalized in the newer concept of cloud computing.
History of the Internet
The USSR's launch of Sputnik spurred the United States to create the Advanced Research Projects Agency (ARPA or DARPA) in February 1958 to regain a technological lead. ARPA created the Information Processing Technology Office (IPTO) to further the research of the Semi Automatic Ground Environment (SAGE) program, which had networked country-wide radar systems together for the first time. The IPTO's purpose was to find ways to address the US military's concern about survivability of their communications networks, and as a first step interconnect their computers at the Pentagon, Cheyenne Mountain, and Strategic Air Command headquarters (SAC). J. C. R. Licklider, a promoter of universal networking, was selected to head the IPTO. Licklider moved from the Psycho-Acoustic Laboratory at Harvard University to MIT in 1950, after becoming interested in information technology. At MIT, he served on a committee that established Lincoln Laboratory and worked on the SAGE project. In 1957 he became a Vice President at BBN, where he bought the first production PDP-1 computer and conducted the first public demonstration of time-sharing.
Professor Leonard Kleinrock with one of the first ARPANET Interface Message Processors at UCLA
At the IPTO, Licklider's successor Ivan Sutherland in 1965 got Lawrence Roberts to start a project to make a network, and Roberts based the technology on the work of Paul Baran,[6] who had written an exhaustive study for the United States Air Force that recommended packet switching (opposed to circuit switching) to achieve better network robustness and disaster survivability. Roberts had worked at the MIT Lincoln Laboratory originally established to work on the design of the SAGE system. UCLA professor Leonard Kleinrock had provided the theoretical foundations for packet networks in 1962, and later, in the 1970s, for hierarchical routing, concepts which have been the underpinning of the development towards today's Internet.
Sutherland's successor Robert Taylor convinced Roberts to build on his early packet switching successes and come and be the IPTO Chief Scientist. Once there, Roberts prepared a report called Resource Sharing Computer Networks which was approved by Taylor in June 1968 and laid the foundation for the launch of the working ARPANET the following year.
After much work, the first two nodes of what would become the ARPANET were interconnected between Kleinrock's Network Measurement Center at the UCLA's School of Engineering and Applied Science and Douglas Engelbart's NLS system at SRI International (SRI) in Menlo Park, California, on 29 October 1969. The third site on the ARPANET was the Culler-Fried Interactive Mathematics center at the University of California at Santa Barbara, and the fourth was the University of Utah Graphics Department. In an early sign of future growth, there were already fifteen sites connected to the young ARPANET by the end of 1971.
The ARPANET was the origin of today's Internet. In an independent development, Donald Davies at the UK National Physical Laboratory developed the concept of packet switching in the early 1960s, first giving a talk on the subject in 1965, after which the teams in the new field on the two sides of the Atlantic Ocean first became acquainted. It was Davies' coinage of the terms packet and packet switching that was adopted as the standard terminology. Davies also built a packet-switched network in the UK, called the Mark I, in 1970.[7] Bolt Beranek and Newman (BBN), the private contractors for ARPANET, set out to create a separate commercial version after the establishment of "value added carriers" was legalized in the U.S.[8] The network they established was called Telenet and began operation in 1975, installing free public dial-up access in cities throughout the U.S. Telenet was the first packet-switching network open to the general public.
Following the demonstration that packet switching worked on the ARPANET, the British Post Office, Telenet, DATAPAC and TRANSPAC collaborated to create the first international packet-switched network service. In the UK, this was referred to as the International Packet Switched Service (IPSS), in 1978. The collection of X.25-based networks grew from Europe and the US to cover Canada, Hong Kong and Australia by 1981. The X.25 packet switching standard was developed in the CCITT (now called ITU-T) around 1976.
A plaque commemorating the birth of the Internet at Stanford University
X.25 was independent of the TCP/IP protocols that arose from the experimental work of DARPA on the ARPANET, Packet Radio Net and Packet Satellite Net during the same time period.
The early ARPANET ran on the Network Control Program (NCP), implementing the host-to-host connectivity and switching layers of the protocol stack, designed and first implemented in December 1970 by a team called the Network Working Group (NWG) led by Steve Crocker. To respond to the network's rapid growth as more and more locations connected, Vinton Cerf and Robert Kahn developed the first description of the now widely used TCP protocols during 1973 and published a paper on the subject in May 1974. Use of the term "Internet" to describe a single global TCP/IP network originated in December 1974 with the publication of RFC 675, the first full specification of TCP that was written by Vinton Cerf, Yogen Dalal and Carl Sunshine, then at Stanford University. During the next nine years, work proceeded to refine the protocols and to implement them on a wide range of operating systems. The first TCP/IP-based wide-area network was operational by 1 January 1983 when all hosts on the ARPANET were switched over from the older NCP protocols. In 1985, the United States' National Science Foundation (NSF) commissioned the construction of the NSFNET, a university 56 kilobit/second network backbone using computers called "fuzzballs" by their inventor, David L. Mills. The following year, NSF sponsored the conversion to a higher-speed 1.5 megabit/second network. A key decision to use the DARPA TCP/IP protocols was made by Dennis Jennings, then in charge of the Supercomputer program at NSF.
The opening of the NSFNET to other networks began in 1988. The US Federal Networking Council approved the interconnection of the NSFNET to the commercial MCI Mail system in that year and the link was made in the summer of 1989. Other commercial electronic mail services were soon connected, including OnTyme, Telemail and Compuserve. In that same year, three commercial Internet service providers (ISPs) began operations: UUNET, PSINet, and CERFNET. Important, separate networks that offered gateways into, then later merged with, the Internet include Usenet and BITNET. Various other commercial and educational networks, such as Telenet (by that time renamed to Sprintnet), Tymnet, Compuserve and JANET were interconnected with the growing Internet in the 1980s as the TCP/IP protocol became increasingly popular. The adaptability of TCP/IP to existing communication networks allowed for rapid growth. The open availability of the specifications and reference code permitted commercial vendors to build interoperable network components, such as routers, making standardized network gear available from many companies. This aided in the rapid growth of the Internet and the proliferation of local-area networking. It seeded the widespread implementation and rigorous standardization of TCP/IP on UNIX and virtually every other common operating system.
Although the basic applications and guidelines that make the Internet possible had existed for almost two decades, the network did not gain a public face until the 1990s. On 6 August 1991, CERN, a pan-European organization for particle research, publicized the new World Wide Web project. The Web was invented by British scientist Tim Berners-Lee in 1989. An early popular web browser was ViolaWWW, patterned after HyperCard and built using the X Window System. It was eventually replaced in popularity by the Mosaic web browser. In 1993, the National Center for Supercomputing Applications at the University of Illinois released version 1.0 of Mosaic, and by late 1994 there was growing public interest in the previously academic, technical Internet. By 1996 usage of the word Internet had become commonplace, and consequently, so had its use as a synecdoche in reference to the World Wide Web.
Meanwhile, over the course of the decade, the Internet successfully accommodated the majority of previously existing public computer networks (although some networks, such as FidoNet, have remained separate). During the late 1990s, it was estimated that traffic on the public Internet grew by 100 percent per year, while the mean annual growth in the number of Internet users was thought to be between 20% and 50%. This growth is often attributed to the lack of central administration, which allows organic growth of the network, as well as the non-proprietary open nature of the Internet protocols, which encourages vendor interoperability and prevents any one company from exerting too much control over the network. The estimated population of Internet users is 1.97 billion as of 30 June 2010.
From 2009 onward, the Internet is expected to grow significantly in Brazil, Russia, India, China, and Indonesia (the BRICI countries). These countries have large populations and moderate to high economic growth, but still low Internet penetration rates. In 2009, the BRICI countries represented about 45 percent of the world's population and had approximately 610 million Internet users; by 2015, the number of Internet users in the BRICI countries is expected to double to 1.2 billion, and to triple in Indonesia.
Technology and IPv6
Protocols
The complex communications infrastructure of the Internet consists of its hardware components and a system of software layers that control various aspects of the architecture. While the hardware can often be used to support other software systems, it is the design and the rigorous standardization process of the software architecture that characterizes the Internet and provides the foundation for its scalability and success. The responsibility for the architectural design of the Internet software systems has been delegated to the Internet Engineering Task Force (IETF). The IETF conducts standard-setting work groups, open to any individual, about the various aspects of Internet architecture. Resulting discussions and final standards are published in a series of publications, each called a Request for Comments (RFC), freely available on the IETF web site. The principal methods of networking that enable the Internet are contained in specially designated RFCs that constitute the Internet Standards. Other less rigorous documents are simply informative, experimental, or historical, or document the best current practices (BCP) when implementing Internet technologies.
The Internet Standards describe a framework known as the Internet Protocol Suite. This is a model architecture that divides methods into a layered system of protocols (RFC 1122, RFC 1123). The layers correspond to the environment or scope in which their services operate. At the top is the Application Layer, the space for the application-specific networking methods used in software applications, e.g., a web browser program. Below this top layer, the Transport Layer connects applications on different hosts via the network (e.g., client–server model) with appropriate data exchange methods. Underlying these layers are the core networking technologies, consisting of two layers. The Internet Layer enables computers to identify and locate each other via Internet Protocol (IP) addresses, and allows them to connect to one another via intermediate (transit) networks. Lastly, at the bottom of the architecture is a software layer, the Link Layer, that provides connectivity between hosts on the same local network link, such as a local area network (LAN) or a dial-up connection. The model, also known as TCP/IP, is designed to be independent of the underlying hardware, with which it therefore does not concern itself in any detail. Other models have been developed, such as the Open Systems Interconnection (OSI) model, which is not compatible with TCP/IP in the details of description or implementation; nevertheless, many similarities exist, and the TCP/IP protocols are usually included in discussions of OSI networking.
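As a rough illustration of this layering, the sketch below uses Python's standard library to open a TCP connection (the Transport Layer) to a web server and speak HTTP (an Application Layer protocol) over it; the Internet and Link Layers are handled entirely by the operating system. The host example.com and the request details are placeholders, not taken from the text above.

```python
# A minimal layering sketch: application-layer HTTP carried over a
# transport-layer TCP socket; IP and link-layer details are left to the OS.
import socket

def fetch_homepage(host: str = "example.com", port: int = 80) -> bytes:
    # Transport layer: open a TCP connection (a stream socket) to the host.
    with socket.create_connection((host, port), timeout=10) as conn:
        # Application layer: speak HTTP/1.1 over that connection.
        request = (
            f"GET / HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: close\r\n\r\n"
        ).encode("ascii")
        conn.sendall(request)

        chunks = []
        while True:
            data = conn.recv(4096)   # bytes delivered by the lower layers
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

if __name__ == "__main__":
    response = fetch_homepage()
    print(response.split(b"\r\n", 1)[0].decode())   # e.g. "HTTP/1.1 200 OK"
```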
The most prominent component of the Internet model is the Internet Protocol (IP), which provides addressing systems (IP addresses) for computers on the Internet. IP enables internetworking and essentially establishes the Internet itself. IP Version 4 (IPv4) is the initial version used on the first generation of today's Internet and is still in dominant use. It was designed to address up to approximately 4.3 billion (on the order of 10^9) Internet hosts. However, the explosive growth of the Internet has led to IPv4 address exhaustion, which is estimated to enter its final stage in approximately 2011. A new protocol version, IPv6, was developed in the mid-1990s, which provides vastly larger addressing capabilities and more efficient routing of Internet traffic. IPv6 is currently in the commercial deployment phase around the world, and Internet address registries (RIRs) have begun to urge all resource managers to plan rapid adoption and conversion.
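The relative scale of the two address spaces can be seen with Python's standard ipaddress module; the sketch below is illustrative only, and the sample addresses come from the documentation-only ranges rather than from the text above.

```python
# Comparing the IPv4 and IPv6 address spaces with the standard library.
# The networks 0.0.0.0/0 and ::/0 denote "every address" in each family.
import ipaddress

ipv4_all = ipaddress.ip_network("0.0.0.0/0")
ipv6_all = ipaddress.ip_network("::/0")

print(ipv4_all.num_addresses)   # 4294967296 (2**32, about 4.3 billion)
print(ipv6_all.num_addresses)   # 340282366920938463463374607431768211456 (2**128)

# Parsing individual addresses of either family (documentation ranges):
print(ipaddress.ip_address("192.0.2.1"))      # an IPv4 address
print(ipaddress.ip_address("2001:db8::1"))    # an IPv6 address
```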
IPv6 is not interoperable with IPv4. It essentially establishes a "parallel" version of the Internet not directly accessible with IPv4 software. This means software upgrades or translator facilities are necessary for every networking device that needs to communicate on the IPv6 Internet. Most modern computer operating systems are already converted to operate with both versions of the Internet Protocol. Network infrastructures, however, are still lagging in this development. Aside from the complex physical connections that make up its infrastructure, the Internet is facilitated by bi- or multi-lateral commercial contracts (e.g., peering agreements), and by technical specifications or protocols that describe how to exchange data over the network. Indeed, the Internet is defined by its interconnections and routing policies.
Version control systems allow collaborating teams to work on shared sets of documents without either accidentally overwriting each other's work or having members wait until they get "sent" documents to be able to make their contributions. Business and project teams can share calendars as well as documents and other information. Such collaboration occurs in a wide variety of areas including scientific research, software development, conference planning, political activism and creative writing. Social and political collaboration is also becoming more widespread as both Internet access and computer literacy grow. From the flash mob 'events' of the early 2000s to the use of social networking in the 2009 Iranian election protests, the Internet allows people to work together more effectively and in many more ways than was possible without it.
The Internet allows computer users to remotely access other computers and information stores easily, wherever they may be across the world. They may do this with or without the use of security, authentication and encryption technologies, depending on the requirements. This is encouraging new ways of working from home, collaboration and information sharing in many industries. An accountant sitting at home can audit the books of a company based in another country, on a server situated in a third country that is remotely maintained by IT specialists in a fourth. These accounts could have been created by home-working bookkeepers, in other remote locations, based on information emailed to them from offices all over the world. Some of these things were possible before the widespread use of the Internet, but the cost of private leased lines would have made many of them infeasible in practice. An office worker away from their desk, perhaps on the other side of the world on a business trip or a holiday, can open a remote desktop session into his normal office PC using a secure Virtual Private Network (VPN) connection via the Internet. This gives the worker complete access to all of his or her normal files and data, including email and other applications, while away from the office. This concept has been referred to among system administrators as the Virtual Private Nightmare, because it extends the secure perimeter of a corporate network into its employees' homes.
IPv6
Internet Protocol version 6 (IPv6) is a version of the Internet Protocol (IP) that is designed to succeed Internet Protocol version 4 (IPv4). The Internet operates by transferring data in small packets that are independently routed across networks as specified by an international communications protocol known as the Internet Protocol. Each data packet contains two numeric addresses that identify the packet's origin and destination devices. Since 1981, IPv4 has been the publicly used Internet Protocol, and it is currently the foundation for most Internet communications. The Internet's growth has created a need for more addresses than IPv4 can provide. IPv6 allows for vastly more numerical addresses, but switching from IPv4 to IPv6 may be a difficult process.
IPv6 was developed by the Internet Engineering Task Force (IETF) to deal with the long-anticipated IPv4 address exhaustion, and is described in Internet standard document RFC 2460, published in December 1998. Like IPv4, IPv6 is an Internet Layer protocol for packet-switched internetworking and provides end-to-end datagram transmission across multiple IP networks. While IPv4 allows 32 bits for an Internet Protocol address, and can therefore support 2^32 (4,294,967,296) addresses, IPv6 uses 128-bit addresses, so the new address space supports 2^128 (approximately 340 undecillion, or 3.4×10^38) addresses. This expansion allows for many more devices and users on the Internet as well as extra flexibility in allocating addresses and efficiency for routing traffic. It also eliminates the primary need for network address translation (NAT), which gained widespread deployment as an effort to alleviate IPv4 address exhaustion.
IPv6 implements additional features not present in IPv4. It simplifies aspects of address assignment (stateless address autoconfiguration) and network renumbering (prefix and router announcements) when changing Internet connectivity providers. The IPv6 subnet size has been standardized by fixing the size of the host identifier portion of an address to 64 bits to facilitate an automatic mechanism for forming the host identifier from link layer media addressing information (MAC address). Network security is also integrated into the design of the IPv6 architecture, and the IPv6 specification mandates support for IPsec as a fundamental interoperability requirement.
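The following sketch shows one way the 64-bit host identifier can be formed from a link-layer MAC address, using the modified EUI-64 construction that classic stateless autoconfiguration employs: insert ff:fe in the middle of the 48-bit MAC and flip the universal/local bit. The prefix and MAC address below are illustrative values, not taken from the text.

```python
# Deriving a SLAAC-style IPv6 address from a /64 prefix and a MAC address
# via modified EUI-64. Prefix and MAC are placeholder values.
import ipaddress

def eui64_interface_id(mac: str) -> bytes:
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    assert len(octets) == 6, "expected a 48-bit MAC address"
    octets[0] ^= 0x02                        # flip the universal/local bit
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    network = ipaddress.IPv6Network(prefix)  # expects a /64 prefix
    iid = int.from_bytes(eui64_interface_id(mac), "big")
    return network[iid]                      # prefix bits + interface identifier

print(slaac_address("2001:db8:1:2::/64", "00:1a:2b:3c:4d:5e"))
# -> 2001:db8:1:2:21a:2bff:fe3c:4d5e
```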
The last top-level (/8) block of free IPv4 addresses was assigned in February 2011, although many free addresses still remain in most assigned blocks and will continue to be allocated for some time. While IPv6 has been implemented on all major operating systems in use in commercial, business, and home consumer environments, IPv6 does not implement interoperability features with IPv4, and creates essentially a parallel, independent network. Exchanging traffic between the two networks requires special translator gateways, but modern computer operating systems implement dual-protocol software for transparent access to both networks, either natively or by tunneling. In December 2010, despite marking its 12th anniversary as a Standards Track protocol, IPv6 was still only in its infancy in terms of general worldwide deployment.
Motivation and origins
IPv4
The first publicly used version of the Internet Protocol, Version 4 (IPv4), provides an addressing capability of 2^32, or approximately 4.3 billion, addresses. This was deemed sufficient in the early design stages of the Internet, when the explosive growth and worldwide proliferation of networks were not anticipated.
During the first decade of operation of the Internet, by the late 1980s, it became apparent that methods had to be developed to conserve address space. In the early 1990s, even after the redesign of the addressing system using a classless network model, it became clear that this would not suffice to prevent IPv4 address exhaustion, and that further changes to the Internet infrastructure were needed.
Working group proposal
By the beginning of 1992, several proposals appeared and by the end of 1992, the IETF announced a call for white papers and the creation of the IP Next Generation (IPng) area of working groups.
IPv4 address exhaustion from 1995 to 2011.
The Internet Engineering Task Force adopted the IPng model on July 25, 1994, with the formation of several IPng working groups. By 1996, a series of RFCs was released defining Internet Protocol version 6 (IPv6), starting with RFC 1883. (Version 5 was used by the experimental Internet Stream Protocol.)
It is widely expected that IPv4 will be supported alongside IPv6 for the foreseeable future. IPv4-only and IPv6-only nodes cannot communicate directly, and need assistance from an intermediary gateway or must use other transition mechanisms.
IPv4 exhaustion
As of February 3, 2011, the last batch of five /8 address blocks was allocated to the Regional Internet Registries. Each of the address blocks represents approximately 16.7 million possible addresses, or over 80 million combined potential addresses. These addresses could well be fully consumed within three to six months at current rates of allocation.
In 2003, the director of the Asia-Pacific Network Information Centre (APNIC), Paul Wilson, stated that, based on then-current rates of deployment, the available space would last for one or two decades. In September 2005, a report by Cisco Systems suggested that the pool of available addresses would be exhausted in as little as 4 to 5 years. In 2010, a daily updated report projected exhaustion of the global address pool for the first quarter of 2011, and depletion at the five regional Internet registries before the end of 2011. In 2008, a policy process started for the end-game and post-exhaustion era. On February 3, 2011, in a ceremony in Miami, the Internet Assigned Numbers Authority (IANA) assigned the last five /8 allocation blocks of IPv4 addresses, officially depleting the global pool of completely fresh blocks of addresses.
Allocation of addresses from within these blocks to organizations will continue for some time, but APNIC is expected to be the first Regional Internet Registry (RIR) to run out of IPv4 addresses completely, in June or July 2011; depending on demand, this could occur as early as May or as late as September.
Comparison to IPv4
IPv6 specifies a new packet format, designed to minimize packet header processing by routers. Because the headers of IPv4 packets and IPv6 packets are significantly different, the two protocols are not interoperable. However, in most respects, IPv6 is a conservative extension of IPv4. Most transport and application-layer protocols need little or no change to operate over IPv6; exceptions are application protocols that embed internet-layer addresses, such as FTP and NTPv3.
Larger address space
Decomposition of an IPv6 address into its binary form
The most important feature of IPv6 is a much larger address space than in IPv4. The length of an IPv6 address is 128 bits, compared to 32 bits in IPv4. The address space therefore supports 2^128, or approximately 3.4×10^38, addresses. By comparison, this amounts to approximately 5×10^28 addresses for each of the 6.8 billion people alive in 2010. (In addition, the IPv4 address space is poorly allocated, with approximately 14% of all available addresses utilized.) While these numbers are very large, it was not the intent of the designers of the IPv6 address space to assure geographical saturation with usable addresses. Rather, the longer addresses simplify allocation of addresses, enable efficient route aggregation, and allow implementation of special addressing features. In IPv4, complex Classless Inter-Domain Routing (CIDR) methods were developed to make the best use of the small address space. The standard size of a subnet in IPv6 is 2^64 addresses, the square of the size of the entire IPv4 address space. Thus, actual address space utilization rates will be small in IPv6, but network management and routing efficiency are improved by the large subnet space and hierarchical route aggregation.
Renumbering an existing network for a new connectivity provider with different routing prefixes is a major effort with IPv4. With IPv6, however, changing the prefix announced by a few routers can in principle renumber an entire network since the host identifiers (the least-significant 64 bits of an address) can be independently self-configured by a host.
Multicast
Multicast, the transmission of a packet to multiple destinations in a single send operation, is part of the base specification in IPv6. In IPv4 this is an optional although commonly implemented feature. IPv6 multicast addressing shares common features and protocols with IPv4 multicast, but also provides changes and improvements by eliminating the need for certain protocols. IPv6 does not implement traditional IP broadcast, i.e. the transmission of a packet to all hosts on the attached link using a special broadcast address, and therefore does not define broadcast addresses. In IPv6, the same result can be achieved by sending a packet to the link-local all nodes multicast group at address ff02::1, which is analogous to IPv4 multicast to address 224.0.0.1. IPv6 also supports new multicast solutions, including embedding rendezvous point addresses in an IPv6 multicast group address which simplifies the deployment of inter-domain solutions.
In IPv4 it was very difficult for an organization to get even one globally routable multicast group assignment and implementation of inter-domain solutions was very arcane. Unicast address assignments by a local Internet registry for IPv6 have at least a 64-bit routing prefix, yielding the smallest subnet size available in IPv6 (also 64 bits). With such an assignment it is possible to embed the unicast address prefix into the IPv6 multicast address format, while still providing a 32-bit block, the least significant bits of the address, or approximately 4.2 billion multicast group identifiers. Thus each user of an IPv6 subnet automatically has available a set of globally routable source-specific multicast groups for multicast applications.
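As a hedged sketch of the unicast-prefix-based multicast format mentioned above (RFC 3306), the following shows how a /64 unicast prefix and a 32-bit group ID can be combined into an IPv6 multicast address; the prefix, scope, and group ID are illustrative values only.

```python
# Embedding a /64 unicast prefix into an IPv6 multicast address (RFC 3306).
import ipaddress

def unicast_prefix_multicast(prefix: str, group_id: int,
                             scope: int = 0xE) -> ipaddress.IPv6Address:
    net = ipaddress.IPv6Network(prefix)              # expects a /64 unicast prefix
    value = (
        (0xFF3 << 116)                               # ff + flags 0x3 (P and T bits set)
        | (scope << 112)                             # 4-bit scope (0xE = global)
        | (net.prefixlen << 96)                      # reserved byte is 0, then plen = 64
        | ((int(net.network_address) >> 64) << 32)   # the 64-bit network prefix
        | (group_id & 0xFFFFFFFF)                    # 32-bit multicast group ID
    )
    return ipaddress.IPv6Address(value)

print(unicast_prefix_multicast("2001:db8:1:2::/64", 0x1234))
# -> ff3e:40:2001:db8:1:2:0:1234
```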
Stateless address autoconfiguration (SLAAC)
IPv6 hosts can configure themselves automatically when connected to a routed IPv6 network using Internet Control Message Protocol version 6 (ICMPv6) router discovery messages. When first connected to a network, a host sends a link-local router solicitation multicast request for its configuration parameters; if configured suitably, routers respond to such a request with a router advertisement packet that contains network-layer configuration parameters.
If IPv6 stateless address autoconfiguration is unsuitable for an application, a network may use stateful configuration with the Dynamic Host Configuration Protocol version 6 (DHCPv6) or hosts may be configured statically.
Routers present a special case of requirements for address configuration, as they often are sources for autoconfiguration information, such as router and prefix advertisements. Stateless configuration for routers can be achieved with a special router renumbering protocol.
Mandatory support for network layer security
Internet Protocol Security (IPsec) was originally developed for IPv6, but found widespread deployment first in IPv4, into which it was back-engineered. IPsec is an integral part of the base protocol suite in IPv6. IPsec support is mandatory in IPv6 but optional for IPv4.
Simplified processing by routers
In IPv6, the packet header and the process of packet forwarding have been simplified. Although IPv6 packet headers are at least twice the size of IPv4 packet headers, packet processing by routers is generally more efficient, thereby extending the end-to-end principle of Internet design. Specifically (a packed-header sketch follows this list):
- The packet header in IPv6 is simpler than that used in IPv4, with many rarely used fields moved to separate optional header extensions.
- IPv6 routers do not perform fragmentation. IPv6 hosts are required either to perform PMTU discovery, to perform end-to-end fragmentation, or to send packets no larger than the IPv6 default minimum MTU size of 1280 octets.
- The IPv6 header is not protected by a checksum; integrity protection is assumed to be assured by both link-layer and higher-layer (TCP, UDP, etc.) error detection. Therefore, IPv6 routers do not need to recompute a checksum when header fields (such as the time to live (TTL) or hop count) change.
- The TTL field of IPv4 has been renamed to Hop Limit, reflecting the fact that routers are no longer expected to compute the time a packet has spent in a queue.
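The sketch referred to above packs the fixed 40-octet IPv6 header with Python's struct module, which makes these simplifications visible: there is no checksum field, no fragmentation fields, and a Hop Limit in place of TTL. The addresses and field values are illustrative only.

```python
# Packing the fixed 40-octet IPv6 header (version, traffic class, flow label,
# payload length, next header, hop limit, source, destination).
import struct
import ipaddress

def ipv6_header(src: str, dst: str, payload_len: int,
                next_header: int = 17,       # 17 = UDP
                hop_limit: int = 64,
                traffic_class: int = 0,
                flow_label: int = 0) -> bytes:
    version = 6
    first_word = (version << 28) | (traffic_class << 20) | flow_label
    return struct.pack(
        "!IHBB16s16s",                       # network byte order, 40 bytes total
        first_word,
        payload_len,
        next_header,
        hop_limit,
        ipaddress.IPv6Address(src).packed,
        ipaddress.IPv6Address(dst).packed,
    )

header = ipv6_header("2001:db8::1", "2001:db8::2", payload_len=8)
print(len(header))   # 40
```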
Mobility
Unlike mobile IPv4, mobile IPv6 avoids triangular routing and is therefore as efficient as native IPv6. IPv6 routers may also support network mobility which allows entire subnets to move to a new router connection point without renumbering.
Options extensibility
The IPv6 protocol header has a fixed size (40 octets). Options are implemented as additional extension headers after the IPv6 header, which limits their size only by the size of an entire packet. The extension header mechanism provides extensibility to support future services for quality of service, security, mobility, and others, without redesign of the basic protocol.
Jumbograms
IPv4 limits packets to 65,535 (2^16 − 1) octets of payload. IPv6 has optional support for packets over this limit, referred to as jumbograms, which can be as large as 4,294,967,295 (2^32 − 1) octets. The use of jumbograms may improve performance over high-MTU links. The use of jumbograms is indicated by the Jumbo Payload Option header.
Uploading and downloading
In computer networks, to download means to receive data to a local system from a remote system, or to initiate such a data transfer. Examples of a remote system from which a download might be performed include a webserver, FTP server, email server, or other similar systems. A download can mean either any file that is offered for downloading or that has been downloaded, or the process of receiving such a file.
It has become increasingly common for the terms downloading and installing to be confused with one another or used interchangeably.
The inverse operation, uploading, can refer to the sending of data from a local system to a remote system, such as a server or another client, with the intent that the remote system should store a copy of the data being transferred, or to the initiation of such a process. The words first came into popular usage among computer users with the increased popularity of Bulletin Board Systems (BBSs), facilitated by the widespread distribution and implementation of dial-up access in the 1970s.
Download
The terms uploading and downloading often imply that the data sent or received is to be stored permanently, or at least more than temporarily. Downloading is thereby distinguished from the related concept of streaming, in which data is used nearly immediately as it is received, while the transmission is still in progress, and may not be stored long-term; with downloading, the implication is that the data is usable only once it has been received in its entirety. Increasingly, websites that offer streaming media or media displayed in-browser, such as YouTube, and which place restrictions on the ability of users to save these materials to their computers after they have been received, state that downloading is not permitted. In this context, "download" implies specifically "receive and save" instead of simply "receive". It is also important to note that "downloading" is not the same as "transferring": sending or receiving data between two local storage devices is a transfer of data, whereas receiving data from the Internet is considered a download.
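As a minimal sketch of downloading in this "receive and save" sense, the following reads a resource in chunks and writes it to local storage; the URL and filename are placeholders, not real resources.

```python
# "Receive and save": read a remote resource in chunks and store it locally.
import urllib.request

def download(url: str, destination: str, chunk_size: int = 64 * 1024) -> None:
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)        # stored locally; usable once fully received

# download("https://example.com/somefile.zip", "somefile.zip")   # placeholder URL
```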
Sideload
When applied to local transfers (sending data from one local system to another local system), it is often difficult to decide if it is an upload or download, as both source and destination are in the local control of the user. Technically if the user uses the receiving device to initiate the transfer then it would be a download and if they used the sending device to initiate it would be an upload. However, as most non-technical users tend to use the term download to refer to any data transfer, the term "sideload" is sometimes being used to cover all local to local transfers to end this confusion.
Remote upload
When there is a transfer of data from a remote system to another remote system, the process is called "remote uploading". This is used by some online file hosting services.
Remote uploading is also used in situations where the computers that need to share data are located on a distant high speed local area network, and the remote control is being performed using a comparatively slow dialup modem connection.
For example:
- The user remotely accesses a file hosting service at MyRemoteHost.
- The user finds a public file at PublicRemoteHost and wants to keep a copy in their MyRemoteHost.
- To have it done they "remote upload" the file from PublicRemoteHost to MyRemoteHost.
- Neither of the hosts is located on the user's local network.
Without remote uploading functionality, the user would be required to download the file first to their local host and then re-upload it to the remote file hosting server.
Where the connection to the remote computers is via a dialup connection, the transfer time required to download locally and then re-upload could increase from seconds, to hours or days.
BitTorrent (protocol)
BitTorrent is a peer-to-peer file sharing protocol used for distributing large amounts of data. BitTorrent is one of the most common protocols for transferring large files, and it has been estimated that it accounted for roughly 27% to 55% of all Internet traffic (depending on geographical location) as of February 2009.
Programmer Bram Cohen designed the protocol in April 2001 and released a first implementation on July 2, 2001. It is now maintained by Cohen's company BitTorrent, Inc. There are numerous BitTorrent clients available for a variety of computing platforms.
Description
The BitTorrent protocol can distribute a large file without placing a heavy load on the source computer and network. Rather than downloading a file from a single source, the BitTorrent protocol allows users to join a "swarm" of hosts to download and upload from each other simultaneously. The protocol works as an alternative method to distribute data and can work over networks with low bandwidth, so even small computers, like mobile phones, are able to distribute files to many recipients.
A user who wants to upload a file first creates a small torrent descriptor file that he distributes by conventional means (web, email, etc.). He then makes the file itself available through a BitTorrent node acting as a seed. Those with the torrent descriptor file can give it to their own BitTorrent nodes which, acting as peers or leechers, download it by connecting to the seed and/or other peers.
The file being distributed is divided into segments called pieces. As each peer receives a new piece of the file it becomes a source of that piece to other peers, relieving the seed from having to send a copy to every peer. With BitTorrent, the task of distributing the file is shared by those who want it; it is entirely possible for the seed to send only a single copy of the file itself to an unlimited number of peers.
Each piece is protected by a cryptographic hash contained in the torrent descriptor. This prevents nodes from maliciously modifying the pieces they pass on to other nodes. If a node starts with an authentic copy of the torrent descriptor, it can verify the authenticity of the actual file it has received.
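A minimal sketch of this per-piece hashing, assuming an illustrative 256 kB piece length and using SHA-1 as described above:

```python
# Splitting a file into fixed-size pieces and hashing each piece with SHA-1,
# so a downloader can verify every piece against the torrent descriptor.
import hashlib

def piece_hashes(path: str, piece_length: int = 256 * 1024) -> list[bytes]:
    hashes = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_length)
            if not piece:
                break
            hashes.append(hashlib.sha1(piece).digest())   # 20 bytes per piece
    return hashes

def verify_piece(piece: bytes, index: int, hashes: list[bytes]) -> bool:
    # A downloading peer re-hashes each received piece and compares it with
    # the hash recorded in the torrent descriptor.
    return hashlib.sha1(piece).digest() == hashes[index]
```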
When a peer completely downloads a file, it becomes an additional seed. This eventual shift from peers to seeders determines the overall "health" of the file (as determined by the number of times a file is available in its complete form).
This distributed nature of BitTorrent leads to a flood like spreading of a file throughout peers. As more peers join the swarm, the likelihood of a successful download increases. Relative to standard Internet hosting, this provides a significant reduction in the original distributor's hardware and bandwidth resource costs. It also provides redundancy against system problems, reduces dependence on the original distributor and provides a source for the file which is generally temporary and therefore harder to trace than when provided by the enduring availability of a host in standard file distribution techniques.
Operation
A BitTorrent client is any program that implements the BitTorrent protocol. Each client is capable of preparing, requesting, and transmitting any type of computer file over a network, using the protocol. A peer is any computer running an instance of a client.
To share a file or group of files, a peer first creates a small file called a "torrent" (e.g. MyFile.torrent). This file contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution. Peers that want to download the file must first obtain a torrent file for it and connect to the specified tracker, which tells them from which other peers to download the pieces of the file.
Though both ultimately transfer files over a network, a BitTorrent download differs from a classic download (as is typical with an HTTP or FTP request, for example) in several fundamental ways:
- BitTorrent makes many small data requests over different TCP connections to different machines, while classic downloading is typically made via a single TCP connection to a single machine.
- BitTorrent downloads in a random or in a "rarest-first" approach that ensures high availability, while classic downloads are sequential.
Taken together, these differences allow BitTorrent to achieve much lower cost to the content provider, much higher redundancy, and much greater resistance to abuse or to "flash crowds" than regular server software. However, this protection, theoretically, comes at a cost: downloads can take time to rise to full speed because it may take time for enough peer connections to be established, and it may take time for a node to receive sufficient data to become an effective uploader. This contrasts with regular downloads (such as from an HTTP server, for example) that, while more vulnerable to overload and abuse, rise to full speed very quickly and maintain this speed throughout.
In general, BitTorrent's non-contiguous download methods have prevented it from supporting "progressive downloads" or "streaming playback". However, comments made by Bram Cohen in January 2007 suggest that streaming torrent downloads will soon be commonplace and ad supported streaming appears to be the result of those comments. In January 2011 Cohen demonstrated an early version of BitTorrent streaming, saying the feature will be available by summer 2011.
Creating and publishing torrents
The peer distributing a data file treats the file as a number of identically sized pieces, usually with byte sizes of a power of 2, and typically between 32 kB and 4 MB each. The peer creates a hash for each piece, using the SHA-1 hash function, and records it in the torrent file. Pieces with sizes greater than 512 kB reduce the size of a torrent file for a very large payload, but are claimed to reduce the efficiency of the protocol. When another peer later receives a particular piece, the hash of the piece is compared to the recorded hash to test that the piece is error-free. Peers that provide a complete file are called seeders, and the peer providing the initial copy is called the initial seeder.
The exact information contained in the torrent file depends on the version of the BitTorrent protocol. By convention, the name of a torrent file has the suffix .torrent. Torrent files have an "announce" section, which specifies the URL of the tracker, and an "info" section, containing (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, all of which are used by clients to verify the integrity of the data they receive.
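The following sketch builds a minimal single-file torrent structure and bencodes it, the encoding used by torrent files (integers as i...e, strings as length:data, lists as l...e, dictionaries as d...e with sorted keys). The tracker URL, file name, and piece data are placeholders, and a real client would hash every piece of the actual payload rather than a single sample.

```python
# A minimal bencoder and a toy single-file torrent structure.
import hashlib

def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are assumed to be ASCII strings; sorting them matches the
        # byte-wise key order the format requires.
        encoded = b"".join(bencode(k) + bencode(value[k]) for k in sorted(value))
        return b"d" + encoded + b"e"
    raise TypeError(f"cannot bencode {type(value)!r}")

piece_length = 256 * 1024
pieces = hashlib.sha1(b"illustrative piece data").digest()    # normally one 20-byte hash per piece

torrent = {
    "announce": "http://tracker.example.com/announce",        # placeholder tracker URL
    "info": {
        "name": "MyFile.bin",
        "length": 123456,
        "piece length": piece_length,
        "pieces": pieces,
    },
}

with open("MyFile.torrent", "wb") as f:
    f.write(bencode(torrent))
```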
Torrent files are typically published on websites or elsewhere, and registered with at least one tracker. The tracker maintains lists of the clients currently participating in the torrent. Alternatively, in a trackerless system (decentralized tracking) every peer acts as a tracker. Azureus was the first BitTorrent client to implement such a system through the distributed hash table (DHT) method. An alternative and incompatible DHT system, known as Mainline DHT, was later developed and adopted by the BitTorrent (Mainline), µTorrent, Transmission, rTorrent, KTorrent, BitComet, and Deluge clients.
After the DHT was adopted, a "private" flag, analogous to the broadcast flag, was unofficially introduced, telling clients to restrict the use of decentralized tracking regardless of the user's desires. The flag is intentionally placed in the info section of the torrent so that it cannot be disabled or removed without changing the identity of the torrent. The purpose of the flag is to prevent torrents from being shared with clients that do not have access to the tracker. The flag was requested for inclusion in the official specification in August 2008, but has not been accepted. Clients that ignored the private flag have been banned by many trackers, discouraging the practice.
Downloading torrents and sharing files
Users browse the web to find a torrent of interest, download it, and open it with a BitTorrent client. The client connects to the tracker(s) specified in the torrent file, from which it receives a list of peers currently transferring pieces of the file(s) specified in the torrent. The client connects to those peers to obtain the various pieces. If the swarm contains only the initial seeder, the client connects directly to it and begins to request pieces.
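A hedged sketch of that tracker announce step: the client issues an HTTP GET to the announce URL with its info_hash, peer ID, port, and progress counters, and receives a bencoded reply listing peers. The tracker URL and identifiers below are placeholders, and the reply is returned undecoded.

```python
# A toy tracker announce: one HTTP GET with the client's identifiers and
# progress counters; the tracker replies with a bencoded peer list.
import urllib.parse
import urllib.request

def announce(tracker_url: str, info_hash: bytes, peer_id: bytes,
             port: int = 6881, left: int = 0) -> bytes:
    params = {
        "info_hash": info_hash,   # 20-byte SHA-1 of the bencoded info dict
        "peer_id": peer_id,       # 20-byte identifier chosen by the client
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,             # bytes this client still needs
        "compact": 1,
    }
    url = tracker_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read()    # a bencoded dictionary with an interval and a peer list

# Hypothetical usage with placeholder values:
# reply = announce("http://tracker.example.com/announce", b"\x00" * 20, b"-PY0001-" + b"0" * 12)
```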
Clients incorporate mechanisms to optimize their download and upload rates; for example they download pieces in a random order to increase the opportunity to exchange data, which is only possible if two peers have different pieces of the file.
The effectiveness of this data exchange depends largely on the policies that clients use to determine to whom to send data. Clients may prefer to send data to peers that send data back to them (a tit for tat scheme), which encourages fair trading. But strict policies often result in suboptimal situations, such as when newly joined peers are unable to receive any data because they don't have any pieces yet to trade themselves or when two peers with a good connection between them do not exchange data simply because neither of them takes the initiative. To counter these effects, the official BitTorrent client program uses a mechanism called "optimistic unchoking", whereby the client reserves a portion of its available bandwidth for sending pieces to random peers (not necessarily known good partners, so called preferred peers) in hopes of discovering even better partners and to ensure that newcomers get a chance to join the swarm.
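The following is a toy sketch of such a policy, not the official client's exact algorithm: it unchokes the peers that have recently sent the most data (tit for tat) plus one randomly chosen peer (the optimistic unchoke) so that newcomers get a chance. Peer names and rates are illustrative.

```python
# A simplified unchoking policy: reward the best uploaders, plus one
# randomly chosen peer as the optimistic unchoke.
import random

def choose_unchoked(download_rate_by_peer: dict[str, float],
                    regular_slots: int = 3) -> set[str]:
    # Tit for tat: prefer peers currently sending us the most data.
    by_rate = sorted(download_rate_by_peer, key=download_rate_by_peer.get, reverse=True)
    unchoked = set(by_rate[:regular_slots])

    # Optimistic unchoke: give one other peer a chance, chosen at random.
    remaining = [p for p in download_rate_by_peer if p not in unchoked]
    if remaining:
        unchoked.add(random.choice(remaining))
    return unchoked

print(choose_unchoked({"peerA": 120.0, "peerB": 80.0, "peerC": 0.0,
                       "peerD": 45.0, "peerE": 0.0}))
```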
Although swarming scales well to tolerate flash crowds for popular content, it is less useful for unpopular content. Peers arriving after the initial rush might find the content unavailable and need to wait for the arrival of a seed in order to complete their downloads. The arrival of a seed, in turn, may take a long time (this is termed the seeder promotion problem). Since maintaining seeds for unpopular content entails high bandwidth and administrative costs, this runs counter to the goals of publishers that value BitTorrent as a cheap alternative to a client-server approach. This occurs on a huge scale; measurements have shown that 38% of all new torrents become unavailable within the first month. A strategy adopted by many publishers which significantly increases availability of unpopular content consists of bundling multiple files in a single swarm. More sophisticated solutions have also been proposed; generally, these use cross-torrent mechanisms through which multiple torrents can cooperate to better share content.
BitTorrent does not offer its users anonymity. It is possible to obtain the IP addresses of all current and possibly previous participants in a swarm from the tracker. This may expose users with insecure systems to attacks. It may also expose users to the risk of being sued, if they are distributing files without permission from the copyright holder(s). However, there are ways to promote anonymity; for example, the OneSwarm project layers privacy-preserving sharing mechanisms on top of the original BitTorrent protocol.
Adoption
A growing number of individuals and organizations are using BitTorrent to distribute their own or licensed material. Independent adopters report that without using BitTorrent technology and its dramatically reduced demands on their private networking hardware and bandwidth, they could not afford to distribute their files.
Film, video and music
- BitTorrent Inc. has amassed a number of licenses from Hollywood studios for distributing popular content from their websites.
- Sub Pop Records releases tracks and videos via BitTorrent Inc. to distribute its 1000+ albums. Babyshambles and The Libertines (both bands associated with Pete Doherty) have extensively used torrents to distribute hundreds of demos and live videos. US industrial rock band Nine Inch Nails frequently distributes albums via BitTorrent.
- Podcasting software is starting to integrate BitTorrent to help podcasters deal with the download demands of their MP3 "radio" programs. Specifically, Juice and Miro (formerly known as Democracy Player) support automatic processing of .torrent files from RSS feeds. Similarly, some BitTorrent clients, such as µTorrent, are able to process web feeds and automatically download content found within them.
- DGM Live! purchases are provided via BitTorrent.
Broadcasters
- In 2008, the CBC became the first public broadcaster in North America to make a full show (Canada's Next Great Prime Minister) available for download using BitTorrent.
- The Norwegian Broadcasting Corporation (NRK) has since March 2008 experimented with BitTorrent distribution, available online. Only selected material in which NRK owns all royalties is published. Responses have been very positive, and NRK is planning to offer more content.
- The Dutch VPRO broadcasting organization released three documentaries under a Creative Commons license using the content distribution feature of the Mininova tracker.
Personal material
- The Amazon S3 "Simple Storage Service" is a scalable Internet-based storage service with a simple web service interface, equipped with built-in BitTorrent support.
- Blog Torrent offers a simplified BitTorrent tracker to enable bloggers and non-technical users to host a tracker on their site. Blog Torrent also allows visitors to download a "stub" loader, which acts as a BitTorrent client to download the desired file, allowing users without BitTorrent software to use the protocol. This is similar to the concept of a self-extracting archive.
Software
- Blizzard Entertainment uses BitTorrent (via a proprietary client called the "Blizzard Downloader") to distribute most content for StarCraft II and World of Warcraft, including the games themselves.
- Many games, especially those whose large size makes them difficult to host due to bandwidth limits, extremely frequent downloads, and unpredictable changes in network traffic, instead distribute a specialized, stripped-down BitTorrent client with enough functionality to download the game from the other running clients and the primary server (which is maintained in case not enough peers are available).
- Many major open source and free software projects encourage BitTorrent as well as conventional downloads of their products (via HTTP, FTP etc.) to increase availability and to reduce load on their own servers, especially when dealing with larger files.
- Entropia Universe has also begun distributing the client file(s) through BitTorrent.
Government
- The UK government used BitTorrent to distribute details about how the tax money of UK citizens was spent.
Others
- Facebook uses BitTorrent to distribute updates to Facebook servers.
- Twitter uses BitTorrent to distribute updates to Twitter servers.
Network impact
CableLabs, the research organization of the North American cable industry, estimates that BitTorrent represents 18% of all broadband traffic. In 2004, CacheLogic put that number at roughly 35% of all traffic on the Internet. The discrepancies in these numbers are caused by differences in the method used to measure P2P traffic on the Internet.
Routers that use network address translation (NAT) must maintain tables of source and destination IP addresses and ports. Typical home routers are limited to about 2000 table entries, while some more expensive routers have larger table capacities. BitTorrent frequently contacts 300–500 servers per second, rapidly filling the NAT tables. This is a common cause of home routers locking up.
Indexing
The BitTorrent protocol provides no way to index torrent files. As a result, a comparatively small number of websites have hosted a large majority of torrents, many linking to copyrighted material without the authorization of copyright holders, rendering those sites especially vulnerable to lawsuits. Several types of websites support the discovery and distribution of data on the BitTorrent network.
Public torrent hosting sites such as The Pirate Bay allow users to search and download from their collection of torrent files. Users can typically also upload torrent files for content they wish to distribute. Often, these sites also run BitTorrent trackers for their hosted torrent files, but these two functions are not mutually dependent: a torrent file could be hosted on one site and tracked by another, unrelated site.
Private host/tracker sites operate like public ones except that they restrict access to registered users and keep track of the amount of data each user uploads and downloads, in an attempt to reduce leeching.
Search engines allow the discovery of torrent files that are hosted and tracked on other sites; examples include Mininova, BTJunkie, Torrentz, The Pirate Bay, Eztorrent and isoHunt. These sites allow the user to ask for content meeting specific criteria (such as containing a given word or phrase) and retrieve a list of links to torrent files matching those criteria. This list can often be sorted with respect to several criteria, relevance (seeders-leechers ratio) being one of the most popular and useful (due to the way the protocol behaves, the download bandwidth achievable is very sensitive to this value). Bram Cohen launched a BitTorrent search engine on http://www.bittorrent.com/search that co-mingles licensed content with search results. Metasearch engines allow one to search several BitTorrent indices and search engines at once.
Technologies built on BitTorrent
The BitTorrent protocol is still under development and may therefore acquire new features and other enhancements, such as improved efficiency.
Distributed trackers
On May 2, 2005, Azureus 2.3.0.0 (now known as Vuze) was released, introducing support for "trackerless" torrents through a system called the "distributed database." This system is a DHT implementation which allows the client to use torrents that do not have a working BitTorrent tracker. The following month, BitTorrent, Inc. released version 4.2.0 of the Mainline BitTorrent client, which supported an alternative DHT implementation (popularly known as "Mainline DHT") that is incompatible with that of Azureus. Current versions of the official BitTorrent client, µTorrent, BitComet, and BitSpirit all share compatibility with Mainline DHT. Both DHT implementations are based on Kademlia. As of version 3.0.5.0, Azureus also supports Mainline DHT in addition to its own distributed database through use of an optional application plugin. This potentially allows the Azureus client to reach a bigger swarm.
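The defining idea of Kademlia is that the "distance" between two identifiers is their bitwise XOR, and a lookup proceeds toward the nodes whose IDs are XOR-closest to the target info-hash. The Python sketch below illustrates only that distance metric, under simplified assumptions (random 160-bit node IDs and a handful of hypothetical nodes); it is not the code of any particular client.

# Sketch of the Kademlia-style XOR distance used conceptually by Mainline DHT
# and the Azureus/Vuze distributed database. Node IDs and info-hashes are
# 160-bit values; everything here is illustrative, not client code.
import hashlib
import os

def random_node_id() -> int:
    """A random 160-bit identifier, standing in for a DHT node ID."""
    return int.from_bytes(os.urandom(20), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia defines the distance between two IDs as their bitwise XOR."""
    return a ^ b

# Hypothetical target: the info-hash of some torrent, plus a few known nodes.
info_hash = int.from_bytes(hashlib.sha1(b"example torrent metadata").digest(), "big")
known_nodes = [random_node_id() for _ in range(8)]

# A lookup queries the nodes whose IDs are XOR-closest to the target.
closest = sorted(known_nodes, key=lambda n: xor_distance(n, info_hash))[:3]
print([hex(n)[:12] for n in closest])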
Another idea that has surfaced in Vuze is that of virtual torrents. This idea is based on the distributed tracker approach and is used to describe some web resource. Currently, it is used for instant messaging. It is implemented using a special messaging protocol and requires an appropriate plugin. Anatomic P2P is another approach, which uses a decentralized network of nodes that route traffic to dynamic trackers.
Most BitTorrent clients also use Peer exchange (PEX) to gather peers in addition to trackers and DHT. Peer exchange checks with known peers to see if they know of any other peers. With the 3.0.5.0 release of Vuze, all major BitTorrent clients now have compatible peer exchange.
Web seeding
Web seeding was implemented in 2006 as the ability of BitTorrent clients to download torrent pieces from an HTTP source in addition to the swarm. The advantage of this feature is that a website may distribute a torrent for a particular file or batch of files and make those files available for download from that same web server; this can simplify long-term seeding and load balancing through the use of existing, cheap, web hosting setups. In theory, this would make using BitTorrent almost as easy for a web publisher as creating a direct HTTP download. In addition, it would allow the "web seed" to be disabled if the swarm becomes too popular while still allowing the file to be readily available.
This feature has two specifications.
The first was created by John "TheSHAD0W" Hoffman, who created BitTornado. From version 5.0 onward, the Mainline BitTorrent client also supports web seeds, and the BitTorrent web site had a simple publishing tool that created web-seeded torrents. µTorrent added support for web seeds in version 1.7, and BitComet added support in version 1.14. This first specification requires running a web service that serves content by info-hash and piece number, rather than by filename.
The other specification relies on ordinary HTTP downloads of the original files, so a basic web hosting space can act as the seed.
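As a rough illustration of that second, plain-HTTP style of web seeding, a client can request the byte range corresponding to one piece from an ordinary web server and check it against the SHA-1 piece hash carried in the torrent metadata. The URL, piece length, and expected hash below are hypothetical placeholders, not values from any real torrent.

# Sketch of HTTP-based web seeding for a single-file torrent: fetch one piece
# with a Range request, then verify it against the SHA-1 piece hash stored in
# the .torrent metadata. URL, piece length and expected hash are placeholders.
import hashlib
import urllib.request

WEB_SEED_URL = "http://example.com/files/linux.iso"   # assumed web seed
PIECE_LENGTH = 262144                                  # 256 KiB pieces (assumed)
EXPECTED_SHA1 = "0" * 40                               # from the torrent's 'pieces' field

def fetch_piece(index: int) -> bytes:
    start = index * PIECE_LENGTH
    end = start + PIECE_LENGTH - 1
    req = urllib.request.Request(WEB_SEED_URL, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

piece = fetch_piece(0)
if hashlib.sha1(piece).hexdigest() == EXPECTED_SHA1:
    print("piece 0 verified against the torrent's piece hash")
else:
    print("piece 0 failed verification; a real client would discard and retry")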
In September 2010, a new service named Burnbit was launched which generates a torrent from any URL using web seeding.
RSS feeds
A technique called broadcatching combines RSS with the BitTorrent protocol to create a content delivery system, further simplifying and automating content distribution. Steve Gillmor explained the concept in a column for Ziff-Davis in December 2003. The discussion spread quickly among bloggers (Ernest Miller, Chris Pirillo, etc.). In an article entitled Broadcatching with BitTorrent, Scott Raymond explained:
I want RSS feeds of BitTorrent files. A script would periodically check the feed for new items, and use them to start the download. Then, I could find a trusted publisher of an Alias RSS feed, and "subscribe" to all new episodes of the show, which would then start downloading automatically — like the "season pass" feature of the TiVo.
—Scott Raymond, scottraymond.net
The RSS feed tracks the content, while BitTorrent ensures content integrity with cryptographic hashing of all data, so feed subscribers receive uncorrupted content.
One of the first and most popular free and open-source clients for broadcatching is Miro. Other free software clients, such as PenguinTV and KatchTV, also support broadcatching.
The BitTorrent web-service MoveDigital has the ability to make torrents available to any web application capable of parsing XML through its standard REST-based interface. Additionally, Torrenthut is developing a similar torrent API that will provide the same features and help bring the torrent community to Web 2.0 standards. Alongside this release is PEP, a first PHP application built using the API, which parses any Really Simple Syndication (RSS 2.0) feed and automatically creates and seeds a torrent for each enclosure found in that feed.
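A minimal sketch of the broadcatching idea described above: poll an RSS 2.0 feed, pick out torrent enclosures, and drop any new .torrent files into a directory that a BitTorrent client watches. The feed URL and directory name are hypothetical placeholders, and the polling schedule is left to an external scheduler such as cron.

# Minimal broadcatching sketch: poll an RSS 2.0 feed, find torrent enclosures,
# and save new .torrent files into a "watch" directory that a BitTorrent
# client monitors. Feed URL and paths are hypothetical placeholders.
import os
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://example.com/show.rss"   # assumed feed carrying torrent enclosures
WATCH_DIR = "watch"                        # directory a BitTorrent client watches

def poll_feed() -> None:
    os.makedirs(WATCH_DIR, exist_ok=True)
    with urllib.request.urlopen(FEED_URL) as resp:
        tree = ET.parse(resp)
    for item in tree.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or enclosure.get("type") != "application/x-bittorrent":
            continue
        url = enclosure.get("url")
        name = os.path.join(WATCH_DIR, os.path.basename(url))
        if not os.path.exists(name):              # only fetch items not seen before
            urllib.request.urlretrieve(url, name)

if __name__ == "__main__":
    poll_feed()   # in practice this would run on a schedule, e.g. via cron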
Throttling and encryption
Since BitTorrent makes up a large proportion of total traffic, some ISPs have chosen to throttle (slow down) BitTorrent transfers to ensure network capacity remains available for other uses. For this reason, methods have been developed to disguise BitTorrent traffic in an attempt to thwart these efforts.
Protocol header encryption (PHE) and Message stream encryption/Protocol encryption (MSE/PE) are features of some BitTorrent clients that attempt to make BitTorrent traffic hard to detect and throttle. At the moment Vuze, BitComet, KTorrent, Transmission, Deluge, µTorrent, MooPolice, Halite, rTorrent and the latest official BitTorrent client (v6) support MSE/PE encryption.
In September 2006 it was reported that some software could detect and throttle BitTorrent traffic masquerading as HTTP traffic.
Reports in August 2007 indicated that Comcast was preventing BitTorrent seeding by monitoring and interfering with the communication between peers. Protection against these efforts is provided by proxying the client-tracker traffic via an encrypted tunnel to a point outside of the Comcast network. Comcast has more recently called a "truce" with BitTorrent, Inc. with the intention of shaping traffic in a protocol-agnostic manner. Questions about the ethics and legality of Comcast's behavior have led to renewed debate about Net neutrality in the United States.
In general, although encryption can make it difficult to determine what is being shared, BitTorrent is vulnerable to traffic analysis. Thus even with MSE/PE, it may be possible for an ISP to recognize BitTorrent and also to determine that a system is no longer downloading but only uploading data, and terminate its connection by injecting TCP RST (reset flag) packets.
Multitracker
Another unofficial feature is an extension to the BitTorrent metadata format proposed by John Hoffman and implemented by several indexing websites. It allows the use of multiple trackers per file, so if one tracker fails, others can continue to support file transfer. It is implemented in several clients, such as BitComet, BitTornado, BitTorrent, KTorrent, Transmission, Deluge, µTorrent, rtorrent, and Vuze. Trackers are placed in groups, or tiers, with a tracker randomly chosen from the top tier and tried, moving to the next tier if all the trackers in the top tier fail.
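A small sketch of that tier behaviour follows, with a placeholder announce function standing in for a real HTTP or UDP tracker announce; the tracker URLs shown are hypothetical.

# Sketch of multitracker tier selection: shuffle within a tier, try each
# tracker, and fall through to the next tier only if every tracker in the
# current one fails. `announce` is a stand-in for a real tracker announce.
import random
from typing import List, Optional

def announce(tracker_url: str) -> bool:
    """Placeholder: a real client would send an HTTP(S) or UDP announce here."""
    return False

def announce_with_tiers(tiers: List[List[str]]) -> Optional[str]:
    for tier in tiers:
        shuffled = tier[:]                 # work on a copy so the metadata stays intact
        random.shuffle(shuffled)
        for tracker in shuffled:
            try:
                if announce(tracker):
                    return tracker         # success: keep using this tracker
            except Exception:
                continue                   # unreachable tracker: try the next one
    return None                            # every tracker in every tier failed

# Example metadata: two tiers; the second is used only if the first tier fails.
tiers = [
    ["http://tracker1.example/announce", "http://tracker2.example/announce"],
    ["udp://backup.example:6969/announce"],
]
print(announce_with_tiers(tiers))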
Torrents with multiple trackers can decrease the time it takes to download a file, but this also has a few consequences:
- Poorly implemented clients may contact multiple trackers, leading to more overhead traffic.
- Torrents from closed trackers suddenly become downloadable by non-members, as they can connect to a seed via an open tracker.
Decentralized keyword search
Even with distributed trackers, a third party is still required to find a specific torrent. This is usually done in the form of a hyperlink from the website of the content owner or through indexing websites like The Pirate Bay or Torrentz.
The Tribler BitTorrent client is the first to incorporate decentralized search capabilities. With Tribler, users can find .torrent files that are hosted among other peers, instead of on centralized index sites. It adds this ability to the BitTorrent protocol using a gossip protocol, somewhat similar to the eXeem network, which was shut down in 2005. The software also includes the ability to recommend content. After a dozen downloads, the Tribler software can roughly estimate the user's download taste and recommend additional content.
In May 2007 Cornell University published a paper proposing a new approach to searching a peer-to-peer network for inexact strings, which could replace the functionality of a central indexing site. A year later, the same team implemented the system as a plugin for Vuze called Cubit and published a follow-up paper reporting its success.
A somewhat similar facility but with a slightly different approach is provided by the BitComet client through its "Torrent Exchange" feature. Whenever two peers using BitComet (with Torrent Exchange enabled) connect to each other they exchange lists of all the torrents (name and info-hash) they have in the Torrent Share storage (torrent files which were previously downloaded and for which the user chose to enable sharing by Torrent Exchange).
Thus each client builds up a list of all the torrents shared by the peers it has connected to in the current session (or it can even maintain the list between sessions if instructed). At any time the user can search that Torrent Collection list for a certain torrent and sort the list by categories. When the user chooses to download a torrent from that list, the .torrent file is automatically searched for (by info-hash value) in the DHT network and, when found, is downloaded by the querying client, which can then create and start a download task.
Implementations
The BitTorrent specification is free to use and many clients are open source, so BitTorrent clients have been created for all common operating systems using a variety of programming languages. The official BitTorrent client, µTorrent, Vuze, Transmission, and BitComet are some of the most popular clients.
Some BitTorrent implementations such as MLDonkey and Torrentflux are designed to run as servers. For example, this can be used to centralize file sharing on a single dedicated server which users share access to on the network. Server-oriented BitTorrent implementations can also be hosted by hosting providers at co-located facilities with high bandwidth Internet connectivity (e.g., a datacenter) which can provide dramatic speed benefits over using BitTorrent from a regular home broadband connection.
Services such as ImageShack can download files on BitTorrent for the user, allowing them to download the entire file by HTTP once it is finished.
The Opera web browser supports BitTorrent, as does Wyzo. BitLet allows users to download torrents directly from their browser using a Java applet. Sites such as xFiles and DuShare allow users to transfer large files directly using BitTorrent inside Adobe Flash.
An increasing number of hardware devices are being made to support BitTorrent. These include routers and NAS devices containing BitTorrent-capable firmware like OpenWrt.
Proprietary versions of the protocol which implement DRM, encryption, and authentication are found within managed clients such as Pando.
Development
An unimplemented (as of February 2008) unofficial feature is Similarity Enhanced Transfer (SET), a technique for improving the speed at which peer-to-peer file sharing and content distribution systems can share data. SET, proposed by researchers Pucha, Andersen, and Kaminsky, works by spotting chunks of identical data in files that are an exact or near match to the one needed and transferring these data to the client if the "exact" data are not present. Their experiments suggested that SET will help greatly with less popular files, but not as much for popular data, where many peers are already downloading it. Andersen believes that this technique could be immediately used by developers with the BitTorrent file sharing system.
As of December 2008, BitTorrent, Inc. is working with Oversi on new policy discovery protocols that query the ISP for capabilities and network architecture information. Oversi's ISP-hosted NetEnhancer box is designed to "improve peer selection" by helping peers find local nodes, improving download speeds while reducing the load into and out of the ISP's network.
Legal issues
There has been much controversy over the use of BitTorrent trackers. BitTorrent metafiles themselves do not store file contents. Whether the publishers of BitTorrent metafiles violate copyrights by linking to copyrighted material without the authorization of copyright holders is controversial.
Various jurisdictions have pursued legal action against websites that host BitTorrent trackers. High-profile examples include the closing of Suprnova.org, Torrentspy, LokiTorrent, Mininova and OiNK.cd. The Pirate Bay torrent website, formed by a Swedish group, is noted for the "legal" section of its website in which letters and replies on the subject of alleged copyright infringements are publicly displayed. On 31 May 2006, The Pirate Bay's servers in Sweden were raided by Swedish police on allegations by the MPAA of copyright infringement; however, the tracker was up and running again three days later.
BitTorrent and malware
Several studies on BitTorrent have indicated that a large portion of files available for download via BitTorrent contain malware. In particular, one small sample indicated that 18% of all executable programs available for download contained malware. Another study claims that as much as 14.5% of BitTorrent downloads contain zero-day malware, and that BitTorrent was used as the distribution mechanism for 47% of all zero-day malware they have found.
Modern uses, services and social impact
Modern uses
The Internet is allowing greater flexibility in working hours and location, especially with the spread of unmetered high-speed connections and web applications.
The Internet can now be accessed almost anywhere by numerous means, especially through mobile Internet devices. Mobile phones, datacards, handheld game consoles and cellular routers allow users to connect to the Internet from anywhere there is a wireless network supporting that device's technology. Within the limitations imposed by small screens and other limited facilities of such pocket-sized devices, services of the Internet, including email and the web, may be available. Service providers may restrict the services offered and wireless data transmission charges may be significantly higher than other access methods.
Educational material at all levels from pre-school to post-doctoral is available from websites. Examples range from CBeebies, through school and high-school revision guides and virtual universities, to access to top-end scholarly literature through the likes of Google Scholar. Whether for distance education, help with homework and other assignments, self-guided learning, whiling away spare time, or just looking up more detail on an interesting fact, it has never been easier for people to access educational information at any level from anywhere. The Internet in general and the World Wide Web in particular are important enablers of both formal and informal education.
The low cost and nearly instantaneous sharing of ideas, knowledge, and skills has made collaborative work dramatically easier, with the help of collaborative software. Not only can a group cheaply communicate and share ideas, but the wide reach of the Internet allows such groups to form easily in the first place. An example of this is the free software movement, which has produced, among other programs, Linux, Mozilla Firefox, and OpenOffice.org. Internet "chat", whether in the form of IRC chat rooms or channels or via instant messaging systems, allows colleagues to stay in touch in a very convenient way while working at their computers during the day. Messages can be exchanged even more quickly and conveniently than via email. Extensions to these systems may allow files to be exchanged, "whiteboard" drawings to be shared, or voice and video contact between team members.
Services
Information
Many people use the terms Internet and World Wide Web, or just the Web, interchangeably, but the two terms are not synonymous. The World Wide Web is a global set of documents, images and other resources, logically interrelated by hyperlinks and referenced with Uniform Resource Identifiers (URIs). URIs allow providers to identify services symbolically and allow clients to locate and address the web servers, file servers, and other databases that store documents and provide resources, and to access them using the Hypertext Transfer Protocol (HTTP), the primary carrier protocol of the Web. HTTP is only one of the hundreds of communication protocols used on the Internet. Web services may also use HTTP to allow software systems to communicate in order to share and exchange business logic and data.
World Wide Web browser software, such as Microsoft's Internet Explorer, Mozilla Firefox, Opera, Apple's Safari, and Google Chrome, let users navigate from one web page to another via hyperlinks embedded in the documents. These documents may also contain any combination of computer data, including graphics, sounds, text, video, multimedia and interactive content including games, office applications and scientific demonstrations. Through keyword-driven Internet research using search engines like Yahoo! and Google, users worldwide have easy, instant access to a vast and diverse amount of online information. Compared to printed encyclopedias and traditional libraries, the World Wide Web has enabled the decentralization of information.
The Web has also enabled individuals and organizations to publish ideas and information to a potentially large audience online at greatly reduced expense and time delay. Publishing a web page, a blog, or building a website involves little initial cost and many cost-free services are available. Publishing and maintaining large, professional web sites with attractive, diverse and up-to-date information is still a difficult and expensive proposition, however. Many individuals and some companies and groups use web logs or blogs, which are largely used as easily updatable online diaries. Some commercial organizations encourage staff to communicate advice in their areas of specialization in the hope that visitors will be impressed by the expert knowledge and free information, and be attracted to the corporation as a result. One example of this practice is Microsoft, whose product developers publish their personal blogs in order to pique the public's interest in their work. Collections of personal web pages published by large service providers remain popular, and have become increasingly sophisticated. Whereas operations such as Angelfire and GeoCities have existed since the early days of the Web, newer offerings from, for example, Facebook and MySpace currently have large followings. These operations often brand themselves as social network services rather than simply as web page hosts.
Advertising on popular web pages can be lucrative, and e-commerce or the sale of products and services directly via the Web continues to grow.
When the Web began in the 1990s, a typical web page was stored in completed form on a web server, formatted with HTML, ready to be sent to a user's browser in response to a request. Over time, the process of creating and serving web pages has become more automated and more dynamic. Websites are often created using content management or wiki software with, initially, very little content. Contributors to these systems, who may be paid staff, members of a club or other organization or members of the public, fill underlying databases with content using editing pages designed for that purpose, while casual visitors view and read this content in its final HTML form. There may or may not be editorial, approval and security systems built into the process of taking newly entered content and making it available to the target visitors.
Social impact
The Internet has enabled entirely new forms of social interaction, activities, and organizing, thanks to its basic features such as widespread usability and access. Social networking websites such as Facebook, Twitter and MySpace have created new ways to socialize and interact. Users of these sites are able to add a wide variety of information to pages, to pursue common interests, and to connect with others. It is also possible to find existing acquaintances, to allow communication among existing groups of people. Sites like LinkedIn foster commercial and business connections. YouTube and Flickr specialize in users' videos and photographs.
In the first decade of the 21st century, the first generation was raised with widespread availability of Internet connectivity, bringing consequences and concerns in areas such as personal privacy and identity and the distribution of copyrighted materials. These "digital natives" face a variety of challenges that were not present for prior generations.
The Internet has achieved new relevance as a political tool, leading to Internet censorship by some states. The presidential campaign of Howard Dean in 2004 in the United States was notable for its success in soliciting donations via the Internet. Many political groups use the Internet as a new method of organizing to carry out their mission, giving rise to Internet activism. Some governments, such as those of Iran, North Korea, Myanmar, the People's Republic of China, and Saudi Arabia, restrict what people in their countries can access on the Internet, especially political and religious content. This is accomplished through software that filters domains and content so that they may not be easily accessed or obtained without elaborate circumvention.
In Norway, Denmark, Finland and Sweden, major Internet service providers have voluntarily, possibly to avoid such an arrangement being turned into law, agreed to restrict access to sites listed by authorities. While this list of forbidden URLs is only supposed to contain addresses of known child pornography sites, the content of the list is secret. Many countries, including the United States, have enacted laws against the possession or distribution of certain material, such as child pornography, via the Internet, but do not mandate filtering software. There are many free and commercially available software programs, called content-control software, with which a user can choose to block offensive websites on individual computers or networks, in order to limit a child's access to pornographic materials or depiction of violence.
The Internet has been a major outlet for leisure activity since its inception, with entertaining social experiments such as MUDs and MOOs being conducted on university servers, and humor-related Usenet groups receiving much traffic. Today, many Internet forums have sections devoted to games and funny videos; short cartoons in the form of Flash movies are also popular. Over 6 million people use blogs or message boards as a means of communication and for the sharing of ideas. The pornography and gambling industries have taken advantage of the World Wide Web, and often provide a significant source of advertising revenue for other websites. Although many governments have attempted to restrict both industries' use of the Internet, this has generally failed to stop their widespread popularity.
One main area of leisure activity on the Internet is multiplayer gaming. This form of recreation creates communities, where people of all ages and origins enjoy the fast-paced world of multiplayer games. These range from MMORPG to first-person shooters, from role-playing video games to online gambling. This has revolutionized the way many people interact while spending their free time on the Internet. While online gaming has been around since the 1970s, modern modes of online gaming began with subscription services such as GameSpy and MPlayer. Non-subscribers were limited to certain types of game play or certain games. Many people use the Internet to access and download music, movies and other works for their enjoyment and relaxation. Free and fee-based services exist for all of these activities, using centralized servers and distributed peer-to-peer technologies. Some of these sources exercise more care with respect to the original artists' copyrights than others.
Many people use the World Wide Web to access news, weather and sports reports, to plan and book vacations and to find out more about their interests. People use chat, messaging and email to make and stay in touch with friends worldwide, sometimes in the same way as some previously had pen pals. The Internet has seen a growing number of Web desktops, where users can access their files and settings via the Internet.
Cyberslacking can become a drain on corporate resources; the average UK employee spent 57 minutes a day surfing the Web while at work, according to a 2003 study by Peninsula Business Services. Internet addiction disorder is excessive computer use that interferes with daily life. Some psychologists believe that Internet use has other effects on individuals, for instance interfering with the deep thinking that leads to true creativity.
Structure and governance
Structure
The Internet structure and its usage characteristics have been studied extensively. It has been determined that both the Internet IP routing structure and hypertext links of the World Wide Web are examples of scale-free networks. Similar to the way the commercial Internet providers connect via Internet exchange points, research networks tend to interconnect into large subnetworks such as GEANT, GLORIAD, Internet2 (successor of the Abilene Network), and the UK's national research and education network JANET. These in turn are built around smaller networks (see also the list of academic computer network organizations).
Many computer scientists describe the Internet as a "prime example of a large-scale, highly engineered, yet highly complex system". The Internet is extremely heterogeneous; for instance, data transfer rates and physical characteristics of connections vary widely. The Internet exhibits "emergent phenomena" that depend on its large-scale organization. For example, data transfer rates exhibit temporal self-similarity. The principles of the routing and addressing methods for traffic in the Internet reach back to their origins in the 1960s, when the eventual scale and popularity of the network could not be anticipated. Thus, the possibility of developing alternative structures is being investigated.
Governance
ICANN headquarters in Marina Del Rey, California, United States
The Internet is a globally distributed network comprising many voluntarily interconnected autonomous networks. It operates without a central governing body. However, to maintain interoperability, all technical and policy aspects of the underlying core infrastructure and the principal name spaces are administered by the Internet Corporation for Assigned Names and Numbers (ICANN), headquartered in Marina del Rey, California. ICANN is the authority that coordinates the assignment of unique identifiers for use on the Internet, including domain names, Internet Protocol (IP) addresses, application port numbers in the transport protocols, and many other parameters. Globally unified name spaces, in which names and numbers are uniquely assigned, are essential for the global reach of the Internet. ICANN is governed by an international board of directors drawn from across the Internet technical, business, academic, and other non-commercial communities. The government of the United States continues to have the primary role in approving changes to the DNS root zone that lies at the heart of the domain name system. ICANN's role in coordinating the assignment of unique identifiers distinguishes it as perhaps the only central coordinating body on the global Internet. On 16 November 2005, the World Summit on the Information Society, held in Tunis, established the Internet Governance Forum (IGF) to discuss Internet-related issues.
Communication on the Internet
Electronic mail, or email, is an important communications service available on the Internet. The concept of sending electronic text messages between parties in a way analogous to mailing letters or memos predates the creation of the Internet. Pictures, documents and other files are sent as email attachments. Emails can be cc-ed to multiple email addresses.
Internet telephony is another common communications service made possible by the creation of the Internet. VoIP stands for Voice-over-Internet Protocol, referring to the protocol that underlies all Internet communication. The idea began in the early 1990s with walkie-talkie-like voice applications for personal computers. In recent years many VoIP systems have become as easy to use and as convenient as a normal telephone. The benefit is that, as the Internet carries the voice traffic, VoIP can be free or cost much less than a traditional telephone call, especially over long distances and especially for those with always-on Internet connections such as cable or ADSL. VoIP is maturing into a competitive alternative to traditional telephone service. Interoperability between different providers has improved and the ability to call or receive a call from a traditional telephone is available. Simple, inexpensive VoIP network adapters are available that eliminate the need for a personal computer.
Voice quality can still vary from call to call but is often equal to and can even exceed that of traditional calls. Remaining problems for VoIP include emergency telephone number dialing and reliability. Currently, a few VoIP providers provide an emergency service, but it is not universally available. Traditional phones are line-powered and operate during a power failure; VoIP does not do so without a backup power source for the phone equipment and the Internet access devices. VoIP has also become increasingly popular for gaming applications, as a form of communication between players. Popular VoIP clients for gaming include Ventrilo and Teamspeak. Wii, PlayStation 3, and Xbox 360 also offer VoIP chat features.
Data transfer
File sharing is an example of transferring large amounts of data across the Internet. A computer file can be emailed to customers, colleagues and friends as an attachment. It can be uploaded to a website or FTP server for easy download by others. It can be put into a "shared location" or onto a file server for instant use by colleagues. The load of bulk downloads to many users can be eased by the use of "mirror" servers or peer-to-peer networks. In any of these cases, access to the file may be controlled by user authentication, the transit of the file over the Internet may be obscured by encryption, and money may change hands for access to the file. The price can be paid by the remote charging of funds from, for example, a credit card whose details are also passed—usually fully encrypted—across the Internet. The origin and authenticity of the file received may be checked by digital signatures or by MD5 or other message digests. These simple features of the Internet, over a worldwide basis, are changing the production, sale, and distribution of anything that can be reduced to a computer file for transmission. This includes all manner of print publications, software products, news, music, film, video, photography, graphics and the other arts. This in turn has caused seismic shifts in each of the existing industries that previously controlled the production and distribution of these products.
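As a brief, hedged illustration of that integrity check, a downloader can compute the digest of the received file and compare it with the value published by the distributor; the filename and expected digest below are placeholders.

# Verify a downloaded file against a published message digest. MD5 is shown
# because it is mentioned above, though stronger digests such as SHA-256 are
# preferred today; the filename and expected value are placeholders.
import hashlib

def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)            # hash in chunks so large files fit in memory
    return h.hexdigest()

expected = "d41d8cd98f00b204e9800998ecf8427e"     # digest published by the distributor
if file_digest("download.iso") == expected:
    print("digest matches: the file arrived intact")
else:
    print("digest mismatch: the file is corrupt or has been tampered with")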
Streaming media is the real-time delivery of digital media for the immediate consumption or enjoyment by end users. Many radio and television broadcasters provide Internet feeds of their live audio and video productions. They may also allow time-shift viewing or listening such as Preview, Classic Clips and Listen Again features. These providers have been joined by a range of pure Internet "broadcasters" who never had on-air licenses. This means that an Internet-connected device, such as a computer or something more specific, can be used to access on-line media in much the same way as was previously possible only with a television or radio receiver. The range of available types of content is much wider, from specialized technical webcasts to on-demand popular multimedia services. Podcasting is a variation on this theme, where—usually audio—material is downloaded and played back on a computer or shifted to a portable media player to be listened to on the move. These techniques using simple equipment allow anybody, with little censorship or licensing control, to broadcast audio-visual material worldwide.
Digital media streaming increases the demand for network bandwidth. For example, standard image quality needs 1 Mbps link speed for SD 480p, HD 720p quality requires 2.5 Mbps, and the top-of-the-line HDX quality needs 4.5 Mbps for 1080p.
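A rough worked example of what those rates imply for data volume over a two-hour stream (actual figures vary with the encoder and container overhead):

# Back-of-the-envelope data volume for a two-hour stream at the quoted rates.
# 1 Mbps = 1,000,000 bits per second; figures are approximate.
rates_mbps = {"480p": 1.0, "720p": 2.5, "1080p": 4.5}
seconds = 2 * 60 * 60
for quality, mbps in rates_mbps.items():
    gigabytes = mbps * 1_000_000 * seconds / 8 / 1_000_000_000
    print(f"{quality}: about {gigabytes:.1f} GB over two hours")
# prints roughly 0.9 GB, 2.2 GB and 4.0 GB respectively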
Webcams are a low-cost extension of this phenomenon. While some webcams can give full-frame-rate video, the picture is usually either small or updates slowly. Internet users can watch animals around an African waterhole, ships in the Panama Canal, traffic at a local roundabout or their own premises, live and in real time. Video chat rooms and video conferencing are also popular, with many uses being found for personal webcams, with and without two-way sound. YouTube was founded on 15 February 2005 and is now the leading website for free streaming video, with a vast number of users. It uses a Flash-based web player to stream and show video files. Registered users may upload an unlimited amount of video and build their own personal profile. YouTube claims that its users watch hundreds of millions of videos and upload hundreds of thousands of videos daily.
Access
See also: List of countries by number of Internet users, English on the Internet, Global Internet usage, and Unicode
Graph of Internet users per 100 inhabitants between 1997 and 2007 by International Telecommunication Union
The prevalent language for communication on the Internet has been English. This may be a result of the origin of the Internet, as well as the language's role as a lingua franca. Early computer systems were limited to the characters in the American Standard Code for Information Interchange (ASCII), a subset of the Latin alphabet.
After English (27%), the most requested languages on the World Wide Web are Chinese (23%), Spanish (8%), Japanese (5%), Portuguese and German (4% each), Arabic, French and Russian (3% each), and Korean (2%). By region, 42% of the world's Internet users are based in Asia, 24% in Europe, 14% in North America, 10% in Latin America and the Caribbean taken together, 6% in Africa, 3% in the Middle East and 1% in Australia/Oceania. The Internet's technologies have developed enough in recent years, especially in the use of Unicode, that good facilities are available for development and communication in the world's widely used languages. However, some glitches such as mojibake (incorrect display of some languages' characters) still remain.
Common methods of Internet access in homes include dial-up, landline broadband (over coaxial cable, fiber optic or copper wires), Wi-Fi, satellite and 3G/4G cell phones. Public places to use the Internet include libraries and Internet cafes, where computers with Internet connections are available. There are also Internet access points in many public places such as airport halls and coffee shops, in some cases just for brief use while standing. Various terms are used, such as "public Internet kiosk", "public access terminal", and "Web payphone". Many hotels now also have public terminals, though these are usually fee-based. These terminals are widely used for purposes such as ticket booking, bank deposits and online payments.

Wi-Fi provides wireless access to computer networks, and therefore can do so to the Internet itself. Hotspots providing such access include Wi-Fi cafes, where would-be users need to bring their own wireless-enabled devices such as a laptop or PDA. These services may be free to all, free to customers only, or fee-based. A hotspot need not be limited to a confined location: a whole campus or park, or even an entire city, can be enabled. Grassroots efforts have led to wireless community networks. Commercial Wi-Fi services covering large city areas are in place in London, Vienna, Toronto, San Francisco, Philadelphia, Chicago and Pittsburgh. The Internet can then be accessed from such places as a park bench.

Apart from Wi-Fi, there have been experiments with proprietary mobile wireless networks like Ricochet, various high-speed data services over cellular phone networks, and fixed wireless services. High-end mobile phones such as smartphones generally come with Internet access through the phone network. Web browsers such as Opera are available on these advanced handsets, which can also run a wide variety of other Internet software. More mobile phones have Internet access than PCs, though this access is not as widely used. An Internet access provider and protocol matrix differentiates the methods used to get online.
In contrast, an Internet blackout or outage can be caused by accidental local signaling interruptions. Disruptions of submarine communications cables may cause blackouts or slowdowns to large areas depending on them, such as in the 2008 submarine cable disruption. Internet blackouts of almost entire countries can be achieved by governments, such as with the Internet in Egypt, where approximately 93% of networks were shut down in 2011 in an attempt to stop mobilisation for anti-government protests.
In an American study in 2005, the percentage of men using the Internet was very slightly ahead of the percentage of women, although this difference reversed in those under 30. Men logged on more often, spent more time online, and were more likely to be broadband users, whereas women tended to make more use of opportunities to communicate (such as email). Men were more likely to use the Internet to pay bills, participate in auctions, and for recreation such as downloading music and videos. Men and women were equally likely to use the Internet for shopping and banking. More recent studies indicate that in 2008, women significantly outnumbered men on most social networking sites, such as Facebook and Myspace, although the ratios varied with age. In addition, women watched more streaming content, whereas men downloaded more. In terms of blogs, men were more likely to blog in the first place; among those who blog, men were more likely to have a professional blog, whereas women were more likely to have a personal blog.
Email
Electronic mail, commonly called email, e-mail or e.mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the same time, as with instant messaging. Today's email systems are based on a store-and-forward model. Email servers accept, forward, deliver and store messages. Neither the users nor their computers are required to be online simultaneously; they need connect only briefly, typically to an email server, for as long as it takes to send or receive messages.
An email message consists of three components, the message envelope, the message header, and the message body. The message header contains control information, including, minimally, an originator's email address and one or more recipient addresses. Usually descriptive information is also added, such as a subject header field and a message submission date/time stamp.
Originally a text only (7 bit ASCII and others) communications medium, email was extended to carry multi-media content attachments, a process standardized in RFC 2045 through 2049. Collectively, these RFCs have come to be called Multipurpose Internet Mail Extensions (MIME).
The history of modern, global Internet email services reaches back to the early ARPANET. Standards for encoding email messages were proposed as early as 1973 (RFC 561). Conversion from ARPANET to the Internet in the early 1980s produced the core of the current services. An email sent in the early 1970s looks quite similar to a basic text message sent on the Internet today.
Network-based email was initially exchanged on the ARPANET in extensions to the File Transfer Protocol (FTP), but is now carried by the Simple Mail Transfer Protocol (SMTP), first published as Internet standard 10 (RFC 821) in 1982. In the process of transporting email messages between systems, SMTP communicates delivery parameters using a message envelope separate from the message (header and body) itself.
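A minimal sketch of that envelope/message separation using Python's standard smtplib: the addresses handed to the SMTP transaction (the envelope) are supplied independently of the From: and To: header fields inside the message itself. The server name, port, credentials and addresses are placeholders, not values from the text.

# Sketch of an SMTP submission showing the envelope/message separation: the
# envelope addresses given to sendmail() travel in the SMTP dialogue
# (MAIL FROM / RCPT TO) and need not match the From:/To: header fields.
# Server name, credentials and addresses are hypothetical placeholders.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@a.example"
msg["To"] = "bob@b.example"
msg["Subject"] = "Store-and-forward example"
msg.set_content("Hello Bob,\n\nthis message travels hop by hop via SMTP.\n")

with smtplib.SMTP("smtp.a.example", 587) as server:   # submission port of Alice's provider
    server.starttls()                                  # encrypt the submission hop
    server.login("alice", "app-password")              # placeholder credentials
    # Envelope sender and recipients, separate from the header fields above:
    server.sendmail("alice@a.example", ["bob@b.example"], msg.as_string())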
Spelling
There are several spelling variations that occasionally prove cause for surprisingly vehement disagreement.
- email is the form required by IETF Requests for Comment and working groups and increasingly by style guides. This spelling also appears in most dictionaries.
- e-mail is a form recommended by some prominent journalistic and technical style guides. According to Corpus of Contemporary American English data, this form appears most frequently in edited, published American English writing.
- mail was the form used in the original RFC. The service is referred to as mail and a single piece of electronic mail is called a message.
- eMail, capitalizing only the letter M, was common among ARPANET users and the early developers of Unix, CMS, AppleLink, eWorld, AOL, GEnie, and Hotmail.
- EMail is a traditional form that has been used in RFCs for the "Author's Address", and is expressly required "...for historical reasons...".
- E-mail, capitalizing the initial letter E in the same way as A-bomb, H-bomb, X-ray, T-shirt, and similar shortenings.
Origin
Electronic mail predates the inception of the Internet, and was in fact a crucial tool in creating it.
MIT first demonstrated the Compatible Time-Sharing System (CTSS) in 1961. It allowed multiple users to log into the IBM 7094[22] from remote dial-up terminals, and to store files online on disk. This new ability encouraged users to share information in new ways. Email started in 1965 as a way for multiple users of a time-sharing mainframe computer to communicate. Among the first systems to have such a facility were SDC's Q32 and MIT's CTSS.
Host-based mail systems
The original email systems allowed communication only between users who logged into the same host or "mainframe". This could be hundreds or even thousands of users within an organization. Examples include MIT's 1965 CTSS MAIL, Larry Breed's 1972 APL Mailbox (which was used by the 1976 Carter/Mondale presidential campaign), the original 1972 Unix mail program, IBM's 1981 PROFS, and Digital Equipment Corporation's 1982 ALL-IN-1.
Homogeneous email networks and LAN-based mail systems
Much early peer-to-peer email networking worked only among computers running the same operating system or program. Examples include:
- By 1966 or earlier, it is possible that the SAGE system had a limited form of email
- 1978's uucp and 1980's Usenet provided Unix-to-Unix copying of email, files, and shared fora over dialup modems or leased lines
- BITNET in 1981 allowed IBM mainframes to communicate email over leased lines.
- FidoNet's 1984 application software for IBM PCs running DOS transferred email and shared bulletin board postings by dialup modem
In the early 1980s, networked personal computers on LANs became increasingly important. Server-based systems similar to the earlier mainframe systems were developed. Again these systems initially allowed communication only between users logged into the same server infrastructure. Eventually these systems could also be linked between different organizations, as long as they ran the same email system and proprietary protocol.
Examples include cc:Mail, Lantastic, WordPerfect Office, Microsoft Mail, Banyan VINES and Lotus Notes - with various vendors supplying gateway software to link these incompatible systems.
Attempts at interoperability
Early interoperability among independent systems included:
- ARPANET, the forerunner of today's Internet, defined the first protocols for dissimilar computers to exchange email
- uucp implementations for non-Unix systems were used as an open "glue" between differing mail systems, primarily over dialup telephones
- CSNet used dial-up telephone access to link additional sites to the ARPANET and then Internet
Later efforts at interoperability standardization included:
- Novell briefly championed the open MHS protocol but abandoned it after purchasing the non-MHS WordPerfect Office (renamed Groupwise)
- The Coloured Book protocols on UK academic networks until 1992
- X.400 in the 1980s and early 1990s was promoted by major vendors and mandated for government use under GOSIP but abandoned by all but a few — in favor of Internet SMTP by the mid-1990s.
From SNDMSG to MSG
In the early 1970s, Ray Tomlinson updated an existing utility called SNDMSG so that it could copy messages (as files) over the network. Lawrence Roberts, the project manager for the ARPANET development, took the idea of READMAIL, which dumped all "recent" messages onto the user's terminal, and wrote a program for TENEX in TECO macros called RD which permitted accessing individual messages. Barry Wessler then updated RD and called it NRD.
Marty Yonke then rewrote NRD to include reading, access to SNDMSG for sending, and a help system, and called the utility WRD, which was later known as BANANARD. John Vittal then updated this version to include message forwarding and an Answer command that automatically created a reply message with the correct address(es). This was the first email "reply" command; the system was called MSG. With the inclusion of these features, MSG is considered to be the first integrated modern email program, from which many other applications have descended.
The rise of ARPANET mail
The ARPANET computer network made a large contribution to the development of email. There is one report that indicates experimental inter-system email transfers began shortly after its creation in 1969. Ray Tomlinson is generally credited as having sent the first email across a network, initiating the use of the "@" sign to separate the names of the user and the user's machine in 1971, when he sent a message from one Digital Equipment Corporation DEC-10 computer to another DEC-10. The two machines were placed next to each other. Tomlinson's work was quickly adopted across the ARPANET, which significantly increased the popularity of email. For many years, email was the killer app of the ARPANET and then the Internet.
Most other networks had their own email protocols and address formats; as the influence of the ARPANET and later the Internet grew, central sites often hosted email gateways that passed mail between the Internet and these other networks. Internet email addressing is still complicated by the need to handle mail destined for these older networks. Some well-known examples were UUCP (mostly Unix computers), BITNET (mostly IBM and VAX mainframes at universities), FidoNet (personal computers), DECNET (various networks), and CSNET, a forerunner of NSFNet.
An example of an Internet email address that routed mail to a user at a UUCP host:
hubhost!middlehost!edgehost!user@uucpgateway.somedomain.example.com
This was necessary because in early years UUCP computers did not maintain (or consult servers for) information about the location of all hosts they exchanged mail with, but rather only knew how to communicate with a few network neighbors; email messages (and other data such as Usenet News) were passed along in a chain among hosts who had explicitly agreed to share data with each other.
Operation overview
The following shows a typical sequence of events that takes place when Alice composes a message using her mail user agent (MUA). She enters the email address of her correspondent and hits the "send" button.
1. Her MUA formats the message in email format and uses the Submission Protocol (a profile of the Simple Mail Transfer Protocol (SMTP), see RFC 4409) to send the message to the local mail submission agent (MSA), in this case smtp.a.org, run by Alice's internet service provider (ISP).
2. The MSA looks at the destination address provided in the SMTP protocol (not from the message header), in this case bob@b.org. An Internet email address is a string of the form localpart@exampledomain. The part before the @ sign is the local part of the address, often the username of the recipient, and the part after the @ sign is a domain name or a fully qualified domain name. The MSA resolves the domain name to determine the fully qualified domain name of the mail exchange server in the Domain Name System (DNS); a sketch of such a lookup follows this list.
3. The DNS server for the b.org domain, ns.b.org, responds with any MX records listing the mail exchange servers for that domain, in this case mx.b.org, a message transfer agent (MTA) server run by Bob's ISP.
4. smtp.a.org sends the message to mx.b.org using SMTP.
This server may need to forward the message to other MTAs before the message reaches the final message delivery agent (MDA).
5. Bob presses the "get mail" button in his MUA, which picks up the message using either the Post Office Protocol (POP3) or the Internet Message Access Protocol (IMAP4).
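As referenced in step 2, the following is a hedged sketch of how a mail submission agent (or any program) can look up the mail exchange servers for a destination domain. It assumes the third-party dnspython package, since the Python standard library provides no MX lookup; the domain is the example one used above.

# Look up the mail exchange (MX) servers for a recipient's domain, as the MSA
# does in step 2. Requires the third-party dnspython package
# (pip install dnspython).
import dns.resolver

answers = dns.resolver.resolve("b.org", "MX")
for record in sorted(answers, key=lambda r: r.preference):
    # Lower preference values are tried first.
    print(record.preference, record.exchange)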
That sequence of events applies to the majority of email users. However, there are many alternative possibilities and complications to the email system:
- Alice or Bob may use a client connected to a corporate email system, such as IBM Lotus Notes or Microsoft Exchange. These systems often have their own internal email format and their clients typically communicate with the email server using a vendor-specific, proprietary protocol. The server sends or receives email via the Internet through the product's Internet mail gateway which also does any necessary reformatting. If Alice and Bob work for the same company, the entire transaction may happen completely within a single corporate email system.
- Alice may not have a MUA on her computer but instead may connect to a webmail service.
- Alice's computer may run its own MTA, so avoiding the transfer at step 1.
- Bob may pick up his email in many ways, for example logging into mx.b.org and reading it directly, or by using a webmail service.
- Domains usually have several mail exchange servers so that they can continue to accept mail when the main mail exchange server is not available.
- Email messages are not secure if email encryption is not used correctly.
Many MTAs used to accept messages for any recipient on the Internet and do their best to deliver them. Such MTAs are called open mail relays. This was very important in the early days of the Internet when network connections were unreliable. If an MTA couldn't reach the destination, it could at least deliver it to a relay closer to the destination. The relay stood a better chance of delivering the message at a later time. However, this mechanism proved to be exploitable by people sending unsolicited bulk email and as a consequence very few modern MTAs are open mail relays, and many MTAs don't accept messages from open mail relays because such messages are very likely to be spam.
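Messages are exchanged between hosts using the Simple Mail Transfer Protocol with software programs called mail transfer agents. Users can retrieve their messages from servers using standard protocols such as POP or IMAP, or, as is more likely in a large corporate environment, with a proprietary protocol specific to Lotus Notes or Microsoft Exchange servers. Webmail interfaces allow users to access their mail with any standard web browser, from any computer, rather than relying on an email client.
Mail can be stored on the client, on the server side, or in both places. Standard formats for mailboxes include Maildir and mbox. Several prominent email clients use their own proprietary format and require conversion software to transfer email between them.
Accepting a message obliges an MTA to deliver it, and when a message cannot be delivered, that MTA must send a bounce message back to the sender, indicating the problem.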
Message format
The Internet email message format is defined in RFC 5322, with multi-media content attachments being defined in RFC 2045 through RFC 2049, collectively called Multipurpose Internet Mail Extensions or MIME. Prior to the introduction of RFC 2822 in 2001, the format described by RFC 822 had been the standard for Internet email for nearly 20 years. RFC 822 was published in 1982 and was based on the earlier RFC 733 for the ARPANET.
Internet email messages consist of two major sections:
- Header — Structured into fields such as From, To, CC, Subject, Date, and other information about the email.
- Body — The basic content, as unstructured text; sometimes containing a signature block at the end. This is exactly the same as the body of a regular letter.
The header is separated from the body by a blank line.
Message header
Each message has exactly one header, which is structured into fields. Each field has a name and a value. RFC 5322 specifies the precise syntax.
Informally, each line of text in the header that begins with a printable character begins a separate field. The field name starts in the first character of the line and ends before the separator character ":". The separator is then followed by the field value (the "body" of the field). The value is continued onto subsequent lines if those lines have a space or tab as their first character. Field names and values are restricted to 7-bit ASCII characters. Non-ASCII values may be represented using MIME encoded words.
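A brief illustration of those rules with Python's standard email package: in the raw text below, the Subject: field is folded onto a continuation line beginning with whitespace, and a blank line separates the header from the body. The addresses are placeholders.

# Parse a small raw message with the standard email package. The Subject:
# field is folded across two lines, and a blank line separates header and body.
from email import message_from_string, policy

raw = (
    "From: Alice <alice@a.example>\r\n"
    "To: Bob <bob@b.example>\r\n"
    "Date: Mon, 1 Jan 2001 12:00:00 +0000\r\n"
    "Subject: A header field folded\r\n"
    " across two lines\r\n"
    "\r\n"
    "This is the message body.\r\n"
)

msg = message_from_string(raw, policy=policy.default)
print(msg["Subject"])      # the folded lines are unfolded into one field value
print(msg.get_content())   # the body: everything after the blank line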
Header fields
The message header must include at least the following fields:
- From: The email address, and optionally the name, of the author(s). In many email clients this is not changeable except by changing account settings.
- Date: The local time and date when the message was written. Like the From: field, many email clients fill this in automatically when sending. The recipient's client may then display the time in the format and time zone local to the recipient.
The message header should include at least the following fields:
- Message-ID: Also an automatically generated field; used to prevent multiple delivery and for reference in In-Reply-To: (see below).
- In-Reply-To: Message-ID of the message that this is a reply to. Used to link related messages together. This field only applies for reply messages.
RFC 3864 describes registration procedures for message header fields at the IANA; it provides for permanent and provisional message header field names, including fields defined for MIME, netnews, and HTTP, and references the relevant RFCs. Common header fields for email include:
- To: The email address(es), and optionally name(s) of the message's recipient(s). Indicates primary recipients (multiple allowed), for secondary recipients see Cc: and Bcc: below.
- Subject: A brief summary of the topic of the message. Certain abbreviations are commonly used in the subject, including "RE:" and "FW:".
- Bcc: Blind Carbon Copy; addresses added to the SMTP delivery list but not (usually) listed in the message data, remaining invisible to other recipients.
- Cc: Carbon copy; many email clients will mark email in your inbox differently depending on whether you are in the To: or Cc: list.
- Content-Type: Information about how the message is to be displayed, usually a MIME type.
- Precedence: commonly with values "bulk", "junk", or "list"; used to indicate that automated "vacation" or "out of office" responses should not be returned for this mail, e.g. to prevent vacation notices from being sent to all other subscribers of a mailinglist. Sendmail uses this header to affect prioritization of queued email, with "Precedence: special-delivery" messages delivered sooner. With modern high-bandwidth networks delivery priority is less of an issue than it once was. Microsoft Exchange respects a fine-grained automatic response suppression mechanism, the X-Auto-Response-Suppress header.
- Received: Tracking information generated by mail servers that have previously handled a message, in reverse order (last handler first).
- References: Message-ID of the message that this is a reply to, the Message-ID of the message that that message was a reply to, and so on.
- Reply-To: Address that should be used to reply to the message.
- Sender: Address of the actual sender acting on behalf of the author listed in the From: field (secretary, list manager, etc.).
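A minimal sketch of setting several of the fields listed above with Python's standard email library follows; the addresses and subject are invented, and in practice most clients fill in Date: and Message-ID: automatically, as noted earlier.

```python
# Sketch: composing a message with common header fields (invented addresses).
from email.message import EmailMessage
from email.utils import formatdate, make_msgid

msg = EmailMessage()
msg["From"] = "Alice Example <alice@example.com>"
msg["To"] = "bob@example.com"
msg["Cc"] = "carol@example.com"
msg["Subject"] = "RE: Meeting notes"
msg["Date"] = formatdate(localtime=True)   # normally filled in by the email client
msg["Message-ID"] = make_msgid()           # referenced later by In-Reply-To: and References:
msg.set_content("Plain text body goes here.")

print(msg.as_string())   # the RFC 5322 header block, a blank line, then the body
```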
Note that the To: field is not necessarily related to the addresses to which the message is delivered. The actual delivery list is supplied separately to the transport protocol, SMTP, and may or may not have been extracted from the header content. The "To:" field is similar to the addressing at the top of a conventional letter, which is delivered according to the address on the outer envelope. Also note that the "From:" field does not have to be the real sender of the email message: it is very easy to fake the "From:" field and make a message appear to come from any mail address. It is possible to digitally sign email, which is much harder to fake, but such signatures require extra software and often external programs to verify. Some ISPs do not relay email claiming to come from a domain not hosted by them, but very few (if any) check that the person or email address named in the "From:" field is actually associated with the connection. Some ISPs apply email authentication systems to email sent through their MTA so that other MTAs can detect forged spam that appears to come from them.
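The envelope/header distinction described above can be made concrete with a short sketch: the recipient list handed to SMTP is supplied separately from the To: header, which is also how a Bcc: recipient receives a copy without appearing in the message. This assumes a mail server listening on localhost port 25; all addresses are invented.

```python
# Sketch: the SMTP envelope recipient list is independent of the To: header.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"              # what recipients see in the header
msg["Subject"] = "Envelope vs. header"
msg.set_content("The extra envelope recipient below never appears in this header.")

with smtplib.SMTP("localhost", 25) as smtp:
    # The second argument is the envelope recipient list used for delivery;
    # it includes an address that is absent from the header (a Bcc: in effect).
    smtp.sendmail("alice@example.com",
                  ["bob@example.com", "hidden-recipient@example.com"],
                  msg.as_string())
```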
Recently the IETF EAI working group has defined some experimental extensions to allow Unicode characters to be used within the header. In particular, this allows email addresses to use non-ASCII characters. Such characters must only be used by servers that support these extensions.
Message body
Content encoding
Email was originally designed for 7-bit ASCII.[38] Much email software is 8-bit clean but must assume it will communicate with 7-bit servers and mail readers. The MIME standard introduced character set specifiers and two content transfer encodings to enable transmission of non-ASCII data: quoted-printable for mostly 7-bit content with a few characters outside that range, and base64 for arbitrary binary data. The 8BITMIME and BINARY extensions were introduced to allow transmission of mail without the need for these encodings, but many mail transport agents still do not support them fully. In some countries, several encoding schemes coexist; as a result, by default, a message in a non-Latin alphabet language appears in unreadable form unless, by coincidence, the sender and receiver use the same encoding scheme. Therefore, for international character sets, Unicode is growing in popularity.
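The two content transfer encodings mentioned above can be compared with Python's standard quopri and base64 modules; the sample text is invented and the output comments are only a rough indication.

```python
# Sketch: quoted-printable vs. base64 for non-ASCII content (invented sample text).
import base64
import quopri

text = "Grüße aus Köln".encode("utf-8")

print(quopri.encodestring(text))   # mostly readable ASCII, e.g. b'Gr=C3=BC=C3=9Fe aus K=C3=B6ln'
print(base64.b64encode(text))      # opaque but compact; suited to arbitrary binary data
```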
Plain text and HTML
Most modern graphic email clients allow the use of either plain text or HTML for the message body at the option of the user. HTML email messages often include an automatically generated plain text copy as well, for compatibility reasons.
Advantages of HTML include the ability to include in-line links and images, set apart previous messages in block quotes, wrap naturally on any display, use emphasis such as underlines and italics, and change font styles. Disadvantages include the increased size of the email, privacy concerns about web bugs, abuse of HTML email as a vector for phishing attacks and the spread of malicious software.
Some web-based mailing lists recommend that all posts be made in plain text, with 72 or 80 characters per line, for all the above reasons and because they have a significant number of readers using text-based email clients such as Mutt.
Some Microsoft email clients allow rich formatting using RTF, but unless the recipient is guaranteed to have a compatible email client this should be avoided.
In order to ensure that HTML sent in an email is rendered properly by the recipient's client software, an additional header must be specified when sending: "Content-type: text/html". Most email programs send this header automatically.
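A minimal sketch of such a message, carrying a plain text part and an HTML alternative as described above, using Python's standard email library; the library sets the appropriate Content-Type header for each part automatically. Addresses and text are invented.

```python
# Sketch: a multipart/alternative message with plain text and HTML versions.
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Plain text and HTML"

msg.set_content("Hello Bob,\nthis is the plain text fallback.")      # text/plain part
msg.add_alternative("<p>Hello <b>Bob</b>, this is the HTML version.</p>",
                    subtype="html")                                  # text/html part

print(msg["Content-Type"])   # multipart/alternative; the receiving client chooses a part
```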
Servers and client applications
The interface of an email client, Thunderbird.
Messages are exchanged between hosts using the Simple Mail Transfer Protocol with software programs called mail transfer agents. Users can retrieve their messages from servers using standard protocols such as POP or IMAP, or, as is more likely in a large corporate environment, with a proprietary protocol specific to Lotus Notes or Microsoft Exchange Servers. Webmail interfaces allow users to access their mail with any standard web browser, from any computer, rather than relying on an email client.
Mail can be stored on the client, on the server side, or in both places. Standard formats for mailboxes include Maildir and mbox. Several prominent email clients use their own proprietary format and require conversion software to transfer email between them.
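As a sketch of working with the two standard mailbox formats, Python's standard mailbox module can read both; the paths below are invented placeholders.

```python
# Sketch: reading stored mail in mbox and Maildir formats (invented paths).
import mailbox

inbox = mailbox.mbox("/var/mail/alice")           # classic single-file mbox store
for message in inbox:
    print(message["From"], message["Subject"])

maildir = mailbox.Maildir("/home/alice/Maildir")  # one file per message
print(len(maildir), "messages in the Maildir store")
```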
Accepting a message obliges an MTA to deliver it, and when a message cannot be delivered, that MTA must send a bounce message back to the sender, indicating the problem.
Filename extensions
Upon receipt of email messages, email client applications save messages in operating system files in the file system. Some clients save individual messages as separate files, while others use various database formats, often proprietary, for collective storage. A historical standard of storage is the mbox format. The specific format used is often indicated by special filename extensions:
- eml: Used by many email clients including Microsoft Outlook Express, Windows Mail and Mozilla Thunderbird. The files are plain text in MIME format, containing the email header as well as the message contents and attachments in one or more of several formats.
- emlx: Used by Apple Mail.
- msg: Used by Microsoft Office Outlook.
- mbx
Some applications (like Apple Mail) leave attachments encoded in messages for searching while also saving separate copies of the attachments. Others separate attachments from messages and save them in a specific directory.
URI scheme mailto:
The URI scheme, as registered with the IANA, defines the mailto: scheme for SMTP email addresses. Though its use is not strictly defined, URLs of this form are intended to be used to open the new message window of the user's mail client when the URL is activated, with the address as defined by the URL in the To: field.
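A small sketch of assembling such a URL, with percent-encoding applied to the optional subject and body parameters; the address is invented.

```python
# Sketch: building a mailto: URL with percent-encoded query parameters.
from urllib.parse import quote

address = "support@example.com"
params = {"subject": "Question about my order", "body": "Hello,\n\n"}
query = "&".join(f"{key}={quote(value)}" for key, value in params.items())

mailto_url = f"mailto:{address}?{query}"
print(mailto_url)   # mailto:support@example.com?subject=Question%20about%20my%20order&body=Hello%2C%0A%0A
```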
In society
There are numerous ways in which people have changed the way they communicate in the last 50 years; email is certainly one of them. Traditionally, social interaction in the local community was the basis for communication – face to face. Yet, today face-to-face meetings are no longer the primary way to communicate as one can use a landline telephone, mobile phones, fax services, or any number of the computer mediated communications such as email.
Flaming
Flaming occurs when a person sends a message with angry or antagonistic content. Flaming is assumed to be more common today because of the ease and impersonality of email communications: confrontations in person or via telephone require direct interaction, where social norms encourage civility, whereas typing a message to another person is an indirect interaction, so civility may be forgotten. Flaming is generally looked down upon by Internet communities as it is considered rude and non-productive.
Email bankruptcy
Also known as "email fatigue", email bankruptcy is when a user ignores a large number of email messages after falling behind in reading and answering them. The reason for falling behind is often due to information overload and a general sense there is so much information that it is not possible to read it all. As a solution, people occasionally send a boilerplate message explaining that the email inbox is being cleared out. Harvard University law professor Lawrence Lessig is credited with coining this term, but he may only have popularized it.
In business
Email was widely accepted by the business community as the first broad electronic communication medium and was the first ‘e-revolution’ in business communication. Email is very simple to understand and like postal mail, email solves two basic problems of communication: logistics and synchronization (see below).
LAN-based email is also an emerging form of usage for business. It not only allows the business user to download mail when offline, it also lets a small business provide multiple users with email IDs over a single email connection.
Pros
- The problem of logistics: Much of the business world relies upon communications between people who are not physically in the same building, area or even country; setting up and attending an in-person meeting, telephone call, or conference call can be inconvenient, time-consuming, and costly. Email provides a way to exchange information between two or more people with no set-up costs and that is generally far less expensive than physical meetings or phone calls.
- The problem of synchronisation: With real time communication by meetings or phone calls, participants have to work on the same schedule, and each participant must spend the same amount of time in the meeting or call. Email allows asynchrony: each participant may control their schedule independently.
Cons
Most business workers today spend from one to two hours of their working day on email: reading, ordering, sorting, ‘re-contextualizing’ fragmented information, and writing email. The use of email is increasing due to increasing levels of globalisation—labour division and outsourcing amongst other things. Email can lead to some well-known problems:
- Loss of context: the surrounding context of a message is often lost and cannot easily be recovered. Information in context (as in a newspaper) is much easier and faster to understand than unedited and sometimes unrelated fragments of information. Communicating in context can only be achieved when both parties have a full understanding of the context and issue in question.
- Information overload: Email is a push technology—the sender controls who receives the information. Convenient availability of mailing lists and use of "copy all" can lead to people receiving unwanted or irrelevant information of no use to them.
- Inconsistency: Email can duplicate information. This can be a problem when a large team is working on documents and information while not in constant contact with the other members of their team.
- Liability: Statements made in an email can be deemed legally binding and used against a party in a court of law.
Despite these disadvantages, email has become the most widely used medium of communication within the business world. In fact, a 2010 study on workplace communication found that 83% of U.S. knowledge workers felt that email was critical to their success and productivity at work.
Attachment size limitation
Email messages may have one or more attachments, which serve the purpose of delivering binary or text files of unspecified size. In principle, there is no intrinsic technical restriction in the SMTP protocol limiting the size or number of attachments. In practice, however, email service providers implement various limits on the permissible size of individual files or of the entire message.
Furthermore, an attachment often grows in size when sent, because binary content must be converted to a text-safe encoding for transport. This can be confusing to senders trying to judge whether a file is small enough to send by email, and can result in the message being rejected.
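The growth comes mainly from the base64 encoding applied to binary attachments, which can be illustrated with a short sketch; the attachment here is just random bytes standing in for a real file.

```python
# Sketch: base64 encoding expands binary data by roughly one third.
import base64
import os

attachment = os.urandom(3_000_000)     # a stand-in for a 3 MB binary attachment
encoded = base64.b64encode(attachment)

print(len(attachment), "bytes before encoding")   # 3000000
print(len(encoded), "bytes after encoding")       # 4000000, i.e. about 4/3 of the original
```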
As larger and larger files are created and exchanged, many users are either forced to upload and download their files using an FTP server or, more commonly, to use online file-sharing facilities or services, usually over web-friendly HTTP, in order to send and receive them.
Information overload
A December 2007 New York Times blog post described information overload as "a $650 Billion Drag on the Economy", and the New York Times reported in April 2008 that "E-MAIL has become the bane of some people’s professional lives" due to information overload, yet "none of the current wave of high-profile Internet start-ups focused on e-mail really eliminates the problem of e-mail overload because none helps us prepare replies". GigaOm posted a similar article in September 2010, highlighting research that found 57% of knowledge workers were overwhelmed by the volume of email they received.
Technology investors reflect similar concerns.
Spamming and computer viruses
The usefulness of email is being threatened by four phenomena: email bombardment, spamming, phishing, and email worms.
Spamming is unsolicited commercial (or bulk) email. Because of the very low cost of sending email, spammers can send hundreds of millions of email messages each day over an inexpensive Internet connection. Hundreds of active spammers sending this volume of mail results in information overload for many computer users who receive voluminous unsolicited email each day.
Email worms use email as a way of replicating themselves into vulnerable computers. Although the first email worm affected UNIX computers, the problem is most common today on the more popular Microsoft Windows operating system.
The combination of spam and worm programs results in users receiving a constant drizzle of junk email, which reduces the usefulness of email as a practical tool.
A number of anti-spam techniques mitigate the impact of spam. In the United States, Congress has also passed a law, the CAN-SPAM Act of 2003, attempting to regulate such email. Australia also has very strict spam laws restricting the sending of spam from an Australian ISP, but their impact has been minimal since most spam comes from jurisdictions that seem reluctant to regulate it.
Email spoofing
Email spoofing occurs when the header information of an email is altered to make the message appear to come from a known or trusted source. It is often used as a ruse to collect personal information.
Email bombing
Email bombing is the intentional sending of large volumes of messages to a target address. The overloading of the target email address can render it unusable and can even cause the mail server to crash.
Privacy concerns
Today it can be important to distinguish between Internet and internal email systems. Internet email may travel and be stored on networks and computers without the sender's or the recipient's control. During the transit time it is possible that third parties read or even modify the content. Internal mail systems, in which the information never leaves the organizational network, may be more secure, although information technology personnel and others whose function may involve monitoring or managing may be accessing the email of other employees.
Email privacy, without some security precautions, can be compromised because:
- email messages are generally not encrypted.
- email messages have to go through intermediate computers before reaching their destination, meaning it is relatively easy for others to intercept and read messages.
- many Internet Service Providers (ISPs) store copies of email messages on their mail servers before they are delivered. The backups of these can remain on their servers for up to several months, despite deletion from the mailbox.
- the "Received:"-fields and other information in the email can often identify the sender, preventing anonymous communication.
There are cryptography applications that can serve as a remedy to one or more of the above. For example, Virtual Private Networks or the Tor anonymity network can be used to encrypt traffic from the user machine to a safer network while GPG, PGP, SMEmail, or S/MIME can be used for end-to-end message encryption, and SMTP STARTTLS or SMTP over Transport Layer Security/Secure Sockets Layer can be used to encrypt communications for a single mail hop between the SMTP client and the SMTP server.
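As a sketch of the last of those options, the standard smtplib module can upgrade an SMTP connection with STARTTLS before authenticating; the server name and credentials below are invented, and this protects only the single hop to that server, not the message end to end.

```python
# Sketch: encrypting a single SMTP hop with STARTTLS (invented server and credentials).
import smtplib
import ssl

context = ssl.create_default_context()

with smtplib.SMTP("mail.example.com", 587) as smtp:
    smtp.starttls(context=context)                   # upgrade the plaintext connection to TLS
    smtp.login("alice@example.com", "app-password")  # the login now travels over the encrypted channel
    # smtp.send_message(...) would follow; the message is still stored in
    # plaintext on the servers unless end-to-end encryption (e.g. PGP) is used.
```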
Additionally, many mail user agents do not protect logins and passwords, making them easy to intercept by an attacker. Encrypted authentication schemes such as SASL prevent this.
Finally, attached files share many of the same hazards as those found in peer-to-peer filesharing. Attached files may contain trojans or viruses.
Tracking of sent mail
The original SMTP mail service provides limited mechanisms for tracking a transmitted message, and none for verifying that it has been delivered or read. It requires that each mail server either deliver the message onward or return a failure notice (bounce message), but both software bugs and system failures can cause messages to be lost. To remedy this, the IETF introduced Delivery Status Notifications (delivery receipts) and Message Disposition Notifications (return receipts); however, these are not universally deployed in production. (A complete message tracking mechanism was also defined, but it never gained traction; see RFCs 3885 through 3888.)
Many ISPs now deliberately disable non-delivery reports (NDRs) and delivery receipts due to the activities of spammers:
- Delivery Reports can be used to verify whether an address exists and so is available to be spammed
- If the spammer uses a forged sender email address (E-mail spoofing), then the innocent email address that was used can be flooded with NDRs from the many invalid email addresses the spammer may have attempted to mail. These NDRs then constitute spam from the ISP to the innocent user
A number of systems allow the sender to see whether a message has been opened. The receiver can also let the sender know that a message has been read, for example by pressing an "Okay" button that causes a check mark to appear on the sender's screen.
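One standardized way of requesting a return receipt of the kind described above is to add a Disposition-Notification-To: header to the outgoing message, as in the sketch below (addresses invented); whether the receiving client honors, ignores, or asks the user about the request is entirely up to it.

```python
# Sketch: requesting a Message Disposition Notification (read receipt).
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Please confirm receipt"
msg["Disposition-Notification-To"] = "alice@example.com"   # where the receipt should be sent
msg.set_content("Bob, could you confirm that you received this?")
```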
US Government
The US Government has been involved in email in several different ways.
Starting in 1977, the US Postal Service (USPS) recognized that electronic mail and electronic transactions posed a significant threat to First Class mail volumes and revenue. Therefore, the USPS initiated an experimental email service known as E-COM. Electronic messages were transmitted to a post office, printed out, and delivered as hard copy. To take advantage of the service, an individual had to transmit at least 200 messages. The delivery time of the messages was the same as First Class mail and cost 26 cents. Both the Postal Regulatory Commission and the Federal Communications Commission opposed E-COM. The FCC concluded that E-COM constituted common carriage under its jurisdiction and the USPS would have to file a tariff. Three years after initiating the service, USPS canceled E-COM and attempted to sell it off.
The early ARPANET dealt with multiple email clients that had various, and at times incompatible, formats. For example, in the Multics system, the "@" sign meant "kill line" and anything after the "@" sign was ignored. The Department of Defense's DARPA desired uniformity and interoperability for email and therefore funded efforts to drive towards unified, interoperable standards. This led to David Crocker, John Vittal, Kenneth Pogran, and Austin Henderson publishing RFC 733, "Standard for the Format of ARPA Network Text Message" (November 21, 1977), which was apparently not effective. In 1979, a meeting was held at BBN to resolve incompatibility issues. Jon Postel recounted the meeting in RFC 808, "Summary of Computer Mail Services Meeting Held at BBN on 10 January 1979" (March 1, 1982), which includes an appendix listing the varying email systems at the time. This, in turn, led to the release of David Crocker's RFC 822, "Standard for the Format of ARPA Internet Text Messages" (August 13, 1982).
The National Science Foundation took over operations of the ARPANET and Internet from the Department of Defense, and initiated NSFNET, a new backbone for the network. A part of the NSFNET acceptable use policy (AUP) forbade commercial traffic. In 1988, Vint Cerf arranged for an interconnection of MCI Mail with NSFNET on an experimental basis. The following year CompuServe email interconnected with NSFNET. Within a few years the commercial traffic restriction was removed from NSFNET's AUP, and NSFNET was privatised.
In the late 1990s, the Federal Trade Commission grew concerned with fraud transpiring in email and initiated a series of procedures on spam, fraud, and phishing. In 2004, FTC jurisdiction over spam was codified into law in the form of the CAN-SPAM Act. Several other US federal agencies have also exercised jurisdiction, including the Department of Justice and the Secret Service.