Tuesday, February 12, 2008
ALL ABOUT TORRENTS AND FILE SHARING METHODS
BitTorrent is a peer-to-peer (P2P) file sharing communications protocol. It is a method of distributing large amounts of data widely without the original distributor incurring the entire costs of hardware, hosting and bandwidth resources. Instead, when data is distributed using the BitTorrent protocol, each recipient supplies pieces of the data to newer recipients, reducing the cost and burden on any given individual source, providing redundancy against system problems, and reducing dependence on the original distributor. The protocol is the brainchild of programmer Bram Cohen, who designed it in April 2001 and released a first implementation on 2 July 2001. It is now maintained by Cohen's company BitTorrent, Inc. Usage of the protocol accounts for significant traffic on the Internet, but the precise amount has proven difficult to measure. There are numerous compatible BitTorrent clients, written in a variety of programming languages, and running on a variety of computing platforms.

Creating and publishing torrents
The peer distributing a data file treats the file as a number of identically-sized pieces, typically between 64 kB and 1 MB each. A piece size greater than 512 kB will reduce the size of a torrent file for a very large payload, but is claimed to reduce the efficiency of the protocol. The peer creates a checksum for each piece, using the SHA-1 hashing algorithm, and records it in the torrent file. When another peer later receives that piece, the checksum of the piece is compared to the recorded checksum to test that the piece is error-free. Peers that provide a complete file are called seeders, and the peer providing the initial copy is called the initial seeder. The exact information contained in the torrent file depends on the version of the BitTorrent protocol. By convention, the name of a torrent file has the suffix .torrent.
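The per-piece checksum scheme described above can be sketched in a few lines of Python. This is only an illustration, not code from any actual client: the function names and the 256 kB piece size are assumptions chosen for the example, and only the standard hashlib module is used.

```python
import hashlib

# Assumed piece size for this sketch; the text cites 64 kB to 1 MB as typical.
PIECE_LENGTH = 256 * 1024


def split_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Cut the payload into equal-sized pieces (the last one may be shorter)."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Return the 20-byte SHA-1 digest of each piece, in order."""
    return [hashlib.sha1(p).digest() for p in split_pieces(data, piece_length)]


def verify_piece(piece: bytes, expected_digest: bytes) -> bool:
    """Check a received piece against the digest recorded by the publisher."""
    return hashlib.sha1(piece).digest() == expected_digest


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096          # 1 MiB of sample data
    pieces = split_pieces(payload)
    hashes = piece_hashes(payload)
    print(len(hashes), "pieces")
    print("intact piece accepted:", verify_piece(pieces[0], hashes[0]))
    damaged = pieces[0][:-1] + b"\x00"          # flip the last byte
    print("damaged piece accepted:", verify_piece(damaged, hashes[0]))
```

A downloader would run something like verify_piece on every piece it receives and discard any piece whose digest does not match.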
Torrent files have an "announce" section, which specifies the URL of the tracker, and an "info" section, containing (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, all of which is used by clients to verify the integrity of the data they receive. Completed torrent files are typically published on websites or elsewhere, and registered with a tracker. The tracker maintains lists of the clients currently participating in the torrent. Alternatively, in a trackerless system (decentralized tracking) every peer acts as a tracker. This is implemented by the BitTorrent, µTorrent, BitComet, KTorrent and Deluge clients through the distributed hash table (DHT) method. Azureus also supports a trackerless method that is incompatible (as of April 2007) with the DHT offered by all other supporting clients. In November 2006, BitTorrent Inc. introduced its "Publish Torrent" service, which creates and hosts a torrent file (seeded from an existing web-hosted media file) and tracks the downloads. The service (http://www.bittorrent.com/publish) requires a client that supports web-seeding (currently the official client, Azureus, µTorrent and anything based on Libtorrent).

Downloading torrents and sharing files
Users browse the web to find a torrent of interest, download it, and open it with a BitTorrent client. The client connects to the tracker(s) specified in the torrent file, from which it receives a list of peers currently transferring pieces of the file(s) specified in the torrent. The client connects to those peers to obtain the various pieces. Such a group of peers connected to each other to share a torrent is called a swarm. If the swarm contains only the initial seeder, the client connects directly to it and begins to request pieces. As peers enter the swarm, they begin to trade pieces with one another, instead of downloading directly from the seeder.
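The torrent file that a client opens, with its "announce" and "info" sections as described earlier in this section, can be sketched as a nested dictionary. The text does not say how the file is serialized; the sketch assumes the "bencoding" format used by the original protocol and a single-file torrent. The helper names, the tracker URL and the file name are hypothetical.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts in bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_metainfo(name: str, data: bytes, piece_length: int, tracker_url: str) -> dict:
    """Build the 'announce' and 'info' sections for a single-file torrent."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "announce": tracker_url,            # where clients ask for peers
        "info": {
            "name": name,                   # suggested file name
            "length": len(data),            # payload size in bytes
            "piece length": piece_length,
            "pieces": pieces,               # concatenated 20-byte SHA-1 digests
        },
    }


if __name__ == "__main__":
    meta = build_metainfo(
        "demo.bin", b"x" * 100, 64,
        "http://tracker.example.org/announce",  # hypothetical tracker
    )
    print(bencode(meta)[:40])
```

Writing the output of bencode(meta) to a file named demo.bin.torrent would give a file with the layout described in the text; real clients add further optional fields that this sketch leaves out.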
Clients incorporate mechanisms to optimize their download and upload rates; for example they download pieces in a random order to increase the opportunity to exchange data, which is only possible if two peers have different pieces of the file. The effectiveness of this data exchange depends largely on the policies that clients use to determine to whom to send data. Clients may prefer to send data to peers that send data back to them (a tit for tat scheme), which encourages fair trading. But strict policies often result in suboptimal situations; e.g., when newly joined peers are unable to receive any data because they don't have any pieces yet to trade themselves or when two peers with a good connection between them do not exchange data simply because neither of them wants to take the initiative. To counter these effects, the official BitTorrent client program uses a mechanism called “optimistic unchoking,” where the client reserves a portion of its available bandwidth for sending pieces to random peers (not necessarily known-good partners, so-called preferred peers), in hopes of discovering even better partners and to ensure that newcomers get a chance to join the swarm.

Adoption
A growing number of individuals and organizations are using BitTorrent to distribute their own or licensed material. Independent adopters report that without using BitTorrent technology, and its dramatically reduced demands on networking hardware and bandwidth, they could not afford to distribute their files.

Film, video and music
BitTorrent Inc. has amassed a number of licenses from Hollywood studios for distributing popular content at the company's website. Sub Pop Records releases tracks and videos via BitTorrent Inc. to distribute its 1000+ albums. The band Ween uses the website Browntracker.net to distribute free audio and video recordings of live shows.
Furthermore, Babyshambles and The Libertines (both bands associated with Pete Doherty) have extensively used torrents to distribute hundreds of demos and live videos. The creator of the BitTorrent protocol, Bram Cohen, at one time worked for Valve Software. Valve uses the BitTorrent protocol in their Steam media streaming frontend. Podcasting software is starting to integrate BitTorrent to help podcasters deal with the download demands of their MP3 "radio" programs. Specifically, Juice and Miro (formerly known as Democracy Player) support automatic processing of .torrent files from RSS feeds. Similarly, some BitTorrent clients, such as µTorrent, are able to process web feeds and automatically download content found within them.

Personal material
The Amazon S3 "Simple Storage Service" is a scalable Internet-based storage service with a simple web service interface, equipped with built-in BitTorrent support. Blog Torrent offers a simplified BitTorrent tracker to enable bloggers and non-technical users to host a tracker on their site. Blog Torrent also allows visitors to download a "stub" loader, which acts as a BitTorrent client to download the desired file, allowing users without BitTorrent software to use the protocol. This is similar to the concept of a self-extracting archive.

Software
Many major open source and free software projects encourage BitTorrent as well as conventional downloads of their products to increase availability and reduce load on their own servers.

Games
Blizzard's World of Warcraft video game utilizes the BitTorrent protocol to send game updates to clients. The game GunZ The Duel has a built-in BitTorrent client.

Network impact
CableLabs, the research organization of the North American cable industry, estimates that BitTorrent represents 18% of all broadband traffic. In 2004, CacheLogic put that number at roughly 35% of all traffic on the Internet.
The discrepancies in these numbers are caused by differences in the methodology used to measure P2P traffic on the Internet. Routers that use Network Address Translation (NAT) must maintain tables of source and destination IP addresses and ports. Typical home routers are limited to about 2000 table entries, while some more expensive routers have larger table capacities. BitTorrent frequently contacts 300-500 servers per second, rapidly filling the NAT tables. This is a common cause of home routers locking up.

Indexing
The BitTorrent protocol provides no way to index torrent files. As a result, a comparatively small number of websites have hosted the large majority of torrents linking to (possibly) copyrighted material, rendering those sites especially vulnerable to lawsuits. Several types of websites support the discovery and distribution of data on the BitTorrent network. Public tracker sites such as The Pirate Bay allow users to search in and download from their collection of .torrent files; they also run BitTorrent trackers for those files. Users can typically also upload .torrent files for content they wish to distribute. Private tracker sites such as Demonoid operate like public ones except that they restrict access to registered users and keep track of the amount of data each user uploads and downloads, in an attempt to reduce leeching. There are specialized tracker sites such as FlixFlux for films, bitme for educational content, PureTnA for pornographic content, and tv torrents for television series. Often these will also be private. Search engines allow the discovery of .torrent files that are hosted and tracked on other sites; examples include Mininova, Btjunkie, TorrentSpy and isoHunt. These sites allow the user to ask for content meeting specific criteria (such as containing a given word or phrase) and retrieve a list of links to .torrent files matching those criteria. This list is often sorted with respect to relevance or number of seeders.
Bram Cohen launched a BitTorrent search engine on http://search.bittorrent.com that commingles licensed content with search results. Metasearch engines allow users to search several BitTorrent indices and search engines at once.

Legal issues
There has been much controversy over the use of BitTorrent trackers. Strictly speaking, BitTorrent metafiles do not store copyrighted data, hence the technology itself does not constitute copyright infringement. Technically, the use of BitTorrent is not illegal. Various jurisdictions have pursued legal action against websites that host BitTorrent trackers. High-profile examples include the closing of Suprnova.org, LokiTorrent, Demonoid, OiNK.cd and EliteTorrents.org. The Pirate Bay torrent website, formed by a Swedish anti-copyright group, is notorious for the "legal" section of its website in which letters and replies on the subject of alleged copyright infringements are publicly displayed. On May 31, 2006, The Pirate Bay's servers in Sweden were raided by Swedish police on allegations by the MPAA of copyright infringement; however, the tracker was up and running again three days later. HBO, in an effort to combat the distribution of its programming on BitTorrent networks, has sent cease and desist letters to the Internet Service Providers of BitTorrent users. Many users have reported receiving letters from their ISPs that threatened to cut off their internet service if the alleged infringement continued. HBO, unlike the RIAA, has not been reported to have filed suit against anyone for sharing files as of April 2007. In 2005 HBO began "poisoning" torrents of its show Rome, by providing bad chunks of data to clients. On November 23, 2005, the movie industry and BitTorrent Inc. CEO Bram Cohen signed a deal they hoped would reduce the number of unlicensed copies available through bittorrent.com's search engine, run by BitTorrent, Inc.
It meant BitTorrent.com had to remove any links to unlicensed copies of films made by seven of Hollywood's major movie studios. There are two major differences between BitTorrent and many other peer-to-peer file-trading systems, which advocates suggest make it less useful to those sharing copyrighted material without authorization. First, BitTorrent itself does not offer a search facility to find files by name. A user must find the initial torrent file by other means, such as a web search. Second, BitTorrent makes no attempt to conceal the host ultimately responsible for facilitating the sharing: a person who wishes to make a file available must run a tracker on a specific host or hosts and distribute the tracker address(es) in the .torrent file. Although a tracker can be operated on a server located in a jurisdiction where the copyright holder cannot take legal action, this reliance on an identifiable host gives the protocol a vulnerability that other protocols lack: it is far easier to request that the server's ISP shut down the site than it is to find and identify every user sharing a file on a peer-to-peer network. However, with the use of a distributed hash table (DHT), trackers are no longer required, though they are often still used so that client software that does not support DHT can connect to the swarm.

Limitations and security vulnerabilities
BitTorrent does not offer its users anonymity. It is possible to obtain the IP addresses of all current, and possibly previous, participants in a swarm from the tracker. This may expose users with insecure systems to attacks. Another drawback is that BitTorrent file sharers, compared to users of client/server technology, often have little incentive to become seeders after they finish downloading. The result of this is that torrent swarms gradually die out, meaning a lower possibility of obtaining older torrents.
Some BitTorrent websites have attempted to address this by recording each user's download and upload ratio, visible either to everyone or only to that user, and by giving people with better ratios access to newer torrent files. Also, users who have low upload ratios may see slower download speeds until they upload more. This prevents (statistical) leeching, since after a while such users become unable to download much faster than 1-10 kB/s on a high-speed connection. Some trackers exempt dial-up users from this policy, because they cannot upload faster than 1-3 kB/s. BitTorrent is best suited to continuously connected broadband environments, since dial-up users find it less efficient due to frequent disconnects and slow download rates.

Technologies built on BitTorrent
The BitTorrent protocol is still under development and therefore may still acquire new features and other enhancements such as improved efficiency.

Distributed trackers
In May 2005, BitTorrent, Inc. released a new beta version of BitTorrent that eliminated the need for web site hosting of centralized servers known as "trackers." It is now possible to have a torrent up in minutes, with a file, a website, and no understanding of how it works. Cohen explained that the "trackerless" feature is part of his ongoing effort to make publishing files online "painless and disruptively cheap". The move is only one of several designed to remove BitTorrent's dependence on centralized trackers. In June 2005, software version 4.2.0 was released with support for "trackerless" torrents, featuring a DHT implementation that allows the client to download torrents that have been created without using a BitTorrent tracker. Mainline DHT: the BitTorrent client (5.0.7), µTorrent (1.7.5), BitComet (0.96) and BitSpirit (3.0+) all share a DHT, based on an implementation of Kademlia, for trackerless torrents. This change is said to cause some trouble in the legal efforts to shut down illegal file sharing.
However, Tarun Sawney, BSA Asia anti-copyright infringement director, said BitTorrent files could still be identified, since with or without the tracker sites, actual users still host the infringing files. Another interesting idea that has surfaced recently in Azureus is the virtual torrent. This idea is based on the distributed tracker approach and is used to describe some web resource. Right now, it is used for instant messaging. It is implemented using a special messaging protocol and requires an appropriate plugin. Anatomic P2P is another approach, which uses a decentralized network of nodes that route traffic to dynamic trackers. Peer exchange is another method to gather peers for BitTorrent in addition to trackers and DHT. Peer exchange checks with known peers to see if they know of any other peers.

Content delivery
Web seeding was implemented in 2006. The advantage of this feature is that a site may distribute a torrent for a particular file or batch of files and make those files available for download from that same web server; this can simplify seeding and load balancing greatly once support for this feature is implemented in the various BitTorrent clients. In theory, this would make using BitTorrent almost as easy for a web publisher as simply creating a direct download while allowing some of the upload bandwidth demands to be placed upon the downloaders (who normally use only a very small portion of their upload bandwidth capacity). This feature was created by John "TheSHAD0W" Hoffman, who created BitTornado. From version 5.0 onward the Mainline BitTorrent client also supports web seeds and the BitTorrent web site has a simple publishing tool that creates web seeded torrents. µTorrent added support for web seeds in version 1.7. The latest version of the popular download manager GetRight supports downloading a file over HTTP and FTP as well as over BitTorrent.
Broadcatching combines RSS with the BitTorrent protocol to create a content delivery system, further simplifying and automating content distribution. Steve Gillmor explained the concept in a column for Ziff-Davis in December 2003. The discussion spread quickly among bloggers (Techdirt, Ernest Miller, Chris Pirillo, etc.). In an article entitled "Broadcatching with BitTorrent", Scott Raymond explained: "I want RSS feeds of BitTorrent files. A script would periodically check the feed for new items, and use them to start the download. Then, I could find a trusted publisher of an Alias RSS feed, and 'subscribe' to all new episodes of the show, which would then start downloading automatically, like the 'season pass' feature of the TiVo." The RSS feed will track the content, while BitTorrent ensures content integrity with cryptographic hashing of all data, so subscribers to a feed receive uncorrupted content. An early implementor of this approach is the IPTV show mariposaHD, which uses BitTorrent to distribute large (2-4 GB) WMVHD files of high-definition video. One of the first software clients (free and open source) for broadcatching is Miro. Other free software clients such as PenguinTV and KatchTV are also now supporting broadcatching. The BitTorrent web-service MoveDigital has the ability to make torrents available to any web application capable of parsing XML through its standard Representational State Transfer (REST) based interface. Additionally, Torrenthut is developing a similar torrent API that will provide the same features, as well as further intuition to help bring the torrent community to Web 2.0 standards. Alongside this release is PEP, a first PHP application built using the API, which will parse any Really Simple Syndication (RSS 2.0) feed and automatically create and seed a torrent for each enclosure found in that feed.
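The broadcatching idea quoted above, a script that periodically checks a feed for new items and uses them to start downloads, can be sketched as follows. This is a rough illustration under stated assumptions, not the behaviour of Miro or any other client named here: the function names, the sample feed and the example.org URLs are hypothetical, and a real script would fetch the feed over HTTP and hand each URL to a BitTorrent client.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed with one torrent enclosure and one plain MP3.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.org/ep1.torrent" length="1234"
                 type="application/x-bittorrent"/>
    </item>
    <item>
      <title>Trailer</title>
      <enclosure url="http://example.org/trailer.mp3" length="99"
                 type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def torrent_enclosures(feed_xml: str) -> list[str]:
    """Return the URLs of all enclosures in the feed that point at torrents."""
    root = ET.fromstring(feed_xml)
    urls = []
    for enclosure in root.iterfind("./channel/item/enclosure"):
        url = enclosure.get("url", "")
        is_torrent = (
            enclosure.get("type") == "application/x-bittorrent"
            or url.endswith(".torrent")
        )
        if url and is_torrent:
            urls.append(url)
    return urls


def new_items(feed_xml: str, seen: set[str]) -> list[str]:
    """Return torrent URLs not seen on earlier checks, and remember them."""
    fresh = [u for u in torrent_enclosures(feed_xml) if u not in seen]
    seen.update(fresh)
    return fresh


if __name__ == "__main__":
    seen: set[str] = set()
    print(new_items(SAMPLE_FEED, seen))   # first check: the torrent is new
    print(new_items(SAMPLE_FEED, seen))   # second check: nothing new
```

Keeping the set of already-seen URLs between checks is what turns a one-off parse into a "subscription": only items that appear after the previous check trigger a download.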

Encryption
Some ISPs throttle BitTorrent traffic of their customers because it makes up a large proportion of total traffic and the ISPs don't want to spend money purchasing extra capacity. Protocol header encryption (PHE) and Message stream encryption/Protocol encryption (MSE/PE) are features of some BitTorrent clients that attempt to make BitTorrent hard to detect and throttle. At the moment Azureus, BitComet, KTorrent, Transmission, Deluge, µTorrent, rtorrent and the latest official BitTorrent client (v6) support MSE/PE encryption. In September 2006 it was reported that some software could detect and throttle BitTorrent traffic masquerading as HTTP traffic. Reports in August 2007 indicated that Comcast was preventing BitTorrent seeding by monitoring and interfering with the communication between peers. Protection against these efforts is provided by proxying the client-tracker traffic through the Tor anonymity network or via an encrypted tunnel to a point outside of the Comcast network. Although encryption can make it difficult to determine what is being shared, BitTorrent is generally vulnerable to traffic analysis. Thus even with MSE/PE, it may be possible for an ISP to recognize BitTorrent and also to determine that a system is no longer downloading, only uploading, information and terminate its connection by injecting TCP RST (reset flag) packets.

Multitracker
Another unofficial feature is an extension to the BitTorrent metadata format proposed by John Hoffman and implemented by several indexing websites. It allows the use of multiple trackers per file, so if one tracker fails, others can continue supporting file transfer. It is implemented in several clients, such as BitComet, BitTornado, KTorrent and µTorrent. Trackers are placed in groups, or tiers, with a tracker randomly chosen from the top tier and tried, moving to the next tier if all the trackers in the top tier fail.
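The tier fallback just described can be sketched as below. This is one plausible reading of the rule, assuming that trackers within a tier are tried in random order until one answers; the function name pick_tracker, the announce callback and the example URLs are hypothetical, and no real network request is made.

```python
import random
from typing import Callable, Optional, Sequence


def pick_tracker(
    tiers: Sequence[Sequence[str]],
    announce: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the first tracker that answers, trying tiers from the top down.

    `announce` stands in for a real tracker request and returns True on success.
    """
    rng = rng or random.Random()
    for tier in tiers:
        candidates = list(tier)
        rng.shuffle(candidates)        # random order within the tier
        for url in candidates:
            if announce(url):
                return url
        # Every tracker in this tier failed: fall through to the next tier.
    return None


if __name__ == "__main__":
    tiers = [
        ["http://a.example/announce", "http://b.example/announce"],
        ["http://backup.example/announce"],
    ]
    # Pretend that only the backup tracker is reachable.
    alive = {"http://backup.example/announce"}
    print(pick_tracker(tiers, lambda url: url in alive))
```

Because a lower tier is consulted only after the whole tier above it has failed, a publisher can list preferred trackers first and keep the others as fallbacks.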
Torrents with multiple trackers can decrease the time it takes to download a file, but they also have a few consequences: users have to contact more trackers, leading to more overhead traffic, and torrents from closed trackers suddenly become downloadable by non-members, as they can connect to a seed via an open tracker.

Implementations
Because of the open nature of the protocol, many clients have been developed that support numerous platforms and are written in various programming languages. The official client is also named BitTorrent. Some clients, like Torrentflux, can be run straight from a server, allowing hosting companies to offer speeds unavailable to most users. Sites such as Torrent2FTP offer services to download torrents and then make them available to the customer on an FTP server. Opera Software now incorporates BitTorrent downloads through its popular browser software, as does Wyzo. An increasing number of hardware devices are being made to support BitTorrent. These include routers and NAS devices: ADS NAS BYOD NAS; Asus WL-500gP WiFi router; Asus WL-700gE WiFi router; Coolmax CN-570 BYOD NAS; Freecom (Freecom Storage Gateway / DataTank Gateway / Network Drive Pro); QNAP TS-101 (uses myBittorrent's search engine); Synology (a number of their products); Thecus YES Box N2100 BYOD NAS; as well as anything capable of running OpenWrt (routers) or Openslug (NAS), such as the NSLU2.

Development
An unofficial feature that is as yet (2 February 2008) unimplemented is Similarity Enhanced Transfer (SET), a technique for improving the speed at which peer-to-peer file sharing and content distribution systems can share data. SET, proposed by researchers Pucha, Andersen, and Kaminsky, works by spotting chunks of identical data in files that are an exact or near match to the one needed and transferring these data to the client if the 'exact' data are not present.
Their experiments suggested that SET will help greatly with less popular files, but not as much for popular data, where many peers are already downloading it. Andersen believes that this technique could be immediately used by developers with the BitTorrent file sharing system.
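The core idea behind SET, finding chunks of identical data in other files, can be illustrated with a toy example. This is not the researchers' algorithm: it assumes fixed-size chunks compared by SHA-1 digest, and the function names and the 16 kB chunk size are made up for the sketch.

```python
import hashlib

CHUNK_SIZE = 16 * 1024  # assumed chunk size, chosen only for this sketch


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Return the SHA-1 digest of each fixed-size chunk of the data."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def find_alternate_sources(
    wanted_digests: list[bytes],
    similar_files: dict[str, bytes],
    chunk_size: int = CHUNK_SIZE,
) -> dict[int, list[tuple[str, int]]]:
    """Map each wanted chunk index to (file name, chunk index) pairs holding
    identical data in other files."""
    index: dict[bytes, list[tuple[str, int]]] = {}
    for name, data in similar_files.items():
        for i, digest in enumerate(chunk_digests(data, chunk_size)):
            index.setdefault(digest, []).append((name, i))
    return {
        i: index[digest]
        for i, digest in enumerate(wanted_digests)
        if digest in index
    }


if __name__ == "__main__":
    wanted = b"A" * 4 + b"B" * 4 + b"C" * 4       # the file being downloaded
    others = {"near_match": b"Z" * 4 + b"B" * 4}  # a similar file in another swarm
    print(find_alternate_sources(chunk_digests(wanted, 4), others, 4))
```

In this toy run the middle chunk of the wanted file also exists in the near-match file, so peers holding only that other file could still supply it.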