Jinyoung Han, Seungbae Kim, Taejoong Chung, Ted "Taekyoung" Kwon, Hyun-chul Kim, Yanghee Choi
ACM SIGMETRICS 2012, London, UK, June 11-15, 2012.

We conduct comprehensive measurements on the current practice of content bundling to understand the structural patterns of torrents and the participant behaviors of swarms on one of the largest BitTorrent portals: The Pirate Bay (TPB). Using datasets of 120 K torrents and 14.8 M peers, we investigate what constitutes torrents and how users participate in swarms from the perspective of bundling, across different content categories: Movie, TV, Porn, Music, Application, Game and E-book. In particular, we focus on: (1) how prevalent content bundling is, (2) how and what files are bundled into torrents, (3) what motivates publishers to bundle files, and (4) how peers access the bundled files. We find that over 72% of BitTorrent torrents contain multiple files, which indicates that bundling is widely used for file sharing. We reveal that profit-driven BitTorrent publishers, who promote their own web sites for financial gain (e.g., through advertising), tend to prefer bundling. We also observe that most files (94%) in a bundle torrent are selected by users, and that bundle torrents are on average more popular than single (non-bundle) ones. Overall, there are notable differences in the structural patterns of torrents and swarm characteristics (i) across different content categories and (ii) between single and bundle torrents.
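As an illustration of the single versus bundle distinction used above, the following is a minimal sketch of how a ".torrent" file could be classified by its number of files. It relies on the standard BitTorrent metainfo layout, where a multi-file torrent carries a `files` list inside its `info` dictionary and a single-file torrent carries a `length` entry instead. The function names `bdecode`, `file_count`, and `is_bundle` are hypothetical and are not taken from the released code.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i.

    Returns a (value, next_offset) pair.
    """
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode at offset {i}")


def file_count(torrent_bytes: bytes) -> int:
    """Number of files described by a .torrent metainfo blob."""
    meta, _ = bdecode(torrent_bytes)
    info = meta[b"info"]
    if b"files" in info:  # multi-file mode
        return len(info[b"files"])
    return 1  # single-file mode: only "length" and "name" are present


def is_bundle(torrent_bytes: bytes) -> bool:
    """Assumption for this sketch: a bundle is any torrent with more than one file."""
    return file_count(torrent_bytes) > 1
```

In practice the bytes would come from a crawled file, for example `is_bundle(open(path, "rb").read())`, where `path` is whatever local location the ".torrent" file was saved to.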

Measurement Framework

Developed Codes

  • Torrent Crawling Agent (download)
    • The crawling agent promptly fetches newly released ".torrent" files via the RSS feed from TPB. (The RSS feeds allow us to fetch all published torrents.)

  • Swarm Monitoring Agent (download)
    • The monitoring agent, a modified version of the Azureus client software, keeps track of each swarm.
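The RSS-based crawling step described above can be sketched roughly as follows. This is not the released crawling agent: the feed layout (an `<item>` element whose torrent URL sits in an `<enclosure url="...">` attribute or a `<link>` element) and the helper names `extract_torrent_links` and `new_links` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def extract_torrent_links(rss_xml: str) -> List[str]:
    """Collect ".torrent" URLs from an RSS 2.0 document.

    Assumed layout: each <item> exposes the URL either as
    <enclosure url="..."> or as the text of <link>.
    """
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            url = enclosure.get("url")
        else:
            url = item.findtext("link")
        if url and url.strip().endswith(".torrent"):
            links.append(url.strip())
    return links


def new_links(rss_xml: str, seen: Set[str]) -> List[str]:
    """Return only URLs not seen in earlier polls, and remember them."""
    fresh = [u for u in extract_torrent_links(rss_xml) if u not in seen]
    seen.update(fresh)
    return fresh
```

A crawler built this way would poll the feed repeatedly, pass each response body to `new_links` with the same `seen` set, and download only the URLs it returns, so that each newly released torrent is fetched once.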


  • Torrent Dataset

  • Swarm Dataset
    • For the torrents discovered between March 25 and April 26, we periodically (once every two hours) captured swarm snapshots to investigate the access patterns of peers participating in the swarms. We restrict swarm dataset collection and analysis to these torrents because of the performance limitations of our monitoring facilities, which consist of 14 research-grade desktop PCs.
    • [Swarm data (11.48GB)]

  • Dataset Description
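The two-hour snapshot schedule described for the swarm dataset can be sketched as below. This is only an illustration under stated assumptions: `snapshot_times` and `swarm_churn` are hypothetical helpers, and comparing consecutive snapshots by peer identifier is one possible way to study peer participation, not necessarily the analysis used in the paper.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Iterator, Set


def snapshot_times(start: datetime, end: datetime,
                   interval: timedelta = timedelta(hours=2)) -> Iterator[datetime]:
    """Yield the capture times from start to end (inclusive), one per interval."""
    t = start
    while t <= end:
        yield t
        t += interval


def swarm_churn(prev_peers: Iterable[str],
                curr_peers: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare two consecutive snapshots of one swarm.

    Peers are identified by an opaque string (for example "ip:port");
    that choice of identifier is an assumption of this sketch.
    """
    prev, curr = set(prev_peers), set(curr_peers)
    return {
        "joined": curr - prev,   # present now, absent before
        "left": prev - curr,     # present before, absent now
        "stayed": prev & curr,   # present in both snapshots
    }
```

With a two-hour interval, one day of monitoring yields 12 snapshots per swarm (13 if both the start and end instants are included), which gives a sense of how the snapshot volume grows with the number of monitored torrents.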


