Jinyoung Han, Daejin Choi, Byung-Gon Chun, Ted "Taekyoung" Kwon, Hyun-chul Kim, Yanghee Choi
ACM SIGMETRICS 2014, Austin, Texas, USA, June 16-20, 2014.

 Pinterest, a popular social curating service where people collect, organize, and share content (pins, in Pinterest's terms), has gained great attention in recent years. Despite this growing interest, little research has examined how people collect, manage, and share pins in Pinterest. In this paper, to shed light on these issues, we study the following questions. How do people collect and manage pins according to their tastes in Pinterest? What factors mainly drive people to share their pins? How do the characteristics of users (e.g., gender, popularity, country) or the properties of pins (e.g., category, topic) affect how pins propagate? To answer these questions, we have conducted a measurement study of pin curating and sharing patterns in Pinterest. By keeping track of all newly posted and shared pins in each category (e.g., animal, kids, women's fashion) from June 5 to July 18, 2013, we built 350 K pin propagation trees for 3 M users. With this dataset, we investigate (1) how users collect and curate pins, (2) how users share their pins and why, and (3) how users are related through shared pins of interest. Our key finding is that pin propagation in Pinterest is driven mostly by a pin's properties, such as its topic, rather than by a user's characteristics, such as her number of followers. We further show that users in the same community of Pinterest's interest graph (i.e., the graph representing relations among users) share pins (i) in the same category with 94% probability and (ii) from the same source URL with 89% probability. Finally, we explore the implications of our findings for predicting how pins are shared in Pinterest.

@inproceedings{Han:2014:COS:2591971.2591996,
 author = {Han, Jinyoung and Choi, Daejin and Chun, Byung-Gon and Kwon, Ted and Kim, Hyun-chul and Choi, Yanghee},
 title = {Collecting, Organizing, and Sharing Pins in Pinterest: Interest-driven or Social-driven?},
 booktitle = {The 2014 ACM International Conference on Measurement and Modeling of Computer Systems},
 series = {SIGMETRICS '14},
 year = {2014},
 isbn = {978-1-4503-2789-3},
 location = {Austin, Texas, USA},
 pages = {15--27},
 numpages = {13},
 url = {http://doi.acm.org/10.1145/2591971.2591996},
 doi = {10.1145/2591971.2591996},
 acmid = {2591996},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {content propagation, online social network, pinterest, repin, social curating},
} 


Measurement Framework

 Since Pinterest does not provide an official API for data collection, we developed our measurement system by crawling Pinterest pages, as shown in the figure below. We fetch Pinterest web pages and extract the relevant information from them; the data about each pin or board can be extracted from its web page. This is challenging because we need to crawl a large number of pages. For example, if a user has 1,000 boards, we need to make 1,000 HTTP requests to collect the data about her boards. To address this problem, we designed a distributed crawling system. Our measurement cluster consists of 25 PCs, which continuously send the HTTP requests assigned by the job scheduler. The HTTP dispatcher processes the HTTP requests and responses according to the tasks explained below.
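The paper does not say how the job scheduler divides requests among the 25 PCs. As one possible policy, the sketch below assumes the scheduler hashes each page URL to choose a crawler, so that a user's 1,000 board pages are spread over the cluster and the same URL always lands on the same machine. The function names and example URLs are hypothetical, and CRC32 hashing is our choice for illustration, not the authors' documented method.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List

NUM_CRAWLERS = 25  # cluster size reported in the text

def assign_crawler(url: str, num_crawlers: int = NUM_CRAWLERS) -> int:
    """Pick a crawler index for a URL with a stable CRC32 hash (hypothetical policy)."""
    return zlib.crc32(url.encode("utf-8")) % num_crawlers

def schedule(urls: Iterable[str], num_crawlers: int = NUM_CRAWLERS) -> Dict[int, List[str]]:
    """Group pending HTTP requests into one work queue per crawler."""
    queues: Dict[int, List[str]] = defaultdict(list)
    for url in urls:
        queues[assign_crawler(url, num_crawlers)].append(url)
    return dict(queues)

if __name__ == "__main__":
    # Hypothetical board URLs for a user with 1,000 boards.
    board_urls = [f"https://www.pinterest.com/example_user/board-{i}/" for i in range(1000)]
    queues = schedule(board_urls)
    for crawler_id in sorted(queues):
        print(crawler_id, len(queues[crawler_id]))
```

Hashing rather than round-robin keeps the assignment stable if the scheduler restarts, which makes it easy to retry a failed request on the same node; a real crawler would also need rate limiting and error handling, which are omitted here.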


  There are two main tasks in our system: the pin task and the user task. Unlike prior measurement studies, we focus on pin propagation patterns. To this end, we monitor all newly posted pins in the menu of each category (e.g., animal, kids, women's fashion) every five minutes. Since Pinterest shows all recent activities in each category menu in chronological order, including posting a pin, repinning, and leaving a comment, our pin seeker fetches the 10 most recent web pages so as not to miss newly posted pins. The pin-tree observer keeps track of each pin and its associated repins to build a pin propagation tree, which we call a pin-tree. When a user repins the original pin, Pinterest provides a link to the board that includes the repin; we find and fetch the web page of that repin among the other pin pages in the board, so that we can follow the chain of the pin-tree. The collected information of each pin-tree is stored in the pin-tree database. For each pin (and repin), we record the number of likes, the number of comments, its category, its source, and its description in the pin database.
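Once the pin-tree observer has resolved, for every repin, the pin it was repinned from, assembling a pin-tree is a parent-pointer construction. The sketch below assumes each crawled record has already been reduced to a hypothetical (pin_id, parent_id) pair, with parent_id set to None for the original pin, and computes two basic properties of the tree: its number of pins and its depth.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # hypothetical (pin_id, parent_id) pair

def build_pin_tree(records: List[Record]) -> Tuple[str, Dict[str, List[str]]]:
    """Return the original pin and a parent -> children map for one pin-tree."""
    root: Optional[str] = None
    children: Dict[str, List[str]] = defaultdict(list)
    for pin_id, parent_id in records:
        if parent_id is None:
            if root is not None:
                raise ValueError("a pin-tree has exactly one original pin")
            root = pin_id
        else:
            children[parent_id].append(pin_id)
    if root is None:
        raise ValueError("no original pin found")
    return root, dict(children)

def tree_stats(root: str, children: Dict[str, List[str]]) -> Tuple[int, int]:
    """Breadth-first traversal returning (number of pins, depth); the original pin has depth 0."""
    size, depth = 0, 0
    queue = deque([(root, 0)])
    while queue:
        pin_id, level = queue.popleft()
        size += 1
        depth = max(depth, level)
        for child in children.get(pin_id, []):
            queue.append((child, level + 1))
    return size, depth
```

For example, an original pin p0 repinned directly by two users (r1 and r2), with r1 repinned once more (r3), yields a tree of four pins with depth 2. Repins whose parent was never crawled are not reachable from the root and are simply left out of the traversal.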

  In the user task, we collect information about each user (e.g., number of pins, number of followers, number of boards, gender, country). In addition to the roughly 1.56 M users found in pin-trees, the user seeker finds roughly 1.41 M more users using a breadth-first search (BFS) in Pinterest. For the resulting 3 M users, we collect each user's name, description, gender, number of followers, number of followings, number of boards, number of pins, number of likes, external website, location, and Facebook/Twitter links, which are stored in the user profile database. Along with the user profiles, the board collector collects the information of each board, including its category and number of pins, which is stored in the board database. To identify the gender and country of users, we use the external links to Facebook and Twitter found on users' profile pages. The Facebook/Twitter collector sends queries to Facebook and Twitter through their APIs and fetches the gender and country of each user, if available. We then determine the gender and country of each user by combining the information from Pinterest, Facebook, and Twitter.
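The text states only that the user seeker runs a breadth-first search in Pinterest. The sketch below assumes the search starts from the users found in pin-trees and expands along follow relations, which are fetched through a hypothetical get_neighbors callback standing in for crawling a user's follower and following pages; the max_new parameter, also our assumption, caps how many additional users are collected.

```python
from collections import deque
from typing import Callable, Iterable, List

def bfs_discover(seeds: Iterable[str],
                 get_neighbors: Callable[[str], List[str]],
                 max_new: int) -> List[str]:
    """Breadth-first search from seed users; return up to max_new users not among the seeds."""
    seeds = list(seeds)
    seen = set(seeds)          # every user already known, so each profile is visited once
    queue = deque(seeds)
    discovered: List[str] = []
    while queue and len(discovered) < max_new:
        user = queue.popleft()
        for neighbor in get_neighbors(user):  # hypothetical: followers and followings
            if neighbor not in seen:
                seen.add(neighbor)
                discovered.append(neighbor)
                queue.append(neighbor)
                if len(discovered) >= max_new:
                    break
    return discovered
```

Replacing get_neighbors with a function that fetches and parses a profile page would turn this into a crawler; the seen set ensures that each user is requested at most once, even when many users follow each other.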



Dataset

 Our dataset was collected from June 5 to July 18, 2013. We kept track of 346,329 pin-trees, which contain 346,329 original pins and their 1,215,045 repins, shared by 1,561,374 users. In addition to the users found in pin-trees, we discovered 1,412,754 more users using a breadth-first search (BFS) through Pinterest. In total, our dataset includes 2,974,128 users (i.e., 1,561,374 users found in pin-trees + 1,412,754 users discovered through BFS). Collectively, the dataset contains 40,800,940 boards, 3,362,100,884 pins, 656,123,740 followers, 302,363,300 followings, 1,392,394 Facebook links, and 183,900 Twitter links. We also obtained the country information of 1,354,132 users and the gender information of 1,392,394 users.
* The data is available only on the condition that your work cites the paper listed above.
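The reported totals are internally consistent, and a few averages follow directly from them. The short computation below checks the user total and derives per-tree and per-user averages; these averages are simple divisions of the counts listed above, added here for illustration, not statistics reported in the paper.

```python
# Counts copied from the dataset description above.
PIN_TREES = 346_329
REPINS = 1_215_045
PIN_TREE_USERS = 1_561_374
BFS_USERS = 1_412_754
REPORTED_TOTAL_USERS = 2_974_128
BOARDS = 40_800_940

total_users = PIN_TREE_USERS + BFS_USERS
assert total_users == REPORTED_TOTAL_USERS  # the two user sources add up exactly

# Each pin-tree holds one original pin plus its repins.
avg_repins_per_tree = REPINS / PIN_TREES
avg_pins_per_tree = (PIN_TREES + REPINS) / PIN_TREES
avg_boards_per_user = BOARDS / total_users

print(f"users: {total_users:,}")
print(f"repins per pin-tree: {avg_repins_per_tree:.2f}")
print(f"pins per pin-tree (incl. original): {avg_pins_per_tree:.2f}")
print(f"boards per user: {avg_boards_per_user:.1f}")
```

This prints a user total of 2,974,128, about 3.51 repins (4.51 pins including the original) per pin-tree, and about 13.7 boards per user.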


Links

  1. Seoul National University
  2. Network Convergence & Security Laboratory


Contact