Exemplar Viral Images.
Exemplar Non-Viral Images.
Virality of online content on social networking websites is an important but esoteric phenomenon often studied in fields like marketing, psychology and data mining. In this paper we study viral images from a computer vision perspective. We introduce three new image datasets from Reddit, and define a virality score using Reddit metadata. We train classifiers with state-of-the-art image features to predict virality of individual images, relative virality in pairs of images, and the dominant topic of a viral image. We also compare machine performance to human performance on these tasks. We find that computers perform poorly with low-level features, and that high-level information is critical for predicting virality. We encode semantic information through relative attributes. We identify the 5 key visual attributes that correlate with virality. We create an attribute-based characterization of images that can predict relative virality with 68.10% accuracy (SVM+Deep Relative Attributes), better than humans at 60.12%. Finally, we study how human prediction of image virality varies with the different 'contexts' in which the images are viewed, such as the influence of neighbouring images, images recently viewed, and the image title or caption. This work is a first step in understanding the complex but important phenomenon of image virality. Our datasets and annotations will be made publicly available.
Full paper (To be presented at CVPR): http://arxiv.org/abs/1503.02318
The Virality Image Dataset consists of ~132K Reddit submissions that are summarized in 10078 images (which include resubmissions). Each image has a unique ID tag and several score metadata fields: virality, popularity (max normalized upvotes), mean normalized upvotes, number of resubmissions, max number of comments, and max raw Reddit score. Images are sorted in ascending order of virality. Download here: Virality Dataset
Disclaimer/Warning: Some images are NSFW (Not Safe For Work) given their explicit and/or sexual content. Viewer discretion is advised. The authors do not hold any opinions on the freely released images, which have been used for research purposes only.
Download Here. Dataset V2 (includes SNAP image IDs): Download Here
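As a rough illustration of how several Reddit submissions of the same image could be collapsed into one per-image record, the sketch below computes the aggregate fields listed in the dataset description. It is not the authors' pipeline: the record keys (`image_id`, `normalized_upvotes`, `num_comments`, `raw_score`) and the function name are hypothetical, the upvote normalization is assumed to have been done already, and the virality score itself is deliberately left out because its definition is given in the paper, not here.

```python
from collections import defaultdict
from statistics import mean


def summarize_submissions(submissions):
    """Group submission records by image and compute per-image aggregates.

    Each record is a dict with hypothetical keys: image_id,
    normalized_upvotes, num_comments, raw_score.
    """
    by_image = defaultdict(list)
    for sub in submissions:
        by_image[sub["image_id"]].append(sub)

    summary = {}
    for image_id, subs in by_image.items():
        norm = [s["normalized_upvotes"] for s in subs]
        summary[image_id] = {
            # Count of all submissions of this image; the dataset's
            # "number of resubmissions" may be counted differently.
            "num_submissions": len(subs),
            # Popularity is described as the max normalized upvotes.
            "popularity": max(norm),
            "mean_normalized_upvotes": mean(norm),
            "max_comments": max(s["num_comments"] for s in subs),
            "max_raw_score": max(s["raw_score"] for s in subs),
        }
    return summary


# Made-up records, for illustration only.
example = [
    {"image_id": "a", "normalized_upvotes": 0.25, "num_comments": 5, "raw_score": 40},
    {"image_id": "a", "normalized_upvotes": 0.75, "num_comments": 12, "raw_score": 310},
    {"image_id": "b", "normalized_upvotes": 0.5, "num_comments": 1, "raw_score": 3},
]
summary = summarize_submissions(example)
```

Here `summary["a"]` combines the two submissions of image `a`, while `summary["b"]` reflects a single submission.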
The bold attribute in each mini-table is the primed attribute. Notice that each primed attribute comes with a sign that indicates how it is primed. The subsequent attributes in each list are the relative attributes that should be added to the image to increase its correlation with virality. The column to the right of the attributes contains the correlation values. A negative correlation means that an attribute (or combination of attributes) makes the image less viral. A zero correlation means that the attribute (or combination) does not contribute to virality one way or the other. A positive correlation means that the attribute (or combination) makes the image more viral. Correlation values range from -1 (worst-case scenario) to +1 (best-case scenario); the higher the correlation, the better, in terms of making the image viral.
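To make the -1 to +1 scale concrete, here is a minimal sketch of a correlation between an attribute score and a virality score across images. This page does not say which correlation coefficient the tables use, so the Pearson coefficient below is an assumption, and the function name and the numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up attribute strengths and virality scores for five images.
attribute_scores = [0.1, 0.4, 0.5, 0.7, 0.9]
virality_scores = [1.0, 2.5, 2.0, 4.0, 5.5]
r = pearson(attribute_scores, virality_scores)

# Sanity checks on the extremes of the scale.
r_up = pearson([1, 2, 3], [2, 4, 6])    # perfectly increasing relation
r_down = pearson([1, 2, 3], [3, 2, 1])  # perfectly decreasing relation
```

A value of `r` near +1 would mean that images scoring higher on the attribute tend to be more viral, and a value near -1 the opposite, which is how the signs in the tables are meant to be read.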
We believe that these tables will be of great use to graphic designers, marketing researchers and everyday internet users who want to increase, a priori, the likelihood that an image submitted to Reddit goes viral, by modifying their image relative to other images that are being submitted or that are 'in vogue' at the moment. The tables are freely available for both commercial and research purposes under the BSD license.
Download Complete Positively Primed Attribute DataSheet
Download Complete Negatively Primed Attribute DataSheet
@inproceedings{deza2015virality,
  Author = {Arturo Deza and Devi Parikh},
  Title = {Understanding Image Virality},
  Year = {2015},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}
}