Abstract
The explosive growth of online social networks in recent years has generated massive data sets on user behavior, social graphs, and content. Given the scale, heterogeneity, and diversity of such big data, sampling is a simple and intuitive way to reduce the size of the data sets used for collecting, measuring, and understanding users, behaviors, and traffic in online social networks. In this paper, we quantify the impact of random sampling on the analysis of online social networks, with Twitter streaming data as a case study. In addition, we design different sampling strategies, including community sampling and stratified sampling, and evaluate their impact on a broad range of behavioral characteristics of online social networks. Our experimental results show that community sampling has the least impact on tweet distributions across users and on the structure of retweeting graphs, while achieving data reductions similar to those of random and stratified sampling.
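The three sampling strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the strategies, so everything here is an assumption made for illustration: the function names, the tweet record layout (`{"user": ..., "text": ...}`), the choice of the user as the stratum, and the precomputed `community_of` mapping are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def random_sample(tweets, rate, rng):
    """Keep each tweet independently with probability `rate`."""
    return [t for t in tweets if rng.random() < rate]


def stratified_sample(tweets, rate, rng, key):
    """Group tweets into strata by key(tweet), then draw the same
    fraction (at least one tweet) from every stratum."""
    strata = defaultdict(list)
    for t in tweets:
        strata[key(t)].append(t)
    sample = []
    for members in strata.values():
        k = max(1, round(rate * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


def community_sample(tweets, rate, rng, community_of):
    """Pick a fraction of whole communities at random and keep every
    tweet posted by a user in a chosen community.
    `community_of` (user -> community id) is assumed to be given,
    e.g. by a community-detection step that is not shown here."""
    communities = sorted({community_of[t["user"]] for t in tweets})
    k = max(1, round(rate * len(communities)))
    chosen = set(rng.sample(communities, k))
    return [t for t in tweets if community_of[t["user"]] in chosen]


# Toy data: 10 users with 10 tweets each, 2 users per community.
rng = random.Random(0)
tweets = [{"user": f"u{i % 10}", "text": f"t{i}"} for i in range(100)]
community_of = {f"u{i}": i // 2 for i in range(10)}

print(len(random_sample(tweets, 0.2, rng)))
print(len(stratified_sample(tweets, 0.2, rng, key=lambda t: t["user"])))
print(len(community_sample(tweets, 0.2, rng, community_of)))
```

In this sketch, community sampling keeps the tweets of whole groups of users together, which is one plausible reason it could preserve per-user tweet distributions and retweet structure better than per-tweet random sampling; whether the paper's method works this way is not stated in the abstract.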
| Original language | English |
|---|---|
| Title of host publication | 2015 IEEE Global Communications Conference, GLOBECOM 2015 |
| Publisher | IEEE |
| ISBN (Print) | 9781479959525 |
| Publication status | Published - Dec 2015 |
| Event | 58th IEEE Global Communications Conference (GLOBECOM 2015) - San Diego, United States; Duration: 6 Dec 2015 → 10 Dec 2015 |
Conference
| Conference | 58th IEEE Global Communications Conference (GLOBECOM 2015) |
|---|---|
| Place | United States |
| City | San Diego |
| Period | 6/12/15 → 10/12/15 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Fingerprint
Dive into the research topics of 'The impact of sampling on big data analysis of social media: A case study on flu and ebola'. Together they form a unique fingerprint.