TY - JOUR
T1 - A random digit search (RDS) method for sampling of blogs and other user-generated content
AU - Zhu, Jonathan J.H.
AU - Mo, Qian
AU - Wang, Fang
AU - Lu, Heng
PY - 2011/8
Y1 - 2011/8
N2 - Blogs are arguably the most popular genre of user-generated content (UGC), which makes blogs a gold mine for social science research. However, existing research on blogs has suffered from nonprobability samples collected either manually or by computerized crawling based on the random walks method. The current article presents a probability sampling method for blogs, called random digit search (RDS), that is modified from the popular "random digit dialing" (RDD) method used in telephone surveys. The RDS method was tested in a study of Sina Blog, a popular blog service provider (BSP) in China. The results show that, while "random walks" sampling tends to oversample popular/active blogs, probability samples generated by RDS yield consistent and precise estimates of population parameters. Although RDS takes advantage of the numeric identification (ID) system used on Sina Blog, the general principles may be applicable to other BSPs and many other genres of UGC. © The Author(s) 2011.
AB - Blogs are arguably the most popular genre of user-generated content (UGC), which makes blogs a gold mine for social science research. However, existing research on blogs has suffered from nonprobability samples collected either manually or by computerized crawling based on the random walks method. The current article presents a probability sampling method for blogs, called random digit search (RDS), that is modified from the popular "random digit dialing" (RDD) method used in telephone surveys. The RDS method was tested in a study of Sina Blog, a popular blog service provider (BSP) in China. The results show that, while "random walks" sampling tends to oversample popular/active blogs, probability samples generated by RDS yield consistent and precise estimates of population parameters. Although RDS takes advantage of the numeric identification (ID) system used on Sina Blog, the general principles may be applicable to other BSPs and many other genres of UGC. © The Author(s) 2011.
KW - random digit search
KW - random walks
KW - web crawling
KW - web sampling
UR - http://www.scopus.com/inward/record.url?scp=79960618536&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-79960618536&origin=recordpage
U2 - 10.1177/0894439310382512
DO - 10.1177/0894439310382512
M3 - RGC 21 - Publication in refereed journal
SN - 0894-4393
VL - 29
SP - 327
EP - 339
JO - Social Science Computer Review
JF - Social Science Computer Review
IS - 3
ER -