Misinformation

  • (Dis)belief: A dataset collected for [ICWSM’20].
    The dataset contains 6,800+ tweets, with belief and disbelief annotations, in reply to 18 claims.
    [ download (.zip 0.7M) ]

  • Social Media Posts: A dataset collected for [CSCW’18a].
    The dataset contains 5,000+ social media posts, with their veracity judged by Snopes and PolitiFact.
    [ download (.csv 1.4M) ]

  • User Comments: A dataset collected for [CSCW’18a] and later used in [ICWSM’20].
    The dataset contains 2,600,000+ social media comments in reply to the posts above.
    [ facebook (.bz2 54M) | youtube (.bz2 49M) | twitter (.bz2 2.8M) ]

  • ComLex: An emotional and topical lexicon developed in [CSCW’18a].
    The lexicon contains 300 categories, but only the top 56 named categories have been evaluated by humans.
    [ download (.csv 0.1M) ]

Content Moderation

  • YouTube Comments: A dataset collected for [ICWSM’19] and [AAAI’20].
    The dataset contains 84,000+ YouTube comments, with moderation decisions, misinformation, partisanship, and other annotations.
    [ download (.csv 27M) ]

Fact-Checks

Ridesharing

  • Drivers’ Trajectories: Unfortunately, due to Uber’s and Lyft’s Terms of Service, the dataset used for [WWW’18] is not publicly available.

  • TNCsToday: A visualization of Uber and Lyft drivers in San Francisco, built on the dataset above.
    Available at: https://tncstoday.sfcta.org