Misinformation

  • (Dis)belief dataset used in [ ICWSM’20 ].
    The dataset contains 6,800+ tweets posted in reply to 18 claims, each annotated for belief or disbelief.
    download

  • Social Media Posts dataset used in [ CSCW’18a ].
    The dataset contains 5,000+ social media posts, with their veracity judged by Snopes and PolitiFact.
    download

  • User Comments dataset used in [ CSCW’18a ] and [ ICWSM’20 ].
    The dataset contains 2,600,000+ social media comments in reply to the posts above.
    facebook youtube twitter

  • ComLex lexicon used in [ CSCW’18a ].
    The lexicon contains 300 categories, but only the top 56 named ones have been evaluated by humans.
    download

Content Moderation

  • YouTube Comments dataset used in [ ICWSM’19 ] and [ AAAI’20 ].
    The dataset contains 84,000+ YouTube comments, with moderation decisions, misinformation, partisanship, and other annotations.
    download

Fact-Checks

  • Fact-Checks dataset used in [ WWW’20 ].
    The dataset contains 6,000+ URLs of fact-checks, with reported factors (claim, claimant, verdict, etc.).
    external link

Ridesharing

  • Drivers’ Trajectories dataset used in [ WWW’18 ].
    Unfortunately, due to Uber’s and Lyft’s Terms of Service, the dataset is not available to the public. A visualization of Uber and Lyft drivers based on this dataset has been made public by the San Francisco County Transportation Authority.
    external link