Fact-Check

  • Code for [ ACL’21 ].
    The code repository to replicate our results and figures.
    link

  • Fact-Checks dataset used in [ ACL’21 ].
    The dataset contains 20,000+ URLs of fact-checks from Snopes.
    download

  • Fact-Checks dataset used in [ WWW’20 ].
    The dataset contains 6,000+ URLs of fact-checks and their claims, claimants and verdicts.
    link

Misinformation

  • Code for [ ICWSM’20 ] and [ CSCW’18a ].
    The code repository to replicate our results and figures.
    link

  • (Dis)belief dataset used in [ ICWSM’20 ].
    The dataset contains 6,000+ tweets annotated with belief and disbelief labels.
    download

  • User Comments dataset used in [ ICWSM’20 ] and [ CSCW’18a ].
    The dataset contains 2,600,000+ social media comments posted in reply to the posts in the Social Media Posts dataset below.
    facebook youtube twitter

  • Social Media Posts dataset used in [ CSCW’18a ].
    The dataset contains 5,000+ social media posts and their veracity judged by Snopes or PolitiFact.
    download

  • ComLex lexicon used in [ CSCW’18a ].
    The lexicon contains 300 categories, but only the top 56 named categories are human-validated.
    download

Content Moderation

  • Code for [ AAAI’20 ] and [ ICWSM’19 ].
    The code repository to replicate our results and figures.
    link

  • YouTube Comments dataset used in [ AAAI’20 ] and [ ICWSM’19 ].
    The dataset contains 84,000+ YouTube comments and their annotations, as described in our papers.
    download

Ridesharing

  • Drivers’ Trajectories dataset used in [ WWW’18 ].
    Unfortunately, due to Uber’s and Lyft’s Terms of Service, the dataset is not available to the public. A visualization of Uber and Lyft drivers based on this dataset has been made public by the San Francisco County Transportation Authority.
    link