Fact-Check

  • Code for [ACL’21].
    The code repository to replicate our results and figures.
    link

  • Fact-Checks dataset used in [ACL’21].
    The dataset contains 20,000+ URLs of fact-checks from Snopes.
    download

  • Fact-Checks dataset used in [WWW’20].
    The dataset contains 6,000+ URLs of fact-checks and their claims, claimants and verdicts.
    link

Misinformation

  • Code for [ICWSM’20] and [CSCW’18].
    The code repository to replicate our results and figures.
    link

  • (Dis)belief dataset used in [ICWSM’20].
    The dataset contains 6,000+ tweets annotated with belief and disbelief labels.
    download

  • User Comments dataset used in [ICWSM’20] and [CSCW’18].
    The dataset contains 2,600,000+ social media comments posted in reply to the posts in the Social Media Posts dataset below.
    facebook youtube twitter

  • Social Media Posts dataset used in [CSCW’18].
    The dataset contains 5,000+ social media posts and their veracity judged by Snopes or PolitiFact.
    download

  • ComLex lexicon used in [CSCW’18].
    The lexicon contains 300 categories, but only the top 56 (the named ones) are human-validated.
    download

Content Moderation

  • Code for [AAAI’20] and [ICWSM’19].
    The code repository to replicate our results and figures.
    link

  • YouTube Comments dataset used in [AAAI’20] and [ICWSM’19].
    The dataset contains 84,000+ YouTube comments with the annotations described in our papers.
    download

Ridesharing

  • Drivers’ Trajectories dataset used in [WWW’18].
    Unfortunately, due to Uber’s and Lyft’s Terms of Service, the dataset is not available to the public. A visualization of Uber and Lyft drivers based on this dataset has been made public by the San Francisco County Transportation Authority.
    link