
This is code for How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis


whr000001/MisBot


MisBot

The official repository for the paper How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis, which has been accepted to EMNLP 2025.

The homepage of MisBot: https://whr000001.github.io/MisBot/

We have published MisBot on Baidu Disk: https://pan.baidu.com/s/1h7ga9yDBZ9JI4VQsXXptdQ?pwd=8dtl

To protect user privacy, we release the data only after removing private information. If you downloaded a previous version, please delete it and download the newest one.

If you need the original data, please contact Herun Wan through wanherun at stu.xjtu.edu.cn and state the purpose of use.

What you can get from MisBot

Note that we have slightly modified MisBot, so some statistics differ slightly from those reported in the paper.

  • We have developed a crawler to collect related information from the Weibo platform. However, we cannot make this crawler public. If you are interested in it, please contact Herun Wan through wanherun at stu.xjtu.edu.cn.
  • 99,874 Weibo users labeled by human annotators, of whom 48,536 are active users. You can use them to develop and evaluate your own social bot detectors.
  • 23,622 information instances from Weibo, in which misinformation and real information are easy to tell apart (a vanilla detector reaches 95.2% accuracy). Because detection is this easy, we recommend using the data for analysis rather than for developing detectors.
  • 942,430 Weibo users who take part in the information spread (including reposting, commenting, and liking). We have annotated the 407,801 active users among them using a weakly supervised annotator. This annotator is far from perfect (its accuracy is only 81.5%), so we recommend using the user data itself and not relying on the label information.
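To put the counts above in proportion, here is a small arithmetic sketch. It uses only the numbers quoted in the list. The estimate of wrong weak labels assumes that the 81.5% accuracy applies uniformly to all 407,801 weakly labeled users, which is a simplifying assumption of this sketch and not a claim from the paper.

```python
# Counts and accuracy quoted in the list above.
ANNOTATED_USERS = 99_874          # human-annotated users
ACTIVE_ANNOTATED_USERS = 48_536   # active users among them
SPREAD_USERS = 942_430            # users who take part in the information spread
WEAKLY_LABELED_USERS = 407_801    # active spread users with weak labels
WEAK_ANNOTATOR_ACCURACY = 0.815   # reported accuracy of the weak annotator


def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction between 0 and 1."""
    return part / whole


active_share = share(ACTIVE_ANNOTATED_USERS, ANNOTATED_USERS)
weak_share = share(WEAKLY_LABELED_USERS, SPREAD_USERS)

# Assumption (ours): errors are spread uniformly over the weakly labeled users.
expected_wrong_labels = round(WEAKLY_LABELED_USERS * (1 - WEAK_ANNOTATOR_ACCURACY))

print(f"Active share of human-annotated users: {active_share:.1%}")
print(f"Weakly labeled share of spread users:  {weak_share:.1%}")
print(f"Expected wrong weak labels (estimate): {expected_wrong_labels:,}")
```

Under that assumption, tens of thousands of weak labels would be wrong, which is why the labels are better treated as a noisy signal than as ground truth.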

Data Structure

MisBot has two subsets: 'Information_Instances' and 'User_Instances':

  • 'Information_Instances':
    • 'misinformation.jsonl': The misinformation instances flagged by the Weibo platform.
    • 'verified_information.jsonl': The information instances from the verified news accounts.
    • 'trend_information.jsonl': The information instances from the trends on the Weibo platform.
    • 'imgs.zip': The related images.
    • 'videos.zip': The related videos. We have sampled frames of each video instance. If you need the original data, please contact Herun Wan through wanherun at stu.xjtu.edu.cn.
  • 'User_Instances':
    • 'train_data.jsonl': 99,874 Weibo users labeled by human annotators.
    • 'train_data_sampled.jsonl': The active annotated Weibo users.
    • 'inference_data.jsonl': The Weibo users who take part in the information spread (including reposting, commenting, and liking).
    • 'inference_labels.json': The labels of active inference users generated by the weakly supervised annotator.

For the detailed structure of each file and an example of how to load MisBot, please refer to 'load_example.py'.
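As a rough illustration of reading the '.jsonl' files listed above, here is a minimal sketch. It assumes only that each non-empty line is one JSON value, which is the usual JSON Lines convention. The helper names `iter_jsonl` and `peek_keys` are hypothetical and not part of this repository, and the sketch makes no assumption about field names; 'load_example.py' remains the authoritative reference.

```python
import json
from pathlib import Path
from typing import Any, Iterator, List, Union


def iter_jsonl(path: Union[str, Path]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def peek_keys(path: Union[str, Path], limit: int = 100) -> List[str]:
    """Collect the top-level keys seen in the first `limit` records.

    Handy for inspecting a file whose schema you do not know yet.
    Records that are not JSON objects are ignored.
    """
    keys = set()
    for index, record in enumerate(iter_jsonl(path)):
        if index >= limit:
            break
        if isinstance(record, dict):
            keys.update(record.keys())
    return sorted(keys)
```

For example, after downloading and unpacking the data, `peek_keys("User_Instances/train_data.jsonl")` would list the fields found in the first 100 user records. The exact path depends on where you unpack the archive. Reading lazily with a generator avoids holding a whole file in memory, which matters for the larger files such as 'inference_data.jsonl'.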

Citation

If you find our work interesting/helpful, please consider citing MisBot:

@inproceedings{wan-etal-2025-social,
    title = "How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis",
    author = "Wan, Herun  and
      Luo, Minnan  and
      Ma, Zihan  and
      Dai, Guang  and
      Zhao, Xiang",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1604/",
    pages = "31481--31504",
    ISBN = "979-8-89176-332-6"
}
@article{wan2024social,
  title={How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis},
  author={Wan, Herun and Luo, Minnan and Ma, Zihan and Dai, Guang and Zhao, Xiang},
  journal={arXiv preprint arXiv:2408.09613},
  year={2024}
}

Questions?

Feel free to open issues in this repository! GitHub issues are much better than email at facilitating a conversation between you and our team to address your needs. You can also contact Herun Wan through wanherun at stu.xjtu.edu.cn.

Updates

20250825

  • We have uploaded the newest version of MisBot, which masks information that could identify users on the Weibo platform. If you downloaded a previous version, please delete it and download the newest one.

20250820

  • Our work has been accepted to EMNLP 2025! 🎉🎉🎉

20250418

  • We have uploaded the preprocessed data and part of the raw data.
  • We are considering hosting MisBot on other platforms as well.
