You might suspect that robots are taking over the world; you might even know that they have already built up a huge power base on the internet. But you don’t know how important this is. No one does. UCL researcher Juan Echeverria explains how you can take part in a ground-breaking project to assess the extent of ‘botnets’.
Bots have long been a part of computer science. They have plagued almost all aspects of information technology, from chat networks to websites and all the way to social media and Pokemon GO. But “bots” is a really broad term, so here we will focus on Twitter bots. Twitter bots are accounts that are mostly automated: they need little to no human interaction to operate. When many of them are centrally controlled, we refer to them as a botnet.
Botnet attacks on Twitter range from pro-Kremlin political propaganda to diet supplements and spam. Botnets often go undetected for years. Researchers have consistently tried to gauge just how many bots are on Twitter, and how to detect them. Bot-classification efforts have drawn on all sorts of heuristics, machine learning, and human wit. Some of these efforts have resulted in publicly available services, such as botornot. Whole sessions at top-notch conferences, including tutorials, have been devoted to this problem. Even the US Defense Department’s research arm, DARPA, has chimed in with a challenge on bot detection. So many intelligent people are worried about this problem and have worked on it, and still, here we are.
Our research in the UCL Media Futures group is varied, ranging from the spread of viruses and detecting epidemics using Google and Twitter to bidding and optimization. In particular, I investigate sampling methods on Twitter, and bots. We began with a random sample of all English-speaking Twitter users. After plotting their tweets on a map, we noticed a strange pattern that was unlikely to appear organically. Investigating the tweets that fell on this pattern, we found the Star Wars bots. These bots are rather quiet, have been sitting idle for several years, and do not seem to bother anyone in general. So why are they important?
They are a symptom. They are really simple bots: all it took to know something was wrong was to plot some tweets on a map. After manually identifying a percentage of them, we collected all 357,000 Star Wars bots. This was feasible because all of them shared a narrow ID range (which means they were created in just two months). If they are so simple, how can these bots still be there, polluting Twitter (with their presence, if not their tweets)? Maybe we aren’t as good at classifying bots as we thought we were.
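To make the ID-range idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the code used in the research: the function names and every account ID below are made up, and it assumes that account IDs grow with creation time, so that a narrow ID range corresponds to a short creation window.

```python
from typing import Iterable, List, Tuple


def id_range_of(seed_ids: Iterable[int]) -> Tuple[int, int]:
    """Return the (lowest, highest) ID among manually identified bots."""
    ids = list(seed_ids)
    return min(ids), max(ids)


def accounts_in_range(candidate_ids: Iterable[int], low: int, high: int) -> List[int]:
    """Keep only the candidate account IDs inside the closed range [low, high]."""
    return [account_id for account_id in candidate_ids if low <= account_id <= high]


# Hypothetical data: a few manually tagged bots and a larger pool of accounts.
seed_bot_ids = [1000, 1250, 1900]
all_ids = [50, 1000, 1100, 1900, 2500, 99999]

low, high = id_range_of(seed_bot_ids)
suspects = accounts_in_range(all_ids, low, high)
print(suspects)  # [1000, 1100, 1900]
```

A range filter like this only narrows the search. Genuine users who signed up in the same period would fall inside the range too, so the accounts it returns would still need to be checked against other shared traits before being labelled as bots.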
Bots like these seem harmless. Nevertheless, it is well known that older accounts are sold at a premium on the black market, and we know that botmasters stockpile accounts for when they are needed.
The biggest impact we can have on this is through research. We believe the most important problem is the lack of ground truth data for bots and real users. Ground truth in this scenario means bots that have been identified, and normally that means manually identified. There is little ground truth data for bots, and the few instances that are publicly available can be deleted at any time. Getting the data and tagging it is complicated further by Twitter’s data policies.
To address this data problem, we have a simple solution. We have created a Twitter account (@thatisabot) and ask people to tag, mention, or direct message us with any bots they find. This account is public. Using the accounts that users tag as bots, we plan to create a large dataset of bot IDs, which will be released to researchers to use as they see fit. This will address the ground truth data problem at a time when it’s needed the most.
As for our research, some of the lessons learned from the Star Wars bots have allowed us to detect a larger and more malicious botnet with over 500,000 accounts. Hopefully we will be reporting on that soon. We also plan to release the IDs of the Star Wars bots, and of the new botnet, when we are allowed to do so.
Juan Echeverria, PhD Candidate at UCL
The views expressed in opinion articles published on euronews do not represent our editorial position