Christopher Bouzy is trying to stay ahead of the bots. He is the person behind Bot Sentinel, a popular bot-detection system, and he and his team continuously update their machine learning models out of fear that they will get "stale." The task? Sorting 3.2 million tweets from suspended accounts into two folders: "Bot" or "Not."
To detect bots, Bot Sentinel's models must first learn what problematic behavior looks like through exposure to data. Given tweets in two distinct categories, bot or not a bot, Bouzy's model can calibrate itself and, supposedly, find the essence of what he thinks makes a tweet problematic.
Training data is the heart of any machine learning model. In the burgeoning field of bot detection, how bot hunters define and label tweets determines the way their systems interpret and classify bot-like behavior. According to experts, this can be more of an art than a science. "At the end of the day, it is about a vibe when you are doing the labeling," Bouzy says. "It's not just about the words in the tweet, context matters."
From Wired