In December 2016, White Ops broke the news of a massive $1 billion-plus Russian botnet fraud ring, dubbed Methbot. As part of the coverage, White Ops released 4,112 compromised IP addresses, which advertisers and technology companies could then block from their campaigns. In many cases, though, the damage was already done, and advertisers had to accept the ad dollars they had lost to the fraudsters and move on.
Upon hearing the news from White Ops, Goodway Group immediately ran an internal check to assess any impact on our clients’ campaigns. To no one’s surprise here, we were virtually unaffected by Methbot: only 0.004% of our total 2016 impressions were delivered on the compromised IP addresses White Ops listed. To better understand how we achieved these results, we sat down with a Goodway expert to learn more.
Scotty is a data science veteran of seven years, seasoned in developing and executing creative, terabyte-scale design applications across the federal, finance, healthcare, and retail industries. In past roles, he structured data analytics solutions and formulated strategic recommendations for $60M+ DoD initiatives. At Goodway Group, he is tasked with balancing machine learning with human processes, moving easily from vision and strategy to measurement and accountability. Ultimately, his job is to bring science to the art of programmatic digital advertising.
Q: Methbot was one of the most sophisticated fraud rings to ever hit the industry—impacting over a billion dollars of advertising revenue. How was Goodway Group able to avoid delivering on Methbot’s thousands of compromised IP addresses?
A: (Scotty) We have five different data science controls in place to protect us against delivering on compromised IP addresses—a global frequency cap, a campaign-specific frequency cap, general IP blacklisting, data center IP blacklisting via Pixalate data, and our proprietary clustering algorithm.
Also, earlier this year, as part of our exclusive anti-fraud initiative, we cut 13 SSPs from our ecosystem and worked directly with our technology partners to enhance fraud prevention controls. The biggest complement to these algorithms and technology controls is our trading team of 70+ world-class pros. They stay vigilant for suspicious activity that technical controls simply can’t catch. For instance, they scrub site lists daily and maintain a healthy blacklist.
It’s likely that the combination of our human policing efforts and all of our technical controls helped us shut out Methbot.
Q: Have you found that one of these controls was more impactful than the others in blocking Methbot or any other type of fraud?
A: (Scotty) Each technique works to control a certain aspect of fraud, but for Methbot specifically, we saw the greatest results from our clustering algorithm. White Ops released 4,112 compromised IP addresses, and our clustering algorithm had already blocked 1,877 of them, nearly 46%.
Q: Wow, that seems like a lot! Do you know how that compares to others in the industry?
A: (Scotty) We have a limited ability to see into others’ networks, but we did learn that one other company blocked only 4 of the 4,112 compromised IPs, so we’re definitely proud of our achievement in comparison. In broader terms, though, our clustering algorithm normally identifies about 65K IP addresses in 30 days’ worth of data. So in a given year, we block the equivalent of 156 Methbots’ worth of IP addresses. There is a lot of suspicious activity out there, so if you’re waiting on White Ops or any other fraud technology company to tell you which IPs are bad, you are already 156 times behind in the race.
Q: Clearly, the clustering algorithm is an important piece of Goodway Group’s anti-fraud efforts. For all of us non-data scientists, can you please explain what a clustering algorithm is?
A: (Scotty) Sure. A clustering algorithm takes a set of data that describes each IP address (the number of devices we see on it, the number of cities, the number of browsers, the number of operating systems, and so on) and then groups the IP addresses into clusters according to how close they are to one another in that data. Clusters of people are separated by geographic distance; for example, the cluster of people you would call your neighbors is determined by their geographic distance from you.
Well, clusters of data are separated by geometric distance, for instance, the distance between two points on a Cartesian plot. (For anyone interested in the specifics, Euclidean distance is the metric we use.) We then look at the distances between those clusters and say, “Well, 95% of our IP addresses are in this cluster, but 5% are in this outlier cluster WAY far away from the rest.” We don’t want those, so we blacklist them. To be fair, we don’t know why we don’t want them; we just know they are so different from the norm that they look risky to us.
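For readers who want to see the idea in code, here is a minimal Python sketch of the approach described above. It is an illustration only, not Goodway Group’s proprietary algorithm: the choice of k-means with two clusters, the `max_share` threshold, the function names, and the sample IP addresses and counts are all assumptions made for this example.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, k=2, iterations=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points,
                             key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # New centroid is the per-feature mean of the cluster members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def flag_outlier_ips(features, max_share=0.10):
    """Return IPs in the smaller of two clusters if it is a small minority.

    `features` maps an IP to (devices, cities, browsers, operating systems).
    `max_share` is a hypothetical cutoff: the smaller cluster is only treated
    as an outlier cluster when it holds at most this fraction of all IPs.
    """
    ips = list(features)
    labels, _ = kmeans([features[ip] for ip in ips], k=2)
    counts = [labels.count(j) for j in range(2)]
    minority = counts.index(min(counts))
    if counts[minority] / len(ips) > max_share:
        return []  # no small, far-away cluster; nothing to blacklist
    return sorted(ip for ip, label in zip(ips, labels) if label == minority)

# Made-up sample data: 19 ordinary-looking IPs and one with extreme counts.
sample = {f"192.0.2.{i + 1}": (1 + i % 3, 1, 1 + i % 2, 1 + i % 2)
          for i in range(19)}
sample["198.51.100.7"] = (250, 40, 12, 6)

flagged = flag_outlier_ips(sample)
print(flagged)  # ['198.51.100.7']
```

In practice the features would likely need to be scaled first so that one large count does not dominate the Euclidean distance; the sketch skips that step for brevity.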
If you want more than just a snapshot of the amazing data science behind our results, contact us today, and we’ll totally talk nerdy with you.