Turn has reached a major milestone in our reach of the global digital audience. We now see 4 million queries per second (QPS) on our platform, meaning that every second Turn has 4 million opportunities to reach Internet end users. This is particularly striking given that just a little over five months ago, Turn reached 3 million QPS. While such scale and growth are not unexpected on the Internet, what is extraordinary is that Turn did it without any incremental hardware expenditure while also increasing ROI for our advertisers. In other words, our engineering team was able to extract more performance from existing servers through a series of application performance tuning efforts. In this blog post we’d like to share some of the work we’ve done to achieve this level of scale.
The reality of programmatic bidding is that while we have a tremendous number of opportunities to bid, we might not always want to do so if it is not beneficial to Turn or advertisers. So, we developed a rich set of quick decision-making algorithms in bid servers to perform “early exit.” Essentially, these algorithms filter out potentially undesirable bids to avoid passing them to ad servers. This is where Turn’s “secret sauce” resides – preventing the waste of expensive computing and network resources.
One example of early exit is to drop bids from fraudulent sources like botnets. Our integration with DoubleVerify helps us ascertain whether a bid is fraudulent, but because we observe many unique signals, we developed and deployed our own in-house fraud detection model as well. Additionally, we might find that certain bids are less valuable, such as those from sites with low predicted performance or from users with low-quality or no signals, in which case we probabilistically filter out some percentage of these bids. For example, in the most recent optimization, we were able to reduce the ad server queue overflow by 20% and increase delivery and ROI on high-impact campaigns by as much as 75% in some cases.
The ad server is the “secret sauce” so to speak, in that it is the most computationally intense part of our pipeline, and we continue to look for ways to improve it. We reduced memory usage of our ad server by 15% and developed a cache for the most complex user targeting rules, which decreased ad selection time by 5%. This forms part of a larger ad selection optimization effort that reduced average execution time from 2.6ms to 2.1ms, a gain of 19%.
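To make the early-exit idea described above more concrete, here is a minimal sketch in Python. It is an illustration only, not Turn's production logic: the `BidRequest` fields, the threshold, and the keep rate are all hypothetical names and values chosen for the example. It drops requests flagged as fraudulent outright and forwards only a random fraction of low-value requests.

```python
import random
from dataclasses import dataclass


@dataclass
class BidRequest:
    is_fraud_suspect: bool   # hypothetical flag, e.g. set by a fraud model
    predicted_value: float   # hypothetical quality score in [0, 1]


# Hypothetical tuning knobs; the post does not give actual values.
LOW_VALUE_THRESHOLD = 0.2    # below this score a request counts as low value
LOW_VALUE_KEEP_RATE = 0.25   # fraction of low-value requests still forwarded


def should_forward(req, rng=random.random):
    """Return True if the request should be passed on to the ad server."""
    # Early exit 1: never spend ad server time on suspected fraud.
    if req.is_fraud_suspect:
        return False
    # Early exit 2: forward only a random sample of low-value requests.
    if req.predicted_value < LOW_VALUE_THRESHOLD:
        return rng() < LOW_VALUE_KEEP_RATE
    return True
```

Passing the random source in as a parameter keeps the filter easy to test, and keeping a small sample of low-value traffic means its performance can still be measured over time.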
Turn leverages large-scale data to make intelligent decisions on bidding, including page-level contextual data, which is important for determining whether a page fits advertisers’ branding requirements. But the feature set for each page has become very rich, and the data has grown to a point where it has started to have memory and network implications. We therefore set out to optimize the contextual data, which is stored in our NoSQL database and transferred to bid and ad servers. We were able to restructure the data and reduce its footprint by 90%, leading to a 30% decrease in network load, which translated into an ability to take on 30% more traffic.
Communication is the backbone of distributed systems, and Turn has spent a lot of effort tuning it. We recently integrated gRPC (see http://www.grpc.io/) as our new messaging layer, which led to faster response times and lower failure rates. For example, the 98th percentile of remote procedure call (RPC) latency between the bid server and ad server dropped from 80ms to 60ms, an improvement of 25%. The timeout rate also dropped from 10% to 1%, leading to a 10% global increase in RPC volume. However, this also presents a quandary in that the additional traffic to ad servers might exceed their capacity. We solved that by implementing new concurrency management and robust back pressure so that bursts in traffic are smoothed out, leading to nearly 100% CPU utilization of ad servers with a failure rate below 1%.
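As a rough illustration of the back pressure idea, the sketch below caps the number of in-flight calls to a downstream server. It is a simplified, assumed design rather than the mechanism Turn built: the class name `BackPressureGate` and its parameters are hypothetical, and it uses only Python's standard `threading` module rather than any gRPC API. A caller waits briefly for a free slot, which absorbs short bursts, and is rejected if none opens up, which signals the upstream side to back off.

```python
import threading


class BackPressureGate:
    """Caps concurrent downstream calls (hypothetical helper for illustration)."""

    def __init__(self, max_in_flight, max_wait_seconds=0.005):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._max_wait = max_wait_seconds

    def try_call(self, fn, *args):
        """Run fn(*args) if a slot frees up in time.

        Returns (True, result) on success, or (False, None) when the
        downstream side is saturated and the call was rejected.
        """
        # Wait briefly for capacity; this smooths out short bursts.
        if not self._slots.acquire(timeout=self._max_wait):
            return False, None  # caller can skip this bid or retry later
        try:
            return True, fn(*args)
        finally:
            # Always free the slot, even if fn raised.
            self._slots.release()
```

A bid server could wrap each ad server request in `gate.try_call(...)` and treat a rejection as a no-bid, so that overload shows up as a small number of skipped bids rather than as timeouts.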
On a concluding note, these innovations are possible only because Turn’s culture empowers our engineers to act on their natural curiosity by giving them interesting challenges that have a direct bearing on the company’s bottom line. The improvements described in this post were achieved not by a single super-developer but by many different team members. By democratizing the spirit of innovation, time and time again we see the fruits of our labor.