It’s Not All About Size: Why Parameter Choice Is Important When Learning from Big Data


By Beth Logan, Director of Optimization, DataXu

Is more always better?

I am often asked how many parameters DataXu’s system evaluates when bidding. This is usually in response to a customer hearing that company X’s system has a large number of parameters, the implicit assumption being that more is better. However, while this is certainly a valid first question, an equally important follow-up question is “How do you handle these parameters?”

This follow-up question matters because of a phenomenon known as the “Curse of Dimensionality”: as the number of parameters being modeled increases, combinatorially larger amounts of learning data are needed to model them effectively. For a fixed amount of training data, then, there are diminishing returns from adding new parameters to the model. This doesn’t mean more parameters cannot be considered, only that the final model should use just the most informative ones.
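To make the effect concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that every feature takes the same number of distinct values and that the training set is fixed at one million examples; the function name and the numbers are hypothetical and do not describe DataXu’s system.

```python
def examples_per_cell(n_examples: int, n_features: int, values_per_feature: int) -> float:
    """Average number of training examples available per distinct
    combination of feature values (a "cell")."""
    n_cells = values_per_feature ** n_features  # combinations grow exponentially
    return n_examples / n_cells


N = 1_000_000  # hypothetical fixed amount of training data

for d in (1, 2, 4, 8):
    # Each feature is assumed to take 10 distinct values.
    print(f"{d} features: {examples_per_cell(N, d, 10):g} examples per combination")
```

With the data held fixed, each added feature spreads the same examples over ten times as many combinations, so the evidence behind any single combination thins out quickly.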

At DataXu, we consider many parameters for each campaign. These parameters – commonly known as “features” in machine learning lingo – fall into three main categories: context, user information and creative.

  • Context considers where the creative will be shown: on what site, which content channel, which exchange, whether it’s above the fold and so on.
  • User information captures whether, and if so when, this user has visited any of the advertisers’ lower-funnel pages, how many impressions they have previously seen, what time it is, where the user is located and what sort of device and browser they are using.
  • For the creative, we have features such as the creative concept and size.

For every campaign, we consider about 50 such features. Each feature can take multiple values; some, such as site name, can take hundreds. Multiplying out all the combinations for all the advertisers we serve certainly gives numbers in the millions. However, as discussed above, beyond a certain point new features don’t add value for a fixed amount of learning data. A key part of our system is ranking these features and choosing the best ones for each advertiser each day, so that they receive only the most relevant, important information.
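As a rough illustration of what “ranking features and choosing the best” can look like, the sketch below scores each feature by its mutual information with the conversion label and keeps the top few. This is an assumption made for the example: the ranking criterion DataXu actually uses is not stated here, and the function names, feature names and toy data are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length
    sequences of categorical values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, labels, top_k):
    """Return the names of the top_k features, most informative first."""
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy impressions (hypothetical): 'site' separates converters from
# non-converters perfectly, while 'browser' carries no information.
rows = [
    {"site": "news", "browser": "a"},
    {"site": "news", "browser": "b"},
    {"site": "blog", "browser": "a"},
    {"site": "blog", "browser": "b"},
]
labels = [1, 1, 0, 0]  # 1 = converted, 0 = did not convert

print(rank_features(rows, labels, top_k=1))
```

Scoring each feature on its own is the simplest possible approach; it ignores interactions between features, which a production system would likely need to account for.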

By using only a subset, we remove noise introduced by less-informative features. We also save memory and processing time. Last but certainly not least, we provide cool insights for our customers, who love to hear which features best predict converters.

So the next time you hear that a system has millions of parameters, wait before being impressed. Ask how the parameters that matter are chosen and whether there’s enough learning data to justify that choice. By doing so, you show that you know that it’s not just about size when it comes to Big Data.