In last month's edition of dataxu’s Technical Tuesday series, we looked at why you should write your own Spark Classifier. This month, Maximo Gurmendez, a Data Science Engineering Lead, walks through the practical steps of writing a custom Spark Classifier using Categorical Naive Bayes.
How to write a custom Spark Classifier: Categorical Naive Bayes
Recently, we covered a few reasons why you might want to write your own Spark Classifier/Estimator. Now that you’ve decided to move forward, let’s look at how to write one by going over the implementation of a flavor of Naive Bayes for binary classification over categorical features. Unlike the standard Spark implementation, this version does not need additional stages to encode features into numerical vectors, which makes it much faster to train.
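To make the idea concrete before getting into Spark specifics, here is a minimal sketch of categorical Naive Bayes in plain Python, without Spark. It is not the implementation discussed in this post. It only illustrates why no numerical encoding stage is needed: the model is a set of counts keyed directly by the raw category values. The function names (`train`, `predict_proba`), the dictionary-based row format, and the additive (Laplace) smoothing with `alpha` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train(rows, labels, alpha=1.0):
    """Count class frequencies and per-class (feature, value) frequencies.

    rows   -- list of dicts mapping feature name -> raw category value
    labels -- list of class labels (0/1 for the binary case)
    alpha  -- additive smoothing constant (an assumption of this sketch)
    """
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)  # (label, feature, value) -> count
    vocab = defaultdict(set)         # feature -> distinct values seen
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for feature, value in row.items():
            value_counts[(label, feature, value)] += 1
            vocab[feature].add(value)
    return {
        "class_counts": dict(class_counts),
        "value_counts": dict(value_counts),
        "vocab": dict(vocab),
        "alpha": alpha,
    }


def predict_proba(model, row):
    """Return a dict mapping each label to its posterior probability."""
    alpha = model["alpha"]
    total = sum(model["class_counts"].values())
    log_scores = {}
    for label, n in model["class_counts"].items():
        score = math.log(n / total)  # log prior
        for feature, value in row.items():
            if feature not in model["vocab"]:
                continue  # feature never seen in training: ignore it
            k = len(model["vocab"][feature])
            count = model["value_counts"].get((label, feature, value), 0)
            # Smoothed estimate of P(value | label), accumulated in log space.
            score += math.log((count + alpha) / (n + alpha * k))
        log_scores[label] = score
    # Normalize with the log-sum-exp trick to avoid underflow.
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


# Toy data: raw string categories, no indexing or one-hot encoding.
rows = [
    {"browser": "chrome", "os": "mac"},
    {"browser": "chrome", "os": "win"},
    {"browser": "firefox", "os": "win"},
    {"browser": "firefox", "os": "linux"},
]
labels = [1, 1, 0, 0]

model = train(rows, labels)
probs = predict_proba(model, {"browser": "chrome", "os": "mac"})
print(probs)
```

Training here is a single counting pass over the data, which is the kind of work that maps naturally onto a distributed aggregation. The Spark version would still need to handle details this sketch ignores, such as how the counts are aggregated across partitions and how the fitted model is exposed as a transformer.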