Uplift Modeling with High Class Imbalance

Otto E Nyberg (University of Helsinki)*; Tomasz Kuśmierczyk (University of Helsinki); Arto Klami (University of Helsinki)


Uplift modeling refers to estimating the causal effect of a treatment on an individual sample, and is used, for instance, to identify customers worth targeting with a discount in e-commerce. We introduce a simple yet effective undersampling strategy for dealing with the prevalent problem of high class imbalance (low conversion rate) in such applications. Our strategy is agnostic to the base learners and produces a 6.5% improvement over the best published benchmark for the largest public uplift data set, which incidentally exhibits high class imbalance. We also introduce a new calibration metric for uplift modeling and present a strategy to improve the calibration of the proposed method.
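
To make the terms above concrete, the following is a minimal sketch and not the method proposed in this paper, whose details the abstract does not give. It assumes uplift is taken as the difference between the conversion probability under treatment and under control, that the majority (non-converting) class is undersampled by a hypothetical factor `k`, and that probabilities learned on the undersampled data are mapped back with the standard prior-shift correction. All function names and the value of `k` are illustrative assumptions.

```python
import random


def undersample_negatives(labels, k, seed=0):
    """Return sorted indices keeping every positive and about 1/k of the negatives.

    `k` is a hypothetical undersampling factor, not a value from the paper.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Keep roughly len(negatives) / k negatives, never more than exist.
    n_keep = min(len(negatives), max(1, round(len(negatives) / k)))
    kept = rng.sample(negatives, n_keep)
    return sorted(positives + kept)


def correct_probability(p_sampled, k):
    """Map a probability learned on undersampled data back to the original scale.

    Standard correction when negatives are kept at rate 1/k:
    p = p_s / (p_s + k * (1 - p_s)).
    """
    return p_sampled / (p_sampled + k * (1.0 - p_sampled))


def uplift(p_treated, p_control):
    """Uplift as the difference in conversion probability (assumed definition)."""
    return p_treated - p_control


# Toy data with a 1% conversion rate: 10 positives, 990 negatives.
labels = [1] * 10 + [0] * 990
idx = undersample_negatives(labels, k=10)

# Probabilities a base learner might output after training on undersampled data
# (made-up numbers for illustration only).
p_t = correct_probability(0.5, k=10)
p_c = correct_probability(0.4, k=10)
estimated_uplift = uplift(p_t, p_c)
```

Without the correction step, the difference of the raw probabilities (0.5 and 0.4 here) would overstate the uplift, which is one reason calibration matters once undersampling is applied.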