A Cautionary Tail (arXiv:1807.03860)
Data scientists frequently train predictive models on administrative data. However, the process that generates this data can bias predictive models, making it important to test models against their intended use. We provide a field assessment framework that we use to validate a model predicting rat infestations in Washington, D.C. The model was developed with data from the city’s 311 service request system. Although the model performs well against new 311 data, we find that it does not perform well when predicting the outcomes of inspections in our field assessment. We recommend that data scientists expand the use of field assessments to test their models.
