Predictably inaccurate: The prevalence and perils of bad big data

Link: https://www2.deloitte.com/us/en/insights/deloitte-review/issue-21/analytics-bad-data-quality.html

Graphic:

Excerpt:

More than two-thirds of survey respondents stated that the third-party data about them was only 0 to 50 percent correct as a whole. One-third of respondents perceived the information to be 0 to 25 percent correct.

Whether individuals were born in the United States tended to determine whether they were able to locate their data within the data broker’s portal. Of those not born in the United States, 33 percent could not locate their data; conversely, of those born in the United States, only 5 percent had missing information. Further, no respondents born outside the United States and residing in the country for less than three years could locate their data.

The type of data on individuals that was most available was demographic information; the least available was home data. However, even if demographic information was available, it was not all that accurate and was often incomplete, with 59 percent of respondents judging their demographic data to be only 0 to 50 percent correct. Even seemingly easily available data types (such as date of birth, marital status, and number of adults in the household) had wide variances in accuracy.

Author(s): John Lucker, Susan K. Hogan, Trevor Bischoff

Publication Date: 31 July 2017

Publication Site: Deloitte

A Unified Approach to Interpreting Model Predictions

Link: https://papers.nips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf

Graphic:

Excerpt:

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
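
For readers who want to see the additive attribution idea in practice, here is a minimal sketch using the open-source `shap` Python package together with a scikit-learn model; the synthetic data, model choice, and parameter values are illustrative assumptions, not part of the paper.

```python
# A minimal sketch, assuming the `shap` package and scikit-learn are installed;
# the data and model below are illustrative only.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Illustrative data: 200 rows, 4 features, a simple nonlinear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles;
# shap_values[i, j] is the importance of feature j for prediction i.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# The attributions are additive: the base value plus the per-feature
# SHAP values recovers the model's prediction for each row.
base_value = float(np.ravel(explainer.expected_value)[0])
print(model.predict(X[:1])[0])
print(base_value + shap_values[0].sum())
```

The additivity check at the end mirrors the paper's additive feature attribution form, in which a single prediction is expressed as a base value plus one contribution per feature.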

Author(s): Scott M. Lundberg, Su-In Lee

Publication Date: 2017

Publication Site: Conference on Neural Information Processing Systems