Identifying Inliers

Abstract

The problem of outliers is well known in statistics: an outlier is a value that lies far from the general distribution of the other observed values and can often perturb the results of a statistical analysis. Various procedures exist for identifying outliers so that they can receive special treatment, which in some cases means excluding them from consideration. An inlier, by contrast, is an observation that lies within the general distribution of the other observed values and generally does not perturb the results, but is nevertheless non-conforming and unusual. For a single variable, an inlier is practically impossible to identify. In the multivariate case, however, the interrelationships between variables make it possible to identify values that are observed to be central in a distribution but would be expected, based on the other information in the data matrix, to be more outlying. We propose an approach to identify inliers in a data matrix based on the singular value decomposition. An application is presented using a table of economic indicators for the 27 member countries of the European Union in 2011, where inlying values are identified for some countries, such as Estonia and Luxembourg.
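To make the idea of "central but expected to be outlying" concrete, here is a minimal sketch under our own assumptions. It is a hypothetical heuristic and not the procedure proposed in this paper. The sketch standardizes each column and uses a low-rank SVD reconstruction as the value "expected" from the rest of the data matrix. It then scores each cell by how much more central the observed value is than that expectation. The function name `inlier_scores`, the `rank` parameter, column standardization, and the score `|fitted| - |observed|` are all illustrative choices.

```python
import numpy as np

def inlier_scores(X, rank=1):
    """Hypothetical inlier heuristic (not the paper's method).

    Returns a matrix the same shape as X. Large positive entries mark cells
    whose observed value is central, while a rank-`rank` SVD fit of the
    whole matrix predicts a more outlying value.
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns so that "central" means "close to the column mean".
    # Assumes no column is constant (a zero std would divide by zero).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Thin SVD of the standardized matrix.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Low-rank reconstruction: the value each cell is "expected" to take,
    # given the dominant structure shared with the other rows and columns.
    Z_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # Positive when the fit expects a far-out value but the observed one is near 0.
    return np.abs(Z_hat) - np.abs(Z)
```

In a strongly correlated table, a country that is extreme on most indicators but sits near the mean on one of them would receive a large score in that cell. Choosing `rank` trades off how much structure defines the "expected" value against how much genuine variation is absorbed into the fit.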