Generalization Method when Manipulating Relational Databases
Abstract
Contemporary computers generate massive datasets. One way to handle these data is to aggregate them into smaller datasets, with the aggregation criteria dictated by the scientific questions of interest. This paper focuses on aggregations that produce interval datasets. Algorithms are introduced both to build intervals that are typically homogeneous and to test whether such homogeneity holds. They also test whether observations across the resulting intervals are mixtures of uniform distributions rather than the desired single distribution. The algorithms also take outlier observations into account. The methods are illustrated on two datasets.