Abstract
Scagnostics (scatterplot diagnostics) are nine graph-theoretic measures that characterize properties of point clouds in scatterplots. These measures are derived from three foundational geometric structures: the Minimum Spanning Tree (MST), Convex Hull, and Alpha Shape.
Minimum Spanning Tree (MST): Connects all points with minimum total edge length (Prim's algorithm).
Convex Hull: Smallest convex polygon enclosing all points (Graham scan).
Alpha Shape: Generalized hull capturing concave regions (circumradius ≤ α).
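As a rough illustration of the first two structures, the sketch below builds an MST with an O(n²) form of Prim's algorithm and a convex hull with Andrew's monotone chain, a close relative of the Graham scan named above rather than the Graham scan itself. The function names `prim_mst` and `convex_hull` are hypothetical, and the alpha shape is left out because it needs a Delaunay triangulation.

```python
import math


def prim_mst(points):
    """Return MST edges as (i, j, length) tuples using O(n^2) Prim's algorithm."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link from each vertex to the tree
    best_from = [-1] * n        # tree vertex that provides that link
    best_dist[0] = 0.0          # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Relax the links of the remaining vertices against the new tree vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Drop points that would create a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]
```

Points are expected as `(x, y)` tuples. The lengths in the third field of each MST edge are the input to the edge-length-based measures.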
Outlying: Ratio of long MST edges (> Q3 + 1.5 × IQR).
Skewed: Asymmetry in the edge length distribution.
Stringy: Longest path relative to point count.
Sparse: Empty space within the convex hull.
Convex: How well the points fill the hull.
Clumpy: Ratio of very short edges (< Q1 - 1.5 × IQR).
Skinny: Isoperimetric quotient of the alpha shape.
Striated: Parallel edges in the Delaunay triangulation (within 5°).
Monotonic: Rank correlation between coordinates.
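Two of the measures above can be sketched with the standard library alone. The rank-correlation measure (Monotonic) is shown as a squared Spearman coefficient, and the edge-length asymmetry measure (commonly called Skewed) as a ratio of decile differences, (q90 − q50) / (q90 − q10). Both formulas are assumptions taken from the usual scagnostics formulation, not from the descriptions above, and the function names are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def monotonic(xs, ys):
    """Squared Spearman rank correlation between the two coordinates."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant coordinate has no defined correlation
    return (cov / math.sqrt(vx * vy)) ** 2


def skewed(edge_lengths):
    """Decile-based asymmetry of the edge lengths (assumed formula)."""
    deciles = statistics.quantiles(edge_lengths, n=10, method="inclusive")
    q10, q50, q90 = deciles[0], deciles[4], deciles[8]
    spread = q90 - q10
    return (q90 - q50) / spread if spread > 0 else 0.0
```

Under these definitions any strictly increasing or strictly decreasing relationship scores 1 on `monotonic`, and a symmetric edge-length distribution scores about 0.5 on `skewed`.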
| Parameter | Value | Affects | Notes |
|---|---|---|---|
| α | P90(MST) × 1.5 | Sparse, Convex, Skinny | Controls alpha shape detail level |
| Outlier threshold | Q3 + 1.5 × IQR | Outlying | Tukey boxplot definition |
| Cluster threshold | Q1 - 1.5 × IQR | Clumpy | Often negative |
| Parallel angle | 5° | Striated | Parallelism tolerance |
| Min points (K) | 5 | All | Below threshold → return 0 |
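The parameters in the table can be derived from the MST edge lengths roughly as follows. This is a minimal sketch: the names `MIN_POINTS`, `thresholds`, and `outlying` are hypothetical, the quantile interpolation method (`inclusive`) is an assumption, and reading "ratio of long MST edges" as a length-weighted ratio rather than a count ratio is also an assumption.

```python
import statistics

MIN_POINTS = 5  # "Min points (K)" from the table


def thresholds(edge_lengths):
    """Derive alpha and the two Tukey fences from MST edge lengths."""
    q1, _, q3 = statistics.quantiles(edge_lengths, n=4, method="inclusive")
    iqr = q3 - q1
    # The 90th percentile is the last of the nine decile cut points.
    p90 = statistics.quantiles(edge_lengths, n=10, method="inclusive")[-1]
    return {
        "alpha": 1.5 * p90,          # P90(MST) x 1.5
        "outlier": q3 + 1.5 * iqr,   # upper Tukey fence
        "cluster": q1 - 1.5 * iqr,   # lower Tukey fence
    }


def outlying(edge_lengths, n_points):
    """Share of total MST length carried by edges above the outlier fence."""
    if n_points < MIN_POINTS:
        return 0.0  # below the minimum point count every measure returns 0
    fence = thresholds(edge_lengths)["outlier"]
    long_total = sum(e for e in edge_lengths if e > fence)
    return long_total / sum(edge_lengths)


# Edge lengths 1..5 give Q1 = 2 and Q3 = 4, so the lower fence is negative.
example = thresholds([1, 2, 3, 4, 5])
```

Because edge lengths are never negative, a negative cluster threshold means no edge can fall below it, which is what the "Often negative" note in the table implies for this formulation.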
The 2005 paper [1] included a Straight measure (c_straight = dist(t_j, t_k) / diameter), which was later replaced by Sparse because Straight was redundant with Stringy and Monotonic.
References
[1] Wilkinson, L., Anand, A., & Grossman, R. (2005). Graph-Theoretic Scagnostics. IEEE InfoVis, 157-164.
[2] Tukey, J. W., & Tukey, P. A. (1985). Computer Graphics and Exploratory Data Analysis.