Graph-Theoretic Scagnostics: A Technical Reference

Based on Wilkinson, Anand, & Grossman (2005)

Abstract

Scagnostics (scatterplot diagnostics) are nine graph-theoretic measures that characterize properties of point clouds in scatterplots. These measures are derived from three foundational geometric structures: the Minimum Spanning Tree (MST), Convex Hull, and Alpha Shape.

1. Foundational Geometric Structures

Minimum Spanning Tree

Connects all points with minimum total edge length (Prim's algorithm).

✓ Deterministic

Convex Hull

Smallest convex polygon enclosing all points (Graham scan).

✓ Deterministic

Alpha Shape

Generalized hull capturing concave regions (circumradius ≤ α).

⚙️ α = P90(MST) × 1.5

2. The Nine Scagnostics Measures

1
Outlying MST
Σ(outlier edges) / Σ(all edges)

Ratio of long MST edges (> Q3 + 1.5×IQR).

⬆️ High = Isolated points
⚙️ 1.5 × IQR threshold
2
Skewed MST
Edge dist.
1 - (mean / max edge)

Asymmetry in edge length distribution.

⬆️ High = Right-skewed
3
Stringy MST
diameter(MST) / (n - 1)

Longest path relative to point count.

⬆️ High = Chain-like
4
Sparse α+Hull
gap
1 - (α-area / hull-area)

Empty space within convex hull.

⬆️ High = Holes/gaps
⚙️ Uses α parameter
5
Convex α+Hull
α-area / hull-area

How well points fill the hull.

⬆️ High = Uniform fill
⚙️ Uses α parameter
6
Clumpy MST
count(short) / count(all)

Ratio of very short edges (< Q1 - 1.5×IQR).

⬆️ High = Local clusters
⚙️ 1.5 × IQR threshold
7
Skinny Alpha
P² / (4π × A)

Isoperimetric quotient of alpha shape.

⬆️ High = Elongated
8
Striated Delaunay
parallel / total × 10

Parallel edges in Delaunay (within 5°).

⬆️ High = Layered
⚙️ 5° angle, ×10 mult.
9
Monotonic Coords
Spearman(x, y)|

Rank correlation between coordinates.

⬆️ High = Linear trend

3. Tunable Parameters Summary

Parameter Value Affects Notes
α P90(MST) × 1.5 Sparse, Convex, Skinny Controls alpha shape detail level
Outlier threshold Q3 + 1.5 × IQR Outlying Tukey boxplot definition
Cluster threshold Q1 - 1.5 × IQR Clumpy Often negative
Parallel angle Striated Parallelism tolerance
Min points (K) 5 All Below threshold → return 0

4. Historical Note

The 2005 paper included Straight (cstraight = dist(tj, tk) / diameter), later replaced with Sparse as Straight was redundant with Stringy and Monotonic.

References

[1] Wilkinson, L., Anand, A., & Grossman, R. (2005). Graph-Theoretic Scagnostics. IEEE InfoVis, 157-164.

[2] Tukey, J. W., & Tukey, P. A. (1985). Computer Graphics and Exploratory Data Analysis.