Latent Space Analysis#

inspect_latent_space() explores the geometric structure of high-dimensional representation spaces such as transformer hidden states, VAE latent codes, contrastive-learning embeddings, and GAN feature maps.

What it renders#

Scatter cloud — one point per sample, coloured by class label.
Class centroids — diamond markers at the weighted centre of each cluster.
Mahalanobis ellipsoids — confidence ellipsoids that account for the full covariance structure of each class, not just its spread along each axis.
Convex hulls — transparent polyhedra bounding each class cluster.
Variance annotation — the explained variance ratios for the three projected dimensions (shown for PCA only).
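The centroid and hull overlays can be reproduced outside the library for a quick sanity check. The following is a minimal sketch, not the library's implementation: class_geometry is a hypothetical helper, the centroid is assumed to be the plain unweighted mean of each class, and the hull comes from scipy.spatial.ConvexHull applied to points that are already projected to 3-D.

```python
import numpy as np
from scipy.spatial import ConvexHull


def class_geometry(points_3d, labels):
    """Return {label: (centroid, convex_hull)} for projected 3-D points.

    Hypothetical helper for illustration; not part of geolatent.
    """
    geometry = {}
    for label in np.unique(labels):
        pts = points_3d[labels == label]
        centroid = pts.mean(axis=0)   # assumption: unweighted class mean
        hull = ConvexHull(pts)        # needs at least 4 non-coplanar points
        geometry[int(label)] = (centroid, hull)
    return geometry


# Two synthetic classes, already in 3-D
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0, 0, 0), size=(100, 3)),
    rng.normal(loc=(6, 0, 0), size=(100, 3)),
])
labels = np.repeat([0, 1], 100)

geometry = class_geometry(points, labels)
for label, (centroid, hull) in geometry.items():
    print(label, centroid.round(2), "hull faces:", len(hull.simplices))
```

Each row of hull.simplices holds three vertex indices for one triangular face, which is the form most 3-D mesh plotting calls expect.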

Projection methods#

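Three projection methods are supported: "pca", "tsne", and "umap". Select one by passing a config built with with_method(), as in the examples on this page.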

For embeddings with a known low-dimensional signal structure (e.g. cluster means separated along a few axes), set scale_input=False. Otherwise StandardScaler rescales every dimension to unit variance, which inflates the noise-only dimensions relative to the signal axes and washes out the signal.

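from geolatent import inspect_latent_space, DARK_SCIENTIFIC

# Work on a copy of the preset rather than the preset itself
cfg = DARK_SCIENTIFIC.copy()
cfg.projection.scale_input = False  # skip StandardScaler before projecting

# embeddings: (n_samples, n_features) array; labels: (n_samples,) class ids
fig = inspect_latent_space(embeddings, labels, config=cfg.with_method("pca"))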
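The effect of scaling can be checked numerically without the library. The sketch below uses only NumPy: top3_explained_variance is a hypothetical helper that computes PCA through an SVD, and scaling is emulated by dividing each centred dimension by its standard deviation, which is the operation StandardScaler performs. The synthetic data mirrors the layout described above, with four cluster means separated along three axes and weak noise elsewhere.

```python
import numpy as np


def top3_explained_variance(x, scale):
    """Explained variance ratios of the first three principal components."""
    x = x - x.mean(axis=0)
    if scale:
        x = x / x.std(axis=0)   # unit variance per dimension
    singular_values = np.linalg.svd(x, compute_uv=False)
    variances = singular_values ** 2
    return variances[:3] / variances.sum()


rng = np.random.default_rng(0)
D = 64

# Cluster means differ only in the first three dimensions
means = np.zeros((4, D))
means[1, 0] = 8
means[2, 1] = 8
means[3, :3] = (4, 4, 6)

# Larger noise on the signal axes, small noise everywhere else
noise = np.full(D, 0.15)
noise[:3] = 0.7
x = np.vstack([rng.normal(size=(200, D)) * noise + m for m in means])

unscaled = top3_explained_variance(x, scale=False)
scaled = top3_explained_variance(x, scale=True)
print("top-3 variance share, unscaled:", unscaled.sum().round(3))
print("top-3 variance share, scaled:  ", scaled.sum().round(3))
```

Without scaling, the first three components carry most of the total variance. After scaling, the 61 noise dimensions each contribute as much variance as a signal axis, so the share captured by three components collapses.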

Example — sentence embeddings#

import numpy as np
from geolatent import inspect_latent_space, DARK_SCIENTIFIC

rng = np.random.default_rng(42)
D = 64

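# Four cluster means that differ only in the first three dimensions
cluster_means = [
    np.array([0, 0, 0]   + [0] * (D - 3), dtype=float),
    np.array([8, 0, 0]   + [0] * (D - 3), dtype=float),
    np.array([0, 8, 0]   + [0] * (D - 3), dtype=float),
    np.array([4, 4, 6]   + [0] * (D - 3), dtype=float),
]
# Per-dimension noise scale: larger on the three signal axes, small elsewhere
noise = np.full(D, 0.15)
noise[:3] = 0.7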

embeddings = np.vstack([
    rng.normal(size=(200, D)) * noise + m for m in cluster_means
])
labels = np.repeat([0, 1, 2, 3], 200)

cfg = DARK_SCIENTIFIC.copy()
cfg.projection.scale_input = False

for method in ("pca", "tsne", "umap"):
    inspect_latent_space(
        embeddings, labels,
        config=cfg.with_method(method),
        show_ellipsoids=True,
        show_convex_hulls=(method == "pca"),
        class_names={0: "Science", 1: "Politics", 2: "Arts", 3: "Sport"},
        title=f"Sentence embeddings — {method.upper()}",
    ).show()

Ellipsoid confidence#

The ellipsoid_confidence parameter (default 0.90) sets the probability mass enclosed by each Mahalanobis ellipsoid, assuming a Gaussian class distribution. Lower values produce smaller ellipsoids that highlight the dense core of each cluster.

fig = inspect_latent_space(
    embeddings, labels,
    show_ellipsoids=True,
    ellipsoid_confidence=0.70,   # tighter ellipsoids
)
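One common way to turn a confidence level into an ellipsoid size, under the Gaussian assumption, is to set the squared Mahalanobis radius to the chi-square quantile with as many degrees of freedom as there are projected dimensions. The sketch below illustrates that relationship with NumPy and SciPy. Whether the library computes its ellipsoids this way is an assumption, and ellipsoid_axes is a hypothetical helper.

```python
import numpy as np
from scipy.stats import chi2


def ellipsoid_axes(points, confidence):
    """Centre, semi-axis lengths and axis directions of a confidence ellipsoid.

    Hypothetical helper for illustration; assumes Gaussian-distributed points.
    """
    centre = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # covariance is symmetric
    # Squared Mahalanobis radius enclosing `confidence` of a Gaussian
    radius_sq = chi2.ppf(confidence, df=points.shape[1])
    return centre, np.sqrt(eigvals * radius_sq), eigvecs


rng = np.random.default_rng(0)
cov_true = np.array([[2.0, 0.8, 0.0],
                     [0.8, 1.0, 0.3],
                     [0.0, 0.3, 0.5]])
points = rng.multivariate_normal(mean=np.zeros(3), cov=cov_true, size=5000)
cov_inv = np.linalg.inv(np.cov(points, rowvar=False))

for confidence in (0.70, 0.90):
    centre, semi_axes, _ = ellipsoid_axes(points, confidence)
    # Fraction of samples whose squared Mahalanobis distance is inside the radius
    diff = points - centre
    d_sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    inside = np.mean(d_sq <= chi2.ppf(confidence, df=3))
    print(confidence, semi_axes.round(2), round(float(inside), 3))
```

The printed fraction should sit close to the requested confidence when the class really is Gaussian. A large gap is a hint that the ellipsoid is a poor summary of that cluster.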