Can you please try and explain why that is? If true, you're very correct that lo...

orangecat · on Nov 4, 2017

Because an outlier in any single dimension will put the point outside the "center" of the distribution, and the more dimensions there are, the more likely it is that at least one coordinate is an outlier.

Say you have an N-dimensional Gaussian where each dimension has mean 0 and standard deviation 1. Define the center as the N-dimensional cube whose edges go from -3 to +3 in each dimension. A normally distributed value is within 3 standard deviations of the mean with probability 0.9973, and the coordinates are independent, so the probability that an N-dimensional point lies in the center is 0.9973^N. With N=4 that's 0.989, which matches your intuition, but at N=1000 it's 0.067 and at N=10000 it's 1.81e-12.
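The arithmetic above is easy to reproduce. This is a minimal sketch: `P_WITHIN_3SD` and `prob_in_center_cube` are hypothetical names, and the one-dimensional probability comes from the standard identity P(|Z| <= k) = erf(k / sqrt(2)) rather than the rounded 0.9973.

```python
import math

# P(|Z| <= 3) for a standard normal Z; erf(k / sqrt(2)) is the two-sided
# probability of landing within k standard deviations of the mean.
P_WITHIN_3SD = math.erf(3 / math.sqrt(2))  # about 0.9973


def prob_in_center_cube(n: int, p_1d: float = P_WITHIN_3SD) -> float:
    """Probability that an n-dimensional standard Gaussian point lies in the
    cube [-3, 3]^n. The coordinates are independent, so the per-axis
    probabilities multiply."""
    return p_1d ** n


for n in (4, 1000, 10000):
    print(n, prob_in_center_cube(n))
```

Passing `p_1d=0.9973` explicitly reproduces the rounded figures quoted in the comment.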

smallnamespace · on Nov 4, 2017

The center of the distribution always has the highest density, but the ratio of 'probability mass close to the centroid' / 'total probability mass' drops off as the number of dimensions grows.
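"Close to the centroid" isn't pinned down above; assuming it means "within Euclidean distance r of the mean", the fraction has a closed form, since the squared norm of a standard Gaussian point is chi-squared with n degrees of freedom. A sketch under that assumption (the helper names and the choice r = 1 are illustrative only), using the standard power series for the regularized lower incomplete gamma function:

```python
import math


def regularized_lower_gamma(a: float, x: float, tol: float = 1e-15) -> float:
    """P(a, x) via its power series; fine for the small x used here."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def mass_within_radius(n: int, r: float) -> float:
    """Probability that a standard n-dimensional Gaussian point lies within
    Euclidean distance r of the mean. ||X||^2 is chi-squared with n degrees
    of freedom, whose CDF at r^2 is P(n/2, r^2/2)."""
    return regularized_lower_gamma(n / 2, r * r / 2)


for n in (1, 2, 3, 10, 50):
    print(n, mass_within_radius(n, 1.0))
```

For n = 1 this is the familiar 68% within one standard deviation; the fraction then falls steadily as n grows.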

This is somewhat related to another 'curse of dimensionality' observation, which is that the volume of a hyperball divided by the volume of the hypercube enclosing it tends towards zero as dimensions grow: there's just a lot more volume that's in some sense 'far' from the center.
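The ball-versus-enclosing-cube ratio can be computed directly from the standard formula for the volume of the unit n-ball, pi^(n/2) / Gamma(n/2 + 1). A short sketch (the function name is made up for illustration):

```python
import math


def ball_to_cube_ratio(n: int) -> float:
    """Volume of the unit n-ball, pi^(n/2) / Gamma(n/2 + 1), divided by the
    volume 2^n of the cube [-1, 1]^n that just encloses it. Computed in log
    space so large n does not overflow."""
    log_ball = (n / 2) * math.log(math.pi) - math.lgamma(n / 2 + 1)
    return math.exp(log_ball - n * math.log(2))


for n in (2, 3, 5, 10, 20):
    print(n, ball_to_cube_ratio(n))
```

In 2D the disk fills pi/4 of its square; by n = 20 the ball fills less than one ten-millionth of its cube.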

eli_gottlieb · on Nov 5, 2017

>If true, you're very correct that lower-dimensional intuition does not transfer into higher-dimensional spaces: my intuition tells me that a Gaussian distribution drops off as you fall away from the mean, and it's quite easy for me to imagine that in 2 dimensions, 3 dimensions (e.g. by imagining a mound on a plane) and 4 dimensions (e.g. a cloud in 3-space with increased density around the mean).

Density is different from mass: mass is the integral of density. So your intuition is roughly correct for density, but you need to reconcile it with a good intuition for mass.
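One way to see density and mass pull apart, for a standard Gaussian: the mass at distance r from the mean is the density exp(-r^2/2) times the area of the sphere of radius r, which grows like r^(n-1). The sketch below (hypothetical helper names, brute-force grid search rather than calculus) finds where that product peaks; setting its derivative to zero gives r = sqrt(n - 1).

```python
import math


def log_radial_mass(n: int, r: float) -> float:
    """Log of the (unnormalised) probability mass per unit radius at distance r
    from the mean of a standard n-dimensional Gaussian: the density
    exp(-r^2/2) times the r^(n-1) growth of the sphere's surface area."""
    return (n - 1) * math.log(r) - r * r / 2


def most_likely_radius(n: int, r_max: float = 100.0, steps: int = 100_000) -> float:
    """Grid search for the radius where the mass concentrates."""
    grid = (r_max * i / steps for i in range(1, steps + 1))
    return max(grid, key=lambda r: log_radial_mass(n, r))


for n in (2, 10, 100, 1000):
    # The density peaks at r = 0, but the mass peaks near sqrt(n - 1).
    print(n, most_likely_radius(n), math.sqrt(n - 1))
```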

Since getting the mass requires an integral, getting the mass of an N-dimensional distribution means integrating over an N-dimensional region, i.e. N nested integrations. Each integration is, intuitively, a kind of sum. Integrating out many dimensions happens recursively, and looped or recursive addition is multiplication. So on some level, to get the probability mass of a region in N-dimensional space, you need to "multiply" a density.

Since the total probability mass is fixed at 1.0, adding more dimensions means the density has to be "multiplied" by a larger factor to produce the mass; equivalently, the mass has to be divided by a larger factor to recover the density. So although the density still peaks at the mean, the density available at any given point shrinks as the dimensionality rises.
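A concrete instance of the "multiply" picture, assuming a standard Gaussian with identity covariance: the joint density is a product of one-dimensional densities, so even the peak value at the mean is (2*pi)^(-N/2), which shrinks geometrically with N. A minimal sketch (function names are illustrative, not from any library):

```python
import math


def std_normal_pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def gaussian_density(point: list[float]) -> float:
    """Density of a standard multivariate Gaussian with identity covariance:
    a product of one-dimensional densities, one factor per dimension."""
    return math.prod(std_normal_pdf(x) for x in point)


for n in (1, 2, 10, 100):
    # Density at the mean equals (2*pi)^(-n/2), shrinking geometrically in n.
    print(n, gaussian_density([0.0] * n), (2 * math.pi) ** (-n / 2))
```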

nabla9 · on Nov 4, 2017

> it's quite easy for me to imagine that in 2 dimensions

It starts to fail really badly as the dimension grows.

Two simple examples:

1) Consider a 3-dimensional unit sphere centered at the origin and a unit cube centered at the origin. The cube is clearly completely inside the sphere. Now generalize to n dimensions: as n grows, almost all of the volume of the hypercube with side length 1 ends up outside the n-ball of radius 1.
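Example 1 is easy to check numerically. This is a Monte Carlo sketch (the function name, sample count, and seed are arbitrary choices): sample points uniformly in the side-1 cube and count how many land inside the radius-1 ball. The cube's corners sit at distance sqrt(n)/2, so the cube is fully inside the ball only up to n = 4.

```python
import math
import random


def cube_fraction_inside_ball(n: int, samples: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of the unit-side cube
    [-0.5, 0.5]^n that lies inside the radius-1 ball centred at the origin."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        sq_norm = sum(rng.uniform(-0.5, 0.5) ** 2 for _ in range(n))
        if sq_norm <= 1.0:
            inside += 1
    return inside / samples


for n in (3, 4, 6, 10, 20):
    # The cube's corners sit at distance sqrt(n)/2, which exceeds 1 once n > 4.
    print(n, math.sqrt(n) / 2, cube_fraction_inside_ball(n))
```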

2) Alternatively: almost all of the volume of an n-ball lies close to its surface.
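Example 2 follows from volume scaling as r^n: the inner ball of radius (1 - eps) holds (1 - eps)^n of the total, so the outer shell holds the rest. A one-function sketch (name chosen for illustration), with eps measured relative to the radius:

```python
def shell_fraction(n: int, eps: float) -> float:
    """Fraction of an n-ball's volume within eps (relative to the radius) of
    its surface. Volume scales as r^n, so the inner ball of radius (1 - eps)
    holds (1 - eps)^n of the total."""
    return 1 - (1 - eps) ** n


for n in (3, 100, 1000):
    print(n, shell_fraction(n, 0.01))
```

A shell 1% thick holds about 3% of a 3D ball's volume, but more than 99.99% of a 1000-dimensional one.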

These are all very counterintuitive, yet simple-to-check toy examples. When you start to integrate more complex multidimensional functions, things get weird really fast.

tzahola · on Nov 5, 2017

>Alternatively almost all volume of n-sphere is close to the surface.

How does this go against intuition?

Intuition from 1/2/3d tells me that the volume of an N-ball is proportional to r^N, and indeed that holds in higher dimensions too. Therefore it’s easy to see that the ratio between the volume of an N-ball of radius (r + epsilon) and that of an N-ball of radius r, ((r + epsilon)/r)^N, grows exponentially with N.

vbuwivbiu · on Nov 6, 2017

Isn't this just because we're comparing n-dimensional objects using the 2-norm? I.e. the dimension of the space grows but we keep the order of the norm fixed. If we used the p-norm with p equal to the dimension of the space, maybe that would give intuitive results?
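One way to poke at this numerically, assuming "intuitive" means the unit ball keeps a non-vanishing share of its enclosing cube [-1, 1]^n: the unit p-norm ball in n dimensions has the known volume (2 * Gamma(1 + 1/p))^n / Gamma(1 + n/p). The sketch below (hypothetical function name) compares fixed p = 2 with p = n; it only explores that one reading of the question and says nothing about the Gaussian concentration effects discussed above.

```python
import math


def p_ball_to_cube_ratio(n: int, p: float) -> float:
    """Volume of the unit p-norm ball in n dimensions,
    (2 * Gamma(1 + 1/p))^n / Gamma(1 + n/p),
    divided by the volume 2^n of the enclosing cube [-1, 1]^n.
    Worked in log space to avoid overflow."""
    log_ratio = n * math.lgamma(1 + 1 / p) - math.lgamma(1 + n / p)
    return math.exp(log_ratio)


for n in (2, 10, 100, 1000):
    fixed = p_ball_to_cube_ratio(n, 2)    # 2-norm, as in the examples above
    matched = p_ball_to_cube_ratio(n, n)  # p equal to the dimension
    print(n, fixed, matched)
```

With p fixed at 2 the ratio collapses toward zero; with p = n it instead tends to e^(-gamma), about 0.56, where gamma is the Euler-Mascheroni constant, because n * ln Gamma(1 + 1/n) approaches -gamma.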