I think that’s a pretty apt comparison. A latent variable (or latent factor, in PCA terms) is basically a direction in an n-dimensional space, where n is the length of the embedding vector. That direction lines up with some pattern of variance in the input data. Oftentimes it represents something with a useful meaning (“dogness” vs “catness”, for example), but it could also just capture a correlation that has no interpretable meaning.
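If it helps to see the "direction in n-dimensional space" idea concretely, here is a rough NumPy sketch. Everything in it is made up for illustration: the data is synthetic, and `true_direction`, `strength`, and the noise scale are my own assumptions, not anything from a real embedding model. It plants one hidden factor along a known direction and checks that PCA (done here via SVD) recovers roughly that direction.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5  # length of each vector (dimensionality of the space)

# Hypothetical "hidden factor": a unit-length direction in n-dimensional space.
true_direction = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
true_direction /= np.linalg.norm(true_direction)

# Each sample has some amount of the factor, plus small isotropic noise.
strength = rng.normal(scale=3.0, size=(1000, 1))
noise = rng.normal(scale=0.3, size=(1000, n))
X = strength * true_direction + noise

# PCA via SVD: center the data, then the rows of Vt are the principal directions.
X_centered = X - X.mean(axis=0)
_, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
top_direction = Vt[0]

# The sign of a principal direction is arbitrary, so compare with abs().
alignment = abs(top_direction @ true_direction)

# Projecting a sample onto the direction gives its score on the latent factor.
scores = X_centered @ top_direction

print("alignment with planted direction:", alignment)
```

Here the top direction is interpretable only because I planted it. On real data you get directions the same way, but nothing guarantees that any given one maps to a human-readable concept.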
This is probably a dumb question but if we’re talking about language embeddings, are the latent vectors deterministically out of vocabulary? Is there any possibility of collision with an in-vocab n-gram’s vector?