Nice, yeah, I would agree with the vast majority of this. The only thing I would add is that Adam/gradient methods are still useful in a graphical model, e.g. to get a MAP estimate (and once you find the MAP, you can get a rough posterior estimate using variational methods or a Laplace approximation around it). But I agree I wasn’t clear about what I meant by graphical models, since I think most people would understand graphical models to mean full MCMC sampling of the posterior and marginalization over hyperparameters. I would say it’s useful to understand why people do that and what it buys you, but many times it (1) is overkill and (2) inspires overconfidence in the result, because once we marginalize over our prior distribution, people tend to forget that the prior may have been a complete fudge. I just mean graphical models as a tool for model building, for understanding how different models relate to one another, and as a recipe for deriving a loss function.
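To make the "recipe for deriving a loss function" and "MAP then Laplace" points concrete, here is a minimal sketch on a toy model I'm assuming purely for illustration: observations y_i ~ N(mu, sigma^2) with known sigma, and a prior mu ~ N(0, tau^2). Writing down the negative log posterior gives squared error plus an L2 penalty, which is the loss. Plain gradient descent stands in for Adam here, and all names and parameter values (`map_and_laplace`, `lr`, `tau`, etc.) are made up for the example.

```python
import numpy as np

def neg_log_posterior_grad(mu, y, sigma, tau):
    # Loss read off the graphical model (up to a constant):
    #   sum_i (y_i - mu)^2 / (2 sigma^2)  +  mu^2 / (2 tau^2)
    # i.e. squared error plus an L2 penalty coming from the Gaussian prior.
    # This function returns its derivative with respect to mu.
    return -np.sum(y - mu) / sigma**2 + mu / tau**2

def map_and_laplace(y, sigma=1.0, tau=10.0, lr=0.01, steps=500, eps=1e-4):
    # Step 1: MAP estimate by plain gradient descent (Adam would also work).
    # lr is hand-picked for this toy problem; it must be below 2 / curvature.
    mu = 0.0
    for _ in range(steps):
        mu -= lr * neg_log_posterior_grad(mu, y, sigma, tau)

    # Step 2: Laplace approximation. Approximate the posterior by a Gaussian
    # centered at the MAP with variance 1 / (second derivative of the loss).
    # The second derivative is a central finite difference of the gradient.
    g_plus = neg_log_posterior_grad(mu + eps, y, sigma, tau)
    g_minus = neg_log_posterior_grad(mu - eps, y, sigma, tau)
    hessian = (g_plus - g_minus) / (2 * eps)
    return mu, 1.0 / hessian

# Synthetic data, assumed only for this example.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=50)

mu_map, var_laplace = map_and_laplace(y)
print("MAP estimate:", mu_map)
print("Laplace posterior std:", np.sqrt(var_laplace))
```

In this conjugate Gaussian case the Laplace approximation happens to match the exact posterior, so it is only a sanity check. For a non-Gaussian model the same two steps give just a rough local approximation, and the result is still only as good as the prior you fudged in.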