A couple of years ago, when I was just learning Python, I was playing around with matplotlib. Running simulations of a die roll 100, 1,000, 10,000, 100,000, and 1,000,000 times showed how the empirical distribution gradually converges to the expected 1/6 probability of each face. It made me think how good it would be to teach young students this way.
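A minimal sketch of that experiment (my own reconstruction, plotting aside): roll a fair die n times and watch the worst per-face deviation from 1/6 shrink as n grows.

```python
import random
from collections import Counter

def face_frequencies(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return face -> relative frequency."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    freqs = face_frequencies(n)
    max_dev = max(abs(f - 1 / 6) for f in freqs.values())
    print(f"n={n:>9}: max deviation from 1/6 = {max_dev:.4f}")
```

Feeding the frequencies into a bar chart at each n makes the convergence visible at a glance.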
Definitely! Also, not just young students. If you can get over code-phobia, doing random experiments in a class can be really illustrative. When I teach hypothesis testing, I always teach it both from a simulation perspective and from a traditional perspective.
For one, doing the simulation directly makes it easier to see the "under repeated sampling..." logic inherent in frequentist procedures. Additionally, simulation-based procedures still work where traditional methods break down (think: permutation tests).
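A minimal permutation-test sketch (my own illustration, not from the comment): shuffle the group labels many times and ask how often the shuffled difference in means is at least as extreme as the observed one.

```python
import random

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided p-value for a difference in means under label permutation."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        stat = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if stat >= observed:
            count += 1
    return count / n_perm

# Clearly separated groups should give a small p-value.
p = permutation_test([5.1, 5.3, 4.9, 5.2], [7.8, 8.1, 7.9, 8.0])
```

No normality assumption anywhere, which is exactly why this works when the traditional t-test machinery doesn't.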
In an effort to reduce screen time, I recently tried to instigate a game of classic table-top Dungeons & Dragons. And I swear, kids were even more interested in the BigInt N-sided die function I cribbed in a python shell than any demons or demigods ;)
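For the curious, such a die function is nearly a one-liner in Python (this is my guess at it, not the commenter's actual code), since Python ints are arbitrary precision:

```python
import random

def roll(n_sides, rng=random):
    """Roll a fair n_sides-sided die; n_sides can be arbitrarily large."""
    return rng.randint(1, n_sides)

d20 = roll(20)        # the classic D&D die
huge = roll(10**30)   # an absurdly many-sided die, why not
```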
Seeing Theory's interactivity is very interesting. I think if there is one canonical example to tie it all together, it would be something akin to "estimate the likelihood of an extremely rare event". Say you're a top astrophysicist at NASA and you have to brief the President on the improbability (not impossibility) of an extinction-level asteroid event, and you must justify how those beliefs are informed by, and change with, data. It ties everything together: physically based world models, event spaces, conditional probabilities, Monte Carlo sampling, and entropy estimation. And it would be really fun to boot!
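The "beliefs that change with data" part has a tiny toy version (all numbers below are hypothetical): put a Beta prior on the per-year impact probability and condition on observed impact-free years via the conjugate Beta-Binomial update.

```python
def beta_update(alpha, beta, impacts, years):
    """Conjugate Beta-Binomial update: returns the posterior (alpha, beta)."""
    return alpha + impacts, beta + (years - impacts)

def beta_mean(alpha, beta):
    """Posterior mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Start from a vague Beta(1, 1) prior, then observe 1000 impact-free years.
a, b = beta_update(1, 1, impacts=0, years=1000)
posterior_mean = beta_mean(a, b)  # belief shrinks toward "very improbable"
```

Each extra impact-free year nudges the posterior further toward improbability without ever reaching impossibility, which is exactly the briefing's point.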
I had a similar idea for teaching physics, or really scientific thinking and process more broadly: use a physics game engine (modelling our world or some esoteric one), then ask students to perform actions by forming theories of how the system behaves. They would move from a qualitative analysis of the system to building concrete quantitative theories, and in later stages their simple theories would fail as they deal with more complex interactions, forcing them to adapt their models of this world or rethink them entirely.
Imagine how much their thought process would change if they intimately understood how scientific modelling works.
A couple of years ago I was also a great fan of this paradigm, where you try to convince yourself that you understand a math concept by coding/simulating it (a procedural, rather than conceptual, understanding, if you will). For instance, I studied the so-called "Secretary Problem" using the tools you mention:
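For readers unfamiliar with it, here is my sketch of the classic 1/e strategy for the Secretary Problem (not the commenter's code): skip the first n/e candidates, then accept the first one better than everyone seen so far.

```python
import math
import random

def best_chosen(n, rng):
    """One trial: does the 1/e rule pick the single best of n candidates?"""
    ranks = list(range(n))  # higher number = better candidate
    rng.shuffle(ranks)
    cutoff = int(n / math.e)
    benchmark = max(ranks[:cutoff], default=-1)
    for rank in ranks[cutoff:]:
        if rank > benchmark:            # first candidate beating the benchmark
            return rank == n - 1        # accepted: was it the true best?
    return ranks[-1] == n - 1           # never triggered: stuck with the last one

def success_rate(n=50, trials=20_000, seed=0):
    rng = random.Random(seed)
    return sum(best_chosen(n, rng) for _ in range(trials)) / trials

rate = success_rate()  # should hover near 1/e ≈ 0.37
```

Simulating it is arguably more convincing than the derivation: the ~37% success rate just falls out of the trials.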
Generative models map well to programming concepts. Mixtures are quite similar to composition, and hierarchical models can be understood as inheritance. Lots of classical models like HMMs, LDA, etc. resemble the patterns presented in the GoF book, in the sense that they combine composition and inheritance in particularly interesting ways.
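The composition analogy can be made concrete with a loose sketch (my framing, not the comment's): a two-component Gaussian mixture that *holds* its component distributions rather than subclassing them.

```python
import random

class Gaussian:
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma

    def sample(self, rng):
        return rng.gauss(self.mu, self.sigma)

class Mixture:
    """Composition: the Mixture is built *from* component distributions."""
    def __init__(self, components, weights):
        self.components, self.weights = components, weights

    def sample(self, rng):
        # Pick a component by weight, then delegate sampling to it.
        component, = rng.choices(self.components, weights=self.weights)
        return component.sample(rng)

rng = random.Random(0)
mix = Mixture([Gaussian(-5, 1), Gaussian(5, 1)], weights=[0.5, 0.5])
draws = [mix.sample(rng) for _ in range(1000)]
```

Swapping a Gaussian for any other object with a `sample` method changes nothing in `Mixture`, which is the composition payoff in a nutshell.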