Going Critical (meltingasphalt.com)
989 points by bkudria on May 14, 2019 | 70 comments



This essay feels like an instant classic. It's a very thoughtful blend of prose and code, in service of teaching some wildly relevant lessons about networks.

One eye-opener is the extent to which the "Degree" parameter reduces the critical threshold.

> The degree of a node is the number of neighbors it has. Up to this point, we’ve been looking at networks of degree 4. But what happens when we vary this parameter?

And here's the power of this interactive essay: you can try it yourself. It's a toy model, but it makes a visceral argument. It adds up to the kind of media we dreamed of for the world wide web.

The degree parameter has exploded via social networks, greatly lowering the critical threshold of idea transmission. Our cultural DNA is being revised at a supercritical pace. This piece helps make a little sense of it in a way that static words couldn't.


Real networks are not lattices; they have varied degree distributions. The famous result of Marián Boguña and Alessandro Vespignani shows that for scale-free networks (which closely resemble the networks we see in the real world), the epidemic threshold can be arbitrarily low: https://arxiv.org/pdf/cond-mat/0208163.pdf.

I have also published some research along these lines: https://arxiv.org/pdf/1403.5815.pdf


Real social networks aren't even networks; they are simplexes.


Ok. I don't know the word simplex, but after brief googling, it looks like simplexes are a type of network.

Could you explain what you mean a little bit more?


Why are they simplexes? I'm reading about simplexes, and having a hard time visualizing how they map to social networks


The hypervolumes between vertices can have single weights. Say you have a pump that is contaminated with cholera. You can either model the pump as a node on the graph with every person drinking from it being connected by an edge, allow an arbitrary number of edges between any two nodes, or allow for a relationship that can connect an arbitrary number of nodes.
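A minimal sketch of those three modeling choices in Python (the names and the weight are invented for illustration):

    from itertools import combinations

    drinkers = ["alice", "bob", "carol", "dave"]

    # (1) Pump as an explicit node: bipartite person <-> pump edges.
    bipartite_edges = [(person, "pump") for person in drinkers]

    # (2) Pairwise-only graph: the single shared exposure gets smeared
    #     across a clique of 6 separate edges/weights.
    clique_edges = list(combinations(drinkers, 2))

    # (3) One hyperedge (a simplex) connecting all exposed people at once,
    #     carrying a single weight for the whole relationship.
    hyperedge = {"members": frozenset(drinkers), "weight": 0.8}

    print(len(bipartite_edges), len(clique_edges), len(hyperedge["members"]))  # 4 6 4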

I built a rather sophisticated simplex-based trade analyser for one of my contracts for a broker trader. From what I've heard, it's given them an edge, since no one else even knows about it. It's been three years, so my NDA and non-compete are finished. I might get around to writing it up if I don't get hired to do another one.


By hypervolumes, do you mean all the (n-1)-dimensional 'faces' (n > 1) formed by a vertex and its neighbours in an n-dimensional simplex, or a mesh of those? To assign unique weights to all interactions between a vertex and up to n-1 of its neighbours, I assume.


Yeah. And because they are sparse the representation is both tiny and stupidly powerful.


Would you care to share some pointers to materials where one could learn about simplicial representations of things traditionally modeled by networks?


There aren't any that I could find; I had to do everything from first principles, and the notebooks were left with my employer.


Maybe you mean something like a simplicial complex? I don't understand how you can do very much with a single simplex (which is a generalization of a triangle and a tetrahedron in the language I'm used to).


I think he is talking about a simplex tree.


Your comment informs little.

Do you mean that real social networks are to networks as tensors are to vectors? More dimensions?

If so, what do those dimensions represent?

Or am I missing your point (which is ... not very clearly made).


Regarding the importance of the degree distribution, we have pretty solid theoretical results showing that it's actually the spectrum of the adjacency matrix that acts as a global metric for how well epidemics spread. I always like to recommend Ganesh et al.'s article [1] for how we came to understand this phenomenon, but also Prakash et al.'s paper [2] for their theorem, which holds for a very large class of epidemic models.

What's pretty interesting is that the largest eigenvalue of the adjacency matrix of an undirected graph lies between the average degree and the maximum degree, so the gut feeling you get when playing with the degree of a graph is legitimate.
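A quick numerical check of that sandwich bound (a toy random graph in Python, not taken from the cited papers):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 200, 0.05
    A = np.triu(rng.random((n, n)) < p, 1).astype(float)
    A = A + A.T                                # symmetric adjacency, no self-loops

    degrees = A.sum(axis=1)
    lambda_max = np.linalg.eigvalsh(A)[-1]     # largest eigenvalue of a symmetric matrix

    print(degrees.mean(), lambda_max, degrees.max())
    # average degree <= lambda_max <= maximum degree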

I will jump on the opportunity to shamelessly self-plug our most recent work [3] on how to modify the topology of a network to have the epidemic go subcritical and quickly disappear.

The basic idea in our paper is to keep the maximum number of edges from the original graph under the constraint that the adjacency spectrum is bounded. Since that's an NP-hard problem, we go for an approximation algorithm.

In any case, Melting Asphalt's essays are really an example to follow! A gold standard for expository material!

[1] Ganesh et al. https://ieeexplore.ieee.org/abstract/document/1498374

[2] Prakash et al. https://link.springer.com/article/10.1007/s10115-012-0520-y

[3] Bazgan et al. https://link.springer.com/chapter/10.1007/978-3-030-04651-4_...


"Instant classic" is exactly how I felt - I "knew" all the parts of the article but it has a new context and direction that gives a moment of clarity.

Cannot recommend this enough.

And on a slightly related note, a recent Talking Politics podcast had Sir David King, who was the UK's Chief Scientific Adviser under Blair. Early on there was a terrible Foot and Mouth outbreak, and Blair was at a loss as to how to prevent it spreading from farm to farm. But King understood SIR/SIS models (and knew the experts who wrote the books) and said, "Give me carte blanche and we will fix this - by day X we will see a tipping point." The Army shut down every farm, on day X the infections stopped, and King had sufficient political capital to push hard on things like the Paris climate treaty (which the UK had a lot of influence on).

In other words, understanding the ideas in this article led to a global first on CO2 reduction.

Science Works Bitches


link to that podcast?



One of the more influential papers on my early thinking of distributed systems was the Xerox paper on Grapevine, their distributed naming service that used a viral model for propagating updates across their networks.

The most interesting part was transmitting 'dead' information. In the linked article things are alive or not. Now expand this article so that you have a series of things that are being transmitted (maybe blue, red, and green dots to represent each). And then you want to 'kill green' so that green is no longer considered a valid state. Networks need to retain the notion of a tombstone (in epidemiology, a residual antibody) to quash flare-ups of previously killed information (which can happen when a node that has been down for a while rejoins the network).

It makes for some great coding exercises!
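A toy sketch of the tombstone idea in Python (my own illustration, not the Grapevine design itself):

    # Each node's state: live items plus tombstones for killed items.
    # Merging two states never resurrects a killed item, even when a node
    # that was offline for a while rejoins still carrying it as "live".
    def merge(a, b):
        dead = a["dead"] | b["dead"]
        live = (a["live"] | b["live"]) - dead
        return {"live": live, "dead": dead}

    healthy_node  = {"live": {"blue", "red"}, "dead": {"green"}}    # green was killed
    rejoined_node = {"live": {"blue", "green"}, "dead": set()}      # stale: still carries green

    print(merge(healthy_node, rejoined_node))
    # -> live {'blue', 'red'}, dead {'green'}; green stays dead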


You can use the concept of criticality to revise and revive the Sapir-Whorf hypothesis.

In its blunt form (language limits what we can say), Sapir-Whorf runs contrary to everyday experience. Sure, if one's native language contains le mos juste, it is easy to speak one's mind. But if not, the burden is not great. One must speak at greater length, using more words, and forming the intersections and unions of their meanings, to obtain the exact nuance that you intend. This is the routine craftsmanship of every wordsmith.

Early in the essay Kevin Simler poses a challenge: "Here's an SIS network to play around with. Can you find its critical threshold?" What is most interesting is not the numerical value; let's just call it x. What is most interesting is that it is fairly sharply defined. If two ideas are both fairly hard to transmit, and hence both close to x, we could easily have a situation in which the burden imposed by a missing mos juste makes all the difference. One idea has a transmissibility just above x and becomes an established staple of the culture. The other idea has a transmissibility just below x, so it crops up from time to time but always dies out. (A toy simulation of this knife edge is sketched at the end of this comment.)

One looks around, admiring the cultural landscape. One idea is present, one absent. Why? Language! While it is wrong to claim that "a person's native language determines how he or she thinks", we have to take account of network criticality.

The much-weakened Whorf-style claim that "a person's native language burdens their communications with trivial inconveniences" is plausible and unimportant at the individual level. But we may nevertheless find that "a social network's native language determines which thoughts die out and which ones take over most of the network."

Compare and contrast with Beware Trivial Inconveniences https://www.lesswrong.com/posts/reitXJgJXFzKpdKyd/beware-tri..., which claims that trivial inconveniences have real-world potency without needing the leverage provided by network effects.
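And the promised sketch of the knife edge: a crude SIS-style simulation on a 4-neighbour grid (my own toy rules, not the essay's exact model):

    import random

    def run(beta, size=40, steps=300, seed=1):
        # Each infected cell infects each of its 4 neighbours with probability
        # beta, then recovers (it can be re-infected on later steps).
        random.seed(seed)
        infected = {(size // 2, size // 2)}
        for _ in range(steps):
            nxt = set()
            for (x, y) in infected:
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < size and 0 <= ny < size and random.random() < beta:
                        nxt.add((nx, ny))
            infected = nxt
            if not infected:
                break
        return len(infected)

    print(run(beta=0.15), run(beta=0.40))
    # Far below the threshold the outbreak typically dies out; far above it
    # typically persists.  Values near ~0.25 sit close to the critical point,
    # where tiny differences in beta flip the long-run outcome - which is the
    # "sharply defined" property the argument above leans on.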


Honestly, the weak form of Sapir-Whorf absolutely does reflect my everyday experience; I figured it out way before reading about it, by pondering my "inner dialogue" - I run it bilingually, switching between English and Polish at the sub-sentence level, always using the language that makes it easier to think a particular thought.

> But if not, the burden is not great. One must speak at greater length, using more words, and forming the intersections and unions of their meanings, to obtain the exact nuance that you intend. This is the routine craftsmanship of every wordsmith.

This is a nice way of putting it, but I question how "easy" and "routine" it is. People can do this, which is why the strong form of Sapir-Whorf sounds too strong, but it's not free - and as the "Trivial Inconveniences" article shows, that's enough for it not to be done, especially if alternatives like "picking up a similar but not-quite-right word" or "not thinking the thought at all" exist.

I feel this could be especially impactful on imagination (the problem-solving kind), which can be viewed as a randomized reverse-lookup[0]. The brain suggests things connected to what you're thinking about, and - at least in my experience - they usually come up as words or phrases. If you don't have a word for a concept, you may not think of that concept, or of the concepts related to it. Not that you couldn't think of it, just that you usually and initially won't.

One could think of language as a cache of those "intersections and unions of meanings" that have proven themselves to be useful. Viewed like this, it's an optimization trick, but we observe that everything we do and think is time- and energy-constrained, so such optimizations can make the difference (especially at a population level) in how precisely you think a thought before you accept it as "good enough".

--

[0] - Meta: the way I figured out this idea actually involved the brain suggesting the word "reverse-lookup" to me, and me going on from there. My native Polish doesn't have a word for "lookup", let alone "reverse lookup", so I wonder what I would have come up with if I didn't know English.


Sapir-Whorf may run even deeper; Simler's book [0] argues persuasively that social signaling is the primary motivator not only of language, but of cognition itself. So perhaps Sapir-Whorf puts the cart before the horse: we actually think in networks, based on our predictions of what will be socially tractable, and the language/vocabulary is merely an evolving toolkit to that end (with occasional innovation of a viral metaphor/portmanteau/etc).

[0] http://elephantinthebrain.com/


mot (not mos)


I am struck in particular by the 'expert --> idea' simulation. It suggests that an effective strategy for beating the competition to the punch in making a breakthrough discovery is to concentrate a diverse collection of expertise (it also explains why it pays to be very social in your career as a researcher).

As mentioned in the article, putting specialists together in the same room is one way to accomplish this, but I can imagine the same happening in the mind of a single polymath, who, though perhaps being mediocre in several subjects, connects enough dots to beat the competition to combining them in a novel way. It might also make sense to recruit a few such polymaths/generalists to be put in your room of distinct experts, since they might serve well as a sort of 'interconnect bus' between them.


> I am struck in particular by the 'expert --> idea' simulation. It suggests that an effective strategy for beating the competition to the punch in making a breakthrough discovery is to concentrate a diverse collection of expertise (it also explains why it pays to be very social in your career as a researcher).

That's what Think Tanks and Research Divisions used to be: you'd put a bunch of smart people from different disciplines together, give them some money and a vague direction, and stand back.

Building 20 at MIT is another example I can think of, or the Colossus project during WWII (Tommy Flowers and Alan Turing were from radically different backgrounds - Turing was a pure theoretical genius, Flowers was an apprentice mechanical engineer who put himself through night school to learn electrical engineering and then worked for the Post Office - together they (and others) built the Colossus Mk1).


I vaguely remember a short story (I think by either Asimov or Clarke) where there were specialists whose entire job was to read and write across a couple of different fields - say, particle physics and farming. They didn't perform any experiments, but instead kept up with current research in their fields and searched for general patterns that united them. I've never been able to find the story again, though...


I think the conventional term for that strategy is "the creative process."


Fair point! :-) What can I say, my training is in math, where we take great pleasure in squinting at obvious things until we feel insightful.


Positively pleonasmic! :)


I'll echo a lot of the praise about the format of this post. It was fun clicking and watching the figures animate as I changed parameters.

I was sad that the author missed a number of chances to get into more detail about the classic reaction-diffusion problem [1]. I was reminded of a small project I did which produced similar animations, though with periodic boundary conditions, for learning about the Gray-Scott model. These websites are pretty helpful [2][3].

I haven't ever taken a class on systems, so I don't know, but after reading this I wonder if the propagation of "scientific bullshit" and "truth" through a network can instead be modeled chemically, as in a reaction-diffusion model. The last figure shows real knowledge fizzling out because it turns fake. It also lacks a slider, so I can't play with the parameters, but there should be some point where they oscillate back and forth, i.e. a Hopf bifurcation or a Turing bifurcation. Adding a bit more complexity might add some more depth to this post. I hope there will be a sequel!

[1] https://en.wikipedia.org/wiki/Reaction%E2%80%93diffusion_sys...

[2] https://groups.csail.mit.edu/mac/projects/amorphous/GrayScot...

[3] https://www.karlsims.com/rd.html
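For anyone curious, a bare-bones Gray-Scott update with periodic boundaries (my own toy parameters; a sketch rather than a faithful reproduction of the references above):

    import numpy as np

    def laplacian(Z):
        # 5-point stencil with periodic (wrap-around) boundary conditions
        return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
                np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

    def step(U, V, Du=0.16, Dv=0.08, F=0.035, k=0.065, dt=1.0):
        UVV = U * V * V
        U += dt * (Du * laplacian(U) - UVV + F * (1 - U))
        V += dt * (Dv * laplacian(V) + UVV - (F + k) * V)
        return U, V

    n = 100
    U, V = np.ones((n, n)), np.zeros((n, n))
    V[45:55, 45:55] = 0.5                  # seed a patch of the second species
    for _ in range(2000):
        U, V = step(U, V)
    print(V.min(), V.max())                # plot V over time to watch patterns form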


Is the SIR model a Markov model? The article doesn't mention whether or not infection can be modeled as a Markov process, but based on the graphics it looks like it.

If I understand correctly the probabilistic infection rate is "history-less"; in other words, the probability of infecting an adjacent neighbor in the current state is not determined by the state transitions of any previous iterations.

It looks like you could model this naively with a discrete time Markov chain using a 3x3 stochastic matrix and three states: healthy, infected and deceased. I would guess you could do the same thing for the SIS model using states susceptible and infected with a 2x2 stochastic matrix instead.

In either case, modeling the epidemic as a Markov process would let you estimate the probabilities of criticality using the limit of the stochastic matrix. In fact, I think the critical threshold (probability of the epidemic going critical) will be given by left multiplying the initial probability vector by the limit of the stochastic Markov matrix.
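A single-node sketch of that idea with made-up probabilities (note this fixes the infection probability per step, whereas in the network model it depends on the neighbours' states, which is exactly where the per-node Markov picture gets complicated):

    import numpy as np

    # States: 0 = susceptible, 1 = infected, 2 = removed.  Rows sum to 1.
    P = np.array([
        [0.90, 0.10, 0.00],   # susceptible -> infected with prob 0.10 per step
        [0.00, 0.30, 0.70],   # infected -> removed with prob 0.70 per step
        [0.00, 0.00, 1.00],   # removed is absorbing
    ])

    start = np.array([1.0, 0.0, 0.0])               # begin fully susceptible
    print(start @ np.linalg.matrix_power(P, 200))   # mass drains into the absorbing state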


> It looks like you could model this naively with a discrete time Markov chain using a 3x3 stochastic matrix and three states: healthy, infected and deceased.

Diffusion is a Markov process, yes. But you'd need three states per cell (not sure if that's what you meant by three states).


If you liked this article, here's another one about the spread of information in social networks: https://ncase.me/crowds


This is a really interesting topic and the article is nicely written.

What bothers me, though, is the effort to link the mathematics to 'real-world' applications. I agree that forest fires, disease outbreaks, and the spread of ideas might be good candidates for this sort of modelling. But I think you'd need an awful lot of solid evidence to back that up.


Without the real-world applications a lot of this could seem dry and boring or irrelevant.

Many years ago I went through teacher training, and one of the biggest things you learn is to always make material relevant to students by linking abstract concepts to real-world applications they actually care about.

It is true that in writing an article like this, you need to be very careful with your wording to distinguish between things that "appear like", "are similar to", "suggest", or even "is a first-order approximation of", versus stating that this is the model of epidemiology, forest fires, etc. (which needs citations, etc.). But in this particular article, the examples seem fairly straightforward as first-order approximations -- curious what sentences you're specifically objecting to?


> curious what sentences you're specifically objecting to

Lines like this: "If we're simulating the spread of measles or an outbreak of wildfire, SIR is perfect." The author could have said "this kinda-sorta looks like it might be useful in simulating wildfire but I haven't checked", but of course that would be less convincing and less exciting.

Which is fine as far as it goes. The problem is what happens when someone takes one of these models seriously without actually checking the details, or without being qualified to check them.


I think the author does a pretty good job of highlighting that there are imperfections and shortcomings while glossing over those details. They may have slipped up in some places, such as your quote, but I'm definitely not concerned about anyone taking any of this too seriously and extrapolating on it dangerously.


> I'm definitely not concerned about anyone taking any of this too seriously and extrapolating on it dangerously.

Not this particular article, no. But there's a whole genre of excited mathematical modelling literature where the author demonstrates a gee-whizz concept that looks like it could be really useful. The trouble is that once you start digging down into the specific details, they turn out to be really hard to get right, and at best you end up with a model that's brittle, for want of a better word.

An example that I have in mind is the literature on power law distributions. A little bit of theory showed how power law distributions could arise via a process known as preferential attachment, and everyone got excited; suddenly people were spotting them everywhere. The literature on this topic is massive.

The thing is, it turns out that it's quite hard to check that a given dataset follows a power law. This paper [0] showed that many of the claims were sloppy, and the researchers hadn't been careful with their statistics.

The crux of what I'm saying is that establishing that a model fits well is hard, whether it's a diffusion model, a power law distribution, or anything else. If someone wants to claim that some mathematical widget can be used to model X, they'd better be able to back that claim up with a real demonstration and carefully laid-out details. Otherwise they're just waving their hands in the air.

[0] http://www.cse.cuhk.edu.hk/~cslui/CMSC5734/Clauset_Shalizi_N...


What happens when someone uses a hair dryer in the bathtub?

As someone who spent a fair number of hours teaching mathematics to engineering students, exciting is what you want. What use is it to state the complex hypotheses of a theorem if the student cannot feel the result coming?


> The problem is what happens when someone takes one of these models seriously without actually checking the details, or without being qualified to check them.

That's a problem with absolutely any kind of statement you can put on-line. There's only so much an author can do to prevent stupid from misreading them; the responsibility is not all on author here.

> The author could have said "this kinda-sorta looks like it might be useful in simulating wildfire but I haven't checked"

Why? Maybe they actually checked. I did, and keeping in mind that I'm not a medical scientist, it does seem that the SIR model with modifications is widely used to study disease spread, and it also does look like a good first-order approximation for measles. It also does look like a good first-order approximation of wildfires.

I actually did a thesis on using cellular automata to simulate indoor fires and used some of the same ideas presented here. SIR is essentially what I'd get if I ignored the details of heat convection, conduction and radiation - which at the scale of a forest fire is something you can do (unlike in my work, where modelling those details was kind of the entire point).


> SIR is essentially what I'd get if I ignored the details of heat convection, conduction and radiation - which at the scale of forest fire is something you can do

How do you know? To validate this sort of model surely you need to know an awful lot about statistical modelling and about forest fires.

The main sort of insight this model would give you is that setting up clear spaces like roads through a forest would hinder the spread of fire. If you took the model literally, you might end up ignoring the fact that sparks can be carried by the wind to areas that are far away from the trees that are currently burning.

> it does seem that SIR model with modifications is widely used to study disease spread

Fair - there's a whole wikipedia page [0] about this sort of model. But like I'm saying, that page is big on theory and light on evidence. Those models are full of parameters like transmission rate that are not typically known until after the fact.

The world is full of theorists writing down academic models that are, frankly, a little useless. What I would find more convincing is a writeup of how a big practitioner health organisation like the CDC or the WHO used one of these models to gain a new insight that they couldn't have found any other way.

[0] https://en.wikipedia.org/wiki/Compartmental_models_in_epidem...


> How do you know? To validate this sort of model surely you need to know an awful lot about statistical modelling and about forest fires.

First-order approximation and all. You validate according to your needs, but yes, you'd also need to know a lot about statistical modelling and the domain. I'm guessing the author just read that SIR can be used for some diseases and forest fires, and that what they read was written by people who do know that. FWIW, our CA models were validated by our supervisor (who had access to firefighters) to roughly the level of "reproduces what happens on recordings of real fire" - which was good enough to show that the model has a potential to give real-time insights, but not something I'd like an actual firefighter to use on the scene.

> If you took the model literally, you might end up ignoring the fact that sparks can be carried by the wind to areas that are far away from the trees that are currently burning.

That's a very good point. However, as with all models, you need to be aware of the limitations. Maybe the author should have hedged the text on this, but I doubt government epidemiologists and firefighters were the target audience here. I expect those professionals to understand the models in greater depth before using them (though maybe I'm hoping for too much - given how our industry is full of "professionals" mindlessly copy-pasting code from SO).

> that page is big on theory and light on evidence

Fair. I was actually going to link to a Wikipedia page in my previous comment, but I noticed a surprising lack of citations for the claims made. So instead I went and looked around Google Scholar before saying that "it does seem that the SIR model with modifications is widely used to study disease spread".

> Those models are full of parameters like transmission rate that are not typically known until after the fact.

Sure. Again, I hope that no epidemiologist uses the article to develop disease-spread models. But it works for giving a general overview of, and intuition about, network models, with a focus on the phenomenon of criticality.

> What I would find more convincing is a writeup of how a big practitioner health organisation like the CDC or the WHO used one of these models to gain a new insight that they couldn't have found any other way.

I would absolutely love to see that.

BTW. I hope we're not just arguing about whether the word "perfect" was used in the article correctly.


> BTW. I hope we're not just arguing about whether the word "perfect" was used in the article correctly.

I guess I'm trying to make a point about theory vs. practice. I think theorists tend to be pretty cavalier about writing down models and waving away the fine details, whereas practitioners tend to appreciate how hard it is to get the details right.

As an example, the literature on finance is full of this stuff. There's a whole body of literature on how to create an optimal stock portfolio under various constraints, assuming you know the joint distribution of individual stock returns. It turns out that fitting that distribution in a sensible way is extremely difficult to do. The theorists came up with a 'clever' model that's mostly useless in practice, but everyone still insists that it has 'applications' in finance.

Personally I find theory really interesting, and beautiful in its own right. It just annoys me when the usefulness of that theory gets overstated.


I took calculus at the same time as Advanced Placement physics in high school. Calc was taught dry and contextless, and I wasn't doing well. Then we started learning calc in physics, and wowee, it made so much more sense - would you believe my grades started going up in calc too.

This article is a great example of that for diffusion I think.


Fantastic blog post! The interactive demos show network effects, including spreading activation, diffusion, and game-of-life style movements.

The real-world examples of epidemiology and city technology expertise are perfect IMHO. Kudos to the author!


Very cool!

As a small translation aid, when physicists talk about criticality, they tend to talk about "dimensions" instead of "degree".

In a one-dimensional system (a line) you can have at most two nearest neighbors, in a two-dimensional system 4, in 3D 6, and so on.

Physicists have no trouble talking about fractional dimensions either, which can be realized in surfaces, fractal-like substances and so on.

Dimensions higher than 3 are achieved when interactions between non-nearest neighbors are relevant.


While these examples use lattices for simplicity, the terminology comes from network science and graph theory, where there are degrees and connections, and "dimensions" mean something else (multi-dimensional networks is a rich theory on its own).


If you want to learn the underlying math for calculating these, I highly recommend Robert Gallager's "Stochastic Processes".

It was one of the more eye-opening math books I ever read.


I can't recommend his book with Robin Hanson, The Elephant in the Brain, enough; it will change the way you view the world forever.


For more on percolation, the first exercise in this course is pretty cool:

https://www.coursera.org/learn/algorithms-part1


Very nice.

I do not believe the spread of knowledge works this way, however. For one thing, knowledge is productive. Consider a population of agents, each having a body of knowledge: some kind of Prolog-like set of facts and implications. Clearly, the receipt of the same new fact p by different agents will produce different logical closures. If for some agents but not others there is the rule p |= q, then the spread of q might appear to work quite differently than the epidemiological model suggests.
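A toy sketch of that point in Python (my own illustration): the same incoming fact p yields different closures depending on which implications an agent already holds.

    def closure(facts, rules):
        # Forward-chain simple 'premise implies conclusion' rules to a fixed point.
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for premise, conclusion in rules:
                if premise in facts and conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
        return facts

    agent_a_rules = [("p", "q"), ("q", "r")]   # this agent holds p |= q and q |= r
    agent_b_rules = []                         # this agent holds no implications

    print(closure({"p"}, agent_a_rules))   # {'p', 'q', 'r'}: receiving p also produces q and r here
    print(closure({"p"}, agent_b_rules))   # {'p'}: here only p itself arrives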


Of course I used the comments on PhDs to tease my friend. In reality, this article, and my friend (advancing use of VR, etc in education) are two great examples of using technology to discover and spread knowledge. Of course, network effects/diffusion are critical for the spread of knowledge. In both cases, by using accessible technology and by making knowledge digestible/approachable, you can increase the transmission rates.

This article takes a complex set of ideas and presents them by using appropriate symbols, building a base of knowledge, and then adding to it incrementally. I thoroughly enjoyed it, and I hope many of you take the time to enjoy it as well.

(One final thought - one of my early JavaScript projects a couple decades ago was to write a simulation like this using Conway's Game of Life and some GIF files I created in MS Paint of red/blue dots, two types of trees, or the front/back of pigs. So there's some nostalgia boosting my enjoyment here!)


See also this recent Numberphile that considers the eventual guarantee of extinction based on the rate of reproduction (using the example of surnames) https://m.youtube.com/watch?v=z34XhE5oRwo


FWIW, regarding his analysis on academia being important, I remember seeing an analysis that research has the most positive externalities out of any industry, i.e. more than healthcare and education


Oh man. I needed this.

I'm working on a map generator, and the previous version generated land by placing tiles at random. I implemented the SIR algorithm, and now the map looks much more like an actual map.


I quite like the way in which it shows the similarities between infections and knowledge. It has a model that explains how self-interest hinders knowledge in more ways than simply reducing the resources available for research. It also explains how ideas spread like viruses generally, and reminds me again of the idea of the mind virus.


What does this say about a network such as Hacker News?


Can anyone recommend a JS toolset for creating essays like this? I can see React components are used, but maybe there is something less complex?


I haven't used it, but there's idyll[0], which was used, among other things, to make an excellent interactive article about the JPEG format[1].

[0] https://idyll-lang.org/

[1] https://parametric.press/issue-01/unraveling-the-jpeg/



Great, great article. Very informative and interactive. I loved learning about networks, and everything made sense to me intuitively!


Absolutely loved the interactive part of the article, as a non-science/math person. Food for thought and deeper research...


Nice to see a classic Melting Asphalt blog post.


Why is the percolation threshold lower than 25% for a square lattice?


Each node has a link to itself and can re-infect itself on the next iteration, so a node has 5 different nodes it can infect in the next iteration rather than 4.
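A back-of-envelope check of that explanation, assuming the mean-field rule that the critical rate scales like one over the number of nodes an infected node can hit in the next iteration:

    # 4 neighbours only vs 4 neighbours plus the node itself
    print(1 / 4, 1 / 5)   # 0.25 vs 0.20 - the threshold drops from 25% toward 20%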


Hmm, so is the critical rate 22.5 then, somewhere between 20 and 25?


Should add a reference to this: https://www.nature.com/articles/srep08665

which presents the first method which accurately quantifies the spreading power of all nodes in a network.

It's based on applying statistical physics to diffusion over a network, which is why it outperforms all prior approaches such as degree, k-shell, PageRank, etc.


This is an awesome interactive demo. In my head I never really connected the spread of wildfires to vaccination and how similar their network effects can be.


So a few anti-vaxxers aren't really that damaging and should be left alone...


You're trolling or joking, I presume, but on the off chance you aren't, do please explain your reasoning.


No, they should be burned with wildfire!




