US Military Scientists Solve a Fundamental Problem of Viral Marketing (technologyreview.com)
164 points by tchalla on Sept 17, 2013 | 66 comments



>"For example, with the FourSquare online social network, under majority threshold (50% of incoming neighbors previously adopted), a viral marketeer could expect a 297-fold return on investment"

This is illustrative of what bothers me about the paper: they're already trying to market the algorithm, and they're not being honest about its limitations. Now I'm uncertain of whether to trust them on the network analysis part at all.

Take the cited example. First, what is the return? It will vary wildly depending on factors like how related the message is to the product/service, previous exposure, quality and likelihood of sharing, and website quality/conversion rate. Second, what is the investment? Marketing costs can range from 0 to many tens or even hundreds of thousands.

To those who say the "message is all that matters": relevant, quality content fails to go viral all the time. It's easy to think it doesn't happen if you don't work in marketing, since you'll never see it. Identifying key people effectively really can have value in a marketing campaign. That said, connected people (also most people) will likely ignore you if your message or content is shit (uninteresting, unsurprising, unclear, etc.)


Shakarian isn't an unknown in the field of network analysis. The work he did specifically with network shaping to increase fragility was pretty well received (and also an interesting strategic innovation along with the algorithm).

With regard to the 297 figure, my guess is that it had a context that didn't make it into the article. His papers are pretty well reasoned and quite upfront about the capabilities of a given process or algorithm or strategy, so I find it unlikely that he's gone all used-car salesman this time around.

*Picking nits: the author of the article misspelled Shakarian's name.


You give them far too much credit. They didn't formulate a problem, make some assumptions, and then come up with a solution based on it.

What they did was figure out a "slick approach to a well-known problem [in graph theory]" and then sell it as something that viral marketers could actually use. I see no evidence that this graph-theoretic problem, although it was motivated by various real-world problems, is actually a computational problem that people try to solve in the real world.

In fact, I'm pretty sure viral marketers would simply target products at people with lots of Facebook friends who pass some kind of "not fake" test.


What's interesting, and not covered by the article, is why this is interesting to the military. I imagine that by identifying this group you could also stop virality of ideas, eg radical Islamic terrorists. Presumably the US has lots of data connecting phone records of these people, and they could use this network analysis to figure out who to target to reduce future flow of new ideas... could be powerful!


It could also be used offensively to start a rebellion or change political outcomes.


I smell political econometrics a brewing.


All edges derive their value from the other participants not having them also. Handheld programmable calculators once made Thomas Peterffy, Steve Fossett, Joe Ritchie and others very rich men. When people saw their success with Black-Scholes in the options pits and followed them into the pits with calculators, the pits died and all of the trading volume moved off the floors.

When everyone got computers that were faster than calculators, hedge funds and bank trading desks bought mainframes and colocated them next to or with the exchanges' servers and all of the trading volume moved to high-frequency trading-friendly dark pools and the exchanges...

In the context of political influence, what does everyone think all of the campaign money goes toward these days? A lot of angry people start revolutions, and there is nothing technology can do to placate someone who is being abused for any reason, and certainly not at scale!

All of these marketing firms are chasing the mass market consumers/voters that have had diminishing spending power for over a decade now. Here's a tip: build something really expensive and desirable for someone really rich because they have lots of spending power and few things to do with the money.


It also helps in containing real viruses.


As a marketer, my view on content is that there are three necessary elements:

1) Quality/On-Topic

2) Correctly Timed with current events

3) Luck

It's actually quite similar to startups in general when you think about it.


4) Breaking the rules of communication or as I like to call it blasphemy.

For instance, this VW commercial

http://www.youtube.com/watch?v=HnL-7x4n4d8

would never have been possible to do by VW themselves. Yet it is exactly this kind of commercial that has a high chance of going viral.


I wonder if the 297 number is essentially the ratio of the entire network to the seed population? So, you pay to reach x people who then spread it to 297x others.

Still, dubbing even this ROI would be a huge stretch.


That's the crux with most quantitative studies in social sciences. They are able to deliver impressive sounding claims, but in the end the external validity is more than questionable.


I'm somewhat bemused by the entire thing - as effectively they're just going "look, memetic drift imitates genetic drift!" - which isn't really all that surprising. If you want to propagate a gene through a population - start with an isolated group, let them breed, then unleash them on the wider population. This is how stuff like lactose tolerance got going.

If you want to propagate a meme through a population... you get the picture.


This is a pretty clever hack. The problem is that if the tipping point theory is correct (#) (that you will perform behaviour X if a sufficient number of your immediate circle are also performing behaviour X), you still need to find a subset of the network to start the ball rolling.

The solution is simple - say the tipping point is 20 friends must do X for you to do it. So walk the whole network and find everyone with more than 20 friends. Remove those with the most friends (say, those above the 99th percentile).

Now walk the network again, find those with more than 20 friends, and again the 99th percentile goes. Eventually you remove the 20th friend from everyone and those left have 19 or fewer friends.

Now tell this seed group to do behaviour X.

Now put back the very last set you removed. And they are guaranteed to have at least 20 friends, and all those friends will be doing behaviour X. Now put back the penultimate group, and because the most recent arrivals are now also doing behaviour X ....
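
A minimal Python sketch of the peel-and-replay idea above. Assumptions: the threshold is an absolute count `k` (as in this comment, not the percentage the paper uses), friendships are undirected, and nodes are removed one at a time (highest remaining friend count first) rather than a whole percentile at a time, so that each removed node still has at least `k` friends left at the moment it is removed. The names `peel_seed_set` and `replay` are made up for illustration; this is not the paper's pseudocode.

```python
def peel_seed_set(friends, k):
    """Peel nodes until nobody left has k or more remaining friends.

    friends: dict mapping node -> set of neighbours (undirected, symmetric).
    k: absolute tipping threshold (assumption: same for every node).
    Returns (seed_set, removal_order).
    """
    # Work on a copy so the caller's graph is untouched.
    remaining = {node: set(nbrs) for node, nbrs in friends.items()}
    removal_order = []
    while True:
        candidates = [n for n, nbrs in remaining.items() if len(nbrs) >= k]
        if not candidates:
            break
        # Remove the best-connected candidate; ties broken by name.
        victim = max(candidates, key=lambda n: (len(remaining[n]), str(n)))
        for nbr in remaining.pop(victim):
            remaining[nbr].discard(victim)
        removal_order.append(victim)
    return set(remaining), removal_order


def replay(friends, seeds, removal_order, k):
    """Put nodes back in reverse order and check each one tips."""
    active = set(seeds)
    for node in reversed(removal_order):
        if len(friends[node] & active) < k:
            return False  # this node would not have tipped
        active.add(node)
    return True


# Toy graph: five people who all know each other.
clique = {n: set("abcde") - {n} for n in "abcde"}
seeds, order = peel_seed_set(clique, 2)
everyone_tips = replay(clique, seeds, order, 2)
```

On the five-person clique with `k = 2`, this leaves two seeds, and `replay` confirms the other three tip one after another.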

Problems:

1. For any social network at a given point there is just one seed group. Right now (or 6 days ago) this group is being identified. And sold.

2. the tipping point theory as a whole is a bit dodgy (see below).

3. Feasibility - they only mentioned orders of magnitude smaller subsets. Let's be kind and suggest that it's 3 orders of magnitude. For LinkedIn that leaves a seed group in the 100,000s. Not the size you can just invite into a focus group.

Overall, really a cool hack, and I swear it's worth ponying up on and selling to excited digital agencies

(#) Good article on Debunking of tipping point theory - http://www.fastcompany.com/641124/tipping-point-toast

Edit: Wanted to rewrite my below comment that got a bit confused


In terms of #3, it's even worse than 100,000's - for LinkedIn it would likely be millions:

"In general, online social networks had the smallest seed sets - 13 networks of this type had an average seed set size less than 2% of the population (these networks were all in Category A). We also noticed, that for most networks, there was a linear relation between threshold value and seed size"

Though, for a company with direct access to their users via the UI, it would be a fairly trivial task to reach a significant subset. LinkedIn could reasonably push UI updates only to the target population. Given they have full access in the first place, I'm uncertain as to why they would want to engage in this form of marketing, though.


>> I'm uncertain as to why they would want to engage in this form of marketing, though

They might not want to on their own, however if some advertiser really had the desire to try to hit the entire seed population then LinkedIn could sell that target population at a higher CPM because of the relatively high projected value of those individuals.


> 2. the tipping point theory as a whole is a bit dodgy (see below).

> (#) Good article on Debunking of tipping point theory - http://www.fastcompany.com/641124/tipping-point-toast

It seems to me that the article doesn't debunk at all the existence of a tipping point in social network phenomena.

In fact, it clearly states that it is the Law of the Few, "that rare, highly connected people shape the world", that seems to be nonexistent in Watts's experiments.

If trends are really like forest fires, then there is a tipping point; it is just not required to have these highly connected people on board to reach it.


It makes an interesting case for Facebook's paid messaging service. Combined with some well crafted / personalized messaging, you could probably get a large number to click through. Even if the seed group is a million users, at a dollar per user that's only a million dollars. Large marketers spend that much on advertising without even blinking. If it gave them a reasonable chance at going viral, no doubt many would consider it well worth the investment. The only problem would be the potential backlash when people noticed that it had started in such an artificial way.


a rewrite after reading the paper got lost in my noprocrast setting (no fault of server I just gave up).

anyway they use % base not absolute threshold (obviously really) but there are some lovely big questions coming out of this - if anyone is at PyCon UK this weekend and interested in throwing some thoughts around please shout.


The "tipping point" theory popularized by Malcolm Gladwell's book was proven to be too simplistic and flawed by Duncan Watts, a network researcher currently at Yahoo. His book "Everything is Obvious - Once You Know the Answer" debunks this popular misconception. He describes many studies, simulations and actual experiments that show that how fast something spreads virally has less to do with where it starts (influential groups/tipping point), than how susceptible a person is to being influenced (ie how infectious the idea is to begin with). The book is great btw.


Thank you for bringing this up. It was the first thing I thought about when reading the article. I also recommend the book.


Thank you for the pointer.


Some may not realize the context of this research. Consider the funding and authors. This solution is meant for and will be, what can effectively be called, weaponized. There are many efforts that will be quite interested in this research and will build it into their solutions that are sold to all the players we all so well know at this point, and others you have never heard of and never will.

Consider a future that is beyond the present in which social network analysis is used for identification, targeting, and disruption of social networks of all kinds ...something like OWS if you will...; when the subject type of research is implemented to understand how to prevent opposition by those whose interest it is that you and those around you don't oppose, cannot organize, and are disrupted faster by knowing exactly who the linchpin is that has to be neutralized to disperse any organization.


I am inclined to believe the reason for the lack of any meaningful protests and the quite strange vanishing of OWS is in great part due to these new tools in the hands of the people in power. Meaningful protests would be against the use of taxpayer money as gifts to private industries and the sellout of one's digital life and rights to foreign countries.


Assuming the use of the technique spreads, will it have a self-defeating effect, whereby having everyone 'spam' these core agents means that people will begin to change their behaviour in some way?


That's exactly what will happen. The weakness of much social science is people make judgments based on no external influence. Once the influence happens, the outcomes change. This is also why market inefficiencies disappear after they've been published. In more specific terms, you might accept 4 spams from a friend, but not 20 Candy Crush invites.


Well, if I would want to communicate a concept, FOSS, for example, I would create lots of various packaging for that idea: stories, infographics, videos etc. to prevent saturation.


That's a good point. Most marketers would only want to target a segment, say "Women under 40 who have children" which would be a different seed group than "Males 18-25". BUT - I can imagine a good deal of overlap since the algorithm seems to select people with highly diverse friends.


Finding the group and interacting with it inherently means they change the dynamic of the group. That's why this is such a hard thing to do and why the sensational headline is silly and wrong.

"...Solve the Fundamental Problem of Viral Marketing" Nothing is solved. This is just another tool to use for a while.


One of my fears about the meta data collection is that when coupled with research like this it will enable monitoring and control of a small number of important people (important in the sense of spreading information, effecting change). In the same vein you take it a step further and detect those who self police, they can be effectively ignored. I am sure the machine is trying to efficiently figure out who the "do the right thing" boyscouts are so it can remove them from the system.


This doesn't seem useful to me. It doesn't matter if you "find" the optimal people to send a message to. The important thing is the message, and whether they will find it interesting enough to spread in the first place.

If it was so easy to get something viral, then a blanket message sent to a large group of people would automatically result in viral marketing. ie - spam. And we see how often spam becomes viral....


It is useful because it means you can strategically identify a much smaller set of users to focus on. If you get the seed group to trigger, you get massive payoff. And so you can target initial messages to them, and analyse who takes action and who doesn't and why, iterate, and retarget on the seed group rather than wasting lots of time on the much larger full network.


True. I guess I'm just a bit skeptical about whether this would really work. And as others pointed out, if marketers start using this type of targeting, then this "seed" audience will be over-saturated.


> It doesn't matter if you "find" the optimal people to send a message to. The important thing is the message

Well, the message matters and the initial set of people also matters.

A good message has a higher probability of being transferred by a peer to their friends. So yeah, spam emails don't go viral because they have a very low probability of being transferred. But given two messages of the same quality, they may or may not go viral depending on where you inject them.

To clarify, the actual scientific paper does not claim that they "solved the fundamental problem of viral marketing"; they just propose a new heuristic to find good seeds and show that it's good.

Edit: clarity.


You could target this smaller group with freebies with the hopes they turn their contacts into paying customers.


Here's a direct link to the PDF, which is far more valuable than the article itself:

"A Scalable Heuristic for Viral Marketing Under the Tipping Model"

http://arxiv.org/pdf/1309.2963v1.pdf


In other news: democracy is dead. It's been hacked.

(That's actually been true for a long time. The exploits are only getting better now. It's a bit like SSL.)


Isn't producing content that people want to watch and share the "fundamental problem of viral marketing"?

Obviously, being able to identify high value "seeds" is paramount, but it appears to be more related to cost-reduction (not having to contact more seeds than necessary).

Centrality measures, along with propagation simulation algorithms, already help identify seeds... but without "the proper content", I doubt that good seed classification can, alone, "solve the problem".


Everyone seems to be reacting to the "viral marketing" framing of it, when basically this is just a graph theory problem.

Reading this, it appears that finding seed sets is an old problem. Normally people focus on finding minimum-size seed sets, but here they're just focusing on small ones. However, they don't appear to have actually proven any upper bounds on the sizes of the seed sets found this way; they've just observed that empirically it's small. Which is still useful.


What a great way to stifle dissent - find the people most efficient at spreading it and neutralize them via arrest, National Security Letters or "other means." No wonder the military is funding this. They aren't interested in marketing - they are interested in the control of the flow of information, period.


Cool to see Paulo up on Hacker News. One of our employees wrote a book with him and his wife recently, examining the proliferation of cyber espionage and "cyber war" in the last decade or so. We think it's one of the only sober, thorough and technically accurate textbooks on the topic: http://www.trailofbits.com/books/#cyberwar


How to effectively spread propaganda?


That is most likely the funder's goal for the original research.


And when you use "Flickr, FourSquare, Friendster, Last.FM, Digg (from Dec 2010), Yelp, YouTube and so on," you participate in the research.


I was going to be more cynical, and say one could use it to determine ringleaders in a network of adversaries.


I was too skeptical. Any technology can be dangerous in the wrong hands.


This seems to be a sort of iterative clustering of a graph to find highly connected nodes.

I suppose a similar result is reached by sorting users in a sub-graph by the time they spend online on that social network, since more time spent online probably means more "friends".


They remove the most connected nodes at each step. The one thing this algorithm is not useful for is finding highly connected nodes.


More time online does not mean more "friends," quality of engagement while online does.

Troll for 10 hours and not interact...0 new friends. Interact for 1 hour in a quality way...2 new friends.


It seems like they're looking for a large number of groups of people who are tightly connected.


Now the US can better influence all the "Facebook and Twitter revolutions". The NSA can analyse the social network graph and try to seed their news or manifestations. Or even better: they can turn off the influencing nodes that are against American policies.


Given this algorithm, what does the Seed Group end up looking like? What do they have in common?

It seems to me that in the real world, this group would be people with the most friends, or people with very diverse friends, kind of a no-brainer.


Not really - it's finding those who connect most networks - the routers if you will.

From the article they remove those who have the most connections first. Let's say that if more than 10 of your LinkedIn links have photos, then you will upload one too. So they find everyone who has >10 friends, order them and remove the top 20% of most connected individuals. Then repeat. Stop when you have a group of people none of whom has >10 (extant) friends.

This group of people can then be given a virus (behaviour/whatever). Now put back the most recently removed group of people. It is guaranteed that each of those new people will all have 10+ friends all of whom exhibit this new virus.
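
One way to sanity-check that claim is to simulate the spread directly. A hedged Python sketch, assuming an undirected friend graph and an absolute "at least `k` friends" threshold in the style of this comment (the toy graph uses a much smaller `k` than 10); `spread` is a hypothetical helper name, not something from the paper.

```python
def spread(friends, seeds, k):
    """Run the tipping rule to a fixed point.

    friends: dict mapping node -> set of neighbours.
    seeds: nodes that start out exhibiting the behaviour.
    k: a node adopts once at least k of its friends have adopted.
    Returns the set of all nodes that end up adopting.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in friends.items():
            if node not in active and len(nbrs & active) >= k:
                active.add(node)
                changed = True
    return active


# Toy graph: a triangle (a, b, c) with one extra person d hanging off a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

reached_all = spread(graph, {"b", "c", "d"}, 2)  # a tips: three friends adopted
reached_few = spread(graph, {"d"}, 2)            # nobody else ever sees 2 adopters
```

The same graph either tips completely or not at all depending on which nodes are seeded, which is the whole point of choosing the seed group carefully.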

It's pretty clever. Now I need to grab graph-tool and start playing!

Also:

> Lastly, we find that highly clustered local neighborhoods, together with dense network-wide community structures, suppress a trend's ability to spread under the tipping model.

That matters and frankly is the future of the internet. We shall most likely see geo-physical mesh neighborhoods. Always-on connectivity, mobile or not, to the people around you. It will probably bring a resurgence of democracy and community, likely solve enormous caching problems, and utterly destroy loads of business models. And yes! It's Maths and Science that proves it!


Not the people who have lots of friends. They will be removed from the set early. Only people with few but diverse friends remain.


It seems implausible to turn this into a business. So, you have this seed group per network and, what, charge some amount to reach this group on the assumption that they will care enough to propagate the message? Seems that quality and relevance of content are still the overriding factors.

Also, wouldn't many of the same people be members of the seed group, leading to message fatigue amongst their connections?

Seems like something out of which you may raise the hopes of many a marketing department, but which ultimately proves impracticable.


The way social marketing works in the real world is a marketing team assigns a budget of X to give to celebrity Y to mention product Z in their Twitter feed. The only measures of interest are how many followers does that celebrity have, how much does each mention cost, and how many leads did it generate. Applying this network analysis could help identify hidden celebrities that would charge less, and how soon before someone launches a bidding service based on that?


The fundamental problem of viral marketing is developing an efficient, repeatable process of content production that is very similar to venture companies: a few pieces of content will have incredible ROI, some will have OK ROI, and most will have negative return on investment. While this research will help if applied correctly, it doesn't solve the fundamental problem. BTW, BuzzFeed has had some success using this approach.


This suffers from one fundamental flaw - if it is indeed "guaranteeable" and they sell everyone on the algorithm, it no longer becomes effective. In other words their test cases, as far as I can see, do not take into consideration the behavior of other people using the same algorithm. It reminds me of algorithms in economics.


> It is based on the idea that an individual will eventually receive a message if a certain proportion of his or her friends already have that message. This proportion is a critical threshold and is crucial in their approach.

The question I had was how to find the tipping point. Is this done through tests on a smaller group?


This is interesting, though the biggest problem is that their seed groups are 1%-3% of the network's population. If you can affect that much of a large enough network to constitute 'viral' in the first place, then you probably don't need an algorithm like this anyway.


I wonder what the worth of a node is, i.e. how credible they are, and how that's altered when you start using them as an attack vector. It seems possible to me that the people with small numbers of friends in multiple graphs would just find themselves put on ignore lists.


Couldn't anyone who's ever designed a router also sell their algorithm to marketers?


First the "critical threshold" is described as a ratio, then as an integer. Am I misreading it somehow?


In one reference, a critical threshold is described in generic terms, in another it's described as a certain number of acquaintances that put you over the threshold. They're not incompatible -- one represents an example of the other with a specific number attached.

It would be like my saying, "25% is sufficient", and in a later sentence saying, "eight out of thirty-two is enough". They both make the same statement.
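
To make the arithmetic concrete, here is a small Python sketch that turns a fractional threshold into a per-person friend count. Rounding up is an assumption here (reading the threshold as "at least this proportion"), and `friends_needed` is a hypothetical name. Exact fractions are used so that a value like 1/10 does not pick up floating-point error before rounding.

```python
from fractions import Fraction
from math import ceil


def friends_needed(threshold, num_friends):
    """Smallest whole number of friends that meets a fractional threshold.

    threshold: a Fraction such as Fraction(1, 4) for 25%.
    num_friends: how many friends (incoming neighbours) the person has.
    """
    return ceil(threshold * num_friends)


quarter_of_32 = friends_needed(Fraction(1, 4), 32)   # 25% of thirty-two friends
majority_of_5 = friends_needed(Fraction(1, 2), 5)    # majority threshold, five friends
```

So "25%" and "eight out of thirty-two" are the same statement for a person with thirty-two friends; for someone with a different number of friends, the same ratio gives a different integer.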


I was confused about this as well. Check p.5 of the linked paper; it's got the actual algorithm in pseudocode and explains it a lot more clearly.


Next step: people getting cold called for "post this on your Facebook and earn X" schemes.



