Hacker News
Ask HN: I may have been spun. Is there anything I can do about it?
32 points by spokey on July 21, 2010 | 38 comments
So in the shadier parts of the SEO world there's this concept called "spinning", where one takes a single article and programmatically creates hundreds of small variations of it, largely by substituting words and phrases with synonymous terms. This is meant to work around Google's duplicate content detection algorithm. It isn't hard for a person to notice the similarities between the documents, but the changes are subtle enough (or the comparison is computationally difficult enough) that search engines seem to be fooled.
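For readers unfamiliar with the technique, here is a minimal Python sketch of word-level spinning. The synonym table is hypothetical and tiny; real spinning tools are assumed to use much larger dictionaries and to swap whole phrases, not just single words. The point is that every combination of substitutions yields another "unique" article.

```python
import itertools
import re

# Hypothetical synonym table for illustration only; the first entry is the original word.
SYNONYMS = {
    "rain": ["rain", "precipitation", "rainfall"],
    "falls": ["falls", "occurs", "comes down"],
    "mostly": ["mostly", "mainly", "largely"],
    "plain": ["plain", "lowlands", "flatlands"],
}


def spin(text: str) -> list[str]:
    """Return every variant produced by swapping known words for their synonyms."""
    # Split into alternating word and non-word tokens so spacing and punctuation survive.
    tokens = re.findall(r"\w+|\W+", text)
    # Each token contributes its synonym list, or just itself if it has none.
    # (Lookup is case-insensitive; replacements are inserted in lower case.)
    choices = [SYNONYMS.get(tok.lower(), [tok]) for tok in tokens]
    # The Cartesian product of the choices enumerates every possible variant.
    return ["".join(combo) for combo in itertools.product(*choices)]


variants = spin("The rain in Spain falls mostly on the plain.")
print(len(variants))  # 81, i.e. 3 ** 4; variants[0] is the original sentence
print(variants[40])   # The precipitation in Spain occurs mainly on the lowlands.
```

Four substitutable words with three options each already give 81 variants; a realistic article with dozens of substitutable phrases explodes combinatorially, which is how one source text becomes "hundreds" of articles.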

I think that one of my competitors has taken some article content from my site, "spun" it (possibly by hand) and reposted it on one of those post-your-content-with-backlinks sites you sometimes find in search results. (Except they posted it with backlinks to their site, of course.)

If this was a direct, verbatim copy I'd contact the hosting site to notify them of the copyright violation.

But that's not what this is. It's more like a section-by-section, sentence-by-sentence paraphrasing of my content, with some other slight modifications (which is why I think it may have been done by hand, but I'm not familiar enough with spinning to know what's normal and what's not). I'm pretty sure that my content was the source document, since some unusual phrases carried over to the new document, and the structure of many of its sections, paragraphs and sentences is identical to some on my site.
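The "unusual phrases that carried over" check can be roughly automated. A minimal Python sketch, assuming you have plain-text copies of both documents: list the word trigrams the two texts share. The example sentences are invented, and a match is only a pointer for a human to look at, not proof of copying.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams; digits and punctuation are ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(original: str, suspect: str, n: int = 3) -> list[str]:
    """Word n-grams that appear in both texts, sorted for stable output."""
    common = word_ngrams(original, n) & word_ngrams(suspect, n)
    return sorted(" ".join(gram) for gram in common)


# Example sentences are invented for illustration.
mine = "Always let the glaze cure overnight before the second firing."
theirs = "You should let the glaze cure overnight prior to the next firing."
print(shared_phrases(mine, theirs))
# ['glaze cure overnight', 'let the glaze', 'the glaze cure']
```

On real documents, very common phrases ("one of the") will match too, so you would probably want to filter those out; that step is left out of this sketch.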

Both the site on which this content was posted and the site it links to are legitimate businesses that I assume would respond to something like a DMCA takedown notice.

I'm well aware of the degree, nature and much of the mechanics of plagiarism on the web. While I'm annoyed (and possibly flattered) that a competitor seems to have more or less plagiarized content from my site, my bigger concern is that this is the tip of the iceberg and I'm about to see dozens of close copies of my content floating around the web.

Is there anything I can or should do about this?




A derived work is just as much protected by copyright as the original is. If you translate a book, you cannot distribute that translation without the original author's consent. So the fact that they republished your article in modified form rather than verbatim is of little relevance, and you can do the same things you would if it were a verbatim copy.


All the people saying it is copyright violation are probably right.

But I don't think it's worth the effort and money (you might have to get a lawyer...). If you do anything, just contact the site where they posted it and report the copyright violation. Describe what you think happened and wait to see what (and whether) they answer. Most of the time, they won't bother arguing and will just remove the text.

Much better use of your time is to write another, even better text for your website.


All the people saying it is a copyright violation are in fact wrong.

Copyright protects a specific expression of an idea or group of ideas. It does not address the use of a copyrighted work as a template for generating other copyrighted works that use alternate expressions.

If you create a copyrighted work that consists of the phrase "The rain in Spain falls mostly on the plain." and someone comes along and very carefully creates a work of parallel intent that includes the phrase "In Spain, precipitation occurs mostly on the lowland prairies."; they are NOT violating your copyright. What they are doing may be sleazy, dishonest and lazy, but it isn't legally actionable.

This is actually a fairly deep topic: our legal system is constrained to working with tangible expressions and cannot identify the similarity between two expressions of an idea that share the same semantic structure (meaning) yet have completely different linguistic surfaces (text).

Now if you can show that their works were mechanically derived from yours, that may be a different kettle of fish.


Really? So how do you reconcile that with e.g. http://en.wikipedia.org/wiki/Derivative_work or the 'without prejudice' clause in Article 2(3) of the Berne Convention?


A mere claim that something is a derivative work doesn't make it so; you'd have to show priority, connection and derivation, all of which are trickier than you'd think. If a company sends out a press release, two bloggers write stories about that company that are substantively similar but not word-for-word identical, and one of them accuses the other of copying, that would be a difficult case to prove.

Generally speaking, a derivative work implies the overt appropriation of elements of the original; for instance, if you write a novel about the rich inner life of Ernst Stavro Blofeld, that would be derivative of Ian Fleming's body of work. If, however, you write a novel about a blond spy with a sex addiction problem who works for the Dutch Secret Service, and you very carefully avoid duplicating any precise plot points from any of the James Bond novels, it may be derivative in the sense that you were obviously emulating Ian Fleming, but it would not legally be a derived work.


> A mere claim that something is a derivative work doesn't make it so

Obviously. But there is no way to tell from the OP's story. The way the OP presented the facts, the interpretation is that it's a derivative work. Someone took his text and shuffled it around just enough to make it not word for word identical. How the dice will roll in court is always tricky to say, and impossible in a case where you only have a general description of the facts, and only from one party at that.

The point is that a derivative work is protected just like the original. The evidence is a whole other matter, one that nobody, in this particular case, can judge from the information given.


> If this was a direct, verbatim copy I'd contact the hosting site to notify them of the copyright violation.

From your description it is a derivative work. This is still a copyright violation. I can't comment on what you should do, though.


I think there may be some confusion here. If the person truly and skillfully paraphrased your writing, to the point that there are no duplicate sentences between the two documents, it would be hard to argue that it is a derivative work. They have effectively created an entirely new document espousing all of the same ideas as yours, and on which you would not have any copyright claims.


> They have effectively created an entirely new document espousing all of the same ideas as yours,

IMO, that describes the situation well.

> and on which you would not have any copyright claims.

That is also my understanding, unless I could somehow prove that they violated my copyright in the construction of this work. For instance, if one could prove that my copyrighted content was fed into a spinning algorithm to generate this new document, I think you could argue that it is in fact a derivative work.

But it's kind of an academic question anyway, as it would be very difficult to prove that in the first place, and likely not worth the time and trouble if you did. It may even require new legal precedent to win that sort of case, but I'm not that familiar with the entire scope of copyright law.

Just as an intellectual exercise, suppose I fed one of the Harry Potter books into a spinning algorithm to come up with a book about Larry Kotter, a pupil at the Cowpimple Academy of Sorcery. If my new book is just substituting synonymous terms in the original work, I'm pretty sure I'd lose a copyright claim. But how different would it need to be to become legal? To become undetectable? Suppose I was synthesizing multiple works for my spun article? How sophisticated would my spinning algorithm need to be before it really was creating new works?


First, you need to stop thinking in terms of 'algorithms' and 'spinning'. It's irrelevant and confuses the discussion. I understand how tempting it is to frame new concepts (law) in terms of things you understand (computers), but it's very dangerous territory. It's also (this is not to bash you personally, just a general remark) why 95% of legal chatter amongst technical people on the internet is meaningless to lawyers ('meaningless' as in 'gibberish', 'flux capacitor'-style nonsense): the fundamental assumptions and reasoning methods are so vastly different that it's almost impossible to reconcile the two in the limited (in terms of expressiveness) environment of forum posts. FWIW I've been a programmer for over 10 years and hope to finish my law degree in a few months, so I have some inside experience of both worlds, and of how the two interact (or rather, seem unable to interact).

Furthermore the question whether something is derivative does not depend on the form of representation, or the amount of similarity (on the textual level) between two works. If I write a theater play based on a book, I may not use a single sentence from the book; it's still a derivative work.

Whether something is 'derivative' is decided casuistically, case by case. It's meaningless to argue about 'levels of similarity', in a legal context. The question is whether the second author based himself on the artistic values that the original author put in the work. The originality of a work is not necessarily in the wording; it's in the creative effort that went into the work. So if someone blends two works into one (without permission), he's violating two authors' copyrights.


> It's meaningless to argue about 'levels of similarity', in a legal context. The question is whether the second author based himself on the artistic values that the original author put in the work.

That's an excellent point, and I understand that. But as a practical matter, it seems like "levels of similarity" would be at the heart of an argument that document B was "based on" document A. Having direct evidence that B was based on A seems a less likely scenario, although a legally much more straightforward one.

To take your play example, I may assert your script was based on my book, but if you deny it I'm stuck with trying to demonstrate that the similarities between the two are above and beyond what we'd expect from mere coincidence, right?

> So if someone blends two works into one (without permission), he's violating two authors' copyrights.

This is more in the realm of philosophy than legality, but what I was getting at is this: I think there's a pretty strong and conventional argument to be made that much of what we call "creativity" is really a synthesis of pre-existing works and pre-existing ideas. At some point we start to consider this synthesis more inspiration than plagiarism. Intellectually I think it is interesting to consider where that line lies. (And, taking us back to the world of algorithms, whether you could claim, for instance, that a sophisticated enough Markov chain, given a large and diverse enough body of input, is eventually creating new works rather than simply synthesizing existing ones.)
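For anyone who hasn't played with one, a word-level Markov chain text generator fits in a few lines of Python. This is a first-order sketch (an assumption; higher-order chains copy longer verbatim runs from their input). Every adjacent word pair it emits already occurs somewhere in the input, which is one concrete way of looking at "synthesis" versus "new work".

```python
import random
from collections import defaultdict


def build_chain(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the corpus (duplicates kept)."""
    words = corpus.split()
    chain: defaultdict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int, rng: random.Random) -> str:
    """Random walk over the chain; stops early at a word with no recorded successor."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        # Duplicates in the list make frequent successors proportionally more likely.
        out.append(rng.choice(followers))
    return " ".join(out)


corpus = "the cat sat on the mat and the dog sat on the cat"
chain = build_chain(corpus)
print(chain["the"])  # ['cat', 'mat', 'dog', 'cat']
print(generate(chain, "the", 10, random.Random(42)))
```

With a bigger and more varied corpus the output looks less like any single source, even though locally it is still stitched together from pairs that already existed.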


> First, you need to stop thinking in terms of 'algorithms' and 'spinning'. It's irrelevant and confuses the discussion.

This is HN, not a court of law. It is entirely appropriate to use technical language here. I think if you understood what 'spinning' was you'd see that it was relevant to the legal situation even if these exact words may never be used in a legal action (although the very pre-existence of concrete spinning programs would actually be relevant as evidence that a deliberate process of copying had occurred).

> why 95% of legal chatter amongst technical people on the internet is meaningless to lawyers

I'd argue that a similar percentage of lawyers' technical discussions are meaningless to the reality of technology. As we move forward I'll wager that the lawyers will be the ones needing to do a greater percentage of the adjusting if they are to remain relevant.


> It is entirely appropriate to use technical language here.

Of course. I'm not telling anyone what to do. What I meant was that, to understand the legal issues, explanations in these terms are irrelevant and obscure the thought processes that lead to a legal understanding of the situation. I understand perfectly what spinning is. But 'algorithmic spinning' is irrelevant in this context. It's not about the level or nature of mechanical changes. One shouldn't think about legal issues in quantitative ways. It's orthogonal to the system.

> As we move forward I'll wager that the lawyers will be the ones needing to do a greater percentage of the adjusting if they are to remain relevant.

Thanks for making my point for me. It's this technocratically warped world view that lies at the basis of the bigger part of misunderstandings about the legal field and how it relates to technology. The technical details are seldom relevant in technology-related legal issues, or at least not in the way technologically oriented people look at them.


> One shouldn't think about legal issues in quantitative ways.

What about evidence? Presumably the copyright 'deriver' isn't just going to honestly detail how they stole the work. Surely an important part of the evidence is going to be what process was used to derive the work? If it can be established that an existing software package would produce an identical derivation surely that would be an important part of the evidence. Without understanding the processes of transformation (i.e. the spinning process) it may not be possible to accurately gauge the likelihood of the work having been derived versus a coincidence. This all seems to be firmly in the realm of quantities to me.
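As a rough illustration of that kind of quantitative evidence, here is a Python sketch that compares two texts before and after collapsing synonyms to a canonical form. The synonym groups are hypothetical stand-ins for whatever table a particular spinning package might use. A jump in similarity after normalization suggests the suspect text is consistent with word-for-word substitution, though on its own it proves nothing about where the text came from.

```python
import difflib
import re

# Hypothetical synonym groups; the first word of each group serves as its canonical form.
SYNONYM_GROUPS = [
    ["rain", "precipitation", "rainfall"],
    ["falls", "occurs"],
    ["mostly", "mainly", "largely"],
    ["plain", "lowlands", "flatlands"],
]
CANONICAL = {word: group[0] for group in SYNONYM_GROUPS for word in group}


def words(text: str, normalize: bool) -> list[str]:
    """Lower-cased words, optionally mapped to their canonical synonym."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [CANONICAL.get(t, t) for t in tokens] if normalize else tokens


def similarity(a: str, b: str, normalize: bool = False) -> float:
    """difflib's ratio over word sequences: 2 * matched words / total words, in [0, 1]."""
    return difflib.SequenceMatcher(None, words(a, normalize), words(b, normalize)).ratio()


original = "The rain in Spain falls mostly on the plain."
suspect = "The precipitation in Spain occurs mainly on the lowlands."
print(similarity(original, suspect))                  # about 0.56
print(similarity(original, suspect, normalize=True))  # 1.0
```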

> It's this technocratically warped world view that lies at the basis of the bigger part of misunderstandings about the legal field and how it relates to technology.

I think the 'technocratically warped world view' is based on the notion that the function of the legal system is ultimately to serve society - not the other way around. Given that technological change is the major driver of social change it will necessarily also drive changes in the legal system. Situations where ancient laws and ignorant judges determine cases represent failures of the legal system and the more it fails the less relevant and powerful it must become for society to flourish. Inevitably it will be the legal system that needs to 'warp' itself towards the technocratic one rather than the other way around. Of course what I'm talking about is a medium/long term trend. Anyone expecting this to happen in any individual legal case would be making a big mistake.


1. Don't jump to the conclusion that the output of a spinning algorithm doesn't infringe the copyright in the original input. I haven't done any research on the question, but I'd be willing to bet a six-pack of Lone Star (beer) that a decent copyright lawyer could make the case that it is indeed an infringement. (I don't do litigation anymore, so I'm not that lawyer.)

2. You should immediately look into registering the copyright for your work -- if you're a U.S. author, you can't file suit for infringement without first registering the copyright.

3. GENERAL NOTE: Anyone concerned about having their copyrighted work ripped off should register the copyright NO LATER THAN three months after first "publication." Otherwise, you may well forfeit your right to have the judge order an infringer to pay your attorneys' fees (which can be no small matter). The U.S. Copyright Office has on-line registration capability at http://www.copyright.gov/eco/.


Did you invent the term "spinning" or are you copying it from someone else? Your explanation of it is very similar to some other text I have read, with just some different paraphrasing...

Big Picture: You don't have a copyright claim. (I am basing this definitive verdict on your complete failure to supply us with any facts.) If this "issue" has taken more than 30 seconds of your time, I predict mediocrity or failure for your business. You can't afford to waste your time on it. Seriously.


As an amateur onlooker, it'd be useful to see both pages for comparison, although I can't say I know anything about 'spinning'. I'm curious what the result looks like, though.


What's the damage in monetary terms? Is it worth spending your time and money to get the spun article taken down?


I don't know exactly. What's a tiny bit of link juice worth? What's a tiny edge in Google ranking worth?

If it is simply sending an email, then, sure I'll spend two minutes doing that. If I need to engage a lawyer, no it's probably not worth that.

But again, my bigger concern is that if this content is now part of a spinning database then it's not one spun article, it's hundreds.

This hasn't happened yet, so maybe this is just borrowing trouble I don't yet have, but even as an academic legal question: suppose this sort of sophisticated plagiarism happens to your content on a broad scale. Can anything be done about it?


I doubt a single email would work. The content farm would actually hold on to the article, hoping that the issue escalates into a public flame war and they get links from the blogosphere. Bad publicity is still publicity for those operations.


Without reading your article and the "spun" article it would be hard to tell whether or not your copyright has been infringed. Could you post the links?

Perhaps, open up a channel of communication with the "legitimate businesses" that run the site instead of sending them a DMCA notice.

For more on spinning, I found this article helpful: http://www.plagiarismtoday.com/2009/06/16/spinning-spamming-...


I'd rather not promote this content by linking to it.

Thank you for that link, it's a great overview.


A lot of my competitors are using spin articles too (although I don't think they are using my content as the basis). The main problem seems to be that it appears to work, I can see them rise in the organic search results.

I wish there were some way to report this to Google, it would improve the user experience and save everybody a lot of work. Until then it's very tempting to join in on it.


Going to reiterate what some people have said here already. Chances are you can't do anything. This is the internet, after all. There's a small chance the person will take down the content just from a polite email from you. Assuming that doesn't work, the only other course of action is spending $$. That route will be a money sink for you, so the only reasons to take it are ego or having money to burn.


I found this short video from a lawyer to be very helpful in understanding the different options you have when pursuing copyright violation prosecution: http://houchinlaw.com/?p=660


There's pretty much nothing you can do. Welcome to the internet. Just get over it.

By the way, you can also translate from English to another language and then back to English. Then just fix the broken grammar and you might have what you called a spun article.


spin your content and use it for links first.


Please don't. Same content from multiple domain names pointing back to the original is against Google ToS, and you might lose your PageRank unless you're Jason Calacanis in which case it's fine.


Believe me, I won't, but note that the "same content from multiple domains" problem is precisely what spinning sets out to solve. It's not the same content, certainly not at a superficial level.


s/solve/cause/


Would you provide the relevant ToS for that claim please?


It's from the 'Google Webmaster Guidelines', located here:

http://www.google.com/support/webmasters/bin/answer.py?answe...

"Don't create multiple pages, subdomains, or domains with substantially duplicate content."

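Google doesn't publish how it decides what counts as "substantially duplicate", so the following is only an assumption about the general approach: a textbook near-duplicate test is w-shingling with Jaccard similarity. This Python sketch shows why spinning defeats such a test: in this example a verbatim copy scores 1.0, while a synonym-substituted copy of the same sentence shares no three-word shingles at all.

```python
import re


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """The set of contiguous w-word sequences ("shingles") in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: str, b: str, w: int = 3) -> float:
    """Size of the shingle intersection divided by the size of the shingle union.

    Assumption: an illustrative near-duplicate measure, not Google's actual algorithm.
    """
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle are treated as identical
    return len(sa & sb) / len(sa | sb)


original = "The rain in Spain falls mostly on the plain."
spun = "The precipitation in Spain occurs mainly on the lowlands."
print(jaccard(original, original))  # 1.0
print(jaccard(original, spun))      # 0.0
```

Whatever threshold a search engine might pick for "substantially", a substitution every few words is enough to push shingle overlap toward zero, which is exactly the gap spinning exploits.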

"Substantially" isn't really defined. And if you submit them to article directories and the like... whose responsibility is it?

Oh yeah, don't point to the original article, build it to your landers :)


If you own the article directories yourself, you're making additional domains. If you don't, that's fine, as I understand it.


It would be a pain in the ass to build a bunch of well-ranked article directories just to spin your content for one product. I have to assume they're just submitting it to other people's sites.


A lot of spinning going on here. Your competitor is spinning your content for his gains (not cool), and you are spinning your wheels wondering what you should do.

There is no better way to put a competitor on the defensive than by innovating. So innovate. Stop wasting your energy on this guy and do something awesome that will clearly differentiate your business from his.

This competitor isn't worth worrying about if he's doing weaksauce tactics like stealing your content.


I thought someone dosed you with LSD and you were looking for the antidote.


I don't know about copyright law, and I am not saying copyright is unimportant, but from your post I don't think your site got hurt in any way. Aside from giving your competitors a free lunch, why do you care? Especially if your site is the first result when people search for the title or keywords of your article. If you are still #1 on Google search and did not have a drop in the number of visits, you might as well thank the competitors. Otherwise, I agree it's unfair.



