Xkeyscorerules100.txt (ndr.de)
224 points by peterkelly on July 3, 2014 | 79 comments



Related to https://news.ycombinator.com/item?id=7983124

Additionally, /. has a pretty good summary of what this is [1]. -- "If you search the web for communications security information, or read online tech publications like Linux Journal or BoingBoing, you might be a terrorist. The German publication Das Erste disclosed a crumb of alleged XKeyScore configuration, with the vague suggestion of more source code to come, showing that Tor directory servers and their users, and as usual the interested and their neighbor's dogs due to overcapture, were flagged for closer monitoring. Linux Journal, whose domain is part of a listed selector, has a few choice words on their coveted award. Would it be irresponsible not to speculate further?"

[1] http://yro.slashdot.org/story/14/07/03/1846215/nsa-considers...


Considering that XKeyscore was designed to slurp all the things, all the time, this code actually just looks like a very small, tor-specific subset of simple tagging rules.

One would assume that those tags can be used to later analyse the data to pluck out things like "find users from Syria who visited tor/tails websites and are related to xyz".


Knowledge is power, and the NSA wants to track those with power.


That is also implied in the boingboing.net article[1]. There is one paragraph in particular which should ring a familiar bell...

One expert suggested that the NSA's intention here was to separate the sheep from the goats -- to split the entire population of the Internet into "people who have the technical know-how to be private" and "people who don't" and then capture all the communications from the first group.

And why would that be familiar? First they came for the terrorists, but I said nothing for I was not a terrorist...

[1]: http://boingboing.net/2014/07/03/if-you-read-boing-boing-the...


which would be a good exercise if done in a controlled environment to then catch bad guys when they get that clever (hint, they aren't)

the problem is that some incompetent person paid with our money chose to list Linux Journal as an extremist publication just so he didn't have to bother creating the system for his training, while also gaining information illegally to further help him be a lazy mofo. all while being sanctioned by even more incompetent (or ill-intentioned) people above him.

anyway, off to scare some kids from my lawn now...


Linux Journal seems out of place. They must have had an article about Tails?


http://www.linuxjournal.com/content/nsa-linux-journal-extrem...

I saw it on the HN homepage some moments ago, I guess the mods hid it for some reason?


More than one link to the Linux Journal article with comment thread has been locked and hidden, and a previous link to the link we're commenting on now. I'm sure this one will be dead soon too. For example: https://news.ycombinator.com/item?id=7984456. hckrnews.com shows dead threads in gray. Weird, no?


Yes they did[1], and a link to that article was published on an 'extremist' forum, triggering the inclusion of the domain in the filter.

[1]http://www.linuxjournal.com/content/linux-distro-tales-you-c...


So if a link to an NYT article is published on the extremist forum, and your job is to search for the extremists, you'd search for the NYT readers instead of the readers of the forum?

Your explanation makes no sense, sorry.


What is an "extremist forum" exactly? Ars Technica?? That link was in the Ars Technica article.


Where does it say that's how they ended up in the filter?


/* These variables define terms and websites relating to the TAILs (The Amnesic Incognito Live System) software program, a comsec mechanism advocated by extremists on extremist forums. */

That's an interesting definition. It might cover legitimate extremists, but a quick look at Wikipedia tells me that TAILs has also been used by some pretty respectable Pulitzer Prize winners.

These comments provide an interesting and ultimately disheartening insight into how the people designing these surveillance systems view privacy software (and, by extension, privacy?).


That doesn't make sense to me. It's a definition that relates to the people they're looking for. That it's explicitly more specific than "all Tails users" really doesn't seem disheartening to me.

That Tails is used by people other than extremists doesn't invalidate the comment, by someone interested in "comsec mechanisms used by extremists", that it is, in addition to anything else it may be, a "comsec mechanism advocated by extremists in extremist forums".

It's not like "legitimate extremists" have some totally parallel universe of software that's only used by them. It's fundamental to most of these tools that they'll be used by different people in different ways, and that some of those people will be by some standard or another "bad guys".


The description is not exclusive, I'm not arguing that. Your last point makes a lot of sense; most people/groups who do nasty things do it using off-the-shelf components. My issue is with the description painting TAILs in broad strokes as comsec for extremists. Yes, they've got National Security in the name, so they're looking at the software from a national security perspective. There is a wide gap, however, between describing software and its dangerous potential and describing software only in context of its dangerous potential.

If an analyst who hasn't heard of TAILs reads that description, it would sound to them like the program is something that's passed around extremist forums in much the same way malware toolkits are disseminated in warez forums, rather than what it is, which is a Debian fork that routes things through Tor. I say this because that was my first impression, which seemed off, leading me to google, then to wikipedia, and then back here in a huff.

Now, some examples (in order of ascending silliness) of why describing something in the context of one use case is harmful when many use cases exist:

* A lot of people use nmap to explore their home networks or as part of their jobs, potentially in the computer security industry. A lot of crackers also use nmap to case out potential targets. Calling nmap a "network scanning utility advocated by computer hackers" makes illegalizing nmap sound a lot more attractive than it actually would be, even if the statement is true.

* In the real world, certain products are systematically abused for less-than-kosher purposes. Still, we never refer to canned air as a household inhalant without mentioning its dusting use-case first. Potassium nitrate is fertilizer first, rocket fuel second, and only tangentially mentioned as an oxidizer for explosives. Other oxidizers, even the ones that are illegal for consumer sale, are written about the same way.

* Reductio ad absurdum: There's a lot of general purpose software everyone uses. I wouldn't be wrong if I said "Microsoft Word is a text management tool used by terrorist groups to hatch evil plots" or "SMS is a communications technology used by insurgents to detonate bombs" or, extending the idiom, "The Quran is a book used by militant Islamist groups to justify killing and brutalizing civilians." These descriptions are all, however, deeply misleading.


I say this because that was my first impression, which seemed off, leading me to google, then to wikipedia, and then back here in a huff.

I'm not sure I understand why you think an NSA analyst (who, inexplicably, is editing an XKEYSCORE rule file regarding Tor and Tails while being completely ignorant about Tor and Tails) is incapable of doing this same kind of information-gathering.

I'm not going to act like I think the NSA only hires the best and the brightest, but your example presumes the existence of an analyst that's all of: grossly undereducated for his duties, too mentally incompetent to be aware of it, and so far on the literal-minded end of the autism spectrum that they could be replaced by a shell script.

I believe any such analyst, if they existed, would have been promoted to management before they could cause any serious harm.


I'm sure a legitimate extremist would want to use something like TAILs.

I'm more worried about the fact they are tracking people who access articles about tails by the media than I am about that particular component. If you read one article on linuxjournal containing 'tails' you are flagged:

$TAILS_terms=word('tails' or 'Amnesiac Incognito Live System') and word('linux' or ' USB ' or ' CD ' or 'secure desktop' or ' IRC ' or 'truecrypt' or ' tor '); $TAILS_websites=('tails.boum.org/') or ('linuxjournal.com/content/linux*');

So they aren't really "monitoring extremists"; they are monitoring anyone with an interest in security [even if it's just a hobby].
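To make the reach of that rule concrete, here is a minimal Python sketch of how the two variables might evaluate. It rests on assumptions not confirmed by the leaked file: that `word(...)` is a case-insensitive substring test, that the space-padded literals such as ' tor ' are meant to match whole words, and that a trailing `*` in a website entry is a prefix wildcard. The function names are hypothetical.

```python
# Terms copied from the quoted $TAILS_terms rule; the matching semantics are assumed.
TAILS_NAMES = ("tails", "amnesiac incognito live system")
CONTEXT_TERMS = ("linux", " usb ", " cd ", "secure desktop", " irc ", "truecrypt", " tor ")

# Entries copied from the quoted $TAILS_websites rule.
TAILS_WEBSITES = ("tails.boum.org/", "linuxjournal.com/content/linux*")


def matches_tails_terms(text):
    """True if text contains a Tails name AND at least one context term."""
    # Pad with spaces so space-delimited terms can match at the start or end.
    lowered = " " + text.lower() + " "
    has_name = any(name in lowered for name in TAILS_NAMES)
    has_context = any(term in lowered for term in CONTEXT_TERMS)
    return has_name and has_context


def matches_tails_websites(url):
    """True if the URL contains any listed entry (trailing '*' treated as a wildcard)."""
    lowered = url.lower()
    return any(entry.rstrip("*") in lowered for entry in TAILS_WEBSITES)
```

Under these assumptions, any Linux Journal URL beginning with /content/linux would match the website variable whether or not the article mentions Tails at all, which is the overcapture being discussed here.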


We have no context for what "flagged" or "monitoring" means, though. There's no indication that these rules are taken to mean anything in isolation beyond what is frankly common sense. The idea that just Googling Tails means you're monitored is a narrative leap.

If I write an email about Viagra, the recipient's spam filter is undoubtedly going to consider it to have a non-zero probability of being spam. That doesn't mean it will be blocked, in the end. Nor does it mean that the spam filter is wrong to take notice.


I think the fact that they can tag us based on what we searched or read is chilling. It's even more chilling that now the argument has been flipped from being appalled that our right to privacy has been utterly destroyed online, to arguing if it's right to be flagged for further review for simply searching or reading about specific subjects.


Yes. But do you honestly expect them to provide that context so we can be sure?


My point is that they (the authors of the article) don't have that context. They're just covering up the lack of information with FUD.


Yes, and the NSA would refuse to give the context...so you have to make assumptions about information about them.


I think those assumptions are not supported by the available evidence[1], and that the authors know this. Who cares what the NSA says? I'm looking at the same thing the authors are presenting and drawing different conclusions.

[1] Including the abundance of evidence already available about XKEYSCORE.


You have every right to disagree. :)

I think the assumption that is bad for everyone is more likely to be the true one, especially since it's consistent with the NSA's overreach in other areas.

For instance:

http://thehill.com/policy/technology/318515-nsa-admits-analy...

or

http://www.nytimes.com/2013/08/16/us/nsa-often-broke-rules-o...

If they break the law over 2,500 times a year with no apparent consequences, do you honestly think it's likely you are right? If so, good on you.

Personally, I'd rather assume things are more in line with the other abuses than dismiss them out of hand.

In what way is the assumption that they are tagging/tracking the sessions not consistent with:

http://www.nsa.gov/public_info/press_room/2013/30_July_2013....

?

The fact some are US-based? But we already know they violate that regularly, 'accidentally'.

The volume of data? We already know they have datacenters large enough to store and analyze it.

I'm genuinely curious why you draw those conclusions.


I don't have a good reason to believe that this is real, but if it is the most surprising part to me is the "mapreduce" rule definition in there. As far as I know the only group with a C++ mapreduce implementation called "mapreduce" that also uses protocol buffers (the "proto:" block is protocol buffers) is Google. This seems to say to me that the NSA is using a Google implementation of map reduce. That can't possibly be right, can it?


Well, if the NSA were among the group of organizations with a C++ mapreduce implementation (developed in-house) the code probably wouldn't be on github or otherwise divulged to the public... seems like a rather large leap of logic to assume they are using Google's code.


They're not using the Google implementation; it looks like they rolled their own.


Your profile says you work at Google... are you saying that the code snippet here is inconsistent with Google's mapreduce?


I used to work at Google - this looks absolutely nothing like a Google MapReduce specification or code would look like.


Not a Google employee - but the linked file looks more like a simplified DSL for analysts.


That's because they're using a derivative of Accumulo (which was developed internally at the Agency).


Interesting video about them using openstack https://www.youtube.com/watch?v=NgahKksMZis


While mapreduce isn't my area of expertise, my guess is it's probably easier for them to use mapreduce and Hadoop than to reinvent the wheel in some cases.


In other news: the NSA developed a DSL with embedded C++. Is this the most horrific revelation yet?


Not compared to the possibly in-house developed object-oriented event-driven C framework for async communication:

http://www.securityweek.com/kaspersky-lab-duqu-framework-lik...

"“It is possible that its authors used an in-house framework to generate intermediary C code, or they used another completely different programming language,” Soumenkov explained.

For reference, Stuxnet was written entirely in Microsoft Visual C++.

The Kaspersky researchers say certain “slices” of code in the Payload DLL may have been initially compiled in separate object files before being linked in a single DLL, but the slice in question is different. “This slice is different from others, because it was not compiled from C++ sources. It contains no references to any standard or user-written C++ functions.”

But there a few things the researchers do know about the mystery code: It’s object-oriented and event driven, and performs its own set of related activities ideal for network applications.

The highly event driven architecture points to code which was designed to be used in variety of conditions, including asynchronous commutations."

For the reference, New York Times, June 2012:

http://www.nytimes.com/2012/06/01/world/middleeast/obama-ord...

"Eventually the beacon would have to “phone home” — literally send a message back to the headquarters of the National Security Agency"


Would I be wrong to gather that they also built an in-house map reduce implementation? What year is this code from? Most of the other documents have been from 2007-2009. When did Google first implement map reduce?


Google's paper from 2004:

http://static.googleusercontent.com/media/research.google.co...

"We wrote the first version of the MapReduce library in February of 2003"


There's a nice bit in here that automatically collates a list of Tor bridge nodes from snooped e-mails. The full list of bridge nodes isn't public, and one of the ways the Tor project attempted to prevent someone from building a complete list was by requiring people to use a valid GMail address to request them, effectively piggy-backing on Google's account verification to stop people from using a swathe of fake accounts to request nodes. Unfortunately, that failed to take account of the fact that the NSA had completely compromised Google's internal network.


Actually it was Britain's GCHQ that tapped Google's datacenter links and shared the data with the NSA. I only mention this to remind people that it's more than just one country and agency that's doing this.


    // START_DEFINITION
    /*
    The fingerprint identifies sessions visiting the Tor Project website from
    non-fvey countries.
    */
    fingerprint('anonymizer/tor/torpoject_visit')=http_host('www.torproject.org')
    and not(xff_cc('US' OR 'GB' OR 'CA' OR 'AU' OR 'NZ'));
    // END_DEFINITION

I was surprised to see that they actually tried to exclude Five Eyes countries. The cynic in me wonders if there is a "bug" that neuters the restriction.
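For readers unfamiliar with the rule syntax, the fingerprint above can be read as a simple boolean predicate. The Python sketch below is an interpretation, not the actual implementation: it assumes `xff_cc` yields a two-letter country code derived from the X-Forwarded-For address (or nothing when the country is unknown), and the function name is hypothetical.

```python
from typing import Optional

# Country codes listed in the quoted rule.
FIVE_EYES = frozenset({"US", "GB", "CA", "AU", "NZ"})


def torproject_visit(http_host: str, xff_country: Optional[str]) -> bool:
    """Mirror of the quoted rule: host matches AND country is not a Five Eyes code.

    xff_country is assumed to be None when no country could be determined.
    """
    return http_host.lower() == "www.torproject.org" and xff_country not in FIVE_EYES
```

One consequence of writing the exclusion as a negation, at least in this reading, is that a session whose country cannot be determined still matches, because `None` is not in the excluded set.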


Of course not; we're just all in bed together so hard that they get direct access from our respective agencies anyway.

Why spy through clandestine means when your buddies hand you their own info anyway?


What's interesting is that torproject is misspelled in anonymizer/tor/torpoject_visit. I suppose bugs are a universal constant.


Well, we have already learned that their idea of origin country is IP ranges, even if the data clearly indicates otherwise.

That said, they have a variable for directory servers in FVEY countries and one for non-FVEY countries. So I guess they collect everything and the distinction is only here because there's different legal boilerplate for the analysts to follow.


I'm wondering if "FVEY" meant something before it was rendered as "five eyes"?


The "Five Eyes" term has been in use since 2006. Strangely enough the original Wikipedia document archived on the Wayback Machine seems to be unavailable, however we do know it pointed to "USAUK_Community" whose relationship was extended to include other allies which is now known as the "Five Eyes".

Before 2007 the number of searches for "FVEY" was near zero. The relationship between these five nations wasn't named publicly until 2006 or so; before that the relationship was there, just under another name.

However there are documents dated to 2001 available on mors.org which have FVEY references: http://www.mors.org/UserFiles/file/82nd-Symposium/Form%20712...

These are public so you can assume internal naming was used well before then.


FVEY is the accepted shorthand for AUSCANNZUKUS classification, and has been used since the 50's when UKUSA was expanded.


how about merely 50% confidence on whether or not an individual is considered "foreign?"


From the filename, I thought this was some kind of X window config file for key remapping.


OK, I am going to play a devil's advocate.

- The job of three-letters agencies is to find out all various enemies of the state. That includes terrorists, but also various gangsters, officials of sort-of-enemy states (like Russia). That's why they exist.

- Following US national interests has higher priority than the rights of citizens, especially in other countries.

- People that the US has a reason to spy on will probably use Tor/Tails, or at least try to find out something about it. It makes sense for the NSA to filter those people and focus its spying on them specifically.

- Not all Tor-interested folks will be evil, but the percentage there will be much higher than on the internet at random. So it makes sense to focus on them, just like it makes sense for local police to be in a neighbourhood that's known for higher crime.

So I understand why the NSA does this, and why they single out Tor-interested folks.


>The job of three-letters agencies is to find out all various enemies of the state.

Within the confines of the law and US constitution.

If all you care about is rooting out enemies of the state, then you're left with organizations like the SS or Stasi.


Does the US constitution matter with respect to non-US entities though?

Why should a US-based agency care about the rights of German citizens, for example?


While I think it should apply to non-US citizens, the fact that it technically doesn't is irrelevant as long as they're still willfully violating the rights of US citizens too.


The job of a government is towards its country. A country is also made up of multinational companies and corporations.

Modern-day spying is commercial and industrial in nature also... it's not just used to catch bad guys, it's used to get an advantage.

Whilst industrial spying does not trigger emotions as much, or infringe their own citizens' rights as much, it's arguably an even worse application of our intelligence agencies.


Presumably reading hacker news also puts you on one of their lists.

Just remember should there ever be a problem between us, we know everything about you.



lol, nobody cares. flails arms in hysteria


Does anyone have any context or information as to what this is?


https://news.ycombinator.com/item?id=7983124

The article commented has the whole context.

For us programmers it's interesting, among other bigger issues, to see that the rules contain the pieces of code in C++.


The article commented has the whole context.

It really doesn't. In spite of repeatedly claiming things like that people searching for Tor are "monitored" or users are "tracked", it's completely vague about what those terms actually mean and provides zero examples.


Did we read the same article? It actually has 5 pages and it attempts to explain the different sections of the file.

http://daserste.ndr.de/panorama/aktuell/NSA-targets-the-priv...

http://daserste.ndr.de/panorama/aktuell/nsa230_page-2.html

http://daserste.ndr.de/panorama/aktuell/nsa230_page-3.html

...

Maybe you've read only the first page and missed the remaining four?


Yes, I read it, and I stand by what I said. The article explains a set of rules used to filter a set of information out of another set, but it does not support claims about what is done with that set of information.

There is a lot of insinuation, but no example of any individual user of Tor or reader of Linux Journal, etc. being monitored or tracked simply for doing so.


You can always demand even more proof, but it's not probable that it'll come easily -- it's obvious that such leaking is extremely dangerous. This is still significantly more than the public knew before the article.


It makes more sense to me that they actually use these in AND statements. For example, uses TOR and searches for JIHAD could be traffic that would be interesting. If I had to guess, the Linux Journal stuff was just something a geek put in there during testing.
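A rough Python sketch of that idea: if each rule only attaches a tag to a session, an analyst query could require several tags at once. Everything here is hypothetical, including the `Session` type, the tag names, and the `select` helper; nothing in the leaked file shows how tags are actually combined.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A captured session plus the set of tags that individual rules attached to it."""
    session_id: str
    tags: set = field(default_factory=set)


def select(sessions, required_tags):
    """Return IDs of sessions carrying ALL required tags (a logical AND of rules)."""
    required = set(required_tags)
    return [s.session_id for s in sessions if required <= s.tags]
```

In this model a single tag such as a Tor-related one selects a broad set of sessions, and each additional required tag narrows it.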


That's what I thought—these rules obviously aren't the entire pipeline, and the results obtained from them may or may not be interesting in and of themselves.

That said, using Tails probably does increase my XKeyScore rating. Is there anything published as to the scale of the rating? Something along the lines of "Once you get a rating of (say) 500, we're gonna come and beat down your door, wife and dog, and not necessarily in that order".


There's no indication that such a rating exists, or that any such decisions are made based on automated rules. It's a tool for selecting some traffic out of all traffic, not for replacing human analysis or decision-making.

http://en.wikipedia.org/wiki/XKEYSCORE



Here are details for all of the IPs in that doc:

  $ curl -s http://daserste.ndr.de/panorama/xkeyscorerules100.txt | grep -Eo "([0-9]+\.?){4}" | xargs -I% curl -s http://ipinfo.io/%
  {
    "ip": "193.23.244.244",
    "hostname": "No Hostname",
    "city": null,
    "region": null,
    "country": "DE",
    "loc": "51.0000,9.0000",
    "org": "AS50472 Chaos Computer Club e.V."
  }{
    "ip": "194.109.206.212",
    "hostname": "tor.dizum.com",
    "city": null,
    "region": null,
    "country": "NL",
    "loc": "52.5000,5.7500",
    "org": "AS3265 XS4ALL Internet BV"
  }{
    "ip": "86.59.21.38",
    "hostname": "No Hostname",
    "city": null,
    "region": null,
    "country": "AT",
    "loc": "47.3333,13.3333",
    "org": "AS3248 Tele2 Telecommunication GmbH"
  }{
    "ip": "213.115.239.118",
    "hostname": "No Hostname",
    "city": null,
    "region": null,
    "country": "SE",
    "loc": "62.0000,15.0000",
    "org": "AS2119 Telenor Norge AS"
  }{
    "ip": "212.112.245.170",
    "hostname": "No Hostname",
    "city": null,
    "region": null,
    "country": "DE",
    "loc": "51.0000,9.0000",
    "org": "AS24900 QSC AG"
  }{
    "ip": "128.31.0.39",
    "hostname": "belegost.csail.mit.edu",
    "city": "Cambridge",
    "region": "Massachusetts",
    "country": "US",
    "loc": "42.3646,-71.1028",
    "org": "AS3 Massachusetts Institute of Technology",
    "postal": "02139"
  }{
    "ip": "216.224.124.114",
    "hostname": "No Hostname",
    "city": "Aptos",
    "region": "California",
    "country": "US",
    "loc": "37.0082,-121.8777",
    "org": "AS40231 Ethr.Net LLC",
    "postal": "95003"
  }{
    "ip": "208.83.223.34",
    "hostname": "No Hostname",
    "city": "San Francisco",
    "region": "California",
    "country": "US",
    "loc": "37.7749,-122.4194",
    "org": "AS40475 Applied Operations, LLC",
    "postal": "94159"
  }{
    "ip": "128.31.0.34",
    "hostname": "moria.csail.mit.edu",
    "city": "Cambridge",
    "region": "Massachusetts",
    "country": "US",
    "loc": "42.3646,-71.1028",
    "org": "AS3 Massachusetts Institute of Technology",
    "postal": "02139"
  }
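
The grep pattern in that one-liner is loose: `([0-9]+\.?){4}` appears to also match any run of four or more digits, such as a year. A stricter extraction can be sketched in Python using the standard `ipaddress` module to validate each candidate. The function name `extract_ipv4` is just for illustration, and the sketch works on text you have already downloaded rather than fetching anything itself.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not embedded in a longer dotted number.
CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")


def extract_ipv4(text):
    """Return the distinct, valid IPv4 addresses in text, in order of first appearance."""
    found = []
    for candidate in CANDIDATE.findall(text):
        try:
            # Rejects candidates with out-of-range octets such as 999.1.1.1.
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue
        if candidate not in found:
            found.append(candidate)
    return found
```

Each returned address could then be looked up individually, as the curl command above does.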


The problem I have with this selector source code is that it is incredibly complex to execute.

How is the NSA able to do this in realtime for all their interception points?!


XKeyScore runs nearline, not in-line - and it's distributed.


I believe this "source code" is made up, invented for the masses. In fact, the more and more I see of these surveillance reports and reveals, the more I believe this is all purposeful deception, and that while it may be true they're doing all these things, they aren't leaks, but announcements.

On a board where most of us should be familiar with the concept of not trusting user input, I think we should all take a step backward and treat these "leaks" as just that: input from an untrusted source. This could all be a fabrication we're buying into.


It's poor form to use a throwaway, but I see what you're trying to say.

I personally believe NDR/Spiegel to be reputable, trustworthy media. It may be that we have another Hitler's diary scandal there, but the other Snowden/NSA material they have published so far has held up under examination, with the NSA iirc even acknowledging some of the docs as actually valid.


I'm not disputing that the NSA stuff is real - of course they're doing it, it's what the agency is expected to do.

I'm disputing whether these are leaks or announcements. The media might not even know they're part of the plan. Is it that farfetched for the NSA to say "Hey, we want to make this information public for some reason, and we're going to do so by using a whistleblower. You're going to release all these documents to a bunch of media sources and live abroad for a while and be hailed as a hero"? There's a long history of leaking information, factual or otherwise, for a number of purposes. There's potentially no end to the rabbit hole. Maybe Snowden doesn't even know he's part of the plan.

It's my opinion that the NSA, some group, or some individual, is letting us know they're watching closely. Maybe they're acclimating us to the idea of being spied on, or worse, distracting us from or preparing us for something else. I think the big picture has yet to be revealed.


Something worth considering is the conditions necessary for having both subjective and objective expectation of privacy. If nobody expects to be able to avoid the NSA, nobody can expect to have subjective expectation of privacy.

NSA will certainly know everybody's reaction to the Snowden leaks.

I figure the absolute worst case scenario imaginable is that USGOV is producing a list of sysadmins and privacy conscious individuals for extermination. Most "coding" seems to be happening now several layers up from linux systems. If the Government wanted, they could kill everyone who knew how to actually use a computer. Then they could "Teach low income Americans to program" and then just conveniently forget to teach them the full stack. Note, I don't think this is happening, but it isn't unimaginable.

In addition, the narrative has been very pro CIA since the very beginning. There have been a lot of people making the clear distinction between machine intelligence and human intelligence, which is definitely a CIA line.


If it is an intentional leak then we should ask ourselves what benefit or what effect does this information have?

For example - would it reduce the number of people using TAILS, or increase it? Would it make technologists laugh at incompetence and make the NSA less scary, or would it lead to more intense scrutiny and wariness?

That is what we should be asking, not disregarding the information. We should look at the effects. The nature of the comments in this thread is one of those.


A magician (conjurer) uses misdirection of the viewer's attention and a strong narrative as his primary tools for deception.


So the score may not be IPv6 compatible?


It probably would have been a lot smarter to post this somewhere inside the continental US.


What language is this?


Obviously a custom language (which uses $ like Bash or Perl) that allows the programmer to include pieces of C++. Hints: static std::string and boost!

About Boost, Google's style guide says:

http://google-styleguide.googlecode.com/svn/trunk/cppguide.x...

"Cons: Some Boost libraries encourage coding practices which can hamper readability, such as metaprogramming and other advanced template techniques, and an excessively "functional" style of programming.

Decision: In order to maintain a high level of readability for all contributors who might read and maintain code, we only allow an approved subset of Boost features."



