I am always intrigued by a lot of these announcements that clarify that only 'some' of the user information was obtained by the infiltrators.
I would have assumed that if a database was breached, then the bad guys could access the entire user or password hash table? Would 'only some' data mean that they detected a 'SELECT * FROM users' query being run and shut down the connection before it could complete? Is it the database sharding they use which means the entire table is not visible at one time?
I'd be interested to hear more about techniques or technologies available to prevent global queries from scraping entire tables once someone has gained access to your database.
One very simple option would be to kill all queries which take more than a few hundred ms, or which return over n rows, and send an alert. Such queries are almost always slow, and tend to stand out amongst normal traffic.
Doing so keeps your db responsive against programmer errors and limits data exfiltration. I've been doing this for the first reason for years.
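A minimal sketch of what I mean, using Python's built-in sqlite3 module so it runs anywhere. The `guarded_query` helper, the `QueryLimitExceeded` exception, and the default limits are names and numbers I made up for illustration; on a server database you would more likely use its own statement timeout setting plus a check in your data access layer.

```python
import sqlite3
import time


class QueryLimitExceeded(Exception):
    """Raised when a query runs too long or returns too many rows."""


def guarded_query(conn, sql, params=(), max_rows=1000, max_seconds=0.3, alert=print):
    """Run a query, aborting it if it exceeds the time or row limit."""
    deadline = time.monotonic() + max_seconds

    def over_deadline():
        # A non-zero return value makes SQLite abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Ask SQLite to call over_deadline() every 1000 virtual machine instructions.
    conn.set_progress_handler(over_deadline, 1000)
    try:
        cursor = conn.execute(sql, params)
        # Fetch one row more than allowed so an oversized result is detectable.
        rows = cursor.fetchmany(max_rows + 1)
        cursor.close()
    except sqlite3.OperationalError:
        if time.monotonic() > deadline:
            alert(f"query killed after {max_seconds}s: {sql}")
            raise QueryLimitExceeded("time limit exceeded") from None
        raise  # unrelated error (bad SQL, missing table, ...)
    finally:
        # Remove the handler so later queries on this connection are unaffected.
        conn.set_progress_handler(None, 1000)

    if len(rows) > max_rows:
        alert(f"query returned more than {max_rows} rows: {sql}")
        raise QueryLimitExceeded("row limit exceeded")
    return rows
```

A normal lookup by id goes straight through, while a bare `SELECT * FROM users` on a big table trips the row limit and fires the alert. This only helps if the attacker is stuck going through the application's connection; it does nothing against someone with direct access to the database files.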
Not every database account has access to every table or database. If a company separates data by category, an attacker who exploits the brandnamecars.com account might get all the car users' credentials but nothing from brandnametrucks.com, which has a separate table or database. That holds as long as the brandnamecars.com account is properly restricted with permissions, even if the same server hosts both databases or tables.
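A rough sketch of that per-account separation, using SQLite's authorizer hook from Python as a stand-in for real database permissions (on a server database you would use separate accounts with GRANT/REVOKE instead). The account names, table names, and the `restrict` helper are all hypothetical.

```python
import sqlite3

# Hypothetical mapping of application accounts to the tables they may read.
ALLOWED_TABLES = {
    "cars_app": {"car_users"},
    "trucks_app": {"truck_users"},
}


def restrict(conn, account):
    """Limit an existing connection to the tables allowed for this account."""
    allowed = ALLOWED_TABLES[account]

    def authorizer(action, arg1, arg2, db_name, source):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if action == sqlite3.SQLITE_READ and arg1 not in allowed:
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)
    return conn
```

After `restrict(conn, "cars_app")`, a `SELECT` against `car_users` works as usual, while one against `truck_users` fails with a `sqlite3.DatabaseError`. So a compromised cars account leaks the car users and nothing else.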
Say there are two tables, users and user_preferences. Someone goes in and takes the contents of users (hashes and salts and all). Only some of the user information was obtained!
I get it about normalised data spread across multiple tables - but usually (from how I interpret it), they seem to be talking about number of rows - i.e. "We think only 10,000 users had their information compromised...".
I believe in the case of the LinkedIn breach, they said something like "less than 20% of their user passwords were leaked". I take that to mean that not all rows were exposed, but only some. That's why I am intrigued as to whether the query was shut off midstream, or the bulk download of exported data was detected and cut off, or something similar.
This can be the case when attackers don't get access to the database itself. Imagine they were able to listen to connections between users and front-end servers and extracted authentication information. This would only affect users who connected during a specific timeframe.
In this post for instance, they indicate that attackers got 'sync users’ passwords' while storing only 'encrypted/hashed data'.
Other possibilities: they accessed a partial backup (or prod data used in dev), a caching system, a message broker (Kafka)...
If the TV news I've watched over the last couple of decades is any indication, it's often hard to tell whether things are actually getting worse or are just being reported differently and in greater volume.
The person who posted this comment also posted that submission, so it's probably because they're unhappy that their submission didn't catch on even though it covers the exact same subject. Sadly, shit happens.
But the person has to realize that this won't help him/her at all; it only comes off as a little bit whiny, maybe even a little bit desperate. Yeah, it happens, but it's only virtual points on an Internet site we are talking about... who cares who submitted it in the end.
EDIT: Hope I haven't offended anyone. I am usually not the judgemental type, but I couldn't help wondering in this case.
Personally, I think that there should be minimal dupes. Therefore, I felt the earlier submission which did not gain traction should be deleted from HN and/or merged. I'm not sure of the protocol.
I wasn't aware this came across as desperate for virtual points. I really don't care.