Using Deep Learning and Google Street View to Estimate Demographic Makeup of US (arxiv.org)
145 points by phodo on Feb 27, 2017 | 48 comments



Dammit AI people, wake up! This is SO backwards!

AI-based profiling of anything is bound to filter out outliers. Censuses are done in the field for a reason: the goal is to gather data so that you can do statistical reasoning on it. Not the opposite!

Here, they just gather some car data and infer demographic data from it. And then what? We've just created a population which matches the car sample. We can't draw ANY conclusion from that data, beyond the types of cars found in such-and-such neighborhoods.


In my part of the world (Eastern Europe) you can infer a lot more about the people inhabiting a certain area by looking at the cars they drive than from census and income/tax data. That's because there are lots of ways of not telling the authorities how much you earn, but when it comes to purchasing a car you set aside such niceties and get the car that best fits your real economic status (with few exceptions).


Cars are funny. In the USA it's easy to get a lease on even an expensive car.

Determining "stuff" on car ownership may be misleading.

Examples: a co-worker of mine acquired a Mercedes. He said his family was so poor he had to wear his big brother's clothes.

Another person I'm working with now used to live out of his car. By his second paycheck he had leased a new BMW.

Cars were used to "feel good" and perhaps elevate class self perception.


Yes, of course there is some correlation between demographics and car ownership. But this paper makes it seem that their method is good enough to replace an in-the-field census.

Basically, it replaces all the demographic data that the census brings with the single factor of car ownership. In regions where the actual correlation is less than 90%, the generated data is completely useless.


> replace an in-the-field census

That is not what I read. They wrote, "complement," and never suggested replacing.


> But this paper makes it seem that their method is good enough to replace an in-the-field census.

Ah, I missed that, sorry! To be honest I only skimmed the description; for other lazy people like me, here's the relevant line:

> Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches, with the potential to detect trends with fine spatial resolution, in close to real time.

which I also find a little bit crazy.

Otherwise I find the project quite interesting. I like to lose (too much) time on GStreetView, and I've also developed an interest in 20-, 30-, and 40-year-old cars, so I wrote a small Chrome extension that lets me locally save images of old cars I find on my country's roads while "taking a walk" on GStreetView, along with the relevant info (lat-lng, address, and the make and model of the cars, which I input by hand using said extension). I had also noticed a distinct correlation between a city's economic status and the cars you can find on its streets, and I was wondering how hard it would be to do the car "recognition" with some AI thing and try to draw conclusions from that. Glad someone actually did it.


> Ah, I missed that, sorry!

I don't think you missed it. If it's there, I've missed it too, after reading the paper twice.


Must be why I've often met with astonishment and incredulous remarks when I meet new colleagues. "He's got to have a lot of money; why does he drive that crap car?"

I always answer: that's why I have a lot of money.


You tell people you have a lot of money?


No. But I'm running the company they work for or are vendors to. Or I'm representing my consulting company to a client. They guess the rest.


That's weird. Because I've met plenty of people who are rich because of entrepreneurship and drive a luxury vehicle. Why are you rich because of the car you drive?


Is this just trolling now? Because not spending on luxuries is generally understood to conserve money, making one wealthier. Saving on cars alone can obviously add up to tens or hundreds of thousands of dollars.


Sorry, you're not rich. Tens to hundreds of thousands is very middle class.


> can't draw ANY conclusion

So, you don't trust surveys? You don't A/B test your marketing?

Estimates are useful. Hell, your brain is just estimating what's in front of you based on a sample of photons coming in through your eyes. If you trust your vision, you're trusting statistical sampling. I suppose it's possible the USS Enterprise is cloaked and somehow your eyes are deceiving you, but the photons are certainly correlated with what's in front of you.

Some of the best statistical innovations are simply discoveries of good proxies for something that's difficult to measure. Google found that what you search for is a good proxy for what you want to buy. Facebook found that what your friends click on is a good proxy for what you'll click on. Of course you won't like everything Netflix suggests, but its suggestions will (hopefully) be better than just picking movies randomly.


Of course estimates are useful. But when you use proxy data such as cars to infer demographic data and present it as the result of a census, you can see why it's problematic. You're basically using the number of pickup trucks to estimate the degree of education in a neighborhood, then using that same number of pickup trucks to estimate the wealth of that neighborhood, and then drawing a conclusion about the correlation between wealth and education, which is the very input hidden in your AI model.

You may use census data to train a neural net to draw some conclusions, not to produce brand new census data.
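
Here's a toy simulation of the circularity I mean (Python, invented numbers, nothing to do with the paper's actual model):

  import numpy as np

  rng = np.random.default_rng(0)

  # Made-up ground truth: wealth and education are only weakly related.
  n = 10_000
  wealth = rng.normal(size=n)
  education = 0.2 * wealth + rng.normal(size=n)

  # Pickup-truck ownership is driven (negatively, say) by wealth alone.
  trucks = -0.8 * wealth + rng.normal(size=n)

  # "Census" both traits from the same truck signal, proxy-style.
  est_wealth = np.polyfit(trucks, wealth, 1)[0] * trucks
  est_education = np.polyfit(trucks, education, 1)[0] * trucks

  print(np.corrcoef(wealth, education)[0, 1])          # ~0.2, the truth
  print(np.corrcoef(est_wealth, est_education)[0, 1])  # 1.0, an artifact

The two estimated traits come out perfectly correlated no matter how weak the true relationship is, because both are just rescaled copies of the truck count.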


You're assuming a great deal of incompetence in the use of the data.

Why not appreciate the proxy for what it is?


> Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches


It doesn't matter if it's only complementing. Census data is supposed to be the source data; if you start mixing in some model's output in order to "enhance" your source data, you're tainting it with wrong samples. From there, everything is biased.

My point is that there is hype around AI that makes it look like it can do things it can't. What they do in this paper is equivalent to using a correlation matrix and a bunch of observations to generate a completely random population that matches what we put in (OK, the neural net may find some non-linear relationships, but you get the idea). Yes, there is a significant correlation between car ownership and demographics, so the results do look good from far away, but in reality they will only map the car-ownership factors onto other traits, and in terms of information that is far poorer than actually doing a census.
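
To make the correlation-matrix point concrete, here's a minimal sketch (Python, made-up correlation values) of "generating" a population that matches whatever structure you feed in:

  import numpy as np

  rng = np.random.default_rng(1)

  # Invented correlations between car value, income, and education.
  corr = np.array([[1.0, 0.7, 0.5],
                   [0.7, 1.0, 0.6],
                   [0.5, 0.6, 1.0]])

  # Sample a synthetic "population" with exactly that structure.
  people = rng.multivariate_normal(mean=np.zeros(3), cov=corr, size=100_000)
  print(np.corrcoef(people.T).round(2))  # reproduces the input matrix

It reproduces the input correlations beautifully, but it contains no information that wasn't in the matrix to begin with.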


Also keep in mind that census data is what people self-report. Pickup trucks (in this case) are what they actually drive.


If you had multiple points of data, say Street View from around the time of the last census and an updated Street View from within the last few months, could you not infer changes in the demographics, with some level of significance?

I'm curious, because at first glance this seems really useful, but I'm trying to see where the limits actually are.


Well, for one thing, if you're worried that your census is misreporting, and this method gives you clues about areas to double-check, and they turn out to have been good suggestions, then it's useful.


yes, definitely!


Everyone jumps to the dystopian sci-fi implication. But how about we try thinking like real technologists?

Example use: I know of three small but vibrant towns in Massachusetts where my wife and I would like to open a Bed 'n Breakfast. The problem is they're too expensive. So I feed the street views of these downtowns into the AI, and filter by <$250K home price. And it returns 20 beautifully "quaint" towns that I never could have found just by traditional census / business database searching.
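
A sketch of that query, with random vectors standing in for real image embeddings (every name and number here is hypothetical):

  import numpy as np

  rng = np.random.default_rng(0)

  def unit(v):
      return v / np.linalg.norm(v)

  # One "look" vector per town (a stand-in for a CNN embedding of its
  # street views), plus a median home price.
  towns = {f"town_{i}": (unit(rng.normal(size=128)),
                         rng.uniform(150_000, 900_000))
           for i in range(500)}

  # Average the look of the three towns we love but can't afford.
  liked = ["town_0", "town_1", "town_2"]
  query = unit(np.mean([towns[n][0] for n in liked], axis=0))

  # Rank affordable towns by visual similarity (dot of unit vectors).
  ranked = sorted(((float(vec @ query), name)
                   for name, (vec, price) in towns.items()
                   if price < 250_000 and name not in liked),
                  reverse=True)
  print(ranked[:20])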


A major example: data in emerging markets.

For most emerging markets, there's nothing comparable to the per-tract census data in the US (and other developed countries). Some fixed-line ISPs do market assessments by driving through neighborhoods and counting the number of cars and AC units at each house, as a proxy for income, and it is unlikely that better data will be available by traditional means any time soon.

I once had to compile a lot of per-city info on Indonesia. It involved a friend actually going into the national statistics office, spending several hours drinking tea with them, and getting the actual raw survey data burned onto a CD (this was a couple of years ago, not a couple of decades), so I could eventually process it and get some results (despite the abundance of gaps and errors in the data).

In other words, Street View data is more pervasive, more up to date, and more readily available than data from the national statistics office.


In terms of jumping to the sci-fi dystopian implication: I think it's quite fun to think about, and certainly part of being a real technologist...

After all, doesn't the kind of digital truth (or digital behaviourism) epitomized by this study threaten to undermine the foundation of ideas like individual emancipation, or the commons?

Like, let's say you move to a quaint town that the AI found for you. Who moved in and is now a member of that community: you and your wife, or your digital doppelgangers? And where did you move? The town, or its statistical analog warehoused in a data center somewhere?

When a new shop opens up in the town and you find your favorite brand of wine from your previous town on the shelves ... what's going on?


This is a great piece of work.

I wonder if it would be possible to use a similar approach on shop signs. Intuitively there should be some kind of relationship between the types of shops in an area and its demographics, but I guess the data would be much less dense than for car types.

I guess the next step is to make a completely end-to-end version without the manual feature engineering.


There was a similar discussion[1] here a while back about using data on different business types to infer gentrification levels.

1. https://news.ycombinator.com/item?id=10391753


Or you could just order that data from Experian Automotive, which has vehicle registration data from all US states.


But (I assume) that only has the home address of each car. This method can clearly be extended to do things like measuring the socio-economic features of people parked at events, etc.


But I need to get a job at one of the big tech companies, and so I need to show them I can do deep learning.


Doesn't matter! You'll still be asked to write perfect code, with awesome variable names, on a whiteboard, with eyes closed.

Don't forget to do handstands to stand out from the crowd.


Don't forget to practice doing math like 36^5 in your head. No pencil or paper allowed!


Well, let's see, 36 is 00100100 in binary. Binomial coefficients are 1 5 10 10 5 1. 2^25 + 5 * 2^22 + 10 * 2^19 + 10 * 2^16 + 5 * 2^13 + 2^10. 10 is 5 * 2, so bit shift those middle two. 5 is binary 0101, so add copies bit shifted by 2. 2^25 + 2^24 + 2^22 + 2^22 + 2^20 + 2^19 + 2^17 + 2^15 + 2^13 + 2^10. Comes out to

  0011 1001 1010 1010 0100 0000 0000
  0x039AA400
Which is a perfectly cromulent numeric response. If the interviewer asks what that is in decimal, they ought to be glad I don't have a pencil, because it would end up in someone's eye on my way out the door.
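
For anyone with a Python prompt handy, a quick check confirms the bit-twiddling:

  n = 36 ** 5
  print(n, bin(n), hex(n))  # 60466176 0b11100110101010010000000000 0x39aa400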


Me as interviewer: have you heard of the XOR operator?


Yeah, it looks like a plus sign with a circle around it. It's not in the ASCII character set.

Sometimes, programming languages will use the caret character to represent XOR for boolean types and bitwise XOR for integral types, perhaps using two asterisks for exponentiation, or leaving it as a named function call, like Math.Exp or Math.Pow. But you have to be careful there, because sometimes Math.Exp is actually the inverse of the natural log function, rather than exponentiation.

If caret means XOR, then 36 ^ 5 is 0010 0001, or 33.
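
In Python, for instance, the two readings of the expression come apart like this:

  print(36 ^ 5)   # 33        (bitwise XOR: 0b100100 ^ 0b000101 == 0b100001)
  print(36 ** 5)  # 60466176  (exponentiation)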

And that is the point where the interviewer marks me as "do not hire", because they can't stand it when they can't show off how smart they are to the candidate.


If you would need to practice that, the job is not for you.


It seems like the idea is to use this method in places outside of the US, which may not have the same level of data available.


I once attended a talk in which the speaker presented a system for inferring the orientation of a mobile device in space. This was before smartphones, and the system used accelerometers, some computer vision, and fancy algorithms running on a server because the device didn't have enough computing power. One person in the audience asked: why don't you simply use a compass? The answer was silence.


Nah, that data is weeks old. Some states are as slow as 3 months to hand the registration data to Experian. Better to scrape classified ads to see what people are selling ;-)


Let's ignore building racist AI for a sec. This is hugely flawed because most Street View data is NOT updated on a yearly basis. Many images are years old. Doing a yearly door-to-door census will get fresher data.


> We focus on the motor vehicles encountered during this journey because over 90% of American households own a motor vehicle

This is a clever hack, but I sorely hope no actual policy decisions are made on the basis of a methodology that wilfully ignores the 10% who choose not to own, or can't afford, a car.


Can there possibly be a methodology that is 100% correct?

If you try to interview 100% of people, then the results will be out of date by the time you've reached everybody. (Plus the Hawthorne effect suggests people will start changing their answers as people comment on the questions they are asked.)

So you create samples, and necessarily some of those samples will be unrepresentative because they will miss some nuance. Catch-22: unless you interview 100% of the people, you can't know what all the nuances are.

I guess the best we can hope for is a result that merges lots of different results using lots of different methodologies, and base decisions on that.


From the paper on collecting Street View images: "This was done via browser emulation and requires only the latitude and longitude of each point"
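
The paper doesn't give the mechanics, but for comparison, the documented Street View Static API also needs only a lat/lng plus an API key; a minimal sketch (hypothetical coordinates, and not necessarily what their browser emulation did):

  import requests

  # "size", "location", "fov", and "key" are documented parameters of
  # the Street View Static API.
  params = {
      "size": "640x640",
      "location": "37.4219,-122.0841",
      "fov": 90,
      "key": "YOUR_API_KEY",
  }
  resp = requests.get("https://maps.googleapis.com/maps/api/streetview",
                      params=params)
  with open("pano.jpg", "wb") as f:
      f.write(resp.content)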

My question is: does the TOS for Google Maps or Street View allow this?

I'm not trying to diminish the research - it is SUPER cool. I'm just thinking it would be good to have access to the dataset they captured, if it's publicly available.


At least parts of it will be publicly available. Our goal is to make the entire code/data publicly available. We might have to provide URLs and bounding boxes/other metadata for each image. But we're trying to figure out how to do this properly.


I want people to use technology to find out what those people who drive cars into pedestrians have in common. There must be something unique about them; the media says they all have mental health issues. Suddenly, since 2016, people with mental issues got access to cars and trucks.


Extracting quality-of-life data from Street View does not seem to be a new thing: https://hackernoon.com/machine-learning-our-cities-617ce005b...


Does anyone know how they download this data?


Weev is working on a similar project: https://youtu.be/ZMptVkyZWE4



