
Big Data in the Automotive Test Industry

Ever log every single sensor output (as well as 200 more) every 50ms of a vehicle over an 18-hour road test while recording audio and thermal video from 2 dozen different points? How about on a fleet of 40 vehicles daily, for 2 months, in the middle of Alberta or Death Valley?

Have a good way of querying, analyzing, processing, and securing all this time series data in a way that can handle literally getting >100TB per hour? And can keep up with the expected geometric growth? (I've actually got a decent solution for this; it currently needs to be vetted.) (Also, security has to be provided on a per-channel basis, not per-test: T1/T2 companies need access to their test data, but not global data.)
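
To make the per-channel security point concrete, a minimal sketch (the channel names, org identifiers, and the in-memory store are all made up for illustration; this is not an implementation):

    from collections import defaultdict

    # Hypothetical per-channel ACL: access is granted per signal channel,
    # not per test, so a Tier 1 supplier sees only the channels it owns.
    CHANNEL_ACL = {
        "engine.coolant_temp": {"oem", "tier1_cooling"},
        "brake.pad_temp":      {"oem", "tier1_brakes"},
        "cabin.audio":         {"oem"},
    }

    store = defaultdict(list)  # channel -> [(t_ms, value)]

    def append(channel, t_ms, value):
        store[channel].append((t_ms, value))

    def query(org, channel, t_start, t_end):
        """Return samples in [t_start, t_end) only if org may read this channel."""
        if org not in CHANNEL_ACL.get(channel, set()):
            raise PermissionError(f"{org} has no access to {channel}")
        return [(t, v) for (t, v) in store[channel] if t_start <= t < t_end]

    append("brake.pad_temp", 0, 212.5)
    append("brake.pad_temp", 50, 213.1)
    print(query("tier1_brakes", "brake.pad_temp", 0, 100))  # allowed
    # query("tier1_brakes", "cabin.audio", 0, 100)           # PermissionError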

Contact me. I'm a member of the ASAM standards committee that recently met to discuss how basically every auto-producer and tier 1 has NO CLUE how to implement this. And easily 2 dozen companies are just waiting to throw money at this problem.

Currently, no such solution exists; the ones that do exist only manage paths to raw data blobs, not actual records/data points.




I think I work for one of those 2 dozen companies. I have exactly the same problem at the moment.


If you want to know how to capture and handle automotive test data, look into process automation systems for the manufacturing industry. I used to work at petrochemical facilities in the 90s as a process control engineer; we not only received data from thousands of I/O inputs (for example, from temperature, flow, pressure, vibration, and rpm sensors) every second to every tenth of a second, but we processed it, made decisions, and sent out I/O signals to controllers (for example, flow control valves). We used a combination of programmable logic controllers and distributed control systems.

Talk to companies like Honeywell (I believe they do something similar to what you are trying to do for the aviation industry), ABB, Foxboro, etc. These guys have been doing this since the 80s.


They do measurement and live analysis. If there is a method of storing and collating data after the fact, they haven't announced it to the industry at large.


I could go on for hours about all of this, but long story short, I'm the head of product at a company that builds a data system designed to handle exactly this kind of scenario (we provide data collection services to Pioneer, to mention one).

Feel free to get in touch.


I would LOVE to chat about this for hours. In fact, I applied at Treasure Data just a few days ago for that very reason.

My email is in my profile if you'd like to chat. :) Hope to hear from you!


Forgot to add a link: https://vimeo.com/122691639#at=0


Called into the company, and I suggest looking into an operator option: if you press 0, you just keep getting the same menu. Submitted an information request.


>Have a good way of querying, analyzing, processing, and securing all this time series data in a way that can handle literally getting >100TB per hour?

Is this data logging the changes or the states of your sensors? If it's the states, then I am guessing most of this is highly compressible. If it is actually 100TB of changes logged, then that's a pretty difficult problem.


CloudFlare has an approach that may scale to the levels you are looking for; rather than storing the logs, they analyze and roll up the expected responses in real time, and store additional detail for items that appear anomalous. John Graham-Cumming gave a talk on this topic earlier this month at dotScale:
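
Very roughly, the shape of that approach is: keep running aggregates per channel and persist raw samples only when they look anomalous. A toy sketch (the z-score threshold and what counts as "anomalous" are placeholders, not CloudFlare's actual method):

    import math

    class RollupWithAnomalies:
        """Running mean/variance per channel (Welford's algorithm); raw samples
        are retained only when they fall far outside the expected range."""
        def __init__(self, z_threshold=4.0):
            self.z = z_threshold
            self.stats = {}      # channel -> (count, mean, M2)
            self.anomalies = []  # raw samples kept for later inspection

        def observe(self, channel, t_ms, value):
            n, mean, m2 = self.stats.get(channel, (0, 0.0, 0.0))
            if n > 30:  # wait for a baseline before flagging
                std = math.sqrt(m2 / (n - 1))
                if std > 0 and abs(value - mean) / std > self.z:
                    self.anomalies.append((channel, t_ms, value))
            # Welford's online update of the rolling aggregate
            n += 1
            delta = value - mean
            mean += delta / n
            m2 += delta * (value - mean)
            self.stats[channel] = (n, mean, m2)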

http://www.thedotpost.com/2015/06/john-graham-cumming-i-got-...

Here is the related HN thread: https://news.ycombinator.com/item?id=9778986

-G


If you have no working model of what is/is not correct, how do you determine anomalous responses?

To Expand:

This method is already used (slightly) in data logging compression, where one stores channel deltas/timestamps and reconstructs the value ad hoc when necessary. This is a good way to compress non-volatile datasets.
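
As a toy sketch of the delta/timestamp idea (first sample stored absolutely, the rest as deltas; reconstruction happens only when the value is actually queried):

    def encode_deltas(samples):
        """samples: list of (t_ms, value); store the first absolutely,
        then (time delta, value delta) pairs for the rest."""
        if not samples:
            return []
        out = [samples[0]]
        prev_t, prev_v = samples[0]
        for t, v in samples[1:]:
            out.append((t - prev_t, v - prev_v))
            prev_t, prev_v = t, v
        return out

    def decode_deltas(encoded):
        """Reconstruct the original series ad hoc when it is queried."""
        if not encoded:
            return []
        out = [encoded[0]]
        t, v = encoded[0]
        for dt, dv in encoded[1:]:
            t, v = t + dt, v + dv
            out.append((t, v))
        return out

    samples = [(0, 900), (50, 900), (100, 902), (150, 902)]
    assert decode_deltas(encode_deltas(samples)) == samples
    # A slowly changing (non-volatile) channel yields mostly small/zero deltas,
    # which is what makes the encoded stream compress well.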

I've actually watched the talk already. And while it seems to apply, the problem is it doesn't. Every data point is important, because the real problem is comparing different tests, with time between tests, to attempt to get an idea of how hardware ages. Or to test component swapping, where a known test is performed on several different items and the results are compared in post-processing. To use the suggested method, your storage solution requires knowledge of what's being stored.

:.:.:

The goal is to unify these storage solutions, and present a unified front end for querying/report generation.


> a good way of querying, analyzing, processing, and securing all this time series data

Could you expand a little more on what sort of features you would like to see in a solution? I have some relevant experience and I could see myself taking a stab at this problem


Having looked at just the raw data from OBD-II on a few vehicles: data format standardization there is none. Not between car makes, models, years, or versions (i.e. 2014 Suzuki Swift vs 2014 Suzuki Swift S). This would require either months (or years) of reverse engineering, or unfettered access to automakers' internal documentation (for some, I know it's minimal). (It would likely require a partnership with the auto industry to avoid litigation.)

If you pulled that together and offered it in a usable format... Wow.
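
For what it's worth, the standardized SAE J1979 Mode 01 PIDs are about the only portable slice, and a toy decode of one of them shows how thin that slice is (the response bytes below are made up):

    # Decode a standard OBD-II Mode 01 response for PID 0x0C (engine RPM).
    # The ((A*256)+B)/4 formula is defined by SAE J1979 for this PID;
    # anything outside the standardized PID set is make/model specific.
    def decode_rpm(response_hex):
        data = bytes.fromhex(response_hex.replace(" ", ""))
        assert data[0] == 0x41 and data[1] == 0x0C, "not a Mode 01 / PID 0x0C response"
        a, b = data[2], data[3]
        return ((a * 256) + b) / 4

    print(decode_rpm("41 0C 1A F8"))  # -> 1726.0 RPM (example bytes)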


Do you have such data that I can have a look at?


I'm currently working on a VC-backed solution to this problem and I'd love to get your thoughts and requirements. PM me and we'll chat!


I'll throw an email at info@koalitycode.com

You have no contact listed...


We're having similar issues in aerospace. Time series data out the wazoo.


I used to deal with radar data on roughly this order of magnitude, which had to be synced to the output of other sensors.

I really miss the challenges of real-time signal and data processing. I really want back into it.


Video data from multiple cameras FTW.


So I'm listening to this Radiolab episode [0] about drones with cameras that can solve crime (traffic and other societal ills), and it occurs to me that with everyone soon to be walking around with DSLR-quality smartphones, couldn't we triangulate all this video/audio data and provide substantially better resolution to daily life? Think of it as continuous Meerkat/Periscope localized around an event in four dimensions.

[0] http://www.radiolab.org/story/eye-sky/


I was involved in a project conceptualizing taking real-time video streams from smartphones, synchronizing them, and adjusting/correcting quality before presenting them... in real time!

Think of a soccer stadium, with fans taking "video" of the game. All the feeds would be gathered, synchronized, quality adjusted and put online for anyone to view, from any angle.


Yeah ... there is a ton of research out there on multi-sensor fusion; super-resolution & synthetic viewpoint reconstruction.

Even the buzzword-du-jour is getting involved[0].

[0] https://www.youtube.com/watch?v=cizgVZ8rjKA&feature=youtu.be


Formula 1 does some interesting things here... that sport is all about data now... in real time.


I couldn't see a way to contact you. I was about to email you some ideas but looking at your angel.co page it seems like you'd already know how to do this?


I just updated this.

I know the technical side/requirements, and the people. My business skills and design are lacking. Also, a whole front end for interfacing/formatting/report generation needs to be created.



