Ever log every single sensor output (plus 200 more) every 50 ms of a vehicle over an 18-hour road test while recording audio and thermal video from two dozen different points? How about on a fleet of 40 vehicles daily, for 2 months, in the middle of Alberta or Death Valley?
Have a good way of querying, analyzing, processing, and securing all this time series data that can handle literally receiving >100TB per hour? And can keep up with the expected geometric growth? (I've actually got a decent solution for this; it currently needs to be vetted.) (Also, security has to be provided on a per-channel basis, not per-test: T1/T2 companies need access to their test data, but not global data.)
Contact me. I'm a member of the ASAM standards committee that recently met to discuss how basically every automaker and tier 1 has NO CLUE how to implement this. And easily two dozen companies are just waiting to throw money at this problem.
Currently no such solution exists, and the ones that do exist only manage paths to raw data blobs, not actual records/data points.
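For concreteness, a minimal sketch of what per-channel (rather than per-test) access control could look like; the channel names, supplier identifiers, and ACL table below are hypothetical, not anything shipping:

    # Minimal sketch of per-channel (not per-test) access control; the channel
    # names, supplier identifiers, and ACL table are all hypothetical.
    CHANNEL_ACL = {
        "engine.coolant_temp": {"oem", "tier1_supplier_a"},
        "brake.pad_thermal":   {"oem", "tier2_supplier_b"},
    }

    def can_read(principal: str, channel: str) -> bool:
        """True if this OEM/supplier may read this channel, regardless of test."""
        return principal in CHANNEL_ACL.get(channel, set())

    def filter_channels(principal: str, channels: list[str]) -> list[str]:
        """Strip channels the caller is not entitled to before serving a query."""
        return [c for c in channels if can_read(principal, c)]

    print(filter_channels("tier1_supplier_a",
                          ["engine.coolant_temp", "brake.pad_thermal"]))
    # -> ['engine.coolant_temp']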
If you want to know how to capture and handle automotive test data, look into process automation systems for the manufacturing industry. I used to work at petrochemical facilities in the '90s as a process control engineer. We not only received data from thousands of I/O inputs (for example, temperature, flow, pressure, vibration, and RPM sensors) every second to every tenth of a second, but we processed them, made decisions, and sent out I/O signals to controllers (for example, flow control valves). We used a combination of programmable logic controllers and distributed control systems.
Talk to companies like Honeywell (I believe they do something similar to what you are trying to do for the aviation industry), ABB, Foxboro, etc. These guys have been doing this since the '80s.
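For readers who haven't seen a PLC/DCS scan cycle, a rough sketch of the read-inputs / evaluate-logic / write-outputs loop described above; the tags, setpoint, and proportional correction are made up for illustration, and a real system would do this in ladder logic or function blocks rather than Python:

    # PLC-style scan cycle sketch: read inputs, evaluate logic, write outputs.
    import time

    SETPOINT_FLOW = 120.0   # hypothetical flow setpoint (m3/h)
    SCAN_PERIOD_S = 0.1     # one scan every tenth of a second

    def read_inputs() -> dict:
        # Placeholder for reading analog/digital I/O (temperature, flow, pressure...).
        return {"flow": 118.7, "pressure": 6.2}

    def write_output(tag: str, value: float) -> None:
        # Placeholder for driving an actuator, e.g. a flow control valve position.
        print(f"{tag} -> {value:.1f}%")

    def control_logic(inputs: dict, valve_pos: float) -> float:
        # Crude proportional correction toward the flow setpoint, clamped 0-100%.
        error = SETPOINT_FLOW - inputs["flow"]
        return max(0.0, min(100.0, valve_pos + 0.5 * error))

    valve = 50.0
    for _ in range(3):                      # a few scans for illustration
        start = time.monotonic()
        valve = control_logic(read_inputs(), valve)
        write_output("FCV-101.position", valve)
        time.sleep(max(0.0, SCAN_PERIOD_S - (time.monotonic() - start)))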
They do measurement and live analysis. If there is a method of storing and collating data after the fact, they haven't announced it to the industry at large.
I could go on for hours about all of this, but long story short, I'm the head of product at a company that builds a data system designed to handle exactly this kind of scenario (we provide data collection services to Pioneer, to mention one).
Called into the company, and I suggest looking into adding an operator: if you press 0, it just keeps repeating the same menu. Submitted an information request.
>Have a good way of querying, analyzing, processing, and securing all this time series data in a way that can handle literally getting >100TB per hour?
Is this data logging the changes or the states of your sensors? If it's the states, then I am guessing most of this is highly compressible. If it is actually 100TB of changes logged, then that's a pretty difficult problem.
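As a rough illustration of how compressible state logs are, plain run-length encoding alone collapses long runs of unchanged readings; the values below are made up:

    # Run-length encode a mostly-unchanged sensor state log (values are made up).
    from itertools import groupby

    def rle(values):
        """[(value, run_length), ...] for a sequence of repeated states."""
        return [(v, len(list(g))) for v, g in groupby(values)]

    samples = [21.5] * 1000 + [21.6] * 400 + [21.5] * 600   # 2000 raw samples
    print(rle(samples))   # -> [(21.5, 1000), (21.6, 400), (21.5, 600)]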
CloudFlare has an approach that may scale to the levels you are looking for; rather than storing the logs, they analyze and roll up the expected responses in real time, and store additional detail for items that appear anomalous. John Graham-Cumming gave a talk on this topic earlier this month at dotScale:
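A rough sketch of that rollup-plus-anomaly idea, assuming a hypothetical per-channel "expected band": aggregates are kept for everything, full-fidelity records only for out-of-band samples:

    # Keep aggregates for "normal" samples, full detail only for anomalies.
    from collections import defaultdict

    EXPECTED = {"coolant_temp": (70.0, 110.0)}   # hypothetical normal range (degC)

    rollups = defaultdict(lambda: {"n": 0, "sum": 0.0,
                                   "min": float("inf"), "max": float("-inf")})
    anomalies = []   # full-fidelity records kept only for out-of-band samples

    def ingest(channel: str, ts: float, value: float) -> None:
        lo, hi = EXPECTED.get(channel, (float("-inf"), float("inf")))
        agg = rollups[channel]
        agg["n"] += 1
        agg["sum"] += value
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
        if not lo <= value <= hi:
            anomalies.append({"channel": channel, "ts": ts, "value": value})

    for ts, v in enumerate([95.0, 96.2, 131.4, 97.0]):
        ingest("coolant_temp", float(ts), v)
    print(rollups["coolant_temp"], anomalies)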
If you have no working model of what is/is not correct, how do you determine anomalous responses?
To expand:
This method is already used (to a degree) in data logging compression, where one stores channel deltas/timestamps and reconstructs the value ad hoc when necessary. This is a good way to compress non-volatile datasets.
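A small sketch of that delta/timestamp scheme, using integer timestamps and values for a hypothetical channel:

    def delta_encode(samples):
        """samples: list of (timestamp_ms, value) ints; returns first sample + deltas."""
        if not samples:
            return None, []
        deltas = [(t2 - t1, v2 - v1)
                  for (t1, v1), (t2, v2) in zip(samples, samples[1:])]
        return samples[0], deltas

    def delta_decode(first, deltas):
        """Rebuild the original series ad hoc when a query actually needs it."""
        out = [first]
        t, v = first
        for dt, dv in deltas:
            t, v = t + dt, v + dv
            out.append((t, v))
        return out

    # Hypothetical channel: timestamps in ms, temperature in tenths of a degree.
    series = [(0, 215), (50, 215), (100, 216), (150, 216)]
    first, deltas = delta_encode(series)
    assert delta_decode(first, deltas) == series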
I've actually watched the talk already. And while it seems to apply, it doesn't. Every data point is important, because the real problem is comparing different tests, with time between tests, to get an idea of how hardware ages. Or component swapping, where a known test is performed on several different items and the results are compared in post-processing. To use the suggested method, your storage solution requires knowledge of what's being stored.
The goal is to unify these storage solutions, and present a unified front end for querying/report generation.
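A hypothetical sketch of what such a unified front end could look like: one query interface that routes each channel to whichever backing store holds it. The backend classes and query shape below are assumptions, not any existing product's API:

    from abc import ABC, abstractmethod

    class TestDataBackend(ABC):
        @abstractmethod
        def query(self, channel: str, start_ms: int, end_ms: int) -> list[tuple[int, float]]:
            """Return (timestamp_ms, value) samples for one channel in a window."""

    class InMemoryBackend(TestDataBackend):
        def __init__(self, data):                      # {channel: [(ts, value), ...]}
            self.data = data
        def query(self, channel, start_ms, end_ms):
            return [(t, v) for t, v in self.data.get(channel, [])
                    if start_ms <= t <= end_ms]

    class UnifiedFrontEnd:
        def __init__(self, routing: dict[str, TestDataBackend]):
            self.routing = routing                     # channel -> backend
        def query(self, channel, start_ms, end_ms):
            return self.routing[channel].query(channel, start_ms, end_ms)

    backend = InMemoryBackend({"engine.rpm": [(0, 800.0), (50, 1200.0)]})
    front = UnifiedFrontEnd({"engine.rpm": backend})
    print(front.query("engine.rpm", 0, 100))           # -> [(0, 800.0), (50, 1200.0)]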
> a good way of querying, analyzing, processing, and securing all this time series data
Could you expand a little more on what sort of features you would like to see in a solution? I have some relevant experience, and I could see myself taking a stab at this problem.
Having looked at just the raw data from OBD-II on a few vehicles: data format standardization, there is none. Not between car makes, models, years, or versions (e.g. 2014 Suzuki Swift vs 2014 Suzuki Swift S). This would require either months (or years) of reverse engineering, or unfettered access to automakers' internal documentation (for some, I know it's minimal). (It would likely require a partnership with the auto industry to avoid litigation.)
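(For what it's worth, the handful of SAE-standardized mode 01 PIDs are roughly the only place with documented decode formulas; a toy decoder for those is sketched below, and everything proprietary needs the reverse engineering or OEM documentation described above.)

    # Toy decoder for a few SAE-standardized OBD-II mode 01 PIDs; everything
    # outside these (most of a modern vehicle's data) is proprietary.
    def decode_mode01(pid: int, data: bytes) -> float:
        if pid == 0x0C:                       # engine RPM
            return (256 * data[0] + data[1]) / 4.0
        if pid == 0x0D:                       # vehicle speed, km/h
            return float(data[0])
        if pid == 0x05:                       # engine coolant temperature, degC
            return data[0] - 40.0
        raise ValueError(f"PID 0x{pid:02X}: no standard formula known here")

    print(decode_mode01(0x0C, bytes([0x1A, 0xF8])))   # -> 1726.0 rpm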
If you pulled that together and offered it in a usable format... Wow.
So I'm listening to this Radiolab episode [0] about drones with cameras that can solve crime (traffic and other societal ills), and it occurs to me that with everyone soon to be walking around with DSLR-quality smartphones, couldn't we triangulate all this video/audio data and provide substantially better resolution to daily life? Think of it as continuous Meerkat/Periscope localized around an event in four dimensions.
I was involved in a project conceptualizing real-time video streams from smartphones: synchronizing them and adjusting/correcting quality before making them presentable... in real time!
Think of a soccer stadium, with fans taking "video" of the game. All the feeds would be gathered, synchronized, quality-adjusted, and put online for anyone to view, from any angle.
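One building block for that kind of synchronization is estimating the time offset between two phones' audio tracks by cross-correlation; a toy sketch with a synthetic signal and an assumed sample rate:

    # Estimate the offset between two audio tracks by cross-correlation.
    import numpy as np

    SR = 2000                                   # assumed sample rate (Hz), kept low for the demo

    def estimate_offset(ref: np.ndarray, other: np.ndarray) -> float:
        """Seconds by which `other` lags `ref` (positive = starts later)."""
        corr = np.correlate(other, ref, mode="full")
        lag = np.argmax(corr) - (len(ref) - 1)
        return lag / SR

    # Synthetic demo: the same burst of "crowd noise" recorded with a 0.25 s delay.
    rng = np.random.default_rng(0)
    event = rng.standard_normal(SR)             # 1 s of noise
    ref = np.concatenate([event, np.zeros(SR)])
    other = np.concatenate([np.zeros(SR // 4), event, np.zeros(3 * SR // 4)])
    print(round(estimate_offset(ref, other), 3))   # ~0.25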
I couldn't see a way to contact you. I was about to email you some ideas but looking at your angel.co page it seems like you'd already know how to do this?
I know the technical side/requirements, and the people. My business and design skills are lacking. Also, a whole front end for interfacing/formatting/report generation needs to be created.