
I was part of an experimental neuroimaging group that tested Pachyderm OSS years ago, and at the time we were really impressed with the versioning capabilities it provided. For us, it made it easy for each researcher to grab and change data as needed for their own development without requiring support from eng.


How well does that work when your datasets are a sizeable percentage of available storage capacity, though? Is there some sort of deduplication at work?


Pachyderm does a ton of data deduplication, both for input data that's added to Pachyderm repos and for output files.

Pachyderm's pipelines are also smart enough to know what data has changed and what hasn't, and only process the incremental data "diffs" as needed. If your pipeline is just one giant reduce or training job that can't be broken up at all, then this isn't valuable, but most workloads include lots of Map steps where only processing diffs can be incredibly powerful.
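To make the idea concrete, here is a toy sketch of content-hash deduplication plus diff-only Map processing. This is not Pachyderm's actual implementation or API; `ContentStore`, `commit`, `changed_paths`, and `run_map` are hypothetical names made up for illustration, and SHA-256 is just an assumed choice of hash.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content hash used as the storage key (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = digest(data)
        self.blobs.setdefault(key, data)  # no-op if the content already exists
        return key


def commit(store, files):
    """Snapshot a set of files as a mapping of path -> content hash."""
    return {path: store.put(data) for path, data in files.items()}


def changed_paths(prev, curr):
    """Paths that are new or whose content hash differs from the previous commit."""
    return sorted(p for p, h in curr.items() if prev.get(p) != h)


def run_map(store, prev_commit, curr_commit, prev_out, fn):
    """Apply fn per file, reusing earlier outputs for unchanged files."""
    # Carry over results for files that still exist; drop deleted ones.
    out = {p: v for p, v in prev_out.items() if p in curr_commit}
    todo = changed_paths(prev_commit, curr_commit)
    for path in todo:
        out[path] = fn(store.blobs[curr_commit[path]])
    return out, todo


store = ContentStore()
c1 = commit(store, {"a.txt": b"hello", "b.txt": b"world"})
out1, done1 = run_map(store, {}, c1, {}, len)

# Second commit: b.txt changes, c.txt is new but duplicates a.txt's content.
c2 = commit(store, {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"hello"})
out2, done2 = run_map(store, c1, c2, out1, len)

print(done2)             # only the changed/new paths were processed
print(len(store.blobs))  # distinct contents actually stored
```

In the second run only `b.txt` and `c.txt` get processed, and `c.txt` adds no new blob because its bytes match `a.txt`. A reduce step that needs every input at once gets nothing out of this, which is the caveat above.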


This is super cool, thanks for pointing that out. Is the hard part done by Pachyderm or as some layer over container file systems?


Pachyderm does it. That's about half of what Pachyderm does: manage the versioned data and schedule workers to run your containerized processes against it.

FYI, it's ridiculously easy to get going playing with Pachyderm if you just want to check it out. You can run it on Minikube.


> You can run it on Minikube

Thanks for the tip. I just started down the k8s path from a bare-metal cluster and will try this.


A recent innovation worth knowing about is the ability to increase the spatio-temporal resolution of the underlying EEG data set as a preprocessing step, before the recorded EEG data is fed into machine learning algorithms. TRUUST has pioneered this and is seeing fantastic results preclinically on MEAs in drug discovery research for Fragile X and epilepsy indications, although the technology was originally developed with epilepsy in mind. We thought it made more sense to enhance the data quality first rather than trying to optimize the crap out of the algorithms, since better data going in generally gives better outcomes; garbage in, garbage out. If anyone is interested, feel free to reach out to us at info@truustneuroimaging.com. Related publications and resources are below.

Published paper in Journal of Neuroscience Methods: https://www.clearslide.com/view/mail?iID=3f3TTfMPJNBRhXhRDJD...

Published Poster with Scripps at SfN for Fragile X: https://www.clearslide.com/view/mail?iID=C5dp3gjmMWnMxKktk44...

Cool video showing what is possible with recorded EEG: https://www.youtube.com/watch?v=rhRwpAA1KeA

