My version of this is called Dogsheep, and involves building tools that suck my data from various sources into SQLite databases so I can query it all using Datasette.
I enjoyed reading this author's essay on why he doesn't think loading everything into relational databases is the best approach: https://beepb00p.xyz/unnecessary-db.html
I think I have workarounds for most of the issues he describes there. One example: I mostly create my base table schema automatically based on the shape of the incoming JSON, and automatically add new columns if that JSON has keys I haven't seen before.
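A minimal sketch of that idea, using only the stdlib (the function name and the choice to store everything untyped are my own, not Dogsheep's actual implementation):

```python
import json
import sqlite3

def insert_json(conn, table, record):
    """Create the table from the first record's keys, and ALTER TABLE
    whenever a later record brings a key we haven't seen before."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    cols = list(record)
    if not existing:
        # SQLite allows columns without declared types
        conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    else:
        for c in cols:
            if c not in existing:
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {c}")
    # Nested structures get serialized back to JSON strings
    values = [json.dumps(v) if isinstance(v, (dict, list)) else v
              for v in record.values()]
    conn.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({', '.join('?' * len(cols))})",
        values,
    )

conn = sqlite3.connect(':memory:')
insert_json(conn, 'tweets', {'id': 1, 'text': 'hello'})
insert_json(conn, 'tweets', {'id': 2, 'text': 'hi', 'lang': 'en'})  # new key -> new column
```

The upside is that schema migrations become a non-event: the schema just trails the data.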
Very thought-provoking writing (and software) here.
Ooooh, I totally forgot to add your link, I remember you commenting, sorry! Will amend now.
Ah I see, so your approach is using databases as an 'intermediate' layer, rather than the main source of 'truth'.
My main objection to databases is the friction of maintenance and the speed of prototyping, but if it's automatic and used as a cache, I agree it's reasonable. Although I'd still be worried about normalising something incorrectly, but that's my personal sort of anxiety :)
I've got something similar [0]: if you have a sequence of NamedTuples/dataclasses, you can get a database 'for free'.
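To illustrate the 'database for free' idea (the `Exercise` type and `store` helper below are invented for the sketch, not the linked project's actual API):

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

@dataclass
class Exercise:   # any dataclass works; its fields become columns
    dt: str
    kind: str
    reps: int

def store(conn, items):
    """Derive a table from the items' dataclass fields and insert them."""
    cls = type(items[0])
    cols = [f.name for f in fields(cls)]
    conn.execute(f"CREATE TABLE IF NOT EXISTS {cls.__name__} ({', '.join(cols)})")
    conn.executemany(
        f"INSERT INTO {cls.__name__} VALUES ({', '.join('?' * len(cols))})",
        [astuple(i) for i in items],
    )

conn = sqlite3.connect(':memory:')
store(conn, [Exercise('2020-01-01', 'pushups', 20),
             Exercise('2020-01-02', 'squats', 30)])
```

Since the schema is derived from the type annotations, the code producing the data stays the single source of truth.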
It downloads your data from social media (or other online sites) into a single sqlite database. It can even draw relations between various accounts/people. I use it to back up my Google Photos.
I've focused on Google Takeout exports so far. Only gotten Hangouts and Voice working at this point.
I feel like all of us need to be working on a common codebase for all these parsers. I just found out a week ago that the Hangouts format changed significantly and totally broke my parser (which I actually adapted from https://bitbucket.org/dotcs/hangouts-log-reader/)
Yep, agree about the common codebase, that's why I'm trying to keep everything modular and thinking a lot about it. I describe this bit of my philosophy/design here [0].
I've only written parsers/exporters if I haven't found any (or any that wouldn't be a complete pain to adapt) so far. Google Takeout processing is a part of HPI, but I was going to extract it in a separate repository, so perhaps we both could benefit from it.
Hmm, it's interesting how diverse people's actual data sources are, but there's definitely overlap. I also try searching before I write anything myself, and often I find something (both Hangouts and Voice were like that). But it's funny how my use case is always significantly different, so I end up having to put in plenty of work on top of what they've already done.
It sounds like a ‘schema-less’/document database like Mongo might work better for API results, which are often non-relational but sorta-hierarchical instead.
(Shakes fist at the RTM API which would return one and many child tasks in different ways, so it took me an extra half-hour to write the convoluted Jq query.)
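jq pain aside, in Python the same one-or-many normalisation is a tiny generic helper (not tied to RTM's actual response schema):

```python
def as_list(value):
    """Normalise an API field that returns a bare object for one child,
    a list for many, and may be absent entirely."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# Both response shapes come out the same way:
single = as_list({"id": 1})               # bare object -> one-element list
many = as_list([{"id": 1}, {"id": 2}])    # list stays a list
missing = as_list(None)                   # absent field -> empty list
```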
I have been toying with a similar idea for a very long time.
I want to have an all-encompassing personal history for the same reasons I want good access and error logs. I want to see when something wrong happens, and I want the means to diagnose the problem and fix the damage.
A few good examples:
- Did I forget to write down any tax-deductible expenses?
- What is my net worth? Is my current lifestyle sustainable?
- Which parts of my city have I yet to explore?
- Can I search all my conversation with John across 3 messaging apps, email, SMS and phone?
I took a few jabs at these problems. Invariably, the services I consumed data from shut down their API or hid my data better. In some cases, I wanted to use a different service, but didn't feel like updating my scripts.
I'm currently looking into this again, but this time it's also a matter of privacy. I want to own all that data I create. I'd rather integrate tools under my control than services who actively try to hold my data hostage.
> Can I search all my conversation with John across 3 messaging apps, email, SMS and phone?
These are totally pain points for me / questions I'd love to have some answers for... Has anyone tried to get historic iOS location data to answer the first question?
There is an open source digital forensics project called APOLLO (https://github.com/mac4n6/APOLLO) that could be used for personal use to get location information off your phone. Here's a blog post by the author, with a little bit of explanation:
> What is my net worth? Is my current lifestyle sustainable?
YNAB's method is excellent for this. TLDR: you record your expenses for some time, and then you begin planning them ahead for each month semi-automatically, putting in money for costly purchases in advance and leaving some for when shit happens.
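The core of the envelope method can be sketched in a few lines (a toy model for illustration, not YNAB's actual data model; categories and amounts are made up):

```python
# Each category is an 'envelope' holding money assigned to it.
budget = {'rent': 0, 'food': 0, 'car_repair_fund': 0}

def assign(category, amount):
    """Put money into an envelope at the start of the month."""
    budget[category] += amount

def spend(category, amount):
    """Record an expense against its envelope; a negative balance
    means the plan needs adjusting, not that you're 'over budget' globally."""
    budget[category] -= amount

assign('food', 300)
assign('car_repair_fund', 50)   # saving ahead for when shit happens
spend('food', 42.50)
```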
I've been kinda looking for an OSS/self-hosted implementation of the method, but none seem inviting so far.
(G2A seems to have a Steam key for the old version of YNAB, under ‘You Need A Budget’—which version could sync via Dropbox, without its own cloud service. But the key costs quite a bit more than what I've paid for the app back then.)
> Which parts of my city have I yet to explore?
I'm using desktop QGIS to mark my trips, while OsmAnd on the phone shows the exported GPX on the go (among other pleasant features). In theory I can use OsmAnd itself to record the trips with GPS, or any of the other dozen apps for that, but personally I don't believe that Google won't suck the location up regardless of my settings.
Only if you think of "maintaining your system to improve your life" as work. I would think of it more as play. It's not time wasted if you enjoy it.
Hm. I actually don't agree with that sentiment as written. It's not time wasted if you enjoy it and have something to show for it? That's not it either.
Maybe this. I view the work we're discussing like a programmer's equivalent of woodworking in your garage. Sure, it's not necessary. But it's creative, fulfilling, and enjoyable. There's something immensely satisfying about using something you made with your own two hands, even if it's not perfect. As long as all these things remain true, I wouldn't call it time wasted.
To me your criticism reads something like, "if you wanted to canoe, why did you spend a year making your own? you could have bought one and spent that year doing what you really wanted to do!" Well, I wanted to do that, but I also enjoyed making something to do it with. I feel my life is better for having made it.
> It's not time wasted if you enjoy it and have something to show for it
Yep, agree! I really enjoy it, my only problem is that there are too few hours in the day :)
I guess it depends on the goals though -- it's perfectly fine to build something just for the sake of it, as long as you have fun. One of my goals, though, is to stop/pause the active phases of building and spend more time using it. Partly because of the lack of time, partly because it means iterating and reflecting on whatever I've already built. So ultimately I agree that it's important to improve your life instead of building something that may improve it one day.
I mean, do I want my time back? Yes, sure; like with any learning, in hindsight it always feels like I spent more time than necessary on this.
But the thing is, I did improve my life while building this! I've learnt so much: from obvious things like building better and more resilient software, to my preferences in terms of tools and services. I've started the blog to share this system and related things, which made me appreciate writing, and I feel like I'm getting better at it. The whole quantified self thing was super useful: it motivated me to think and learn about how my body works, to eat healthy, etc. However stupid it is, self-tracking gives me extra motivation for regular exercise and trying to push my limits.
Every new bit of data I'm adding is easier and smoother, so I can easily imagine myself using this system for years (and fixing minor bits that break once in a while, just like people fix things in their homes). Sharing it also means I can potentially improve others' lives, which makes me regret the time I spent building and researching much less.
> "if you wanted to canoe, why did you spend a year making your own"
I feel it's more like "you can rent/borrow a canoe, but you only have a spoon instead of a paddle which you can't switch. Oh and it can also collapse anytime. Why would you spend a year making your own?"
Starting on a project to improve one's life by engaging latent gifts is one of the smartest things one can do. It is aligned with the best advice psychology can offer. If that means maintaining a measurement system works for you--great. It's not as if we lack data showing that measurement systems are beneficial to human development.
Today abstraction is no longer that of the map,
the double, the mirror, or the concept. Simulation
is no longer that of a territory, a referential being
or substance. It is the generation by models of a real
without origin or reality: A hyperreal[1]. The territory
no longer precedes the map, nor does it survive it.
*It is nevertheless the map that precedes the territory—
precession of simulacra—that engenders the territory.*
Fantastic job with this! Feels like I've stumbled upon a [vastly more productive] clone of myself.
The best way I can describe this type of thinking is a very strong, overriding compulsion to think and live in an augmented, external, hivemind-like capacity. It's beautiful, but it certainly isn't normal.
It makes one wonder what the etiology of such thinking is. Were I to guess, based on personal experience: hypomania, ADHD, obsessive tendencies, indignation at tech platforms sucking so bad, technology fracturing our fucking minds. Any combination thereof.
>I'm not willing to wait till some vaporwave project reinvents the whole computing model from scratch
First off, there is something deeply satisfying about seeing your stuff properly categorized and easily accessible when needed so I definitely understand the impulse. And the result looks like it does exactly that. Kudos.
There is a part of me that dislikes this level of insight someone can have into my life and habits (yes, it is intended for owner use, but the data is there, neatly organized for someone to access).
It is a weird comment for me, because I absolutely see the benefit of this project.
I can see your concern! I guess I've got several thoughts on it:
- One is along the lines of post-privacy:
Lots of the stuff that I'm collecting one can find online anyway if they deliberately search for it. Tweets/reddit comments/instagram photos, etc.
Some of it isn't public, let's say, Amazon purchases. But would I mind sharing it?
Maybe not, who cares what I buy? Some ML algorithm that would show me ads? Whatever, I don't click on them anyway.
The more of my stuff I don't mind being public in the first place, the smaller the chance someone can use it against me.
But I understand it's ultimately something very personal.
- Second is that I agree that the security professionals would handle the data much better than me.
I don't even mind Google keeping my data, if only it was easier for me to access it when I need it!
So generally I would happily pay professionals for a service that keeps my data safe, lets me access and integrate it, and provides the infrastructure around it.
Same way I'm using a bank to keep my money safe.
But such infrastructure for data simply doesn't exist at the moment, because there is no demand for it from people.
And silos only benefit the companies that keep this data, so the change isn't going to come from them.
E.g. if you can easily access your tweets through a nice and fast local app, why would you even visit bloated twitter.com, which is trying to 'entertain' and 'engage' you?
I'm hoping that with projects like this, I can inspire people to think differently about their services and tools, and to demand better means of using their own data from professionals (i.e. engineers/designers/product managers).
> There is a part of me that dislikes this level of insight someone can have into my life and habits (yes, it is intended for owner use, but the data is there, neatly organized for someone to access).
Well, this is already happening. Google and Facebook, for example, earn basically all of their money from this. They replaced advertisement middlemen because they can track and quantify human behaviours and identities. They can do it because they are able to aggregate the data and provide APIs for marketing people.
This is a tool that brings it closer to you as the creator of the data. It doesn't do anything new, it just exposes what's going on.
If you browse from the EU, check out the data-processing partners listed in the GDPR consent dialogs you have to accept. The lists on every professionally built service are long. There are literally thousands of companies accessing, aggregating, processing and reselling the data and their analysis.
The tools you are afraid of already exist and are commercially available.
Polar opposite - Does anyone enjoy digital amnesia? I feel like I might be a little weird here, but every so often I'll just go in and delete all my notes, reminders, apps, photos, etc and start again.
Wondering if anyone else does this?
I like doing it because it helps clear my head, even though the files or data are digital they take up some space in my head. I do it every few months, just have a fresh slate. I wonder if that says something about me - I'm not really sure. It's probably just another form of procrastination.
I once read Stephen Wolfram's post about how he has a key logger that backs up every keystroke he types, and it's all indexed, and he has a little front end over his enormous amount of data; it made me feel anxious just reading it.
I think most people would want amnesia for certain memories, like traumatic events. E.g. a romantic interest rejected you, or a football quarterback throws an interception and needs to forget it to fully concentrate on the next play.
For most things, I wish I had more digital documentation of my past life. There were places I visited once that I wish I had photos of. There were people who were important in my life, but I forgot their names and never wrote them down. I started keeping a daily journal 10 years ago and it's been so useful that I wish I had been doing it since I was a child. If I had a git repo for every single piece of code I wrote since I was a kid, that would be cool to revisit.
Back to your point, another type of purge that's useful is open projects that just nag you. For example, I had a "home automation" project on my todo list for years. In addition to taking up space in my brain, there were half-assembled open boxes of electronics on the shelf for years as physical reminders of that unfinished task. I finally decided to be realistic about my enthusiasm for the project and just abandon it. I sold the electronics and I feel better now. If I later decide to tackle it again, there will be newer and better technology anyway.
I also had a bookshelf of foreign language books for years that I thought I'd learn from for foreign travel. After the COVID-19 crisis, I decided to give them away since I won't be traveling overseas for years. Now that they're gone, I'm no longer reminded of things I never got around to.
Purging bad emotions and unrealistic projects -- yes. Purging life data -- generally no.
I do something halfway in between - I routinely get frustrated with everything that's not relevant to my present concerns, and just throw it into a folder called 'archive' and then move THAT into the big central archive folder.
I never really look back through any of it but when I do it’s quite nostalgic.
I do the same thing too. I’m not sure if I’ll ever look back through any of it but it’s nice to know that if for whatever reason I feel the need to review old data, my future self will be relieved that it’s not gone into the digital ether.
"Polar opposite - Does anyone enjoy digital amnesia? I feel like I might be a little weird here, but every so often I'll just go in and delete all my notes, reminders, apps, photos, etc and start again."
I think there is a happy medium and you can achieve it by keeping your inbox inside your trashcan.
This is not my idea and I have heard of it, in various forms, from different corners.
The idea is that all of your "important" papers get placed, as you receive them, into your larger (deeper) than normal trash can. I use a deep tray for this. It is basically a trash can that takes 6 or 8 months to fill.
When you fill the trashcan, you remove the bottom 1/4 of it and actually throw it away - since it turns out it wasn't important or necessary or actionable.
You can create (and I have created) digital metaphors for this:
For instance, I dump my profiles directory, with all of my browsing history and bookmarks and so on, every month or so and then reset my browser. Then I delete those tarballs later when it's clear they were not important.
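A sketch of that tarball workflow (the naming scheme and keep count below are arbitrary choices, not the commenter's actual setup):

```python
import tarfile
import tempfile
import time
from pathlib import Path

def snapshot(profile_dir: Path, archive_dir: Path) -> Path:
    """Snapshot a browser profile into a timestamped tarball."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    name = archive_dir / f"profile-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(name, 'w:gz') as tar:
        tar.add(profile_dir, arcname=profile_dir.name)
    return name

def prune(archive_dir: Path, keep: int = 6) -> None:
    """Throw away all but the newest few snapshots."""
    # Lexicographic order == chronological order, thanks to the timestamped names.
    for old in sorted(archive_dir.glob('profile-*.tar.gz'))[:-keep]:
        old.unlink()
```

Run `snapshot` before each reset and `prune` occasionally; anything still needed after a few cycles gets rescued, the rest quietly expires.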
The key is this:
You can retain maximal optionality while still minimizing processing and sorting time. Just chuck it all into the "trash" bin, and if it's really important you always know where to look.
I think I'm with the sibling commentators: deleting something when I could keep it feels like an obliteration, a destruction, a waste; on the other hand, realistically I don't need it and am almost certainly never going to look at it again. Almost. But if I did and wanted to and couldn't, I'd probably feel loss.
So I chuck it in an archive. Gradually expanding storage at low prices means never having to think about deletion.
I don't use any of the systems for sharing tabs or history across browsers, because each has a different set of almost abandoned tabs. Sometimes "tab bankruptcy" is forced on me by software failure, and it's frustrating.
Yeah, I have no problem not having my IMs from 15 years ago. If it doesn't fit in my brain or isn't crucial enough to commit to backup, then it's best to let those bits disappear.
I do it with tabs in the browser... if there's like 30-40 of them (usually for "reading later and saving any good info"), yeah they get closed. I've yet to miss something...
Notes, not really, there's a ton of them in Keep so I look up some stuff from time to time.
I have a massive collection in OneNote notebooks, and once a year it seems I check them and remember some good stuff, but mostly just how dumb I was before :D
I can relate. It normally happens when I’m so overwhelmed with my system, PKM (Personal Knowledge Management), todos etc. Maybe I should just embrace the discomfort and work with the existing data by extending/hiding/refactoring instead of destroying.
Kudos - this looks like a great project and is fully usable now.
I have a similar project, DL, that's unfinished. Mine revolves around using a custom API in both Rust and REST to aggregate all my digital life events using ActivityStreams 2.0 and extensions to that, in a manner that is decentralized and ranked/categorized through machine learning. I am still working on it and releasing it Open Source is one of this year's goals.
My motivation is that the amount of information I receive from Twitter, Mastodon, Facebook, Reddit, HN, various Stack Exchanges, blog postings, etc. has gotten to the point where it's too easy to miss things.
Jeremie Miller, one of the creators of XMPP, had something similar revolving around the Telehash protocol. As far as I can tell, that effort is discontinued, or at least no longer Open Source.
Hey, we are building something similar. Please check out https://metamate.io/blog/most_advanced_hackernews_api. For now, we're concentrating on reading public data from various social sources. We just released a HackerNews service and are going to add more over the next couple of weeks. Here's a little application built on top of MetaMate: http://showcase.metamate.io/hackernews-activity. Please let me know what you think :)
This is awesome. I'm working on a personal, self-hosted IoT system from scratch where I have several fetch agents in Huginn to get data and store it in "The Archive" (a data-storage server).
Then I have different APIs that my IoT devices call to fetch the data from The Archive to trigger events and generate dashboards.
This might be a great way to optimize it and save me some development time for things that I haven't implemented yet (Twitter archive, messages, etc.)
Some examples of things already implemented:
- Raspberry Pi Touchscreen on my desk with dashboards
Maybe something in the spirit of Home Assistant? But for general service-interaction, not only home automation.
That would be so useful, but also kinda dangerous. It would also be a huge amount of work just to manage it. Maybe build some general framework and API and let people build it in a decentralized way?
I list my most common use cases here [0], but if I had to name just one thing, it would probably be promnesia [1].
It's an extension that I'm working on, whose purpose is to actually use all this information and integrate it into my browser, like a remembrance agent.
Cool project. I've been toying with and thinking about similar ideas lately but more focused on my productivity as a freelance software dev/ops person.
My approach is basically to view myself as a distributed system that receives various inputs (jira tickets, PR comments, slack messages, emails) and performs work that produces outputs (PRs, emails, PR reviews, deploys). I'm trying to model workflows for all types of work I do, and write plugins that process various inputs and outputs so tasks can enter and progress through these workflows.
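That input/output model can be sketched as a tiny state machine (the states and item fields below are invented for illustration, not the commenter's actual design):

```python
from dataclasses import dataclass

# Hypothetical workflow stages; real workflows would differ per work type.
WORKFLOW = ['inbox', 'in_progress', 'review', 'done']

@dataclass
class WorkItem:
    source: str       # e.g. 'jira', 'slack', 'email'
    description: str
    state: str = 'inbox'

def advance(item: WorkItem) -> WorkItem:
    """Move a work item one step along its workflow."""
    i = WORKFLOW.index(item.state)
    if i < len(WORKFLOW) - 1:
        item.state = WORKFLOW[i + 1]
    return item

ticket = WorkItem(source='jira', description='fix login bug')
advance(ticket)   # inbox -> in_progress
```

Plugins would then only need to map each input source onto `WorkItem`s and call `advance` as outputs are observed.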
Cool project. I have been on the lookout for something like this.
The idea of an exocortex seems to be something that many people want, and almost universally re-invent, because the way in which structuring makes sense is different for everyone. I like the approach of making it modular so people can pick and choose parts and do their own integration.
What I learned from thinking about this problem is that this kind of integration is really hard and almost impossible to tackle without an evolutionary approach, and it made me go meta and focus on incremental development of the integration layer first. My interpretation of the idea of exocortex might be a bit different in that the focus is not so much about information per se, but more about systems that interact with the physical world. Here's a writeup in case anyone is interested in where it took me: https://github.com/zwizwa/erl_tools/blob/master/doc/exo.org
I don't mind merging new modules at the moment, so would be happy if you try it out and contribute!
But it's also possible to use your module without merging in the upstream HPI package [0]
In the long run, I'm not sure how sustainable it is, since there are many different data sources and ways of representing them.
This is also particularly difficult when you don't use the service; e.g. as a maintainer I wouldn't have any means of testing Trakt.tv.
So far I've kept all HPI modules close, because a monolith is easier for prototyping and refactoring. But my fear is the fate of oh-my-zsh or spacemacs, which got overwhelmed by pull requests.
Ideally, I think it should be a simple core, only containing the common utility functions, extraction helpers, error handling, caching, logging, that sort of thing; and make the rest third-party.
It's possible to achieve this in Python thanks to namespace packages. The only problem I see is managing these small individual packages and declaring dependencies between them. This is possible with pip and setuptools, but there is certain overhead involved; I feel like this step ought to be simpler, especially for people who don't want to dive fully into Python.
It's the Python plugin system that was spun out of pytest. I'm really impressed by it - it's a very clean design, and integrates great with Python packaging.
I'm using it extensively for Datasette, which means myself or others can add new features to the core software without needing to ask permission and in a way which supports trying out crackpot new ideas without sullying the design of the core software.
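For anyone curious, the basic shape of a pluggy-based plugin system is tiny (the project name and hook below are invented for illustration, not Datasette's actual hooks):

```python
import pluggy

# Markers are scoped by project name ("hpi_example" is made up here).
hookspec = pluggy.HookspecMarker("hpi_example")
hookimpl = pluggy.HookimplMarker("hpi_example")

class Spec:
    @hookspec
    def process(self, data):
        """Each registered plugin transforms the data and returns a result."""

class UppercasePlugin:
    @hookimpl
    def process(self, data):
        return data.upper()

pm = pluggy.PluginManager("hpi_example")
pm.add_hookspecs(Spec)
pm.register(UppercasePlugin())

results = pm.hook.process(data="hello")  # collects results from all plugins
```

Third-party packages can register implementations via setuptools entry points, which is how plugins get discovered without the core ever importing them directly.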
I've been thinking about using it for my Dogsheep personal analytics suite too, which is currently split up into a bunch of separate tools in separate repos (since the only unifying interface is that they all spit out SQLite databases).
Thanks, looks great, I'll check this out!
So far I haven't even done proper research of potential solutions, because I feel like it would be overengineering at this stage, I'm still figuring out the core.
When I decided to add plugins to Datasette I asked around and pluggy was pretty much the universal recommendation - with hindsight it's worked out really well so I'm happy to pass on the recommendation.
> since there are many different data sources and ways of representing them
I've been working on something quite similar for a while (in Rust).
I decided to normalize data to schema.org types where available and store JSON-LD documents in a key/value store. While the schema.org types are far from perfect, it at least morphs data into a standardized format.
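As a rough illustration of that normalisation step (shown here in Python rather than Rust; the field mapping is invented, not this project's actual code):

```python
def to_jsonld(tweet: dict) -> dict:
    """Map a raw tweet-like record onto a schema.org SocialMediaPosting
    expressed as JSON-LD (the source fields are made up for illustration)."""
    return {
        "@context": "https://schema.org",
        "@type": "SocialMediaPosting",
        "@id": f"twitter:{tweet['id']}",
        "datePublished": tweet["created_at"],
        "text": tweet["text"],
    }

doc = to_jsonld({"id": 42, "created_at": "2020-04-01", "text": "hi"})
```

Once every source is morphed into typed JSON-LD documents, a single query layer can work across all of them.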
Yep, most sources are org-mode; there are a few IPython notebooks too.
The Hakyll in the HTML sources is actually just an artifact left in my template from the times when I did use Hakyll. I should probably remove it.
I found it a bit overkill for my purposes and often overly restricting me, so I've switched to a Python script to generate everything: https://github.com/karlicoss/beepb00p/blob/master/src/build....
It's a bit ad-hoc, but ended up the same length as my old Hakyll code, and allows me to experiment much faster.
Reminds me of the Feltron report[1], in the quantified-self realm. The person did it for 10 years and stopped in 2014. The emphasis was on visualization.
I am putting as much as i can into my TODO+archive database (see https://github.com/andrey-utkin/taskdb/wiki/Live-demo#workou... ), and it is pretty neat already for analysis with querying and visualization. But your stuff is orders of magnitude bigger. Possibly I will set up HPI for myself some day.
Looks cool! HPI is more about accessing data, so it should be possible to combine the projects.
I was thinking of using Grafana too, and adding an automatic influxdb integration (via https://github.com/karlicoss/cachew, which I'm already using for sqlite), so let me know if you set up something! :)
This is incredible. I've been slowly mapping out a similar idea, and gaining the programming chops to build it, so I thoroughly appreciate all the work this has to have taken to make real.
Heres an older post I made with thoughts on this topic of centralizing data and what could be done with it: https://news.ycombinator.com/item?id=21673846
>HPI is a Python package (named my), a collection of modules for:
- social networks: posts, comments, favorites
- reading: e-books and pdfs
- annotations: highlights and comments
- todos and notes
- health data: sleep, exercise, weight, heart rate, and other body metrics
- location
- photos & videos
- browser history
- instant messaging
I've been studying the concept of "life management software" for decades so thanks for sharing your project and I enjoyed reading your thought process.
The concepts I always think of that the idealized software would manage:
+ timeline: [past] --> [present] --> [future]
+ The Big 2: time & money
+ reflection vs growth : mining the past data for patterns and metrics -- vs -- managing and prioritizing a future wishlist to deliberately design a future life
I've come to the conclusion that it's very hard to come up with a good schema that unifies all aspects of life that's important to me. For example, looking at your list above, it is heavily populated by digital artifacts (things that happened in the past). One of the exceptions in your list that is "future" oriented would be "todos" and possibly "notes".
I'm also interested in life "situational awareness".
E.g. the other aspect of life data is money which means financial budgeting/planning. Another aspect of life data is time so a digital calendar of future events -- and goals -- is essential. Yes, I keep a list of books I've read (the "quantified self") -- but also books I plan to read in the future (designing a future version of myself a.k.a personal "growth").
There's nothing wrong with your project. I'm just explaining my observations after seeing various attempts at this over the decades. This includes 1980s software like Borland Sidekick, 1990s PIMs (Personal Information Managers) like Lotus Agenda, late 1990s PDAs (Personal Digital Assistants) like the Palm Pilot, 2000s software like Evernote, hundreds of 2010s SaaS "todo/calendar/notes" websites, and uber-geek tools like Emacs Org-mode.
None of the above really do what I want so for now, I just split my digital life across various tools and files. My daily notes in "journal.txt". My financial planning (and bank & credit card data downloads) in "budget.xls". Saved webpages in ".mhtml" files. Planned programming projects in "projects.xls". Etc.
Yep, totally agree it's hard to incorporate the future, especially when it's free-form, like org-mode notes. Past data is much easier because it's at least somewhat structured and easy for the computer.
I guess my take on this is that subjective metrics, like the ones people use to define 'success', 'growth', etc., are orders of magnitude harder to grasp than objective metrics (e.g. hard data, like sleep or exercise logs). Yet we don't even have good means of incorporating the hard data in our lives, so I chose to start with the easier problem.
Yeah you're right, my future planning is all in my org-mode notes, I reflect on it with my brain, but so far not sure how I can use software (apart from organizer software) to aid me with it. It would certainly be an interesting area to explore.
Hey, thanks. I'm listing it as one of the links, and I tried using it, but I kind of struggled to see whether it could integrate with the existing infrastructure.
I wish to see some kind of encryption and token access for each module. From a quick glance at setup.py, it seems to rely on appdirs to store user data unencrypted?
appdirs is just a module to find the config directory in a portable way; the data can be stored anywhere you point the config to. But you're right that there is no encryption involved.
Note that it's meant to be running on your own computer, and it's using the filesystem to access the data, with no network interactions involved. Ultimately it's about how you're protecting the data on your disk and whether you trust yourself with it. Of course, my code has to be trusted too, but it's at least possible to run it in a sandbox/use Docker, etc.
The modules run untrusted Python code, so if you keep your token on disk, they could potentially steal the token too, unless you use some elaborate authentication system.
Is there a myDigitalLifeInAGlimpse function that would generate a full synthetic report, ready to be shared on social networks ?
If you can add some social scoring, that would be great too. Maybe if you have a way of certifying it as well, it would be a great indicator to put on resumes to show how much better you are than those other candidates.
What about the next step ? Have you considered just putting screen recording (+ body cam if you want to extend it to all life monitoring and not just digital) and store all the footage. Then it's just some video indexing. You can always post-process the data as new deep learning algorithms become available.
Hey, I've got a crude timeline of events (simply rendered as giant HTMLs), but it's private at the moment. I'll probably share some of it in the future, especially considering that some of the data (e.g. online comments or tweets) is public anyway.
Next step: yep, I thought about it a bit [0]; agree that it would be great as a service-agnostic way of accessing the data, the same way we can see it with our eyes.
But I figured I should start with strictly easier problems for now, ones that I can solve at least to some extent. For now my goal is simply enjoying using it, making the setup simpler (to make it accessible to more people), and making sure it's robust to changes, flexible enough, etc.
I want to inspire people to own their data. In many ways, I'm advocating an approach to owning it and working with it, rather than a specific set of tools, formats, etc. It would be great to have more people experimenting in the same area and trying to loosely integrate with existing efforts.
Optimizer: to some extent, yes! Not sure about 'general happiness', but I'd happily let the computer dictate what I eat, how I exercise, and how I sleep. Maintenance is probably one of the most annoying things in my life.
I don't like your project. Data aggregation is powerful but very dangerous and I feel that you are handling it carelessly. I also get a strong Borg vibe here.
In the current state, people don't know how to handle their data correctly. You are just making it even easier for some people to siphon other people's data, and to have that data played against them.
Most people are not quants, and becoming one is not trivial. Having people play unguided with their own data is like giving uranium to little kids: they will have great fun!
> Everything is local first, the input data is on your filesystem. If you're truly paranoid, you can even wrap it in a Docker container.
> There is still a question of whether you trust yourself at even keeping all the data on your disk, but it is out of the scope of this post.
> If you'd rather keep some code private too, it's also trivial to achieve with a private subpackage.
This seems as safe, if not safer, than the average service where a rogue employee can read/write to your data with little to no visibility.
> You are just making it even easier for some people to siphon other people's data.
The data is already being siphoned off of them -- usually through vague legal agreements and opaque security infrastructure they can't audit.
At least with this model, a user can see the whole scope of data a service is collecting and act accordingly: whether by limiting the data collection as needed or by switching to another service.
> making it even easier for some people to siphon other people's data
I guess you mean someone could easily steal your data, etc? I agree, it's a tradeoff, would be interesting to explore how to make it more secure.
> Having people play unguided with their own data is like giving uranium to little kids
Well, handling uranium is a bit different. Can you elaborate on some specific examples?
I've seen people misusing statistics, and drawing misleading conclusions, sure. But even more people are using anecdata and broscience, which I think is worse.
Not even stealing the data is needed. Just publicly sharing some data may leak information that can be used against you. Bots do build profiles and will happily ingest any information you give them.
If your top ten "most interesting Slate Star Codex" contains certain keywords, they will flag you in their database.
> Can you elaborate on some specific examples?
- People getting sports injuries from those sports performance training apps: knowing their data made them push themselves past their limits.
- Little kids getting bad marks and assuming they are just bad, so not persevering: knowing made things worse.
- Hypochondriacs becoming ill from the stress of all the illnesses they think they have.
- People doing weight loss badly, focusing too much on the scale.
- Musk tweets :)
- People censoring themselves because they know that holding an unpopular position will cost them karma points.
As soon as you make a system self-reflective, you impact it in profound ways: you generate reinforcing feedback loops, which tend either to break its stability, resulting in chaos, or to lock it into a specific behavior from which it's hard to escape.
Ah, I see what you mean. Yeah, absolutely, an aggregate system would enable such behaviours.
I guess it comes down to a personal level of paranoia, and to the old question "Is ignorance bliss?", to which people gave very different answers long before software.
It’s infantilizing to imply people are incapable of handling the responsibility of having access to the personal data they’re creating, as if it’s better to just trust it with private companies that leverage it to monetize us.
I'm sure achieving whatever your sufficient level of "quant" is isn't trivial (congratulations, presumably), but a command-line tool shared with the Hacker News community is fairly self-selecting.
I think this is a super interesting project, looking forward to seeing how this gets developed!