I record myself on audio 24x7 and use an AI to process the information (roberdam.com)
788 points by roberdam on Nov 15, 2022 | 400 comments



This is known as life logging with adjacency to sousveillance and it’s a fascinating topic.

https://en.m.wikipedia.org/wiki/Lifelog https://en.wikipedia.org/wiki/Sousveillance

We in general don’t want to be watched by others, but a managed record of our own activities can be extremely valuable, and even more so if you find yourself wrongly accused. Further, it can be used to shine a light on corrupt officials; one example of this is the nycplacards exposés on Twitter.


The trouble with any such footage is that it can be used against you ("as the defendant's own records show, they were present in the murder area") but it generally won't extricate you when produced by you, since you clearly have a motivation to use it selectively. So showing a picture of yourself reading a book during what you claim is the murder night is not an alibi, because it could have been produced at any other time. You will have a massive uphill battle in court to authenticate that image, and you risk sinking even further if you fail ("the defendant even prepared an alibi").

The only way I would accept a commercial product performing this always-on archiving is if:

1. It's encrypted by default with a strong key that can't be subpoenaed or circumvented.

2. The encoder generates its own random key upon installation which I don't know (recording effectively random, undecodable data), and I then have to manually change the key if I expect to ever read the recording.

Number 1 allows me to review the footage and release it only if it's in my interest, and number 2 affords me plausible deniability: if I don't release the key, I can claim I did not know you need to set it manually.

Sure, as long as you are the only nerd doing this, you don't need this complex setup, and you will probably get to use the unencrypted footage only in your favor. But when it becomes widely accepted as a social norm (say, everyone wearing Google glasses), you can expect law enforcement to become aware of it as a cheap source of self-incriminating evidence.


I'd be interested in knowing of any cases where someone who recorded their own activities used the recording as an alibi. Right now it's all theoretical. Dash cams are really strong evidence in traffic court, but this isn’t criminal so it has a lower bar. From the other end, body cam footage is powerful when worn by police, and cell phone footage from bystanders is also strong evidence.


I was falsely accused of some serious crimes and abuse.

It turns out, having security camera footage of the entire period of time showing that not only did you do nothing of the sort, but fundamentally could not have done anything of the sort, and were actually just playing with your kids the entire time, and the accuser had to have known so, changes the dynamic of such a case quite quickly.

Unfortunately, no one seems to take perjury seriously in California, at least when the accuser is a woman, despite it being a felony with penalties of 2, 3, or 4 years in prison, and despite airtight evidence of multiple repeated occurrences.


That works now but it won't be long until they can find someone to deepfake an alternative camera showing you red handed and then the court has two conflicting stories backed up by conflicting proofs.


Possibly, but chain of custody and credibility of claims matter. In my case it was all originally stored on a cloud security video storage system which I had emails going back years documenting existed, evidence the video came directly from it, and that it had been installed years before.

Also, the footage I provided was not showing anything unusual or hard to believe. I also had significant history available showing that this was normal and par for the course for me, not that anyone asked.

Their claims were quite extraordinary, and they had zero concrete evidence to back them up.

It still took a while for the case to resolve, and they tried to make all sorts of other claims, including that the footage was from cameras that were installed without anyone's knowledge and were illicitly recording, and was therefore illegal and inadmissible. So I dug up the email where the accuser had asked me for my permission if it was okay if they had it installed because it recorded audio too. That said, it was visibly, blatantly, and obviously installed in the open, and there were witnesses who could testify that had always been the case, so they had no leg to stand on anyway.

Even if I had the opportunity to fake it (which I could have, I guess, or at least edited it or something) it’s pretty hard to fake all the other circumstances which support the validity of it. They never tried to impeach the video itself.

If someone provides an unsourced security video showing the person murdering the president, and the other person shows security video from a confirmed third party using a system that had existed for years, showing they were at home watching TV at the time - and the president appears to still be alive at the time and unharmed - it’s not hard to figure out who is faking it.

I’m having a hard time imagining how they could have produced a video supporting their claims with the same level of even apparent credibility where it wouldn’t have fallen apart immediately with any investigation.

The courts already have to deal with people lying all the time and being disingenuous - it’s why the procedures exist and everything is so painful, IMO.

Knowing the rules and keeping documentation does work, generally.

I would recommend being careful to avoid situations which could be easily twisted or misinterpreted from the evidence however, especially after this situation. And never proactively provide data or show your hand to someone trying to attack you this way, as it can provide them more means to try to twist things and make life harder.

Good attorneys are key here.


Time is really critical. A lot of police investigative work is about stitching camera footage together in a timeline.

I served on a jury where a case was built around camera evidence immediately before and after an event. Of about a dozen relevant data sources, only one had verifiable, correct time. The defense was able to impeach that evidence, and the whole case collapsed. A dude got away with manslaughter.


"A dude got away with manslaughter." He got away with it only if he did it.

If the presented evidence failed to prove his guilt (beyond a reasonable doubt), then he didn't so much 'get away with it'; rather, sufficient proof of his guilt was not presented.

Sorry to put words in your mouth, but it sounds like you thought he did it? Which, isn't necessarily a bad thing, as there are balances in the system that prevent you from being the sole judge, jury and executioner regarding the situation.


Absolutely true in theory and I accept the outcome, although it bothered me for some time. It didn't meet the standard of proof; however the family successfully obtained a civil judgement later. But there is no doubt in my mind that the guy was guilty of hitting a fallen pedestrian, probably by accident, and driving off.

The thing that broke the case was a rhetorically talented defense attorney breaking down a (poor) expert witness who was inexperienced and unprepared. Certain traffic control devices use NTP to set their clock, and the expert wasn't able to articulate responses to the questions competently... so the video was admitted into evidence, but the metadata was not.

The loss of the time source placed events depicted on a series of cameras from different sources, all with incorrect clocks, into a 10-minute window (as opposed to a 2-minute window), which broke the case and led to a dismissal. The attorney did a good job by her client; I'd hire her in a second.


The body cam footage is a good example, it's deeply hated by the police and a frequent source of incriminatory evidence against the wearer.

Since "you got nothing to hide", as the old saying goes, why not bodycam yourself and offer the authorities a great source of evidence they can use against yourself?


I think that’s right, it’s a double edged sword. You would have to ask if you’re more likely to be wrongly accused or to be caught doing something wrong by your own recordings.

This guy has been recording himself publicly since 2002 after ending up on a no fly list. https://en.m.wikipedia.org/wiki/Hasan_M._Elahi

I think that’s taking it too far and would rather encrypt than publish it publicly, but doing it publicly does strengthen the alibi


You're missing an option: being wrongly accused on the basis of your own recordings. Imagine this real-life situation occurred, but with your own recordings instead of surveillance cameras:

> A key piece of evidence in the case is video surveillance footage showing Williams’ car stopped on the 6300 block of South Stony Island Avenue at 11:46 p.m.—the time and location where police say they know Herring was shot.

> How did they know that’s where the shooting happened? Police said ShotSpotter, a surveillance system that uses hidden microphone sensors to detect the sound and location of gunshots, generated an alert for that time and place.

(The defense argued ShotSpotter makes up data and hides behind opaque AI they refuse to rigorously test. Instead of responding, the prosecution dropped the case.)

https://www.vice.com/en/article/qj8xbq/police-are-telling-sh...


If I follow, the police would need access to the recordings to make the case, which would mean at least probable cause for a warrant. If Herring had a camera or mic on him running at the time of the shooting wouldn’t that contradict shotspotter? It seems more likely any data you have would create doubt rather than bolster the police case.

In general, the shotspotter and surveillance cameras already exist, so what do you have to counteract that? Doing things like leaving no paper trail because you pay everything in cash, or no location data because you keep your phone in airplane mode, leaves little crumbs for your defense, and may create the appearance of hiding something.


> If I follow, the police would need access to the recordings to make the case, which would mean at least probable cause for a warrant.

PC is a very low bar.

> In general, the shotspotter and surveillance cameras already exist, so what do you have to counteract that

It's not binary. Lots of areas, including the inside of your house, probably aren't covered by surveillance cameras.


Having done so, it completely changes the dynamic.

Especially when you know the criminal code, and can ask questions such as ‘officer, I’m pretty sure they are currently committing felony <blah blah> against me. I don’t want them to go to jail, but I do want them to stop committing felonies against me.’

All of a sudden, it goes from ‘nothing we can do’ to action.


Bodycams are actually an example of a reform successfully co-opted by police bureaucracy.

https://www.mercurynews.com/2021/05/16/police-pr-video-machi...


American jurors have convicted people based on a man's interpretation of a dog signalling that a dead body was on someone's property 5 years ago. Once people are that gullible, they are beyond help.

https://www.science.org/content/article/should-dog-s-sniff-b...


"A jury of your peers consists of 12 people who were not smart enough to get out of jury duty."


Keep in mind that they're in an artificial environment designed to lead them to that decision. One of the judge's jobs is to ensure experts are appropriately qualified. Another of the judge's jobs is to restrict what the jury is allowed to hear.


Why not just have the device occasionally send sha256 sums of chunks to a third party service, like Twitter tweets, where it's clear you can't forge the date of the message? If you need to produce some chunks, the matching hashes provide an independent time stamp showing you didn't just produce the content at any time. This sort of trick is already commonly done to demonstrate prior knowledge of something at a later point in time without having to reveal it just yet (if ever).
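A minimal sketch of the idea above, assuming the recording is split into fixed-size chunks and only the hex digests are published. The function names and the 1 MiB chunk size are illustrative choices, not anything from the thread, and the step of posting the digests to a third party is left out.

```python
import hashlib

def chunk_digests(data: bytes, chunk_size: int = 1024 * 1024) -> list[str]:
    """Return the SHA-256 hex digest of each fixed-size chunk of `data`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def matches_published(chunk: bytes, published_digest: str) -> bool:
    """Check a later-revealed chunk against a digest that was published earlier."""
    return hashlib.sha256(chunk).hexdigest() == published_digest.lower()
```

A matching digest only shows that the chunk existed no later than the timestamp of the public post; it says nothing about how long before that it was recorded.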


Great idea, but are courts tech savvy enough to accept this already?

Even with a tech expert to explain it, I worry the opposition would just get their own expert and make a whole mess of it, confusing both the judge and jury enough to cast doubt.

Perhaps I have a very wrong view on how both such evidence is presented and accepted though.


Courts are generally more tech savvy than techies like to give them credit for. But it's worth mentioning that in recent years several US states have already passed legislation expressly forbidding courts from denying such evidence (even/especially if put on Blockchains rather than a more traditional non-decentralized network/database), and you can find similar stuff in other countries around the world (even China). And of course in lower courts (like small claims) or even mediation the standards are a lot looser, there's not even a jury.

If you want a more thorough view of the rules in the US (which states deviate from to some extent), you might like to browse https://www.law.cornell.edu/rules/fre


You could queue it up to send chunks of reassuring alibi video (or checksums of same) while you are doing your murders or crimes.

Put another way, the possibility that you could have done that renders the timestamping useless.


f4d4cb1c06b690f5d6ee5d9012f743c330511effdb24e55b04e602f87d38d89e

I could have produced this hash at any point in the past, but after HN's edit window expires, one can be very certain I did not originally produce it at any point after the timestamp. This is far from useless, even if it doesn't cover all possible objections. The objection in the original comment that it does cover is: you just made up this footage/other evidence post-hoc to establish an alibi now that you've been accused, as people frequently do, or you can't prove you didn't make it up post-hoc so we want to get it tossed out as non-reliable.


I had an awful breakup with a woman with tortuous tendencies and a false sense of how to abuse personal injury law and she keeps google home devices and car devices recording practically every moment of her life. as well as apple watch broadcasting her location at all times.

I observed she would always accuse me of things i never did in front of these cameras to get me to give false confessions. thankfully i am as blissfully honest to all people i meet, sometimes to my own detriment, so i never admit to fabricated stories.

given that context, can you point me to judicial precedents where plaintiffs had their self provided footage weakened due to the idea of fabricating false narratives with devices, selectively narrowing contexts etc.


> can you point me to judicial precedents where plaintiffs had their self provided footage weakened due to the idea of fabricating false narratives with devices, selectively narrowing contexts etc.

I was part of an organization that had communal space we wanted to install a camera in because one of our members was leaving a mess and we wanted to know who.

I’m not a lawyer, nor a judge, nor an expert but I’ve hired a lawyer about the above cameras before installation.

First of all, there’s no “selectively narrow contexts”. If you can submit footage into evidence, the courts incl the other lawyers can request different moments in time if you have it (eg for surveillance cams).

Our lawyer advised us to never start/stop the camera, never delete footage or generally mess with it in any way without a paper trail of why. Any footage should auto delete after a period of time, and not be manually done ever. Basically any manual manipulation of the footage could be suspicious if a crime were to occur where people might suspect the footage to have captured it.

I assume if our lawyer warned us that much, there’s probably a case history of trying to plant a narrative with footage.


> Further it can be used to shine a light on corrupt officials

Little Brother surveillance. It would be nice not to be surveilled at all, but since that's not an option the answer to "quis custodiet ipsos custodes?" is us.


Little Brother is one of Cory Doctorow's books: https://craphound.com/littlebrother/about/

I read it and it was as fun as it was chilling.


In times past, it was obvious that it wasn't an option to avoid pervasive violence (by orders of magnitude compared to today).

It was, though.


Also it was obvious that it wasn't an option to avoid pervasive slavery.

Or a class system where lords and kings have more right than you do, although we are kinda bringing that back.


As a kid in the 90's in a city of 150,000 people it was stupidly easy to do most of an entire day's adventures anonymously and with either very little or entirely no record of my presence and activities. Yes, there were some cameras here and there, but you knew where they were and could avoid them. Cash was still accepted everywhere, you didn't have a cell phone and definitely didn't need a cell phone to go about your daily activities[0].

I definitely was able to walk/bike/bus/carpool wherever I wanted all day long as young as 6 years old and no one gave a shit except my parents. If I was lost (which happened often!) I'd just ask an adult to help me find my parents. Adults generally interacted with me either only in a professional context (as a clerk/ice cream truck salesman/etc) or if I went up to them and explicitly let them know I wanted their help.

Not all my friends had the same level of freedom but most had enough to go to any of the parks in the nearby neighborhoods and play with friends until their individual family's established "curfew".

If kids stayed out past their curfew, one parent would call each of the other parents and the other parent would drive around the area parks looking for their kid. Another kid or parent would generally be able to point them in the right direction and clear it up within 30 minutes.

Occasionally my parents might forget to pick me up from sports practice and I'd sit outside the ice rink or school for 2-4 hours until they figured it out, usually in cases where I spent all my payphone quarters on food or arcade games.

0: https://en.wikipedia.org/wiki/The_Scoots


Who watches the watchmen? The watched.


How many Watchmans would a watchman watch if watchmen could watch Watchmans?


None, because justice is blind ;)


Good thing the executive and legislative branches are not blind.

They are keenly observant of the benefits coming their way.


It's the same show no matter how many times the watchman watches it, so one.


I was thinking of the small television that Sony produced: https://en.wikipedia.org/wiki/Sony_Watchman


a managed record of our own activities can be extremely valuable

I've thought of this as a hardware product: A device that records your own voice and non vocal sounds, but which does not record the words of others. (That, plus maybe location and a video stream, provided one is in a location without "a reasonable expectation of privacy.")

Perhaps it doesn't even have to be hardware at this point! Maybe this could be installed as an app on an older smartphone?


a throat mic would do the trick


Except, if it still faintly picks up the speech of others, then that's a violation of the law, and it opens one up for lawsuits.


Since everyone is interested in the hardware:

https://www.aliexpress.us/item/3256803349510543.html

https://www.aliexpress.us/item/3256803085687061.html

one was chosen for the battery and the other for the size. Both are generic and come with the same software and bios, from several vendors. If I could buy something better, I would look for one that can take a lavalier microphone


I wanted to do this exact project - record audio all day and then have AI process it - to identify behavioral outbursts of my autistic toddler.

It's critical information for early diagnosis and treatment, but it's really hard to capture the data while also dealing with the actual situation. Being able to send the sounds he makes to his therapist could also be useful when they are trying to get him to mimic sounds and talk.

With that said, is the audio AI open sourced? The part that analyzes the audio stream?

Thanks for the links to the hardware, also a really important part!


I would guess that they're using OpenAI's Whisper, which is open source: https://github.com/openai/whisper

It does speech-to-text, then you can use the full force of all the text analysis tools that are out there.


I've thought about this a lot.

My 8 y.o is Autistic and when he was little, I was struggling to catch evidence to provide to Speech and Language Therapy. I wanted a way to always record and have an easy way to pull out the key points.

Now I would love to correlate background noise (level and context) with meltdowns. We know babies crying set him off, as that's obvious, but would love to analyse further to spot other trends.


this one will suit you well, even has a magnetic back so you can attach it to something, https://www.aliexpress.us/item/1005003535825295.html


that's a fantastic use case! The easiest way (and the one I'm currently using) is to upload the audio manually to:

https://replicate.com/openai/whisper


Thank you for the links and for the article. How long can the smaller one record? Actually if it can record for a day, it'd be enough for me.

I used to record all phone calls, until EU made Xiaomi remove the feature. It was very useful because I always could take notes later if they sent me a number, contact name or appointment hour.


At 128 kbps the MP3 takes about 56 MB per hour and I got the 16 GB version, so you have a lot of time. The battery of the smaller one is, I read, 800 mAh; according to the docs it should last around 2hrs, but I try to recharge it as soon as I can


Thank you, hmmm... you wrote 2hrs, I guess it's a typo. In the page it's 20 and more than enough for my use case. And even the 4GB is overkill if you make a daily dump.


my mistake, according to the description "Working Time: About 7hours on one Charge Can store up to 96 hours of audio", I haven't let the battery run out yet


The Ali Express link says "Continuous recording:20hours". But since they offer sizes from 4GB to 32GB it's unclear which storage size that's for. That 20 hours could also be how long the battery will last while recording. But 20 hours is still enough to last the day and then some.


It's probably the battery life. 128 kb/s AAC is effectively transparent even for music, and only translates to 1.1GB. Even if it's uncompressed (1 channel 16 bit 44100hz PCM), 20 hours only translates to 6.35 GB.
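The two figures above can be checked with a few lines of arithmetic. This is only a sketch: it assumes constant-bitrate audio, decimal units (1 kb = 1000 bits, 1 GB = 10^9 bytes), and no container overhead, and the helper name is made up for the example.

```python
def audio_size_bytes(bitrate_bits_per_s: float, hours: float) -> float:
    """Size of a constant-bitrate stream: bits per second * seconds / 8."""
    return bitrate_bits_per_s * hours * 3600 / 8

# 128 kb/s compressed audio, taking 1 kb = 1000 bits
compressed = audio_size_bytes(128_000, 20)

# Uncompressed mono PCM: 44,100 samples per second * 16 bits per sample
pcm = audio_size_bytes(44_100 * 16, 20)

print(f"{compressed / 1e9:.2f} GB")  # 1.15 GB
print(f"{pcm / 1e9:.2f} GB")         # 6.35 GB
```

Either way, 20 hours of audio fits comfortably on the larger cards, which supports reading the 20 hour figure as battery life.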


Super cool project! How do you carry the microphones on your person? The big one looks like it wouldn’t clip to a shirt very easily. Does it pick up your voice from your pocket?


thanks! I tried it in my shirt pocket, but now I have it hanging from my neck on a badge lanyard, as close to my mouth as I can


Yeah but I want the software! Will you open source it? I'd contribute!


I'm doing it simple for now, transcribe it by uploading the files to colab or replicate.com, then using regex to extract the commands, the panel is in rails but nothing fancy so far.

As I clarify in the article: This is a “proof of concept” and not yet ready for production, everything described here works but probably “glued with tape”, several of the processes are probably not automated or polished.
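For readers wondering what "using regex to extract the commands" could look like, here is a rough sketch. Everything specific in it is an assumption made for illustration: the wake word "computer", the command names, and the rule that a command runs to the end of the sentence are hypothetical and not taken from the article.

```python
import re

# Hypothetical convention: a spoken command starts with the wake word "computer",
# followed by a command name and free text up to the end of the sentence.
COMMAND_RE = re.compile(
    r"\bcomputer[,:]?\s+(?P<name>reminder|note|todo)\b[,:]?\s*(?P<text>[^.?!]*)",
    re.IGNORECASE,
)

def extract_commands(transcript: str) -> list[tuple[str, str]]:
    """Return (command name, text) pairs found in a transcript."""
    return [
        (m.group("name").lower(), m.group("text").strip())
        for m in COMMAND_RE.finditer(transcript)
    ]
```

A pattern like this depends on the transcription producing sentence punctuation and spelling the wake word consistently, so it would likely need tuning against real transcripts.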


Not as hardcore as OP, but after Whisper came out, I quickly built an app that allows me to record from lock screen: https://whispermemos.com/


This app apparently sends data to their servers. If you don't want to share this information, you can use an app like Lockflow (https://apps.apple.com/us/app/lockflow-lock-screen-shortcuts...) to put an Apple Shortcut on your home screen.

That Apple Shortcut could be the Dictate Text action hooked to create/append to an Apple Note (thereby not leaving your device) or fire off an email or send a message via your favorite bot service (Discord/Telegram/Slack/etc).

Bonus: That Shortcut will also work on your Mac.

There's also the minimal friction app Just Press Record (https://apps.apple.com/us/app/just-press-record/id1033342465), which will transcribe and has a decent Shortcut library.


Yeah I tried JPR before but missed the workflow of sending it to my email. (Maybe they have it and I didn’t notice)

Also Whisper is better for my Slovakian accent.


That looks like it's iOS only; I've been using a similar app on Android, Voiceliner, but it doesn't yet also record from the lock screen. That would definitely make it more useful!


This is a cool project. One of my pet ideas that I haven't done is to build a home assistant where all data is stored and processed by a home "server". The biggest benefit I see is that it could truly be omnipresent. There in the background, answering questions, jumping into your conversations without prompt. And it's much less creepy if all that data isn't going to someone else's computer.

Also piping in and processing the data from my mobile would be cool, but I wouldn't want to invade other people's privacy if I'm in public.


I've said this before: a physical VPN and an open source cloud are exactly what I'm trying to make a reality, with VMs to the TVs. Not to mention we all need that Hillary Clinton privilege. As well, when you pull in at home your car uploads, updates, and charges (manual plug for now); tie that in to the OBD reader software and that alone could be invaluable. Also, having two devices is the best solution: one staying in the car gives me an internet connection to my car for remote access and does my maps or music, and the other does my HUD.


It does not sound like a realistic capacity plan. The reason this works in the cloud is the inference can be run in parallel on a huge amount of hardware for a short time. To run those kind of models on your rinkydink computer would take forever.


An Nvidia 3090 GPU can run OpenAI's Whisper at 17x realtime[0]. They're not exactly cheap (~$500?), but they're cheap enough that running the transcription end at home is quite feasible. And, it includes translation, so you don't have to do it in English.

Searching all of a downloaded copy of Wikipedia wouldn't be that computationally expensive either if the assistant has hot words it picks up to look up.

[0] https://news.ycombinator.com/item?id=32928207
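As a back-of-the-envelope check, taking the 17x realtime figure above at face value and ignoring model loading, silence, and file handling, a full day of audio works out as follows. The function name is made up for the example.

```python
def transcription_hours(audio_hours: float, realtime_factor: float) -> float:
    """Hours of compute needed to transcribe `audio_hours` of audio."""
    return audio_hours / realtime_factor

per_day = transcription_hours(24, 17)
print(f"{per_day * 60:.0f} minutes")  # 85 minutes
```

So a round-the-clock recording would need roughly an hour and a half of GPU time per day under that assumption.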


It's also possible to use local AI processing chips like Coral or Gyrfalcon for this.

Could just load up a pcie card full of them if necessary. A local home AI would be such a boon to people, not just the average person but the elderly as well, combined with a refined GPT etc it could conversationally respond to requests rather than most assistants' current request->response "I am a robot" scheme.

> Your son called when you were asleep to ask if you wanted to get coffee today, shall I call him back for you or put you through to him?

> X, you've fallen! Please let me know you're okay or I will call emergency services for you

It's sad that we have the technology to do this already but haven't.


Respectfully, I don't think that's true. "The Cloud" is just computers in a warehouse somewhere

$5/month's worth of "cloud" is going to work out to be less actual raw CPU resources than a low end raspberry pi running full time in-house


I don't actually think that's true.

One second of Google cloud TPU has roughly the same number of floating point operations as 4 hours of Raspberry Pi 4B time.

So 3 minutes of cloud TPU time already covers your whole month of raspberry pi usage. Pretty sure it costs them less than 5$ as well, since they have the hardware anyways.


"The cloud" is also massively parallel software. If I run a Google search, many thousands of CPUs will be brought to bear on my query, and a gazillion DIMMs, and all the throughput of a hell of a lot of SSDs, and so on. If you just happened to have a copy of the web, and an index of it, on "a computer" no matter how big, it would be impossible to get prompt answers.

If Google (or whomever) needs to run voice models, they take your query and all the other queries that arrive in the same millisecond, smoosh them all together and shove the batch into a TPU and run it. You don't have any TPUs and you also don't have any traffic you can use to amortize the cost of your infrequent queries.

The idea that you could run these kinds of ML inference tasks is economically fanciful. You would need a huge investment in hardware and the opex would be ridiculous.


> The idea that you could run these kinds of ML inference tasks is economically fanciful. You would need a huge investment in hardware and the opex would be ridiculous.

Google, Apple, Amazon and even Sonos are all releasing voice assistants that work locally on their relatively low powered speakers.

Apple seems to be ahead with what is local, while Google seems to be the smartest. (Sonos doesn’t have a cloud, but it’s not ‘general purpose’ afaik).

Sure, you can’t amortize them across a bunch of TPUs, BUT instead they can ship custom hardware. A TPU needs to be big and support parallel streams. A home server may only ever need to serve one stream. There are Arduino-style devices that can run basic TensorFlow audio models in real time now. And obviously most phones can do this locally now, so depending on opinion that may be considered affordable.


I don’t think a $5 instance is enough for ML/AI workloads. You need something with a GPU.


this is also one of my pet ideas, but I keep procrastinating. Has your idea turned into any kind of repo that we can contribute to?


> One of my pet ideas that *I haven't done*

I suspect OP was clear enough.

But there exists https://mycroft.ai/

https://github.com/MycroftAI


MORE INFO ON THE DEVICES:

https://www.aliexpress.us/item/3256803349510543.html

https://www.aliexpress.us/item/3256803085687061.html

both recorders are using the same generic bios, you have a .txt file called FACTORY.TXT, by changing the values of the file you configure the device, this is the content of the file.

---------------

TYP:1 (0:WAV 1:MP3)

VOR:0 (0:voice-activated off 1-7:voice-activated sensitivity,higher means record less)

BIT RATE:2 (0:32Kbit 1:64Kbit 2:128Kbit 3:192Kbit 4:Translate ON 5:512Kbit 6:768Kbit 7:1024Kbit 8:1536Kbit 9:3072Kbit)

GAIN:5 (0-7 record sensitivity 8 grades)

SECTION:(30) (1-999 record time exceed this,file will auto save,uint minutes)

DATE:2022-10-15 (year-month-day)

TIME:08:36:24 (hour:minute:second)

TIMER:1 (timer record 1:on 0:off)

START:08:39:32 (timer record start time)

TIMELONG:(120) (1-720,timer record length,uint is minute)

CYCLE:(030) (1-999,how many dyas,0:everyday)

--------------------------

I got the 32gb version of the bigger one and the 16gb version of the smaller one.

I configure the device to save a file every 30 minutes. Each 30-minute MP3 file takes 28,125 KB (about 28 MB), so around 56 MB per hour at 128 kbps.
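If you want to read or check the settings from a script, the FACTORY.TXT listing above looks regular enough to parse with one pattern. This is a sketch based only on the lines shown in this comment; the function name is made up, the sample is abridged, and the real file may have quirks (encoding, line endings) that are not visible here.

```python
import re

# Key, then ":", then a value that may be wrapped in parentheses, e.g.
# "TYP:1 (0:WAV 1:MP3)" or "SECTION:(30) (1-999 ...)".
LINE_RE = re.compile(r"^(?P<key>[A-Z ]+):\(?(?P<value>[^\s()]+)\)?")

def parse_factory_txt(text: str) -> dict[str, str]:
    """Map each setting name to its raw string value, ignoring the help text."""
    settings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            settings[match.group("key").strip()] = match.group("value")
    return settings

# Abridged sample taken from the listing above
sample = """TYP:1 (0:WAV 1:MP3)
BIT RATE:2 (0:32Kbit 1:64Kbit 2:128Kbit 3:192Kbit)
SECTION:(30) (1-999 record time exceed this,file will auto save,uint minutes)
TIME:08:36:24 (hour:minute:second)"""

print(parse_factory_txt(sample))
```

Values are kept as strings on purpose, since the listing mixes plain numbers, dates, and times.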


Thanks for the post! I find that other voice assistants (eg. Siri) are not particularly able to detect the activation command when there's any background sound (like music with lyrics). How does your system perform against this?

I understand that you're doing batch processing asynchronously and so any immediate task isn't affected; but it's arguably even more of a problem where you record a task, put it out of your mind, but then the AI fails to detect the command because it got confused by the background?

[EDIT] I see you've sort of responded to this already at this comment: https://news.ycombinator.com/item?id=33612155


How do you get the files off the device? Do you have to manually take out the SD card, put it into your computer, and copy the files over, every single day? I'd never be able to keep up a habit like that consistently, so I'm wondering if you found a more convenient way to transfer the data.


Both recorders work as USB drives. You upload all your files once a day, but it's just drag & drop.


Interesting work, glad to see I am not the only crazy one left in the life logging scene after all these years. Have been lifelogging since 2004-ish, and built a few custom bits of software and hardware to support it. I don't record 24x7 anymore, but I used to. Now my recordings are limited mostly to my office environment, and when I am out and about using a Sensecam-like device with custom firmware. When in my office I capture video, audio and depth data from multiple view points, along with images of the desktop of whatever computer I am on, and process most of it on a Jetson.

How's the audio quality on those devices you link to in other comments? I find I pick up a lot of ambient noise when outside of the office, and always struggled to come up with a viable algorithm and model to differentiate "background chatter" from the main conversation, and it is a problem I've never really managed to solve so I am interested in your experiences on the subject.


> Have been lifelogging since 2004-ish

Hopefully new advances in AI will let you try new things with your old recordings

> How's the audio quality on those devices you link to in other comments?

Decent. Quality depends directly on the distance between the microphone and the mouth, but you can't expect too much from $30 devices.

>and always struggled to come up with a viable algorithm and model to differentiate "background chatter" from the main conversation

Yes, that's a big problem to solve, you can try Pyannote's Diarization https://lablab.ai/t/whisper-transcription-and-speaker-identi...

That will be the next step for the experiment.


Do you mind sharing your experience, why you started, what you want to get out of this etc? I'm interested to read your experience.


Have you seen the show "My Strange Addiction"?



Yep that one.


Is any specific episode relevant?


Not interested.


This is really interesting, and many of the comments here go into the utility of this. However, verify that you aren't recording somebody else without their consent; in many places it is illegal to record conversations without the other party's consent.

I only ran across this problem years ago when, due to a serious potential workplace issue I suggested somebody basically "wear a wire" and record their workday to catch some HR problems. We found out that the state this was occurring in had a two-party consent law and violating it was not a great idea.


This "be careful of recording" line is harmful, because folks tend to read it and assume that it is universal - there are places where it is unilaterally allowed if you have good reason to suspect not having things recorded will result in your rights being abused. Then there are places that just don't care if you do.


I don't know of any jurisdiction where it is illegal to record someone else.

What is usually illegal is broadcasting or making that record available to someone else.


There are many examples, for example in the state of Pennsylvania, all parties must consent. While it is a felony to record and then playback without consent (you are correct in this case), it is a punishable misdemeanor to violate privacy (broadcasting is not required) with a fine of up to $5000 in the first violation and 2 years of prison and/or up to a $5000 fine after the first violation.


Does this mean it's illegal to have an Alexa in your cubicle?


No, because the terms of service and privacy policy say that you accept being recorded if you use Alexa.


I think it does, but I believe that we are still waiting for someone to test that theory in court.


Expanding on the structure the OP created, this is how I see us getting to human level AI:

1. Record video sound etc... (trajectories) egocentrically

2. Analyze the data and assign reward labels (more/good, less/bad) to state and transitions actions

3. Use the reward feedback and trajectories to build the policy for some set of actions in certain environments

This is why I'm bullish on anything sousveillance - so AR cameras on your head, always on mics etc...

The challenge is doing this democratically, without it being intermediated by a giant for-profit mega corp that doesn't care about you and wants to mess with your head


Honest question, how does this make the lives of humans better?


Well, for example, let's say that I have a goal BMI I want to maintain.

If I reach for the Oreos, I can choose to have a flag set with a heuristic I created myself that will tell me:

"Having 5 oreos means you need to reduce other calorie intake by n calories to maintain your BMI"

That data can also be aggregated to give me my macro/micros for everything I've eaten etc... without me having to log it like I do now

Think about it as the ultimate personal assistant and all you need to do to instrument it is attaching a camera and mic to your face. You can decide what your goals are, and this kind of instrumentation will capture the data that you need without you having to actually annotate everything.

Your personal life API


> all you need to do to instrument it is attaching a camera and mic to your face

It's funny that this is a reasonable thing to say.


Likewise, if we go back to 1995, and tell a tech-fearing farmer that within 20 years he and all his salt-of-the-earth colleagues would soon be voluntarily (and gladly!) carrying in their back pocket (in the form of a cell phone) a small cheap generic device connected 24/7 to global corporate networks, with built-in high def cameras, microphones, location detectors, and data gatherers, and would casually store much of their personal and financial information within them.

They would find that notion preposterous. But now, some short years later, they would give it barely a thought.


It really is.

I've been in CV since 2009 and it's face melting how many things we thought were impossible are effectively "solved."


I've been thinking about doing this for a while now, cameras all over the house hooked up to ML algorithms that help you audit and tweak your behavior towards some specified goal.

When I used to play video games ultra competitively, I would analyze recordings of my gameplay to try and get better, and it worked wonders.


I've honestly thought about recording my work sessions to do the same thing. RescueTime works _somewhat_ to track when I get distracted on something, but moreso I'm interested in identifying when I do something suboptimally and playing it back to identify why I went with that path and try to course-correct the next time around.


Recording the desktop 24/7 and making those videos searchable could be an incredible tool as well. Text on screen as well as audio from meetings. If you didn't document how you initially configured something, you could just go back and watch what you did.

Edit: Another comment has informed me of rewind.ai, which does this on Mac, interesting!


I think the oreo thing is probably doable now with things like habitaware. It’s most likely a very easily distinguishable motor pattern. Not sure they could be programmed to give you an oreo-specific-reminder, but that’s a design gap more than it is a technological one.


I'd just get really annoyed at that AI

...being annoyed increases stress, which increases appetite


Ok, then have it do something else?

The point here is, you could have it do anything you choose

...or just don't altogether, in which case, why comment?


Sounds like a nightmare!


Perhaps the BMI/Food tracking example isn't one that resonates with you

Can you explain a bit more about what part of having a non-intermediated "personal API" (or whatever you'd call it) is nightmarish?


Another challenge is industry-wide bad security fundamentals.

Godspeed to work/people like agoric.com and seL4.


All of the rooms/corridors in my house except my bathrooms are covered by cameras. My initial motivation for installing them was to keep an eye on what my pets were doing when I'm not around, but I find in recent years that if I misplace something, I end up tracing back my history on the cameras and finding where I left it.

It seems obvious that at some point, AI will be able to do that for me and I'll just be able to say "Alexa, where did I leave my glasses?", "Hey Google, where did I put my box of spare fuses?".


I would 100% prefer to lose my keys rather than letting Amazon or Google in.

FYI: I have zero Alexa/Siri enabled device, zero automated home device, a degoogled phone, etc etc. So we might have different perspectives on the matter.


Each to their own. Personally the value these cloud/AI assistants give me is worth the loss of privacy. There's nothing I do that I think anyone would be especially interested in spying on, other than to try and sell me things.

Note that I don't think anyone should be forced into this sort of surveillance. It should always be a choice. I also support the open source projects to bring it back to individual control - it's just too much hassle for me, personally.


There is no reason this technology needs to rely on consumers sacrificing privacy. The big players are trying to create that perception in the public so consumers will willingly sacrifice their privacy regardless.

The tech is there so someone could make a box with no external data transferred that could store and analyze video data. I would be a customer for sure for something that had this capability without the privacy concerns.

Google and Amazon say they want this data for quality control, but I suspect each of them have plans (if not active projects) for converting video inside people's homes into actionable marketing data.


> but I suspect each of them have plans (if not active projects) for converting video inside people's homes into actionable marketing data.

I suspect not. Besides the fact that it’s a whole new level of creepy and that alone is a PR mess, I doubt it’s that useful. Sure, a camera in your home sounds perfectly useful for marketing, but whose camera is positioned like that? Mine is aimed at the entryway door. The best you can get from that is presence. I suspect that’s true for most people’s homes.

Beyond the question of actual data quality, data processing would have to be very expensive. You couldn’t run those models locally (because the object detection would be too complex and changing) so you’d need to stream to cloud. That would instantly be the largest and most expensive streaming platform ever, dwarfing YouTube or Netflix or anything. Not to mention the actual ML components of it.

I suspect smarthome companies don’t want the data and begrudgingly accept that some cloud is needed because people are notoriously bad at protecting backups (and remote monitoring is a convenient feature).

I question if the incremental increase in marketing revenue would exceed the technical costs.


> the value these cloud/AI assistants give me is worth the loss of privacy

they've got you right where they want you.


He seems to have them right where he wants them, too. Mutual transaction. Everybody's happy.


The consumer's benefit is measured in comfort or social status, mostly.

The producer's benefit is measured in dollars.

However you balance it, the producer wins. By many orders of magnitude.

And since we're talking about privacy and personal data, the more consumers there are, the more the producers improve their margin on each and all consumers.


Why is it a competition? If you give me a slice of cake, and I give you $5, and we're both happy with that, why should I care if you're somehow "winning" or I'm "losing"? That mindset seems like a self-fulfilling prophecy that robs me of my satisfaction.

Also, I would argue that receiving money does not mean the producer wins, since ultimately the producer is also a consumer, and will thus be spending those dollars on the same things as every other consumer… comfort and status, as you put it.


Vague snipes like this are generally not allowed on HN, FYI. (Source: I've done it myself too many times)


you're falsely characterizing my observation as a vague snipe.


Just trying to help.


Just because the data isn't interesting to anyone right now doesn't mean that a future oppressive government won't use it against you


> There's nothing I do that I think anyone would be especially interested in spying on, other than to try and sell me things.

Do Uyghurs have something to hide and are worth spying on? How many times are we going to hear this argument? It comes only from a position of privilege. You're only uninteresting to be spied on as long as it's allowed by the security apparatus you depend upon. There's a reason we have sayings like "power corrupts"; dismissing the potential for abuse of a cloud-based unencrypted surveillance system is narrow-mindedness at best and subversion at worst.

Note: the above hardly represents me politically, it is just a counterargument against the perennially repeated "I have nothing to hide."


I'm aware of all those arguments and I completely agree with them in principle, but I genuinely would be SO far down the oppression list.

It's definitely a privilege to be the majority ethnicity and sexuality in a modern western liberal democracy, but it is what it is. The chances of the British government suddenly turning against white straight apolitical irreligious men are just so low it's not something I worry about.

What I worry about more are things like people breaking into my house, my dog chewing up the carpet and forgetting where I left my glasses.

I do hope that we can figure out a way to package all the privacy violating cloud-based services in a way that's simple to use, encrypted, local only, etc. though so perhaps more subversive people can enjoy these systems without worrying about oppression.

To be quite honest, the most privacy sensitive things in my life are probably my emails and documents, but those are all already in Google Drive and Gmail anyway, along with basically everyone else's. All anyone will get from my cameras is a stream of me feeding my rabbits, browsing tiktok and scratching my arse. GCHQ are welcome to tune in any day, provided they also help me pick out my clothes in the morning.


i mean you could always run a home server for the automated home things. heating/ac and lights are nice things to automate


Because spending tens of thousands of dollars in home infrastructure to avoid fiddling with the thermostat four times a year definitely makes sense.


My heat is controlled and automated with open source software for the grand total of about fifty bucks and a free surplus server.


So DIY solutions exist that raise the cost to an education in computer programming and $50 in hardware...to avoid touching a thermostat four times a year. That's neat.


I don’t know how that can be true. Can you tell us more?


It's incredibly easy to do (caveat - at least if you're familiar with software dev already).

Most thermostats are literally just digital thermometers that control a relay that turns the furnace/ac on and off.

A simple arduino (or much cheaper IC) can easily do the same thing if you wire it in.

And then on the software side... there's several large, open-source projects that exist in this space and provide nice api tooling for interacting with those devices. Things like:

OpenHab: https://www.openhab.org/

HomeAssistant: https://www.home-assistant.io/

HomeBridge: https://homebridge.io/

etc...

Even Alexa has basically drop-in self-hosted alternatives like Mycroft: https://mycroft.ai/ or ADA/Almond (now Genie) https://genie.stanford.edu/

It's not only true - I strongly suspect you can do it for much less than 50 bucks if you don't need the physical thermostat to have buttons/screens.


Makes sense. My setup doesn't allow for that. Hence my ignorance. Good for you!


I'm considering making an OpenTherm controller for my heating boiler, I just researched this topic a few days ago - it's absolutely true, there are ready-made Arduino libraries for that.


I inherited a 1980s-model AC/furnace, and controlling the AC at least is extremely simple and cheap: a 12V relay in the compressor housing activating the 220V switch, connected to another relay controlled by a Pi Zero, which is controlled by yet another Pi Zero with a $10 DHT22. A bash script checks the temp and activates the compressor via SSH when the temp goes above 74F. The furnace control hasn't died yet so I haven't bothered replacing it. Putting the cooling system on IoT: total cost = ~$100.
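
The decision logic behind a setup like this fits in a few lines. Below is a Python sketch, not the bash script from the comment: the 74F switch-on point comes from the description above, while the 72F switch-off point is an assumed hysteresis band added so the relay does not flip on and off around the threshold. Sensor reading and relay control are left out, and the readings are simulated.

```python
def compressor_should_run(temp_f, running, on_above=74.0, off_below=72.0):
    """Decide the compressor state from the latest temperature reading.

    The 74 F switch-on point is from the comment; the 72 F switch-off point
    is an assumed hysteresis band so the relay does not chatter around 74 F.
    """
    if temp_f > on_above:
        return True
    if temp_f < off_below:
        return False
    return running  # inside the band: keep the current state

# Simulated readings; a real loop would read the DHT22 and then toggle the
# relay (for example over SSH, as described above) whenever the state changes.
state = False
history = []
for reading in [71.0, 73.5, 74.2, 73.0, 72.5, 71.8, 73.0]:
    state = compressor_should_run(reading, state)
    history.append(state)
print(history)
```

Keeping the decision in a pure function like this makes it easy to test without touching the hardware.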


What if you charged someone to build and install the same system in their house? You'd probably charge a lot more than $100, and that's what the real cost would be for most people.


Nobody has suggested professional installation though; the original suggestion was just a nice home automation project to play with.


controlled heating for my flat with open source and Zigbee compatible devices would cost me ~1k. I did not calculate the ROI but break even looks like it’d take many years.


Curious: are you concerned about data leaks or you don't trust the employees to not access your user data? Or something else?


Not who you asked but: I’m afraid of the data being stored and available to anybody. As long as it’s out there, the government can compel others to give it to them; and companies can get acquired and structures and laws can change in such a way that the data gets in others hands perfectly legally.

Thus, I should only be okay with it if I’m okay with the “nothing to hide” argument, which I’m not.


Yes, yes, and I don't like anything in home automation to be dependent on anything cloud. Enhancing function is fine, but a house that stops working right the moment the internet link is down is a dystopia.


"Dystopia" seems like a stronger word than applies here.


Both and more. Having my data sold to 3rd parties is an obvious first. And if you think the terms of service are enough to cover you, see how fast they can change in everyday life and please reconsider. Plus, data can be sold pseudo-anonymously and build up a profile against which your identity is compared and metered, as in, for example, health insurance risks or crime potential.

Additionally, we, the consumers, have lost the right to own things. Or at least, if we do own things, it comes with all sorts of strings attached in the form of "features" or "connectivity". Which is just marketing lingo to say that you're feeding the cash cow.


Nah, it is much more efficient to distract you into losing your box of fuses and manipulate you into buying new ones.

Alexa AI doesn't work for you, it's a hired gun in your house.


Just like 2001: A Space Odyssey - but instead of Hal trying to kill me, it just tries to get me to buy things I don't need.

"I'm afraid I can't do that, Dave, not until you watch this advert"


Wish they’d at least release a Christopher Walken package.

“I can…NOT find your ANswer.”


I agree that it is an obvious extension for AI to use this data at scale to help users. It also is obviously a huge temptation to abuse it for other purposes.

Wasting a few minutes in the morning to find my glasses is a small price to pay to not be watched and analyzed all the time. Let's not build our own panopticons.


When you do, can you invite me over to show me how it works? Then I'll test it for "Alexa, in which mattress does cameronh90 keep their savings?".


"Voice not recognized. Releasing the robotic attack dogs"


Like pretty much everyone in my country, I already entrust a bunch of private corporations to safeguard my wealth. Worse, there's nothing to stop my bank from suddenly deciding tomorrow that I don't have any money, and I don't really have any paper records to prove otherwise...

I figure any AI advanced enough to monitor everything I'm doing and where all my stuff is, is probably smart enough to know if it's me asking.


Just having a front door cam paid off immensely this year when I was able to prove that I had left the house with an item (that I later misplaced, and was able to recover with that knowledge.)


Excellent idea. You can later search through your logs in the future for reference. As it's all in text.

Prior solutions posted on the net had these take-a-photo / record-audio 24/7 features, but then the recordings were stuck there. What next? What would anyone do with this data?

But this Hi Jarvis styled recording of text on the go is a very useful feature.

Another step ahead.


I've wanted to do the same thing with my online activity as well. Chat logs especially. They tend to go into a void and finding an older log is weirdly difficult. I've wanted to log everything and then be able to apply better search algos (semantic search perhaps) to try and make my chat logs useful.


Cellphones are placed amazingly well to provide this sort of search. Seeing the post about BeOS and its amazing metadata-driven BFS filesystem yesterday really makes you think what might have been had iOS and Android been more ambitious about filesystems instead of just re-applying the same old conventions from our desktop computers.

You should be able to just text search every phone call you have made on iOS/Android, today, similar to the automated voicemail transcription features already present etc etc.


I think the "total recall" search can be a killer feature


I remember an Asimov short story in which scientists developed a machine that could see backward in time.

If I recall correctly, the upshot was the government became terrified because any machine that can see 1000 years into the past can also see 1000 milliseconds into the past and therefore functionally be used to spy on anyone in real time.


There was an article some years ago (2 or 3?), that described a drone (or drones?) that flew 24/7 over Mexico city taking high resolution video of the entire city at all times.

Whenever there was a crime, the police could zoom into that location at the time of the crime and then run backwards to see where the vehicles came from. They then knocked on that door.

I'm disappointed that I can't seem to find it using Google anymore, maybe it was from a movie or TV show?! That would be weird though, because it seems technically quite reasonable to achieve and hard to believe governments wouldn't jump on it.


The term for this is 'WAMI' - Wide Area Motion Imagery[1]. Here's a Bloomberg article about an instance of it in Baltimore[2] (although this wasn't where I learned about it first, like you I can't find my original source either)

1: https://en.wikipedia.org/wiki/Wide-area_motion_imagery

2: https://www.bloomberg.com/features/2016-baltimore-secret-sur...


I remember the same article, although I can't find it now.

These two seem to reference the same demo, although neither are the article I remember: https://www.bloomberg.com/news/articles/2016-08-23/watch-thi...

There's this reference to it: https://www.pressreader.com/usa/the-washington-post/20140206...

https://www.bloomberg.com/features/2016-baltimore-secret-sur... has a lot more details


I've read about that happening in Cleveland, using tech developed to find insurgents leaving IEDs in Afghanistan. Yeah, citation needed...


So glad someone else saw this, I'm not finding anything on it and I'm starting to question my own memory, as I'm quite sure I saw the original article about the Mexico program on this site.

FWIW, I also recall the tech being originally used to find people who planted IEDs in Afghanistan.

I'm kind of shocked about how all the articles I am finding seem to emphasize real-time police chases.

Now I'm feeling super suspicious.


I first heard about this on Radiolab. Maybe you heard it there too?

>> In 2004, when casualties in Iraq were rising due to roadside bombs, Ross McNutt and his team came up with an idea. With a small plane and a 44 mega-pixel camera, they figured out how to watch an entire city all at once, all day long. Whenever a bomb detonated, they could zoom onto that spot and then, because this eye in the sky had been there all along, they could scroll back in time and see - literally see - who planted it.

https://radiolab.org/episodes/eye-sky


I think you found it. Now I recall that episode exactly.

Apparently, my mind created some very visual memories from the narrative.

Thanks!


Well, a bit more googling ( https://www.google.com/search?q=police+drone+afghanistan+rew... ) got me just 2 relevant hits.

https://www.theatlantic.com/national/archive/2014/04/sheriff...

https://scholarship.law.uc.edu/cgi/viewcontent.cgi?article=1... (search for "rewind")

I'd rather think it's because Google sucks now, and those keywords just bring up too many similar articles, but my metaphorical tinfoil hat is in my hands.


Nice job!

Your tips got me to this one, where it more clearly spells out the "rewind" capability. I think the problem was that the tech was attached to a low-flying, piloted plane, not drones.

https://www.csoonline.com/article/2226742/record-and-rewind-...

Whew! It feels better to set my tinfoil hat down on the table next to me...


There was a website shown on HN a few years ago that used AI and plane transponder data to find circling planes which were presumably doing this kind of surveillance over American cities. It might have used further parameters to narrow it down, e.g. “over a city, circling for >3 hours” to rule out planes waiting to land. I thought it was named something simple like “plane-circles.com” but I’m not having any luck finding it again.

See also https://en.m.wikipedia.org/wiki/ARGUS-IS

Edit: found it. Should have limited the search to HN from the start. https://news.ycombinator.com/item?id=24188661
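
To illustrate the kind of rule guessed at above: one simple way to flag circling from a track is to add up the signed heading changes between consecutive position reports and count the net full turns. This Python sketch is a hand-written heuristic and not the linked site's method; the function name and the sample headings are invented for illustration.

```python
def full_turns(headings_deg):
    """Count net full circles flown, from a sequence of compass headings."""
    total = 0.0
    for prev, cur in zip(headings_deg, headings_deg[1:]):
        # Signed smallest change between consecutive headings, in [-180, 180).
        total += (cur - prev + 180.0) % 360.0 - 180.0
    return abs(total) / 360.0

circling = [0, 90, 180, 270, 0, 90, 180, 270, 0]   # two clockwise loops
straight = [88, 90, 92, 89, 91]                    # small course corrections
print(full_turns(circling), full_turns(straight))
```

A track with many full turns over a long period would be a candidate; one that only wobbles around a constant heading would not. Extra filters like the ones mentioned above (over a city, circling for hours) would sit on top of this.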


There have been a few products that record everything you see on the web, so you don't have this problem. Obviously analogous to recording everything you hear.

https://www.searchenginejournal.com/all-about-seruku-search-...



Not sure about the Mexico City drone, but a similar thing was developed by the US military: https://en.wikipedia.org/wiki/Gorgon_Stare

I know some folks who deployed during OEF/OIF and used these types of systems. Many night raids were conducted simply by watching where attackers originated from.


Different author, but sounds somewhat similar to 'The Light of Other Days' by Arthur C Clarke and Stephen Baxter.

Although iirc it starts with being able to see other locations in space but at the same time, and the historical viewing is a second development.

Fantastic book, even if it's not the same one you were thinking of.


Pretty sure both those authors wrote similar concepts, with the same creepy conclusions of taking the technology to a limit.

It came up in an acoustics class once. I said that sound never really dies. It just bounces around until it becomes thermal energy, thus warming the room a little as a prelude to joking about professors talking hot air.

A student asked whether one could recover sound from reverberations that had fallen below RT60. Could you listen back in time to conversations that had happened hours ago?

Obviously entropy can't be put back in the box with the technology we have now, but it makes you wonder.

Two things have since made me revise the question. One is recovery of sound from video images. The other was an archaeological recovery of sounds from a ceramic vase spun on a potter's wheel many centuries ago. Sorry, but the references for both escape me atm.


The pottery record thing was tested on mythbusters and hailed from an episode of csi.


Fake? Got a link so I can dig in a bit. Thanks.

EDIT: found this thread

https://groups.google.com/g/sci.archaeology.moderated/c/5Jec...

Damnit, seemed so plausible.


Clarke also used it as a throwaway line in Childhood's End. IIRC, humans were given a device that would allow them to see the past--most religions didn't survive seeing the true origins of their faith.


It was The Dead Past [1]

The idea of it was that it was known that the technology existed, but the government went to great lengths to imply that it could only see into the far distant past. The reality was it could only see 20 years back or so, and the government was covering it up because of the 1000 millisecond issue.

[1] https://en.wikipedia.org/wiki/The_Dead_Past


Yes, that was it! Nice find :)


I wonder if this was an inspiration to the "Devs" miniseries. Won't say more about it for fear of ruining it. Amazing show.


Sounds very similar to the guy talked about in Albert-László Barabási's book (either Bursts, or Linked ... don't recall which atm) - he was photoing/videoing his whole life, but never of himself - ie, the camera was always facing outward (like a policeman's bodycam)


The entire topic and many posts in this comment page also sound like things straight out of The Circle and The Every by Dave Eggers.


I had forgotten about The Circle :)


I did an experiment where I lived for awhile with a sony recorder/mic on me 24/7. It was nice to be able to refer back to conversations and events when I wanted them. Biggest issue was sorting through the data-- timestamps and recorder bookmarks were OK but I really needed full text search on the audio. It would have been great to tag via `Robert, mark timestamp, end Robert`. AI seems to be required, especially when dealing with wind noise and other issues (like the mic twisting around and all of a sudden one channel is my heartbeat.)

The Sony voice recorders out there easily last 24 hrs on 1 AAA battery, dumping to MP3 on a large SD card.
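
With a transcript in hand, picking out spoken tags like the one above is mostly a pattern match. Here is a minimal Python sketch, assuming the transcriber returns segments with a start time in seconds and the text spoken. The segment layout and the sample sentences are made up for illustration, and a command that straddles two segments would be missed.

```python
import re

# Wake word and closing phrase taken from the comment above; tolerant of
# punctuation and capitalisation differences a transcriber might introduce.
COMMAND_RE = re.compile(
    r"\brobert\b[,.!]?\s+(.+?)[,.!]?\s+end\s+robert\b", re.IGNORECASE
)

def find_commands(segments):
    """Yield (start_seconds, command_text) for each spoken command."""
    for seg in segments:
        for match in COMMAND_RE.finditer(seg["text"]):
            yield seg["start"], match.group(1).strip().lower()

segments = [
    {"start": 0.0, "text": "So anyway we should meet on Friday."},
    {"start": 12.5, "text": "Robert, mark timestamp, end Robert."},
    {"start": 40.0, "text": "robert remind me to call the bank end robert"},
]
commands = list(find_commands(segments))
print(commands)
```

Each hit carries the start time of its segment, which is enough to jump back to that point in the recording.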


I did a similar experiment in about 2005 using a small iRiver iFP [1] and reached the same conclusion.

It needed a physical "Something interesting just happened" button that could be annotated later. At the time, creating custom hardware as well as the entire software/service stack was more than I was willing to bite off.

The iFP is tiny, roughly a 4" long by 1.5-2" cylinder. It easily covered a full day, the silence detection worked great, and quality was fine when used in a pocket or on a belt. Basically, the stuff that I expected to be difficult was already solved.

[1]: https://en.wikipedia.org/wiki/IRiver_iFP_series, https://www.cnet.com/reviews/iriver-ifp-790-digital-player-r...


Excellent. Just terrific.

My future perfect system also logs my location and what I'm doing. And probably health metrics too, like heart and breathing rate.

Instead of initiating my exercises, I just want to say "Robert, start jog". The "modal" nature of my Apple Watch's Activities really frustrates me.

I don't want to take notes while I'm listening to a podcast. I'm generally doing something else at the time. I just want to say "Robert, bookmark". And magically a link will be made to whatever I'm listening to at the time. (Audio book, radio, stream, podcast, whatever.)

Ditto identifying songs (Shazam!).

I don't want to fart around with exchanging contact information. My hands are usually full or whatever. Just say "Robert, contact info" and then repeat out loud whatever I hear.

I also want to rewind after the fact. When trying to recall a tidbit, I'll remember the song, where I was (eg while walking the dog), who I was with, what I was eating. So if I want to remember which podcast I was listening to while at the park, I'd just start with my location log and jump over to my podcast listening log.

What could be more simple?

FWIW, I'm still waiting for my "bicycle for the mind".

PS- I've tried, half-heartedly, to use the voice recorder app, and notes with voice transcription. But then it quickly becomes a treasure hunt. And my attempts to do this stuff with Siri just leave me more frustrated.

Thanks for listening.

Great project. Please keep us posted on updates.


thanks!, you should try to transcribe your recordings now for free with whisper and see what you can make of them: https://replicate.com/openai/whisper


I've been experimenting with this recently as well, but with an app on my apple watch. Looking for a method/model to split different speakers into different tracks to only look at audio from myself and certain people.


Someone is experimenting with diarization (speaker identification) + Whisper here https://github.com/openai/whisper/discussions/264



Ahh I’m working on the exact same project. I applied to YC with the idea and was told that “nobody wants this” during the interview.

There’s a ton of problems in the space around privacy and UX. But I’m incredibly excited about projects in this space because in modern society we’re basically surrounded by a million unhealthy things designed to tempt us. Logging forces you to “stay honest”. I’ve been shocked already by how many unhealthy habits I underestimated and how many healthy habits I overestimated.

My #1 priority is just to improve my own physical and mental health. Whether there’s a market for this stuff, who knows.

Good luck!


My original inspiration is to better understand how I talk to others and study my own behavior


A noble goal. One of my bad habits I've been tracking and trying to reduce is rude behavior to people, online or in-person.


Check out this model, I've had limited success with it. Best I've done so far is to just add the labels it gives to the overlapping segments whisper spits out, which means some sentences have multiple speakers, but that's mostly the case because of cross-talk. I'd say it gets it right ~80% of the time with the 5 speakers I've done it on across ~16 hours of audio.

https://huggingface.co/pyannote/speaker-diarization
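A minimal sketch of the label-merging step described above, assuming you already have Whisper-style transcript segments and diarization turns as plain tuples. The tuple layouts and the `label_segments` name are made up for illustration; they are not pyannote's or Whisper's API. Each transcript segment gets every speaker whose turn overlaps it, ranked by overlap time, which is why cross-talk sentences end up with several labels.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Attach speaker labels to transcript segments.

    segments: list of (start, end, text) from the transcription step
    turns:    list of (start, end, speaker) from the diarization step
    Returns a list of (text, [speakers ordered by overlap, longest first]).
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        totals = {}
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        speakers = sorted(totals, key=totals.get, reverse=True)
        labeled.append((text, speakers))
    return labeled


segments = [(0.0, 4.0, "hello there"), (4.0, 9.0, "how are you")]
turns = [(0.0, 3.5, "A"), (3.5, 8.0, "B"), (8.0, 9.0, "A")]
result = label_segments(segments, turns)
```

Taking only the first speaker in each list gives a single label per sentence, at the cost of hiding cross-talk.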


I will!


Speaker identification is the next step, you might want to read about Pyannote's Diarization:

https://lablab.ai/t/whisper-transcription-and-speaker-identi...


we're experimenting building out a version of this too, but on desktop with www.usebacktrack.com - should have splitting speakers/inputs early next year and seeing what that's like


what app are you using on the apple watch?


Here's a 24/7 background audio recorder app I made for Android. The impact on battery and storage is surprisingly reasonable.

https://github.com/miguelrochefort/eardrum


I like this. It vibes with a language learning app concept idea I recently shared out loud.

https://twitter.com/kuizinas/status/1591867392220594183


I've been doing this with Anki.

When I have a conversation with someone in a language that I'm learning (was Russian and Greek, now Arabic) I record the conversation. I then get both native-speaker audio to add to Anki for the things they said, plus I get a list of words that I either needed to use or that the other person used, to add to Anki.

A secondary benefit is that this system encourages me to go out and seek interactions with people, a clear benefit for a natural introvert.


Well done!

Got a similar PoC that uses Tasker to record sound on my phone, Whisper to convert it to text, and neatly organizes everything into Obsidian.md. The continuous recording kills the battery life on my phone so it's only usable if you don't mind going around with a powerbank. Would be great if a manufacturer would put in a separate low-energy chip with a good ADC.

P.S. "Active functions" with custom home automation is easy as pie with joaoapps's suite. I use BusyBox to SSH into a Pi with a Tellstick Duo. And some RFID tags for the system to know where I am (e.g. bedtime routine gets triggered when I place my phone on the bedside table). But yeah...traffic goes thru Google.


you should write about it!


How would this work with other voices, like in a coffee shop? Would it hear those simultaneously and interrupt a command?

Also, how do you handle using OpenAi whisper, seems like they do 30 second intervals - would that be an issue if your command is cut off mid word?


For now I try to give the commands when there is not much noise, but you can lower the gain of the microphone so that it only records my voice.

The 30 second limit is not a Whisper model limit, but a limit that some of the free online "try Whisper" demos impose.


I think he means that even Whisper segments the audio into 30 second bits, transcribes each of them, and then stitches everything together.
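To make the cut-off worry concrete, here is a naive sketch of splitting a recording into overlapping 30 second windows, so that a command straddling one boundary falls entirely inside the next window. This is not how Whisper does its own stitching; the `windows` helper and the 5 second overlap are assumptions for illustration only.

```python
def windows(total_seconds, window=30.0, overlap=5.0):
    """Return (start, end) pairs covering [0, total_seconds] with overlap."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + window, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break  # the last window already reaches the end of the audio
        start += step
    return spans


spans = windows(70.0)
```

With overlapping windows the same words can be transcribed twice, so whatever consumes the text would need to de-duplicate near the seams.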


The future will definitely have devices which record visually/verbally all your life. VR headsets are already able to record all your facial expressions. A google glasses like gear which records all your life is pretty much possible in the near future. The future influencers won't have to carry a phone/camera to create vlogs, they would just see wherever they want and the glasses will record not only the thing they are seeing but also their expressions. Privacy will probably not be such a big thing as now given most people with each generation are increasingly becoming more and more comfortable sharing their whole lives online.


Ted Chiang explores this idea in a short story called "The Truth of Fact, The Truth of Feeling" (https://devonzuegel.com/post/the-truth-of-fact-the-truth-of-...), which takes place in a world where commercial, individual, always-on recording exists. Ted Chiang also wrote the short story that the movie Arrival was based on.


Halperin's debut novel _The Truth Machine_ also has people living "documented lives", with always-on video and audio recording, as a primary plot element.


As does the Neanderthal Parallax by Robert J. Sawyer [0]. It describes an advanced civilization of Neanderthals who have made completely different societal choices. IIRC, each Neanderthal has a recording device that constantly films and uploads a 3D feed of their surroundings. In case of murder, the court can access this recording to find the killer.

[0] https://www.goodreads.com/book/show/264946.Hominids


> Privacy will probably not be such a big thing as now given most people with each generation are increasingly becoming more and more comfortable sharing their whole lives online

That's not quite right. People are just unaware of the power of what they share and usually react quite negatively if their data is used against them and start shielding themselves from surveillance of the sort that might affect them.

So predicting "people won't care about privacy because they share their whole lives online more and more" is a bit disingenuous.

It's what the tech companies like Facebook want but it's far from the truth.


Reminds me of the Black Mirror episode “The Entire History of You.” Could be pretty scary if misused.


I've thought for 20 years that the Life Recorder is inevitable. I figured it would be like journaling constantly, getting insight and guidance to improve.

Now I think it will result in unbearable self-consciousness. You will yearn to be offline, quiet, to just forget, maybe to enjoy the moment without it going in your permanent history.

Arguments in relationships are messed up when you can rewind and debate what he said she said. The actual words are often not important, it's the emotions. The permanent record makes it harder to forgive and move on. It's like being in court, everything transcribed.

> given most people with each generation are increasingly becoming more and more comfortable sharing their whole lives online.

There is an entire generation who learned not to post, many who are very anti social media, many who stay anonymous. Chat is much bigger than public social these days.

There are also scenes that avoid digital. They make cassette music and black and white photocopy artwork.


Arguments in relationships was one of the first “off-label” circumstances that came to mind when I saw this post.

Thinking about it a bit past the initial reaction though, I think it’d actually be a massive boon for relationships. Sometimes we get so hung up on the wording, which may or may not have been clearly expressive but, due to some slight subtlety, changed the course of the interaction from a potentially positive one to a toxically negative one. I think if you can just subvert the “you did say that! You don’t remember??/No I didn’t/Yes you did” you can focus on the actual content of the conflict resolution.

I’d personally want to know if I said X-potentially-hurtful-thing. Or was the other person just hearing it? What’s workable, what’s not workable. Post-mortems for arguments (assuming the relationship is a viable one, ie genuinely collaborative & not contingent on point-keeping) would be a lot easier & constructive conversations can be realized much more quickly.


I'm not sure people are increasingly willing to share their whole actual lives online; they are mostly presenting a curated and tailored image of themselves to project the self that they want the world to see. 24/7 straight-to-internet type recording doesn't serve that goal very well. I'm not sure that something like this would be popular among streamers, other than maybe a small niche for whom that is their whole thing.


I predict future generations (maybe even those being born right now) to start moving away from recording and uploading everything as it's the uncool thing their parents are doing.


Current gen Tiktokers ran away from facebook and instagram, not from oversharing on social media. I think teens will always be very active on some kind of social media, but yes one can certainly hope that all of it is seen in some negative light by the coming generations.


After reading The Circle, I'll definitely pass on sharing my whole life online.


Awesome idea. However people would find it weird that I talk to myself all day long.


Yeah, I can see how this would be easier if you work from home, but you could explain that you are running a long lived experiment on yourself. Then again, that doesn't exactly scream "I am fully sane" :)


Well, genius borders madness, and it's been said that talking to oneself is a sign of intelligence, so you may be right.


A case for neuralink.


This is awesome! I've been recording myself (video/audio) for the last couple years on and off (thousands of hours) and have no efficient way of processing the info. Was not aware of Whisper and what he's done is exactly what I'm looking to pull off.

The GPT-3 idea is scary and most certainly the future. I can't stand the world of never ending 'Moviefone' menus and chatbots, but when it's me that gets to be the machine response the future doesn't seem so annoying. Would be nice to have my own GPT-3 model that I can use to "get to a real person" when calling places.


The passive information would be useful if that would work with your inner voice.


I read a research paper quite some time ago that most inner monologue is articulated at least partially by the mouth, throat, & tongue, and can be detected. Might be a better approach than having a chip in my brain.

If I find the article I will update.


Anecdotally, my inner monologue certainly is physically articulated. It would be really interesting to build tech that can understand it.


what's fascinating is apparently there's a connection with schizophrenia, where the "voices" are actually the person saying things to themselves at a level too low for other people to hear

also suggests a throat mic would do wonders


Love this idea. Most people subvocalize when they read, and my guess is the same for when they write. As for subvocalizing when we think, I have no idea, but I suppose if one were engaged in “talking to oneself” type thinking, that it could be possible that many subvocalize their internal monologues.

Is there some kind of sensor that one could wear that would be unobtrusive enough to measure subvocalization? If so, building a training dataset for an ML model would be as simple as having many people read and write a significant amount of text while using the sensor.

Even better if that text corpus overlaps with text that has been used to train speech-to-text models like Whisper, as you might get away with knowledge transfer with such a model.

It’s definitely worth looking into!


With feedback, it is likely possible for you to train yourself to subvocalize your thoughts as an act of recording. I thought this sentence before writing it, but I subvocalized while typing.


That would be creepy and next level.

Isn't that what Musk's Neuralink is trying to achieve?


That technology doesn't exist yet.


I have recording turned on on my phone. Usually it records 6 hours at a time, so it is annoying that I have to manually restart recording. Another annoying thing is that it will pause the recording when I pick up a call.

Why Google decided to block call recording is beyond me. In the past when I was able to record calls it saved me a lot of trouble - for instance when insurance company lied to me over the phone about the product I could confront them about it and get my money back. I wish I could be able to record calls with my relatives as well. Call recording is legal in my country.


So the author is recording all their interactions with everyone they meet and then processing and analysing those interactions. How is this not a massive invasion of privacy?

I see he is concerned about his own, but he doesn't seem to be at all concerned with anyone else's. Personally, if I discovered that someone I interacted with was doing this, I would insist that they delete any data concerning me that they had gathered, and pursue whatever means were available to me if they refused.


I was pondering this as I read, as well. I would be a bit disconcerted to find that a coworker was recording EVERYTHING they say all day and that meant I was on those recordings throughout the week. The lack of any acknowledgement of that piece in this writeup is a little confusing... I don't question their intent but I do question their lack of awareness of others (based on the writing).


I am using it only with friends and family around, and tell them beforehand about the experiment


Very interesting idea.

Would you be willing to share more info on the tech used in the process?

>I bought a couple of Chinese microphones

Which exact microphones? How long does their battery last?

As well as other parts of the process.


https://www.aliexpress.us/item/3256803349510543.html

https://www.aliexpress.us/item/3256803085687061.html

One was chosen for the battery and the other for the size. Both are generic and come with the same software and BIOS, from several vendors. If I could buy something better, I would look for one that can take a lavalier microphone.


Not the OP but I've been tinkering with the same concept (24/7 processing).

I'm using vosk-browser: https://github.com/ccoreilly/vosk-browser

To do speech to text locally and it works very well for English.


Browser models are too small and unlikely to recognize speech accurately. They are more for simple predefined phrases.

You can probably try vosk-api on a desktop-grade machine. You need the big models from https://alphacephei.com/vosk/models; they require something like 8 GB to run, but they are much more accurate.


What a bonanza for opposing counsel of any kind.

(which is a bummer, as there are lots of interesting uses for digitizing our lives if the data could be guaranteed to remain private)


An off the shelf solution for recording your whole life:

I have a Sony recorder, ICD-UX570, and it has a setting where it turns on or off based on sound, and also adjusts the gain to best record. It takes a micro-SD card and has pretty solid battery.

I think you could put it in a breast pocket and run it for several days on a single charge. Because it would just record when you are talking or making noises you could likely run it for a year on a big SD card recording in mp3.

Change to a wifi SD card and suck the files off and process them and you might have something kind of cool.


Would the sound of jostling in your pocket/bag not set off the recording?


I'm not sure about jostling, but lots of noises would set it off. It errs on the side of recording when there is sound, and it can't tell speech from non-speech, though you can set a sensitivity level.
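For anyone wondering what a sound trigger with a sensitivity level might look like, here is a toy energy gate: it flags a frame as "record" when its RMS level crosses a threshold. How the Sony recorder actually decides is unknown to me; the `active_frames` helper, the frame size, and the threshold are all assumptions. Like the recorder, this cannot tell speech from any other loud noise.

```python
import math


def rms(frame):
    """Root-mean-square level of a list of integer PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def active_frames(samples, frame_len, threshold):
    """Return one boolean per frame: True when the frame is loud enough."""
    flags = []
    for i in range(0, len(samples), frame_len):
        flags.append(rms(samples[i:i + frame_len]) >= threshold)
    return flags


# Silence, then a loud burst, then a faint rustle (made-up sample values).
samples = [0, 0, 0, 0, 1000, -1000, 1000, -1000, 10, -10, 10, -10]
flags = active_frames(samples, frame_len=4, threshold=100)
```

Lowering `threshold` plays the role of raising the sensitivity: the faint third frame would start to count as sound.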


That makes sense, thank you. If it’s not too personal, may I ask how you use your recorder? Aesthetically I have long wanted to have my own discreet voice recorder, and that one of yours looks amazing, but I’m not sure I need one for anything.


I was in a course for getting better at speaking and part of it involved spending 10 minutes each morning speaking your thoughts out loud and recording them.

I could have done this with my phone, but I like the single-use nature of the recorder, and it's nice that it's super small.


A related question, is there a ready solution to do constant recording using some Linux box (e.g. Raspberry PI)?

AFAIR I've seen someone recommended such software on HN but I can't find it right now, it was something for recording radio stations or similar.

I would like to get some kind of sound monitoring of my house when I'm away or sleeping and besides using arecord I couldn't find anything useful.


Even better is something like the Guardian Project's Haven (https://github.com/guardianproject/haven) which IIRC Snowden contributed to.

It's incredibly cost-effective to just buy an old Android phone (which comes integrated with multiple microphones, with good signal processing and noise cancellation), instead of building it with components.

Haven is specifically designed for intrusion detection, and for preventing people from tampering with your laptop for instance by detecting activity on the Android phone's sensors.


Interesting concept, but the issue is that one would need to constantly charge the phone and batteries have a tendency to go bad after some time with such usage.


Seems like this application would justify strapping the phone to a large lead acid battery.


An old laptop will work better.

Raspberry pis don't have Audio in. You need USB microphone and drivers, which are hit or miss.


I was playing with this idea of recording everything using a 1st gen AIY voice kit. The driver is a linux kernel builtin module. The recording quality is good if I'm in the same room. But I didn't find the recording that useful, so I stopped doing that.


Well yeah, but I wonder what software to use for constant recording that could be worked on outside of the recording process.


As you'd be recording all of your conversations, this is illegal in some jurisdictions, unless all your conversation partners agree to being recorded and to their conversations being stored.


In some countries it's enough for one party to consent, which can be you, so it's legal.


In the US it even varies by state, which has gotten people in trouble with recorded phone calls.


I am using it with friends and family, and tell them beforehand about the experiment


What about going out to a restaurant, a party, a date, or even somebody you bump into on the street? You're recording a lot of people, which to me is a huge invasion of privacy. What do you do if someone is uncomfortable with being recorded?


I mean, at a practical level, you don't tell them and ignore it. The fact that it is illegal to record people doesn't mean it's against your personal ethics.


Same question applies for any photograph you take in public. Isn't the fact that the people are in public material to whether or not they can be recorded? That's different than calling them on their phone and recording that conversation.

I mentioned it in another reply but this is discussed in The Every by Dave Eggers, there is essentially an entire area of San Francisco that is deliberately kept free of any and all devices that can record A/V and in order to enter you have to deposit your phone and be wanded to detect surreptitious recording devices.


In New Hampshire and a few other states it is illegal to record audio from other people without permission. I know because I was part of a team building a social network (CenceMe) that shared sensor data from phones so people could see what others were doing. We wanted to use machine learning on audio and aside from the fact that it would have greatly reduced phone battery life we found it was illegal in the state we were doing experiments in.


Is it illegal to record it, or is it simply inadmissible in court?


Interesting article. Thanks for posting this. I think when wearing something like google glass and recording everything the potential is even bigger. The AI can extract so much more context. Analyse faces, gestures, locations and more. Dystopian and yet so interesting.


This sounds horrible. Nobody should freely record and analyze others without consent. This is not only dystopian but also very rude and possibly illegal.

As for recording oneself to capture thoughts and processes, this is a fine idea that I'd like to try.


Are you planning to combine it with other info? Your smartwatch already knows how long you've slept; getting that info directly into your database seems more efficient and less error prone. The same goes for the amount of money you've spent: if your bank allows you to export that info, it'll save you a step. Your bank doesn't know what you've bought, only the total cost, time and shop, but if you scan and upload your receipts and use OCR you'll have a detailed record of that too.

And you could also keep track of your location, so you know where you had a conversation or at what gas station you spent 250,000


Diarization (recognizing the speaker on the recording) might be my next step.

Combining the information from multiple sources as you say will get you a complete view (location history and time of the recording will let you know if you were speaking with a colleague or your spouse, for example)


If you are going through that much trouble you might as well get a WiFi scale, wear a tracker that has an API, etc. I’ve definitely thought about taking speech-to-text notes at work, nice to see somebody did it.


The problem is all those gadgets have their own APIs and quirks. For example, my WiFi scale often decides to silently not sync to my phone. Or decides to update my partner's weight instead of mine. OP is building on arguably the most natural API - speech, which is what all those smart assistants have been promising us for a decade. I think there is a lot of convenience as well as unexplored ground to be found in such a system.


Well yes, but this is a much cheaper option. Instead of having many smart gadgets you only need one. The mobile phone or some other microphone but that is the most obvious option.

Instead of paying hundreds of dollars for all these gadgets that have to be charged and kept safe you can buy cheap variants and still have basically the same benefits.


But your time is exchangeable for money. (Something almost never reflected in hobby behavior: that's why the crafts store has 130 kinds of acrylic yarn and 1 kind of wool yarn.)

A $150 scale is expensive but buys a very small amount of software development.


Interesting to see comments suggesting use of constant recording as a defense against invasion of privacy via constant surveillance.

Interesting that the defense against the harms of technology is technology itself. When humankind unlocks a new powerful technology, and when it is possible for criminals to use the technology to harm us, our best course of action may be not to look away from it out of fear, but to spend more time understanding it and its implications, and to get there faster than adversaries do.


Handling the privacy of other people might be oddly easy. If you can detect the voice accurately enough the AI might be able to _drop_ the other participants.


Dropping after the fact still means it was recorded, thus violating two party notice statutes.


Does that mean that every phone recording audio to detect its assistant trigger phrase violates two party notice statutes?


Yes.


These assistants have been around for a while, seems like that would have been established in court if it was the case.


What if it was never written to disk? The identification and trimming happening before any audio is actually saved.

Is picking someone else up on microphone at all a violation of two party consent? For example walking past someone in public with a loudspeaker call active.


I think there is even legal precedent for that. Dragnet surveillance recording of all phone calls. It was argued that recordings are only "stored". Judge order is needed for "retrieval", that may be several weeks later.


This reminds me about Black Mirror's White Christmas episode where they create a digital clone inside a white "cookie" and then they use it for receiving tasks such as making toasts. This project is very similar actually. I found very interesting the part where you can track your food and automatically calculate the calories every day without writing anything anywhere.


I'm curious if there was other work you were inspired by. I have also been a bit interested in using this style of "personal database/logger/journaling";

task-agnostic input -> processing -> visualization/recall

My assumption is you are just storing post-processed conclusions in a local db on your computer + raw audio for possible future re-processing, and not currently storing other media input (ala food pics)?


I'm trying to catch up with all the open source AI stuff out there and explore the possibilities. On the Spanish part of the website there is a test with Stable Diffusion and Twitter, and now I'm trying to fine-tune the Donut document transformer.


I'd be so self-conscious with my speech being recorded roughly 24/7 by myself. I'd probably get used to it but it'd take some time.


People who take notes in life (the org mode people): oh cool. Everyone else: why would I want to know what I ate, weighed, or thought last week?


Great idea. However always recording is a disadvantage for me.

I thought about a device that could look a bit like Star Trek badge. It should react to pushing it slightly and it would have a microphone. It would connect to a phone with Bluetooth.

Main use for me would be push-to-talk as I use Zello with my wife quite a lot. But all those reliable assistant/voice-notes uses would be also sweet.


The advent of Whisper gave me a similar idea, except instead of uploading the recording once a day, it worked by calling my computer from my phone and recording and processing in batches. Realized pretty quickly that most of my day is silent though, and would rather be able to trigger it on demand, which I haven't gotten around to.


Pretty sure Bill Gates wrote about this idea in his book. It SEEMED like the future, but software/hardware innovation goes where the money is, and no one is interested in recording their own lives. Maybe once the AI to make use of it gets better, it'll find product-market fit.


In some jurisdictions it’s illegal to record a conversation without all-party consent. Example:

https://www.rcfp.org/reporters-recording-guide/massachusetts...


> The law only applies to secret recordings, however, so affirmative consent is not necessary when all parties are aware of the recording.

Hm. Will a T-shirt with “my phone is recording everything” be enough?

Interesting- secret recording is punished more severely than use of such recording. Logic?

Also, the difference between image and sound recording. Secret image recording in public space is basically ok, while sound recording is not. That probably is caused by a wide availability of photo cameras, with known “fair uses”.


No. It's not enough to vaguely inform people.

You must get their consent. T-shirts say lots of things that are hyperbole or bluster. Like "official boob inspector" is not generally understood to mean that you're announcing you literally are a licensed doctor.


> You must get their consent.

The article cites law, like with recent court cases. Let me add the exact court case they cite:

The law only applies to secret recordings, however, so affirmative consent is not necessary when all parties are aware of the recording. Curtatone v. Barstool Sports, Inc., 169 N.E.3d 480, 483 (Mass. 2021).

I am not trying to inform others of recording, actually, by my T-shirt sign. I am trying to protect myself against the wrong law. If I was a lawmaker I would probably make all recording in public spaces legal. But if use of recording inflicts harm on you - you should be allowed to sue for damages. Probably you are already allowed anyway.

Again: a harmful use of recording without consent or notification should be punished, not the recording itself. I think.


These states are: California, Delaware, Florida, Illinois, Maryland, Massachusetts, Montana, Nevada, New Hampshire, Pennsylvania, and Washington.

Edit: I should add that in some of those states, it would still be permitted to record others in public without consent, where there's no reasonable expectation of privacy (e.g. a coffee shop or gas station).


Obviously the solution is for the government to record audio in all places at once, and then those with $$$$ can just pay for the audio feed. Win/win.


Pretty well known in the states, luckily he seems to be talking to himself :)


I think the passive part of this could be really interesting - starting with a simple "tag cloud" of keywords by frequency linking to audio snippets that mention them, it'd provide a way to index conversations during the day for future reference (or processing).
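A rough sketch of that tag cloud idea, assuming the transcript is already available as (timestamp, text) pairs. The `build_index` name, the tiny stopword list, and the sample data are all made up for illustration. It counts keyword frequency and keeps, for each keyword, the timestamps of the snippets that mention it, so a word in the cloud can link back to the audio.

```python
import re
from collections import Counter, defaultdict

# Deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "in", "we", "should", "was", "and", "to", "of", "i"}


def build_index(segments):
    """segments: list of (start_seconds, text).

    Returns (counts, index): keyword frequencies, and for each keyword the
    start times of the segments that mention it.
    """
    counts = Counter()
    index = defaultdict(list)
    for start, text in segments:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in STOPWORDS:
                continue
            counts[word] += 1
            # Record each segment once per keyword, even if it repeats.
            if not index[word] or index[word][-1] != start:
                index[word].append(start)
    return counts, index


segments = [
    (0.0, "We should walk the dog in the park"),
    (30.0, "The dog loved the park"),
    (60.0, "Dinner was pasta"),
]
counts, index = build_index(segments)
```

`counts.most_common(20)` would give the words for the cloud, and `index[word]` the places to jump to in the recording.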


Further to indexing conversations, it would be interesting/helpful to be able to pull out:

  * conversation length
  * participants
  * location
So you could search for, say, that long conversation I had in the park with Bob.

I'm not sure how easy it is to identify/track different participants in a conversation.

Edit:formatting
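A small sketch of what such a search could look like once length, participants and location have been extracted. The `Conversation` record and the `find` helper are hypothetical; producing those fields from raw audio (diarization, location logs) is the hard part and is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    start: float              # seconds since some reference time
    end: float
    participants: frozenset   # names of the people heard
    location: str

    @property
    def minutes(self):
        return (self.end - self.start) / 60


def find(conversations, participant=None, location=None, min_minutes=0):
    """Return conversations matching every filter that was supplied."""
    matches = []
    for conv in conversations:
        if participant is not None and participant not in conv.participants:
            continue
        if location is not None and conv.location != location:
            continue
        if conv.minutes < min_minutes:
            continue
        matches.append(conv)
    return matches


log = [
    Conversation(0, 3600, frozenset({"me", "Bob"}), "park"),
    Conversation(4000, 4300, frozenset({"me", "Alice"}), "office"),
]
# "That long conversation I had in the park with Bob":
hits = find(log, participant="Bob", location="park", min_minutes=30)
```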


indeed, that I guess will be the best part of the experiment, but the longer one.


Anyone familiar with rewind.ai which seems to be building a product on similar lines?


Looks like a slick wrapper around Apple's ImageAnalyzer and ScreenCaptureKit


Anybody made some progress with using google assistant with arbitrary commands? I know there are a few integrators online that could, in theory, get commands and send them to a spreadsheet, but I couldn't get them to work.


I liked this article and find it intriguing. That said, I would set the original sound data to expire relatively quickly, perhaps erasing everything week or so. I like letting the past be the past.


Would you mind linking/listing what microphones you're using?


I bought two, both from Aliexpress, both unbranded. The one in the picture has a 5000 mAh battery; it is bulky but lasts a long time. The other one is tiny but has a short battery life. There are a lot of sellers on Ali, and I paid around $30 for each. Both have the same software and BIOS; the only difference is the size and battery.


for a while I had my laptop set up to take a photograph and a screenshot every ten minutes. The information was completely useless, but I got some great candid photos of myself


Are there any good (discreet) wireless throat mic patches out there that are sensitive enough to pick up subvocalization / whispering?


I was curious about this as well. It seems there are bone conducting microphones, but the ones I've found so far go in your ear (so it's visible). I'm going to just hide a mic in my beard.


Are you planning to make it an open source project?


for now it's "glued with tape", but I'm going to try to make it presentable to post something on Github


Please do, this is a very cool project and I would like to give it a try.


I'm looking forward to it!


I do that as well. I had a few arguments with police patrols over driving tickets. And once a dog attack with a very aggressive dog owner.


I thought this would be recording everything to generate an AI model that could sit in zoom meetings for you.


Did I miss something or is a description and/or link to the software used not in the article?


You can use your smartphone for the recording and transcribe it here: https://replicate.com/openai/whisper

Then use a regex to extract commands from the transcribed text
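A minimal sketch of that regex step. The wake word "computer", the command shape (wake word, one verb, free-form arguments up to the end of the sentence), and the `extract_commands` name are assumptions for illustration, not the actual patterns used in the project.

```python
import re

WAKE_WORD = "computer"  # hypothetical trigger word; pick one you rarely say

# wake word, optional comma/colon, one verb, then everything up to the
# end of the sentence as the arguments.
COMMAND_RE = re.compile(
    rf"\b{re.escape(WAKE_WORD)}\b[,:]?\s+(?P<verb>\w+)(?P<args>[^.?!]*)",
    re.IGNORECASE,
)


def extract_commands(transcript):
    """Return a list of (verb, arguments) pairs found in the transcript."""
    return [
        (m.group("verb").lower(), m.group("args").strip())
        for m in COMMAND_RE.finditer(transcript)
    ]


text = (
    "It was a nice day. Computer, note buy milk tomorrow. "
    "Then we walked. computer log weight 82 kilos."
)
commands = extract_commands(text)
```

Because the arguments stop at the first period, a transcribed decimal such as "82.5" would be cut short; a real version would need a smarter end-of-command rule.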


You can download whisper at https://github.com/openai/whisper


That calorie tracking will be WAY off and useless if not dangerous. Nothing beats a kitchen scale.


Very cool idea, the only question I have is how fast does this drain the battery of the mics?


The device in the picture has a 5000 mAh battery (around 10 hrs), and I also bought a smaller one (1-2 hrs).


I must be blind, where do you list the microphone(s) you are using?


It appears to be on the Spanish language version of the blog but not the English:

https://roberdam.com/wisper.html

Although in the text it's just described as "a Chinese box" with a 5000 mAh battery and the ability to record to its 32 GB of space in chunks of 30 minutes, as MP3s taking about 28 MB each.

That section goes on to describe trying different microphone positions, as it makes a great difference to quality. OP originally tried it in a bag but the results were mediocre, so moved to a different configuration which, although less comfortable, produced superior audio results.
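
For a sense of scale, the storage figures above work out as follows. This is back-of-the-envelope arithmetic only, assuming decimal units (1 GB = 1000 MB) and that all 32 GB is usable for recordings:

```python
CHUNK_MB = 28          # size of one MP3 chunk, per the post
CHUNK_MINUTES = 30     # length of one chunk
CAPACITY_GB = 32       # storage on the recorder

# Megabytes -> kilobits, divided by the chunk length in seconds.
bitrate_kbps = CHUNK_MB * 8 * 1000 / (CHUNK_MINUTES * 60)
# Whole chunks that fit on the device.
full_chunks = (CAPACITY_GB * 1000) // CHUNK_MB
hours = full_chunks * CHUNK_MINUTES / 60

print(f"{bitrate_kbps:.0f} kbps, {full_chunks} chunks, {hours:.0f} hours (~{hours / 24:.1f} days)")
# prints: 124 kbps, 1142 chunks, 571 hours (~23.8 days)
```

So the card holds a few weeks of continuous audio at what looks like a roughly 128 kbps MP3 setting.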


https://www.aliexpress.us/item/3256803349510543.html https://www.aliexpress.us/item/3256803085687061.html

I chose one for the battery and the other for the size. Both are generic, come with the same software and BIOS, and are sold by several vendors. If I could buy something better, I would look for one that accepts a lavalier microphone.


It's pictured in the Spanish version, under "El Equipo": https://roberdam.com/wisper.html


This is the sort of thing Nietzsche alludes to as his "last man".


What are the limitations of numbers as descriptors of Being?


Doesn't this break wiretapping laws (depending on the user's geographic location) and possibly GDPR/NDAs(if left on while at work)?

Ethically most people probably wouldn't be happy to find out that you recorded a conversation with them.


I am using it with friends and family, and tell them beforehand about the experiment


Forgetting stuff is a feature, not a bug.


Depends on the human, haha. For me it is for sure a bug because it is way too prevalent...


Why is this being upvoted? There is no code.


Did you think there was a requirement to have code?


For normal blog posts, no, but OP is submitting as "Show HN" which requires it to be something readers can try on their own:

>Show HN is for something you've made that other people can play with. HN users can try it out, give you feedback, and ask questions in the thread.

https://news.ycombinator.com/showhn.html

This is an on-topic blog post, but it doesn't seem to fit the criteria of a Show HN.


Hm, how can you tell it was a Show HN? Doesn't seem to be in the title. Maybe it was before?


Yeah, the title originally had the "Show HN" prefix. Looks like it's been updated since.


So where was the remote control?


yeah but there's 3 toasts


Why are you so self-obsessed?


Why are you so mean-spirited?


Too bad Abbott and Costello aren't around to attend a standup with a bunch of people using your app.

  - Robert Robert what did you do yesterday End Robert
  - Robert I met with Robert in accounting to finish Jira 12392 story on the Robertson patches End Robert
  - Robert Then I took a break to have coffee and watch a Julia Roberts short with Robert in System Admin End Robert
  - Robert Robert what are you going to do today End Robert
  - Robert It depends if Robert Roberts has internet access End Robert
  - Robert If he doesn't, that'll be the end of Robert Roberts End Robert
  - Robert No impediments for either Robert in the Robert epic End Robert
  - Robert Robert's on Help Desk, Jean Robert's in Code Review, and Sam Robertson's running the Roberton's retrospective End Robert
  - Robert But then who is Robert Robertson? End Robert
  - Robert Oh Robert Robertson's our scrum master! End Robert


I'm a bit concerned about the calorie level I see here, 832/day. That is about 1/3 of the NHS recommendation [1] for males.

1. https://www.nhs.uk/common-health-questions/food-and-diet/wha...


The lamb sandwich, orange juice and almonds alone could approach 800 calories, even excluding the other three meals. People are quite bad at estimating calories though, it takes practice.


It's typical in calorie tracking that people start the day strong, but then forget or lose will to track later in the day. Ideally days without a full record should be excluded; I'm assuming that's just not happening here.
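
Excluding incomplete days could look something like the sketch below. It assumes each log entry is a `(day, meal, calories)` tuple and that a "full record" means every expected meal was logged; both the data shape and that definition are hypothetical:

```python
from collections import defaultdict

# Hypothetical definition of a "full" day: every one of these meals was logged.
EXPECTED_MEALS = {"breakfast", "lunch", "dinner"}


def daily_totals(entries, expected_meals=EXPECTED_MEALS):
    """Sum calories per day, keeping only days where every expected meal appears.

    `entries` is an iterable of (day, meal, calories) tuples.
    """
    meals = defaultdict(set)
    totals = defaultdict(int)
    for day, meal, calories in entries:
        meals[day].add(meal)
        totals[day] += calories
    # Keep a day only if its logged meals cover the expected set.
    return {day: totals[day] for day in totals if expected_meals <= meals[day]}
```

A day with only breakfast and lunch logged would then drop out of the average instead of dragging it down.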


I agree, but looking at the actual food eaten, it doesn't seem to line up.


Those recommendations are for people that are on their feet all day, not for office workers and home dwellers that never go to the gym.


No, you add more calories if you're active. The USDA has a breakdown by age, gender, and activity level [0]. 2400 calories is recommended for an adult sedentary male between 21 and 40.

Sedentary is now the norm in the developed world, so it would be weird to use active as the baseline for health recommendations.

[0] https://www.fns.usda.gov/estimated-calorie-needs-day-age-gen...


Estimates for a Roman soldier on the march come closer to 3000 calories.


I think your last sentence summarizes the sentiment really well: “The difference between utopia or dystopia is who has access to that information”


> My biggest problem with “OK Google” is that I don’t know by heart what it can do interactively

Maybe it’s just me but this feels unaddressed and that seems ridiculous.

Why is it so hard for me to find a single, precise location on my phone with an enumerated list of every command Siri or Google can work with?


> Why is it so hard for me to find a single, precise location on my phone with an enumerated list of every command Siri or Google can work with?

The likely answer here is that the engineers who work on such products would scoff at the idea that their work amounts to a simple list of commands. In their minds, they’re working on a natural language virtual assistant, whose understanding of user input is “intelligent”, and it should know what you want regardless of how you phrase it. Want to do something? Just ask! Treat it as if it's a person! The possibilities are endless!

Never mind that its actual functionality (y’know… the things it can do when it understands you) is embarrassingly finite and boils down to a “list of commands” anyway.


I'm not sure if that's what Google is doing. More than half of my queries result in a "Let me google that for you" response where it pulls up a search page.


Apple's "assistant" is similarly useless. The best use case I have for it is when I'm driving and pondering about something silly so I ask it: "How does X do Y?" or similar, and the response in 99% of the cases is "I can't show this to you right now".


Alexa takes "the customer is always right" a little too far:

  Me: Alexa, when do babies double their birth weight?
  Alexa: According to an Amazon customer, some time within the first month. Does that answer your question?
  Me: No!
(The true answer is more like 5 months: https://www.mayoclinic.org/healthy-lifestyle/infant-and-todd....)


Not having lists means they can collect more training data.

Build a database of all the attempted interactions. Cluster them by task. Sort by the most used (or most monetizable) tasks that the system can’t support today. Bam! You’ve got a rough future-capabilities roadmap.

It’s more complicated of course but here you literally have a large customer base telling you what it wants, but your product can’t yet do, regularly.
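
The "sort by most used that the system can't support" step can be sketched in a few lines. This assumes an earlier clustering step has already attached a task label to each logged request; the `unmet_demand` helper and the labels are hypothetical:

```python
from collections import Counter


def unmet_demand(interactions, top_n=3):
    """Rank task labels by how often users asked for something unsupported.

    `interactions` is an iterable of (task_label, was_handled) pairs; the task
    label is assumed to come from an earlier clustering step.
    """
    failures = Counter(task for task, was_handled in interactions if not was_handled)
    return failures.most_common(top_n)


log = [
    ("set_timer", True),
    ("order_pizza", False),
    ("order_pizza", False),
    ("book_flight", False),
    ("set_timer", True),
    ("order_pizza", False),
]
print(unmet_demand(log))
```

The hard part in practice is the clustering itself, which this sketch takes as given.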


Unfortunately there's also drift in behavior that comes from retraining. "OK Google, play NPR news headlines" gets different results some days than others. Sometimes I get the latest hourly news, sometimes I get a robot voice reading headlines to me. Sometimes asking to dial someone calls them, sometimes it returns search results. Yadda.


Yes, is there a list of every command a human can execute or can work with?


Not sure what side of the debate you're taking here, but I think you've outlined the issue perfectly.

Engineers: "We couldn't have a list of commands, that's not how humans work, you're supposed to treat Alexa like a human, and the possibilities are endless"

Users: "Ok, then. Alexa, take out the trash."

Alexa: "Sorry, I can't do that."

(Ok, so obviously the possibilities aren't endless, right?)

I can somewhat understand general knowledge queries. For those, you can totally make the case that there's just too many things you can ask about to enumerate them all.

But imperative commands, like sending text messages, setting timers, or home automation? There's a finite list of those, since at the end of the day they actually have to be authored by some human who's writing a (say) Alexa skill. The number of utterances that may map to those skills is unbounded, but the number of skills isn't. So yes, at the end of the day, for "command"-like things, they really should be able to give a list of them.
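
To make the point concrete: if every skill has to be registered somewhere, the list of skills falls out for free. The sketch below is a toy registry, not how Alexa or any real assistant is built; `SKILLS`, `skill`, and `list_skills` are hypothetical names:

```python
SKILLS = {}


def skill(name, example):
    """Register a handler under a skill name, with one example phrasing."""
    def register(handler):
        SKILLS[name] = {"handler": handler, "example": example}
        return handler
    return register


@skill("set_timer", example="set a timer for ten minutes")
def set_timer(minutes):
    return f"Timer set for {minutes} minutes"


@skill("send_text", example="text Sam that I'm running late")
def send_text(contact, message):
    return f"Sent to {contact}: {message}"


def list_skills():
    """Return the documented list: one (name, example) pair per registered skill."""
    return sorted((name, info["example"]) for name, info in SKILLS.items())
```

However many phrasings the language model maps onto `set_timer`, `list_skills()` still returns a short, finite list that could be shown to the user.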


> (Ok, so obviously the possibilities aren't endless, right?)

This does not follow from the above. The set of positive integers is countably infinite. So is the set of positive even integers. Even if "half of the positive integers are missing!" there are still "endless" even positive integers.


By that logic the calculator app has an (effectively) infinite amount of functionality since there is an infinite number of integers which you can add together.

Somehow though they still list all the features.


> By that logic

This doesn't follow at all. It's not what I said and I find it difficult to believe that you even think it's what I said.


> This does not follow from the above

Well, I elaborated after. There's an actual finite set of skills that are coded up by actual engineers. A natural language system isn't hallucinating the ABI for the function calls that send text messages. There's code there which takes the utterance and sends the texts. What I'm saying is that you can take an inventory of what skills have been written (and/or are installed), and y'know... document them somewhere.


> you can take an inventory of what skills have been written (and/or are installed), and y'know... document them somewhere.

Sure. I didn't take exception with anything except the standard HN middlebrow dismissal.


I'm not giving a middlebrow dismissal. There exists a real discoverability problem with virtual assistants, and asking users to "just try things" leads them to try things that don't work, and then conclude that the assistant must not be as useful as they thought.

Moreover, when an assistant doesn't do a thing, you're unlikely to try it again later; instead most people will conclude "I guess it can't do that" and move on. If they add the feature later, it's too late.

With every failed request, your confidence that an assistant really is intelligent and can understand you diminishes more and more. Every time a user hits a dead end with a virtual assistant, it doesn't encourage them to try more things that do work; it instead gives the user less confidence that anything will.

I can't count the number of times my wife has been surprised I can get Siri to do things. Her typical response is "I can never get her to understand me so I just stick with timers." It's a real problem, and I'm not being dismissive of anything.

In contrast, reread your comment in this context. You're taking my comment, reading it in the least charitable way, condescending to me about the meaning of finite when the rest of my comment clarifies what I mean, and being completely dismissive of the point I'm trying to make. How can you say I'm the one issuing middlebrow dismissals?


You should do some self reflection on why you felt the need to make a comment just to make yourself look smart.


> why you felt the need to make a comment just to make yourself look smart.

I hardly think it made me look smart. It's borderline trivial. The parent comment was insanely reductive in the standard HN style. I was hoping to help reduce the appearance of future such comments.

Sibling comments indicate that it had no positive effect. Such is life.


> It's borderline trivial.

> I was hoping to help reduce the appearance of future such comments.

> Sibling comments indicate that it had no positive effect.

I'm really not trying to attack you here but this honestly reads like a high-school kid trying to make themselves sound smart by emulating Spock from Star Trek.


Yes, actually, here: https://en.wikipedia.org/wiki/Basic_English

  If one were to take the 25,000 word Oxford Pocket English Dictionary and take away the redundancies of our rich language and eliminate the words that can be made by putting together simpler words, we find that 90% of the concepts in that dictionary can be achieved with 850 words. 

  The shortened list makes simpler the effort to learn spelling and pronunciation irregularities. The rules of usage are identical to full English so that the practitioner communicates in perfectly good, but simple, English.

  We call this simplified language Basic English, the developer is Charles K. Ogden, and was released in 1930 with the book: Basic English: A General Introduction with Rules and Grammar.
Even Includes 200 picturables: http://ogden.basic-english.org/wordpic0.html

"A widely known 1933 book on this is a science fiction work on history up to the year 2106 titled The Shape of Things to Come by H. G. Wells. In this work, Basic English is the inter-language of the future world, a world in which after long struggles a global authoritarian government manages to unite humanity and force everyone to learn it as a second language."

- Sounds pretty close to Siri and the other digital assistants to me. Ever watch people from non-English-speaking countries use their smartphones? Not all of it is implemented yet but this is almost all you need to run an empire.

Here it is deployed in favor of much needed disciplinary action for two Scottish people:

https://www.youtube.com/watch?v=BOUTfUmI8vs


> Here it is deployed in favor of much needed disciplinary action for two Scottish people

There was a moment when call centers started deploying “just say it” en masse - and I was literally in panic. Luckily, they brought back “or enter” pretty soon and also en masse.

To be fair to robots, you protein constructs are not much better. Within a two-mile radius of our company’s office, humans have trained themselves to understand a Russian accent pretty well. But beyond that…


I would love to see something similar to Basic English: A General Introduction with Rules and Grammar for other languages. It seems like it would be a great tool for learning a new language.


Anyone reminded of XKCD's "Up goer five" strip (https://xkcd.com/1133/), or is it just me?


He expanded the idea into a book: https://en.m.wikipedia.org/wiki/Thing_Explainer


There isn't, but a partial list could be assembled.

Most human interactions are context-triggered and heavily scripted.

This is easy to see on social media where responses to a popular trigger post fall into groups. A lot of people make one of a small number of generic expected responses, and there's an even smaller number of funny/off-beat posts - which all make the same joke.

Occasionally you get a truly original inventive reply. But only very rarely.

I have a vague memory of a fringe AI startup which has been trying to formalise that contextual database since the 90s.


It also annoys me that there's no (obvious) meta-interactions with most smart assistants to explore what's possible. I can't easily ask "can you do X?" or "what can you do with Y?"


This, plus the already-discussed lack of a list of working commands, further cements my belief that "voice assistants" are not there for the benefit of those who keep them in their homes.


There isn't even an enumerated list of all the features of the Google search engine (i.e. quotes for full expression, minus to rule out words etc.) And this might be the most popular web service in the world!


There is actually a fairly good list here: https://support.google.com/websearch/answer/2466433?hl=en

I know for a fact that it isn't complete. But most of the "secret" ones that I am aware of were very obscure and usually buggy, so maybe this is all of the officially supported ones.


Other OK Google problems:

o it changes. who cares if you know even some "commands"... it'll break. I used to ask google maps when driving "Ok Google, ETA". It's been many many years since that stopped working.

o can't change name/ATTN keyword. how dumb is your AI that you can't even rename _your_ assistant. /s


Oh yeah the constant syntax changes got me to stop using it entirely.

Commands would suddenly lead to web searches, I'd then have to Google the new set of magic words to make it set a reminder or whatever, only for it to break again two weeks later.


> Why is it so hard for me to find a single, precise location on my phone with an enumerated list of every command Siri or Google can work with?

Because engineers (and managers) contrive problems like this to the point they are useless solutions.


(I'm posting here because it's the most recent comment by your account).

You've unfortunately been breaking the site guidelines repeatedly and egregiously:

https://news.ycombinator.com/item?id=33585475

https://news.ycombinator.com/item?id=33550821

https://news.ycombinator.com/item?id=33547727

https://news.ycombinator.com/item?id=33472366

https://news.ycombinator.com/item?id=33472317

https://news.ycombinator.com/item?id=33468223

https://news.ycombinator.com/item?id=33451816

https://news.ycombinator.com/item?id=33447930

If you keep doing this, we're going to have to ban you. I don't want to ban you, so if you'd please review https://news.ycombinator.com/newsguidelines.html and use HN in the intended spirit, we'd really appreciate it.


Oh cool, even the site admins are ganging up on me. Most of those comments are taken entirely out of context, or direct replies to comments that were replies to my own, but you do you.

I am using HN in the intended spirit, and I'm sorry that you're letting the echo chamber color your perception of perfectly acceptable comments. You can email me if you'd like to continue discussing this in a more professional manner (why else would I fill it...?), but I have to say this is an appalling showing of leadership on your behalf, and I hope you address it soon.

Public intimidation does not make for a safe and healthy culture. Period.


No one is ganging up on you—the examples I listed contain plenty of personal attacks. That's clearly against the site guidelines and of course we have to ban accounts that won't stop doing it, so please stop.


One, maybe two examples contain personal attacks. You're going deep into threads to find these. Sounds more like a witch hunt than a fair and balanced analysis of my contributions. Ganging up is an understatement.

The rest? Get real. Cherry picking at its finest. You're barking up the wrong tree here and I have no problem defending those comments ad nauseam.

I'll say it again since you're really not getting it: feel free to email me if you'd like to continue this in a more professional manner. I'll keep flagging comments that blatantly ignore this.

How many times is enough? I've now politely asked three times to be contacted via email if you need to, twice specifically for this altercation. Please don't make the same mistake a fourth time, as an admin and representative of this dying site.


If you're posting publicly on the site, it's appropriate for moderators to respond publicly on the site. Moderation comments are important not just as a one-to-one conversation, but as a signal to the community.


If you publicly intimidate users with a bias, you are not facilitating a fair and safe place to share ideas.

It's a concept called inclusivity, I think you should read up on it a bit.


I've been thinking about wiring up whisper[0], Mozilla's TTS[1] and GPT-3 together to make a voice assistant of sorts. It wouldn't have access to device hardware and there'd be no guarantee of correct answers, but it should blow Siri etc. out of the water in terms of understanding the context.

[0] https://github.com/openai/whisper [1] https://github.com/mozilla/TTS
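
The wiring itself is mostly glue. Here is a minimal sketch that keeps the three engines behind plain callables so each can be swapped or stubbed; `make_assistant` and its parameter names are hypothetical, and the actual Whisper, TTS, and GPT-3 calls are deliberately left out rather than guessed at:

```python
def make_assistant(transcribe, complete, speak):
    """Chain speech-to-text, a language model, and text-to-speech.

    All three arguments are plain callables, so the real engines stay swappable:
      transcribe(audio_path) -> str   # e.g. a wrapper around Whisper
      complete(prompt) -> str         # e.g. a wrapper around a GPT-3 API call
      speak(text) -> bytes            # e.g. a wrapper around a TTS engine
    """
    history = []  # earlier turns, fed back in so the model sees context

    def respond(audio_path):
        question = transcribe(audio_path).strip()
        history.append(f"User: {question}")
        prompt = "\n".join(history) + "\nAssistant:"
        answer = complete(prompt).strip()
        history.append(f"Assistant: {answer}")
        return speak(answer)

    return respond
```

Feeding the running `history` back into every prompt is where the context handling would come from. The list grows without bound here, so a real version would need to trim it to fit the model's prompt limit.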


It should also talk to spammers and provide them fake credit card numbers.


I would not count on Google to make public any such thing. But a third party could test it out to build such a list. And that could include caveats like "works if you ask in this form, no if you ask in this other form".


I stopped using Alexa after it almost burned my house down on thanksgiving. Apparently “bake at 400 degrees Fahrenheit for thirty minutes” somehow became “microwave for thirty minutes” even though it got all the words except bake! Who sets a temperature with their microwave?

Anyway we meant to bake something but instead absolutely roasted a metal pan and wire rack that merged into the glass somehow.

My wife thinks it’s kind of funny because the Disneyworld “Carousel of Progress” shows a very similar event happening due to voice controls, which they predicted in the 1960s!


Third party integrations?

It's both a static list (available to everyone) and a dynamic list (available only to you).

Having seen all the dead products at Google. Who would get rewarded for this/compensated? Would the complexity in building the list increase ongoing costs with an unclear return on investment?


Presumably, there is a list somewhere in Google’s internal documentation. All we’re asking for, is for them to copy and paste it from that documentation, clean it up a bit, and post it online.


There probably isn’t. There’ll be some hard coded “if this, say that”, but there are a lot of trained responses in the models that won’t be as simple as that.


The original Siri had such a list. I found it demonstrated here: https://youtu.be/agzItTz35QQ?t=716

Did that ever make it to release? I can't remember seeing it on my actual phone.


It would change constantly over time, and would eventually become very large. It's an interesting idea, though, a school subject on how to interact with your AI. Lots of grammar, machine learning theory, culture, a bit of security, etc, to second guess it.


Students in most public schools don’t even learn English grammar now in most states. That went away at some point in the Bush or Obama administrations, probably due to the NCLB and Common Core initiatives. It is not uncommon now to encounter college students who simply have never heard of a “direct object.” They need classes on the grammar of their own language more than they need a school subject on interacting with an overhyped and underperforming Siri.


Common Core has English grammar as a foundational skill. Your specific example is taught in grade 5.

That said, English is a difficult language and I'm not at all surprised that people get through school without fully grasping the names for grammar concepts, even if they use them every day.


I learned this when I was about 20, met a foreign friend online who was formally learning English, and I couldn't answer most of her questions.

It was a bizarre but very educational moment: I use English like I write code: I have no formal education on it but I seem to do fine.


You can use an open source assistant instead, like Dicio https://github.com/Stypox/dicio-android and configure it the way you like.


(Slightly off topic) If you click on the "RoberDam.com" link that appears when you scroll a little bit, you get redirected to "http://localhost:8080/".

It seems to only happen in the English page. In the Spanish version of the post the link works well.


Good catch. There seem to be several "http://localhost:8080/" strings in the page source. Leftover code from testing?


Fixed, thanks for the tip Alex!


Still goes to http://localhost:8080/ for me.


fixed now?


>RELATIONSHIP THERMOMETER

>According to studies on couple relationships, it is possible to predict with an accuracy of up to 90% if the couple is going to divorce by studying the interactions, specifically the relationship between positive and negative interactions between the couple

Apparently the studies that were used to reach that conclusion do no such thing and were hilariously flawed.

https://slate.com/human-interest/2010/03/a-dissection-of-joh...

>The upshot? What Gottman did wasn’t really a prediction of the future but a formula built after the couples’ outcomes were already known. [...] The fundamental problem is that no matter how many equations, even quite similar ones, Gottman generates, we have no real idea of his forecasting power because of the way he reports his data


Literally horoscopes for tech bros. I am starting to be slightly concerned that there are actually people out there who think some sentiment analysis Python package is going to tell them what their individual relationships are like.


Oh there definitely are.


brb, need to edit my pitch deck..


I'm in the middle of building literally the exact same thing for myself.

Beyond privacy/security, the aspect of the app I worry about the most is giving oneself perfect memory and then never being able to escape the past. That last fight you had with your ex? Well now it's recorded and you can listen to it, and dissect it, and wonder what you could have done differently, right up until you blow your brains out.

But, as always, it's up to the user to use the technology in a healthy way. It would be, after all, a choice to remain mired in the past rather than taking healthy lessons from it to make your future better.


Nothing is stopping you from deleting processed audio and text. With a lag.

I think that just because there is a recording does not mean you will remember and access it. E.g. I have folders of my high school math but will likely never consult them for an actual day-to-day problem, only for remembering/reminiscing.


insert men with autism meme


What a nice pair of feet!



