I Made a Self-Quoting Tweet (oisinmoran.com)
622 points by OisinMoran on Nov 30, 2020 | 136 comments



> And for anyone at Twitter who was depending on the network of tweets being a Directed Acyclic Graph, I'm so terribly sorry.

I love the idea that there's someone out there with code that resolves retweet chains recursively, who's about to be in for a great head scratcher of a bug.


One of the replies is from an engineer at Twitter, who confirmed that someone managed to achieve this 7 years ago and that it caused issues in the backend.


It’d be a simple check: anything referenced has to have a lower ID (and hence an earlier timestamp).

I find this bug more interesting:

> Also, it seems like Twitter doesn't actually care about the username and just resolves URLs based on the tweet ID. I'm sure lots of people already knew that but it's new to me.

They’re not validating that the username in the path matches the actual tweet. I wonder if that’s an actual bug or intentional, so that handle renames don’t break existing links.


That seems incredibly brittle, since it makes assumptions about the ID format that Twitter has no obligation to keep. It just so happens that if your logic had been if(newId > olderId) you would have survived their new format (because the timestamp leads the integer), but that would be a win based on pure luck. The example the engineer brought up was seven years old, so there was no way of foreseeing the ID format change.


It's a good point, but I would suspect that the timestamp being the prefix is no accident. If your IDs were sortable in the past, it's probably a good idea to keep them sortable. And since tweets more often than not refer to tweets in the same time range, or are paginated together with tweets in (roughly) the same time range, having time as a prefix has other advantages.


> I would suspect that the timestamp being prefix is no accident.

Yeah, it's what they created Snowflake for.

https://blog.twitter.com/engineering/en_us/a/2010/announcing...
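
For anyone curious, the widely documented Snowflake layout puts a 41-bit millisecond timestamp in the high bits, above a datacenter ID, a worker ID, and a per-millisecond sequence number, which is exactly what keeps the IDs roughly time-sortable. A small sketch of decoding one, assuming that layout and the published custom epoch of 1288834974657 ms (this is an illustration, not Twitter's code):

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake custom epoch (2010-11-04)

def decode_snowflake(tweet_id: int) -> dict:
    """Split a tweet ID into Snowflake fields, assuming the layout from the
    open-sourced Snowflake: 41-bit timestamp, 5-bit datacenter, 5-bit worker,
    12-bit sequence."""
    return {
        "created_at": datetime.fromtimestamp(
            ((tweet_id >> 22) + TWITTER_EPOCH_MS) / 1000, tz=timezone.utc
        ),
        "datacenter_id": (tweet_id >> 17) & 0x1F,
        "worker_id": (tweet_id >> 12) & 0x1F,
        "sequence": tweet_id & 0xFFF,
    }

# Because the timestamp occupies the high bits, comparing two IDs numerically
# is (almost always) the same as comparing their creation times.
print(decode_snowflake(1309951041321013248))
```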


This bug actually has some security implications because you can make it seem as though an account has tweeted something when really it was another one. A casual observer might not notice the discrepancy between the name before and after the click.


Can’t be unintentional; otherwise they couldn’t cope with users renaming their handles.



And the content, for anyone too lazy to click:

Commenter

>what's really amazing is that twitter programmers thought about this edge case and made sure the tweet would not display itself

Twitter Engineer

>We didn't think of this edge case. Someone did this about 7 years ago and the recursive hydration would make a tweet service crash by simply loading the tweet in a browser. It took a principal engineer an entire day of wading through heap dumps to figure out what was happening.


TIL: debugging via memory dumps is a Principal Engineer level skill.

Anyone here actually do this? I read about it in Release It and it sounds by far like the closest thing there is to a super power when it comes to solving production incidents. I've never actually seen anyone do it though.

Recently saw a video on this technique from Dotnet Conf. Piqued my curiosity again, and now this. I've really gotta learn this.


The fact that the principal engineer was doing this does not mean doing this is a principal-engineer skill. There are lots of software engineers who can deal with core dumps, which is pretty much the same idea.


I have done it once successfully in 10 years (.NET dev). Would recommend having any other kind of logging or instrumentation in place so you don't have to do it. It's still worth learning WinDbg and the SOS CLR extension.


In my company, we used to have a plugin for our bug tracker to automatically analyze .NET core dumps with WinDbg (if they were attached to a bug) and extract some useful information. We used to do this relatively often, for a shipped product, not a live service, especially if we found memory leaks.


Would you say something like that is worth setting up?

I noticed EC2 now has an API to get memory dumps. Theoretically you could automate collecting memory dumps when an unhealthy instance is pulled out of a load balancer. Then some automated analysis could happen, and allow further manual analysis.


Not sure how much it cost, but it was definitely helpful - even the fact that it was obvious which team needed to take a look first based on the objects that had leaked often made it worth it.


I remember spending quality time with coredumps and gdb back in 2012/2013, when a prototype supercar dashboard we were building crashed on certain CSS animations.[ß]

The call chain went through GTKWebkit, Wayland and all the way to Pango and Cairo. Getting that part untangled took a long afternoon. Figuring out the root cause was another two full days.

The topmost parts of the stack could be dealt with using breakpoints, but even with pango/cairo libs from a debug build it was painful. The failing function could only be single-stepped; trying to place breakpoints inside it would not work. In the end it was an unhandled divide-by-zero deep inside the rendering library.

ß: story for another time.


How else do you debug C/C++ programs that crash?


By having a crash harness in the program that dumps the call stack and relevant internal context. Coredumps are really an option of last resort.


WTF? If you already have the infrastructure to coredump, they are without a doubt the most convenient way to debug. A stacktrace does not even begin to compare. It is like limiting yourself to printf-debugging in the presence of gdb.

Actually, it exactly is! Now I'm not sure if you were /s or not.


It all depends on how tangled your spaghetti are.

For code that implements basic state and invariant checks (i.e. ships with asserts compiled in), crashes are usually exceedingly rare and limited to one of those checks failing. Debugging them requires a stack trace and, optionally, some context related to the check itself. If the program dumps this info on crash, the fix can typically be made in less time than it takes to retrieve/receive the coredump and start looking at it. If it can't be fixed this way, then it's to the coredump we go.

On the other hand, if the code is prone to segfaulting on a whim, requiring dissecting its state to trace the cause down, then, yeah, it's a coredump case too. But code like that shouldn't be running in production to begin with.

So that's, roughly, what the F is.


Sure, if by miraculous chance you happen to have printf'd exactly the state you required to figure out the assert/crash, "you don't need gdb". You could also find -- by divine inspiration -- what went wrong just by looking at the line number where the assert failed. But it's still WTF-y to argue that therefore, an actual {,post-mortem} debugger is "a last resort tool".


Any chance you remember which video this was? I can't see it in the dotnet conf 2020 playlist.


Analyzing Memory Dumps of .NET Applications.

https://channel9.msdn.com/Events/dotnetConf/2020/Analyzing-M...

I found another one that goes into more detail on how to script WinDbg with breakpoints that run code to do stuff. Sounds pretty powerful.


I did this in my second year as a professional coder and it took me a while (a week? a week and a half?) to understand what to do and what I was seeing. I would prefer never to have to do it again.


Hydration?


Hydrate: (verb, jargon) To populate with metadata or subobjects.

e.g. the FriendsList service hydrates each friend object with a list of friends you have in common



They will have to reckon with the fifty thousand-long chain at https://twitter.com/every_peano


I actually just came across this account recently and was tempted to make some sort of reply bot on whether they were prime or not. I was also trying to find some of the "funnier" numbers to see if they had a disproportionate number of likes (certainly not to like them myself...) but gave up after realising they had almost 50k tweets.


So this brings up a question I've long had, but didn't want the distraction of researching: Is there an easy way to get to some point in a user's timeline? e.g. first tweet, or November 1, 2020?


I laughed out loud at the other twitter reply asking ThreadReaderApp to unroll the thread


And it seems that ThreadReaderApp didn't reply. Evil, indeed.


Recursion isn't the problem. Not keeping track of seen tweets is the problem. Recursion can be used to detect cycles and traverse a cyclic graph in a way that doesn't blow up.
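
A minimal sketch of that idea, using a visited set so each tweet is expanded at most once; `children_of` is a hypothetical lookup for the tweets a given tweet references:

```python
def traverse(tweet_id, children_of, visited=None):
    """Depth-first walk over quote/reply links that tolerates cycles.

    `children_of(tweet_id)` is a hypothetical lookup returning the IDs a tweet
    references. Tracking `visited` means each tweet is processed at most once,
    so a self-quote (or a diamond) doesn't cause infinite or repeated expansion.
    """
    if visited is None:
        visited = set()
    if tweet_id in visited:
        return visited
    visited.add(tweet_id)
    for child in children_of(tweet_id):
        traverse(child, children_of, visited)
    return visited

# A self-quoting tweet: 1 references itself, yet the traversal terminates.
links = {1: [1]}
print(traverse(1, lambda t: links.get(t, [])))  # {1}
```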


But you wouldn’t do that if your assumption is that the graph doesn’t cycle.

Edit: child comments are correct and I regret my oversight


For traversing a DAG you probably still would, to avoid exploring an exponential number of paths (consider a chain of diamonds [1]).

1: https://www.researchgate.net/figure/A-diamond-shaped-DAG_fig...


But a diamond cannot occur in a Twitter-reply graph, right? It would require a Tweet to be able to reply to more than one tweet.


It can because the assumption is that we are crawling embedded links as well as native parents.


You can reply to one tweet while quote-retweeting another. Depending on how links are defined that could result in a diamond.


It’s a lot easier to just have a depth limit on such non-cyclic graphs than to keep an in-memory list of previously seen nodes. It’s interesting for sure! But a much rarer edge case IMO.


Graph doesn't need to cycle to hit a node twice. Good code does not visit the same node twice.


When doing recursion (Postgres recursive CTE) I keep a path of the nodes visited so far on each branch and check that new edges don't lead back to one of them, so the same node can appear in multiple branches but not twice on the same branch. Works flawlessly.
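
Not the Postgres version itself, but a minimal Python sketch of the same per-branch check; in a recursive CTE the path is typically an accumulated array column tested with something like NOT (next_id = ANY(path)):

```python
def walk(node, edges_from, path=()):
    """Enumerate paths from `node`, skipping edges that would revisit a node
    already on the current branch's path (the same kind of check the Postgres
    version makes against its accumulated path). `edges_from(node)` is a
    hypothetical adjacency lookup; a node may appear in many branches,
    just never twice on the same one."""
    path = path + (node,)
    yield path
    for nxt in edges_from(node):
        if nxt not in path:  # already visited on this branch
            yield from walk(nxt, edges_from, path)

graph = {"a": ["b", "c"], "b": ["c", "a"], "c": []}
for p in walk("a", lambda n: graph.get(n, [])):
    print(p)  # the b -> a back edge is never followed
```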


Can you provide an example of this? I’d like to understand this technique.

Edit: is this an example? https://stackoverflow.com/a/1757915


Irritation would probably be better in terms of not blowing up your memory requirements. Just keep hashes of all visited nodes and a stack or queue of to-be-visited nodes, and loop until you have no more to-be-visited nodes.


I think you meant "iteration", not "irritation".


Way back (probably around 2000) I created a Usenet News message that referenced itself (pretty trivial).

Turns out that a bunch of news reading software didn't like this and crashed when it tried to load that newsgroup.


To save people some time, here’s the tweet:

https://twitter.com/quinetweet/status/1309951041321013248

In essence, the approach was:

* Find out what tweet ID a recent tweet got
* Find out what ID a tweet shortly after got
* Estimate the rate at which new tweet IDs appear
* Publish a tweet with a reference to a tweet with a now+guess ID
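
A rough sketch of that loop, with hypothetical `latest_tweet_id()` and `post_tweet()` helpers standing in for the real API calls (this only illustrates the steps above; it is not the author's actual code):

```python
import time

MY_HANDLE = "quinetweet"  # illustrative handle

def attempt_self_quote(latest_tweet_id, post_tweet, lead_seconds=30):
    """One attempt at a self-quoting tweet.

    Sample the ID stream twice to estimate how fast IDs grow, extrapolate to a
    moment `lead_seconds` in the future, then post a URL containing that guess
    at roughly that moment. Success means the tweet we posted was assigned the
    very ID we embedded in it, which only happens occasionally, hence many
    attempts are needed.
    """
    id_a, t_a = latest_tweet_id(), time.time()
    time.sleep(10)
    id_b, t_b = latest_tweet_id(), time.time()
    rate = (id_b - id_a) / (t_b - t_a)      # observed ID growth per second
    guess = int(id_b + rate * lead_seconds)
    time.sleep(max(0.0, lead_seconds - (time.time() - t_b)))
    posted_id = post_tweet(
        f"This tweet quotes itself: https://twitter.com/{MY_HANDLE}/status/{guess}"
    )
    return posted_id == guess
```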

The write up is well done and interesting but a little long winded.


Thank you! The reason for experimenting with a sort of blow-by-blow report style here and detailing the full and true journey of the idea was that I feel the whole process from idea to artifact is not written about enough.

That said, it can often be hard to fully pinpoint the actual inception of an idea and just thinking about this one yesterday I realised that the idea likely came up while traversing a long chain of quote tweets. I really appreciate the feedback though—it's especially useful as I'm only starting to publish regularly now so have lots to learn—so thanks again.


There seem to be a couple of "camps" on HN. Broadly, those who enjoy reading, and those who don't, but endure it as a means to obtain information.

I'm in the former camp, and enjoyed your writeup quite a bit.

The latter camp will reliably show up when writing is any more long-winded than it absolutely must be, and will a) summarize (good) and b) complain (annoying).

It is what it is. Congrats on your whimsical creation!


Maybe a constructive message would be 'please post a summary at the top and a stronger conclusion at the bottom' since we're conditioned this way by various written structures.


I would argue that @OisinMoran followed this exact structure; the first section (above the fold) has this:

> Fundamentally the challenge is just correctly guessing what ID a given tweet is going to get, then appending that onto the URL for our profile and tweeting it.

And of course, the strong conclusion at the bottom is the Tweet itself.


Thank you!


I loved the write up, especially those basic things all experienced programmers know but that sometimes scare newer ones, like sometimes you don't quite know exactly how the answer works at first, or sometimes you have a big idea and a missing double quote is your problem, etc etc


Thank you! Yes, I think you put what I was trying to capture here really well. When I first attempted to learn programming I got stuck on how to _actually_ run the code and eventually gave up but then started in an actual class in college. So being painfully aware of the gaps that can be left in explanations I try to avoid creating more of them.

The ultimate solution that will probably happen at some point is a kind of collapsible/expandable explanation that takes your knowledge into account and can omit the bits you are overly familiar with. One of the difficulties with this would be adoption by creators, so it needs to not have too much overhead—writing is already difficult enough.


Also loved the write-up (have to, really, it's how I do it too.)

HN, 1950s edition: "Look, Mr Tolkien, what I want to know is does the ring go in the mountain? Why do I have to read about this journey?"


The write-up was great, actually. The whole point of sharing the article is to read about the process and the technical details, I don't know why people insist on these cheap TL;DRs.


I think enough people want that kind of TL;DR that it's reasonable to put one right up front. I see a similar sort of thing with DIY projects, where they tend to get better attention if the first picture is the finished product, and then it walks you through the process. It's partly to placate people who only want to see the finished product, but it also gives enough of a hook to everyone else that I suspect some people will read the article who otherwise wouldn't.


Thank you!


As I said, the write up was quite interesting, so don’t take it the wrong way! A long read - including the journey taken to get there - can be educational itself (including wrong turns, proof of concepts etc).

The only reason I posted the tweet was because the lede was well and truly buried at the end of the article, and you may have lost some readers’ interest by the time they get there.

You could have started with something like “This is how I managed to post a self referential tweet [link]” which would both bring it up front and centre, as a means to entice readers to find out more.

Sort of like how some TV episodes begin with “[record scratch] You may wonder how I ended up here...” – Deadpool is a movie that begins that way, for example.

For really long posts you might want a kind of table of contents that sets up the structure and lets you link to its parts.

Anyway, don’t take my abbreviated summary as negative; I enjoyed it.


I'm seeing this:

> This is not available to you


Yes unfortunately Twitter's UI doesn't seem to handle recursive quote tweets, which was the main disappointment for me. I'm puzzled at those saying it does work as all I've ever gotten for the _quoted_ tweet was:

> This Tweet is unavailable.

But perhaps they are just talking about the actual tweet.

Anyone at Twitter here that can give an indication of where this is on the roadmap?


That's a common Twitter bug, just try again a few times.


Alternatively, replace twitter.com in the URL with nitter.net. It points to an alternative client.


sure, if you want to give your twitter credentials to some random company based in the caribbean.


What? https://nitter.net/quinetweet/status/1309951041321013248 doesn’t ask for credentials and AFAIK there’s no place to enter them on the whole site, unless you type them into the search bar.


It still shows up as "tweet unavailable" for me, though, in this link


What you're seeing is an available tweet with an embedded tweet that is kept from being displayed. Because the embedded tweet is itself, it is handled by showing the same UI as any embedded tweet that is not available.


Right, but my point is the behavior is the same on Twitter and nitter


"tweet unavailable" is a different error from the "This is not available to you" message. Got that message earlier today when I tried to open a different linked tweet. Changing to nitter fixed it for me.


Ah, I get tweet unavailable from both, thus my confusion.


If it's the same bug I frequently see, manually refreshing the page by hitting enter in the address bar always fixes it for me.


You were right, it worked this time.


Nope, I'm assuming that's someone on Twitter noticing what he did and hiding it.


It's definitely a common Twitter bug.

"This is not available to you." is something different from "This Tweet is unavailable."


I have "This Tweet is unavailable."


Yes, that is a completely different thing. "This is not available to you." comes up in place of the content in the Twitter UI.


Or the software just doesn't handle it.


Why the speculation from you and the parent? You can literally click on the Tweet and see that it works fine.


When I click on the tweet, it shows the URL in the text and where the quoted inset should be it instead reads "This Tweet is unavailable."

The parent and grandparent are speculating because it works for them but not for others here.


I made two mutual-quoting tweets that run the toad oscillator from the Game of Life

https://twitter.com/mauritscorneIis/status/12668346972560875...


This is actually much cooler in my opinion! I love your progress indicator too. How did you go about doing this one? And what are the four parts you are referring to?

Do you think a 3-cycle is doable?


AFAIK the tweet ID has four moving parts: a datacenter ID, a worker ID, a sequence number, and a timestamp (the original poster mentions three).

3-cycle is definitely doable, it may take a week posting 100 tweets per hour or something like that. The biggest problem is twitter blocking the account (even when you stay under the API rate limit).

Code is here if you are interested, I ran it using GitHub Actions: https://github.com/pomber/conway


They don't mention a datacenter id in the docs, but I wouldn't be surprised if it's changed since the docs were written: https://developer.twitter.com/en/docs/twitter-ids

I implemented a .NET port of Snowflake where I work--being able to work with data without ever having to worry about an identity column is extremely freeing.


The Data Center ID has been in Snowflake IDs for a while. It's present in the original version of Snowflake that Twitter open sourced back in 2010 [0]. Looks like it was added in this commit [1] from August 2010.

[0] https://github.com/twitter-archive/snowflake/blob/snowflake-...

[1] https://github.com/twitter-archive/snowflake/commit/ba2e67ea...


Ooh that's very interesting! I've just presumably subsumed one of those into one of the others.

Seems like there's a cycle-3 game of life creature called a pulsar that would be perfect for this: https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#/media...


Previously (2018): https://news.ycombinator.com/item?id=16978913

Previously (2009): https://twitter.com/selfrefer/status/3128391843 https://twitter.com/spoonhenge/status/2878871344 https://news.ycombinator.com/item?id=743144

(last 2 were self-referential at the time, but then twitter changed the rules for linking; I recalled seeing these at the time)



Oh wow—I'm even less original than I thought! They even had the same name!

Thanks for these omoikane and bazzargh (and the others both here and on Twitter), I'll have to add an addendum to the blog post with links to all the other fun related examples.


FWIW, I also did this back in 2019: https://twitter.com/nneonneo/status/1177641328705851392?lang...

I don't know of a better way to do this other than some analysis of the IDs + clever bruteforce. If I remember correctly, I used just over 500 tweets to do it.


This is excellent, hats off to you! I love your creative lead-in as well, mine is bland in comparison but I'm somewhat glad to be off the hook for destroying the DAG as all these prior examples are showing.


I'm pretty impressed that it took less than 1000 attempts. It would still be feasible even if it were several orders of magnitude harder. This twitter project I made (predicting celebrity deaths) took about 250k tweets over the course of a few months: https://twitter.com/ghastly_omens


This is great! That's not 250k tweets per death, right? Did you do the classic make a lot of predictions then delete the wrong ones? How did you select the celebrities?

Would be interesting to do something with actuarial tables here.


Yes, delete the misses. 250k total. About 700 celebrities. I was predicting the day of death too, so each required 365 tweets. I did a combination of scraping IMDb plus manual selection, plus filling in the details. The whole process could probably be automated and I regret not doing that. It would be wild to see this in 2020...

I did consult actuarial tables mainly to decide how many of each age to include. With that few candidates, it wasn't wise to focus on people in their 20s and 30s obviously because it's very likely all of them would have survived.

A few more details here: http://jere.in/i-predicted-23-celebrity-deaths-in-2017-then-...


This reminds me of when we were counting collisions in tweet IDs years ago at my old job (social media agency). We used the collision rate to estimate the total volume of tweets going through the system.

We also determined ID assignment was determined by three servers in a round robin load balancer and load was distributed based on modding of a 32 bit integer, so two servers were getting more load than the other since you can’t evenly divide a 32 bit integer by 3. They fixed that bug after a couple months of observation. I forget if we let Twitter know or not.

I love stuff like this.


>Fundamentally the challenge is just correctly guessing what ID a given tweet is going to get, then appending that onto the URL for our profile and tweeting it.

Reminds me of that program for referring to a future git commit, which operates on the same principle: iterate through guesses of the future commit’s hash prefix until you have a (long enough prefix-)collision.

https://github.com/hundt/git-time-travel
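
The principle, roughly: a commit's ID is just the SHA-1 of its object content, so you embed a short hash prefix in the message and grind a hidden nonce until the commit's actual hash starts with that prefix. A hedged sketch of that search (simplified to one parent and no signature; not the linked tool's actual code):

```python
import hashlib

def commit_hash(tree: str, parent: str, author: str,
                timestamp: int, message: str) -> str:
    """SHA-1 of a git commit object built from the given fields
    (simplified version of how git derives a commit ID)."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {author} {timestamp} +0000\n"
        f"committer {author} {timestamp} +0000\n"
        f"\n{message}\n"
    ).encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()

def grind(prefix: str, tree: str, parent: str, author: str, timestamp: int) -> str:
    """Vary an invisible nonce in the message until the commit's hash really
    does start with the prefix the message advertises. Expect on the order of
    16**len(prefix) SHA-1 computations, so keep the prefix short."""
    nonce = 0
    while True:
        message = f"As promised in commit {prefix}...\n\nnonce: {nonce}"
        if commit_hash(tree, parent, author, timestamp, message).startswith(prefix):
            return message
        nonce += 1
```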


Writing quines is one of my favorite things to do in a new language. That it is always possible is a consequence of the recursion theorem[1], which is, in my opinion, one of the coolest results in basic computing science. I personally find it far more interesting than the halting problem.

[1] https://en.wikipedia.org/wiki/Kleene%27s_recursion_theorem
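
For anyone who hasn't tried it, the classic Python one is two lines: store a template of the program in a string and print the template filled in with its own repr, so running it reproduces its source exactly.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```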


> I personally find it far more interesting than the halting problem.

I agree.

Another cool theorem related to the halting problem is Rice's theorem, because it's a really powerful thing to say that every semantic property of programs is either 1. always true, 2. always false, or 3. undecidable.

It's an absolutely uncompromising theorem.



I think I'm missing a tweet of context. I get that they quoted their own url, like in the top article, but then the account replies "Ugh, busted :(" to someone who has their tweets protected.


It's just Twitter engineers goofing.


As I recall, the trick is that you can post a tweet, grab the URL, then if you edit it quickly enough it won't show up as edited.


But you can't edit tweets. Any apps that offer "editing" either just impose delays, or easy ways to delete-and-tweet-again in one step, no?


Congratulations! I tried a similar approach based on the original attempt at this challenge (https://www.spinellis.gr/blog/20090805/), but gave up after running it for a few hours.

Very cool that you were able to achieve this. I'd thought it was impossible using the original method, given how many more tweets there are now than in 2009.


A bit annoying that Twitter's quote tweet UI sort of breaks. Maybe two mutually recursive tweets would show up better :)



Specifically, it seems like it only shows the quoted tweet if its ID is strictly lower than that of the tweet itself, which basically sidesteps any sort of recursion or cycle issue, as well as attempts to quote non-existent future tweets.
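
A minimal sketch of that rule, with illustrative names (not Twitter's actual code):

```python
def should_render_quote(tweet_id: int, quoted_id: int) -> bool:
    # Snowflake IDs are time-ordered, so any legitimately quoted tweet must
    # already exist and therefore have a strictly smaller ID. Self-quotes,
    # cycles, and guesses at future IDs all fail this test.
    return quoted_id < tweet_id
```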


wow, i did this in 2015 https://twitter.com/kcimc/status/619889224909750272 https://gist.github.com/kylemcdonald/da198988061dce54bae6a67... it's great to see all the examples in this thread going all the way back to 2009! i had no clue this was a thing.


Belated congratulations! This is really cool and it's always interesting to see the different approaches people take, so especially valuable to share the code. I'll definitely put this one in the addendum.

The hardcoded ID in your code was added after it was generated just before deleting everything else, right? Otherwise I'm stumped.


Thanks! I think you are correct, after the successful completion I added that statement as a safety before deleting all other previous attempts. Reading this code again it really could have used the data more efficiently to predict the correct ID faster, but if I remember correctly it didn't take very long to run so I never made another attempt.



Interesting, I wondered if the same could be done on HN. Likely easier, since the IDs appear to be sequential. But one could simply edit their post to point to the URL if they wanted, which defeats the fun of the exercise.


On a related note, from 2012: “Show HN: This up votes itself” https://news.ycombinator.com/item?id=3742902


Unfortunately, I wasn't allowed to keep the karma :(


That's surprising to me. I think you should keep it, as you did something novel. At the very least as a 'bug bounty'.


In a similar vein, Reddit's "self" posts were originally formed by users submitting posts linking to themselves, by guessing the URL for the next submitted post.


Tom Scott made a video about this last time this happened: https://www.youtube.com/watch?v=zv0kZKC6GAM


Similarly, a Hacker News post from 2012: https://news.ycombinator.com/item?id=3742902


"shell=True"? Sloppy, mate, sloppy. Get in the habit of never doing that lest you find yourself vulnerable to a shell command injection attack one day.


I especially enjoyed the M.C. Escher Drawing Hands background you chose for your Twitter account.


Thank you! It's quite apt, isn't it? I'm a really big fan of his and have a print of his Tetrahedral Planetoid on my wall!

Although the self-drawing hands are probably more apt for this even more Escher-themed one, which the author has since made me aware of in the comments here: https://twitter.com/mauritscorneIis/status/12668346972560875...


The quote tweet drag has evolved into the quote tweet airfoil.


It seems like Twitter knows not to show the quoted tweet in this case, but I wonder if it would show it if it were a 2-cycle.


Higher up someone posted an example of that and it seems like no: https://news.ycombinator.com/item?id=25259046

Only one of the two shows; as expected, it only shows if the ID of the quoted tweet is smaller than that of the tweet itself.

This also means that in general, you can't quote a non-existing future tweet hoping it'll be something cool in the future. Well you can but the preview won't work.


At first I thought a two-cycle would be incredibly hard. But I suppose you really only have to adjust the timestamp to forward-guess the next ID. You would know the ID of the first one. It would definitely help narrow down the algorithm used for preventing the recursion.


This used to be a pretty cool trick on imageboards to quote future posts. Many lulz on 4chan back in the day


Entertaining read - thanks!


Posted at 4:20 PM :)


Funny you mention that [0] as I'm actually in Ireland so it was 9:20 PM for me. I presume you're on the US East Coast? Being able to do one of these at a specified minute would be impressive though (of course you could just _only_ try during that minute every day and you'd eventually get it).

[0] I had to first check that you were referring to the main tweet and not this post because I actually posted this on HN yesterday to crickets so was happily surprised to find it re-upped today and so undeservedly close to the AlphaFold news.


I actually thought it was intentional also and I think every east-coast US person is going to think the same :)

Just makes your tweet look even more "1337".


Looks like someone is working really hard.

/s


This is nerd humor at its best.


Was able to pull this off on 4chan, kinda: https://boards.4chan.org/b/thread/841284283


Nice. But you didn't guess someone's tweet ID; the name in the URL is actually not used. These three links all point to the same place because they have the same tweet ID:

https://twitter.com/quinetweet/status/1309684114073808896

https://twitter.com/gzhdigital/status/1309684114073808896

https://twitter.com/donkeytron200000/status/1309684114073808...


The author is aware of this and has a whole section in the blog post about it.


My bad! I missed the one sentence where he says that. And I even read it twice... need more caffeine.


There’s 3 paragraphs about it :) (the whole section with the “Shit gets weird” heading is about how only tweet IDs matter).



