> And for anyone at Twitter who was depending on the network of tweets being a Directed Acyclic Graph, I'm so terribly sorry.
I love the idea that there's someone out there with code that resolves retweet chains recursively, who's about to be in for a great head scratcher of a bug.
It’d be a simple check that anything referenced has to have a lower ID (and hence an earlier timestamp).
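A minimal sketch of what such a guard could look like (the function name is just illustrative):

    def safe_to_embed(tweet_id, quoted_id):
        # Snowflake IDs lead with the timestamp bits, so a strictly lower ID
        # also implies an earlier creation time.
        return quoted_id < tweet_id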
I find this bug more interesting:
> Also, it seems like Twitter doesn't actually care about the username and just resolves URLs based on the tweet ID. I'm sure lots of people already knew that but it's new to me.
They’re not validating that the username in the URL matches the actual tweet. I wonder if that’s an actual bug or intentional, so that renaming a handle doesn’t break existing links.
That seems incredibly brittle, since it makes assumptions about the ID format that Twitter has no obligation to keep. It just so happens that if your logic had been if(newId > olderId) you would have survived their new format (because the timestamp leads the integer), but that would be a win based on pure luck. The example the engineer came up with was seven years old, so there was no way of foreseeing the ID format change.
It's a good point, but I would suspect that the timestamp being the prefix is no accident. If your IDs were sortable in the past, it's probably a good idea to keep them sortable. And since tweets more often than not refer to tweets in the same time range, or are paginated together with tweets in the same time range (roughly), having time as a prefix has other advantages.
This bug actually has some security implications because you can make it seem as though an account has tweeted something when really it was another one. A casual observer might not notice the discrepancy between the name before and after the click.
>what's really amazing is that twitter programmers thought about this edge case and made sure the tweet would not display itself
Twitter Engineer
>We didn't think of this edge case. Someone did this about 7 years ago and the recursive hydration would make a tweet service crash by simply loading the tweet in a browser. It took a principal engineer an entire day of wading through heap dumps to figure out what was happening.
TIL: debugging via memory dumps is a Principal Engineer level skill.
Anyone here actually do this? I read about it in Release It and it sounds by far like the closest thing there is to a super power when it comes to solving production incidents. I've never actually seen anyone do it though.
Recently saw a video on this technique from Dotnet Conf. Piqued my curiosity again, and now this. I've really gotta learn this.
The fact that the principal engineer was doing this does not mean doing it is a principal-engineer-level skill. There are lots of software engineers who can deal with core dumps, which is pretty much the same idea.
I have done it once successfully in 10 years (.NET dev). I'd recommend having any other kind of logging or instrumentation in place so you don't have to do it. It's still worth learning WinDbg and the SOS extension.
In my company, we used to have a plugin for our bug tracker to automatically analyze .NET core dumps with WinDbg (if they were attached to a bug) and extract some useful information. We used to do this relatively often, for a shipped product, not a live service, especially if we found memory leaks.
Would you say something like that is worth setting up?
I noticed EC2 now has an API to get memory dumps. Theoretically you could automate collecting memory dumps when an unhealthy instance is pulled out of a load balancer. Then some automated analysis could happen, and allow further manual analysis.
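A rough sketch of that automation with boto3, assuming the API meant here is SendDiagnosticInterrupt (which asks the guest OS to write a kernel crash dump rather than returning memory directly); the instance ID is a placeholder:

    import boto3

    ec2 = boto3.client("ec2")

    def trigger_crash_dump(instance_id):
        # Sends an unmaskable interrupt to the instance; if the guest OS is set
        # up for it (e.g. kdump on Linux), it writes a kernel crash dump locally,
        # which a boot-time hook could then upload for automated analysis.
        ec2.send_diagnostic_interrupt(InstanceId=instance_id)

    trigger_crash_dump("i-0123456789abcdef0")  # placeholder instance ID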
Not sure how much it cost, but it was definitely helpful - even the fact that it was often obvious, based on which objects had leaked, which team needed to take a look first made it worth it.
I remember spending quality time with coredumps and gdb back in 2012/2013, when a prototype supercar dashboard we were building crashed on certain CSS animations.
The call chain went through GTKWebkit, Wayland and all the way to Pango and Cairo. Getting that part untangled took a long afternoon. Figuring out the root cause was another two full days.
The topmost parts of the stack above could be dealt with using breakpoints, but even with pango/cairo libs from a debug build it was painful. The failing function could only be single-stepped; trying to place breakpoints inside it would not work. In the end it was an unhandled divide-by-zero deep inside the rendering library.
WTF? If you already have the infrastructure to collect core dumps, they are without a doubt the most convenient way to debug. A stack trace does not even begin to compare. It is like limiting yourself to printf-debugging in the presence of gdb.
Actually, it exactly is! Now I'm not sure if you were /s or not.
For code that implements basic state and invariant checks (i.e. ships with asserts compiled in), crashes are usually exceedingly rare and limited to one of these checks failing. Debugging them requires a stack trace and, optionally, some context related to the check itself. If the program dumps this info on crash, the fix can typically be made in less time than it takes to retrieve/receive the coredump and start looking at it. If it can't be fixed this way, then it's to the coredump we go.
On the other hand, if the code is prone to segfaulting on a whim, requiring dissecting its state to trace the cause down, then, yeah, it's a coredump case too. But code like that shouldn't be running in production to begin with.
Sure, if by miraculous chance you happen to have printf'd exactly the state you required to figure out the assert/crash, "you don't need gdb". You could also find -- by divine inspiration -- what went wrong just by looking at the line number where the assert failed. But it's still WTF-y to argue that therefore, an actual {,post-mortem} debugger is "a last resort tool".
I did this in my second year as a professional coder and it took me a while (a week? a week and a half?) to understand what to do and what I was seeing. I would prefer never to have to do it again.
I actually just came across this account recently and was tempted to make some sort of reply bot saying whether they were prime or not. I was also trying to find some of the "funnier" numbers to see if they had a disproportionate number of likes (certainly not to like them myself...) but gave up after realising the account had almost 50k tweets.
So this brings up a question I've long had, but didn't want the distraction of researching: Is there an easy way to get to some point in a user's timeline? e.g. first tweet, or November 1, 2020?
Recursion isn't the problem. Not keeping track of seen tweets is the problem. Recursion can be used to detect cycles and traverse a cyclic graph in a way that doesn't blow up.
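For example, threading a seen-set through the recursion keeps it from blowing up even when a tweet ends up quoting itself (fetch_tweet here is a stand-in for whatever hydration call the real service makes):

    def hydrate(tweet_id, seen=None):
        seen = set() if seen is None else seen
        if tweet_id in seen:
            return None  # cycle detected: stop instead of recursing forever
        seen.add(tweet_id)
        tweet = fetch_tweet(tweet_id)  # hypothetical lookup returning a dict
        quoted_id = tweet.get("quoted_id")
        if quoted_id is not None:
            tweet["quoted"] = hydrate(quoted_id, seen)
        return tweet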
It’s a lot easier to just have a depth limit on such supposedly acyclic graphs than to keep an in-memory list of previously seen nodes. It's interesting for sure, but a much rarer edge case IMO.
When doing recursion (a Postgres recursive CTE) I keep a path of the edges visited so far and check that new edges aren't already in it, so the same nodes can appear in multiple branches but not on the same branch. Works flawlessly.
Iteration would probably be better in terms of not blowing up your memory requirements: just keep hashes of all visited nodes and a stack or queue of to-be-visited nodes, and loop until there are no more to-be-visited nodes.
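Something like this, sketched with a queue and a visited set (fetch_tweet is again a placeholder for the real lookup):

    from collections import deque

    def collect_quote_chain(root_id):
        visited = set()
        to_visit = deque([root_id])
        tweets = {}
        while to_visit:
            tweet_id = to_visit.popleft()
            if tweet_id in visited:
                continue  # already handled, so a cycle can't loop forever
            visited.add(tweet_id)
            tweet = fetch_tweet(tweet_id)  # hypothetical lookup
            tweets[tweet_id] = tweet
            quoted_id = tweet.get("quoted_id")
            if quoted_id is not None:
                to_visit.append(quoted_id)
        return tweets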
* Find out the ID of a recent tweet
* Find out the ID of a tweet posted shortly after
* Estimate the rate at which new tweets appear
* Publish a tweet that references a tweet ID of "now + your guess" (rough sketch below)
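A rough sketch of those steps, assuming Snowflake-style IDs whose leading bits are a millisecond timestamp; the constants match the open-sourced Snowflake layout, but the guessing strategy itself is just illustrative:

    TWITTER_EPOCH_MS = 1288834974657  # epoch used by Twitter's Snowflake IDs

    def timestamp_ms(tweet_id):
        # The top bits of a Snowflake ID are milliseconds since the custom epoch.
        return (tweet_id >> 22) + TWITTER_EPOCH_MS

    def guess_future_id(sample_id, lead_time_ms, low_bits_guess):
        # Shift the sample's timestamp forward, then guess the remaining
        # datacenter/worker/sequence bits; the guess only pays off when those
        # low 22 bits happen to match the tweet you post at that moment.
        future_ts = timestamp_ms(sample_id) + lead_time_ms
        return ((future_ts - TWITTER_EPOCH_MS) << 22) | low_bits_guess

    # e.g. aim a few seconds ahead of a tweet you just posted, reusing its low
    # bits as the guess, then immediately post a tweet whose text embeds the
    # guessed ID in its own URL:
    # guessed = guess_future_id(last_id, 5_000, low_bits_guess=last_id & 0x3FFFFF)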
The write up is well done and interesting but a little long winded.
Thank you! The reason for experimenting with a sort of blow-by-blow report style here and detailing the full and true journey of the idea was that I feel the whole process from idea to artifact is not written about enough.
That said, it can often be hard to fully pinpoint the actual inception of an idea and just thinking about this one yesterday I realised that the idea likely came up while traversing a long chain of quote tweets. I really appreciate the feedback though—it's especially useful as I'm only starting to publish regularly now so have lots to learn—so thanks again.
There seem to be a couple of "camps" on HN: broadly, those who enjoy reading, and those who don't but endure it as a means to obtain information.
I'm in the former camp, and enjoyed your writeup quite a bit.
The latter camp will reliably show up when writing is any more long-winded than it absolutely must be, and will a) summarize (good) and b) complain (annoying).
It is what it is. Congrats on your whimsical creation!
Maybe a constructive message would be 'please post a summary at the top and a stronger conclusion at the bottom' since we're conditioned this way by various written structures.
I would argue that @OisinMoran followed this exact structure; the first section (above the fold) has this:
> Fundamentally the challenge is just correctly guessing what ID a given tweet is going to get, then appending that onto the URL for our profile and tweeting it.
And of course, the strong conclusion at the bottom is the Tweet itself.
I loved the write-up, especially those basic things all experienced programmers know but that sometimes scare newer ones: like how sometimes you don't quite know exactly how the answer works at first, or you have a big idea and a lack of double quotes is your problem, etc.
Thank you! Yes, I think you put what I was trying to capture here really well. When I first attempted to learn programming I got stuck on how to _actually_ run the code and eventually gave up, but then started again in an actual class in college. So being painfully aware of the gaps that can be left in explanations, I try to avoid creating more of them.
The ultimate solution that will probably happen at some point is a kind of collapsible/expandable explanation that takes your knowledge into account and can omit the bits you are overly familiar with. One of the difficulties with this would be adoption by creators, so it needs to not have too much overhead—writing is already difficult enough.
The write-up was great, actually. The whole point of sharing the article is to read about the process and the technical details, I don't know why people insist on these cheap TL;DRs.
I think enough people want that kind of TL;DR that it's reasonable to put one right up front. I see a similar sort of thing with DIY projects, where they tend to get better attention if the first picture is the finished product, and then it walks you through the process. It's partly to placate people who only want to see the finished product, but it also gives enough of a hook to everyone else that I suspect some people will read the article who otherwise wouldn't.
As I said, the write up was quite interesting, so don’t take it the wrong way! A long read - including the journey taken to get there - can be educational itself (including wrong turns, proof of concepts etc).
The only reason I posted the tweet was because the lede was well and truly buried at the end of the article, and you may have lost some readers’ interest by the time they get there.
You could have started with something like “This is how I managed to post a self referential tweet [link]” which would both bring it up front and centre, as a means to entice readers to find out more.
Sort of like how some TV episodes begin with “[record scratch] You may wonder how I ended up here...” – Deadpool is a movie that begins that way, for example.
For really long posts you might want a kind of table of contents that sets out the structure, with links to the individual parts.
Anyway, don’t take my abbreviated summary as negative; I enjoyed it.
Yes unfortunately Twitter's UI doesn't seem to handle recursive quote tweets, which was the main disappointment for me. I'm puzzled at those saying it does work as all I've ever gotten for the _quoted_ tweet was:
> This Tweet is unavailable.
But perhaps they are just talking about the actual tweet.
Anyone at Twitter here that can give an indication of where this is on the roadmap?
What you're seeing is an available tweet with an embedded tweet that is kept from being displayed. Because the embedded tweet is the tweet itself, it is handled by showing the same UI as any embedded tweet that is not available.
"tweet unavailable" is a different error from the "This is not available to you" message. Got that message earlier today when I tried to open a different linked tweet. Changing to nitter fixed it for me.
This is actually much cooler in my opinion! I love your progress indicator too. How did you go about doing this one? And what are the four parts you are referring to?
AFAIK the tweet ID has four moving parts: a datacenter ID, a worker ID, a sequence number, and a timestamp (the original poster mentions three).
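For reference, a decoder following the layout in the open-sourced Snowflake code (41-bit timestamp, 5-bit datacenter ID, 5-bit worker ID, 12-bit sequence):

    TWITTER_EPOCH_MS = 1288834974657  # Twitter's custom Snowflake epoch

    def decode_snowflake(tweet_id):
        return {
            "timestamp_ms": (tweet_id >> 22) + TWITTER_EPOCH_MS,
            "datacenter_id": (tweet_id >> 17) & 0x1F,  # 5 bits
            "worker_id": (tweet_id >> 12) & 0x1F,      # 5 bits
            "sequence": tweet_id & 0xFFF,              # 12 bits
        }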
3-cycle is definitely doable, it may take a week posting 100 tweets per hour or something like that. The biggest problem is twitter blocking the account (even when you stay under the API rate limit).
I implemented a .NET port of Snowflake where I work--being able to work with data without ever having to worry about an identity column is extremely freeing.
The Data Center ID has been in Snowflake IDs for a while. It's present in the original version of Snowflake that Twitter open sourced back in 2010 [0]. Looks like it was added in this commit [1] from August 2010.
Oh wow—I'm even less original than I thought! They even had the same name!
Thanks for these omoikane and bazzargh (and the others both here and on Twitter), I'll have to add an addendum to the blog post with links to all the other fun related examples.
I don't know of a better way to do this other than some analysis of the IDs + clever bruteforce. If I remember correctly, I used just over 500 tweets to do it.
This is excellent, hats off to you! I love your creative lead-in as well, mine is bland in comparison but I'm somewhat glad to be off the hook for destroying the DAG as all these prior examples are showing.
I'm pretty impressed that it took less than 1000 attempts. It would still be feasible even if it were several orders of magnitude harder. This Twitter project I made (predicting celebrity deaths) took about 250k tweets over the course of a few months: https://twitter.com/ghastly_omens
This is great! That's not 250k tweets per death, right? Did you do the classic make a lot of predictions then delete the wrong ones? How did you select the celebrities?
Would be interesting to do something with actuarial tables here.
Yes, delete the misses. 250k total. About 700 celebrities. I was predicting the day of death too, so each required 365 tweets. I did a combination of scraping IMDb and manual selection, plus filling in the details. The whole process could probably be automated and I regret not doing that. It would be wild to see this in 2020...
I did consult actuarial tables, mainly to decide how many of each age to include. With that few candidates, it obviously wasn't wise to focus on people in their 20s and 30s, because it's very likely all of them would have survived.
This reminds me of when we were counting collisions in tweet IDs years ago at my old job (social media agency). We used the collision rate to estimate the total volume of tweets going through the system.
We also determined that ID assignment was handled by three servers in a round-robin load balancer, and that load was distributed by modding a 32-bit integer, so two servers were getting more load than the third since you can’t evenly divide a 32-bit integer by 3. They fixed that bug after a couple months of observation. I forget if we let Twitter know or not.
>Fundamentally the challenge is just correctly guessing what ID a given tweet is going to get, then appending that onto the URL for our profile and tweeting it.
Reminds me of that program for referring to a future git commit, which operates on the same principle: iterate through guesses of the future commit’s hash prefix until you have a (long enough prefix-)collision.
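A toy version of the same idea, hashing a stand-in string rather than a real git commit object, just to show the brute-force loop:

    import hashlib
    from itertools import count

    def find_salt(commit_text, wanted_prefix):
        # Keep salting the content until its hash starts with the prefix
        # you already referenced somewhere else.
        for salt in count():
            candidate = f"{commit_text}\n\nsalt: {salt}".encode()
            if hashlib.sha1(candidate).hexdigest().startswith(wanted_prefix):
                return salt

    # A 6-hex-digit prefix needs around 16 million attempts on average;
    # a 3-digit one like this finishes almost instantly.
    print(find_salt("fix: off-by-one in pagination", "abc"))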
Writing quines is one of my favorite things to do in a new language. That it is always possible is a consequence of the recursion theorem[1], which is, in my opinion, one of the coolest results in basic computing science. I personally find it far more interesting than the halting problem.
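For anyone who hasn't tried it, a classic minimal Python quine (the fixed point whose existence the recursion theorem guarantees):

    s = 's = %r; print(s %% s)'; print(s % s)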
> I personally find it far more interesting than the halting problem.
I agree.
Another cool theorem related to the halting problem is Rice's theorem: it's a really powerful thing to say that every semantic property of programs is either 1. always true, 2. always false, or 3. undecidable.
I think I'm missing a tweet of context. I get that they quoted their own url, like in the top article, but then the account replies "Ugh, busted :(" to someone who has their tweets protected.
Congratulations! I tried a similar approach based on the original attempt at this challenge (https://www.spinellis.gr/blog/20090805/), but gave up after running it for a few hours.
Very cool that you were able to achieve this. I'd thought it was impossible while using the original method, given how many more tweets there are now than in 2009.
Specifically, it seems like it only shows if the ID of the quoted tweet is strictly lower than that of the tweet itself, which basically solves any sort of recursion or cycle issue, as well as attempts to quote non-existent future tweets.
Belated congratulations! This is really cool and it's always interesting to see the different approaches people take, so especially valuable to share the code. I'll definitely put this one in the addendum.
The hardcoded ID in your code was added after it was generated just before deleting everything else, right? Otherwise I'm stumped.
Thanks! I think you are correct: after the successful completion I added that statement as a safeguard before deleting all the other previous attempts. Reading this code again, it really could have used the data more efficiently to predict the correct ID faster, but if I remember correctly it didn't take very long to run so I never made another attempt.
Interesting, I wonder if the same can be done on HN. Likely easier, since the IDs appear to be sequential. But one could simply edit their post to point to the URL if they wanted, which defeats the fun of the exercise.
In a similar vein, Reddit's "self" posts were originally formed by users submitting posts linking to themselves, by guessing the URL for the next submitted post.
"shell=True"? Sloppy, mate, sloppy. Get in the habit of never doing that lest you find yourself vulnerable to a shell command injection attack one day.
Only one of the two shows; as expected, it only shows if the ID of the quoted tweet is smaller than that of the tweet itself.
This also means that, in general, you can't quote a non-existent future tweet hoping it'll be something cool in the future. Well, you can, but the preview won't work.
At first I thought a two-cycle would be incredibly hard. But I suppose you really only have to adjust the timestamp to forward-guess the next ID. You would know the ID of the first one. It would definitely help narrow down the algorithm used for preventing the recursion.
Funny you mention that [0] as I'm actually in Ireland so it was 9:20 PM for me. I presume you're on the US East Coast? Being able to do one of these at a specified minute would be impressive though (of course you could just _only_ try during that minute every day and you'd eventually get it).
[0] I had to first check that you were referring to the main tweet and not this post because I actually posted this on HN yesterday to crickets so was happily surprised to find it re-upped today and so undeservedly close to the AlphaFold news.
Nice. But you didn't guess someone's tweet ID; the name in the URL is actually not used. These three links all point to the same place because they have the same tweet ID: