Wrote up something in python that achieves similar, but for increasing numbers of characters. Found a few hits fairly quickly. pypy3 is pushing 420k evaluations a second on my laptop.
e.g.
"The SHA256 for this sentence begins with: five, two, and d"
"The SHA256 for this sentence begins with: seven, e, and nine"
"The SHA256 for this sentence begins with: one, five, e, and three"
My brain fart this morning was forgetting to account for the newline, so I started out spitting out bad options.
You can wrap tqdm around the "permutations(TOKENS, k)" if you want to measure progress. I haven't spent time trying to make this particularly optimised, for example that dict lookup is likely avoidable with a little bit of work via index lookup, and maybe cheaper. I've also not attempted to parallelise it, which would be fairly easy to do.
#!/usr/bin/env python
import string
import hashlib
from itertools import permutations
WORD_DIGIT = {
"one":1,
"two":2,
"three":3,
"four":4,
"five":5,
"six":6,
"seven":7,
"eight":8,
"nine":9}
TOKENS = [
"one",
"two",
"three",
"four",
"five",
"six",
"seven",
"eight",
"nine",
] + list(string.ascii_lowercase)
SEPARATOR = ", "
STARTING_TEXT = "The SHA256 for this sentence begins with: "
for k in range(2, 6):
for perm in permutations(TOKENS, k):
sha_start = ""
for char in perm:
if char in WORD_DIGIT:
sha_start += str(WORD_DIGIT[char])
else:
sha_start += char
test_string = STARTING_TEXT + SEPARATOR.join(perm[:-1]) + ", and " + ''.join(perm[-1:]) + "\n"
checksum = hashlib.new("sha256")
checksum.update(test_string.encode())
if checksum.hexdigest().startswith(sha_start):
print(test_string)
This is an interesting starting point for a python implementation, but it doesn't work correctly as far as I can tell. The list of tokens leaves out zero and includes a bunch of ascii characters that aren't valid hex. That second problem won't produce wrong results, but it'll spend a lot of time testing combinations that can't possibly appear in a SHA256 hex output. I also think the original tweet calculated without a newline.
More significantly, permutation in python is probably not the right tool for this. I believe it will not repeat token values at different positions, so for example you'll never end up testing "The SHA256 for this sentence begins with: f, f, and f", which is certainly a potentially valid result.
another pretty easy optimization is swapping to https://github.com/minio/sha256-simd, particularly if you're on a process with the sha extensions.
versus just spawning a goroutine per attempt, it'll likely be much faster as well to just split the search space into number of cores and have one chugging away on each chunk of the search space. (I have a thing to do vanity git commit hashes and that's what I do for that; for 7 characters in the hash, it takes well under a second on my CPU on average)
While we're there, since the message has a fixed prefix, you can initiate the sha256 computation with it, and then repeatedly branch off of that for the test suffix.
Oh I didn’t mean per attempt. If you have <= 16 cores, just wrap the first level loop body in a goroutine; if you have >16 cores, wrap the second level to spawn 256 goroutines. Very little code change, big speedup.
There are N possible sequences, and you try N times with a success probability of 1/N each (because it is a good hash function). This means the expected number of hits is 1.
I like little puzzles so I gave it a shot in Ruby.
#!/usr/bin/ruby
require 'digest'
charmap = %w{zero one two three four five six seven eight nine a b c d e f}
(0..0xf).map do |prefix|
Process.fork do
(0..0xffffff).each do |i|
hex = "%01x%06x" % [prefix, i] # "0123456"
digit_words = hex.chars.map { |c| charmap[c.to_i(16)] } # ["zero", "one", ...]
sentence = "The SHA256 for this sentence begins with: %s, %s, %s, %s, %s, %s and %s." % digit_words
sha = Digest::SHA256.hexdigest(sentence)
if sha[0,7] == hex
puts(sentence)
end
end
end
end
Process.waitall
8 minutes on an old quad-core. I don't expect there's much to optimize since all the heavy lifting is in the libraries. I didn't bother with synchronization so you might get a scrambled result if it somehow finds a flurry of hits. :)
> Was just verifying your tweet's hash, and then...omg!!! I couldn't believe what I realised. The SHA256 of THIS tweet starts with exactly the same 7 characters as your tweet's hash. What are the chances of that?
This means there's quite a lot of solutions. Running the entire 2^56 would take over 770 core-days if one core can achieve an impossible 1 giga-iterations per second.
(Easiest optimization is wrapping the loop body in a goroutine. GOEXPERIMENT=loopvar really makes this nicer btw.)
The reply is obviously more interesting, need to come up with a lot of variations.