Wrote up something in python that achieves similar, but for increasing numbers o...

Twirrim · on Sept 11, 2023

You can wrap tqdm around the "permutations(TOKENS, k)" call if you want to measure progress. I haven't spent time optimising this. For example, the dict lookup could probably be replaced by an index lookup with a little work, which may be cheaper. I've also not attempted to parallelise it, which would be fairly easy to do.

  #!/usr/bin/env python
  
  import string
  import hashlib
  from itertools import permutations
  
  WORD_DIGIT = {
          "one":1,
          "two":2,
          "three":3,
          "four":4,
          "five":5,
          "six":6,
          "seven":7,
          "eight":8,
          "nine":9}
  TOKENS = [
          "one",
          "two",
          "three",
          "four",
          "five",
          "six",
          "seven",
          "eight",
          "nine",
          ] + list(string.ascii_lowercase)
  
  SEPARATOR = ", "
  
  STARTING_TEXT = "The SHA256 for this sentence begins with: "
  
  for k in range(2, 6):
      for perm in permutations(TOKENS, k):
          sha_start = ""
          for char in perm:
              if char in WORD_DIGIT:
                  sha_start += str(WORD_DIGIT[char])
              else:
                  sha_start += char
          test_string = STARTING_TEXT + SEPARATOR.join(perm[:-1]) + ", and " + ''.join(perm[-1:]) + "\n"
          checksum = hashlib.new("sha256")
          checksum.update(test_string.encode())
          if checksum.hexdigest().startswith(sha_start):
              print(test_string)

rainsford · on Sept 12, 2023

This is an interesting starting point for a python implementation, but it doesn't work correctly as far as I can tell. The list of tokens leaves out zero and includes a bunch of ascii characters that aren't valid hex. That second problem won't produce wrong results, but it'll spend a lot of time testing combinations that can't possibly appear in a SHA256 hex output. I also think the original tweet calculated without a newline.

More significantly, permutation in python is probably not the right tool for this. I believe it will not repeat token values at different positions, so for example you'll never end up testing "The SHA256 for this sentence begins with: f, f, and f", which is certainly a potentially valid result.
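
A quick way to see the difference, as a small sketch (the two-token list is only for illustration, it is not the real token set):

```python
from itertools import permutations, product

tokens = ["e", "f"]

# permutations() never reuses an element, so no tuple repeats a token.
perms = list(permutations(tokens, 2))

# product(..., repeat=2) picks each position independently, so repeats appear.
prods = list(product(tokens, repeat=2))

print(perms)  # [('e', 'f'), ('f', 'e')]
print(prods)  # [('e', 'e'), ('e', 'f'), ('f', 'e'), ('f', 'f')]
```

So a candidate like "f, f, and f" can only ever come out of `product`, never out of `permutations`.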

Twirrim · on Sept 12, 2023

> The list of tokens leaves out zero and includes a bunch of ascii characters that aren't valid hex

I rewrote it in rust last night and realised that, yeah. Lots of wasted time on irrelevant characters.

> permutation in python is probably not the right tool for this.

Correct, missed that. I need `itertools.product(<iterable>, repeat=2)`
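
Roughly what a version with those fixes might look like. This is an untested-at-scale sketch, not the rust rewrite: it assumes digits are spelled as words ("zero" to "nine"), a to f stay as bare letters, and the sentence has no trailing newline. The names `build_sentence` and `find_matches` are just ones picked for this example.

```python
import hashlib
from itertools import product

# Assumption: digits are written as words, hex letters a-f as themselves.
TOKENS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
] + list("abcdef")

# The position of a token in TOKENS is also its hex value,
# so indexing this string replaces the dict lookup.
HEX_CHARS = "0123456789abcdef"

STARTING_TEXT = "The SHA256 for this sentence begins with: "


def build_sentence(words):
    """Join the tokens as 'a, b, and c' (needs at least two tokens)."""
    return STARTING_TEXT + ", ".join(words[:-1]) + ", and " + words[-1]


def find_matches(k):
    """Yield (sentence, prefix) for every k-token sentence that describes its own hash."""
    # product() allows the same token at several positions, unlike permutations().
    for idxs in product(range(len(TOKENS)), repeat=k):
        prefix = "".join(HEX_CHARS[i] for i in idxs)
        sentence = build_sentence([TOKENS[i] for i in idxs])
        # No trailing newline is appended here (assumed to match the original tweet).
        if hashlib.sha256(sentence.encode()).hexdigest().startswith(prefix):
            yield sentence, prefix


def main():
    for k in range(2, 6):
        for sentence, _prefix in find_matches(k):
            print(sentence)


if __name__ == "__main__":
    main()
```

With 16 tokens that's 16^k candidates per length, so k up to 5 is only about a million hashes.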

stavros · on Sept 13, 2023

Interestingly, CPython is twice as fast as PyPy for me.