Hacker News new | past | comments | ask | show | jobs | submit login

Since I'm slacking off, here's a straightforward, not at all optimized Go implementation:

  package main
  
  import (
      "bytes"
      "crypto/sha256"
      "encoding/hex"
      "fmt"
  )
  
  var (
      _chars = []byte("0123456789abcdef")
      _names = []string{"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "a", "b", "c", "d", "e", "f"}
      _size  = len(_chars)
  )
  
  func main() {
      hexsum := make([]byte, 64)
      for i1 := 0; i1 < _size; i1++ {
          for i2 := 0; i2 < _size; i2++ {
              for i3 := 0; i3 < _size; i3++ {
                  for i4 := 0; i4 < _size; i4++ {
                      for i5 := 0; i5 < _size; i5++ {
                          for i6 := 0; i6 < _size; i6++ {
                              for i7 := 0; i7 < _size; i7++ {
                                  s := fmt.Sprintf("The SHA256 for this sentence begins with: %s, %s, %s, %s, %s, %s and %s.", _names[i1], _names[i2], _names[i3], _names[i4], _names[i5], _names[i6], _names[i7])
                                  sum := sha256.Sum256([]byte(s))
                                  hex.Encode(hexsum, sum[:])
                                  prefix := []byte{_chars[i1], _chars[i2], _chars[i3], _chars[i4], _chars[i5], _chars[i6], _chars[i7]}
                                  if bytes.HasPrefix(hexsum, prefix) {
                                      fmt.Printf("%s\n", s)
                                      fmt.Printf("%s\n", hexsum)
                                  }
                              }
                          }
                      }
                  }
              }
          }
      }
  }
Takes a few minutes on common consumer hardware. There's exactly one hit.

(Easiest optimization is wrapping the loop body in a goroutine. GOEXPERIMENT=loopvar really makes this nicer btw.)

The reply is obviously more interesting, need to come up with a lot of variations.




Wrote up something in python that achieves similar, but for increasing numbers of characters. Found a few hits fairly quickly. pypy3 is pushing 420k evaluations a second on my laptop.

e.g.

"The SHA256 for this sentence begins with: five, two, and d"

"The SHA256 for this sentence begins with: seven, e, and nine"

"The SHA256 for this sentence begins with: one, five, e, and three"

My brain fart this morning was forgetting to account for the newline, so I started out spitting out bad options.


You can wrap tqdm around the "permutations(TOKENS, k)" if you want to measure progress. I haven't spent time trying to make this particularly optimised, for example that dict lookup is likely avoidable with a little bit of work via index lookup, and maybe cheaper. I've also not attempted to parallelise it, which would be fairly easy to do.

  #!/usr/bin/env python
  
  import string
  import hashlib
  from itertools import permutations
  
  WORD_DIGIT = {
          "one":1,
          "two":2,
          "three":3,
          "four":4,
          "five":5,
          "six":6,
          "seven":7,
          "eight":8,
          "nine":9}
  TOKENS = [
          "one",
          "two",
          "three",
          "four",
          "five",
          "six",
          "seven",
          "eight",
          "nine",
          ] + list(string.ascii_lowercase)
  
  SEPARATOR = ", "
  
  STARTING_TEXT = "The SHA256 for this sentence begins with: "
  
  for k in range(2, 6):
      for perm in permutations(TOKENS, k):
          sha_start = ""
          for char in perm:
              if char in WORD_DIGIT:
                  sha_start += str(WORD_DIGIT[char])
              else:
                  sha_start += char
          test_string = STARTING_TEXT + SEPARATOR.join(perm[:-1]) + ", and " + ''.join(perm[-1:]) + "\n"
          checksum = hashlib.new("sha256")
          checksum.update(test_string.encode())
          if checksum.hexdigest().startswith(sha_start):
              print(test_string)


This is an interesting starting point for a python implementation, but it doesn't work correctly as far as I can tell. The list of tokens leaves out zero and includes a bunch of ascii characters that aren't valid hex. That second problem won't produce wrong results, but it'll spend a lot of time testing combinations that can't possibly appear in a SHA256 hex output. I also think the original tweet calculated without a newline.

More significantly, permutation in python is probably not the right tool for this. I believe it will not repeat token values at different positions, so for example you'll never end up testing "The SHA256 for this sentence begins with: f, f, and f", which is certainly a potentially valid result.


> The list of tokens leaves out zero and includes a bunch of ascii characters that aren't valid hex

I rewrote it in rust last night and realised that, yeah. Lots of wasted time on irrelevant characters.

>permutation in python is probably not the right tool for this.

Correct, missed that. I need `itertools.product(<iterable>, repeat=2)`


Interestingly, cPython is twice as fast as PyPy for me.


Obvious variations include:

"SHA256" vs "SHA-256"

"begins" vs "starts"

":" vs ""

"%s," vs "%s"

"%s and" vs "%s, and"

"." vs ".\n"


Yeah, I'm surprised how simple it is to find these.

The SHA256 for this sentence begins with: 1c0510f.

The SHA256 for this sentence begins with: 4, 5, b, a, a, 5 and f.


another pretty easy optimization is swapping to https://github.com/minio/sha256-simd, particularly if you're on a process with the sha extensions.

versus just spawning a goroutine per attempt, it'll likely be much faster as well to just split the search space into number of cores and have one chugging away on each chunk of the search space. (I have a thing to do vanity git commit hashes and that's what I do for that; for 7 characters in the hash, it takes well under a second on my CPU on average)


While we're there, since the message has a fixed prefix, you can initiate the sha256 computation with it, and then repeatedly branch off of that for the test suffix.


Oh I didn’t mean per attempt. If you have <= 16 cores, just wrap the first level loop body in a goroutine; if you have >16 cores, wrap the second level to spawn 256 goroutines. Very little code change, big speedup.


> There's exactly one hit

There are N possible sequences, and you try N times with a success probability of 1/N each (because it is a good hash function). This means the expected number of hits is 1.


The probability is 1/e


The probability of missing is 1/e


You can increase the odds by adding various suffixes, e.g "what are the odds" or "imagine that" or "can you beat that" or "can you do better" ....


Why would that increase the odds? It’s the same chance on every iteration


I think they mean "increase your possible search space?"


you effectively have multiple chances to try to get a given target hash (prefix)


Trying different prefixes doesn't take less time than trying different sequences. Unless you're hellbent on finding a match for a particular sequence.

I'd settle on a fixed prefix instead and try longer sequences to make it look more impressive.


Every time you lengthen the sequence you increase the "luck" requirement.


I like little puzzles so I gave it a shot in Ruby.

  #!/usr/bin/ruby
  
  require 'digest'
  
  charmap = %w{zero one two three four five six seven eight nine a b c d e f}
  
  (0..0xf).map do |prefix|
    Process.fork do
      (0..0xffffff).each do |i|
        hex = "%01x%06x" % [prefix, i]  # "0123456"
        digit_words = hex.chars.map { |c| charmap[c.to_i(16)] }  # ["zero", "one", ...]
        sentence = "The SHA256 for this sentence begins with: %s, %s, %s, %s, %s, %s and %s." % digit_words
        sha = Digest::SHA256.hexdigest(sentence)
        if sha[0,7] == hex
          puts(sentence)
        end
      end
    end
  end
  
  Process.waitall
8 minutes on an old quad-core. I don't expect there's much to optimize since all the heavy lifting is in the libraries. I didn't bother with synchronization so you might get a scrambled result if it somehow finds a flurry of hits. :)


What is the reply? Following the link leads to the tweet with no context.


> Was just verifying your tweet's hash, and then...omg!!! I couldn't believe what I realised. The SHA256 of THIS tweet starts with exactly the same 7 characters as your tweet's hash. What are the chances of that?


This means there's quite a lot of solutions. Running the entire 2^56 would take over 770 core-days if one core can achieve an impossible 1 giga-iterations per second.


It's 2^28, not 2^56. A hex digit is 4 bits, and we have 7 such digits to match.


Ugh, you are right. That was a bit embarrassing :)


https://go.dev/play/p/6Lg9L2up5mI

Here is a slightly different version I wrote that has 0 allocations per loop.

Took about 26 seconds.

I'm welcome to suggestions on how to make it more readable! Its been a few years since I have written Go professionally


not that difficult either. i wrote a script in 10 minutes and had a result in under a minute: https://twitter.com/schuilr/status/1701268438931460311




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: