
A while back I copied from somewhere this script that does the job nicely.

  #!/bin/bash
  # Dependencies: tesseract-ocr imagemagick scrot xsel

  IMG=`mktemp`
  trap "rm $IMG*" EXIT

  scrot -s $IMG.png -q 100
  # increase image quality with option -q from default 75 to 100

  mogrify -modulate 100,0 -resize 400% $IMG.png
  #should increase detection rate

  tesseract $IMG.png $IMG &> /dev/null
  cat $IMG.txt | xsel -bi
  notify-send "Text copied" "$(cat $IMG.txt)"

  exit



In the spirit of sharing, cuz I think this is a great script (thank you), I prefer using maim over scrot simply because it has a --nodrag option. Personally feels better when making selections from a trackpad. Click once, move cursor, click again.

    maim -s --nodrag --quality=10 $IMG.png
maim's quality scale tops out at 10, so 10 here is the equivalent of scrot's 100.


Yet another variation I have been using for ages, using ImageMagick's `import` tool (which probably only works on X11)

    import "$tempfile"
    TEXT=`tesseract -l eng+deu "$tempfile" stdout`
    echo "$TEXT" | xsel -i -b


I was using something like this for a while, but I found tesseract did poorly quite often. That resize trick didn't seem to affect much. I'm not sure what pre-processing would make it better.

I'd love to know if TextSnatcher does anything to improve on this. The GitHub page is opaque.


The source is pretty straightforward - it's calling `scrot -s -o` to a temp file, and then `tesseract` with no further preprocessing.

https://github.com/RajSolai/TextSnatcher/blob/master/src/ser...


> I found tesseract did poorly quite often

The script calls Tesseract in default page segmentation mode (PSM 3). [1]

Depending on the input text, PSM 11 (sparse text, i.e. disconnected text) would probably work much better. That uses the flag "--psm 11".

[1] From the original repo: string tess_command = "tesseract " + file_path + " " + out_path + @" -l $lang" ;
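For anyone who wants to try that from a script, here is a minimal sketch, assuming Tesseract 4 or newer (where the flag is spelled `--psm`). The function name `ocr_sparse` and the `eng` default are my own choices for illustration, not something from TextSnatcher.

```shell
#!/bin/sh
# Hypothetical helper: OCR an image in sparse-text mode (PSM 11).
# $1 = image path, $2 = optional language spec (assumed default: eng).
ocr_sparse() {
  # "stdout" as the output base makes tesseract print the text
  # instead of writing <base>.txt; diagnostics on stderr are dropped.
  tesseract "$1" stdout --psm 11 -l "${2:-eng}" 2>/dev/null
}

# Usage (not run here):
#   ocr_sparse shot.png | xsel -bi
#   ocr_sparse shot.png eng+deu
```

Whether PSM 11 actually beats the default depends on the screenshot, so it is worth comparing both on your own samples.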


Having used Tesseract for OCR for other things, getting the right PSM helps but it's still rather terrible, especially for sans-serif fonts, which are common in UIs.

Granted there's a lot of ambiguity in sans serif fonts, lower-case "L", vertical bar, and upper-case "i" can even be pixel-identical, but I've seen tesseract turn

  Chapter III
into

  Chapter |l1
which really surprises me. In fact, for books, I run the output through sed to replace every vertical bar with an upper-case "i", and that significantly improved recognition.
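The sed step described above could look roughly like this. It is a sketch, not the commenter's actual command; the name `fix_bars` is made up, and it blindly assumes the text contains no legitimate vertical bars, which is usually fine for book prose.

```shell
#!/bin/sh
# Hypothetical post-processing filter: turn every vertical bar
# in the OCR output into an upper-case "I".
fix_bars() {
  # In a basic regular expression a bare | is a literal character.
  sed 's/|/I/g'
}

# Usage (not run here):
#   tesseract page.png stdout | fix_bars > page.txt
```

This only repairs the bar; a misread like "l1" for "II" would still need its own rule.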


I had a PowerShell script which did this as well, but alas, it was lost to time with the rest of my little scripts from my last job.

Apologies to all of my fellow Unix-Windows borderers.


  trap "rm $IMG*" EXIT
see https://www.shellcheck.net/wiki/SC2064 (inside double quotes, $IMG is expanded when the trap is set, not when it fires)

also, use mktemp -d and recursively delete the directory
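A rough sketch of both suggestions together, reusing the commands from the script at the top of the thread. The names `with_tmpdir` and `ocr_selection`, and the file names `shot.png` and `out`, are hypothetical; I have not tested the capture part on every setup.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a private scratch directory
# as its last argument, and delete that directory afterwards.
with_tmpdir() {
  (
    tmpdir=$(mktemp -d) || exit 1
    # Single quotes: $tmpdir is expanded when the trap fires (SC2064).
    trap 'rm -rf -- "$tmpdir"' EXIT
    "$@" "$tmpdir"
  )
}

# Same pipeline as the original script, writing only inside the directory.
ocr_selection() {
  scrot -s "$1/shot.png" -q 100
  mogrify -modulate 100,0 -resize 400% "$1/shot.png"
  tesseract "$1/shot.png" "$1/out" >/dev/null 2>&1
  xsel -bi < "$1/out.txt"
}

# Usage (not run here):
#   with_tmpdir ocr_selection
```

Because everything lives in one directory, the cleanup no longer relies on a glob like `$IMG*`.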


This is perfect for me! Having a window with a button that I need to click is much worse than just binding a script to a hotkey.



