I wrote these years ago. They're damn handy. It's true that they're not implemented in Bash (that would be nuts), but having them on hand lets me do much more on the command line than would otherwise be possible.
~/bin/union
===========
#! /usr/bin/awk -f
!acc[$0]++
~/bin/intersection
==================
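#! /usr/bin/awk -f
# Needs gawk: ENDFILE is a gawk extension.
# Count each distinct line once per file.
!buf[$0]++ { acc[$0] += 1 }
ENDFILE {
    delete buf   # forget the lines seen in the file just finished
    files++
}
END {
    # A line is in the intersection if every file contained it.
    for (k in acc) if (acc[k] == files) print k
}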
~/bin/set-diff
==============
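#! /usr/bin/awk -f
# Needs gawk: ENDFILE is a gawk extension.
!filenum { acc[$0] = 1 }     # first file: remember every line
filenum { delete acc[$0] }   # later files: remove any line they contain
ENDFILE { filenum++ }
END {
    for (k in acc) print k
}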
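For anyone who wants to try the idea without installing anything, here is a rough sketch of the same three operations as shell functions. These are not the scripts themselves: the function names (set_union, set_intersection, set_diff) are made up for illustration, and the sketch swaps the end-of-file hook for an FNR == 1 check so it should run under any POSIX awk. That swap assumes no input file is empty.

```shell
# Hypothetical helper names; a portable sketch, assuming no input file is empty.

# Every distinct line once, in first-seen order.
set_union() {
    awk '!seen[$0]++' "$@"
}

# Lines that occur in every input file.
set_intersection() {
    awk '
        FNR == 1 { files++ }    # first record of a file: a new file has started
        last[$0] != files {     # count each line at most once per file
            last[$0] = files
            count[$0]++
        }
        END { for (k in count) if (count[k] == files) print k }
    ' "$@"
}

# Lines of the first file that occur in none of the later files.
set_diff() {
    awk '
        FNR == 1 { files++ }
        files == 1 { keep[$0] = 1 }      # first file: remember every line
        files > 1  { delete keep[$0] }   # later files: drop what they contain
        END { for (k in keep) print k }
    ' "$@"
}
```

Something like `set_intersection a.txt b.txt | sort` is probably how you'd call it, since awk's `for (k in ...)` loop makes no promise about output order.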
In bash? How would you even implement a set in bash without just doing linear greps? Or did someone add sets to bash 20 years ago and I never got the memo?
You could use the 'look' command, which does a binary search.
It's basically meant to look up spellings in /usr/share/dict/words, but it can work on any file. It will match any line that your pattern is a prefix of, so you'd have to add logic to eliminate longer matches.
But if you had some huge file to search and you wanted to do it from a shell script, that would be one way. Caveat: although it's fairly standard, 'look' might not be installed on every system.
Also, you have to keep the file sorted, so you can't add things by just appending to the end. Checking whether something is in the set gets much quicker, at the expense of adding things being much slower.
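A minimal sketch of how that could look in a script, with the usual caveats. The names in_set and add_to_set are made up for illustration. It assumes 'look' is installed, that the set file already exists, and that the file is kept in byte order (LC_ALL=C) so that 'look' and 'sort' agree on what "sorted" means.

```shell
# Hypothetical helpers; assume the set file exists and is sorted with LC_ALL=C.

# in_set KEY FILE: succeed only if KEY is an exact line of FILE.
# look prints every line that starts with KEY; grep -Fx keeps only
# the line that equals KEY exactly, which drops the longer matches.
in_set() {
    LC_ALL=C look -- "$1" "$2" | grep -Fxq -- "$1"
}

# add_to_set KEY FILE: insert KEY and keep FILE sorted and duplicate-free.
# This re-sorts the whole file, which is the slow part of the trade-off.
add_to_set() {
    printf '%s\n' "$1" | LC_ALL=C sort -u -o "$2" - "$2"
}
```

With a file containing "apple" and "applesauce", `in_set app words.txt` should fail even though 'look' alone would print both lines.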
I was looking for one a while ago, and it feels like something that should be built in.