Seems like software patent processing would benefit from some simple machine learning tools to help examiners identify similar content. Is anyone familiar with the tools they currently use? Muddling about with the law will obviously take forever: the guys in Congress have no idea what they are legislating, and the lawyers in corporations have no idea what they are lobbying for, aside from monopoly money.
Machine learning on that corpus would be an extremely difficult task. This is what engineers don't really get about patent applications. We see the underlying technology or specifications and say, "same thing, identical."
IP lawyers and patent clerks look at the claims and see differences. Claims have two functions: public notice, and defining a kind of "patent scope" of applicability. It's all very abstruse, and I don't think it would be easy to attach semantics to it via ML alone.
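
To make that concrete, here is a rough sketch in Python of the kind of "simple" similarity check an engineer might reach for. It is a toy of my own, not anything a patent office is known to use: the helper names (`tokenize`, `cosine_similarity`) and all three claims are made up for illustration. It scores texts by plain bag-of-words cosine similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only words present in both texts contribute to the dot product.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical claims, invented for illustration only.
claim_a = ("A method comprising receiving a request over a wireless network "
           "and caching the response on a client device.")
claim_b = ("A method comprising receiving a request over a wired network "
           "and caching the response on a client device.")
unrelated = "A composition comprising a polymer resin and a curing agent."

score_similar = cosine_similarity(claim_a, claim_b)
score_unrelated = cosine_similarity(claim_a, unrelated)
print(f"claim_a vs claim_b:   {score_similar:.2f}")
print(f"claim_a vs unrelated: {score_unrelated:.2f}")
```

The first two claims differ by a single word, so a word-count measure rates them as nearly identical, yet that one limitation could be exactly what an examiner or lawyer treats as the difference in scope. The unrelated claim still picks up a fair amount of similarity just from shared boilerplate like "a" and "comprising". A smarter model would do better than this toy, but the gap between "looks the same" and "claims the same thing" is the part that is hard to learn.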