Most of them return a 4xx code (I checked myself). Some may be 301/302 redirects to a 4xx page that their crawler isn't handling properly.
Right now it only excludes pages based on the text content: https://github.com/lindylearn/aboutideasnow/blob/main/apps/a...
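
For illustration, here is a rough sketch of what checking the status code in addition to the text might look like. This is not taken from the repo: `FetchedPage`, `ERROR_PHRASES`, and `shouldExcludePage` are hypothetical names, and the phrase list is only a guess at what a text-based filter might look for.

```typescript
// Hypothetical sketch, not the aboutideasnow implementation.
interface FetchedPage {
  status: number; // final HTTP status, after any redirects were followed
  text: string;   // extracted text content of the page
}

// Assumed phrases that hint at an error page served with a 200 status.
const ERROR_PHRASES: string[] = ["page not found", "404 not found"];

function shouldExcludePage(page: FetchedPage): boolean {
  // A final 4xx or 5xx status means there is no usable content.
  if (page.status >= 400) {
    return true;
  }
  // Otherwise fall back to the text-based check.
  const lower = page.text.toLowerCase();
  return ERROR_PHRASES.some((phrase) => lower.includes(phrase));
}
```

With `fetch`, redirects are followed by default and `response.status` is the status of the final response, so a 301/302 that ends on a 4xx page would be caught by the first check as long as the crawler reads the status at all.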