The very post we're commenting on shows that that's not true: PHP, Python, Java and .NET (C#) share one behavior (accept "\n" as "$"), and ECMAScript (JavaScript), Golang, and Rust share another behavior (do not accept "\n" as "$").
Let's not argue about which is “the most common”; all of these languages are sufficiently common to say that there is no single common behavior.
> $ matches at the end of the string or before the last character if that is a newline, which is logically the same as the end of a single line.
Yes, that is Python's behavior (and PHP's, Java's, etc.). You're just describing it; not motivating why it has to work that way or why it's more correct than the obvious alternative of only matching the end of the string.
Subjectively, I find it odd that /^cat$/ matches not just the obvious string "cat" but also the string "cat\n". And I think historically, it didn't. I tried several common tools that predate Python:
- awk 'BEGIN { print ("cat\n" ~ /^cat$/) }' prints 0
- in GNU ed, /^M/ does not match any lines
- in vim, /^M/ does not match any lines
- sed -n '/\n/p' does not print any lines
- grep -P '\n' does not match any lines
- (I wanted to try `grep -E` too but I don't know how to escape a newline)
- perl -e 'print ("cat\n" =~ /^cat$/)' prints 1
So the consensus seems to be that the classic UNIX line-based tools match the regex against the line excluding the newline terminator (which makes sense since it isn't part of the content of that line) and therefore $ only needs to match the end of the string.
The odd one out is Perl: it seems to have introduced the idea that $ can match a newline at the end of the string, probably for similar reasons as Python. All of this suggests to me that allowing $ to match both "\n" and "" at the end of the string was a hack designed to make it easier to deal with both strings without control characters and strings that end with a single newline.
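For what it's worth, here is a minimal Python sketch of the behavior being discussed. The helper name `matches` is made up for illustration; `\Z` is Python's anchor that matches only at the very end of the string.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in text (no flags set)."""
    return re.search(pattern, text) is not None

# $ accepts the end of the string, or the position just before one trailing newline.
print(matches(r"^cat$", "cat"))       # True
print(matches(r"^cat$", "cat\n"))     # True
print(matches(r"^cat$", "cat\n\n"))   # False: only a single trailing newline is tolerated

# \Z accepts only the true end of the string.
print(matches(r"^cat\Z", "cat"))      # True
print(matches(r"^cat\Z", "cat\n"))    # False
```

So in Python the "only match the end of the string" alternative is available, but you have to ask for it with `\Z` (or use `re.fullmatch`) instead of `$`.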
> So the consensus seems to be that the classic UNIX line-based tools match the regex against the line excluding the newline terminator (which makes sense since it isn't part of the content of that line) and therefore $ only needs to match the end of the string.
If you read a line, you usually remove the newline at the end, but you could also keep it, as Python does. If you remove the newline, then a line can never contain a newline, so the case cat\n can never occur. If you keep the newline, there will be exactly one newline, as the last character, and you arguably want cat$ to match cat\n because that newline marks the end of the line but is not part of the content. So it makes perfect sense that $ matches at the end of the string or before a newline that is the last character: it does the right thing whether or not you strip the newline.
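To illustrate the "works whether or not you strip" point, here is a small Python sketch. The function name `lines_matching` and the sample text are invented for the example; iterating over an `io.StringIO` yields lines with their trailing newline kept, the same way iterating over a text file does.

```python
import io
import re

def lines_matching(text: str, pattern: str, strip_newline: bool) -> list[str]:
    """Collect the lines of text that match pattern, testing each line
    either with or without its trailing newline."""
    found = []
    for line in io.StringIO(text):  # each line still ends in "\n"
        candidate = line.rstrip("\n") if strip_newline else line
        if re.search(pattern, candidate):
            found.append(line.rstrip("\n"))
    return found

sample = "cat\nconcat\ncats\n"
print(lines_matching(sample, r"cat$", strip_newline=True))   # ['cat', 'concat']
print(lines_matching(sample, r"cat$", strip_newline=False))  # ['cat', 'concat']
```

Both calls give the same answer, which is the convenience being argued for. With an end-of-string-only anchor such as `\Z`, the unstripped variant would find nothing.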
If you want cat$ to not match cat\n, then you are evidently not dealing with lines: you have a string with a newline at the end, but you consider this newline part of the content rather than a line terminator. But ^ and $ are made for lines, so they do not work as you expect. I also get what people are complaining about: if you are not in multi-line mode and have a proper line with at most one newline at the end, then $ behaves exactly as it would in multi-line mode, which raises the question of why there are two modes to begin with. Non-multi-line mode only behaves differently if you have additional newlines or a newline that is not at the end, that is, if you do not have a proper line. So why should $ still behave as if you were dealing with a line?
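A rough Python sketch of where the two modes agree and where they diverge. The helper `dollar_positions` is hypothetical; it just lists every offset at which a bare `$` matches.

```python
import re

def dollar_positions(text: str, flags: int = 0) -> list[int]:
    """Return every offset in text where a bare $ matches."""
    return [m.start() for m in re.finditer(r"$", text, flags)]

# A "proper line" (at most one newline, at the end): both modes agree.
print(dollar_positions("cat\n"))                    # [3, 4]
print(dollar_positions("cat\n", re.MULTILINE))      # [3, 4]

# A newline that is not at the end: only multi-line mode matches before it.
print(dollar_positions("cat\ndog"))                 # [7]
print(dollar_positions("cat\ndog", re.MULTILINE))   # [3, 7]
```

So the default mode only differs from multi-line mode on inputs that are not a single line to begin with, yet it still treats a final newline as a line terminator.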