
  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
I found the regex above, from the IETF spec (rfc3987) ~ http://www.ietf.org/rfc/rfc3987.txt, suitable for my needs: a nice regex for parsing URL formats. It lets you extract the scheme ($2), authority ($4), path ($5), query ($7) and fragment ($9) ~ http://www.flickr.com/photos/bootload/238916518/
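
For illustration, here is a rough Python sketch of pulling those groups out with the regex above. The function name `split_uri` and the sample URL's query and fragment are my own assumptions for the example, not anything from the spec, and Python's `re` module is not PCRE, so treat this as a sketch rather than the definitive way to do it.

```python
import re

# The regex quoted above, unchanged.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_uri(uri):
    """Split a URI into its five components (hypothetical helper).

    Components that are absent come back as None; the path is
    always a string, possibly empty.
    """
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

if __name__ == "__main__":
    parts = split_uri("http://www.flickr.com/photos/bootload/238916518/?a=1#top")
    for name, value in parts.items():
        print(name, "=", value)
```

Note that the pattern only splits a string into parts. It does not validate that the result is a well-formed URL.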

I've seen problems with taking a regex string and expecting it to work in all cases on all regex engines, which is why I tend to stick with PCRE ~ http://en.wikipedia.org/wiki/Perl_Compatible_Regular_Express... That's a point in favour of the Gruber example.

"... The pattern is also liberal about Unicode glyphs within the URL ..."

PCRE supports Unicode, but it's not switched on by default ~ http://www.pcre.org/pcre.txt



