Best Practices Are Not Always the Best (bloomca.me)
50 points by bloomca on May 10, 2018 | 58 comments



> There are also tons of evangelists, which promote their technology/process/approach of choice, giving endless introductory talks and blogposts, but without answering complicated questions.

I find some programmers are very adamant that there's always a right way and a wrong way to do something and you just have to examine the pros and the cons. Well, what about when it's between a legacy codebase that no one on the team has even spent half a day looking at and an idea that someone read on a blog post that they never bothered to try? You can't honestly begin to identify pros and cons in a situation like that. But this is how our industry is now. So you kind of have to take claims of "best practices" with a grain of salt.

I don't even really engage in discussions on "best practices" anymore. Not because I'm cynical, but because they just lead nowhere. Write some code that runs, show it to me and we'll talk.


you know what grinds my gears? when I go looking for help on a problem and the answers I get are "don't do that it's not best practices". like not okay here's how you do it but you shouldn't but just flat out I'm not going to tell you. that's the most arrogant/presumptuous and consistent thing I've ever dealt with and it's absolutely unique to software development. It completely discounts the individuals personal experience (and abilities to make the right decision for themselves) and their circumstances (the project itself) and it's always around something dogmatic/a reflection of fanboyism. most recently on #reactjs I wanted to know if there were a way to have locally scoped styles in react like in Vue - I got only condescension about how react is for "bigboys" and hence eschews the practice. Finally someone said hey maybe try styled-components, which was basically close enough and has 16,000 stars on GitHub (so I doubt it's "worst practices"). software devs are some of the most zealous people I've ever met - everything is an opportunity to bike shed and every question or difference of opinion is an opportunity to quarterback someone else's code base.


Best practices are a very wide category, don't parse HTML with regular expressions is the kind of thing you really should just say don't do that. Camel case is the middle ground: it's a good idea, but personal preference may show up, yet people get just as dogmatic about it.

IMO, what trips people up is that when the wisdom says "don't use Oracle products", it's completely accurate but not that helpful when you inherit a huge mess.


>Best practices are a very wide category, don't parse HTML with regular expressions is the kind of thing you really should just say don't do that.

I call BS.

"Don't have a continuously running service parse arbitrary HTML with regular expressions" would be a bad thing.

Parsing specific, given, HTML files, with known structure to the dev, with regular expressions (e.g. as part of a one-off scraping script) is totally, absolutely, fine.

If my HTML file is like:

  <html>
  <body>
  <a href="xxxxx">foo</a><br>
  <a href="xxxxx">foo</a><br>
  <a href="xxxxx">foo</a><br>
  <a href="xxxxx">foo</a><br>
  <a href="xxxxx">foo</a><br>
  <a href="xxxxx">foo</a><br>
  <a href="xxxxx">foo</a><br>
  </body>
  </html>

I can parse it just fine with regex.

And even "parsing" is the wrong word: "extracting the information I want" is a better term. I won't be parsing anything in the AST sense.
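As a minimal sketch of the kind of extraction described above, assuming a file shaped like the sample (one anchor per line, double-quoted `href`, plain link text), Python's standard `re` module is enough. The URLs and link texts here are made up for illustration.

```python
import re

# Stand-in for the sample file above; the URLs and link texts are invented.
html = """<html>
<body>
<a href="https://example.com/1">foo</a><br>
<a href="https://example.com/2">bar</a><br>
</body>
</html>"""

# Assumes exactly the known shape: <a href="...">text</a>, no other attributes,
# no nested tags inside the link text.
LINK_RE = re.compile(r'<a href="([^"]*)">([^<]*)</a>')

# With two capture groups, findall returns a list of (url, text) tuples.
links = LINK_RE.findall(html)
for url, text in links:
    print(url, text)
```

The pattern encodes every assumption about the file's shape, which is exactly why it only holds for a file whose structure is known in advance.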


As someone who has created more hacks like you describe than I care to admit, you actually make the case of precisely why you don't parse HTML with regex.

Because that code works fine for 5 years until it blows up after something upstream changes. From an ROI standpoint this might be perfectly fine if you're still around and remember how to fix it - it's a 5 minute fix after all! But if you've left, or if that system has become old, crufty and fragile, you may have just burned 3 days tracking it down.

If someone is asking this question, it is entirely appropriate to tell them "don't ever do that" - they obviously lack the experience and big picture thinking to know when it is appropriate to deviate due to business reasons. If you leave it at that I agree you're being a dick - whenever shooting someone's idea down you had damn well better have some appropriate alternative options.


"Because that code works fine for 5 years until it blows up after something upstream changes."

I believe the point here is exactly that nothing will change, because the regex in question isn't for a service, it is just for that specific file, right now, today. I've regexed specific HTML files myself too, because even though I am very comfortable with XPath and Beautiful Soup and tree representations in general, regexes are even easier on a static file like that.


When I was young, I was asked to print some address labels, so I wrote a quick super-short BASIC program to parse a specific file, right now, today, and got those labels printed. Made all sorts of assumptions, but it didn't matter, because it was a static file, and nothing could ever change in it.

Three or four years later, I was having lunch with the guy I'd done that for when he got a phone call. Turns out they were using that program of mine to print labels monthly now, and one of the completely-safe assumptions I'd made years earlier had bitten them. Fortunately, he knew how to resolve it easily, but I learned a very important lesson that day.


What lesson was that?

From my perspective, this other guy was using a program outside of its design spec. That can be fine, but such a thing is prone to the problem of safe assumptions suddenly not being safe. You wrote the program for the problem you needed it for, and it worked fine for that. Unless part of the problem at the time was "we expect to use this program for a couple years", there is no reason to make it overly robust.


"right now, today" and "that specific file"

Code designed to be used once has a habit of sticking around.



There is premature optimization and then there is not being stupid. Using one of the 10,000 XML/HTML parsers to get to v0.01 takes about as long on average and avoids ridiculous problems.

The only thing regex has going for it is you might already know it. But this is one of those times when learning something is worth it.

PS: XP is doing a depth first search, which can be extremely wasteful.


Was the lesson that there's a fortune to be made in software maintenance?


We're not talking about one-liners used on a single file here (as far as I could tell) - if that's the case, use whatever quickest hack you can think of to get the job done. Use-once coding is entirely different - everyone loves some code golf once in a while, but it doesn't mean you commit that to git :)

For re-usable tooling though, after some time you tend to avoid design patterns that you've personally witnessed break down repeatedly and cause issues.


"We're not talking about one-liners used on a single file here (as far as I could tell)"

Well, we are, because to quote coldtea, the person you directly replied to, "'Don't have a continuously running service parse arbitrary HTML with regular expressions' would be a bad thing. Parsing specific, given, HTML files, with known structure to the dev, with regular expressions (e.g. as part of a one-off scraping script) is totally, absolutely, fine." In context the first sentence clearly means that running the service like that would be a bad thing (hooray English and its deep ambiguities in double negatives).

I still wouldn't necessarily be too upset about telling a junior dev to be suspicious of REs or avoid them (for instance, parsing HTML with REs typically requires using non-greedy matches, and there are some subtleties around that), but if you know what you're doing on a quick job it's fine. If you're a senior dev and you still haven't figured out what's likely to entrench itself and what really is a one-off job, well, that's your real problem. I haven't been surprised about what gets entrenched in quite a while.
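One of the subtleties around non-greedy matches mentioned above can be shown in a few lines. This is only an illustrative sketch using Python's `re` module; the sample strings are invented.

```python
import re

line = '<a href="/a">first</a> <a href="/b">second</a>'

# Greedy: .* runs to the LAST '">' on the line and swallows both links.
greedy = re.findall(r'<a href="(.*)">', line)

# Non-greedy: .*? stops at the first '">' and yields one match per link.
lazy = re.findall(r'<a href="(.*?)">', line)

# The subtlety: non-greedy still overshoots when another attribute follows,
# because it keeps expanding until it sees the literal '">'.
tricky = '<a href="/a" class="x">first</a>'
lazy_tricky = re.findall(r'<a href="(.*?)">', tricky)

# A negated character class states the intent directly: stop at the next quote.
negated = re.findall(r'<a href="([^"]*)"', tricky)

print(greedy)       # ['/a">first</a> <a href="/b']
print(lazy)         # ['/a', '/b']
print(lazy_tricky)  # ['/a" class="x']
print(negated)      # ['/a']
```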


There are plenty of disposable code situations where the code won't break after 5 years, because it does not exist after two months. Also, failing regexp on likely generated html like that likely won't take 3 days to find. That is a ridiculously high estimate even for a messy codebase.

You are trying to make estimates and decisions for a project you know literally nothing about. That is about the worst practice of them all.

In case the question was asked by someone lacking the experience - well then it is absolutely ok to learn regexp by trying to parse info from a few downloaded files of a toy scraped site. Having a newbie fight like this just to learn makes no sense.


> failing regexp on likely generated html like that likely won't take 3 days to find

This is true, but only because you're implying that the reason (failing regexp on HTML input) has already been identified. Maybe the parsing job fails silently and then people down the line (who have no idea that the data was passed around as HTML at some point) wonder why no data is coming in anymore, or why they're seeing incomplete data. That could take weeks to trace back to the parsing problem.


Silent failure seems to be the root cause of the delay in the situation you described. AST-based parsing can be made to swallow errors too.
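The silent-failure concern in the last two comments applies to either approach, and one hedge is to make the extraction step fail loudly when it finds less than expected. A minimal sketch, where `extract_links` and `expected_min` are hypothetical names chosen for illustration:

```python
import re

LINK_RE = re.compile(r'<a href="([^"]*)">')

def extract_links(html, expected_min=1):
    """Return all hrefs, raising instead of quietly returning too little."""
    links = LINK_RE.findall(html)
    if len(links) < expected_min:
        # Fail at the source, so nobody downstream has to trace missing data.
        raise ValueError(
            f"expected at least {expected_min} links, found {len(links)}"
        )
    return links

print(extract_links('<a href="/x">x</a>'))

# Upstream switched to single quotes: the pattern no longer matches,
# and the check turns that into an immediate, visible error.
try:
    extract_links("<a href='/x'>x</a>")
except ValueError as err:
    print("extraction failed:", err)
```

The same count check could wrap a parser-based extractor just as well; the point is the check, not the extraction technique.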


And parsing the HTML to an AST won't work either in 5 years: the format of the tree will probably have changed and you will be getting index errors.


Serious question... why would any other approach be exempt from the problems you stated?


Nothing is exempt, some things are far less fragile than others though.


But they are also significantly slower and consume hundreds of times more memory, and will still break with the first little change in the DOM tree on the path that you rely on - which smart regexps can sometimes handle. For instance if you're looking just for a title and a price of a single product (common requirement for spiders), you can extract that with regexp without caring about the html page at all, just pick the <h1> and something that looks like a dollar value. I had spiders like that keep working after a full redesign of the site where even the platform was moved from WP to Magento. No DOM parser could handle that, plus they would be too slow in the first place. You need to pick the right tools for the job, everything else is BS...
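A rough sketch of the structure-agnostic spider described above: take the first `<h1>` and the first thing that looks like a dollar value, ignoring everything else. The two page layouts and the function name `extract_product` are invented for illustration, and a real page with several prices or headings would need a more careful pattern.

```python
import re

# First <h1>, with or without attributes; DOTALL lets the title span lines.
H1_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
# A dollar sign followed by digits, optional thousands commas and cents.
PRICE_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")

def extract_product(page):
    """Return (title, price) from a product page, or None for a missing part."""
    title = H1_RE.search(page)
    price = PRICE_RE.search(page)
    return (
        title.group(1).strip() if title else None,
        price.group(1) if price else None,
    )

# Two hypothetical layouts for the same product, before and after a redesign.
old_layout = '<div id="main"><h1>Blue Widget</h1><span class="price">$1,299.00</span></div>'
new_layout = (
    '<article><header><h1 class="product-title">Blue Widget</h1></header>'
    "<p>Now only <b>$1,299.00</b></p></article>"
)

print(extract_product(old_layout))
print(extract_product(new_layout))
```

Both calls return the same pair because the patterns never refer to the surrounding tree, which is the trade-off being argued for: resilience to layout changes in exchange for trusting that the first heading and first dollar amount are the right ones.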


That is a trap and why people give that advice so freely. Sure, today you just want to parse these files just the once and never again, ever, really I mean it.

Can you determine if it's <html><body>Yes</body></html> vs. <html><body>No</body></html>? Well, sure, and even more complex operations also 'work'.

However, if you control what's going on then don't use regular expressions it's wasteful overhead. If you don't control it then you can't tell if it will stay like that, which is the trap. Changes that don't alter the meaning, like swapping the order of attributes, will often force an update to your regular expression.

PS: It's also why these are called best practices not physical laws.


>If you control what's going on then don't use regular expressions it's wasteful overhead.

Not really, if you control what's going on it's a very fast option -- use awk, sed, or your scripting language's regex lib, and get the values you need. End of story.

Nobody cares if you can micro-optimize it with lower level string parsing that doesn't use a regex engine -- and the regex engine might end up being faster than that anyway.

>If you don't control it then you can't tell if it will stay like that, which is the trap.

No, but I already covered that in my original comment. One can do it to extract values for a specific, known html file, I said.

Besides,

(a) a lot of changes can still be caught by regex -- e.g. the order of attributes doesn't matter if the regex checks for that specific attribute alone (e.g. a href in an a tag).

(b) other changes will also fuck any non-regex, parser-based lib. E.g. if the nesting changes or an element is moved to another section. It's not like you don't have to update scripts written with e.g. BeautifulSoup or some HTML parser when the document changes.
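Point (a) can be sketched by comparing a regex that looks only for the `href` attribute with a small parser-based collector built on Python's standard `html.parser`. The sample tags and the helper names `HrefCollector` and `hrefs_via_parser` are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Looks for href anywhere inside an <a ...> tag, regardless of attribute order.
# Assumes double-quoted values; it would also accept a name ending in "-href".
HREF_RE = re.compile(r'<a\b[^>]*\bhref="([^"]*)"')

class HrefCollector(HTMLParser):
    """Collects the href of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)

def hrefs_via_parser(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs

# The same link before and after an attribute reorder.
before = '<a href="/item/1" class="link">one</a>'
after = '<a class="link" href="/item/1">one</a>'

print(HREF_RE.findall(before), HREF_RE.findall(after))
print(hrefs_via_parser(before), hrefs_via_parser(after))
```

Both approaches survive the reorder here. Neither says anything about point (b): code that depends on where an element sits in the document has to be updated when the nesting changes, whichever tool reads it.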

>It's also why these are called best practices not physical laws.

That's for those who advocate them to understand, and not to hand them out as if they were physical laws: "NEVER PARSE HTML WITH REGEX".


I don't mean CPU overhead.

I mean you are creating a brittle process where it's hard to verify that it actually worked, unless you have access to the data some other way, in which case it's mostly pointless. Regular expressions are not going to tell you if one of the files just happens to have bad data or any number of things that's going to cause problems.


In the case of processing HTML with regex there is a strong theoretical argument against it: regular expressions describe type 3 (regular) languages while HTML is described by a type 2 (context-free) grammar. Type 3 being a strict subset of type 2, regular expressions are not expressive enough to handle the grammar of HTML.

In most cases of best practices however, such a mathematical argument doesn't exist. So, it's more of a battle of (not so) informed technical opinions and experiences, where drawing a clear line is impossible.

Reading list:

https://en.wikipedia.org/wiki/Chomsky_hierarchy

https://stackoverflow.com/a/14207715
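The nesting argument can be made concrete with a small sketch. Python's `re` module has no recursion construct, so a pattern cannot pair up opening and closing tags at arbitrary depth, whereas anything that keeps a counter or stack can. The input string and the `DepthTracker` class are invented for illustration.

```python
import re
from html.parser import HTMLParser

nested = "<div>outer <div>inner</div> tail</div>"

# The non-greedy pattern stops at the FIRST </div>, which belongs to the
# inner element, so the captured "content" of the outer div is cut short.
m = re.search(r"<div>(.*?)</div>", nested)
print(m.group(1))  # outer <div>inner

class DepthTracker(HTMLParser):
    """Tracks <div> nesting depth with a counter, which a regex cannot do."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

tracker = DepthTracker()
tracker.feed(nested)
tracker.close()
print(tracker.max_depth, tracker.depth)  # 2 0
```

This limitation concerns parsing nested structure in general; it does not by itself rule out pulling flat values out of a file with a known shape, which is the case argued elsewhere in the thread.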


>don't parse HTML with regular expressions

Even if I agree with you (I don't know because I've never been in the situation where that was a thing I was considering and weighing against other options) what I'm arguing here is that if someone asks how to do that then you should either tell them along with the admonishment or not say anything at all. But don't go around being a dick by yelling "haha that's the dumbest thing ever and I'm not going to tell you because of best practices"


While I agree with you in general (i.e. educate people or just stay silent instead of pontificating/being condescending if not rude), what would you say about the canonical answer on StackOverflow, then?

https://stackoverflow.com/questions/1732348/regex-match-open...


meh. the tone isn't so bad but, while at first i thought otherwise, it doesn't really comprehensively explain why it's a bad idea (on first skim i thought most of the content of the answer was an explanation of the difference between a state machine and cfg or something like that). i think a good model for how to answer questions like this is really the same as you'd answer kids' questions as a parent (or maybe anyone's questions?): a flat out no is completely unsatisfactory and lazy. a no with a measured explanation is good.


Again, agreed. But what about when something is pretty much wrong and at the same time something that everyone tries to do because it superficially looks like a good idea?

The reaction on StackOverflow was because they were getting dozens (hundreds) more or less identical copies of the same question. It was the same with "parse email via regexp" but I understand that more recent versions of the RFC make this doable.


> you really should just say don't do that

You should also say WHY not to do that.

This lets whoever asked decide whether the reasoning applies to their own unique situation. Perhaps they are doing a one-off scrape of a single file with consistent structure like in the other comment.

The asker can also apply the same reasoning to similar situations. They won't come back to ask if they should use regex to parse JSON tomorrow.

And if you can't explain why something is "best practice", maybe it shouldn't be "best practice".


> you really should just say don't do that

This is counterproductive unless you also say what to do instead.


I agree. If you ask any kind of question on code optimisation on stackoverflow you’ll be bombarded with “profile and see” responses, or “don’t bother, your compiler can do a better job” or “use an existing library”, but... profiling will only tell me what’s best on my current hardware, not in the general case. It doesn’t help me understand typical performance characteristics of common operating systems or hardware. It also doesn’t help me with coming up with new better approaches that I don’t know about (which is why I’m asking the question in the first place, to learn what I don’t know). The compiler can do a decent job, but it’s not magic and it’s often pretty easy to beat the compiler or at least help it (e.g. by following Mike Acton’s advice: the compiler often can’t make cache-inefficient code cache-efficient). Finally, regarding using libraries, sure, in production code I probably will, but I want to learn! If nobody learns, who will write tomorrow’s libraries?


A similar frustration I have is when you ask a specific question and get a "Why would you do it that way? Rearchitect your app this way!". That's not reasonable, outside the smallest of applications, regardless of if the underlying advice is sound.


Oh man this brings back memories. I learned golang for/during a performance coding competition. It was extremely frustrating asking questions about unsafe, pointer casting, and other nuts and bolts in golang-nuts and the irc channel.

People were flat out refusing to answer very straightforward questions because "what are you trying to do", "that's not safe", "you shouldn't have to do that".

Other fun times; trying to clear the dirty bit on an ntfs file system, and probing into details only "library writers" should have to worry about.


There was a discussion earlier in life about how Django didn't support prepared statements. I didn't believe this, as no ORM I've ever used didn't use prepared statements by default. I asked around, trying to verify this claim. I got "what are you trying to do?" I'm trying to find out if Django's ORM uses prepared statements by default. Outside of that one ticket in Google search results (I'm not tuned into that ecosystem, so I can't discern if it's reliable proof or not), it seems like nobody really knows.


I work on a lot of less-usual stuff. Kiosks, signage display drivers, interactive games running on touch screen walls, etc.

I quite frequently run into the problem of the answer to something simply being "that's not best practice", "that's insecure" or even "that's user-hostile".

I don't /care/. This is getting installed on a Raspberry Pi and installed inside of a touchscreen coffee table and towed across the country. It doesn't matter that this non-internet-connected device may be vulnerable to other websites accessing its webcam. It doesn't matter that hiding scrollbars and cursors is normally user-hostile.

Go ahead and make the point that it's not best practice, insecure, or user-hostile... But don't make that your entire answer.


I might end up writing React code at any random time in my future, and I'm currently curious about how one would do the equivalent of "VueJS component computed properties" in React.


Please consider putting in the effort to capitalise the start of sentences properly, and possibly split up your post into multiple paragraphs.

Whether you like it or not, I and many others instinctively use grammar, spelling and punctuation as a signal about the mind behind the post. Plus, it makes your post easier to read - posts that aren't just a continuous flow of text with no break are easier to read.

My point is - there is an easy way to give your post a greater chance of influencing the thinking of more people.


Sounds like the community might be a reason to use Vue over React ;)

In all seriousness this is a problem I've run into before as well. I once reported a bug and the lead developer of the project responded saying he didn't think the bug existed. Setting the hostname in the config to the server's IP address was irreversible even if you changed it later -- the site would redirect back to the raw IP and outgoing emails would have the raw IP in them. Many other users were commenting about the issue with no response from the maintainer, and I ended up finding a workaround using one of the "developer options" which was really just a one line Ruby config change. Then, the author wrote like a one paragraph response saying that you should never set up the software like that and not to use that workaround because it's a "developer option". This bug went unfixed for about 2 years after that. And this is a relatively common and high profile piece of open source software. The whole thing was just bizarre, especially given how positive my experiences with other open source projects (some third party Spring packages and Go Dep in particular) have been.


>Sounds like the community might be a reason to use Vue over React ;)

I've run into these "no I'm not telling you because of best practices" people in every single "community"


I'm sure -- the only times I haven't are when I haven't even had the opportunity to. I was just joking around.


Grrr.

I’ve always designed away the sharp edges. Because I don’t want the 4:00a wake up call from panicky customers.

Which is why all my devs also did rotations in tech supp, QA/test, build monkey, etc.


We’ve iterated on our process quite a bit for my startup. I agree that sometimes you need a completely new approach, but more often than not what already works for another team will most likely work for us with some small tweaks. What I find works pretty well is to think about the problem, apply our existing knowledge and experiences to the solution, make any obvious edits for our unique situation, then test it out. From there, we make tweaks if needed to get to what works, or we completely scrap what we know and try something completely new. Process, like product, requires iteration. But process is also built off conventions and modeled off core human behaviors. Chances are the wheel doesn’t need to be completely reinvented. My advice would be to fork what works on the process side, and use all your creative energy to come up with something new on the product side.


I suggest that you work on making your statements more concise. For example, if I'm not missing anything, all of this could be shortened to:

> We apply best practices most of the time to save time, but we usually tweak them for our concrete situation. Be sure to spend most of your energy on your core competence instead of bikeshedding over processes.


At least you didn't plug a link this time!


Every time I hear someone defend something as a best practice without any other justification, I mentally switch “best practice” with “cargo culting” and lose no information.

Sometimes the best practice in question does have value and can be articulated. But in those cases the articulated reason makes a better argument and you don’t hear the phrase “best practice” quite as often. When it is just cargo culting, the only defense is to repeat “best practice” over and over again.


I prefer to think of the term as "sensible default". A "best practice" is usually not the worst default to have, and if you don't have the time or bandwidth to dive into a problem and just want to pick a solution from a hat, you could do a lot worse.

The danger comes in that when you want to supplant the best practice with a new practice, you have to genuinely understand the problem in your context, and be able to articulate a full explanation about why the new practice has an advantage.

But there is definitely cargo culting around best practices - see using Redux. Core features of your app shouldn't be left to just acceptable defaults, but instead you should grapple with those problems and choose the best possible answer you can at the time.


>The danger comes in that when you want to supplant the best practice with a new practice, you have to genuinely understand the problem in your context, and be able to articulate a full explanation about why the new practice has an advantage.

We're experiencing that exact problem - we're introducing for the first time a company wide best practice, which isn't even defined yet and in a constant state of flux, but it's become a shield the strongest evangelists stand behind "why would you want to do that? It's not best practice?" - the "best practice" might not last the month, and it's not clear why that's the "best practice", but it's become a magic seal of approval.


Two years into my programming career, there's still almost nothing I do that isn't just implementing the best way someone else thought to do it. And I still have a lot more to learn before I would start off on my own. My only problem with best practices is that sometimes there are too many and living up to them all would take all your time. And some of them are just evangelists for niche ideas and it's hard to tell.


"Don't use Goto," "Don't use Regex against HTML." Worse than not following best practices is accepting them without demur. The insight dawned on me when I learned how Chromium's team doesn't use continuous branching. The codebase moves so fast that committing to a single branch is a much better option [1]. I highly doubt Google's engineers made that choice out of sloppiness.

Although, I think it's not simple to foresee long-term implications of our decisions. And, best practices do fill that hole. At the same time, programmers should keep an open mind to use a solution if it's a radically simpler choice.

[1]: https://medium.com/@aboodman/in-march-2011-i-drafted-an-arti...


This Wikipedia page explains the nuance of best practice: https://en.wikipedia.org/wiki/Best_practice

Not following best practices is more likely to result in inefficiency. But the key word here is "practice".

Practice dancing a lot and you will be able to dance well, in the form of dancing you practiced. There are other forms of dance and thus other forms of practice. Practice for lindy hop is going to differ from practice for tango, or jazz. And the best practice for lindy will result in the best lindy dancing, as it has been developed over time and has the best outcomes.

But if for some reason you need to do lindy hop in, say, a really cramped space, the practice will have to change. Best practice doesn't mean only practice.


One benefit of best practices is standardizing the way things are done. I have worked on a project where the original developer had invented a "thin models fat controllers" philosophy for Django (best practices are fat models thin controllers). What a mess that was.....


When they moved us, programmers and related staff, to "cubettes" -- not even full cubes, but rather a corner in a shared three-sided "pen" with low walls. Your neighbor's shoulder three to five feet away from you. Cube meetings crowding into their half of the pen. Conversations shouted willy-nilly across the open floor plan.

They called that the "best practice".

"Best" is in the eyes of whoever's calling it a "best practice".


I'm being a tad pedantic, and I agree with the spirit of the article, but this just seems like a result of the overuse of the phrase 'best practices'.


Exactly. "Best" practices don't always turn out to be the best, after all. "Better" practice might be a better (heh) fit.

It's often more useful to look at anti-patterns and avoid those (still with a grain of salt) than it is to go looking for a pre-existing "best" solution to your specific problem.


Don't get me started on the other best, "Best of Breed". Nothing in software or programming is a binary system, best and dreadful.


Best practices = minimally acceptable

If everyone is doing it, how can it possibly make you the best?


B2B business is based upon best practices, though. You cannot have a startup mentality (hack or disrupt) when the legal consequences would outweigh the possible gain.



