desc's comments

Apropos of nothing, and anecdotally: 2 years seems to be about the length of time it takes for a solution to become the next problem.


Always get out before you have to answer hard questions about what you did six months or a year ago.


Silver bullets. They don't exist.

Code review. Read the results of someone thinking through a process. Spot more than they will, simply by throwing more eyes at it. Actually fairly effective: getting a senior dev to cast even a lazy eye over everything gives more opportunities to discuss Why It's Done This Way and Why We Don't Do That and Why This Framework Sucks And How To Deal With It with specific concrete examples which the other dev is currently thinking about. But it's still easier to write the code yourself than review it, and things still get missed no matter how careful you try to be, so it's still just another layer.

Unit tests. They cover the stuff we think to check and actually encountered in the past (i.e. regressions). Great for testing abstractions, not so great for testing features, since the latter typically rely on vast amounts of apparently-unrelated code.

Integration tests. Better for testing features than specific abstractions, and often the simplest ones will dredge up things when you update a library five years later and some subtle behaviour changed. Slow sanity checks fit here.

UI-first automation (inc. Selenium, etc). Code or no-code, it's glitchy as hell for any codebase not originally designed to support it; it tends to get thrown out because tests which cry wolf every other day are worse than useless. Careful application to a few basics can smoke-test situations which otherwise pass internal application sanity checks, and systems built from the start to use it can benefit a lot.

Manual testing. Boring, mechanical, but the test plans require less active fiddling/maintenance because a link changed to a button or something. Best for exploratory find-new-edge-cases, but throwing a bunch of students at a huge test plan can sometimes deliver massive value for money/coffee/ramen. Humans can tell us when the instructions are 'slightly off' and carry on regardless, distinguishing the actual important breakage from a trivial 2px layout adjustment or a CSS classname change.

So that's the linear view. Let's go meta, and combine techniques for mutual reinforcement.

Code review benefits from local relevance and is hampered by action at a distance. Write static analysers which enforce relevant semantics sharing a lexical scope, i.e. if two things are supposed to happen together, ensure that they happen in the same function (at the same level of abstraction). Encourage relevant details to share not just a file diff, but a chunk. Kill dynamic scoping with fire.
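A minimal sketch of what such an analyser can look like in Python, using the standard ast module. The paired names (begin_transaction/commit) are invented for illustration; the real rule would be whatever pair of operations your codebase insists must appear together:

    import ast, sys

    # Hypothetical rule: any function calling begin_transaction() must also call
    # commit() in the same function body, so both ends of the operation share a
    # lexical scope and show up in the same review chunk.
    PAIRED = {"begin_transaction": "commit"}

    def called_names(func_node):
        names = set()
        for node in ast.walk(func_node):
            if isinstance(node, ast.Call):
                f = node.func
                if isinstance(f, ast.Name):
                    names.add(f.id)
                elif isinstance(f, ast.Attribute):
                    names.add(f.attr)
        return names

    def check_file(path):
        tree = ast.parse(open(path).read(), filename=path)
        problems = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                calls = called_names(node)
                for opener, closer in PAIRED.items():
                    if opener in calls and closer not in calls:
                        problems.append(f"{path}:{node.lineno}: {node.name}() "
                                        f"calls {opener} but never {closer}")
        return problems

    if __name__ == "__main__":
        failures = [p for f in sys.argv[1:] for p in check_file(f)]
        print("\n".join(failures))
        sys.exit(1 if failures else 0)

Run it over the diff's files in CI and the reviewer no longer has to hunt for the matching call somewhere else in the codebase.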

Unit and integration tests can be generated. Given a set of functions or types, ensure that they all fit some specific pattern. This is more powerful than leveraging the type system to enforce that pattern, because when one example needs to diverge you can just add a (commented) exception to the generative test instead of rearchitecting lots of code, i.e. you can easily separate sharing behaviour from sharing code. Write tests which cover code not yet written, and force exceptions to a rule to be explicitly listed.
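For example, a minimal pytest-style sketch; the module, the Handler base class, the undo rule and the exemption are all hypothetical stand-ins for whatever pattern your system actually enforces:

    import inspect
    import pytest

    import myapp.handlers  # hypothetical module containing the types under test

    # Exceptions to the rule live here, commented, in one place.
    KNOWN_EXCEPTIONS = {
        "LegacyImportHandler",  # predates the undo framework (hypothetical example)
    }

    def all_handlers():
        base = myapp.handlers.Handler
        for _, obj in inspect.getmembers(myapp.handlers, inspect.isclass):
            if issubclass(obj, base) and obj is not base:
                yield obj

    @pytest.mark.parametrize("handler", all_handlers(), ids=lambda h: h.__name__)
    def test_every_handler_supports_undo(handler):
        if handler.__name__ in KNOWN_EXCEPTIONS:
            pytest.skip("explicitly exempted; see KNOWN_EXCEPTIONS")
        assert callable(getattr(handler, "undo", None)), (
            f"{handler.__name__} must implement undo(), or be listed "
            "(with a reason) in KNOWN_EXCEPTIONS"
        )

New handlers added later are picked up automatically, and any divergence has to be written down next to the rule it breaks.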

UI testing is rather hard to amplify because you need to reliably control that UI in abstractable ways, and make it easy to combine those operations. I honestly have no idea how to do this in any sane way for any codebase not constructed to enable it. If you're working on greenfield stuff, congratulations; some of us are working on stuff that's been ported forwards decade by decade... Actual practical solutions welcome!

That's my best shot at a 2D (triangular?) view: automated tests can enforce rules which simplify code review, etc. The goal is always to force errors up the page: find them as early as possible as cheaply as possible and as reliably as possible.

The machine can't check complex things without either missing stuff or crying wolf, but it can rigidly enforce simple rules which let humans spot the outliers more easily.

And it is amazing how reliable a system can become just by killing, mashing and burning all that low-hanging error-fruit.


Code reviews should be about project structure, abstractions, and keeping the team on a common approach instead of each member doing whatever they like; syntax and formatting should be linted and formatted automatically, leaving nothing there for the reviewer. The second purpose is having a second pair of eyes check that they understand the code in question the same way.

Unit and integration tests should not be generated. They should be written by people when they find the code they are writing doing something complex, like a specific calculation. Writing them is more a tool for understanding what you are doing; then maybe you leave some tests behind for regression. But don't generate BS tests that will only slow down the system and the people. People have to understand what is going on and be on top of it, and never "just run the tests", because tests that pass green but are actually wrong are really bad.

UI testing should not be abstractable: it should only augment manual UI testing, so the tester should be automating their own work after having done it manually. That tester should also find the things that take a long time or have to be done repeatedly and don't change often, so they win time for more important things. The QA person should also always stay engaged with the system and the automation, because that is the only way you can keep the domain knowledge.


It depends how you count the unit/integration tests, really.

If you've got a general rule which must apply across an entire system, generate the necessary tests so that they fail granularly and don't require messing around to find the exact case which breaks. IMO that's one test, just applied to a range of cases.

An example might be mappings for Entity Framework (or similar ORMs, etc). Auto-generated migrations simply do not work if you need to limit migration downtime and maintain certain data invariants (which can't be specified in the schema, and yes, those always exist). So you need to write database migrations manually. This introduces the risk of the entity mappings and the schema drifting out of sync.

So don't just spot-test roundtripping entities (a nontrivial system will have hundreds and something always gets missed). Instead, write a tool to introspect the DB schema and the Framework's mappings, and check that they match sufficiently closely. Every time someone adds an entity or property or something, it's already covered.
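The same idea sketched in Python with SQLAlchemy standing in for Entity Framework (the models module and connection string are hypothetical); one generated test per mapped table, so a failure points at the exact entity that drifted:

    import pytest
    from sqlalchemy import create_engine, inspect

    from myapp.models import Base            # hypothetical declarative models
    from myapp.settings import DATABASE_URL  # hypothetical connection string

    engine = create_engine(DATABASE_URL)

    def mapped_tables():
        return sorted(Base.metadata.tables.values(), key=lambda t: t.name)

    @pytest.mark.parametrize("table", mapped_tables(), ids=lambda t: t.name)
    def test_mapping_matches_schema(table):
        db = inspect(engine)
        assert db.has_table(table.name), f"{table.name} is mapped but missing from the DB"
        db_columns = {c["name"] for c in db.get_columns(table.name)}
        mapped_columns = {c.name for c in table.columns}
        missing = mapped_columns - db_columns
        assert not missing, f"{table.name}: columns mapped but absent from schema: {missing}"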

Similar cases exist when dealing with any interface between separate systems, especially when you don't control one of them. If you're regularly mapping between two models, use something like Automapper which can be asked to verify its mappings to check that every property is handled in some way.

(Granted, Automapper doesn't catch everything, but it builds a model that could probably be introspected over to spot encountered bugs and check that other possible examples of those bugs don't exist. Doing so generatively catches future additions of possible cases for free. If you're really paranoid, define some means of marking manually-written tests which cover each case, and test that a test exists for each case.)

Computers are really good at force-multiplication. They should trivially be capable of spotting other instances of known categories of bug. This is not hard to do and doesn't require woolly nebulous machine-learning shit: we've had introspectable ASTs since the dawn of compilers.


The only silver bullet is approximated by a holistic approach.


This.

Jira is a framework for building something which approximates your actual processes.

> Justin said “I wish Atlassian would sit down with real-world developers and design this product the way we need it to work.”

The way 'you' need it to work. The process 'you' use. I suppose it's possible to build a business on creating a completely bespoke ticketing system for each company which uses one. Over time, that business might gravitate towards building a general toolkit which can be configured for each use case...

Oh look! Jira!

I'd bet that everyone's got their own distinct idea of where subtasks, ticket relations, etc actually fit in their workflow.

The article clearly exists to sell LinearB's product, which will of course work great for people who have the specific problems LinearB are solving with their product.

The rest of us will configure Jira within epsilon of 'works', then spend an afternoon reading about APIs and bash out our own automation service to update X different management tools appropriately for our workflows, and then go down the pub to bitch about management.


I'm not sure why nobody provides a task-tracking tool that organizes tasks hierarchically (with chord/circular dependencies). This is the real world, regardless of the argument about subjective processes. If you can't see at a glance where in your workflow one task (even from another team's workflow) is impacting multiple other workflows, the tool is too primitive. Subtasks don't cut it.

JIRA is the most popular of all the tools on the market, which are glorified Gantt points scattered across disparate, fragmented views. This is an element of modern tech that makes large organizations hilariously inefficient.


Jira has many project types and products, and the one most commonly used is actually called Jira SOFTWARE. It literally has software in the name. If you're building a product for software companies, you can think about how to do it better specifically for engineers.


Some other commenters have mentioned environment variables as input.

IMO there are broadly two types of command: plumbing and porcelain. There's a certain amount of convention and culture in distinguishing them and I'm not going to try to argue the culture boundary...

For the commands which are plumbing (by whatever culture's rules), the following apply:

* They are designed to interact with other plumbing: pipes, machine-comprehension, etc

* Exit code 0 for success, anything else for error. Don't try to be clever.

* You can determine precisely and unambiguously what the behaviour will be, from the invocation alone. Under no circumstances may anything modify this; no configuration, no environment.

For the commands which are porcelain (by the same culture's rules, for consistency), the following apply:

* Try to be convenient for the user, but don't sacrifice consistency.

* If plumbing returns a failure which isn't specifically handled properly, either something is buggy or the user asked for something Not Right; clean up and abort.

* Environment and configuration might modify things, but on the command line there must be the option to state 'use this, irrespective of what anything else says' without knowing any details of what the environment or configuration currently say.

To make things more exciting, some binaries might be considered porcelain or plumbing contextually, depending on parameters... (Yes, everyone sane would rather this weren't the case.)
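To make the plumbing rules concrete, a minimal sketch of such a command in Python; the name and flags are invented, but the point is that its behaviour follows from the invocation alone and the exit code is boring:

    #!/usr/bin/env python3
    # filter-records: hypothetical plumbing. Reads tab-separated records on
    # stdin, writes matching records to stdout. No config file and no
    # environment variable changes what it does; exit 0 on success, nonzero on error.
    import argparse
    import sys

    def main(argv):
        parser = argparse.ArgumentParser(prog="filter-records")
        parser.add_argument("--field", type=int, required=True,
                            help="zero-based field index to match on")
        parser.add_argument("--value", required=True, help="exact value to match")
        args = parser.parse_args(argv)
        try:
            for line in sys.stdin:
                parts = line.rstrip("\n").split("\t")
                if len(parts) > args.field and parts[args.field] == args.value:
                    sys.stdout.write(line)
        except OSError as exc:
            print(f"filter-records: {exc}", file=sys.stderr)
            return 1    # anything nonzero is an error; don't try to be clever
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1:]))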


88. If the signaling value of a college degree is its most valuable part, then we are creating a society that values the appearance of success more than actual success.

Someone might be a bit confused about where the causality arrow is pointing.


I feel like, in most jobs, college functions less as a signal than as an idiot check, which is really valuable for people fresh out of college with few other ways to prove they're not complete idiots.

I'm not saying there is no signaling value, and no doubt some positions involve more signaling than others. I just don't think it's the dominant factor in the majority of cases.


Can you elaborate?


I meant that 'creating' is rather the wrong tense, and that this is probably a symptom that it's pretty well entrenched by now.


Ah, got it, thank you.


I prefer to internalise useful ways of thinking rather than leaning too much on tools, on the grounds that the latter are easy-come-easy-go while the former can last a lifetime.

Learn to explain what you're building, why, and more-or-less how it works in a context appropriate to the listener.

(Related: write things down. Again and again, differently, until you understand them.)

Problems and their solutions usually have similar structures.

Any fix right now might save a lot of money. The right fix next week could save years. Often both are appropriate, but the latter is nearly always most valuable.

The world does not change nearly as fast as your competitors want everyone to believe it does, but people's beliefs can.

A tool which warns you of possible mistakes before consequences occur is always more valuable than a tool which tries to guess what you meant and does that instead.


This, most of all. Substitute native language if not English; the important thing is that the project be defined and developed in both a human language and a computer language, so that mismatches can be identified and resolved.

  * Start by describing what you are trying to do.
    * Specifically.
    * Not 'build a web business to enable users to achieve their
      potential', not 'create another library for X but simpler'
      but *specifically what the software will do* and, most
      importantly, the priorities of scope increase (it'll happen
      anyway; just get the priorities and their order down in
      text ASAP).
    * Put it in a readme.txt or something.

  * For any given subsystem of the code (meaning: any section
    which can sort-of stand on its own) write another such file,
    in more detail.

  * Let these files guide your tests too.

  * Keep them up to date. If priorities change, *start* by
    updating the readmes. The code isn't immutable; nor is the 
    plan. But the plan comes first.

  * When unsure how a new subsystem or feature is going to work,
    write out your ideas and thought processes in text. Don't
    start coding at all for a day or two after you sketch out the
    basics. *Append* to this file instead of replacing large
    parts.
[edit] Wasn't intended to quote that part (sorry to mobile users) but I can never remember how to get bulleted lists on this site...


I found that combining this approach with writing down basic interfaces works really well: after I have a rough written idea, I iterate over the interface design with full descriptive comments on both the interface and the methods.


Have any samples/examples you can share?


Things like this: https://gist.github.com/antirez/ae068f95c0d084891305. Usually it's more detailed, with data structures, but this one was a very conceptual thing.


You mean like the OS is supposed to be?


Exactly. The browser becomes (basically) an OS running within the host OS, so s/he's invented a kind of half-cocked virtualisation. There's a joke in there somewhere about turtles all the way down.


1. Our customers run our software on their own machines for security and data-control reasons. As soon as something's running on someone else's hardware, the data is out of your control. Unless you're going to accept the (often massive) cost of homomorphic encryption, AND have a workload amenable to that, it's a simple fact.

2. Everything we do in house is small enough that the costs of running it on our own machines are far less than the costs of working out how to manage it on a cloud service AND deal with the possibility of that cloud service being unavailable. Simply running a program on a hosted or local server is far, far simpler than anything I've seen in the cloud domain, and can easily achieve three nines with next to no effort.

Most things which 'really need' cloud hosting seem to be irrelevant bullshit like Facebook (who run their own infrastructure) or vendor-run workflows layered over distributed systems which don't really need a vendor to function (like GitHub/Git or GMail/email).

I'm trying to think of a counterexample which I'd actually miss if it were to collapse, but failing.


https://stackoverflow.com/questions/898489/what-programming-...

Most languages have context-free syntax, which is what the article refers to. There really is no reason to sacrifice that. Even modern PHP recognises the value of having a parse tree independent of an entire compiler.

Context-free semantics is an entirely different matter, and I'm not even sure what it'd mean...


The two top answers are in conflict. The second answer, with 43 points, is closer to right:

> There are hardly any real-world programming languages that are context-free in any meaning of the word.

The first answer, with 41 points, is totally wrong:

> The set of programs that are syntactically correct is context-free for almost all languages

-----

A better source is this whole series by Trevor Jim, which has nice ways of relating theory to practice. He lists a bunch of reasons why you could consider nearly all programming languages not context-free.

http://trevorjim.com/parsing-not-solved/ -- hundreds of parser generators support context-free grammars, but there are almost no context-free languages in practice.

http://trevorjim.com/python-is-not-context-free/ -- A main point here is that you have to consider the lexer and parser separately. Grammars don't address this distinction, which arises in essentially all programming languages. Also there is some lexical feedback, similar in spirit to C's lexer hack.

http://trevorjim.com/haskell-is-not-context-free/

http://trevorjim.com/how-to-prove-that-a-programming-languag...

http://trevorjim.com/c-and-cplusplus-are-not-context-free/ -- the way LALR(1) conflicts are resolved in practice can make a language not context-free

(copying from a recent comment)


The implementation of D has a lexer that is independent of the parser, and a parser that is independent of the rest of the implementation. I have resisted enhancement proposals that would put holes in those walls.


Could you give an example for

> Any programming language that is massively adopted is context free?

I want to have an intuition on what is context free and what is not.

Is D context free?

I mean I want to make a useful language, is being context free practical?


D is context free in that the parser does not require semantic analysis in order to complete its parse.


The best way would probably be to work on a language that isn't context-free. When you run into the problems context-sensitivity introduces, then you'll really start to understand it.

In the past, Pascal was widely used in teaching contexts, for both learning to program and later how to implement a compiler. I've seen Walter say in the past that he got into doing languages from a series of articles in Byte magazine that included the listings for a small Pascal compiler. For your task, you can use Oberon instead, which is smaller and simpler than Pascal. Oberon-07 is context-free up to a point, which is another way of saying that it's not context-free. It'll work well for this exercise. Don't worry about writing a full-fledged compiler, because you'd end up spending forever on the code generator backend. Instead implement half of a compiler.

Here's one project that fits the description of being one half of a compiler—a documentation generator:

https://github.com/schierlm/OberonXref/

Implementing this requires doing the lexer and the parser, but its "backend" streams out documentation pages instead of object files or assembler. The truth is, you wouldn't even need to go that far. Just do the lexer and then start implementing the parser—let's say that your program should just read in source code and then say whether it's valid or not—whether it will parse. Since Oberon is mostly context-free and the natural way to do things is to use a recursive descent parser, you'll get into a rhythm implementing things, until you run into the part that's context-sensitive, at which point things will get uncomfortable. You will find true enlightenment then. This can be done over a weekend, but if you reach the end of the first weekend and realize it will take you two, don't sweat it.
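To make that rhythm concrete, here's a minimal sketch in Python of the "is it valid?" half, for a toy expression grammar rather than Oberon; each grammar rule becomes one method, and that one-rule-one-function correspondence is the rhythm in question:

    # Toy grammar:
    #   expr   := term (('+' | '-') term)*
    #   term   := factor (('*' | '/') factor)*
    #   factor := NUMBER | '(' expr ')'
    import re

    TOKEN = re.compile(r'\s*(\d+|[-+*/()])')

    def tokenize(src):
        tokens, pos = [], 0
        src = src.rstrip()
        while pos < len(src):
            m = TOKEN.match(src, pos)
            if not m:
                raise SyntaxError(f"bad character at {pos}")
            tokens.append(m.group(1))
            pos = m.end()
        return tokens

    class Parser:
        def __init__(self, tokens):
            self.tokens, self.i = tokens, 0
        def peek(self):
            return self.tokens[self.i] if self.i < len(self.tokens) else None
        def eat(self, tok):
            if self.peek() != tok:
                raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
            self.i += 1
        def expr(self):                     # one method per grammar rule
            self.term()
            while self.peek() in ('+', '-'):
                self.i += 1
                self.term()
        def term(self):
            self.factor()
            while self.peek() in ('*', '/'):
                self.i += 1
                self.factor()
        def factor(self):
            tok = self.peek()
            if tok == '(':
                self.eat('('); self.expr(); self.eat(')')
            elif tok is not None and tok.isdigit():
                self.i += 1
            else:
                raise SyntaxError(f"unexpected token {tok!r}")

    def is_valid(src):
        try:
            p = Parser(tokenize(src))
            p.expr()
            return p.peek() is None         # must consume all the input
        except SyntaxError:
            return False

    print(is_valid("(1 + 2) * 3"), is_valid("(1 + ) * 3"))   # True False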

> I mean I want to make a useful language, is being context free practical?

It is. If you do the exercise above, it should be straightforward to figure out several different ways to change the grammar to eliminate context-sensitivity in your parser. You'll realize that the transformations would be minor. They won't compromise the power or practicality of the language, and it's possible to end up with one that is almost exactly the same. The biggest casualty in this case would be reworking a dubious design decision made for aesthetic reasons.


Thank you very much for the references. They are very good reads!

In the article about Python, the author said:

> Most languages of interest are context sensitive, and yet we are trying to use inappropriate tools (context-free grammars and context-free parser generators) to specify and parse them. This leads to bad specifications and bad implementations.

Then what do you think is a better tool to specify and parse languages?


> what do you think is a better tool to specify and parse languages?

That's unfortunately an open problem. As I mentioned here, lexing is "solved" but parsing isn't:

https://github.com/oilshell/oil/wiki/Why-Lexing-and-Parsing-...

Here are what some recent languages do:

Go: Maintain a written spec [1], with multiple hand-written parsers. (I think the main one is in Go, while gccgo has one in C++?) I believe Go used to be parsed with yacc, but they moved to a hand-written parser for error messages. It looks like [2] was an early attempt to keep it generated.

[1] https://golang.org/ref/spec

[2] https://research.swtch.com/yyerror

-----

Rust: The parser is hand-written, like Go's. There's a grammar in the spec, but like Go's, it's not used to generate code, so it isn't tested. They also have an LALR(1) grammar which was supposed to be an executable specification, but as far as I can tell that effort has not made progress recently.

It used to be done with GNU yacc, but it looks like they moved to a Rust-based tool [3].

[1] https://doc.rust-lang.org/grammar.html

[2] https://github.com/rust-lang/lang-team/tree/master/working-g...

[3] https://github.com/rust-lang/wg-grammar/tree/master/grammar

-----

So basically the state of the art is fairly laborious: write down the grammar, implement it by hand, and try not to make any mistakes. If you need to implement it again, which is extremely common (e.g. for gccgo, for Rust's language server), also try not to make any mistakes.

When you need to change the language, update the spec, and change all the parsers by hand.

I've written multiple recursive-descent parsers based on grammars, and it's not hard if you have examples to test with, but mistakes can easily creep in.

Rely on users to report bugs. There is a "long tail" of programs that will tickle various corner cases.

-----

Here's the approach I took with Oil:

How to Parse Shell Like a Programming Language http://www.oilshell.org/blog/2019/02/07.html

which is basically to do as much work as possible in the lexer, use a high-level programming language translated to C++ for the parser, and to use a DSL for specifying the AST. This approach keeps the mistakes and tedium down, because shell is an extremely large language syntactically. (It could be the biggest of any language except perhaps Perl. Shell is relatively small semantically, but huge syntactically, which is why I'm preoccupied with parsing :) )


> A main point here is that you have to consider the lexer and parser separately. Grammars don't address this distinction, which arises in essentially all programming languages.

So why does this happen? Couldn't context-free language parsers emulate a lexer by treating individual characters as symbols?


If there's no feedback, you can consider the lexer and parser separately as languages.

- The lexer recognizes a set of strings of characters.

- The parser recognizes a set of strings of tokens (as returned by the lexer)

But those two languages will have different power in general, so, like Jim points out, when you say something like "Python is context-free" or "Haskell is context-free", it's not clear what you're talking about. And it's really false under any reasonable interpretation.

So you can consider them separately, and make a precise statement. But if there's feedback between the two, then you can't do that anymore. The theory doesn't tell you any properties that the (lexer + parser + feedback mechanism) possesses.

That is, regular languages and context-free languages have all sorts of properties proven about them, including ones that let you write code generators. You don't get any of those properties when you have an ad hoc feedback mechanism between the lexer and parser.

----

Someone could come up with formalisms for specific types of feedback.

I think OCaml's Menhir has done some of this, but I don't remember the details off hand.

They have written parsers for C (CompCert) and POSIX shell and addressed some of the gaps between theory and practice. I don't use it but they're at least tackling the right problems.

But note that C uses a specific type of feedback (the lexer hack), which isn't identical to what other languages use. So you would have to come up with a formalism for each one, and probably nobody has done that.
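For a feel of what that feedback looks like, a tiny Python sketch of the lexer-hack idea (names invented): the kind of a token depends on what the parser has already declared, which is exactly the information flow that the clean regular-lexer / context-free-parser model doesn't account for.

    # The parser records typedef names as it sees them...
    typedef_names = set()

    def classify(identifier):
        # ...and the lexer consults that state to pick the token kind, because
        # `foo * bar;` is a declaration if foo names a type and an expression
        # statement (a multiplication) otherwise.
        return "TYPE_NAME" if identifier in typedef_names else "IDENTIFIER"

    typedef_names.add("foo")                  # after parsing: typedef int foo;
    print(classify("foo"), classify("bar"))   # TYPE_NAME IDENTIFIER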


Is Haskell context-free if you don't use the indentation layout mode? Haskell does support using braces and semicolons too. However, this is not true of Python (as far as I know).


In theory Haskell supports braces and semicolons but in practice nobody uses them, leading to bugs like this one:

https://github.com/haskell/haddock/issues/12


I do use braces and semicolons when programming in Haskell, and I have encountered that bug before, and I hope that they will fix it (it still seems to be unfixed after six years, although a few people clearly do care, even if others don't).


What is a context-free language anyway? Context-free grammar has a clear definition in parsing; I'm not sure I understand how that can be extended to languages.


The OP is abusing the term "context free". He's saying it avoids "the lexer hack" [1]:

> Context free grammars. What this really means is the code should be parseable without having to look things up in a symbol table

That's NOT what context free means. That's a narrow view from someone designing a C-like language and trying to avoid a very specific property of C.

Another example of being context-sensitive, which has nothing to do with symbol tables, is that resolving LALR(1) conflicts in a yacc-style grammar can make the language not context-free. To resolve ambiguity the parser is doing stuff "outside" the grammar.

----

"Context free" is a mathematical term with a precise definition. It's a class of languages that's part of the Chomsky hierarchy [2], which was discovered and described by linguists and mathematicians before any programming language existed, and has applications outside programming languages. Wikipedia does a good job:

https://en.wikipedia.org/wiki/Context-free_grammar

A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal. No matter which symbols surround it, the single nonterminal on the left hand side can always be replaced by the right hand side. This is what distinguishes it from a context-sensitive grammar.

Simple examples of languages that are context-sensitive (not context-free): Lua string literals, Rust raw strings (see other comments in this thread), and shell here-docs:

http://lua-users.org/wiki/StringsTutorial

Yes:

    [[ mystring ]]
    [=[ mystring ]=]
    [==[ mystring ]==]
No:

    [=[ mystring ]]    # mismatched
Matching this language requires a context-sensitive grammar and can't be done with a context-free grammar. It's not all that easy to prove: See the posts from Trevor Jim regarding proofs.
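In practice a hand-written lexer handles it by remembering the level of the opening bracket and scanning for a closer of exactly that level; closers of any other level are plain content. A minimal Python sketch of that rule (ignoring Lua's leading-newline detail):

    import re

    def match_long_string(src):
        m = re.match(r'\[(=*)\[', src)
        if m is None:
            return None
        closer = ']' + m.group(1) + ']'      # must use the same number of '='
        end = src.find(closer, m.end())
        if end == -1:
            return None                      # unterminated
        return src[m.end():end]              # the string body

    print(repr(match_long_string('[==[ a ]=] b ]==]')))   # ' a ]=] b '
    print(repr(match_long_string('[=[ mystring ]]')))     # None (mismatched)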

The C language is also not context-free.

[1] http://www.oilshell.org/blog/2017/12/15.html#appendix-lexing...

[2] https://en.wikipedia.org/wiki/Chomsky_hierarchy


> [CF] was discovered and described by linguists and mathematicians before any programming language existed

Nitpick: that was 1956; by then a few PLs did already exist.


I read it to mean that the language's grammar is context free, but I guess I'm not sure if that's what was meant.


> Most languages have context-free syntax

Do they? I haven't done a survey of languages but I would guess most are context-sensitive.


I would guess so, too. In my experience, they do strive to be CF; the CS parts manifest only mildly, are few and far between, and are often accidental.

That makes using a CFG parser practical, and hacking around the problems caused by the CS parts does not take much effort.

