
In my experience, the parser definitely needs to be a library, but I'm not sure about the runtime.

There are a few places in POSIX where the parser has to be invoked recursively:

- command sub: $() and ``

- eval

- alias expansion. (I found some divergence in how shells implement this, but it does involve the parser.)
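A rough sketch of the first two cases, in plain POSIX sh. The variable names are made up for illustration. Alias expansion is left out on purpose, since shells differ on when aliases get expanded in scripts.

```shell
#!/bin/sh
# Command substitution: the text inside $( ) is a complete shell program,
# and it may itself contain another command substitution.
inner=$(echo "from $(echo nested) sub")
echo "$inner"

# eval: a string assembled at runtime is handed back to the parser.
cmd='evald=$((1 + 2))'
eval "$cmd"
echo "$evald"
```

In both cases the shell has to parse program text that sits inside another construct, which is why the parser ends up being invoked recursively.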

In Oil I also used the parser as a library in several other places:

- For interactive completion. Bash does not do this, and I don't believe any other POSIX-ish shell does. I wrote a bit about that in the latest blog post [1]. This turned out really well.

- For history expansion, because unevaluated words have to be picked out of previous command lines. Bash does not do this either.

Consider:

    $ echo ${x:-a b c}
    a b c
    $ echo !$
    echo c}
    c}
IMO this is fairly nonsensical behavior, and the underlying cause is that bash chooses to write duplicate, ad hoc parsers for its own language! There are many cases like this with completion, e.g.

    $ if ec<TAB>
    $ for i in 1 2 3; do ec<TAB>
Bash isn't smart enough to complete "echo" in these cases, because it doesn't know it's in the "first word" state.

It also chooses to treat = and : as completion word delimiters, even though they don't delimit normal words, and this causes a lot of problems that the bash-completion project patches over in a very ugly fashion.
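To make the "normal words" half of that concrete, here is a tiny check in POSIX sh. `count_args` is a hypothetical helper, and the arguments are arbitrary examples.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many words it received.
count_args() { echo "$#"; }

# '=' and ':' are ordinary characters inside a word, so this is two arguments.
n=$(count_args --color=auto user@host:/tmp)
echo "$n"
```

The regular parser sees two words here, so a completion layer that splits on = and : is working with a different notion of "word" than the language itself.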

----

As for the runtime, one problem is that the shell inherently modifies global process state. So there is a limit to the abstraction you can provide over it. For example consider this program:

    { echo hi
      ls / 
      echo bye
    } > out.txt
There's essentially one way to implement this with Unix system calls, but you couldn't have two different interpreters running it concurrently in the same process, because they would stomp on the process's single file descriptor table. (I.e. my definition of a library is that you can make multiple instances of it with different parameters.)
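Spelled out by hand, the save / redirect / restore sequence looks roughly like this. It is only a sketch in POSIX sh: the scratch file name is made up, and it assumes fd 3 is free. A real shell would do the equivalent with dup2() internally.

```shell
#!/bin/sh
# Hypothetical scratch file; any writable path would do.
out="${TMPDIR:-/tmp}/fd_demo.$$"

exec 3>&1          # save the current stdout on fd 3 (assumes fd 3 is free)
exec 1>"$out"      # from here on, fd 1 of the whole process is the file
echo hi
ls /
echo bye
exec 1>&3 3>&-     # restore stdout and close the saved copy

first=$(head -n 1 "$out")
last=$(tail -n 1 "$out")
rm -f "$out"
echo "first=$first last=$last"
```

Between the two `exec` lines, anything else in the same process that writes to fd 1 also lands in the file, which is the global state problem.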

----

The general idea of a shell that conforms to POSIX but provides a better interactive experience is a good one. (That seems to be the feedback on the Fish Shell 3.0 thread on the front page.)

It is a huge amount of work, though! I hope that I will be able to metaprogram / compile Oil into something more compact, but that's an open problem now :) It is shaping up to be a better interactive shell than I originally thought though. Treating the parser as a library was a big win.

(Among other reasons, it's not a library in bash because it uses many global variables.)

[1] http://www.oilshell.org/blog/2018/12/16.html




Thanks for sharing your insights!

>There are a few places in POSIX where the parser has to be invoked recursively

In your examples, we have the runtime invoke the parser as necessary. The parser doesn't know about the runtime. For alias resolution, we have a callback function, which hooks into the runtime but is pretty thin and abstract.

>For history expansion

Thankfully, this is non-POSIX so mrsh doesn't have to worry about it.

>one problem is that the shell inherently modifies global process state [...] i.e. my definition of library is that you can make multiple instances of it with different parameters

My definition doesn't line up with yours. My definition is a shared object or static archive and a bunch of headers with an API you can link to instead of implementing something yourself.


Are you parsing command subs at runtime too? Bash does that [1], but I believe it's a bad idea. dash, mksh, and zsh seem to do it "the right way", although none of them statically parses as much as OSH.

IIRC a case that really seals the deal is:

    $ echo $(case x in x) echo foo;; esac)
    foo
How do you find the closing paren? You basically have to parse shell, so you might as well do that at parse time rather than runtime. There's a section in the aosabook bash chapter that talks about that.
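Here is a small sketch of why a naive scan goes wrong, in POSIX sh. The variable names are made up, and the "naive scan" is just a stand-in for any approach that looks for the first ")" without parsing.

```shell
#!/bin/sh
# The source text following "$(" in the example, held as plain data.
src='case x in x) echo foo;; esac)'

# Naive scan: cut at the first ")". This stops inside the case pattern.
naive=${src%%')'*}
echo "naive: $naive"

# POSIX also allows an optional "(" before a case pattern, which keeps the
# parentheses balanced, so this form does not depend on how ")" is matched.
balanced=$(case x in (x) echo foo;; esac)
echo "balanced: $balanced"
```

The naive cut yields `case x in x`, which isn't a complete command. The leading-paren form is a common workaround for shells that get the unbalanced form wrong, but a shell that parses the command sub properly shouldn't need it.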

In other words, bash has had parsing bugs with PAREN MATCHING for 20 years (I have a case in my suite that was fixed between bash 4.3 and 4.4). If you just statically parse then you can get it all right on the first try.

It can get arbitrarily complicated. You can add a subshell and nested command subs in there too:

    $ echo $( ( case x in $(echo x)) echo foo;; esac) )
    foo
Bash syntax makes it worse, but this problem appears in POSIX sh too.

[1] http://www.oilshell.org/blog/2016/10/13.html


    ~/s/m/build > cat test.sh
    #!/bin/sh
    echo $(case x in x) echo foo;; esac)
    ~/s/m/build > mrsh -n test.sh
    program
    program
    └─command_list ─ pipeline
      └─simple_command
        ├─name ─ word_string [2:1 → 2:5] echo
        └─argument 1 ─ word_command ─ program
          └─command_list ─ pipeline
            └─case_clause
              ├─word ─ word_string [2:13 → 2:14] x
              └─items
                └─case_item
                  ├─patterns
                  │ └─word_string [2:18 → 2:19] x
                  └─body
                    └─command_list ─ pipeline
                      └─simple_command
                        ├─name ─ word_string [2:21 → 2:25] echo
                        └─argument 1 ─ word_string [2:26 → 2:29] foo


OK it looks like mrsh is parsing command subs at parse time, which is good! bash doesn't do that.



