Some random things that the author seems to have missed: > but TypeScript, Swift,...

autarch · 2024-11-02T15:36:18 1730561778

Perl lets you do this too:

    my $foo = 5;
    my $bar = 'x';
    my $quux = "I have $foo $bar\'s: @{[$bar x $foo]}";
    print "$quux\n";

This prints out:

    I have 5 x's: xxxxx

The "@{[...]}" syntax is abusing Perl's ability to interpolate an _array_ as well as a scalar. The inner "[...]" creates an array reference and the outer "@{...}" dereferences it.

For reasons I don't remember, the Perl interpreter allows arbitrary code in the inner "[...]" expression that creates the array reference.

Izkata · 2024-11-02T19:29:27 1730575767

> For reasons I don't remember, the Perl interpreter allows arbitrary code in the inner "[...]" expression that creates the array reference.

...because it's an array value? Aside from how the languages handle references, how is that part any different from, for example, this in Python:

  >>> [5 * 'x']
  ['xxxxx']

You can put (almost) anything there, as long as it's an expression that evaluates to a value. The resulting value is what goes into the array.

autarch · 2024-11-02T19:38:26 1730576306

I understand that's constructing an array. What's a bit odd is that the interpreter allows you to string interpolate any expression when constructing the array reference inside the string.

Izkata · 2024-11-02T19:56:56 1730577416

It's not...? Well, not directly: It's string interpolating an array of values, and the array is constructed using values from the results of expressions. These are separate features that compose nicely.

JadeNB · 2024-11-03T04:11:50 1730607110

> What's a bit odd is that the interpreter allows you to string interpolate any expression when constructing the array reference inside the string.

Why? Surely it is easier for both the language and the programmer to have a rule for what you can do when constructing references to anonymous arrays, without having to special case whether that anonymous array is or is not in a string (or in any one of the many other contexts in which such a construct may appear in Perl).

weinzierl · 2024-11-02T20:38:58 1730579938

You also don't need quotes around strings (barewords). So

    my $bar = x;

should give the same result.

Good luck with lexing that properly.

https://perlmaven.com/barewords-in-perl

shawn_w · 2024-11-02T23:26:38 1730589998

If you're writing anything approaching decent Perl, that won't be accepted.

weinzierl · 2024-11-03T10:00:43 1730628043

That doesn't really matter for a syntax highlighter, because what you get is out of your control. This is even more true for the llamafile highlighter, since it also supports other legacy quirks, like C trigraphs.

emmelaich · 2024-11-03T01:57:30 1730599050

"use strict" will prevent it and I think strict will be assumed/default soon.

JadeNB · 2024-11-03T04:13:38 1730607218

As of Perl 5.12, `use`ing a version (necessary to ensure availability of some of the newer features) automatically implies `use strict`.

https://perldoc.perl.org/strict#HISTORY

layer8 · 2024-11-02T17:15:19 1730567719

> actual code being embedded inside strings

My view on this is that it shouldn’t be interpreted as code being embedded inside strings, but as a special form of string concatenation syntax. In turn, this would mean that you can nest the syntax, for example:

    "foo { toUpper("bar { x + y } bar") } foo"

The individual tokens being (one per line):

    "foo {
    toUpper
    (
    "bar {
    x
    +
    y
    } bar"
    )
    } foo"

If `+` does string concatenation, the above would effectively be equivalent to:

    "foo " + toUpper("bar " + (x + y) + " bar") + " foo"

I don’t know if there is a language that actually works that way.
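One way to check that this tokenization is workable is a minimal Python sketch of a lexer for a hypothetical toy language with these "...{ expr }..." strings. The token kind names (`str_open`, `str_mid`, `str_close`) and the `tokenize` function are made up for illustration, and escape sequences are deliberately not handled. The only state it needs is a stack recording, for each open interpolation, how many ordinary `{ }` pairs are open inside its expression, so that a `}` ends the interpolation only when that count is zero.

```python
def tokenize(src: str) -> list[tuple[str, str]]:
    """Tokenize a toy language with nestable "...{ expr }..." strings.

    Token kinds: 'str' ("..."), 'str_open' ("...{), 'str_mid' (}...{),
    'str_close' (}..."), 'word' (identifiers/numbers) and 'punct'.
    No escape sequences are supported.
    """
    tokens: list[tuple[str, str]] = []
    # One entry per open interpolation, counting ordinary { } pairs opened
    # inside its expression (so those braces don't end the interpolation).
    stack: list[int] = []
    n = len(src)

    def string_piece(start: int) -> int:
        # src[start] is '"' (a new string) or '}' (resuming one). Scan to the
        # next '"' or '{', emit that piece, and return the index after it.
        j = start + 1
        while j < n and src[j] not in '"{':
            j += 1
        if j == n:
            raise SyntaxError("unterminated string literal")
        opened_by_quote = src[start] == '"'
        if src[j] == "{":
            stack.append(0)  # entering an interpolated expression
            kind = "str_open" if opened_by_quote else "str_mid"
        else:
            kind = "str" if opened_by_quote else "str_close"
        tokens.append((kind, src[start:j + 1]))
        return j + 1

    i = 0
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1
        elif c == '"':
            i = string_piece(i)
        elif c == "}" and stack and stack[-1] == 0:
            stack.pop()  # this brace closes an interpolation: back to string mode
            i = string_piece(i)
        elif c.isalnum() or c == "_":
            j = i
            while j < n and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("word", src[i:j]))
            i = j
        else:
            if stack and c == "{":
                stack[-1] += 1  # ordinary brace inside an interpolated expression
            elif stack and c == "}":
                stack[-1] -= 1
            tokens.append(("punct", c))
            i += 1
    if stack:
        raise SyntaxError("unclosed interpolation")
    return tokens


for kind, text in tokenize('"foo { toUpper("bar { x + y } bar") } foo"'):
    print(kind, text)
```

A highlighter built on this would color the `str*` tokens as string literals and everything else as ordinary code.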

panzi · 2024-11-02T17:38:56 1730569136

Indeed, in some of the listed languages you can nest it like that, but in others (e.g. Python) you can't. I would guess they deliberately chose not to allow it, rather than it being a limitation of their parser.

Tarean · 2024-11-02T17:59:01 1730570341

As of Python 3.6 you can nest f-strings. Not all formatters and highlighters have caught up, though.

Which is fun, because correct highlighting depends on language version. Haskell has similar problems where different compiler flags require different parsers. Close enough is sufficient for syntax highlighting, though.

Python is also a bit weird because it calls each value's `__format__` method, so objects can intercept and react to the format specifiers in the f-string while being formatted.
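A small sketch of that interception, assuming a made-up `Temperature` class: a replacement field like `{t:F}` is evaluated as `format(t, "F")`, which hands the spec string to the object's `__format__`.

```python
class Temperature:
    """Hypothetical value type that interprets its own format specs."""

    def __init__(self, celsius: float) -> None:
        self.celsius = celsius

    def __format__(self, spec: str) -> str:
        # f"{t:F}" calls format(t, "F"), which arrives here as spec == "F".
        if spec == "F":
            return f"{self.celsius * 9 / 5 + 32:.1f} F"
        if spec in ("", "C"):
            return f"{self.celsius:.1f} C"
        raise ValueError(f"unknown format spec {spec!r}")


t = Temperature(100)
print(f"{t}")    # 100.0 C
print(f"{t:F}")  # 212.0 F
```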

panzi · 2024-11-02T19:24:55 1730575495

I didn't mean nested f-strings. I mean this is a syntax error:

    >>> print(f"foo {"bar"}")
    SyntaxError: f-string: expecting '}'

Only this works:

    >>> print(f"foo {'bar'}")
    foo bar

pdw · 2024-11-02T20:27:28 1730579248

You're using an old Python version. On recent versions (3.12 and later, thanks to PEP 701), it's perfectly fine:

    Python 3.12.7 (main, Oct  3 2024, 15:15:22) [GCC 14.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print(f"foo {"bar"}")
    foo bar

layer8 · 2024-11-02T17:57:04 1730570224

Even when nesting is disallowed, my point is that I find it preferable to not view it (and syntax-highlight it) as a “special string” with embedded magic, but as multiple string literals with just different delimiters that allow omitting the explicit concatenation operator, and normal expressions interspersed in between. I think it’s important to realize that it is really just very simple syntactic sugar for normal string concatenation.

Timwi · 2024-11-03T00:46:47 1730594807

While you're conceptually right, in practice I think it bears mentioning that in C# the two syntaxes compile differently. This is because C#’s target platform, the .NET Framework, has always had a function called `string.Format` that lets you write this:

  var str = string.Format("{0} is {1} years old.", name, age);

When interpolated strings were introduced later, it was natural to have them compile to this instead of concatenation.

layer8 · 2024-11-03T03:51:31 1730605891

There's no reason in principle why

    name + " is " + age + " years old."

couldn't compile to exactly the same. (Other than maybe `string.Format` having some additional customizable behavior, I don't know C# that well.)

epcoa · 2024-11-03T05:07:30 1730610450

As in Python, and in Rust with the format! macro (which doesn't even support arbitrary expressions), the full syntax for C# interpolated strings is {<interpolationExpression>[,<alignment>][:<formatString>]}, i.e., there is more going on than just a simple wrapper around concat or StringBuilder.

ygra · 2024-11-03T08:07:41 1730621261

When not using format specifiers or alignment, it will indeed compile to just string.Concat (which is also what the + operator for strings compiles to), similar to how C compilers choose to call puts instead of printf when there is nothing to be formatted.

epcoa · 2024-11-03T03:07:52 1730603272

If it’s treated strictly as simple concatenation syntactic sugar, then you are allowing something like print("foo { func() ); which seems janky af.

> just very simple syntactic sugar for normal string concatenation.

Maybe. There’s also possibly a string conversion. It seems reasonable to want to disallow implicit string conversion in a concatenation operator context (especially if overloading +) while allowing it in the interpolation case.

layer8 · 2024-11-03T03:45:32 1730605532

I failed to mention the balancing requirement, that should of course remain. But it's an artificial requirement, so to speak, that is merely there to double-check the programmer's intent. The compiler/parser wouldn't actually care (unlike for an arithmetic expression with unbalanced parentheses, or scope blocks with unbalanced braces), the condition is only checked for the programmer's benefit.

> There’s also possibly a string conversion. It seems reasonable to want to disallow implicit string conversion in a concatenation operator context (especially if overloading +) while allowing it in the interpolation case.

Many languages have a string concatenation operator that does implicit conversion to string, while still having a string interpolation syntax like the above. It's kind of my point that both are much more similar to each other than many people seem to realize.

epcoa · 2024-11-02T18:52:13 1730573533

> "foo { …

That should probably not be one token.

> My view on this is that it shouldn’t be interpreted as code being embedded inside strings

I’m not sure exactly what you’re proposing and how it is different. You still can’t parse it as a regular lexical grammar.

How does this change how you highlight either?

Whatever you call it, to the lexer it is a special string: it has to know how to match it, and the delimiters are materially different from concatenation.

I might be being dense but I’m not sure what’s formally distinct.

layer8 · 2024-11-03T03:31:13 1730604673

> > "foo { …

> That should probably not be one token.

It's exactly the point that this is one token. It's a string literal with opening delimiter `"` and closing delimiter `{`, and that whole token itself serves as a kind of opening "brace". Alternatively, you can see `{` as a contraction of `" +`. Meaning, aside from the brace balancing requirement, `"foo {` does the same as `"foo " +` would.

Still alternatively, you could imagine a language that concatenates around string literals by default, similar to how C behaves for sequences of string literals. In C,

    "foo" "bar" "baz"

is equivalent to

    "foobarbaz"

Similarly, you could imagine a language where

    "foo" some_variable "bar"

would perform implicit concatenation, without needing an explicit operator (as in `"foo" + x + "bar"`). And then people might write it without the inner whitespace, as:

    "foo"some_variable"bar"

My point is that

    "foo{some_variable}bar"

is really just that (plus a condition requiring balanced pairs of braces). You can also re-insert the spaces for emphasis:

    "foo{ some_variable }bar"

The fact that people tend to think of `{some_variable}` as an entity is sort-of an illusion.
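Python already has half of this: like C, it joins adjacent string literals, though it does not allow an expression between them. A sketch of how an f-string without a conversion or format spec lines up with plain concatenation, using a Python version of the toUpper example from earlier in the thread. This shows the resulting strings are equal; it makes no claim that CPython actually compiles f-strings to `+`.

```python
x = 42
y = 1

# Like C, Python joins adjacent string literals at compile time.
assert "foo" "bar" "baz" == "foobarbaz"

# "foo" x "bar" is a syntax error, so the f-string is how you spell "foo{x}bar".
# With no conversion or format spec, each field is format(value, ""),
# which for an int is the same as str(value).
assert f"foo{x}bar" == "foo" + format(x, "") + "bar" == "foo42bar"

# Nesting: the literal pieces and the inner expressions line up one-to-one.
nested = f"foo {f'bar {x + y} bar'.upper()} foo"
concatenated = "foo " + ("bar " + str(x + y) + " bar").upper() + " foo"
assert nested == concatenated
print(nested)  # foo BAR 43 BAR foo
```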

> How does this change how you highlight either?

You would highlight the `"...{`, `}...{`, and `}..."` parts like normal string literals (they just use curly braces instead of double quotes at one or both ends), and highlight the inner expressions the same as if they weren't surrounded by such literals.

epcoa · 2024-11-03T04:30:41 1730608241

> It's exactly the point that this is one token.

Fair enough. The point, as you have acknowledged, being that unlike + you have to treat { specially for balancing (and separately from the “).

> The fact that people tend to think of `{some_variable}` as an entity is sort-of an illusion.

I guess. I just don’t know what being an illusion means formally. It’s not an illusion to the person that has to implement the state machine that balances the delimiters.

> You would highlight the `"...{`, `}...{`, and `}..."` parts like normal string literals (they just use curly braces instead of double quotes at one or both ends), and highlight the inner expressions the same as if they weren't surrounded by such literals

Emacs does it this way FWIW. But I’m not sure how important it is to dictate that the brace can’t be a different color.

In any event, I can agree your design is valid (Kotlin works this way), but I don’t necessarily agree it is any more valid than, say, how Python does it, where there can be format specifiers and implicit conversion to string is performed, whereas concatenation doesn't do that. I’m not seeing the clear, definitive advantage of interpolated strings being equivalent to concatenation versus some other kind of method call.

The other detail is order of evaluation, or sequencing: string concatenation may behave differently. I'm not sure I agree that is wrong, because at the end of the day it is distinct-looking syntax. Illusion or not, it looks like a neatly enclosed expression, while concatenation looks like something else. That they might parse, evaluate, or behave differently isn't unreasonable.

vidarh · 2024-11-03T10:15:10 1730628910

Ruby takes this to 100. As much as I love Ruby, this is valid Ruby, and I can't defend it:

    puts "This is #{<<HERE.strip} evil"
    incredibly 
    HERE

Just to combine the string interpolation with her concern over Ruby heredocs.

My other favorite evil quirk in Ruby is that whitespace is a valid quote character. The string (without the quotes) "% hello " is a quoted string containing "hello" (without the quotes), because "%" in contexts where there is no left operand starts a quoted string, and the next character indicates the type of quote. This is great when you write e.g. "%(this is a string)" or "%{this is a string}". It's not so great if you use a space (I've never seen that in the wild, so it'd be nice if it were just removed; even irb doesn't handle it correctly).

jart · 2024-11-03T16:41:38 1730652098

https://pbs.twimg.com/media/GbEfj6fbQAQRUB7?format=png&name=...

That's so going in the blog post later today.

vidarh · 2024-11-03T19:40:41 1730662841

Heh. I love Ruby, but, yes, the parser is "interesting", for values of interesting left undefined for its high obscenity content.

mdaniel · 2024-11-03T17:32:21 1730655141

And don't overlook the fact that the bareword, or its "HERE" friend, is still in an interpolation context, so...

    puts "hello #{<<onoz.strip} world"
    recursion is #{<<onoz.strip}
    recursive
    onoz
    onoz
    puts "that was fun"

yields

  hello recursion is recursive world
  that was fun

and then there's its backtick friend

    puts "hello #{<<`onoz`.strip} world"
    date -u
    onoz

coughs up

    hello Sun Nov  3 17:25:32 UTC 2024 world

and for those trying out your percent-space trick, be aware that it only tolerates such a thing in a standalone expression context, such as:

  puts (% hello )+" world"
  # or
  x = % hello #
  puts x

because when I tried it "normally" I got

    $ /usr/bin/ruby -e 'puts % hello  + "world"'

    -e:1:in `<main>': undefined local variable or method `hello' for main:Object (NameError)
    $ /usr/bin/ruby -v
    ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.x86_64-darwin21]

but, at the intersection is "ruby parsing is the 15th circle of hell"

    ruby -e 'puts (% #{<<FOO.strip}  )+ " world"
    hello
    FOO
    '

vidarh · 2024-11-04T00:49:16 1730681356

> $ /usr/bin/ruby -e 'puts % hello + "world"'

Yes, its use is roughly limited to places where it is unambiguous whether it starts a quoted string or is the modulus operator, and right after a method name it would be ambiguous.

> but, at the intersection is "ruby parsing is the 15th circle of hell"

Surprisingly, it's not that hard (this part, anyway). You "just" need to create a forward reference and keep track of heredocs in your parser; when you come to the end of a line with heredocs pending, you parse them and assign them to the references you've created.
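A rough Python sketch of that forward-reference approach, assuming a line-based scanner that only recognizes plain `<<TAG` openers (no `<<-`, `<<~`, quoted tags, or heredocs nested inside heredoc bodies as in the "onoz" example). `HeredocRef` and `scan` are hypothetical names, and the regex would misfire on something like `a <<b`, which a real Ruby parser has to tell apart from the shift operator by context.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Only plain <<TAG openers; <<-TAG, <<~TAG and quoted tags are not handled.
HEREDOC_OPENER = re.compile(r"<<([A-Za-z_]\w*)")


@dataclass
class HeredocRef:
    """Forward reference to a heredoc whose body has not been read yet."""
    tag: str
    body: Optional[str] = None


def scan(source: str) -> List[Tuple[str, List[HeredocRef]]]:
    """Pair each code line with the heredocs it opens, consuming their bodies."""
    lines = source.split("\n")
    result = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        # 1. Create a placeholder for every heredoc opened on this line.
        pending = [HeredocRef(m.group(1)) for m in HEREDOC_OPENER.finditer(line)]
        # 2. At end of line, read the bodies in the order the openers appeared.
        for ref in pending:
            body = []
            # Lenient: compares stripped lines (plain <<TAG really wants column 0).
            while i < len(lines) and lines[i].strip() != ref.tag:
                body.append(lines[i])
                i += 1
            if i == len(lines):
                raise SyntaxError(f"unterminated heredoc {ref.tag}")
            i += 1  # skip the terminator line
            ref.body = "\n".join(body)  # fill in the forward reference
        result.append((line, pending))
    return result


out = scan('puts "This is #{<<HERE.strip} evil"\nincredibly \nHERE\nputs "done"')
print(out[0][1][0].body)
```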

It is dirty, though, and there are so many dark corners of parsing Ruby. Having written a partial Ruby parser, I can say that being a fan of Wirth-style grammar simplicity while enjoying using Ruby is a dark, horrible place to live. On the one hand, I find Ruby a great pleasure to use; on the other hand, the parser-writer in me wants to spend eternity screaming into the void in pain.

jart · 2024-11-04T02:11:46 1730686306

How are you so awesome?

vidarh · 2024-11-04T14:45:47 1730731547

Thanks. I'm a big fan of your work, so that is appreciated...

mbo · 2024-11-03T07:59:21 1730620761

> Scala

A note about Scala's string interpolation: interpolated strings can also be used as pattern-match targets.

  val s"${a} + ${b}" = "1 + 2";
  println(a) // 1
  println(b) // 2
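For readers who don't use Scala, a rough Python analogue of what such a pattern does: turn the template into a regex in which literal parts must match exactly and each `${name}` becomes a capture group. `match_template` is a hypothetical helper, and Scala's own matching rules may differ in edge cases.

```python
import re
from typing import Dict, Optional


def match_template(template: str, text: str) -> Optional[Dict[str, str]]:
    """Match text against a "${a} + ${b}"-style template, returning bindings.

    Literal parts must match exactly; each ${name} captures the shortest run
    of characters that still lets the whole text match.
    """
    # With a capturing group, re.split alternates: literal, name, literal, ...
    parts = re.split(r"\$\{([A-Za-z_]\w*)\}", template)
    pattern = "".join(
        re.escape(part) if idx % 2 == 0 else f"(?P<{part}>.*?)"
        for idx, part in enumerate(parts)
    )
    m = re.fullmatch(pattern, text, re.DOTALL)
    return m.groupdict() if m else None


print(match_template("${a} + ${b}", "1 + 2"))  # {'a': '1', 'b': '2'}
```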

orthoxerox · 2024-11-05T08:04:18 1730793858

One cool feature of C# interpolated strings is that they can be lazy. Many loggers used to implement their own interpolation because something like

    log.trace($"Entering iteration {i} for customer {c.ID} [{c.ShortName}]");

in a hot loop would call string.Concat every time it was called before the logger could bail out of the method.

C# lets you declare an overload that accepts a `DefaultInterpolatedStringHandler` (or your own custom implementation of the handler pattern) and this overload will take precedence and allow you to delay the building of the string until after you've checked whether logging it is required.
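Python has the same eager/lazy split, sketched here with the standard `logging` module: an f-string argument is fully formatted before `debug()` runs, while `%`-style arguments are only formatted if the record is actually emitted. `Expensive` is a made-up class that counts how often it is converted to a string; this illustrates the idea in Python rather than C#'s interpolated-string handler mechanism.

```python
import logging


class Expensive:
    """Stand-in for a value whose string conversion is costly."""
    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive"


logger = logging.getLogger("demo")
logger.setLevel(logging.WARNING)  # DEBUG messages are disabled

value = Expensive()

# Eager: the f-string is built (calling __str__) before debug() checks the level.
logger.debug(f"value={value}")
eager_calls = Expensive.calls

# Deferred: %-style arguments are only formatted if the record is emitted.
logger.debug("value=%s", value)
deferred_calls = Expensive.calls - eager_calls

print(eager_calls, deferred_calls)  # 1 0
```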

panzi · 2024-11-02T17:35:26 1730568926

Is this a bash-ism?

    "$x plus $y equals $((x+y))"

jwilk · 2024-11-02T19:01:24 1730574084

No, it's portable shell syntax.

LukeShu · 2024-11-02T20:47:59 1730580479

"$((" arithmetic expansion is POSIX (XCU 2.6.4 "Arithmetic Expansion").

But if I'm not mistaken, it originated in csh.

jonahx · 2024-11-02T18:58:27 1730573907

This works in "sh" as well for me.

panzi · 2024-11-02T19:21:43 1730575303

On some systems (like on mine) sh is just a link to bash, so I couldn't test it.

Izkata · 2024-11-03T02:50:42 1730602242

Isn't bash supposed to act like sh when executed with that name?

saagarjha · 2024-11-03T20:53:14 1730667194

It still has bashisms

susam · 2024-11-02T21:21:55 1730582515

> Is this a bash-ism?

> "$x plus $y equals $((x+y))"

No, it is specified in POSIX: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...

Izkata · 2024-11-03T02:46:44 1730602004

  Make :)        echo "$(x) plus $(y) equals $(shell echo "$x+$y" | bc)"

I'm guessing this is the reason for the :) but to be clear for anyone else: Make is only doing half of the work, whatever comes after "shell" is being passed to another executable, then make captures its stdout and interpolates that. The other executable is "sh" by default but can be changed to whatever.

bastawhiz · 2024-11-03T19:59:24 1730663964

Python f-strings are kind of wild. They can even contain comments! They also have slightly different rules for parsing certain kinds of expressions, like := and lambdas. And until Python 3.12, strings inside the expressions couldn't use the quote type of the f-string itself (or backslashes).
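A few of those parsing rules, as a sketch (assumes Python 3.8+ for `:=`): inside the braces, a top-level `:` starts the format spec, so assignment expressions and lambdas need parentheses.

```python
x = 3

# A top-level ':' starts the format spec, so this is format(x, "=5"):
# '=' alignment with width 5, not an assignment expression.
assert f"{x:=5}" == "    3"

# Parenthesized, the same characters form an assignment expression.
assert f"{(y := 10)}" == "10" and y == 10

# A bare lambda would also be cut off at its ':', so it needs parentheses too.
assert f"{(lambda n: n * 2)(x)}" == "6"
```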

thesz · 2024-11-02T21:47:31 1730584051

VHDL

There is a record constructor syntax in VHDL using attribute invocation syntax: RECORD_TYPE'(field1expr, ..., fieldNexpr). This means that if the first field of your record is a subtype of a character type, you can get a record construction expression like this one: REC'('0',1,"10101").

Good luck distinguishing between '(' as a character literal and "'", "(" and "'0'" at the lexical level.
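A common way out is to let the lexer look at the previous token: a `'` directly after a name or a closing bracket is treated as a tick, and anywhere else `'c'` is a character literal. The sketch is a Python toy under that assumption; the `lex` function is hypothetical, handles only a sliver of VHDL, and ignores the reserved-word exceptions a real lexer needs.

```python
def lex(src: str) -> list[str]:
    """Toy lexer for VHDL-like text, focused on the apostrophe ambiguity."""
    tokens: list[str] = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isalpha():
            j = i
            while j < n and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
        elif c.isdigit():
            j = i
            while j < n and src[j].isdigit():
                j += 1
            tokens.append(src[i:j])
            i = j
        elif c == '"':
            j = src.index('"', i + 1)  # toy: no doubled "" escapes
            tokens.append(src[i:j + 1])
            i = j + 1
        elif c == "'":
            prev = tokens[-1] if tokens else ""
            # Heuristic: after a name or a closing bracket, "'" is a tick
            # (attribute or qualified expression). A real lexer must also
            # exclude reserved words such as `when`; this toy does not.
            is_tick = prev[:1].isalpha() or prev in (")", "]")
            if not is_tick and i + 2 < n and src[i + 2] == "'":
                tokens.append(src[i:i + 3])  # character literal, e.g. '0' or '('
                i += 3
            else:
                tokens.append("'")
                i += 1
        else:
            tokens.append(c)
            i += 1
    return tokens


print(lex("REC'('0',1,\"10101\")"))
```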

Haskell

Haskell has context-free syntax for bracketed ("{-" ... "-}") comments, because they nest. The lexer has to keep bracketed comments balanced (for every "{-" there must be a matching "-}" somewhere).
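Because the comments nest, a single regular expression cannot match them, but a depth counter is enough. A minimal Python sketch (the `strip_block_comments` name is hypothetical; it ignores string literals that happen to contain `{-` and treats `{-# ... #-}` pragmas like ordinary comments):

```python
def strip_block_comments(src: str) -> str:
    """Remove Haskell-style {- ... -} comments, which may nest."""
    out = []
    depth = 0  # how many unclosed "{-" we are currently inside
    i = 0
    while i < len(src):
        pair = src[i:i + 2]
        if pair == "{-":
            depth += 1
            i += 2
        elif pair == "-}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(src[i])  # only keep text outside every comment
            i += 1
    if depth:
        raise SyntaxError("unterminated {- comment")
    return "".join(out)


print(strip_block_comments("a {- outer {- inner -} still outer -} b"))
```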

sundarurfriend · 2024-11-02T20:58:19 1730581099

> Many more languages support that:

Julia as well:

    Julia    "$x plus $y equals $(x+y)"

cryptonector · 2024-11-03T19:38:54 1730662734

jq: "\("hello" + "world")!!"

I wish PG had dollar-bracket quoting, where you have to use the closing bracket to close, so that vim's showmatch would work trivially. Something like ${...}$.

1vuio0pswjnm7 · 2024-11-02T22:33:35 1730586815

Shell "$x plus $y equals $((x+y))"

Shell "$x plus $y equals $((expr $x + $y))"

1vuio0pswjnm7 · 2024-11-04T18:42:36 1730745756

Correction: Shell "$x plus $y equals $(expr $x + $y)"

therein · 2024-11-02T19:10:31 1730574631

> PostgreSQL has the very convenient dollar-quoted strings

I did not know that. Today I learned.
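For the curious: a dollar-quoted string is `$tag$ ... $tag$`, where the optional tag follows identifier rules (minus `$`) and the body is taken literally. Since the closer must repeat the opener's tag, a regex backreference does the job. A rough Python sketch, assuming ASCII-only tags and ignoring the context checks a real SQL lexer needs (for example, a `$` that is part of an identifier); `dollar_quoted_bodies` is a hypothetical name.

```python
import re

# $tag$ ... $tag$, where tag is empty or identifier-like (no leading digit,
# no '$'). Group 1 always participates (possibly empty), so the
# backreference \1 makes the closer repeat exactly the opener's tag.
DOLLAR_QUOTED = re.compile(r"\$((?:[A-Za-z_][A-Za-z0-9_]*)?)\$(.*?)\$\1\$", re.DOTALL)


def dollar_quoted_bodies(sql: str) -> list[str]:
    """Return the bodies of all dollar-quoted strings in sql (toy scanner)."""
    return [m.group(2) for m in DOLLAR_QUOTED.finditer(sql)]


sql = "SELECT $$it's$$, $fn$ body with $$ inside $fn$;"
print(dollar_quoted_bodies(sql))  # ["it's", ' body with $$ inside ']
```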