I think this comment/discussion ends up over-emphasizing the 'strangeness' of the line-number situation with this library. The page linked ends with: 'This position is subject to discuss. Also it doesn’t stop anyone from forking the code and maintaining line-numbers implementation separately.', which seems pretty reasonable to me!
Yeah, this is pretty much the behavior that I want to see from open source maintainers: make decisions and build conventions, but be open to being convinced of alternatives.
I already like this one better, to be honest. It also already has Julia and Rust support. If it were to include Elixir support as well, it'd be perfect.
Laser pointers don't work if the video is being recorded or screencast at the same time. They only work when the viewer is in the same room, and even then, they are typically pretty poor.
Out of several dozen different uses of laser pointers, I've only ever been able to (easily) see perhaps 3 or 4 of them, plus another small portion if focusing entirely on the game of spot-the-laser-pointer. It's related to my colourblindness.
As such, verbal cues (or a long stick) are preferable to a laser pointer. Of course, use a laser pointer to assist those who can see it, but it's better to assume half the people in the room can't and use words to this effect.
It does seem more appropriate to have a separate module for line numbers. I would definitely understand the separation of concerns argument, but the maintainer instead makes some sort of emotional appeal about clutter and simplicity to explain why there is no line number support.
> the maintainer instead makes some sort of emotional appeal about clutter and simplicity to explain why there is no line number support
Clutter and simplicity is not an emotional argument. It becomes verifiably more difficult to maintain a code base with each new line. And since he's the one doing the maintaining, his position seems quite reasonable.
Secondly, he seems to have a strong aesthetic aversion to line numbers, and I can totally respect that. Frankly, among popular open source projects, those whose developers have the strongest sense of style and design are usually (not always) the best.
Many popular syntax highlighting libraries do include this functionality. For example, the popular Python syntax highlighter "Pygments" supports line numbers[0].
I understand the author's point-of-view on this, but disabling line numbers sounds like something that should be user-controllable. Right now it feels like a misfeature, whereas "we don't implement line numbers by default, but if you want them, do this and that and you can have them" would feel way more like a good feature to have.
This is my favorite syntax highlighting JS library, by far.
The approach this library takes to building the highlighted DOM tree is very clever. It represents the original DOM and the highlighted code output as separate streams of tag open / close events and then merges them together. This allows the highlighter to maintain the pre-existing DOM structure of the highlighted code! The language auto-detection and sub-language handling are also very neat.
If you like interesting codebases, I'd recommend giving it a quick read. :)
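To make the "two streams of tag open / close events" idea concrete, here is a rough Python sketch of the general technique. It is not the library's actual code: `merge_streams` and the `(offset, kind, tag)` event tuples are hypothetical, and tags carry no attributes to keep it short. When the two streams overlap, the sketch closes the inner tags and reopens them so the output stays well-formed.

```python
import html
from typing import List, Tuple

# Hypothetical event shape: (character offset, "open" or "close", tag name).
Event = Tuple[int, str, str]

def merge_streams(text: str, original: List[Event], highlighted: List[Event]) -> str:
    """Interleave two tag-event streams over the same plain text."""
    # Sort by offset; at equal offsets, process closes before opens.
    events = sorted(original + highlighted, key=lambda e: (e[0], e[1] != "close"))
    out, stack, pos = [], [], 0
    for offset, kind, tag in events:
        out.append(html.escape(text[pos:offset]))
        pos = offset
        if kind == "open":
            stack.append(tag)
            out.append(f"<{tag}>")
        else:
            # Find the innermost open instance of this tag.
            idx = len(stack) - 1 - stack[::-1].index(tag)
            reopened = stack[idx + 1:]
            # Close it together with everything opened after it...
            for t in reversed(stack[idx:]):
                out.append(f"</{t}>")
            del stack[idx:]
            # ...then reopen the tags that were only closed to keep nesting valid.
            for t in reopened:
                stack.append(t)
                out.append(f"<{t}>")
    out.append(html.escape(text[pos:]))
    return "".join(out)

# Existing markup bolds "bar baz"; the highlighter wants a span over "foo bar".
print(merge_streams(
    "foo bar baz",
    [(4, "open", "b"), (11, "close", "b")],
    [(0, "open", "span"), (7, "close", "span")],
))
# prints <span>foo <b>bar</b></span><b> baz</b>
```

The point of working on offsets rather than on the DOM directly is that neither stream needs to know about the other until the final merge.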
When you brush the code (extract highlighting information) you provide the set of initial matches which are converted into a tree: https://github.com/ioquatix/jquery-syntax/blob/97e38d08924d7... - there is no merge step, they are merged in place in the tree which is then used to generate DOM (or whatever you like really) output.
I never implemented language auto-detection, but I'm sure I've seen it implemented before Prism; Google's Prettify might have been one of the first to do it. It basically highlights using all available brushes and looks at which one matches the most tokens - not sure if this is how it is currently implemented though. My feeling is that you usually know what language you are highlighting, the auto-highlighter wouldn't always get it right, and the cost of loading all brushes/languages is pretty high (imagine, for example, that you support 100 different languages: you'd have to run all of them).
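As a rough illustration of the "run every brush and pick the one with the most matching tokens" approach, here is a minimal Python sketch. The `BRUSHES` table and `detect_language` are hypothetical and heavily simplified (one keyword regex per language); no real library is claimed to work exactly this way.

```python
import re
from typing import Optional

# Hypothetical, heavily simplified "brushes": one keyword pattern per language.
BRUSHES = {
    "python": re.compile(r"\b(?:def|elif|import|lambda|None|self)\b"),
    "javascript": re.compile(r"\b(?:function|var|const|let|typeof|undefined)\b"),
    "ruby": re.compile(r"\b(?:end|puts|require|nil|unless|elsif)\b"),
}

def detect_language(code: str) -> Optional[str]:
    """Run every brush over the code and return the one with the most matches."""
    scores = {name: len(brush.findall(code)) for name, brush in BRUSHES.items()}
    best = max(scores, key=scores.get)
    # No brush matched anything: report "unknown" instead of guessing.
    return best if scores[best] > 0 else None

print(detect_language("function f() { var x = 1; }"))  # prints javascript
print(detect_language("def f(self):\n    return None"))  # prints python
```

Note that every brush runs on every snippet, which is exactly the cost concern raised above: the work grows with the number of supported languages.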
Basically, if you match some sub-portion of the code, you then highlight that using a different brush. Sometimes it's almost impossible to know though (e.g. diffs which may be incomplete).
In my opinion it's irresponsible to use highlighters like these. At least when you use them like advertised. Why does no one bat an eyelid at letting the clients handle the highlighting?
The highlighting (lexing) could (and should) be done ONCE at the server. Imagine some high traffic blogs with hundreds of millions of hits combined - all these clients downloading the highlighting libs + lexers for 50 different languages that will never be used on the blog anyway. The wasted energy here is probably even measurable.
Cons when letting the clients handle syntax highlighting:
- Much slower load times
- Wasted energy
- Can fail because of JS crashing
- Noscript users won't get any highlighting (and your blog targets technical users, right? They are more likely to use no script)
- Slow mobile devices will have a harder time loading the site (or, depending on the JavaScript engine, may fail to highlight at all)
- (if no site optimization is made) the client will download lexers for languages not used on the blog
Cons when the server does the lexing ONCE:
- NOTHING
Note that this rant isn't targeted at Highlight.js. It's a good highlighter - I use it myself at my blog. Except I use it on the server via a CGI script. It took me no more than 2 minutes to set up.
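For what it's worth, the "lex once on the server" idea can be sketched in a few lines of Python. This is not the CGI setup mentioned above and it does not use highlight.js: `TOKEN_RE` and `highlight_once` are hypothetical names, and the regex tokenizer is a toy stand-in for a real highlighter. The only point is that the HTML is produced once and cached, so clients receive plain markup.

```python
import html
import re
from functools import lru_cache

# Toy token rules (an assumption for illustration, not a real grammar).
TOKEN_RE = re.compile(
    r"(?P<comment>#[^\n]*)"
    r"|(?P<string>\"[^\"\n]*\"|'[^'\n]*')"
    r"|(?P<number>\b\d+(?:\.\d+)?\b)"
    r"|(?P<keyword>\b(?:def|return|if|else|for|while|import)\b)"
)

@lru_cache(maxsize=1024)
def highlight_once(code: str) -> str:
    """Return highlighted HTML; repeated calls with the same text hit the cache."""
    out = []
    pos = 0
    for m in TOKEN_RE.finditer(code):
        # Escape the untouched text between tokens, then wrap the token itself.
        out.append(html.escape(code[pos:m.start()]))
        out.append(f'<span class="{m.lastgroup}">{html.escape(m.group())}</span>')
        pos = m.end()
    out.append(html.escape(code[pos:]))
    return "<pre><code>" + "".join(out) + "</code></pre>"

print(highlight_once("return 1 # x"))
# prints <pre><code><span class="keyword">return</span> <span class="number">1</span> <span class="comment"># x</span></code></pre>
```

For a static blog the same function could just as well run at build time, in which case no cache is needed at all.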
> Except I use it on the server via a CGI script. It took me no more than 2 minutes to set up.
How about you write a tutorial about this and submit it to HN? This is probably a better way to make people adopt your approach than a rant, no matter how valid your points are :)
Computing is cheap. Setting up and maintaining that is more human seconds than doing it clientside. I dare you to calculate the amount of energy this wastes on a blog that gets a billion hits.
This is a battle you cannot win and should not attempt to fight.
In my opinion it is irresponsible to post strongly worded opinions such as these, because you forget you may not be seeing the whole picture.
> Cons when the server does the lexing ONCE:
> - NOTHING
You are assuming a specific use case, where static code originally exists on the server and the highlighted version is included in a page that is viewed many times.
My use case for these libraries is different: users actually write 'code' in my app, which I wish to highlight as they type it. Of course many users will write the same fragments and energy could be saved by caching the highlighting of those on the server. However, the latency of that solution (roundtrip to server needed to highlight code) and the code complexity (cache expiration is one of the two 'hard' problems in SE) are much worse than for the client-side highlighting solution.
That's an entirely different problem I'd say, and the perfect use case for these libraries on a client! If you'd do it server side it would be like a local IDE sending the code to be lexed remotely...
My rantiness (I apologize for that) targets programming blogs/static pages.
At one point I got really into the idea of ASTs for languages. I've used pygments, the python syntax highlighter, with the intention of parsing the meaning of code.
These syntax highlighters are great, but I think they underline the lack of a defined and accessible AST parse definition for most languages. Highlight.js and others kind of just rely on regexes: https://github.com/isagalaev/highlight.js/blob/master/src/la...
It'd be great if we could parse through programming languages to get their meaning. I want to get tuples back!
```
a = 23
```
would give
(variable(name=a), assignedTo, number(23))
This does exist for some languages, but mostly compiled ones. But it would make the syntax highlighting even more robust! Antlr is the best one around at the moment. http://www.antlr.org/
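Python happens to be one of the languages where this is available out of the box: the standard-library `ast` module parses source into a tree. A minimal sketch that turns the `a = 23` example into a tuple (the `describe_assignment` helper and the exact tuple shape are made up for illustration, and only the simple `name = constant` case is handled):

```python
import ast

def describe_assignment(source: str):
    """Parse one simple `name = constant` statement into a tuple."""
    stmt = ast.parse(source).body[0]
    if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
        raise ValueError("expected a single assignment")
    target, value = stmt.targets[0], stmt.value
    if not (isinstance(target, ast.Name) and isinstance(value, ast.Constant)):
        raise ValueError("expected `name = constant`")
    # Report the variable name plus the constant's Python type and value.
    return ("variable", target.id), "assignedTo", (type(value.value).__name__, value.value)

print(describe_assignment("a = 23"))
# prints (('variable', 'a'), 'assignedTo', ('int', 23))
```

This only works because the parser ships with the language; a generic highlighter covering many languages has no such parser to lean on, which is presumably why the regex approach is so common.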
I believe CodeMirror does something like this. It uses restartable parsers (or it used to) to make parsing fast while editing.
I experimented editing code using jQuery.Syntax. I used the match tree it generated to figure out where to restart parsing and it was pretty fast, it would only re-evaluate the current line in most cases.
CodeMirror still uses the restartable parsers. But they produce a flat sequence of styles, not a hierarchical AST. E.g. `foo bar* baz` in markdown becomes <span class="cm-em">foo </span><span class="cm-strong cm-em">bar</span><span class="cm-em"> baz*</span>.
Also, most parsers reuse a small set of style names (that are covered by themes) without much regard to semantic appropriateness. E.g. markdown lists cycle through 'variable-2', 'variable-3', 'keyword'.
I've tried Highlight.js before and I was never really a big fan of it. It seemed too heavy last time I tried it. I've been using rainbow.js for a year or two now.
What's the recommended way of using something like this or prism.js to highlight a simple textarea containing code?
I'm not talking a full-on programming editor, that's a much larger scope. Rather, I have an admin tool with a <textarea> that allows entering short code snippets, and it would be handy to see highlighting as you type. Even at this low level of complexity, is it simpler to just embed something like Ace, or is there an easy way to use a highlighting library?
This isn't really related to the act of highlighting syntax, but I noticed the example code for Elixir at https://highlightjs.org/static/demo/ is quite out of date. Records generally aren't used anymore, and the syntax to send a message to a process has changed from `pid <- message` to `send(pid, message)`
Used this in a project[0] a little while ago and found it really easy to work with. The syntax detection was an added bonus that I had expected to need another library for.
edit: I should mention that this might be a cool example for someone looking to use hljs in React/Flux; I tried to make it fairly clean. There's a github link on the bottom-left of the website.
Out of curiosity, are there any decent libraries you looked at for doing language detection? That's the main reason I'm using Highlight.js in one of my projects instead of Prism or Rainbow.
Unfortunately, the tradeoff is no support for generating a display with line numbers. In my case, that was the lesser of two evils, but I wouldn't mind using two libraries if it meant I could get everything I want.
This is my favorite syntax highlighting library. The best feature is that it returns a continuation of the code being highlighted which means you can highlight code that's being streamed to you line by line. I couldn't find another library that does that.
You may have users actually writing 'code' on the client side. For those cases a client-side solution gives a better experience that requires less code (no roundtrip to server needed).
http://highlightjs.readthedocs.org/en/latest/line-numbers.ht...