Highlight.js – Syntax highlighting for the Web (highlightjs.org)
167 points by tilt on Feb 20, 2015 | 69 comments



It should be mentioned that this does not support and will indeed never support line numbers due to some strange opinions of the lead developer:

http://highlightjs.readthedocs.org/en/latest/line-numbers.ht...


I think this comment/discussion ends up over-emphasizing the 'strangeness' of the line-number situation with this library. The linked page ends with: 'This position is subject to discuss. Also it doesn’t stop anyone from forking the code and maintaining line-numbers implementation separately.', which seems pretty reasonable to me!


Yeah, this is pretty much the behavior that I want to see from open source maintainers: make decisions and build conventions but be open to being convinced of alternatives.


http://prismjs.com/ is an alternative that supports line numbers.


And in particular prismjs lets you select lines of code without selecting the line numbers, which you probably do not want.
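
For the curious, here is a rough sketch of one way a highlighter can get that effect. This is not Prism's actual code; `addLineNumbers`, the `line` class and the `data-line` attribute are names made up for illustration. The idea is to keep the numbers out of the text content and let CSS draw them.

```javascript
// Hypothetical helper: wraps each line of already-escaped code in a span and
// stores the line number in a data attribute instead of in the text itself.
function addLineNumbers(escapedCode) {
  return escapedCode
    .split("\n")
    .map((line, i) => `<span class="line" data-line="${i + 1}">${line}</span>`)
    .join("\n");
}

// Matching CSS (an assumption, not taken from any library):
//   .line::before { content: attr(data-line); user-select: none; }
// The number is drawn as generated content, so it is not part of the
// element's text.

const numbered = addLineNumbers("var a = 1;\nvar b = 2;");
```

Because the numbers only exist as generated content, selecting and copying the block should give you the code alone in most browsers.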


Why wouldn't I want that?

If I am copying and pasting code from a website, it usually goes to my editor and I don't want to have to delete the line numbers from it.


You don't want the line numbers.


Oh if that's what he meant then I read it the wrong way.


Prism is pretty damn good. Although the name sounds like an NSA plan to intercept all the javascript libraries in the world.


I already like this one better, to be honest. It also already has Julia and Rust support. If it were to include Elixir support as well, it'd be perfect.


Line numbers are really helpful when presenting code at an event, in a meeting, etc.


Genuine question, why?


"So if you look at line 7, you'll see..."


... that the presenter should buy a laser pointer!


Laser pointers don't work if the video is being recorded or screencast at the same time. They only work when the viewer is in the same room, and even then, they are typically pretty poor.


Out of several dozen different uses of laser pointers, I've only ever been able to (easily) see perhaps 3 or 4 of them, plus another small portion if focusing entirely on the game of spot-the-laser-pointer. It's related to my colourblindness.

As such, verbal cues (or a long stick) are preferable to a laser pointer. Of course, use a laser pointer to assist those who can see it, but it's better to assume half the people in the room can't and use words to this effect.


This also applies to comments within the presentation, not just the verbal part of the presentation.


Are line numbers considered "highlighting"? I mean, I get why you would want them but I never would have thought to ask for them.


It does seem more appropriate to have a separate module for line numbers. I would definitely understand the separation of concerns argument, but the maintainer instead makes some sort of emotional appeal about clutter and simplicity to explain why there is no line number support.


> the maintainer instead makes some sort of emotional appeal about clutter and simplicity to explain why there is no line number support

Clutter and simplicity is not an emotional argument. It becomes verifiably more difficult to maintain a code base with each new line. And since he's the one doing the maintaining, his position seems quite reasonable.

Secondly, he seems to have a strong aesthetic aversion to line numbers, and I can totally respect that. Frankly, among popular open source projects, those whose developers have the strongest sense of style and design are usually (not always) the best.


Many popular syntax highlighting libraries do include this functionality. For example, the popular Python syntax highlighter "Pygments" supports line numbers[0].

[0] - http://pygments.org/docs/formatters/#HtmlFormatter


I personally use Prism.js for my blogging, which is what Smashing Magazine and CSS-Tricks use. Prism includes line numbers.

I like Prism so far and don't have any issues with it yet



I understand the author's point-of-view on this, but disabling line numbers sounds like something that should be user-controllable. Right now it feels like a misfeature, whereas "we don't implement line numbers by default, but if you want them, do this and that and you can have them" would feel way more like a good feature to have.


Has anyone actually tried submitting a patch that implements it yet?

https://groups.google.com/d/msg/highlightjs/UVJaQcQNC1c/1C6U...


It's free software so that's no problem if you feel the need.


This is my favorite syntax highlighting JS library, by far.

The approach this library takes to building the highlighted DOM tree is very clever. It represents the original DOM and the highlighted code output as separate streams of tag open / close events and then merges them together. This allows the highlighter to maintain the pre-existing DOM structure of the highlighted code! The language auto-detection and sub-language handling are also very neat.
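
A simplified sketch of that merging idea, for anyone who wants to see the shape of it. This is not the library's own code: `element`, `mergeEvents` and the event layout are assumptions for illustration, and it assumes each stream is well nested on its own and has no empty elements. Both streams are lists of open/close events at offsets into the same plain text; when an element from one stream ends inside an element from the other, the inner one is closed and reopened so the output stays well formed.

```javascript
// Escape text so it can be placed between tags.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build the open/close event pair for one element covering [start, end).
// Both events share the same node object so a close can find its open.
function element(start, end, tag, className) {
  const node = { tag, className };
  return [
    { offset: start, type: "open", node },
    { offset: end, type: "close", node },
  ];
}

const openTag = (n) => (n.className ? `<${n.tag} class="${n.className}">` : `<${n.tag}>`);
const closeTag = (n) => `</${n.tag}>`;

// Merge two event streams over the same plain text into one HTML string.
function mergeEvents(text, original, highlighted) {
  const byOffset = new Map();
  for (const ev of [...original, ...highlighted]) {
    if (!byOffset.has(ev.offset)) byOffset.set(ev.offset, []);
    byOffset.get(ev.offset).push(ev);
  }
  const offsets = [...byOffset.keys()].sort((a, b) => a - b);

  const stack = []; // currently open nodes, outermost first
  let out = "";
  let pos = 0;
  for (const offset of offsets) {
    out += escapeHtml(text.slice(pos, offset));
    pos = offset;
    const group = byOffset.get(offset);
    const closing = new Set(group.filter((e) => e.type === "close").map((e) => e.node));
    const lowest = stack.findIndex((n) => closing.has(n));
    if (lowest !== -1) {
      // Close down to the outermost node ending here, then reopen survivors.
      const popped = stack.splice(lowest);
      for (let i = popped.length - 1; i >= 0; i--) out += closeTag(popped[i]);
      for (const n of popped) {
        if (!closing.has(n)) {
          out += openTag(n);
          stack.push(n);
        }
      }
    }
    for (const ev of group) {
      if (ev.type === "open") {
        out += openTag(ev.node);
        stack.push(ev.node);
      }
    }
  }
  return out + escapeHtml(text.slice(pos));
}

const text = "if (x) return;";
const original = element(3, 10, "em"); // pre-existing markup in the page
const highlighted = [
  ...element(0, 2, "span", "hljs-keyword"),
  ...element(7, 13, "span", "hljs-keyword"),
];
const merged = mergeEvents(text, original, highlighted);
```

Here the pre-existing `<em>` ends in the middle of the `return` keyword, so the keyword span is split in two around `</em>` rather than producing overlapping tags.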

If you like interesting codebases, I'd recommend giving it a quick read. :)


Sounds exactly like what I implemented 4 years ago.

Here is where you extract the existing HTML tags: https://github.com/ioquatix/jquery-syntax/blob/97e38d08924d7...

When you brush the code (extract highlighting information) you provide the set of initial matches which are converted into a tree: https://github.com/ioquatix/jquery-syntax/blob/97e38d08924d7... - there is no merge step, they are merged in place in the tree which is then used to generate DOM (or whatever you like really) output.

I never implemented language auto-detection, but I'm sure I've seen it implemented before Prism; Google's Prettify might have been one of the first to do it. It basically highlights using all available brushes and looks at which one matches the most tokens, though I'm not sure if this is how it is currently implemented. My feeling is that you usually know what language you are highlighting, the auto-detection won't always get it right, and the cost of loading all brushes/languages is pretty high (imagine you support 100 different languages: you'd have to load and try all of them).
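
To make the "try every brush and count matches" idea concrete, here is a toy sketch. The `brushes` table, `countMatches` and `detectLanguage` are invented for illustration and are far cruder than any real grammar; no library is claimed to work exactly this way.

```javascript
// Toy "brushes": each language is just a handful of token patterns.
const brushes = {
  python: [/\bdef\b/g, /\bimport\b/g, /\bself\b/g, /:\s*$/gm],
  javascript: [/\bfunction\b/g, /\bvar\b/g, /=>/g, /;\s*$/gm],
  ruby: [/\bdef\b/g, /\bend\b/g, /\bputs\b/g, /@\w+/g],
};

// Count how many tokens a brush finds in the code.
function countMatches(code, patterns) {
  return patterns.reduce((n, re) => n + (code.match(re) || []).length, 0);
}

// Run every brush and keep the one with the most matches.
// Returns { language: null, score: 0 } when nothing matches at all.
function detectLanguage(code) {
  let best = { language: null, score: 0 };
  for (const [language, patterns] of Object.entries(brushes)) {
    const score = countMatches(code, patterns);
    if (score > best.score) best = { language, score };
  }
  return best;
}

const guess = detectLanguage("function add(a, b) {\n  var c = a + b;\n  return c;\n}");
```

The cost concern is visible even in the toy: every brush has to be loaded and run over the whole input before a winner can be picked.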

The sub-language handling in jQuery.Syntax is as precise as possible: https://github.com/ioquatix/jquery-syntax/blob/97e38d08924d7...

Basically, if you match some sub-portion of the code, you then highlight that using a different brush. Sometimes it's almost impossible to know though (e.g. diffs which may be incomplete).


In my opinion it's irresponsible to use highlighters like these. At least when you use them like advertised. Why does no one bat an eyelid at letting the clients handle the highlighting?

The highlighting (lexing) could (and should) be done ONCE at the server. Imagine some high traffic blogs with hundreds of millions of hits combined - all these clients downloading the highlighting libs + lexers for 50 different languages that will never be used on the blog anyway. The wasted energy here is probably even measurable.

Cons when letting the clients handle syntax highlighting:

- Much slower load times

- Wasted energy

- Can fail because of JS crashing

- Noscript users won't get any highlighting (and your blog targets technical users, right? They are more likely to use no script)

- Slow mobile devices will have a harder time loading the site (or, depending on the JavaScript engine, fail to highlight at all)

- (if no site optimization is made) the client will download lexers for languages not used on the blog

Cons when the server does the lexing ONCE:

- NOTHING

Note that this rant isn't targeted at Highlight.js. It's a good highlighter - I use it myself at my blog. Except I use it on the server via a CGI script. It took me no more than 2 minutes to set up.
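
A minimal sketch of the "highlight once" idea, under stated assumptions: `highlightOnce` and `toyHighlight` are hypothetical names, and `toyHighlight` is only a stand-in for whatever highlighter actually runs on the server. This is not the CGI setup described above, just the general shape of rendering each snippet one time and serving the stored HTML afterwards.

```javascript
// Wraps any highlighter so each distinct (language, code) pair is rendered once.
function highlightOnce(highlight) {
  const cache = new Map();
  const stats = { rendered: 0, served: 0 };
  function render(language, code) {
    const key = language + "\u0000" + code;
    if (!cache.has(key)) {
      cache.set(key, highlight(language, code));
      stats.rendered += 1;
    }
    stats.served += 1;
    return cache.get(key);
  }
  return { render, stats };
}

// Stand-in highlighter, purely illustrative: it only escapes and wraps the code.
function toyHighlight(language, code) {
  const escaped = code.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return `<pre><code class="${language}">${escaped}</code></pre>`;
}

const server = highlightOnce(toyHighlight);
const firstHit = server.render("c", "if (a < b) return;");
const secondHit = server.render("c", "if (a < b) return;"); // served from the cache
```

For a static blog the same effect comes from running the highlighter at publish time and storing the resulting HTML with the post.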


> Except I use it on the server via a CGI script. It took me no more than 2 minutes to set up.

How about you write a tutorial about this and submit it to HN? This is probably a better way to make people adopt your approach than a rant, no matter how valid your points are :)


That's a good idea! I should make a blog post about it. And maybe be less ranty. ;)


Computing is cheap. Setting up and maintaining that is more human seconds than doing it clientside. I dare you to calculate the amount of energy this wastes on a blog that gets a billion hits.

This is a battle you cannot win and should not attempt to fight.


Sure, the energy argument is probably the least interesting of the ones listed above. Don't ignore the rest of them!


In my opinion it is irresponsible to post strongly worded opinions such as these, because you forget you may not be seeing the whole picture.

  Cons when the server does the lexing ONCE:
  - NOTHING

You are assuming a specific use case, where static code originally exists on the server and the highlighted version is included in a page that is viewed many times.

My use case for these libraries is different: users actually write 'code' in my app, which I wish to highlight as they type it. Of course many users will write the same fragments and energy could be saved by caching the highlighting of those on the server. However, the latency of that solution (roundtrip to server needed to highlight code) and the code complexity (cache expiration is one of the two 'hard' problems in SE) are much worse than for the client-side highlighting solution.


That's an entirely different problem I'd say, and the perfect use case for these libraries on a client! If you'd do it server side it would be like a local IDE sending the code to be lexed remotely...

My rantiness (I apologize for that) targets programming blogs/static pages.


At one point I got really into the idea of ASTs for languages. I've used pygments, the python syntax highlighter, with the intention of parsing the meaning of code.

These syntax highlighters are great, but I think they underline the lack of a defined and accessible AST parse definition for most languages. Highlight.js and others kind of just rely on regexes: https://github.com/isagalaev/highlight.js/blob/master/src/la...

It'd be great if we could parse through programming languages to get their meaning. I want to get tuples back!

`a = 23` would give

(variable(name=a), assignedTo, number(23))

This does exist for some languages, but mostly compiled ones. But it would make the syntax highlighting even more robust! Antlr is the best one around at the moment. http://www.antlr.org/
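
As a toy illustration of the kind of output meant here (a sketch only, handling nothing beyond `name = integer`; `tokenize` and `parseAssignment` are made-up names and this is not how ANTLR or any highlighter works):

```javascript
// Split the source into typed tokens: integers, identifiers, and "=".
function tokenize(source) {
  const re = /\s*(?:(\d+)|([A-Za-z_]\w*)|(=))/y; // sticky: only matches at lastIndex
  const end = source.trimEnd().length;
  const tokens = [];
  let pos = 0;
  while (pos < end) {
    re.lastIndex = pos;
    const m = re.exec(source);
    if (!m) throw new SyntaxError(`unexpected character at offset ${pos}`);
    if (m[1] !== undefined) tokens.push({ type: "number", value: Number(m[1]) });
    else if (m[2] !== undefined) tokens.push({ type: "variable", name: m[2] });
    else tokens.push({ type: "assign" });
    pos = re.lastIndex;
  }
  return tokens;
}

// Turn "name = number" into a (variable, assignedTo, number) tuple.
function parseAssignment(source) {
  const [target, op, value, ...rest] = tokenize(source);
  const ok =
    target?.type === "variable" &&
    op?.type === "assign" &&
    value?.type === "number" &&
    rest.length === 0;
  if (!ok) throw new SyntaxError(`not a simple assignment: ${source}`);
  return [target, "assignedTo", value];
}

const tuple = parseAssignment("a = 23");
```

The difference from a highlighter is the second step: the tokens are checked against a grammar rule and combined into a structure, instead of each being given a colour independently.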


I believe CodeMirror does something like this. It uses restartable parsers (or it used to) to make parsing fast while editing.

I experimented editing code using jQuery.Syntax. I used the match tree it generated to figure out where to restart parsing and it was pretty fast, it would only re-evaluate the current line in most cases.
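
Roughly what "restartable" means, as a sketch under assumptions. This is neither CodeMirror's nor jQuery.Syntax's code; `tokenizeLine` and `HighlightedDocument` are invented names, and the only state carried between lines is whether we are inside a `/* ... */` comment. Each line's end state is cached, and after an edit the tokenizer restarts at that line and stops as soon as a line's end state matches what was cached.

```javascript
// Tokenize one line, given whether it starts inside a block comment.
function tokenizeLine(line, inComment) {
  const spans = [];
  let pos = 0;
  while (pos < line.length) {
    // Either we are already in a comment (carried over) or we look for "/*".
    const start = inComment ? pos : line.indexOf("/*", pos);
    if (start === -1) {
      spans.push({ style: "code", text: line.slice(pos) });
      break;
    }
    if (start > pos) spans.push({ style: "code", text: line.slice(pos, start) });
    const close = line.indexOf("*/", inComment ? start : start + 2);
    const stop = close === -1 ? line.length : close + 2;
    spans.push({ style: "comment", text: line.slice(start, stop) });
    inComment = close === -1;
    pos = stop;
  }
  return { spans, endState: inComment };
}

class HighlightedDocument {
  constructor(lines) {
    this.lines = lines.slice();
    this.cache = []; // cache[i] = { spans, endState } for line i
    this.retokenize(0);
  }

  // Re-tokenize from line `from`; stop once a line ends in the same state as
  // before, because the lines after it cannot have changed.
  // Returns how many lines were re-tokenized.
  retokenize(from) {
    let state = from === 0 ? false : this.cache[from - 1].endState;
    let count = 0;
    for (let i = from; i < this.lines.length; i++) {
      const previous = this.cache[i];
      this.cache[i] = tokenizeLine(this.lines[i], state);
      state = this.cache[i].endState;
      count += 1;
      if (previous && previous.endState === state) break;
    }
    return count;
  }

  setLine(index, text) {
    this.lines[index] = text;
    return this.retokenize(index);
  }
}

const doc = new HighlightedDocument(["var a = 1;", "var b = 2;", "var c = 3;"]);
const afterSmallEdit = doc.setLine(1, "var b = 20;"); // end state unchanged
const afterOpeningComment = doc.setLine(0, "/* start"); // state change ripples down
```

An ordinary edit only touches the current line; opening an unterminated comment is the expensive case, since every following line changes meaning.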


CodeMirror still uses the restartable parsers. But they produce a flat sequence of styles, not a hierarchical AST. E.g. `foo bar* baz` in markdown becomes <span class="cm-em">foo </span><span class="cm-strong cm-em">bar</span><span class="cm-em"> baz*</span>. Also, most parsers reuse a small set of style names (that are covered by themes) without much regard to semantic appropriateness. E.g. markdown lists cycle through 'variable-2', 'variable-3', 'keyword'.


We use highlight.js at Resonant Core for our blog posts. After reading the comments here, I'm considering forking it to support line numbers :)


It would be really helpful when presenting at events.

A lot of events record the presenter's screen and the presenter but expect the presenter to stand in one spot so they don't need a cameraman.

So, it does not work very well to point at a screen when presenting.


It should be a trivial patch. I'll see about getting it done tonight.


Famous last words ;)


jQuery.Syntax supports line numbers and is generally pretty awesome *

* I'm the author so of course I'd say that :)


Anyone know of one of these that supports some kind of callouts?

Basically I want to be able to annotate groups of 1 or more lines, something like this, probably using some kind of inline comment

http://reference.bitreactive.com/reference/images/tutorial/s...


jQuery.Syntax does support preserving the original DOM elements, so you could do this.


I've tried Highlight.js before and I was never really a big fan of it. It seemed too heavy last time I tried it. I've been using rainbow.js for a year or two now.

http://craig.is/making/rainbows/


Rainbow doesn't seem to have line numbers. Is there another module available that allows integration of line numbers?


What's the recommended way of using something like this or prism.js to highlight a simple textarea containing code?

I'm not talking a full-on programming editor, that's a much larger scope. Rather, I have an admin tool with a <textarea> that allows entering short code snippets, and it would be handy to see highlighting as you type. Even at this low level of complexity, is it simpler to just embed something like Ace, or is there an easy way to use a highlighting library?


Have you looked at CodeMirror? It might be a better fit for highlighting editable text.


I like it. However, would it be possible to add a "check all" checkbox for the "other" languages? You know, I mean: "click, click, click, click ..." ;)


Paste in console: $('input[type=checkbox]').prop('checked', true);


You sir are brilliant!


yea! thx!


This isn't really related to the act of highlighting syntax, but I noticed the example code for Elixir at https://highlightjs.org/static/demo/ is quite out of date. Records generally aren't used anymore, and the syntax to send a message to a process has changed from `pid <- message` to `send(pid, message)`


Used this in a project[0] a little while ago and found it really easy to work with. The syntax detection was an added bonus that I had expected to need another library for.

[0] - http://cmdv.io

edit: I should mention that this might be a cool example for someone looking to use hljs in React/Flux-- I tried to make it fairly clean. There's a github link on the bottom-left of the website.


Out of curiosity, are there any decent libraries you looked at for doing language detection? That's the main reason I'm using Highlight.js in one of my projects instead of Prism or Rainbow.

Unfortunately, the tradeoff is no support for generating a display with line numbers. In my case, that was the lesser of two evils, but I wouldn't mind using two libraries if it meant I could get everything I want.


Just reading out of my notes, the only non-highlighting detector I was looking at was https://github.com/blakeembrey/node-language-detect, but it's not widely used or developed as far as I can tell.

If you have Ruby in your stack, it's easy -- use github's own https://github.com/github/linguist

You might also be able to use just the language definition files from https://github.com/syntaxhighlighter


Here's a hosted version of the HighlightJS developer tool: http://highlightjs-developer.appspot.com/

Planning to open source an http API (similar to http://markup.su/highlighter).


This is my favorite syntax highlighting library. The best feature is that it returns a continuation of the code being highlighted which means you can highlight code that's being streamed to you line by line. I couldn't find another library that does that.


I use this on my personal site, seen here:

https://www.dougcodes.com/go-lang/building-a-web-application...

Dead easy to implement, just set it and forget it.


Is this news? I've been using this for quite a while.


So, you using it seems like validation that this is a good contribution?


It's like posting jQuery.


Why prefer this over something server-side, like Pygments, when you can use the latter? I don't see many advantages to the client-side approach.


You may have users actually writing 'code' on the client side. For those cases a client-side solution gives a better experience that requires less code (no roundtrip to server needed).


Used this along with Markdown to HTML conversion; works pretty well. Will try prismjs.com mentioned here, I was unaware of it.


How does this compare with Prettify?


I did a comparison of Syntax Highlighter, Prettify and Highlight.js for weblogs.asp.net. I ended up preferring Highlight.js. My notes:

• SyntaxHighlighter

generates tons of nested tables, doesn’t work well with Bootstrap (overflows into right rail)

• Google Prettify

Worked okay, but kind of ugly. Had to use jQuery to apply “prettyprint” class to all pre elements.

• Highlight.js

Easy to set up, lots of themes, seems pretty quick, regular updates

Sample post: http://weblogs.asp.net/jongalloway/looking-at-asp-net-mvc-5-...

Highlight.js + styles are hosted on cdnjs, which makes it easy to host on a blog.


This looks really useful. I'm wondering if there is a way to get modern Fortran into the fold.



