Hacker News
Show HN: Tag – instantly jump to your ag matches (github.com/aykamko)
98 points by aykamko on June 13, 2016 | 57 comments



For anyone else confused about what an "ag" match is: https://github.com/ggreer/the_silver_searcher

(Ag = Chemical symbol for silver)


Surprised ag isn't better known here. If you grep through code a lot and you haven't tried it, I'd heartily recommend it.


I have limited time for keeping up with non-standard tools. It's the same reason I use bash rather than zsh/ksh/fish. With grep I have a solid mental model of what it does. ack/ag have some magic that is fine most of the time, but will eventually bite me when I do something unusual.

Still, I will check out ag. Maybe this time I will stick with it.

On a side note: People often say that C is just for legacy, and today everything new should be written in a higher level language unless it's a kernel or otherwise inherently low-level. Yet ag is basically a rewrite of ack in C.


Can someone explain how searching files is "embarrassingly parallel"? I imagine a storage drive to be sequential: the read head can only be in one place at a time. For SSDs this may be different; that I don't know. Does this speedup only apply to SSDs? Or is I/O not the bottleneck here, and instead the project benefits because searching memory-cached files parallelizes well? (Memory still seems basically sequential, but it's much faster, maybe fast enough to make this problem overwhelmingly CPU-bound.)


I'm the author of ag. Hardware may or may not support it, but the algorithms involved are embarrassingly parallel. Searching involves no data dependencies between files. If you're searching 1,000 files, finding results in file #827 doesn't require any information from files 1-826. That's what I meant by "embarrassingly parallel".

Side note: originally, ag was single-threaded. When I added pthreads, it only gave a 15% speedup in my test benchmark.[1] The speedup is larger if checking a file for matches requires more CPU usage (such as when using a complex regex).

1. http://geoff.greer.fm/2012/09/07/the-silver-searcher-adding-...
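(To illustrate the "no data dependencies" point outside of ag: a minimal sketch that searches files in parallel with ordinary tools. The -P flag to xargs and the choice of 4 workers are assumptions for the example, not anything ag does internally.)

    # each file is searched independently; results for one never depend on another
    find . -type f -name '*.c' -print0 \
      | xargs -0 -P 4 -n 16 grep -Hn 'some_pattern'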


EDIT: On second thought, you basically answered my question in the last paragraph of your reply. I'm keeping this up in case you want to confirm my understanding. The way I read it, the 15% speedup comes from the fact that CPU processing (the regexes) was indeed comparable in speed to reading the files, and splitting the work across individual cores is what makes room for the speedup; memory is fast enough for this to be possible.

I don't understand the I/O part of the whole deal. AFAIK you can't read multiple files off the drive at once, so are the files read sequentially into memory and then searched in parallel in memory? Does memory allow you to read from multiple places at once, even with multiple threads? Surely the files are too large for the processor caches to have any effect.

So, if you'd entertain my curiosity a bit more: is what's happening that multiple files are cached in memory, and memory is so fast that loading bits of the files into the processor cache takes less time than regexing those bits, moving the balance of this process to the CPU-bound side? Is the actual parallelism in the fact that multiple cores can search their individual caches at the same time, and loading those caches from RAM is fast enough not to become a bottleneck?

Sorry for the possibly amateurish question; I've never dug deep enough into parallelism to understand this. But I've spent a good amount of time thinking about parallelising I/O work, and I came to the conclusion that I/O must be orders of magnitude slower and thus always the bottleneck, so any I/O-bound problem (which file search surely is) must be non-parallelizable and can only be sped up by keeping indexes the way some OSes do.


One of the more important jobs a system's OS does is manage I/O devices. Modern kernels don't wait for a read request and then go and read from the disk drive. They both read ahead, anticipating future read requests from the pattern of requests already made, and cache as much read data as they can to avoid touching the disk when a file is revisited. (I was a kernel architect on IBM's AIX.)

While coding, your overall system is really just idling, and it wouldn't be unusual for many of your project's header files and source files to be cached in memory because of your last compile. This effect is more pronounced on machines with a lot of memory, of course. The savings from careful disk management are very important overall, but will they speed up the ag application? I'm surprised that ag only gets a 15% speedup with threads, but naturally it will depend on many factors.
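(If you want to see the page cache at work yourself on Linux, a rough sketch; dropping caches needs root, and the exact numbers will vary by machine.)

    $ time grep -r some_pattern src/                    # cold: pays for disk reads
    $ time grep -r some_pattern src/                    # warm: mostly served from the page cache
    $ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'    # evict caches to get back to "cold"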


Very cool insight. Do you know if the Windows kernel does the same? I'd like to learn more about it in the context of Windows first, then Linux.


Yes, Windows does disk caching. This looks like a nice explanation:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...


On a modern linux server the kernel caches files aggressively.

As an example, we have a server that can saturate 40 cores of CPU during load if fed data quickly enough. On the first load after a cold boot, it takes about 2 minutes to run (fed by SSDs). The second run takes about 30 seconds.

If we run a sequential data load off SSD, it's more like 8 minutes on the cold run. So, even off non-RAM storage, parallel reads can help a lot.


I didn't know that, thanks for sharing! It's really interesting and might come in handy to know this.


Can accessing data through the VFS cache be parallel? On systems with lots of free RAM (which the system can use for caching), ag is faster after the first run on the same directory tree. Of course, the speedup is mainly due to the cache itself.


Indeed. One search and it's all in memory in the disk cache; the files involved are typically small, so it should all fit.


There is already an ag plugin for vim that will jump you to the matches.

See: https://github.com/rking/ag.vim

or see: http://codeinthehole.com/writing/using-the-silver-searcher-w...


Yes, you're right! I mention this in my README:

> Inside vim, vim-grepper or ag.vim is probably the way to go. Outside vim (or inside a Neovim :terminal), tag is your best friend.


No, the way to go is to set the `grepprg` option.
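For example, a minimal sketch for a vimrc (assuming ag's --vimgrep output, which is file:line:column:text):

    set grepprg=ag\ --vimgrep\ $*
    set grepformat=%f:%l:%c:%m

Then :grep some_pattern fills the quickfix list, and :copen / :cnext jump between matches.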


And in case you need more flexibility juggling one-off calls to various similar tools, try:

:cex system('grep_or_ag_or_whatever -flags pattern') | copen

With a proper ':set errorformat=...', you can use this to call your compiler, or 'go test', or plenty of other things (even Go panic stack traces).
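For instance, a hedged sketch using ag itself (the errorformat matches ag's --vimgrep output; adjust it for whatever tool you actually pipe in):

    :set errorformat=%f:%l:%c:%m
    :cex system('ag --vimgrep some_pattern') | copen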


And helm (the... well, I'm not sure what the actual name is for what helm does; it's an Emacs plugin) has an ag plugin, so you can search and jump to matches.


You can also use ag with unite.vim.


Does anyone else have the trouble that I do in watching these short animations introducing some feature? I must be slow because I always have to watch them two or three times to figure out what they are trying to show me. I think if they ran at a more leisurely pace I would comprehend them faster.

The worst are new editor packages; trying to follow what's going on in a Vim session or an Emacs session with the author flying through the features of his or her new package without commentary is always frustrating to me.


TL;DR: it does this by generating shell aliases which use $EDITOR to open the file. Neat, but does it clean up the aliases when you use one? And what if you don't use any? Kinda makes my OCD cringe a bit.


Agreed. I've had this `git grep`-based solution in my `~/.gitconfig` for a while now. It's a bit tidier, but not as featureful.

    [alias]
            grepgvim = "!f() { git grep -l \"$@\" | xargs gvim -p; }; f"

It seems "Tag" is a little fancier in that it lets you go to specific matching lines, for the specific match. Tag is a command-line equivalent to `git cola grep <pattern>`, where you can select the matches visually in a GUI (with keyboard shortcuts) and then launch $EDITOR (e.g. gvim) on the matching line.

To the Tag author: maybe users would find this tidier if `tag` let you edit matches directly by invoking e.g. `tag func -e 96`, with tag doing the rest?

In such a scheme someone could even set the alias file to `/dev/null` and it'd allow both the current workflow and a new one that didn't rely on aliases.


    EDITOR="FOO"
    $EDITOR $FILE
Are there issues with this? I can't imagine a reason they'd need to export $EDITOR to the parent shell process.


It looks like the alias file is overwritten with each invocation.


The alias file is indeed overwritten on every invocation, but tag doesn't attempt to clean up stale aliases from previous invocations. I figured it wasn't really worth implementing this since, as far as I know, shells don't slow down from having "too many aliases".


tag is very cool, but please don't implement the shortcuts with shell aliases:

  ~/badcode$ echo "func" > badfile.go\;\ echo\ \"Gotcha\!\"
  ~/badcode$ ls
 badfile.go; echo "Gotcha!"
  ~/badcode$ tag func
 badfile.go; echo "Gotcha!"
 [1] 1:func
  ~/badcode$ e1
 Gotcha! +1
In case that's not clear, I was able to create a file with a name that contains potentially malicious shell commands that is now bound to an alias via tag.

Other ways to implement the shortcuts: create a flag or subcommand to look up a hit from the results file and jump to it (I can alias this command to suit my taste), or prompt me for input before returning to the command line.


This sort of reminds me of fasd[1]. The thing is, I think I would prefer a menu instead of typing "e6". Typing line numbers to go somewhere seems to be an inherent Vim-user trait that I just haven't picked up (i.e. typing the command plus some number is just too much cognitive load for me; I would rather press the keys over and over again).

Maybe you can pipe ag to the venerable "dialog" (yes, that thing is garbage and reminds me of old-school RedHat installs), but I wonder if someone has something better (a rough sketch of one option is below).

[1]: https://github.com/clvv/fasd
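(A rough, hypothetical sketch of the menu idea using only bash's built-in select; ag's --vimgrep output is file:line:column:text. This isn't something tag provides, just an illustration.)

    agmenu() {
        # show ag matches as a numbered menu; pick one and open it at the right line
        local IFS=$'\n' match file line
        select match in $(ag --vimgrep "$1"); do
            [ -n "$match" ] || break
            file=${match%%:*}
            line=${match#*:}; line=${line%%:*}
            "${EDITOR:-vi}" "+$line" "$file"
            break
        done
    }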


why not have a separate command rather than shell aliases? i.e.

   $ tag func # returns e1..e4, e5, e6 and dumps them in .tag-locations
   $ tagopen e4 # uses .tag-locations to open $EDITOR

you still have scary global state, but at least it's limited to the `tag`/`tagopen` commands. Plus, you can do `tagopen` without arguments and show a menu :)
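A minimal sketch of that scheme, assuming a hypothetical .tag-locations file with one file:line record per match (not tag's actual format, just an illustration of avoiding aliases):

    tagopen() {
        # look up the Nth match and open it, without eval or alias expansion
        local entry
        entry=$(sed -n "${1}p" .tag-locations)
        [ -n "$entry" ] || { echo "no such match: $1" >&2; return 1; }
        "${EDITOR:-vi}" "+${entry##*:}" -- "${entry%:*}"
    }

Usage would then look like `tag func` followed by `tagopen 4` (or parse the `e4` form, whichever you prefer).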


Cool. For emacs, here's an interface for narrowing down `git grep` and `ag` results that I've been enjoying using: https://github.com/dandavison/emacs-search-files


How does it compare to helm-git-grep / helm-ag ?


Not as good :) Thanks, you got me to finally try out helm and helm-projectile. Mine did a couple of things, like searching for function definitions, that might be nice to implement in helm-world. I expect someone already has.


Another tool that uses ag to look for tags (emacs): https://github.com/jacktasia/dumb-jump


https://github.com/syohex/emacs-helm-ag is also super cool, although it works from within emacs; you can interactively preview the matched areas without opening files, and jump directly to matches. tag seems really cool for people who prefer working more directly in the terminal, though (or people who don't use emacs).


Cool, but I'm not a big fan of the idea of storing "state" in aliases that later commands can reference. (It's better than environment variables, though.)

For source code I've had a really good experience with ctags[1].

[1]: https://github.com/universal-ctags/ctags


What exactly is ctags? That README and the accompanying docs seem to only describe how the project has changed from an unseen progenitor on SourceForge.


It seems to me that a more generic solution would be preferable. For example, why should this work only with "ag"? Also, this could work with vim to allow the user to skip to the next match more easily.


Similarly, I'd love a more generic version of this (kind of like Vim's quickfix and `errorformat`). Typing `make` or `cargo build` and jumping directly to any errors would be extremely fast. :)
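Vim already gets you most of the way there with 'makeprg' and the quickfix list; a sketch (your compiler's 'errorformat' may need tweaking):

    :set makeprg=cargo\ build
    :make
    :copen

:make runs the build, parses errors via 'errorformat' into the quickfix list, and :copen / :cnext jump straight to them.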


Cool idea but terrible name. No one will ever be able to google this.



if by "being able to google" you mean type "tag" in search engine, then sure.

but that hardly qualifies as "being able to google".

RTFM


Are we supposed to know what "ag" is?


If you haven't tried https://github.com/ggreer/the_silver_searcher yet, you're missing out.

The progression is grep with a ton of flags (or even find | xargs grep) -> ack -> ag.

ag is faster than ack, automatically understands .gitignore, gives you .agignore as well, and is just a really nice piece of software.


ag has issues with .gitignore, as patterns are not recognized properly. The matcher that git uses cannot be included due to licensing (see https://github.com/ggreer/the_silver_searcher/pull/614).

It does seem like there is hope, as a BSD-licensed version exists. My fingers are crossed that this is solved soon :)


ag uses .agignore

I often need to search only a subset of the files checked into git. For example, I don't search the min.js files, but I need them for deploys.

It would be nice if they could share the same format.


inside a git repository I would recommend using `git grep`


My experience with "ack" was that it seemed great for a few days and then I realized it was silently skipping searching certain files, causing hits to be suppressed, so I stopped trusting it. :-/


That or follow the link given with the very first mention of "ag" in TFA.


As a long-time ag user who uses neither vim nor emacs, I salute you!


Not sure about this wrapper for an unknown tool, "ag". I don't know, what's wrong with good ol' grep -Hnir ?


Ag is way faster.


It seems an order of magnitude slower to me:

    $ time grep -r some_token .
    real	0m0.467s
    user	0m0.252s
    sys 	0m0.215s

    $ time ag some_token
    real	0m2.948s
    user	0m0.112s
    sys 	0m3.083s
(Both run twice to ensure the disk cache was warm).

Am I doing something wrong?


Could be to do with where you're searching. The fact that ag skips everything in your .gitignore seems to have helped when I tested. Both of these were run at the root of my projects directory...

  $ time grep -r f_admin .
  35.87s user 6.50s system 65% cpu 1:04.47 total

  $ time ag f_admin
  1.51s user 4.94s system 196% cpu 3.284 total


I'm searching mostly C code. I'm in the src directory, so there's nothing there except the source code.


What were you searching? I want to reproduce this.


It's a proprietary codebase, but it consists of around 500MB of mostly C and C++ files and their headers. There are a few other files (makefiles, a few perl/python scripts).


Is there any chance you could create a reproducible test case? I'm very curious as to what's causing the slowdown.


And it automatically ignores .git, .svn, and other distractions, and has plenty of handy options (`ag --nojs --noruby --perl some_text`).





