A Git query language

Erwin · on Dec 14, 2016

This might benefit from SQLite's Virtual tables: https://sqlite.org/vtab.html

With Virtual Tables you can expose any data source as a SQLite table -- then you can use every SQL feature that sqlite offers. You can just tell sqlite how to iterate through your data with a few functions, with an option to push down filtering information for efficiency.

You can also create your own aggregates, functions etc.

Here's an article where the author exposes redis as a table within sqlite: http://charlesleifer.com/blog/extending-sqlite-with-python/
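
For readers who want to try the "own aggregates, functions" part without writing C, here is a minimal sketch. It does not show a virtual table: Python's standard sqlite3 module does not expose the virtual table API, so an ordinary in-memory table stands in for one. The rows, the table layout, and the names `word_count` and `longest` are all made up for illustration.

```python
import sqlite3

# Made-up sample rows standing in for real commit data (assumption).
COMMITS = [
    ("alice", "Fix billing rounding bug", "2016-12-01"),
    ("bob", "Add billing export", "2016-12-03"),
    ("alice", "Update README", "2016-12-05"),
]


def word_count(text):
    """Scalar SQL function: number of whitespace-separated words."""
    return len(text.split())


class LongestMessage:
    """Aggregate: the longest string seen within each group."""

    def __init__(self):
        self.longest = ""

    def step(self, value):
        if value is not None and len(value) > len(self.longest):
            self.longest = value

    def finalize(self):
        return self.longest


def open_db():
    """Create an in-memory database and register the custom SQL helpers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (author TEXT, message TEXT, date TEXT)")
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", COMMITS)
    conn.create_function("word_count", 1, word_count)
    conn.create_aggregate("longest", 1, LongestMessage)
    return conn


def longest_per_author(conn):
    """Per author: longest message and total words across all messages."""
    return conn.execute(
        "SELECT author, longest(message), sum(word_count(message)) "
        "FROM commits GROUP BY author ORDER BY author"
    ).fetchall()
```

Calling `longest_per_author(open_db())` returns one row per author. With a real virtual table, the `commits` table would be backed by the repository instead of by inserted rows, and the SQL would stay the same.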

erydo · on Dec 14, 2016

My thoughts went straight to PostgreSQL Foreign Data Wrappers. Something like that would be really helpful!

andrewstuart2 · on Dec 14, 2016

Wouldn't that require a running Postgres server, though? Seems a bit heavy for ad hoc queries.

For something like a GitLab server, though, that would be amazing.

oneweekwonder · on Dec 15, 2016

According to the Ubuntu dependencies [0], it already uses Postgres and it also needs Redis, so with GitLab we are a bit past the heavy bit.

[0]: http://packages.ubuntu.com/xenial/gitlab

simonw · on Dec 14, 2016

SQLite virtual tables for git would be phenomenal - you could join your git history against data from other sources! And you wouldn't need to run a huge MySQL/Postgres server process to do it.

A very cursory search suggests no one has built this yet.
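
A rough sketch of what "join your git history against data from other sources" could look like once commits are visible to SQLite. Plain tables are used here instead of virtual tables, and the `people` table, its columns, and the function name are hypothetical.

```python
import sqlite3


def commits_per_team(commits, people):
    """Join (hash, author) commit rows with (author, team) rows and count per team."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (hash TEXT, author TEXT)")
    conn.execute("CREATE TABLE people (author TEXT, team TEXT)")
    conn.executemany("INSERT INTO commits VALUES (?, ?)", commits)
    conn.executemany("INSERT INTO people VALUES (?, ?)", people)
    rows = conn.execute(
        "SELECT p.team, count(*) "
        "FROM commits AS c JOIN people AS p ON p.author = c.author "
        "GROUP BY p.team ORDER BY p.team"
    ).fetchall()
    conn.close()
    return rows
```

The second table could just as well come from an issue tracker export or a CSV of deploy times; the point is only that the join happens in SQL.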

koolba · on Dec 14, 2016

This is pretty cool. Looks like it's local to the current repo, which makes sense for most usage. Having something like this across a swathe of repos would be useful in different ways (ex: "What has Bob committed over all the repos for our projects that involves the string 'billing'?").

Minor off topic rant about the animated example: Who doesn't put a space at the end of their prompt after the $?! Ugh!

icefox · on Dec 14, 2016

Git map would work for many cases

https://github.com/icefox/git-map

  Git map ql ...

koolba · on Dec 14, 2016

Nice. I can see a need for this as a lot of my projects are structured like that (multiple sibling repos). Running 'git map log --grep ...' seems particularly useful.

Quick eyeballing of the source, it does not handle whitespace in directory names properly. The for loop would treat them as separate, invalid, entries.

Of course I'd also fire someone on the spot that commits a project directory with whitespace in it...

icefox · on Dec 15, 2016

Good point, switched it to a while loop.
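
The whitespace problem comes from shell word splitting, so one way to sidestep it entirely is to never build a whitespace-separated list of directory names. Below is a small sketch of the same "run a git command in every sibling repo" idea. It is not git-map's source; the function names are invented, and it assumes `git` is on the PATH when `git_in_each` is called.

```python
import os
import subprocess


def find_repos(root):
    """Return sorted paths of immediate subdirectories of root that contain .git."""
    repos = []
    with os.scandir(root) as entries:
        for entry in entries:
            # Each entry is one path, so spaces in names are never split apart.
            if entry.is_dir() and os.path.exists(os.path.join(entry.path, ".git")):
                repos.append(entry.path)
    return sorted(repos)


def git_in_each(root, git_args):
    """Run `git <git_args>` inside every repo under root; return {repo: stdout}."""
    results = {}
    for repo in find_repos(root):
        proc = subprocess.run(
            ["git", *git_args],  # argument list, no shell involved
            cwd=repo,
            capture_output=True,
            text=True,
            check=False,
        )
        results[repo] = proc.stdout
    return results
```

For the "What has Bob committed that involves 'billing'?" question, something like `git_in_each("projects", ["log", "--oneline", "--author=Bob", "--grep=billing"])` would collect the matching log lines per repo.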

somebehemoth · on Dec 14, 2016

Thank you for suggesting git-map. I tried it and intend to include it in my workflow. I thought your suggestion was a good one so I tested it. It did not work for me due to: "dyld: Library not loaded: libgit2.21.dylib". I assume this is something about my setup (mac, zsh, other stuff) but if you got this working I'd like to know so I can keep trying with my setup. To clarify: both git-map and gitql work for me, I just can't seem to combine them.

icefox · on Dec 15, 2016

git map is just an 8-line bash script, so if you use zsh most of the time, perhaps check that the paths are set up correctly for bash to find your git binaries.

iagooar · on Dec 14, 2016

I agree. Since we work with microservices, we have 10+ repos per project. It would be nice to be able to scan through all of them.

nothrabannosir · on Dec 14, 2016

Plug: if you ever find yourself wanting to merge everything painlessly: https://github.com/unravelin/tomono :)

accompanying blog post: https://syslog.ravelin.com/multi-to-mono-repository-c81d004d...

OJFord · on Dec 14, 2016

You could submodule them all in an otherwise empty parent repo.

Bit of a hack, but could come in handy for other things (off the top of my head: "welcome to the team! Clone this one thing, it contains everything you need.").

matheusd · on Dec 14, 2016

Seconded on a multirepo tool for that.

I have a big "projects" folder with lots of different repos. I'd like to know all commits I've made in the past month, across all projects.

Currently I have a post-commit hook which sends the one line shortlog to a common file, but would be nicer to have a tool for ad-hoc queries.

OJFord · on Dec 14, 2016

> Who doesn't put a space at the end of their prompt after the $?! Ugh!

I've always had mixed feelings about that. Resolved it now, by using '>', no space, which doesn't look cluttered since there's only ~1px right next to the first char entered.

c8g · on Dec 14, 2016

active gitql https://github.com/gitql/gitql

somebehemoth · on Dec 14, 2016

Thank you for sharing this. I was interested to know if this was a fork of gitql by cloudson. It is not. The following issue clarifies the relationship:

https://github.com/gitql/gitql/issues/83

"The relation is its purpose: SQL + Git, written in Go. There is no relation other than that."

I thought this might help anyone else who was similarly interested.

mnovaes · on Dec 14, 2016

This one actually seems very promising.

masklinn · on Dec 14, 2016

Mercurial has a somewhat similar concept predating this (added circa 2010): revision sets (https://www.selenic.com/mercurial/hg.1.html#revsets) for selection, and templates for formatting (though git has the latter built in, kind of, via log --format).

jordigh · on Dec 14, 2016

Mercurial's are completely general, though. Any Mercurial command that can accept a revision as an argument can also accept a revset expression. And templating isn't just for log, but for many other commands, such as grep or annotate (blame), and it's the same templating language for all of them. I also find hg templates a bit easier to read, because they're Djangoish/Jinjaish instead of being printf-ish like git's. Plus, you can save and compose Mercurial templates and revsets.

I was actually hoping that gitql had finally gotten inspiration from Mercurial and git would grow a general purpose query language, but it's read-only. :-(

aardvark179 · on Dec 14, 2016

Revsets are a wonderful feature, and it's something I wish git had. Just being able to say I want to see what has changed between this branch head and its latest common ancestor with trunk is an incredibly simple and useful thing to be able to do.

trolor · on Dec 14, 2016

Complete guess based on [1], but wouldn't

  git diff HEAD $(git merge-base HEAD master)

work?

[1]: https://stackoverflow.com/questions/1549146/find-common-ance...

aardvark179 · on Dec 14, 2016

In this case git can do the same thing, but notice you can only do it because git provides a special command for getting that revision. Revsets are really general (a greatest common ancestor function is provided, but can be easily synthesised from more primitive building blocks) and can be used everywhere, so you can bisect over changes you made that touched files matching a pattern, or whatever. They aren't something I use every day, but they are really useful on occasion and allow for some pretty robust tooling to be written.

jordigh · on Dec 14, 2016

This requires using bash. I find it kind of cheating that git ships bash on Windows so that Windows users can rely on bash for composing git commands. I'm not sure if Windows users are generally that happy about typing bash commands, but I guess nobody really cares what you have to type in as long as it's high in the Google hits for whatever operation you want to perform.

Mercurial's API (i.e. the CLI) makes a point of being usable with powershell and cmd.exe, which I think some Windows users appreciate.

dsp1234 · on Dec 14, 2016

Another way to look at this is that hg needed to bake this into their core, whereas git didn't need to. There is a non-zero cost to all additional code, so leaning on the shell to do work is generally a smart move.

insertnickname · on Dec 14, 2016

Git ships with Git Bash on Windows.

kilotaras · on Dec 15, 2016

Beauty of revsets is their ubiquity.

Diff would be `hg diff -r 'ancestor(default, experiment)' -r experiment`

Want to list commits in experiment? Just change diff to log. Generate patch file with all commits? Change it to export.

luhn · on Dec 14, 2016

You can do that! "master..experiment" means "all commits reachable by experiment that aren’t reachable by master" https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

jordigh · on Dec 14, 2016

I believe the goal is to get a diff, not a list of commits, in which case you need to figure out an expression for getting the last commit common to master and experiment so you can diff it with experiment.

luhn · on Dec 14, 2016

`git diff master..experiment`

jordigh · on Dec 14, 2016

I just tried that, and it seems to be the same as `git diff master experiment`. We don't want to diff the two heads. The command in Mercurial is `hg diff -r 'ancestor(master, experiment)' -r experiment`. Comrade trolor seems to have found the correct git expression.

dminor · on Dec 14, 2016

Try

    git diff master...experiment

(note there's three periods)

jordigh · on Dec 14, 2016

That works. Can you explain why? Does diff see this as a single commit or as a set of commits? If it sees it as a set, how does it decide what two commits to diff from that set? If it's just one, what does it decide to diff?

I'm kind of confused because gitrevisions(7) says the triple dot is symmetric difference, but exchanging master and experiment does not produce the same output from diff.

dminor · on Dec 14, 2016

Not so helpfully it has a different meaning in 'git diff' than in 'git log'. Basically, it means the difference in the second branch from the first common ancestor of the two branches.
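
To make the difference concrete, here is a toy model of the dot notations on a hand-written commit graph. It is only a sketch: the graph, the commit names, and the function names are invented, and real git picks a merge base with more care when several exist. What it does rely on is documented behaviour: in `git log`, `A..B` is "reachable from B but not from A" and `A...B` is the symmetric difference, while `git diff A...B` compares the merge base of A and B against B.

```python
# Toy history (assumption): master is at C, experiment is at E.
#
#   A - B - C        (master)
#        \
#         D - E      (experiment)
HISTORY = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}


def reachable(parents, tip):
    """All commits reachable from tip by following parent links, tip included."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def two_dot(parents, a, b):
    """Commit set for `git log a..b`: reachable from b but not from a."""
    return reachable(parents, b) - reachable(parents, a)


def three_dot_log(parents, a, b):
    """Commit set for `git log a...b`: reachable from exactly one of a, b."""
    return reachable(parents, a) ^ reachable(parents, b)


def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = reachable(parents, a) & reachable(parents, b)
    return sorted(
        c for c in common
        if not any(c != other and c in reachable(parents, other) for other in common)
    )


def three_dot_diff_endpoints(parents, a, b):
    """The two commits compared by `git diff a...b`: (merge base, b)."""
    return merge_bases(parents, a, b)[0], b
```

On this graph the log form is symmetric (`C...E` and `E...C` both give C, D, E), but the diff form is not: `master...experiment` compares B with E, and `experiment...master` compares B with C. That is why swapping the two names changes the diff output.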

masklinn · on Dec 14, 2016

True all the way, but that may be a bit too much to chew on for people whose mind is already blown by gitql.

taspeotis · on Dec 14, 2016

    A Git query language (github.com)
    10 points by bryanrasmussen 1 hour ago

This ought to have (2014) in the title: Latest commit 49c1c17 on 22 Jun 2014.

patkai · on Dec 14, 2016

That makes it even more interesting, I mean the fact that this or something similar didn't get traction. I often have an idea of what I would like to know about my repo, but don't want to start hacking the answer together.

OJFord · on Dec 14, 2016

https://github.com/gitql/gitql (last commit 12 days ago)

palunon · on Dec 14, 2016

Not the same project (look at the number of commits, and the top issue).

OJFord · on Dec 14, 2016

I was responding to:

> interesting, I mean the fact that this _or something similar_ didn't get traction

I think you'll agree this is something very similar and that has traction.

bryanrasmussen · on Dec 14, 2016

sorry, didn't note the date, I just found it because I was needing something like it and was ready to start making it because I figured - better look if someone else did the work for me first.

Tarean · on Dec 14, 2016

Very cool. I always wanted to play around with a git provider for powershell. Powershell's syntax is great for queries and you could use everything that works on the normal file system with anything that has the abstractions implemented.

The syntax seems close enough that this could just replace it, though:

    ls commits | where date -lt (get-date).AddDays(-4) | where message -like *foo* | select author, message, date -First 3 | ft

chadgeidel · on Dec 14, 2016

I'm no powershell guru, but I'm using posh-git [1]. Is it possible to chain commands using that tool?

[1]: https://github.com/dahlbyk/posh-git

Coryodaniel · on Dec 14, 2016

Oh I love that the example gif includes:

  select author, message 
  from commits 
  where 'Fuck' in message

I'm pretty sure that query's results would fill my screen buffer.

jordigh · on Dec 14, 2016

Just for comparison, in Mercurial you would do

   hg log --template "{author}, {desc}\n" --rev "desc('fuck')"

Coryodaniel · on Dec 15, 2016

"Fuck" is right up there with "ch-ch-changes" in my git word cloud.

andrewchambers · on Dec 14, 2016

I don't see the point in wrapping the data in all that ascii art noise, will make it harder to script with.

oneeyedpigeon · on Dec 14, 2016

Imagine if we could just have this automatically for every program that generated text output. It doesn't seem beyond the realms of possibility that every tool could either a) structure its text output in a way that can guarantee simple command-piping to a general purpose query-language processing tool or b) in the presence of a "--output-json" flag, produce json which can then easily be queried.
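
As a sketch of option (b): if a tool did emit JSON, a generic select/where step becomes a few lines. The `--output-json` flag is the commenter's hypothetical, and the record fields and the `query` helper below are likewise invented for illustration.

```python
import json


def query(json_text, select, where):
    """Parse a JSON array of objects; keep records matching `where`,
    and return the `select` fields of each as a tuple."""
    records = json.loads(json_text)
    return [tuple(record[field] for field in select)
            for record in records if where(record)]


# Made-up output of some tool run with a JSON output flag (assumption).
SAMPLE = """
[
  {"author": "alice", "message": "Fix billing rounding bug", "date": "2016-12-01"},
  {"author": "bob", "message": "Update README", "date": "2016-12-03"}
]
"""
```

For example, `query(SAMPLE, ["author", "date"], lambda r: "billing" in r["message"])` keeps only the first record and returns its author and date.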

shakna · on Dec 14, 2016

Sounds like you'd like the object-based Powershell.

junke · on Dec 14, 2016

Or you could have a single address space (https://en.wikipedia.org/wiki/Single_address_space_operating...), and share objects directly.

amelius · on Dec 14, 2016

Sounds like a security nightmare.

junke · on Dec 14, 2016

But is it really? Couldn't there be a way to isolate things at the system level?

donut · on Dec 14, 2016

It's the UNIX principle/tradition that line oriented text is the universal format. It's quite flexible. But I share your feeling, and I keep hearing nice things about powershell.

Also, if you work with CSV files, look at textql.

0x54MUR41 · on Dec 16, 2016

Wow, it's awesome. I have never seen a project like this before. It seems very useful. Anyway, I think it would be better if they demonstrated example usage with asciinema [1].

[1]: https://asciinema.org

seliopou · on Dec 14, 2016

This can easily be accomplished using `git log`, `head`, and `grep`:

  git log --pretty="format:%an, %s, %ad" --after="2014-04-10" | grep "Fuck" | head -3
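
One caveat with the pipeline above is that `grep` matches the whole line, so an author whose name contains the search word also matches. A hedged sketch of a field-aware variant follows: it asks `git log` for tab-separated fields and filters on the message only. The function names and the tab-separated layout are choices made here, not part of gitql or git; `read_log` assumes `git` is installed, and the date filter uses the author date as a plain string, which is not identical to what `--after` does.

```python
import subprocess

# author name, subject, author date, separated by tab characters (%x09)
LOG_FORMAT = "%an%x09%s%x09%ad"


def parse_log(text):
    """Turn tab-separated log lines into dicts; lines with too few fields are skipped."""
    rows = []
    for line in text.splitlines():
        if line.count("\t") < 2:
            continue
        author, rest = line.split("\t", 1)
        # Split the date off the right so a tab inside the subject survives.
        message, date = rest.rsplit("\t", 1)
        rows.append({"author": author, "message": message, "date": date})
    return rows


def select_commits(rows, word, after, limit):
    """First `limit` rows whose message contains `word` and whose date is after `after`.
    Dates are YYYY-MM-DD strings, so string comparison orders them correctly."""
    hits = [r for r in rows if word in r["message"] and r["date"] > after]
    return hits[:limit]


def read_log(repo="."):
    """Run git log in `repo` and parse its output."""
    out = subprocess.run(
        ["git", "log", "--date=short", f"--pretty=format:{LOG_FORMAT}"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)
```

Usage would be along the lines of `select_commits(read_log(), "fix", "2014-04-10", 3)`.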

georgewfraser · on Dec 14, 2016

If you actually want to query git data in production, it's really a better idea to copy all the data into a real SQL data warehouse. If you're using github, my company (Fivetran.com) has a connector that pulls from their API.

ecesena · on Dec 14, 2016

It would be nice to see a plugin for presto

ujjwal_wadhawan · on Dec 14, 2016

Support for "SELECT DISTINCT" would be great!

guard-of-terra · on Dec 14, 2016

But why does it have to look like SQL (and not like xpath or jquery)?

Not many people enjoy writing SQL statements on the command line. It's verbose, the order of things is arbitrary...

mwfunk · on Dec 14, 2016

I would assume that the whole point of the project is to be able to do SQLish queries on a git repo, and that it was written by and for people who are familiar with SQL and have a preference for it over other query languages. And they probably do enjoy writing SQL statements on the command line, however uncommon that may or may not be.

guard-of-terra · on Dec 14, 2016

Seems to be a very narrow group. How is this relevant to HN? Flagged the story.

jsmeaton · on Dec 14, 2016

You flagged the story because you don't like SQL on the CLI? Come on. Whether or not you enjoy writing SQL on the CLI, SQL is a fine language for querying data, and is probably more common than xpath. JQuery seems like a strange choice too.

guard-of-terra · on Dec 14, 2016

I just fail to praise the attitude. "Let's faithfully reproduce the looks of 30 y.o. technology, pseudographic tables included, with make-believe over git".

Rudisimo · on Dec 15, 2016

Well, if you feel so strongly about it, why not build your own tool to support the "preferred" query language?

rodorgas · on Dec 14, 2016

That's a great idea!