Here's why I'm excited about this: Data Versioning
Github is awesome for science. But my code and workflows are often just as reliant on the data I have as on the code I've written. With Github, I always have to treat my data separately: a different workflow, a different storage location, different (and often manual) versioning.
Now, I can start integrating my data into the same workflow. No size limits. Just drop it in and version it like everything else. 1GB? No problem. 500GB? Just pay the money.
This is especially awesome because, as a scientist/dev, I do not want to stand up my own infrastructure and servers/VMs. I don't want to do Linux updates. I don't want to worry about things going down. It just needs to be there when I need it. When I don't need it anymore, I'll take it down. Done and done.
We've built a Git extension (git-bigstore) that helps manage large files in Git. It uses a combination of smudge/clean filters and git-notes to store upload/download history, and it integrates nicely with S3, Google Cloud Storage, and Rackspace Cloud. Just list the file types you'd rather not store in your repo in .gitattributes and you're good to go. Our team has been using it for a while now to keep track of large image assets for our web development projects. Cheers!
If you're looking for an open source system for managing fat assets in git, we made a tool that integrates with S3: https://github.com/dailymuse/git-fit
Git-fat uses smudge/clean filters. We used git-media which employs the same technique, and it didn't work well for us. See the section about git-media in the readme.
I maintain a similar tool to manage large files in Git [1], but went the clean/smudge filter route. I think git-media gets into states where it can process files twice even though it shouldn't. From the Git docs:
> For best results, clean should not alter its output further if it is run twice ("clean→clean" should be equivalent to "clean"), and multiple smudge commands should not alter clean's output ("smudge→smudge→clean" should be equivalent to "clean").
So this isn't the fault of clean/smudge filters, just the way they were used with git-media.
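For reference, the plumbing behind these filter-based tools is only a few lines of config. A rough sketch (the filter name and patterns are made up, and 'cat' just stands in for whatever clean/smudge commands the tool you install actually wires up):

    # .gitattributes: route matching files through a filter driver named "bigfiles"
    *.psd filter=bigfiles
    *.zip filter=bigfiles

    # wire up the driver; real tools replace 'cat' with commands that swap the
    # working-tree file for a small pointer and push/pull the real bytes to S3 etc.
    git config filter.bigfiles.clean  cat
    git config filter.bigfiles.smudge cat

The point being debated above is what those clean/smudge commands do, and whether they behave idempotently when git runs them repeatedly.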
We experimented with smudge/clean filters with our own implementation and they just didn't seem like the right solution for fat asset management.
The most frustrating problem was that filters are executed pretty frequently throughout git workflows, e.g. on `git diff`, even though assets rarely ever change. The added time (though individually small) created a jarring experience.
I'd also be curious how git-bigstore addresses conflicts. It seems like a lot of the filter-based tools out there don't handle them well for some reason.
I would ramp down your excitement; this doesn't solve any of the problems that Git has around large files. Dropping a 1GB data file into your Git repo will cause problems for you, even if the server doesn't block it like GitHub does.
Oh sure, but at least my future limitation will be Git's ability to process and handle large files (and maybe the cost of storage), not Github's file and repo limits.
It's reasonable to assume that Github's limits are caused by Git's limits. When Git begins supporting large files, most likely Github will be able to raise their limits as well.
You sure about that? Github is an SCM company, not a bulk storage company. When your business is not storage, arbitrarily large storage requests aren't interesting...
Github has gone to great lengths to begin supporting diffs on all sorts of files: 3D models, images, and other assets. I'd be willing to bet that some sort of bundled asset management is someone's dream project at Github.
I'm not sure that a server-side rewrite addresses the core issue. Git is designed so that a git fetch pulls the entire repo down. The reason large binary files are a problem is because you are going to pull down the entire history of those files in order to clone the repo. Unless they're also shipping a custom git client (unlikely) then I don't see how they can solve this problem on their end.
edit: Come to think of it, the feature they could potentially implement on their side is somehow having the history of binary files purged. I'm not a git expert, but if they are able to trick git into thinking the files are new, with no history, each time they change, then I guess that could work.
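For what it's worth, a stock git client can already skip history at clone time, which gets you part of the way there (URL is just an example):

    # fetch only the tip commit; none of the old revisions of those big files come down
    git clone --depth 1 https://example.com/some/repo.git
    # later, from inside the clone, pull the full history if you end up needing it
    git fetch --unshallow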
The core issue is not fetches. The last time I tried to use git to manage scientific data, commits sometimes took ~10 hours. And this was with all text files, not binaries. This was ~100k files and 20 gigs of data.
Have you ever looked into using an S3-backed git repo via JGit? Not sure how many files/revisions you have, but it may be a cost-effective alternative that you can start using today.
We use it for auto deploys from autoscaled instances because GitHub has poor uptime.
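Roughly what that looks like with the JGit command-line tool, if memory serves (bucket and repo names are placeholders):

    # ~/.jgit holds the credentials for the S3 transport:
    #   accesskey: AKIA...
    #   secretkey: ...
    git remote add s3 amazon-s3://.jgit@my-git-bucket/myrepo.git
    jgit push s3 master
    jgit fetch s3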
No, but I don't understand what this is. Java Git? Bound to Eclipse? I don't really understand how this would be different from me storing a .git repo locally and replicating it to S3.
I've run into problems with large-ish files in git repos, binaries accidentally committed etc. I'm genuinely curious, is there a good way to use git for the size repositories that you mention?
Git is great for smaller binaries. Ideal in fact, given that it stores differences between revisions as binary deltas. For large >1GB files, I believe the diffing algorithm is the limiting factor (I would be interested in getting confirmation of that, though). For those files something like git-annex is useful (http://git-annex.branchable.com/)
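A minimal git-annex flow, in case it helps (the S3 remote settings are just an example; credentials come from the usual AWS environment variables):

    git annex init
    git annex add bigdata.h5            # checks in a symlink/pointer; the content is tracked separately
    git commit -m "add dataset"
    # park the actual bytes in S3 and drop the local copy until you need it again
    git annex initremote mys3 type=S3 encryption=none bucket=my-annex-bucket
    git annex copy bigdata.h5 --to mys3
    git annex drop bigdata.h5
    git annex get bigdata.h5            # pulls it back later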
I've used git to push around a lot of binary application packages and it's very nice. Previously I was copying around 250-300MB of binaries for every deployment--after switching to a git workflow (via Elita) the binary changesets were typically around 12MB or so.
I've had no trouble with non-github git repos handling text files well into the hundreds of megabytes. I haven't pushed the limit on this yet, but for me, personally, my datasets are often many hundreds to thousands of individual files that are medium sized. So it tends to work fairly well. Singular large files may not scale well with git.
I don't know of a great solution for Git, but I've heard that Perforce is more suited to handling large binaries - I believe it's used in many game development studios, where binary assets can number in the high gigabytes.
Domino has a nice solution for version control and reproducibility in the context of analytical / data science work: they keep a snapshot of your code, data, and results every time you run your code. So it’s like version control plus continuous integration. Supports large data files, R, Python, etc. http://www.dominodatalab.com/
Yeah, but their system is super simplistic and doesn't support half of the diverse and robust operations that Git and Github do. Can I see version history? Are changes compressed into diffs for text files? etc, etc.
You may not be aware that data versioning is a built-in feature of S3 [1]. No VMs, no management; just put all your stuff in one place and it's versioned automatically.
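For example, with the AWS CLI (bucket and file names made up):

    # turn versioning on for a bucket once...
    aws s3api put-bucket-versioning --bucket my-research-data \
        --versioning-configuration Status=Enabled
    # ...then every overwrite keeps the previous object around
    aws s3 cp results.csv s3://my-research-data/results.csv
    aws s3api list-object-versions --bucket my-research-data --prefix results.csv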
As others have pointed out, Git is not a great way to do big-data versioning. Git is almost exactly the wrong tool for the job; almost any other revision control system handles large-file versioning better, because they don't expect every client to clone the entire history of the archive.
If you use Amazon's AMIs, they do a lot of updates for you. Recently, they updated their AMI for the bash and SSL security exploits. If you have a permanent instance, you would have to run some updates yourself, but if you use auto-scaling, you can just update the group and then, when the instances cycle, the new updates will be applied.
Git's storage model is such that every commit is physically a snapshot of the entire work tree. Git makes this efficient with delta compression (essentially, deduplication) which is extremely efficient for text (in fact an entire git repository can be smaller than a single SVN checkout) but is less effective with large binary files since changes don't produce compressible deltas.
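You can poke at this in any repo; the delta-compressed history all lives in the packfiles:

    git gc                   # repack loose objects into a packfile
    git count-objects -vH    # size-pack = on-disk size of the packed history
    # per-object sizes and delta-chain depth, largest objects first
    git verify-pack -v .git/objects/pack/pack-*.idx | sort -k3 -n -r | head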
> in fact an entire git repository can be smaller than a single SVN checkout
Only because svn downloads the plain text of the server's version of every single file from the revision, so that "svn diff" doesn't have to hit the server.
I don't think it's branching. From what I understand, it's changing the file. Rather than storing a diff for non-text files, it just keeps the old versions around. So if you store a lot of binary files that change very often, it makes your repo huge.
We had this problem before we moved to a dependency manager. Our repo was almost 2GB, even though the checked-out code was well under 1GB.
The way I understand it, branches in git are essentially just a pointer to a revision, which automatically gets updated as you make changes. Branches don't have any overhead.
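You can see it on disk: a branch is literally a 41-byte file containing a commit hash (at least until git packs the refs):

    git branch demo
    cat .git/refs/heads/demo    # just the SHA-1 of the commit the branch points at
    git rev-parse demo          # same answer via plumbing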
I've thought about building GitHub / BitBucket tools (issue management, pull requests etc) that work with data stored in git repositories, as opposed to one big tool that owns the git repository and manages the data in a locked-up SQL database. We already have this for code browsers (e.g. cgit), but that alone can't compete with the "full suite" that others offer.
It would have several advantages for the user:
1) You could pick your issue management separately from your code browser
2) You don't have to worry that the best tooling (Github) has some of the least reliable storage.
3) If it stored the data in git, it would work offline!
The big downside is that having to "assemble your own" is more complicated for the user, and that it isn't Github/BitBucket's business model...
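Purely as a hypothetical sketch (nothing standard, just to show no magic is needed), issues could be plain files on a dedicated branch that any of those tools read and write:

    git checkout -b issues
    mkdir -p .issues/42-data-versioning/comments
    echo "title: data versioning broken for large files" > .issues/42-data-versioning/issue.md
    echo "works for me with git-annex" > .issues/42-data-versioning/comments/001-alice.md
    git add .issues && git commit -m "file issue #42"

Since it's all just commits, it syncs, branches, and works offline like everything else.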
I like the idea of some good, open source tools to do the different bits, preferably focused on flexibility and ease of data access (as you mentioned).
To me this would mean an issue tracker and some kind of merge request/review tool. These could be paired with the likes of Gollum, providing a wiki.
One thing though: I would really love to see them support more than just git. Mercurial shouldn't be hard to support, as it has largely similar concepts, and even though SVN is not as popular as it was at its peak, it's still widely used and a very good choice for some workflows/use cases.
In the same vein, I would love to see a decent stand-alone code browser that supports different VCSes.
Thanks - great input (and the Gollum suggestion is a good starting point!)
On multiple VCS: On the one hand, git is so dominant (for better or worse) that I don't know if the complexity is worth it. On the other, perhaps git is so dominant because github is so dominant. The answer in open source land is probably to set up a reasonable abstraction layer and let interested people provide their own implementations of the VCS-interface.
My key focus for this would be on a more unix-philosophy approach (separate tools that do one thing well).
I like the idea of using a VCS repo (again, ideally any of those supported, not just git) to store related information such as issues/tickets. However, I don't know what the practical constraints of that would be; my first thought is that issues/tickets stored primarily in a (version-controlled) filesystem, and presumably indexed on a server for web views, searching, etc., would not be as intuitive as a wiki like Gollum.
If the issues/tickets tool used a flexible SQL model (e.g. a choice of MySQL, PostgreSQL, or SQLite), I would be happy with that, as the data is still quite open (to the system owner, admittedly not as much to the individual developers).
Customers fear vendor lock-in, so an open platform/ecosystem for git could flourish, like Java. Hopefully they'll have a better business model for it than Sun did. Maybe Amazon's?
Figuring out a good business model (or a way to survive without one) is indeed the challenge here! I'm not sure that AWS's margins will support this (and historically they haven't been great at supporting open-source anyway)
The fact that so much of the world's private source code is on a single public website (GitHub), which has repeatedly been hacked via relatively easy exploits, is pretty frightening to me.
Not sure it helps, but an observation is that Oracle's relational database occupied a somewhat similar role, in that it tied together data from different sources and enabled cross-platform interoperation. This was very important at the time, because there were several different hardware vendors, and companies didn't want to get locked in. And of course, a whole ecosystem of third-party tools grew up around Oracle - especially reporting tools, BI, and warehousing - and everything had bindings for relational databases.
Not sure how that translates as a business model, as no one owns git (like Oracle owned Oracle). However, although Sun didn't make money from Java, everyone else did - so there is a business model, just as a user of git rather than an owner: e.g. reporting tools based on git, CI tools, code quality tools.
So maybe the answer is just to treat git as infrastructure for the app you sell, as opposed to trying to make money from the infrastructure itself. Linux is a similar case.
No, it would not be "great" to have things like that built directly into your version control system.
Built on top of it (like how Gollum is a wiki using git for storing documents) is not necessarily bad, but a built-in system is the worst possible solution IMO.
To me, it looks like the biggest advantage over existing solutions is point 4: "Faster Development Lifecycle." If you're running all your infrastructure out of an AWS datacenter anyway, moving your source control servers into the same datacenter will make checkouts in automated deploys marginally faster.
Actually, things are running pretty smooth for months now - see for yourself at: https://status.github.com/graphs/past_month. If that's not your experience, you can reach out to support@github.com and we'll look into it.
We love github, don't get me wrong (and we pay decent $ for it). But I'm looking at track record / experience with it doing production auto deploys for years.
S3 has 11 9s of durability and is "closer" to where our servers run, so there are fewer possible points of failure. We have had 0 issues since migrating to an S3-backed git repo 2 years ago.
We still use github for day-to-day SCM management, as you guys have tons of extra value-add.
Not only faster, but better. I love AWS and love how they keep rolling out new features to make deployment easier. My current development lifecycle isn't yet close to perfect because of all the gaps with piecing together different providers and tool sets. To the degree that things can be streamlined and automated, all the better. In the end, I'm hoping AWS comes out with a 1 click continuous integration solution - from dev to deploy.
Bezos to kids being born 3 years from now: "You kids are just features. Keep an eye on our announcements page."
It's interesting how they seem to be adding all the parts of the workflow where we're okay with things just being good enough and don't benefit from bells and whistles. Github is very cool for managing an open source project, but for my own stuff, just about anything that makes managing my repos relatively simple and saves me a bit of time is fine. I don't need much in the way of interface or features. Just being able to push a button and have a hosted repo ready for other users, with instructions for less technical contributors, is awesome.
If I'm already on Github, I'm probably staying there. Competition is good though. Competition will push Github to continue doing more to differentiate themselves as something much greater than simply hosting Git repos.
Strange use of "git-based" wording. Do I understand correctly that they implemented their own git command-line binaries that you install to replace the original git binaries? Possibly a fork of the git client and server?
Probably based on JGit/Gerrit. It is much easier to build "cloud" git (i.e. reliable, redundant git) that way than it is to make your disk reliable. Github went the reliable-disk route with DRBD, AFAIK, which is why Github is always going down ;-)
Intriguing post! For those who hadn't heard about DRBD, it's http://en.wikipedia.org/wiki/Distributed_Replicated_Block_De... - it makes failover incredibly simple to integrate by pushing a lot of distributed-systems responsibility down to the filesystem level, but in doing so it forces you to have a single-master setup, and you can't take advantage of parts of your domain model that are well-suited to eventual consistency. Thinking about it, now that Riak supports strong (quorum-based) consistency on a per-bucket basis, Riak-backed scalable Git hosting would probably be relatively easy to implement. Looks like you were building your own system - did you ever release the code?
The git object data itself goes into a blob-store (like S3); it can be stored without strong consistency. It turns out you only need to keep track of a very small amount of metadata consistently (the refs). Riak, etcd, DynamoDB or the Google Cloud DataStore would all be good choices, I think.
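A ref update is just a compare-and-swap on ~40 bytes, so the consistent store barely has to do anything. A sketch of the idea against DynamoDB (table and attribute names are made up):

    # atomically move refs/heads/master from the old sha to the new one;
    # the write fails if someone else updated the ref first
    aws dynamodb put-item \
        --table-name git-refs \
        --item '{"repo":{"S":"myrepo"},"ref":{"S":"refs/heads/master"},"target":{"S":"d670460b4b4aece5915caf5c68d12f560a9fe3e4"}}' \
        --condition-expression "#t = :old" \
        --expression-attribute-names '{"#t":"target"}' \
        --expression-attribute-values '{":old":{"S":"e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"}}'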
I was working on an open-source implementation of Raft as part of this (called barge), but it isn't as reliable as the alternatives above - yet!
Very interesting. I hope that's not the real rationale inside Github, as it totally ignores the fact that the blobs are immutable and thus trivially cacheable.
The traditional git server is made up of primarily two programs: 'git-upload-pack' and 'git-receive-pack'. These tools work over stdin/stdout in extremely well-documented ways. It is not inconceivable to implement your own git server that adheres to the existing protocol.
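And a stock client will happily talk to anything that speaks the pack protocol, e.g. (paths made up):

    # serve every repo under /srv/repos read-only with git's bundled daemon
    git daemon --export-all --base-path=/srv/repos --reuseaddr
    # the client can even be pointed at an explicit upload-pack program on the remote
    git clone --upload-pack=/usr/bin/git-upload-pack user@host:/srv/repos/project.git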
Though many won't see any sort of advantage to switching, it would be very interesting to know what kind of effect this has on both Github and Bitbucket. There doesn't seem to be any major drawcard that I can see that will have people chomping at the bit to change over. I'm happy to be enlightened though.
On another note, is it just me or has Amazon really been ramping up with their releases lately? It feels like there is something new each week at the moment.
That said, doesn't seem to offer a compelling alternative to Github, based on what they're currently saying. There could be value in tight integration with other Amazon tools, but that seems like it'll come more from intentional lock-in than added value.
Github is also very vulnerable on price if you have the need for a lot of small, private repos. If Amazon charges on storage (like they do with S3) instead of the number of repos, it will make this a compelling alternative for a lot of businesses.
Bitbucket is excellent for that use case. In fact, github often isn't even an option since even the largest plan is limited to 125 private repos. Bitbucket, meanwhile, has no limits on the number of repos (it's a fee per user instead)
> even the largest plan is limited to 125 private repos
Leaving the conversation about small plans aside, we can set up a plan for you that has as many private repositories as you need. Just email sales@github.com and we can get it set up for you.
As far as I can tell, VS Online also allows "unlimited" projects and repos per project. And TFS's non-SCM features are pretty nice. Plus it's free for the first 5 users.
I don't know. I used to work for AWS, and this sounds very much like an internal tool that is used to manage the entire Amazon retail infrastructure. That being the case, this is going to be a lot more powerful and useful for online infrastructure (in AWS, obviously) than anything else out there. Even if it's not all that initially, I suspect the extra functionality will come sooner rather than later.
It's somewhat annoying I know, but I'm not going to go into more detail as I'm not sure about the legality of my position should I do so.
Edit: On review of their website, I had missed the announcements for Amazon CodePipeline and Amazon CodeDeploy, which between them provide all the missing functionality I hinted at above. And so yes, it looks like this is the internal tool I was referring to.
CodeDeploy is essentially Apollo (lite, so to speak).
CodePipeline is Amazon Pipelines.
CodeCommit is presumably Amazon GitFarm (according to our former AWS engineer here).
We've got an amazing Builder Tools team here at Amazon, I must say.
> That said, doesn't seem to offer a compelling alternative to Github
I have limited (but at least some) experience working with GitHub Enterprise, and for the longest time, their answer to backup was to take the entire system offline while you performed the backup. I believe they have since improved in this area, but it was clear that while GitHub itself offers an extremely available service, GHE is severely lacking in this respect.
CodeCommit on the other hand promotes high availability from the start.
> I believe they have since improved in this area, but it was clear that while GitHub itself offers an extremely available service, GHE is severely lacking in this respect.
That is correct. On earlier versions of GitHub Enterprise, backing up data was quite a hassle. For a consistent (repository) backup taken at the VM level you didn't have to shut it down, but you had to switch the appliance into maintenance mode, effectively preventing people from getting things done. This isn't the case anymore, though, as we shipped new backup utilities [1] and support for HA setups with the 2.0.0 major release [2] some time ago.
I'm not sure how the overwhelmingly dominant player in retail and hosting moving to become the overwhelmingly dominant player in source hosting can count as "competition", except in a Gatesian sense.
Off topic, but the sign-up process is very bad. Why do they need full name, email, company name, role, address, phone number, captcha, etc., just to sign up for more information?
It actually isn't out till 2015... so this is just asking if you are interested. Perhaps it's an MVP -- with a signup form -- and if there is enough interest they will pursue it.
Very few organizations use git the way the linux kernel project does. Most organizations use github, gitlab, etc, which definitely have had scaling issues.
I recommend Gitlab for anyone looking for something Github-like. On a $10/month DigitalOcean instance you can be up and running quickly [1]. There's a rake task for backups and it's easy to get it to sync to S3.
Their omnibus installer is ridiculously easy to use if you don't mind seeing Chef messages scroll past for 20 minutes. I had it tied into Active Directory and everything in less than an hour after downloading it. Compare that to using apache-svn and those goddamn LDAP sync scripts and apache conf folder definitions and it feels like you're using some shit from the future.
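The backup bit really is about two lines (paths are the omnibus defaults, bucket name made up):

    # nightly: dump repos + database, then ship the tarball off-box
    sudo gitlab-rake gitlab:backup:create
    aws s3 sync /var/opt/gitlab/backups s3://my-gitlab-backups/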
Interesting. I recently started mixing GitHub and S3 for blog content and it's a pain in the ass. Asking the internet about pushing GitHub content to S3 results in just a bunch of mediocre scripts. I like it.
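For the simple case it mostly boils down to one sync after a pull or build (bucket and paths are examples):

    git pull origin master
    # push the generated site to the bucket, removing files that no longer exist locally
    aws s3 sync ./_site s3://my-blog-bucket --delete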
I also look forward to Perforce's response. For versioned binary files Perforce is still the best. Competition, hooray!
You mean to tell me I just spent a day last week setting up Jenkins, GitHub and a VM when I could've waited for my second week on the job and moved it all to Amazon?
That aside... this looks great. If I'm not fully satisfied with my setup (right now I'm not) I might make the move to this. Lower friction is always welcome.
I switched from Jenkins to CircleCI (they offer a free plan now). Took me only 20 minutes to get everything running, and with Docker integration to boot. Really pleased with it.
Personally I'd hold off committing any source code (outside of personal projects) to something this new anyway, especially if it's only your first week on the job.
> CodeCommit integrates with AWS CodePipeline and AWS CodeDeploy to streamline your development and release process. CodeCommit keeps your repositories close to your build, staging, and production environments in the AWS Cloud.
There will also be a number of companies where a "small" DevOps team leverages a large number of cheaper systems (owned, leased or whatever) for lower cost.
If I were a DevOps person I would learn and master all of the AWS tools (as well as competitor services). It could be a great opportunity rather than a threat.
Not to be a negative Nelly or anything, but with the speed at which Amazon is releasing this stuff, they are eventually going to have to pull a Google and yank the rug out from under our feet by discontinuing several products.
AWS SimpleDB is an example of a product that Google would've likely killed years ago but Amazon maintains. It hasn't gotten any significant update since 2010 (http://aws.amazon.com/blogs/aws/amazon-simpledb-consistency-...) and, judging from the fact that it only gets a couple posts a month in its AWS Forums, it's very sparsely used.
Uggh no way. The AWS Web Console is a beastly enough UI to get around, I have no interest in migrating away from the very comfortable, friendly Github interface.
I still have trouble trusting a company which I primarily know as "that all-round webshop" to get involved with my tech stack. It feels as if McDonald's started offering server infrastructure overnight and got away with it.
McDonald's doesn't make money from their website, which is the problem with your comparison. If we were comparing McDonald's, we would be comparing their fast food infrastructure.
If you were an up-and-coming fast food burger joint going from 1 shop to 2 shops, and possibly more, you sure as hell would want McDonald's strategic advice on how to run your burger shop as well as possible. They've been there and done it; not that you need to follow their guidelines, but you definitely want their knowledge in your back pocket. They've got about 70 years of experience.
Same with Amazon, who has to keep Amazon.com online or they lose massive amounts of money.
Geez, oh yes. They built their platform and decided to start reselling it. That's why Netflix, Pinterest, Airbnb, Expedia, Foursquare, NASA, and MLB all use AWS.