Amazon CodeGuru – Preview (amazon.com)
429 points by Sheepzez on Dec 3, 2019 | 203 comments



So let me get this straight:

Amazon packages open source software (Linux, Postgres, etc.) as abstracted services (RDS, EBS, Elastic Load Balancer). They add so many abstracted building blocks that you need a special skill set to manage them (AWS Certified Solutions Architect) instead of knowing how to do this with bare metal or a container image running in your own data center.

And now that things are complicated and developers might make mistakes using those services, they add a profiler that inspects your code running in production and a reviewer that ties into the stage before deployment. All just to optimize the use of their own services.

From a business perspective this is an awesome way to get vendor lock-in to a much higher degree. They are basically the certifying authority that tells you if your intellectual property (your code) conforms to their own standard. Yes, they show examples of standard Java optimizations, but it clearly says it detects deviation from best practices for using AWS APIs and SDKs.

And people were mad at Microsoft for shipping a non standards compliant browser as default and enriching it with HTML tags and plugins that would only work in that browser. Little did we know.

I'm personally waiting for the "Amazon Compliant Code" label in the not-too-distant future as a selling point for business people.


Amazon has made scalable, performant, high-availability systems constructible by a 10-person team that serves millions or even billions of people. Before AWS it took thousands of people and billions in capital investment to do so.

Sure, Google and Microsoft and IBM joined the party, but AWS was first and remains the best holistically. This is their moment of domination, and eventually something will knock them down, but they have made so many companies so nimble and powerful in ways that were impossible before. Go Amazon.


>Before AWS it took thousands of people and billions in capital investment to do so.

WhatsApp Stats (2014):

- 450 million active users, and reached that number faster than any other company in history.

- 50 billion messages every day across seven platforms (inbound + outbound)

- 32 engineers, one developer supports 14 million active users

- $60 million investment from Sequoia Capital

All of which they managed on their own FreeBSD servers hosted on SoftLayer.

[1] http://highscalability.com/blog/2014/2/26/the-whatsapp-archi...

YouTube (2008):

"YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site"

- 2 sysadmins, 2 scalability software architects

- 2 feature developers, 2 network engineers, 1 DBA

"They went to a colocation arrangement. Now they can customize everything and negotiate their own contracts."

"Sequoia invested a total of $11.5 million in two separate rounds and was the only venture firm to invest in the company." [3]

[2] http://highscalability.com/youtube-architecture

[3] https://www.nytimes.com/2006/10/09/business/09cnd-deal.html


I'm pretty sure GP is mistaking valuation for capital. Serving half a billion or more people probably nets you a billion dollar valuation or more these days, but it in no way requires a billion dollars to provide that service in the vast majority of cases.

There is a sweet spot where cloud is good and provides some benefit but, once you're serving hundreds of millions of people and have double-digit millions in investment, you can probably do significantly better cost-wise rolling your own servers. Worst case, you just throw your own hypervisor management system on them and have most of the same features you got from a cloud service. If you're smart, you can probably architect it so you have on-demand overflow capacity from a cloud provider in case there's a spike you can't account for, which is the best of both worlds.


This is how we do it. Two on-prem datacenters, one colo, and a handful of on-the-ready cloud providers. We serve far fewer users, but we're also getting $20k to $50k per user per year. Needless to say, at the scale we have, cloud is out of the question except in catastrophic scenarios.


Back in the day, reddit was definitely serving a couple million users with like three staff and on-prem servers.

and yeah, they were down all the time, but that didn't seem to matter to their growth.


I've built high-availability systems that served millions with a <10-person team years before AWS even existed, at a time when our server racks had less combined capacity than my laptop does now, and the dual-fridge-sized storage array we used had less storage (and IO capacity) than the M.2 drive in my laptop does now.

The part of that solution which was related to making the system scalable was written by two of us, who also did other things (it involved a partitionable backend storage service, and a user registration service, that combined to let us migrate users between servers to even out load and partition storage; everything else was stateless).

This idea that AWS is necessary to build to scale with small staff just does not match reality. My years of consulting also showed me that I'd earn more from clients who insisted on AWS - they typically spent far more time and resources on devops (and spent far more on hosting overall).

AWS is convenient, and it's great when you can afford it, but it's expensive and still requires substantial devops effort.


> Go Amazon.

Indeed. Hopefully, soon they'll stop selling physical items in that online store they have and focus on their strengths, so that other companies, who might be able to do better at selling things that aren't so frequently counterfeit that I no longer buy anything from Amazon, can have a go.


>Before AWS it took thousands of people and billions in capital investment to do so.

Could you expand on this?

I can't tell if you mean to launch a service/company or if you're talking about some large scale.. thing.. I haven't heard of.


Instagram comes to mind. 13 employees and $1B acquisition.

Hard to imagine that without AWS.


WhatsApp? Similar story, no AWS.


Well, WhatsApp was started by industry experts in scaling. If you're making a point about functional programming, I'd tend to agree but from a business perspective I'd look to why Netflix still uses AWS: https://www.quora.com/Why-does-Netflix-still-use-AWS


Still, they place FreeBSD running on on-prem hardware in the ISP PoPs as a caching layer, because it absolutely makes sense. https://papers.freebsd.org/2019/FOSDEM/looney-Netflix_and_Fr...


I think people tend to forget how often WhatsApp experienced outages in the early days.


This actually speaks volumes. Yes, people forget. Yes, it’s possible to suffer severe growing pains and still get acquired for $$$$. No, you don’t need to start with everything-AWS to ensure 99.9-whatever% uptime. People forget.


A $1B acquisition may have nothing to do with any colossal infrastructure. Was it indeed colossal?


I don't know what you'd consider colossal but the migration doesn't sound fun: https://www.wired.com/2014/06/facebook-instagram/


Why? Sharding and caching blob storage and activity feeds for Instagram type sites is among the easiest category of sites to scale.


No, it didn't. This is hyperbole.


Moore's Law made it possible, not AWS.


I'm not saying lock in doesn't exist. But I can't really envision how you would design a full cloud offering without vendor lock in of some sort? Beyond offering all services as open source so you can run them on your own data-center.

I'm being serious, how do you design an API and a set of distributed intercommunicating systems in a way that doesn't couple you with their specific APIs, communication channels and semantics?

I don't think it's possible.

I can see suggesting using open source solutions instead that you run on your own, but that still couples you to those specific solutions, except they're open source so in theory you could fork it and have more control over them. I get that. But this is a different argument I feel. Since the cost of maintaining these open source products on your own is high, and the cost of switching to a different open source solution is as much as moving to another cloud provider.

I think the only form of lock in right now that might seem designed by the business, and not an artifact of the tech itself, is the high price of exporting your data out.


The fact that millions of people were able to read this message almost instantly after it was written, without knowing anything about the device it was written on or the location it is coming from, shows it is possible to decouple specifics and adhere to open standards for all players involved. There is just no incentive to do that for what AWS provides. It is a cash cow, exactly because of how useful it is if you have this set of problems they solve.

I think AWS is basically a large SaaS that sells you solutions to problems you have at scale. I don't think the lock in is only in the effort to export data, the lock in is also that a company will use the same building blocks for every new project or new feature on existing projects because their current staff is already trained and new hires don't know how to do it without those services either.

So just like in the 90's nobody got fired for buying IBM, today nobody gets fired for using AWS, even though they don't have problems at the scale AWS is great at.


Lock-in is unavoidable with the cloud and cloud vendors - but I think the issue here is the degree of lock-in and the way people get there. This sort of stuff by AWS is very clearly seen as luring people in with OSS, industry standard, vendor-neutral technologies and then pulling a fast one on them. Whether or not that's the _actual_ goal I can't say, but I can see how it looks that way.


> I'm not saying lock in doesn't exist. But I can't really envision how you would design a full cloud offering without vendor lock in of some sort?

I believe this is what Rackspace was attempting a while ago -- leveraging OpenStack to provide cloud services, so you'd be free from lockin in the sense that you could move to some other OpenStack compatible provider.


> "Amazon Compliant Code" label

Wow, this is a scary but very real thought.

Though, the "Certified Windows XP / 7" stickers on hardware and video games / other software was quite common back in the day and isn't too dissimilar.

I would argue that the level of vendor lock-in Amazon is going for is far greater than Microsoft's.


It's not that evil. The first time I used AWS, I really enjoyed the power and how easy it was to set up the whole system from scratch (network, servers....). On-premise, it would cost a ton of effort from a lot of people. AWS is not that hard; the console + UI is user friendly and they have a great documentation site, so you don't really need these AWS certs (I got one about 3-4 years ago and honestly the information in those tests is not helpful; rarely does one person need to remember a lot of details across 20+ Amazon services, and whenever you want to use one you just start reading the documentation for it). A new service from AWS - that's great. At least that is a new option for end users.


Yep.


Maybe at some point there will be an antitrust lawsuit to unbundle them - and say have compute or storage be provided by a different vendor, all from within their aws management console.


I don't care if AWS is big and bundled as long as they use standard interfaces. Linux is big and it doesn't really matter that there's not much competition. I don't care what system (or even virtual machines or containers) runs my program.

If there were an open standard for "way to upload and store and serve files on the web", and S3 happened to implement that standard, and other companies and open-source projects did as well, then it wouldn't matter to me if AWS was the bundling king or not.

There was nothing magical about the design of Unix, either. It's not the only way to make an operating system, or even the best way. It survived because we got many competing implementations which were basically source-compatible. It really took off when we got free clones that anyone could run on their PC.


At this point, S3 IS the standard for that. And Google's and Microsoft's offerings both mimic that API. This could happen for all of their other services.


Not just Google and MS, many smaller services too. Linode's and Digital Ocean's object storage both use an S3-compatible API, and a number of open source, self-hosted services do as well from what I have seen. The S3 API is the de facto standard at this point.
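To make that concrete, here's a minimal sketch with the AWS SDK for Java v2: the only thing you change to talk to a non-AWS provider is the endpoint (the hostname below is just illustrative, and credentials are assumed to come from the usual provider chain):

    import java.net.URI;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;

    public class S3CompatibleDemo {
        public static void main(String[] args) {
            // Point the standard S3 client at any S3-compatible endpoint
            // (example hostname only; substitute your provider's endpoint).
            S3Client s3 = S3Client.builder()
                    .endpointOverride(URI.create("https://objects.example-provider.com"))
                    .region(Region.US_EAST_1) // many S3 clones ignore the region, but the SDK requires one
                    .build();

            // Same API calls regardless of which vendor is behind the endpoint.
            s3.listObjectsV2(ListObjectsV2Request.builder().bucket("my-bucket").build())
              .contents()
              .forEach(obj -> System.out.println(obj.key()));
        }
    }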


Disclaimer: I work at AWS on an unrelated team. I was not involved in development of this product. Opinions stated are my own, and not necessarily a reflection of my employer. Nothing here is being posted in any sort of official capacity.

There's lots of focus here in the comments on the code reviewer portion, but one of the things I'm most excited about is the profiler - https://aws.amazon.com/codeguru/features/

I do a lot of performance engineering work, and one of my go-to tools for visualizing where programs are spending their time is flamegraphs. While you can certainly create them with profilers besides CodeGuru (and I do not work with Java, so I haven't yet had the chance to check out CodeGuru for any of my use cases), I'm super excited about anything that gets more people using them. They make it very easy to see where your optimization opportunities are, and I have personally found them very useful when working with our customers - they're way easier, in my opinion, to go through and explain than just looking at raw perf output or similar.


A profiling tool I want to try out—it seems almost magical—is Coz. It can estimate the effect of speeding up any line of code. It does this by pausing (!) other threads, so it gives a 'virtual' speed up for that line.

What's interesting is that this technique correctly handles inter-thread effects like blocking, locking, contention, so it can point out inter-thread issues that traditional profilers and flame graphs struggle with.

Summary: https://blog.acolyer.org/2015/10/14/coz-finding-code-that-co...

Video presentation: https://www.youtube.com/watch?v=jE0V-p1odPg&t=0m28s

Coz: https://github.com/plasma-umass/coz

JCoz (Java version): http://decave.github.io/JCoz/ and https://github.com/Decave/JCoz


I have never heard of this kind of profiling before, thanks for sharing



Yes, indeed! Especially when paired with a continuously running profiler, one can learn quite a bit about one's code. It's actually rather surprising to me that they haven't caught on earlier.

A bit of an (almost) shameless plug is a project I have been working on at https://blunders.io. A bit similar to the Code Guru profiler, but with a different feature set.


Seconded. I used them a lot when I worked for myself on old-school single server apps but have struggled to convince my team now that I work on something spread across AWS instances. I'd just brought the concept up again this week for a hackathon project but this looks like we could buy our way to what I want for cheap (compared to overall hosting). I suspect it may pay for itself.


The Profiler reminds me a bit of PHP Symfony's https://blackfire.io/. Even the graphs (flamegraphs) have some strong similarities (e.g. https://blackfire.io/docs/reference-guide/analyzing-timeline...).

I've used Blackfire for a while, and this type of visualization is definitely helpful for finding bottlenecks in web performance. I've been able to reduce page load by caching big chunks that I was able to see in the graph / timeline.


I, for one, am very interested in flamescope [1]. I haven't tried it yet, but it's like a time machine that allows you to zoom in on a given time interval and look at a flamegraph of what happened at that moment.

You can look at the introductory video [2] to get an idea

[1] https://github.com/Netflix/flamescope [2] https://www.youtube.com/watch?v=cFuI8SAAvJg

EDIT: missing anchor


Does anyone know of a good flame graph visualizer for callstacks? Particularly one that allows you to drill down into a stack. Bonus points if you can diff two data sets. I recently built an in-app profiler and am trying to work out the analysis side of things to make life easier for the other developers on my team.


One of their screenshot examples flags inefficient code in crypto libraries, and the suggested "fix" is "Evaluate switching to the Amazon Corretto Crypto Provider ACCP". I don't know enough about the subject matter area to know whether that's the right move, but it's interesting that CodeGuru is apparently, among other things, an opportunity to pay Amazon to upsell you on replacing some of your code with one of the panoply of services in the AWS universe.


Amazon Corretto is their OpenJDK Java distro that is free and Open Source. I don't know if the project was already using Corretto or not, but it makes sense for them to recommend their own, supported, open source solution.


Your point is a fair one, and I admit unfamiliarity with Corretto, specifically. But they're telegraphing, right on the tin, that this new service will indeed recommend solutions of the form "we see a problem pattern in your code; try Amazon _____". The fact that Corretto is actually open source further muddies the waters.


We'll need to see what comes out in the wash. Maybe the more OSS they encounter the more suggestions it will be able to make. Or maybe not and it will indeed be a sales pitch masquerading as a feature.


In the future, you should do a quick search before naysaying. This crypto lib is free and offers non-negligible performance gains.


It's still increasing your dependence on AWS systems and software.

(The upsell price may be free for now, but who knows maybe they add a premium version or enterprise features in the future)

I imagine that the tool will be used for recommending more amazon services in the future, and this is possibly a poor POC of more to come.


It's not an AWS system or service at all though...

This is like saying that using React will lead you to be locked into Facebook?


Why does Google have google in the Guava namespace?


Why does it have Amazon in the name then? Amazon Corretto?


Amazon Corretto Crypto Provider (ACCP) is a Java library you can install using Maven or Gradle that works with any JDK 8 on Linux x86_64. It has Amazon in the name because Amazon wrote the library....

The name might be awkward, and CodeGuru might be slanted towards suggesting open source libraries written at Amazon, but it is about as neutral as can be, not even requiring you to use Amazon's OpenJDK distro.

Why does Amazon have an OpenJDK distro though? Because sometimes Amazon sees performance issues at scale that they have a hot fix for. Then they share the patch with the wider OpenJDK community and have discussions about whether there are better approaches to the fix. Amazon has been one of the top contributors to OpenJDK releases recently (typically in the top 3 for contributions to a given release), so they really are upstreaming patches.


Because it's maintained by Amazon? But it's entirely open source: https://github.com/corretto/corretto-8 with a very permissive license (GPLv2)


More like locked out of Facebook. They stated that you can't sue the Facebook corporation for any possible reason and can't react to lawsuits by FB against you in any way (even write a post about it), even in cases unrelated to React, or they will revoke your React license. I remember several highly upvoted posts here and on other resources about this.


So why is it under AWS[1]?

[1] https://aws.amazon.com/corretto/


The same reason React is under Facebook's github org? https://github.com/facebook/react

FYI, as I commented below, it's also available on Github under its own user and with a very permissive (GPLv2) license: https://github.com/corretto/corretto-8


No, that's not the same. This is github, http://aws.amazon.com is explicitly AWS.


https://reactjs.org/ >Copyright © 2019 Facebook Inc.

Does the (sub)domain something is hosted on actually matter when the ownership situation is the same?


It does, it means it was created for the use of AWS. That means the primary reason for the patches is to make sure it works well on AWS.

While the product is free, that doesn't mean the patches will be beneficial anywhere else; in fact, if something works better everywhere, it would be merged back into OpenJDK and we wouldn't need the fork.

Same thing with Amazon Linux: sure, you can use it on-premise, but it is tuned to work best on AWS and might actually work worse outside AWS than other distros.


As someone else pointed out, AWS does consistently upstream things to OpenJDK and is in fact regularly one of the largest contributors.

Additionally, the entire thing, again, is open source and with a permissive license meaning nothing is stopping anyone from forking it and doing what they'd wish with it.

You are in fact right that it was created to work well with AWS, but I fail to see how that is 'lock in', since most of those benefits are probably benefits on any modern cloud - since AWS does not generally run on a particularly unique architecture.


ACCP is an Apache licensed crypto library that has a standard JCA/JCE interface, meaning it's a drop-in replacement for the standard java crypto.

https://github.com/corretto/amazon-corretto-crypto-provider

They claim to be 25% faster than the standard implementation: https://aws.amazon.com/blogs/opensource/introducing-amazon-c...
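Roughly, "drop-in" means you register it as the highest-priority JCA provider and your existing crypto calls pick it up unchanged. A minimal sketch (assuming the provider class name shown in the project's README):

    import com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider;
    import java.security.MessageDigest;
    import java.security.Security;

    public class AccpDemo {
        public static void main(String[] args) throws Exception {
            // Register ACCP ahead of the JDK's default providers.
            Security.insertProviderAt(AmazonCorrettoCryptoProvider.INSTANCE, 1);

            // Unchanged JCA code now resolves to the ACCP implementation.
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            System.out.println("SHA-256 provided by: " + digest.getProvider().getName());
        }
    }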


The code review feature seems too expensive to run on every PR automatically (to me): $0.75 per 100 lines of code. From their example pricing: "if you have a typical pull request with 500 lines of code, it would only cost $3.75 to run CodeGuru Reviewer on it." I wonder if it's actually good enough to justify that price.


If it’s trained on software written by Amazon it’s probably worth the $3.75 just so you can do the exact opposite of what they recommend.


I don’t have much context, but I’ve never seen Amazon as a technical leader in the industry. They’re absolutely a business leader, and the services they provide can be good, but at a code level I’ve always thought of them as very MVP, if it works it’s good enough.

For code review services I’d expect a level far above this. Maybe they are able to do that, but I don’t have any existing positive bias towards this, and a few things against it.


It'll probably generate irrelevant stats on the engineers to send directly to their managers to use against them in their next review.


You have no idea how much this resonated ^^

Just needed to add an AWS library to my code base and BAM! Here is how my console looks on every reload from now on:

:8081/index.bundle?platform=ios&dev=true&minify=false:93 Require cycle: node_modules/aws-sdk/lib/react-native-loader.js -> node_modules/aws-sdk/lib/credentials/temporary_credentials.js -> node_modules/aws-sdk/clients/sts.js -> node_modules/aws-sdk/lib/react-native-loader.js

Require cycles are allowed, but can result in uninitialized values. Consider refactoring to remove the need for a cycle. metroRequire @ :8081/index.bundle?platform=ios&dev=true&minify=false:93 :8081/index.bundle?platform=ios&dev=true&minify=false:93 Require cycle: node_modules/aws-sdk/lib/react-native-loader.js -> node_modules/aws-sdk/lib/credentials/cognito_identity_credentials.js -> node_modules/aws-sdk/clients/cognitoidentity.js -> node_modules/aws-sdk/lib/react-native-loader.js

Require cycles are allowed, but can result in uninitialized values. Consider refactoring to remove the need for a cycle. metroRequire @ :8081/index.bundle?platform=ios&dev=true&minify=false:93 :8081/index.bundle?platform=ios&dev=true&minify=false:28851 Warning: AsyncStorage has been extracted from react-native core and will be removed in a future release. It can now be installed and imported from '@react-native-community/async-storage' instead of 'react-native'. See https://github.com/react-native-community/react-native-async... reactConsoleErrorHandler @ :8081/index.bundle?platform=ios&dev=true&minify=false:28851 :8081/index.bundle?platform=ios&dev=true&minify=false:93 Require cycle: node_modules/@aws-amplify/analytics/lib/Providers/index.js -> node_modules/@aws-amplify/analytics/lib/Providers/AWSKinesisFirehoseProvider.js -> node_modules/@aws-amplify/analytics/lib/Providers/index.js

Require cycles are allowed, but can result in uninitialized values. Consider refactoring to remove the need for a cycle. metroRequire @ :8081/index.bundle?platform=ios&dev=true&minify=false:93 :8081/index.bundle?platform=ios&dev=true&minify=false:93 Require cycle: node_modules/@aws-amplify/predictions/lib/types/Providers/AbstractConvertPredictionsProvider.js -> node_modules/@aws-amplify/predictions/lib/types/Providers/index.js -> node_modules/@aws-amplify/predictions/lib/types/Providers/AbstractConvertPredictionsProvider.js

Require cycles are allowed, but can result in uninitialized values. Consider refactoring to remove the need for a cycle. metroRequire @ :8081/index.bundle?platform=ios&dev=true&minify=false:93 :8081/index.bundle?platform=ios&dev=true&minify=false:93 Require cycle: node_modules/@aws-amplify/predictions/lib/types/Providers/index.js -> node_modules/@aws-amplify/predictions/lib/types/Providers/AbstractIdentifyPredictionsProvider.js -> node_modules/@aws-amplify/predictions/lib/types/Providers/index.js

Require cycles are allowed, but can result in uninitialized values. Consider refactoring to remove the need for a cycle. metroRequire @ :8081/index.bundle?platform=ios&dev=true&minify=false:93 :8081/index.bundle?platform=ios&dev=true&minify=false:93 Require cycle: node_modules/@aws-amplify/predictions/lib/types/Providers/index.js -> node_modules/@aws-amplify/predictions/lib/types/Providers/AbstractInterpretPredictionsProvider.js -> node_modules/@aws-amplify/predictions/lib/types/Providers/index.js

Require cycles are allowed, but can result in uninitialized values. Consider refactoring to remove the need for a cycle. metroRequire @ :8081/index.bundle?platform=ios&dev=true&minify=false:93 :8081/index.bundle?platform=ios&dev=true&minify=false:93 Require cycle: node_modules/@aws-amplify/predictions/lib/Providers/index.js -> node_modules/@aws-amplify/predictions/lib/Providers/AmazonAIPredictionsProvider.js -> node_modules/@aws-amplify/predictions/lib/Providers/index.js


[flagged]


Definitely no bias there ;P


That’s incredibly cheap, assuming it provides good suggestions. How much time does it take you to review 500 lines of code change, and what’s your time worth? If it takes 10 minutes and your time is worth about $20/hour or more, this service will pay for itself immediately.


Our code reviews are far more "is this the right way to solve the problem?" than "hey, you never use that variable you declared." The latter would be picked up by our linter; I'm having a hard time seeing the value proposition here.

>It’s like having a distinguished engineer on call, 24x7

I don't believe that, regardless of how many times they sprinkle in the words "machine" and "learning".


I'd be very surprised if the service they've announced is a linter.

The announcement says it can even analyze parts of your code that are more computationally expensive than they need to be. I'm not sure I understand the skepticism--surely they have among the largest code repositories in the world. Why couldn't they train models on it to look at best practices and even compare code practices to different metrics?


Isn't this just calculating the cyclomatic complexity?


No - OP meant computationally expensive, not cognitively expensive. Two nested for-loops can be O(n^2) yet have a constant, low cyclomatic complexity (3 by the usual count).
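A tiny illustration of that distinction: the method below is quadratic in runtime but structurally trivial, so a complexity-metric linter has nothing to flag, while a profiler would show it dominating the flamegraph for large inputs.

    // O(n^2) work, but only two decision points (the two loops),
    // so the cyclomatic complexity is a small constant (3).
    static long sumOfPairProducts(int[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < values.length; j++) {
                sum += (long) values[i] * values[j];
            }
        }
        return sum;
    }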


It may not be as good as a human, but I highly doubt it's a linter either. It's somewhere in between. If their claims are true, and it has been trained on hundreds of thousands of their own reviews, the AI could have picked up common patterns that are beyond lint but still real mistakes a good reviewer would spot.


I agree 100%: if it provides good enough suggestions, it could pay for itself pretty easily on regular day-to-day PRs. (Although: not all 500 line PRs are made equal.)

My original comment was definitely unclear. I actually had two separate thoughts (that I didn't communicate well at all):

(1) if your team has occasional large, automated PRs (code generation, automated refactors, etc), you probably don't want to run this tool on them because of cost, so anyone that has these large PRs and uses CodeGuru probably needs to build a way into their automation to suppress CodeGuru (or build a way to invoke it for specific PRs)

(2) I also wonder if it's good enough to justify the price on regular PRs

We don't have many situation (1) PRs where I work now, but they do come up occasionally. For example, I've used IntelliJ IDEA's structural find-and-replace to do very large automated refactors where CodeGuru would be very expensive and probably provide little value. We also do check in some generated code (we usually don't do this, but there are a couple exceptions where we weighed the tradeoffs and decided checking in the generated code was a better solution, in our eyes).


Only if CodeGuru gets a lot of the value of a code review. But I think finding actual bugs is a pretty small part of it.

A good code base is a team-created intellectual work. For that to happen, you need a ton of collaboration, shared learning, evolution of norms, interpersonal bonding, and practice of key social behaviors (e.g., principled negotiation, giving good feedback, recognizing and rewarding good actions). Automated code review gets at none of that.


You seem to assume that the code review tool can do everything that a human code reviewer can.


On the other hand, a machine will not get tired or bored, whereas a human most definitely will if the diff is anywhere near those 500 lines


I disagree. I regularly review PRs with more than 500 changed lines/20+ changed files. I read every single line. I put the same amount of effort into reviewing code as I do writing it; every software engineer should.


You don't replace the human code review. You supplement it. An AI that can replace the code review would need to be an AGI that "sat" down with your team and understood the architectures and meaning in your code. If it can't say "WTF?" it can't do a full code review.


Although this could certainly supplement a human reviewer, I don't think it could replace one.


You really believe this will free you from traditional code reviews? I would treat it as an advanced linter, and from the pictures looks like that's how it integrates itself in github.


Yeah, or looking at it the other way around: if this would replace your human reviewer, then maybe you would do well to have some serious discussions with your HR department...


Don't accidentally commit `node_modules` - that'd be a costly mistake!


Or a 35,000 line XML configuration file.


Now that I think of it, if it's paid by lines of code, it perversely incentivizes people to minimize the lines of code, no? Does it count whitespace and comments? Can I minify my code before passing it to this, then unminify it?


Many languages can have code written in them minimized down to a single line. I guess they must have some qualifier somewhere where a certain character count equals a line.


But even then, it still pushes people to shorten variable names and do other kinds of minification.


Can you just remove newlines from files? In most languages they're optional.


Count semicolons then.


If it is as smart as some people here think it would probably output: "are you f#$king with me?"


Maybe it’s based on recommended line formatting


“Time to hire some code golfers, we’ll put the whole app into one line of code.”


"CodeGuru says this should be separated into more lines..."


Really though, why charge by-the-line on this kind of product? Imagine if CodeCommit or Lambda charged you by the line too!


Nightmares in perl...


That sounds pretty terrible to be honest. I cannot imagine getting that kind of value out of it (that I would not get with a simple linter).


I was thinking about the same thing. Even for a small project with 3 developers, it seems like this would easily rise to $100+/month, for suggestions that may not even be that useful.


at which point you'd stop paying for it I would imagine?


There doesn't seem to be a point to start paying for it, and the time to stop paying for it seems mighty early


PR I sent yesterday, line changes: +2,703 −3,529. That's like 50 bucks just for that PR.


If another developer reviewed your code, how much time would it take them, and how much is that time worth?

If you divide the 50 bucks by that number, you get a cost ratio. If it's lower than the ratio of (benefit expected by automatic code review) / (benefit expected by manual code review), it's worth using.

I guess we can speculate all we want; in the end, only experience will show if the service is worth it or not.


dude I hope that was the result of updating yarn.lock or similar, otherwise good luck to your reviewer!


Nope, it's mostly code. It's a gnarly PR that atomically delivers a feature (and removes the feature that the new one replaces), but this looks about right for my weekly productivity overall, except I usually submit it in smaller PRs, and the delta is mostly lines added.


Trying to compare it to another code analyzer... https://sonarcloud.io/about/pricing 100k lines for €10/mo.


Is that all code or just the diff, though? 100k lines of code in diffs seems like a lot, all code - not so much.


Or when a junior dev switches from tabs to spaces


Or when a junior dev switches from spaces to tabs ;)


heh.. I wish it were only junior devs.


that's insanely expensive if you're doing any type of code generation.


This seems like a good incentive not to be generating thousands of lines of codes with each PR, which most would probably consider a feature as opposed to a bug.


If you split the same number of lines over two, three, ten, etc PRs it still costs the same. If anything it is incentivising code-golf via line minimization.


exactly. IMO if you're generating code, it should happen at build/compile time not at checkin


I'm not a fan of this. My team does this a lot and invariably it leads to having stuff be unfindable because the source you really want is in your build directory and not in code search. Which is annoying but manageable if I build the project, but partners who don't will have an even harder time of things.

More recent efforts have us check in generated code alongside the "config" files, and automated processes ensure you check in the generated code if you touch the config file. It's much better this way.


We generate source code into a "src/main/generated" directory. IntelliJ picks it up like any other source code with proper Gradle configuration and we ignore it with .gitignore.

Only downside we've found is it can be a pain with searching for references in Github and you have to remember to generate the code, but for the most part it is seamless.

I guess how you manage it depends on your IDE, if you can configure it to work nicely, and how much the generated source changes/needs to be read.


One exception case in our project is documentation: parts of it, like the index, are generated prior to commit. We like having the docs updated in the same commit as the change so there's never an opportunity for mismatch.


What would the rationale be for that?


Generated code is an artifact of the source code. If you need it for something specific, regenerate it from the source when you pull that from your version control system. You're not getting any benefit by storing something that can be generated alongside the means to generate it.


Thank you for your response. The advantages you get:

* Hermetic builds are faster because code-gen only occurs when changes occur in the base code

* Lots of docgen tools don't support incremental compilation

* Diffs in generated code show up as diffs when you change the code-gen tool, making it easier to isolate changes that occur if your code-gen tool is upstream (say you want the entire org on Thrift 0.9.2, up from Thrift 0.8)

Downsides I can see:

* Large repo.

* Source of truth is now the generated code, not the source, so someone else using the source could get a different result.

Essentially acting in an empirical mode of operation (i.e. does it provide benefits for cost), and ignoring any philosophical objections, this seems like it could go either way depending on the situation.


> Source of truth is now the generated code, not the source, so someone else using the source could get a different result.

This seems to apply to the other side, I think. Generating from source with different tooling or tool versions could create different results, whereas using the generated code guarantees consistent behavior.


IMO it’s because generated code is not “source code”. It’s more similar to object files—both are generated by running a compiler.


Tools like this should be built into your IDE. No developer ever wants automated feedback at the end of the process in a code review.

There are lots of academic ML review/suggestion tools. Those people come to the table with trials and statistics to assess the quality of their results. Amazon probably copied one of those papers, added a rules-engine to recommend their own APIs, and slapped a hefty price tag on it.


> I wonder if it's actually good enough to justify that price.

If it can spot a lot of issues (performance, security, bugs, etc.), $3.75 is definitely a good deal to do once in a while, but not on every single change (e.g., fixing a typo in a code comment)


I found what it generates.

https://github.com/pediredla/Algorithms/pull/3/files

It looks like a linter, but maybe there is more.


I've never seen a linter tell me problems with code in this detail before:

> You are using a `ConcurrentHashMap`, but your usage of `get()` and `put()` may not be thread-safe at lines: 110, 113, 135, and 137. Two threads can perform this same check at the same time and one thread can overwrite the value written by the other thread.
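For reference, the pattern it's describing is the classic check-then-act race. A small sketch of the racy shape versus an atomic alternative (my own illustration using plain JDK classes, not CodeGuru's suggested fix):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class Counters {
        private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

        // Racy: the map is thread-safe, but get()-then-put() is not atomic.
        // Two threads can both see null and both put(1), losing an update.
        void incrementRacy(String key) {
            Integer current = counts.get(key);
            if (current == null) {
                counts.put(key, 1);
            } else {
                counts.put(key, current + 1);
            }
        }

        // Atomic: merge() does the read-modify-write as a single operation.
        void incrementSafe(String key) {
            counts.merge(key, 1, Integer::sum);
        }
    }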


Many linters are state-aware, for example to catch use-before-init bugs in various languages.

This one could be a fairly simple rule: ConcurrentHashMap.get() followed by some code that branches on the result, followed by put(), is unsafe. These warnings can be very helpful, but no ML fairy magic is needed.


Pretty sure you can turn on that inspection in IntelliJ. The other ones are much more impressive. The 'waiters' one for instance is gold.


staticcheck for Go does _some_ of this kind of thing (documenting improper usage). https://staticcheck.io/docs/checks


SonarQube does as well. I've been shocked at some of the analysis it does.


Next step is to provide automated fixes. I have a side project that does it for Go source code: https://fixmie.com (with plans for other languages and protocols).

But due to my visa situation here in the US (H1B), I'll never be able to monetize it, as it's illegal to have a side income. But I think this is just the start and there is a huge opportunity for new startups and projects.


It is not impossible for you to earn a second income on an H1-B; it's just that the secondary source would need its own visa petition.

DISCLAIMER: I am not an attorney. More importantly, I am not your attorney. The above is not legal advice. If you desire legal advice, consult a competent, licensed attorney in your area.


What's the probability that such an unsponsored visa would be approved, and within even a remotely relevant timeframe?


Zero. H1B, by definition requires a specialty and a sponsor who can fire you (I don't know the exact phrasing but this is what prevents someone with an H1B from starting their own company)


Before I proceed...are you sure and how sure are you? Would you like to entertain a bet against that nonsensical zero of yours, as if you actually know this.

I am always fascinated by deducive ignorami who pretend to have done the research, like yourself.


How is your project different than Getafix from Facebook?

https://engineering.fb.com/developer-tools/getafix-how-faceb...


My hot take is that if you can automatically detect meaningful bugs or author fixes, you need to level up your abstraction.

I think these things make the most sense for Java and Go where there tends to be lots of repetition and lower-order programming patterns.

Unlike say, Python, Lisp, or Rust.


I’ve never programmed in Java or Go in any serious context. What do some of those repeated patterns look like?


Error handling in Go.

Lots and lots of getters and setters in Java.


Loops


> Next step is to provide automated fixes.

That's a pretty deep rabbit hole. But considering that old "IDEs" with crappy "Intellisense", "Quickfix" or similar were widely sold, there's potential there.


It's not only about the code. For example, it could also fix your import paths if one of your libraries has a CVE and a new version was released. In the case of Fixmie, all the fixes are "suggestions" and GitHub nicely allows you to batch them all and submit them in a single commit.

(Disclaimer: I'm working for GitHub, but on a different project)


Wouldn't that create worse programmers?


How so? There are so many things that we sometimes forget. Even experienced developers will make mistakes.


If you relieve the programmer of thinking about where the error is and give them the fix, the programmer will not bother to reason out what the solution is; they will simply expect it from you.


Do you really think that what we do and don't have to think about today is at some holy division of things that are best left automated (e.g. garbage collection, platform independence, serialization) and things we have to do by hand? Why is this particular point in time special?

It's a spectrum. Now isn't special.


I think if you provide an IDE that solves everything, it will become like a calculator: people stopped doing mental exercises in favor of typing in the problem and getting the instant result.


> Is it really me who is coding if I can't get forward without searching the web and without IDE pleasantries? The programmer as an individual is an outdated idea.

https://twitter.com/andrestaltz/status/1195388056855031814


Something doesn't sit right with me concerning their use of "over 10,000" open source projects on Github to train the AI, then immediately turning around and telling those same projects "thanks, that'll be $0.75/100 lines of code scanned."

I feel like this should have a generous free tier for open source projects. I feel that very, very strongly.


They should compute the SHA512 hash of lines of code or code blocks from well-known open source projects and then just give you pre-computed "reviews" for those lines/blocks, and then only charge for "novel" code. Otherwise you would need to waste time segregating your original code from the various packages you use. And it seems unfair to charge customers for canned results that can be cached and served at very low cost.
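Something like this on the client side would be enough to skip re-submitting known blocks. The cache map below is hypothetical, just to illustrate the hashing step:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.Map;
    import java.util.Optional;

    class ReviewCache {
        // Hypothetical map of previously computed reviews, keyed by content hash.
        private final Map<String, String> cachedReviews;

        ReviewCache(Map<String, String> cachedReviews) {
            this.cachedReviews = cachedReviews;
        }

        // Hash a code block; only blocks with no cached review would be sent off
        // (and billed) for fresh analysis.
        Optional<String> lookup(String codeBlock) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] digest = md.digest(codeBlock.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return Optional.ofNullable(cachedReviews.get(hex.toString()));
        }
    }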


I think you can set it to only scan when new pull requests are made. So you could commit your libraries etc without asking for review and then turn it on only for code you have written.

I might be wrong though.


Yes obviously you would just choose not to submit those irrelevant PRs to this extremely overpriced linter (it's not a code reviewer)


Code review is not a linter. Code review is a chance to discuss design, scaling, and trade-offs, and to mentor others. I don't think this solution will offer that.


I think this does what you mention and not the former. I would imagine this works best when you have a codebase that heavily utilizes the AWS SDK so it can internally 'paint a picture' of what's going on and provide better architectural decisions and other best practices.

How well it works is beyond me though


Bullshit. You are vastly overestimating the "intelligence" of this overpriced linter. It mechanically detects patterns. See this example: https://d1.awsstatic.com/re19/Screenshot_Catch-Code-Issue_2%... The kind of human-level artificial intelligence that you're suggesting this would have is science fiction.


Well I stand corrected, would've expected more from a company that knows all the best practices for their own services


> Code review is a chance to discuss design, scaling, trade-offs and mentor others.

Trade offs sure but design and scaling need to be considered _before_ the code review. Maybe an architecture review of sorts? Once you hit code review it's a little too late to reconsider design and scale unless it's a serious issue.

> mentor others

Mentoring is mostly outside of a code review. Sure it can help with that but I don't think that really counts. IMO anyway.


> Mentoring is mostly outside of a code review

Strongly disagree, at least for remote teams.

Working remotely, I've personally found code reviews to be a great way of mentoring less experienced team members.

I also encourage junior team members to review code of more experienced team members.

For big changes, we discuss proposal/API/code reviews as a team.

I've had several people provide feedback that they've learned a lot from reviews like this, and honestly I wish I'd had this kind of mentoring when I started out (I was basically a one-man cowboy-coder for the first 5 years or so of my career).

I know mentoring can be seen as a chore for many, but it can be seriously rewarding too!


If you're discussing design and scaling at code review, you have a serious, SERIOUS, problem. That's what design docs are for.


Fair enough about the design discussions, but I also think this is quite a bit more than a linter.


I wanted to try it on a single repository, but it requested access to all repositories, public or private and also needs admin access for webhooks. No thanks.


Reminds me of the security analysis tool from FB

https://engineering.fb.com/security/zoncolan/

https://www-wired-com.cdn.ampproject.org/c/s/www.wired.com/s...

Also reminds me of Sonatype or FindBugs, which do something similar but work on a set of rules instead of ML.


There is ZERO lock-in with this service. If people want to continue to do hand code reviews themselves, they can. There is also ZERO in Amazon's announcement that implies this is just about improving code specific to Amazon's platform. Stop being open source purists and start understanding reality. Many (most?) of the posters about this topic critiquing Amazon live in open source unicorn land, and seem to have almost no understanding of business realities. (Sorry for not saying what I really think but I thought I should be polite.)


If you squint this looks like a baby step towards computers writing code.


Not even close. We'll have to nail down how to write unambiguous design specifications in a format that the AI can consume, first.


Maybe we could come up with a consistent language to tell machines what to do first.


It definitely is. Automation is the end game. Devs are expensive and error prone.


Not even close. This is simple mechanical pattern recognition. It's an overpriced, overhyped code linter. See this example https://d1.awsstatic.com/re19/Screenshot_Catch-Code-Issue_2%...


I'm wondering, did anyone actually try it? It's not impossible for an automated tool to give valuable feedback on a PR, guys. Should be easier than a self-driving car, I guess


Chess masters have long been combining human and computer analysis... even before computers were able to actually beat humans at chess.


Poor naming choice, considering CodeGuru.com has been around for what, decades?


I worked on the relaunch of Cloud9 --> AWS Cloud9 and there was an extensive naming process shortly before the launch of the service, both for AWS Cloud9 and for the term "environments" (which was previously workspaces).

I can tell you that there were lots of managers, PMs, directors, etc involved and they considered tons of naming options. They took into account third party services/products, first party services/products, and other things that might have overlap. This was likely the situation here and they accepted this as a drawback.

That said, you're free to disagree and that doesn't mean it was the right choice, just wanted to point out that this was not an oversight.


> just wanted to point out that this was not an oversight

With codeguru.com being a long established site, doesn't that make it worse?


I'm thinking Amazon doesn't care a whole lot about the IP of others. I say that based on reports of rampant counterfeit goods and complaints about Amazon copying product design for their in-house brands.


$0.75 per 100 lines of code scanned per month. Wonder if it automagically ignores new lines and javadocs?


Oh my! Son of Anton is watching my code!


In case people are wondering about the reference. https://www.youtube.com/watch?v=xzx5Hwg24xw


Was there a list of supported languages somewhere? I couldn't find it.


Only Java; it's in the FAQ.


If it was minimally not garbage, they would have run it on any high profile open source project and promoted the results.

Hence, it is pure garbage.


Many (most?) of the checks appear to be specific to Amazon libraries.


> CodeGuru is inexpensive enough to use for every code review and application you run

> For example, if you have a typical pull request with 500 lines of code, it would only cost $3.75 to run CodeGuru Reviewer on it

Wat?!

Come on, $4 per review is not inexpensive, especially for what is essentially a glorified SAST!


Is there a reason AWS products have names where you can rarely guess what they do?

Why not something more obvious like one of AWS Code Auditor/Reviewer/Checker?


Code review, if done right, is a place to learn. Criticizing (and automatically fixing) code style issues and nit-picks can be done by a machine.


Why do this in the code review stage as opposed to in the code editor linting stage? I'd wanna have these suggestions before pushing.


An open source alternative is PMD tool - https://pmd.github.io


Assuming trust in the AWS code reviews (I mean, that dataset is huge), I suspect this has use in the review portion even without considering profiling. Hoping there is more detail on the ML used, as it appears more adaptable than current rule-based code reviewing solutions... here come more and more dev-focused integrations at the code level


Neat idea and a good place to start. But most of the time, people are willingly and sometimes violently opposed to automated free tooling like linters, formatters etc. I’m not holding my breath for that cohort (majority of devs).


I got to preview this service (the code review service) a few weeks ago.

The best thing about it was the recommendations on how to use the AWS SDK better, as that's probably where there's the most potential to drift or make mistakes


Considering how it can be a driver for AWS sales, they should give the service away for free though


It seems good for performance, and probably only that. I don't see anything in the pitch about readability, maintainability, extendability, security, usability/accessibility or portability.


What can it do better compared to solutions like https://www.sonarqube.org/?


If anyone from AWS is reading this, is there a plan to support GitLab? We have a self-hosted GitLab enterprise on AWS, and wondering if we can try this out


Now, is it code review with ML, or is it a data collection service with a human backend that wants to be code review with ML? ;)


Your programming job is Jeff's opportunity.


Is this vaporware?


Please do us a favor and consolidate all these Amazon announcements into a single announcement page link. This is ridiculous.


This always comes up when the big tech cos do their annual conference day thing. We don't consolidate the posts, but we do downweight some of them, so you're actually getting less (Amazon|Google|Apple|Microsoft)iness than the system would otherwise be letting through.

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...



I love Salesforce for this. Every release, there's one place with all the changes and new features. No need to play detective trying to figure it out


AWS re:Invent is going on. Announcements are done piecemeal.


"Amazon CodeGuru is a machine learning service for automated code reviews and application performance recommendations. It helps you find the most expensive lines of code that hurt application performance..." I suspect if AWS is using customers code bases to train its AI models? Another source is to scavenge open source repositories.


Maybe you should have continued reading the second paragraph as well?

> CodeGuru’s machine learning models are trained on Amazon’s code bases

> comprising hundreds of thousands of internal projects, as well as over

> 10,000 open source projects in GitHub. Tens of thousands of Amazon

> developers have contributed to CodeGuru’s training based on decades of

> experience in code review and application profiling.


"CodeGuru’s machine learning models are trained on Amazon’s code bases comprising hundreds of thousands of internal projects, as well as over 10,000 open source projects in GitHub" - from the article.


Oh no, the code quality is going to be shit.


They might be superficial, but if they did any sort of supervised training (I'm assuming they did), then they probably won't be wrong.


Could Amazon really have that many internal projects? That doesn't seem right.



