Launch HN: DeepSource (YC W20) – Find and fix issues during code reviews
105 points by dolftax on March 11, 2020 | 27 comments
Hi HN! We're Jai and Sanket — founders of DeepSource (https://deepsource.io). We’re automating objective parts of code review using static analysis to ensure the code is free of common issues (anti-patterns, bug risks, performance bottlenecks, and security flaws) before a reviewer looks at it. This prevents the reviewer from having to manually point out objective issues and ensures they don’t make it to production.

After college, Sanket co-founded DoSelect, where I joined as the first engineer. Both of us had been contributing to open-source projects for a few years by then. In the beginning, we didn't have any process set up around code reviews. We had some IDE plugins to run the linters, and some team members used them as pre-commit hooks. We didn't have any tests back then and used to spend too much time on some pull requests pointing out improvements, and if a pull request was very large, we never reviewed it — direct merge. Then the engineering team started to grow, multiple folks started contributing to the same repositories, and pull requests were often stuck for 5-7 days without any activity. To make sure new commits were free of common issues, we added multiple static analysis tools to our CI jobs. This became a pain sooner than expected: the tools threw hundreds of lines of logs in the CI and we had to fight through duplicate issues. Critical issues were hidden amongst minor issues and false positives, and often missed. Once in a while, we tweaked the linter config files to silence the issues that didn't make sense to us, to reduce noise in the CI logs. That stopped working after a while, so we invested in a couple of commercial code quality tools but ended up disabling them as well: their issues weren't categorized or prioritized, the analyzers were never updated with new rules, and there was no way to report false positives.

We came across a paper — Lessons from building static analysis at Google [1]. It is a beautiful paper with the following insights: 1) Static analysis authors should focus on the developer and listen to their feedback 2) Careful developer workflow integration is key for static analysis tool adoption 3) Static analysis tools can scale by crowdsourcing analysis development.

We started building DeepSource in December 2018. The initial release supported Python and integrated with GitHub. Our approach was to first curate all the issues available from open-source static analysis tools, de-duplicate them, and add better descriptions with external reference links — so you just add the Python analyzer to the `.deepsource.toml` file with some metadata (version, test patterns, exclude patterns, etc.) and analysis runs on every commit and pull request. To cut down the noise, by default we only show you issues newly introduced in the pull request, based on the changeset, and not all the issues present in the changed files. We also provide a way for you to report false-positive issues directly from the dashboard. If the report is valid, we update the analyzers to resolve it within 48-72 hours. After this release, we started writing our own rules by walking the Abstract Syntax Tree to find patterns. So far, we have 520+ types of issues in the Python analyzer. Some of the custom issues we added recently: a file opened without the `with` statement, `yield` used in a comprehension instead of a generator expression, and not using `items()` to iterate over a dictionary.
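For concreteness, here's roughly what two of those patterns look like in code (an illustrative sketch, not the analyzer's own output):

```
# Writing a file first so the snippet runs standalone.
with open("config.txt", "w") as f:
    f.write("debug = true\n")

# Flagged: file opened without `with`; the handle can leak if an exception occurs.
f = open("config.txt")
data = f.read()
f.close()

# Preferred: the context manager closes the file even on error.
with open("config.txt") as f:
    data = f.read()

prices = {"apple": 3, "pear": 5}

# Flagged: iterating over keys only to look each one up again.
for fruit in prices:
    print(fruit, prices[fruit])

# Preferred: iterate over key/value pairs with items().
for fruit, price in prices.items():
    print(fruit, price)
```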

A few months back, we released the Go analyzer and also added support for GitLab. We're working on supporting Ruby and JavaScript, and on integrations for Bitbucket and Azure DevOps. The analyzers are not limited to programming languages; we've added ones for Dockerfile and Terraform as well. DeepSource is free to use for open-source repositories, and we make money from private repositories with a per-developer, per-month/year subscription.

Lately, we realized some of the issues were occurring in tens of files. Though DeepSource reports them, one had to manually fix all the occurrences. We just released autofix support in Python, starting with the 15 most commonly occurring issues. Autofix uses a Concrete Syntax Tree to visit the location of each issue, modify the code the issue is raised for, and then generate a patch for that modification. When an autofix is available for an issue, you can view the suggested patch and, on approval, a pull request will be created with the fixes. We're working on improving the coverage of issues we can autofix across the analyzers we support.
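For the curious, here is a minimal sketch of the CST idea using a library like libcst (purely illustrative; not necessarily how DeepSource implements it internally), rewriting `raise NotImplemented` to `raise NotImplementedError` while preserving the surrounding formatting:

```
import libcst as cst

class FixNotImplemented(cst.CSTTransformer):
    """Rewrite `raise NotImplemented` to `raise NotImplementedError`."""

    def leave_Raise(self, original_node, updated_node):
        exc = updated_node.exc
        if isinstance(exc, cst.Name) and exc.value == "NotImplemented":
            return updated_node.with_changes(exc=cst.Name("NotImplementedError"))
        return updated_node

source = "def save(self):\n    raise NotImplemented\n"
module = cst.parse_module(source)
fixed = module.visit(FixNotImplemented())
print(fixed.code)  # patched source, ready to diff against the original
```

Because the CST keeps whitespace and comments intact, the generated patch only touches the flagged lines.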

Give us a try: https://deepsource.io/ Here is the documentation: https://deepsource.io/docs/

We would love to hear about your experience using these tools and feedback/suggestions on how we can improve! Please let us know in the comments. We're also at founders [at] deepsource.io.

[1] https://research.google/pubs/pub46576/




> Enable analyzers by adding .deepsource.toml

Why is this needed? Can't you imply the necessary analyzers from my codebase?

There is no support for Javascript, Typescript, PHP, Java or C#. No HTML or CSS support. Is there a roadmap?

Also, the name implies use of Deep Networks and AI. Am I mistaken? If not, what kind of AI is used here? Seems like just an automatic runner of static analysis tools.


> Why is this needed? Can’t you imply the necessary analyzers from my codebase?

Sure. We can probably infer the languages used in the repository. But we need metadata like test glob patterns, exclude patterns, and runtime versions (Python 2 vs Python 3) to improve the accuracy of issues. For ex: usage of the assert statement in application logic is discouraged, since asserts are stripped when the bytecode is compiled with optimizations (running with `python -O`). Ideally, the assert statement should be used only in tests. Also, we haven't found a way to infer Python 2 vs Python 3 accurately. Can you think of a way? That would be helpful.
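A tiny, hypothetical snippet to make the `-O` point concrete (not analyzer output):

```
# demo.py: why asserts should not carry application logic
def withdraw(balance, amount):
    assert amount <= balance, "insufficient funds"  # stripped when run with `python -O`
    return balance - amount

if __name__ == "__main__":
    # `python demo.py`    -> raises AssertionError: insufficient funds
    # `python -O demo.py` -> prints -50; the check silently disappears
    print(withdraw(50, 100))
```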

> There is no support for Javascript, Typescript, PHP, Java or C#. No HTML or CSS support. Is there a roadmap?

We strongly believe in starting out with a few languages and adding as many issues as we can (with the ability to autofix most of them) -- before we go broad. That said, we released Ruby in beta a couple of weeks back and are currently working on the stable release. We also started working on JavaScript (with TypeScript support) a month back, and we should release the beta version of the JavaScript analyzer in approximately a month from now.

> Also, the name implies use of Deep Networks and AI. Am I mistaken? If not, what kind of AI is used here? Seems like just an automatic runner of static analysis tools.

It's just the name :) We do not use machine learning or AI at the moment — the reason being we're optimizing for high accuracy, and a rules engine that uses AST parsing helps us do that reliably. We do plan to use learning in the future, to capture data on which issues get fixed the most and which don't, and then show issues to users in the most relevant order depending on their context.
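For a rough idea of what a hand-written rule looks like, here is an illustrative sketch using Python's built-in `ast` module (not our production checker), flagging mutable default arguments:

```
import ast

SOURCE = """
def append_item(item, bucket=[]):
    bucket.append(item)
    return bucket
"""

class MutableDefaultChecker(ast.NodeVisitor):
    """Flag list/dict/set literals used as default argument values."""

    def visit_FunctionDef(self, node):
        for default in node.args.defaults:
            if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                print(f"line {default.lineno}: mutable default argument in '{node.name}'")
        self.generic_visit(node)

MutableDefaultChecker().visit(ast.parse(SOURCE))
# -> line 2: mutable default argument in 'append_item'
```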


> We’re working on supporting Ruby and JavaScript and integrations for Bitbucket and Azure DevOps.


There's a better solution: use open-source cli tools that do just that!

1. 520 Python checks? Use `wemake-python-styleguide` (a wrapper around flake8) that has a bigger number of checks: https://github.com/wemake-services/wemake-python-styleguide There's also `pylint` with a set of awesome checks as well.

2. Type checking? Use `mypy`: it's just a single command!

3. Autofixing? Use `black` / `autopep8` / `autoflake`, and you can use `pybetter` to get the same ~15 auto-fix rules. But it is completely free and open-source

I don't like this whole idea of such tools (both technically and ethically):

- Why would anyone want to send all their codebase to a 3rd party? We used to call that a security breach back in the day

- On the moral side, these (and similar) projects look like thin wrappers around open-source tools but with a monetisation model. How much do these companies contribute back to the original authors of pylint, mypy, and flake8, the ones who created and maintained them for years? I will be happy to be wrong here


> There's a better solution: use open-source cli tools that do just that!

We don't deny that you can run the open-source tools locally, be it a one-line command or setting up pylint or flake8 with dedicated configurations. DeepSource is meant to eliminate the need to set up all those open-source tools locally or in your CI pipeline, so that you don't need to:

- Fish for issues amongst hundreds of lines of logs in the CI

- Figure out and update linter configs to remove duplicates and false positives (for ex: Bandit flags `assert statement used` in a test file — which is a false positive; Bandit doesn't know that it is a test file by default)

- Hunt for the reasoning behind an issue (for ex: why should default file permissions be 0600?); we add descriptions and justifications for each issue

- Run linters on all the files on every commit or pull request, instead of only on the changeset

- Manually fix an issue that occurs in, say, 50 places

> 1. 520 Python checks? Use `wemake-python-styleguide` (a wrapper around flake8) that has a bigger number of checks: https://github.com/wemake-services/wemake-python-styleguide There's also `pylint` with a set of awesome checks as well.

Our focus at the moment is not on style issues. In fact, amongst the categories of issues we raise (anti-patterns, bug risks, performance, security, style, documentation), style issues are the most debated by our users, as they are really subjective. We're thinking of making style issues opt-in rather than on by default, and are working on running formatters like `black` and `yapf` with a single-line config in `.deepsource.toml`. Our analyzer team actively adds custom rules which you don't get from the open-source tools. For example:

- Raising another exception as the `assert` message is ineffective. For ex: `assert isinstance(num_channels, int), ValueError('Number of image channels needs to be an integer')`. If the condition is not satisfied, the user would expect a `ValueError`, but `AssertionError: Number of image channels needs to be an integer` is raised instead (a small sketch follows this list)

- `yield` used inside a comprehension (which breaks code in Python 3.8)

- Write operation on file that is opened in read-only mode

- I/O detected on a closed file descriptor
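To spell out the first item above, an illustrative snippet (not analyzer output):

```
# Flagged: the ValueError becomes the assert *message*, so an AssertionError is raised,
# and the whole check disappears when running with `python -O`.
def set_channels(num_channels):
    assert isinstance(num_channels, int), ValueError(
        "Number of image channels needs to be an integer")

# What the author almost certainly intended:
def set_channels_fixed(num_channels):
    if not isinstance(num_channels, int):
        raise ValueError("Number of image channels needs to be an integer")
```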

> 2. Type checking? Use `mypy`: it's just a single command!

Sure, if one prefers running it locally or as part of their CI. But if you already use DeepSource to flag issues, type checking can be enabled with a single line in the `.deepsource.toml` file.

> 3. Autofixing? Use `black` / `autopep8` / `autoflake`, and you can use `pybetter` to get the same ~15 auto-fix rules. But it is completely free and open-source

We are working on adding support for autopep8, black, and autoflake in the coming weeks; they mostly auto-patch stylistic issues [1]. Thanks for letting us know about pybetter. It looks like a great tool and fixes ~9 issues [2]. Our aim with Autofix is to fix more than three-fourths of the issues we detect, and our Python analyzer currently detects 522 issues. We have a dedicated engineering team actively working on the analyzers. As of today, the following are some of the issues our Python analyzer can autofix (which I couldn't find among the open-source tools); a couple of before/after examples follow the list:

- No use of `self`

- Usage of dangerous default argument

- Module imported but unused

- Function contains unused argument

- Debugger import detected

- Debugger activation detected

- Unnecessary comprehension

- Unnecessary literal

- Unnecessary call

- Unnecessary typecast

- Bad comparison test

- Empty module

- Built-in function `len` used as condition

- Unnecessary `fstring`

- `raise NotImplemented` should be `raise NotImplementedError`

- `assert` statement used outside of tests

Same goes with Go and other analyzers we support.
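For a flavour of the patches, here are two hand-written before/after pairs (illustrative; the actual patches are generated by the analyzer):

```
# "Built-in function `len` used as condition"
queue = []
if len(queue) == 0:   # before the fix
    print("empty")
if not queue:         # after the fix
    print("empty")

# "Unnecessary comprehension": building a set from a list comprehension
names = ["ada", "grace", "ada"]
unique_before = set([n.title() for n in names])   # before the fix
unique_after = {n.title() for n in names}         # after the fix
```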

> I don't like this whole idea of such tools (both technically and ethically): > Why would anyone want to send all their codebase to a 3rd party? We used to call that a security breach back in the day.

We follow strict security practices [3]. In a gist: 1) we do not store your code; 2) source code is pulled into an isolated environment that has no access to any of our internal systems or the external network; 3) as soon as the analysis is completed, the environment is destroyed and all logs are purged. Also, there are many tools developers use every day (Travis CI, Circle CI, GitHub) where the source code is sent to the cloud — I don't think it is accurate to call that a security breach. That said, an on-premise setup of DeepSource is on the roadmap. We're working on SOC 2 Type 2 compliance as well [4].

> On the moral side, these (and similar) projects look like thin wrappers around open-source tools but with a monetisation model. How much do these companies contribute back to the original authors of pylint, mypy, and flake8, the ones who created and maintained them for years? I will be happy to be wrong here

We have kept the tool completely free to use for open-source projects. We've also partnered with GitHub Education and made it free for students. We're an early-stage company trying to build a business around automating the objective parts of code review and making it easier for every developer to adopt and use static analysis. With all transparency: we had plans to sponsor open-source projects but got sidetracked for various reasons. We will be backing some of these open-source projects in the next couple of weeks.

[1] https://gist.githubusercontent.com/jaipradeesh/6ad8404fef253...

[2] https://gist.githubusercontent.com/jaipradeesh/b8a0e6b526f73...

[3] https://deepsource.io/security

[4] https://vanta.com/guides/vantas-guide-to-soc-2


Congrats on the HN launch guys :) Excited to see Javascript being added to the list of supported languages soon.


How can we get notified when Javascript (Node.js?) support launches?


We'll tweet about it at https://twitter.com/deepsourcehq


How is this different from (or better than) existing products that offer the same service, such as Codacy?

Have been a paying customer of Codacy’s for ~2 years and they support most languages out of the box at this point, with Git integration similar to your own.

Curious on your thoughts.


A few differentiators:

* More issue coverage — for Python, we detect 520+ issues. We also enable you to run things like type checking (if you're using type hints) just by enabling it in the config.

* Custom issues — we have an analyzer team that keeps adding new, novel checkers to the analyzer for common bugs and anti-patterns.

* Fewer false positives — we've optimized our analyzers for reporting less than 5% false positives. On the lowest level, we write augmentations to each checker to remove known false-positives and noise. On the application level, we enable users to very easily ignore issues (for a file, all test files, some file patterns), and also report a false positive. We monitor all false-positive reports and proactively improve our analyzers to resolve them.

* Autofix — we just released this; it lets you automatically fix some commonly occurring issues directly from DeepSource. In the future, we will add more autofixers, so that at least 70% of the issues we detect can be reliably autofixed.


Based on the points above, I am still not convinced this is significantly better than existing players, but I could be wrong. Additionally, some of the problems you've mentioned in other comments have already been solved by your competitors.

Do you think the fact that you're a late entrant into this market makes it difficult and/or challenging for your team? Why have your customers chosen you over other platforms? I'm mostly curious and not trying to put you down.


Congratulations on the launch! Have been following the team's progress for a while and truly impressed with the pace of feature development while keeping the core product extremely simple. Happy to be a customer.


This looks awesome - congrats on the launch.

Quick question: I tried setting it up but it's asking for write access to the pull requests. I am a bit wary about giving write access - is this required?


There are two GitHub apps we maintain. One with read access (DeepSource) and one with write access (DeepSource Autofix).

By default, on signup, you would be installing the app with read access -- this enables us to pull source code from GitHub on every commit and pull-request, run analysis and report issues as GitHub checks. This is sufficient if you would like to use DeepSource only to flag issues.

With the release of Autofix -- when a fix is available for a flagged issue, DeepSource creates a pull request to the repository with the patch. For this, you would be asked to install the app with write access (DeepSource Autofix). Note that DeepSource always creates a separate branch with the fixes and opens a pull request from it; we do not perform any write operations beyond the above-mentioned scope.


This is a serverside pre-PR hook for an analyzer, as I understand it.

I was hoping this was a code review tool that allows you to modify the PR without making a commit-merge-push loop, which could have approved changes automagically pulled locally (to close the loop). This would save a TON on the small edits that many PRs require, including any additional comments that people might want to add to code during review... modern PRs are where context goes to die.


We went ahead with integration with providers like GitHub and GitLab to have these checks in a central place as it is the easiest way for a team to adopt a tool like ours. Also, just having a local or IDE plugin doesn't ensure these issues never make it to trunk unless everyone in the team follows it strictly.

That said, for the convenience of developers, we're working on the ability to run the analysis and the fixes using our CLI. [1] This opens up doors to use the CLI and build IDE plugins in the near future.

[1] https://github.com/deepsourcelabs/cli/issues/15


So our CI pipelines are always set up so that failed linting means blocked merge capability. Your PR isn't ready for review if it's failing rubocop for example. Do you intend to integrate your tool into this type of workflow but by making the lint issues apparent via comment on the PR in GitHub vs in the CI?


DeepSource integrates with GitHub checks [1], and via the dashboard [2] you can select the issue types (anti-patterns, bug risks, performance and security issues, style, type checks, and documentation) which, when detected, will cause analysis runs to fail and pull requests to be blocked.

[1] https://pasteboard.co/IZfSThC.png [2] https://pasteboard.co/IZfT8uw.png


Cool! Can you provide some real world examples of issues you flag? I poked around on your site and didn’t see any.


A few issues from our Python analyzer:

* Dangerous mutable default argument passed in functions

```
def some_func(arg=[1,2,3]):
    ...
```

which should be

```
def some_func(arg=None):
    if arg is None:
        arg = [1,2,3]
    ...
```

* `yield` used inside a comprehension (which breaks code in Python 3.8)

* file opened with the "r" flag, but a write is attempted on the file

* i/o detected on a closed file descriptor

* providing an unexpected keyword argument in a function call
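Two more of those, written out as a small runnable illustration (hypothetical code, not analyzer output):

```
import io

with open("notes.txt", "w") as f:
    f.write("hello\n")

# File opened with the "r" flag, but a write is attempted on it.
try:
    with open("notes.txt", "r") as f:
        f.write("world\n")
except io.UnsupportedOperation as exc:
    print("flagged statically, fails at runtime:", exc)

# Unexpected keyword argument in a function call.
def greet(name):
    return f"Hello, {name}"

try:
    greet(name="Ada", excited=True)
except TypeError as exc:
    print("flagged statically, fails at runtime:", exc)
```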


Is there a list of the open source static analysis tools that you are using? Do you have any proprietary tools you have written?


When you offer support for C++, we'll talk. More challenging to parse and analyze, of course.


Sure. I've left you an email.


Congrats! Any plans to add support for more languages?


Ruby is already in beta, stable release in the next 3-4 weeks. Next up is JavaScript. Rust, Java, and PHP are further down the line.


How does it compare to the static analysis Rubocop already does? Especially in who decides what the anti-patterns are.


For our analyzers, we do use existing static analysis tools behind the scenes, in addition to the custom checkers we write by hand. So our Ruby analyzer, which is in beta at the moment, does use Rubocop behind the scenes. We're working towards the stable release of the Ruby analyzer, which uses augmentations to remove false positives and decrease the noise — since guaranteeing less than 5% false positives is one of the primary values DeepSource adds. As the analyzer moves towards stable, we'll add custom issues to it.

The general categorization of anti-patterns is based on the consensus of the community around the language, plus some obvious things that have objective reasons behind them. We understand that everyone has their own flavor of conventions, though, so it is very easy to triage and ignore specific issues in DeepSource.



