Launch HN: DeepSource (YC W20) – Find and fix issues during code reviews
105 points by dolftax on March 11, 2020 | 27 comments
Hi HN! We're Jai and Sanket — founders of DeepSource (https://deepsource.io). We’re automating objective parts of code review using static analysis to ensure the code is free of common issues (anti-patterns, bug risks, performance bottlenecks, and security flaws) before a reviewer looks at it. This prevents the reviewer from having to manually point out objective issues and ensures they don’t make it to production.

After college, Sanket co-founded DoSelect, where I joined as the first engineer. Both of us had been contributing to open-source projects for a few years by then. In the beginning, we didn't have any process set up around code reviews. We had some IDE plugins to run the linters, and some team members used them as pre-commit hooks. We didn't have any tests back then and used to spend too much time on some pull requests pointing out improvements, and if a pull request was very large, we never reviewed it — direct merge. Then the engineering team started to grow, multiple folks started contributing to the same repositories, and pull requests were often stuck for 5-7 days without any activity. To make sure new commits were free of common issues, we added multiple static analysis tools to our CI jobs. This became a pain sooner than expected: the tools threw hundreds of lines of logs in the CI and we had to fight through duplicate issues. Critical issues were hidden amongst minor issues and false positives, and often missed. Once in a while, we tweaked the linter config files to silence the issues that didn't make sense to us, to reduce noise in the CI logs. That stopped working after a while, so we invested in a couple of commercial code quality tools but ended up disabling them as well: their issues weren't categorized or prioritized, the analyzers were never updated with new rules, and there was no way to report false positives.

We came across a paper — Lessons from building static analysis at Google [1]. It is a beautiful paper with the following insights: 1) Static analysis authors should focus on the developer and listen to their feedback 2) Careful developer workflow integration is key for static analysis tool adoption 3) Static analysis tools can scale by crowdsourcing analysis development.

We started building DeepSource in December 2018. The initial release supported Python and integrated with GitHub. Our approach was to first curate all the issues available from open-source static analysis tools, de-duplicate them, and add better descriptions with external reference links — so you just add the Python analyzer to the `.deepsource.toml` file with some metadata (version, test patterns, exclude patterns, etc.) and analysis runs on every commit and pull request. To cut down the noise, by default we only show you issues newly introduced in the pull request, based on the changeset, and not all the issues present in the changed files. We also provide a way for you to report false-positive issues directly from the dashboard. If the report is valid, we update the analyzers to resolve it within 48-72 hours. After this release, we started writing our own rules by walking the Abstract Syntax Tree to find patterns. So far, we have 520+ types of issues in the Python analyzer. Some of the custom issues we added recently: a file opened without the `with` statement, `yield` used in a comprehension instead of a generator expression, and not using `items()` to iterate over a dictionary.
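For concreteness, here's roughly what two of those patterns look like in code (an illustrative sketch, not the analyzer's own output):

```
# Writing a file first so the snippet runs standalone.
with open("config.txt", "w") as f:
    f.write("debug = true\n")

# Flagged: file opened without `with`; the handle can leak if an exception occurs.
f = open("config.txt")
data = f.read()
f.close()

# Preferred: the context manager closes the file even on error.
with open("config.txt") as f:
    data = f.read()

prices = {"apple": 3, "pear": 5}

# Flagged: iterating over keys only to look each one up again.
for fruit in prices:
    print(fruit, prices[fruit])

# Preferred: iterate over key/value pairs with items().
for fruit, price in prices.items():
    print(fruit, price)
```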

A few months back, we released the Go analyzer and also added support for GitLab. We're working on supporting Ruby and JavaScript, and on integrations for Bitbucket and Azure DevOps. The analyzers are not limited to programming languages; we've added ones for Dockerfile and Terraform as well. DeepSource is free to use for open-source repositories, and we make money from private repositories with a per-developer, per-month/year subscription.

Lately, we realized some of the issues were occurring in tens of files. Though DeepSource reports them, one had to manually fix all the occurrences. We just released autofix support in Python, starting with the 15 most commonly occurring issues. Autofix uses a Concrete Syntax Tree to visit the location of each issue, modify the code the issue is raised for, and then generate a patch for that modification. When an autofix is available for an issue, you can view the suggested patch and, on approval, a pull request will be created with the fixes. We're working on improving the coverage of issues we can autofix across the analyzers we support.
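For the curious, here is a minimal sketch of the CST idea using a library like libcst (purely illustrative; not necessarily how DeepSource implements it internally), rewriting `raise NotImplemented` to `raise NotImplementedError` while preserving the surrounding formatting:

```
import libcst as cst

class FixNotImplemented(cst.CSTTransformer):
    """Rewrite `raise NotImplemented` to `raise NotImplementedError`."""

    def leave_Raise(self, original_node, updated_node):
        exc = updated_node.exc
        if isinstance(exc, cst.Name) and exc.value == "NotImplemented":
            return updated_node.with_changes(exc=cst.Name("NotImplementedError"))
        return updated_node

source = "def save(self):\n    raise NotImplemented\n"
module = cst.parse_module(source)
fixed = module.visit(FixNotImplemented())
print(fixed.code)  # patched source, ready to diff against the original
```

Because the CST keeps whitespace and comments intact, the generated patch only touches the flagged lines.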

Give us a try: https://deepsource.io/ Here is the documentation: https://deepsource.io/docs/

We would love to hear about your experience using these tools and feedback/suggestions on how we can improve! Please let us know in the comments. We're also at founders [at] deepsource.io.

[1] https://research.google/pubs/pub46576/




> Enable analyzers by adding .deepsource.toml

Why is this needed? Can't you imply the necessary analyzers from my codebase?

There is no support for Javascript, Typescript, PHP, Java or C#. No HTML or CSS support. Is there a roadmap?

Also, the name implies use of Deep Networks and AI. Am I mistaken? If not, what kind of AI is used here? Seems like just an automatic runner of static analysis tools.


> Why is this needed? Can’t you imply the necessary analyzers from my codebase?

Sure. We can probably infer the languages used in the repository. But we need metadata like test glob patterns, exclude patterns, and runtime versions (Python 2 vs Python 3) to improve the accuracy of issues. For ex: usage of the assert statement in application logic is discouraged, since asserts are stripped when the bytecode is compiled with optimizations (running with `python -O`). Ideally, the assert statement should be used only in tests. Also, we haven't found a way to infer Python 2 vs Python 3 accurately. Can you think of a way? That would be helpful.
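A tiny, hypothetical snippet to make the `-O` point concrete (not analyzer output):

```
# demo.py: why asserts should not carry application logic
def withdraw(balance, amount):
    assert amount <= balance, "insufficient funds"  # stripped when run with `python -O`
    return balance - amount

if __name__ == "__main__":
    # `python demo.py`    -> raises AssertionError: insufficient funds
    # `python -O demo.py` -> prints -50; the check silently disappears
    print(withdraw(50, 100))
```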

> There is no support for Javascript, Typescript, PHP, Java or C#. No HTML or CSS support. Is there a roadmap?

We strongly believe in starting out with a few languages and adding as many issues as we can (with the ability to autofix most of them) -- before we go broad. That said, we released Ruby in beta a couple of weeks back and are currently working on the stable release. We also started working on JavaScript (with TypeScript support) a month back, and we should release the beta version of the JavaScript analyzer in approximately a month from now.

> Also, the name implies use of Deep Networks and AI. Am I mistaken? If not, what kind of AI is used here? Seems like just an automatic runner of static analysis tools.

It's just the name :) We do not use machine learning or AI at the moment — the reason being we're optimizing for high accuracy, and a rules engine that uses AST parsing helps us do that reliably. We do plan to use learning in the future, to capture data on which issues get fixed the most and which don't, and then show issues to users in the most relevant order depending on their context.
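For a rough idea of what a hand-written rule looks like, here is an illustrative sketch using Python's built-in `ast` module (not our production checker), flagging mutable default arguments:

```
import ast

SOURCE = """
def append_item(item, bucket=[]):
    bucket.append(item)
    return bucket
"""

class MutableDefaultChecker(ast.NodeVisitor):
    """Flag list/dict/set literals used as default argument values."""

    def visit_FunctionDef(self, node):
        for default in node.args.defaults:
            if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                print(f"line {default.lineno}: mutable default argument in '{node.name}'")
        self.generic_visit(node)

MutableDefaultChecker().visit(ast.parse(SOURCE))
# -> line 2: mutable default argument in 'append_item'
```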


> We’re working on supporting Ruby and JavaScript and integrations for Bitbucket and Azure DevOps.


There's a better solution: use open-source cli tools that do just that!

1. 520 Python checks? Use `wemake-python-styleguide` (a wrapper around flake8) that has a bigger number of checks: https://github.com/wemake-services/wemake-python-styleguide There's also `pylint` with a set of awesome checks as well.

2. Type checking? Use `mypy`: it's just a single command!

3. Autofixing? Use `black` / `autopep8` / `autoflake`, and you can use `pybetter` to get the same ~15 auto-fix rules. But it is completely free and open-source

I don't like this whole idea of such tools (both technically and ethically):

- Why would anyone want to send all their codebase to a 3rd party? We used to call that a security breach back in the day

- On the moral side, these (and similar) projects look like thin wrappers around open-source tools but with a monetisation model. How much do these companies contribute back to the original authors of pylint, mypy, and flake8, the ones who created and maintained them for years? I will be happy to be wrong here


> There's a better solution: use open-source cli tools that do just that!

We don't deny that you can run the open-source tools locally, be it a one-line command or setting up pylint or flake8 with dedicated configurations. DeepSource is meant to eliminate the need to set up all those open-source tools locally or in your CI pipeline, so that you don't need to:

- Fish for issues amongst hundreds of lines of logs in the CI

- Figure out and update linter configs to remove duplicates and false positives (for ex: Bandit flags `assert statement used` in a test file — which is a false positive; Bandit doesn't know that it is a test file by default)

- Hunt for the reasoning behind an issue (for ex: why should default file permissions be 0600?); we add descriptions and justifications for each issue

- Run linters on all the files on every commit or pull request, instead of only on the changeset

- Manually fix an issue that occurs in, say, 50 places

> 1. 520 Python checks? Use `wemake-python-styleguide` (a wrapper around flake8) that has a bigger number of checks: https://github.com/wemake-services/wemake-python-styleguide There's also `pylint` with a set of awesome checks as well.

Our focus at the moment is not on style issues. In fact, amongst the categories of issues we raise (anti-patterns, bug risks, performance, security, style, documentation), style issues are the most debated by our users, as they are really subjective. We're thinking of making style issues opt-in rather than on by default, and are working on running formatters like `black` and `yapf` with a single-line config in `.deepsource.toml`. Our analyzer team actively adds custom rules which you don't get from the open-source tools. For example:

- Raising another exception as the `assert` message is ineffective. For ex: `assert isinstance(num_channels, int), ValueError('Number of image channels needs to be an integer')`. If the condition is not satisfied, the user would expect a `ValueError`, but `AssertionError: Number of image channels needs to be an integer` is raised instead (a small sketch follows this list)

- `yield` used inside a comprehension (which breaks code in Python 3.8)

- Write operation on file that is opened in read-only mode

- I/O detected on a closed file descriptor
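To spell out the first item above, an illustrative snippet (not analyzer output):

```
# Flagged: the ValueError becomes the assert *message*, so an AssertionError is raised,
# and the whole check disappears when running with `python -O`.
def set_channels(num_channels):
    assert isinstance(num_channels, int), ValueError(
        "Number of image channels needs to be an integer")

# What the author almost certainly intended:
def set_channels_fixed(num_channels):
    if not isinstance(num_channels, int):
        raise ValueError("Number of image channels needs to be an integer")
```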

> 2. Type checking? Use `mypy`: it's just a single command!

Sure, if one prefers running it locally or as part of their CI. But if you already use DeepSource to flag issues, type checking can be enabled with a single line in the `.deepsource.toml` file.

> 3. Autofixing? Use `black` / `autopep8` / `autoflake`, and you can use `pybetter` to get the same ~15 auto-fix rules. But it is completely free and open-source

We are working on adding support for autopep8, black, and autoflake in the coming weeks; they mostly auto-patch stylistic issues [1]. Thanks for letting us know about pybetter. It looks like a great tool and fixes ~9 issues [2]. Our aim with Autofix is to fix more than three-fourths of the issues we detect, and our Python analyzer currently detects 522 issues. We have a dedicated engineering team actively working on the analyzers. As of today, the following are some of the issues our Python analyzer can autofix (which I couldn't find among the open-source tools); a couple of before/after examples follow the list:

- No use of `self`

- Usage of dangerous default argument

- Module imported but unused

- Function contains unused argument

- Debugger import detected

- Debugger activation detected

- Unnecessary comprehension

- Unnecessary literal

- Unnecessary call

- Unnecessary typecast

- Bad comparison test

- Empty module

- Built-in function `len` used as condition

- Unnecessary `fstring`

- `raise NotImplemented` should be `raise NotImplementedError`

- `assert` statement used outside of tests

Same goes with Go and other analyzers we support.
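For a flavour of the patches, here are two hand-written before/after pairs (illustrative; the actual patches are generated by the analyzer):

```
# "Built-in function `len` used as condition"
queue = []
if len(queue) == 0:   # before the fix
    print("empty")
if not queue:         # after the fix
    print("empty")

# "Unnecessary comprehension": building a set from a list comprehension
names = ["ada", "grace", "ada"]
unique_before = set([n.title() for n in names])   # before the fix
unique_after = {n.title() for n in names}         # after the fix
```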

> I don't like this whole idea of such tools (both technically and ethically): > Why would anyone want to send all their codebase to a 3rd party? We used to call that a security breach back in the day.

We follow strict security practices [3]. In a gist: 1) we do not store your code; 2) source code is pulled into an isolated environment that has no access to any of our internal systems or the external network; 3) as soon as the analysis is completed, the environment is destroyed and all logs are purged. Also, there are many tools developers use every day (Travis CI, Circle CI, GitHub) where the source code is sent to the cloud — I don't think it is accurate to call that a security breach. That said, an on-premise setup of DeepSource is on the roadmap. We're working on SOC 2 Type 2 compliance as well [4].

> On the moral side, these (and similar) projects look like thin wrappers around open-source tools but with a monetisation model. How much do these companies contribute back to the original authors of pylint, mypy, and flake8, the ones who created and maintained them for years? I will be happy to be wrong here

We have kept the tool completely free to use for open-source projects. We've also partnered with GitHub Education and made it free for students. We're an early-stage company trying to build a business around automating the objective parts of code review and making it easier for every developer to adopt and use static analysis. With all transparency: we had plans to sponsor open-source projects but got sidetracked for various reasons. We will be backing some of these open-source projects in the next couple of weeks.

[1] https://gist.githubusercontent.com/jaipradeesh/6ad8404fef253...

[2] https://gist.githubusercontent.com/jaipradeesh/b8a0e6b526f73...

[3] https://deepsource.io/security

[4] https://vanta.com/guides/vantas-guide-to-soc-2


Congrats on the HN launch guys :) Excited to see Javascript being added to the list of supported languages soon.


How can we get notified when Javascript (Node.js?) support launches?


We'll tweet about it at https://twitter.com/deepsourcehq


How is this different from (or better than) existing products that offer the same service, such as Codacy?

Have been a paying customer of Codacy’s for ~2 years and they support most languages out of the box at this point, with Git integration similar to your own.

Curious on your thoughts.


A few differentiators:

* More issue coverage — for Python, we detect 520+ issues. We also enable you to run things like type checking (if you're using type hints) just by enabling it in the config.

* Custom issues — we have an analyzer team that keeps adding new, novel checkers to the analyzer for common bugs and anti-patterns.

* Fewer false positives — we've optimized our analyzers for reporting less than 5% false positives. On the lowest level, we write augmentations to each checker to remove known false-positives and noise. On the application level, we enable users to very easily ignore issues (for a file, all test files, some file patterns), and also report a false positive. We monitor all false-positive reports and proactively improve our analyzers to resolve them.

* Autofix — we just released this; it lets you automatically fix some commonly occurring issues directly from DeepSource. In the future, we will add more autofixers, so that at least 70% of the issues we detect can be reliably autofixed.


Based on the points above, I am still not convinced this is significantly better than existing players, but I could be wrong. Additionally, some of the problems you've mentioned in other comments have already been solved by your competitors.

Do you think the fact that you're a late entrant into this market makes it difficult and/or challenging for your team? Why have your customers chosen you over other platforms? I'm mostly curious and not trying to put you down.


Congratulations on the launch! Have been following the team's progress for a while and truly impressed with the pace of feature development while keeping the core product extremely simple. Happy to be a customer.


This looks awesome - congrats on the launch.

Quick question: I tried setting it up but it's asking for write access to the pull requests. I am a bit wary about giving write access - is this required?


There are two GitHub apps we maintain. One with read access (DeepSource) and one with write access (DeepSource Autofix).

By default, on signup, you would be installing the app with read access -- this enables us to pull source code from GitHub on every commit and pull-request, run analysis and report issues as GitHub checks. This is sufficient if you would like to use DeepSource only to flag issues.

With the release of Autofix -- when a fix is available for a flagged issue, DeepSource creates a pull request to the repository with the patch. For this, you would be asked to install the app with write access (DeepSource Autofix). Note that DeepSource always creates a separate branch with the fixes and opens a pull request from it; we do not perform any write operations beyond the above-mentioned scope.


This is a serverside pre-PR hook for an analyzer, as I understand it.

I was hoping this was a code review tool that allows you to modify the PR without making a commit-merge-push loop, which could have approved changes automagically pulled locally (to close the loop). This would save a TON on the small edits that many PRs require, including any additional comments that people might want to add to code during review... modern PRs are where context goes to die.


We went ahead with integration with providers like GitHub and GitLab to have these checks in a central place as it is the easiest way for a team to adopt a tool like ours. Also, just having a local or IDE plugin doesn't ensure these issues never make it to trunk unless everyone in the team follows it strictly.

That said, for the convenience of developers, we're working on the ability to run the analysis and the fixes using our CLI. [1] This opens up doors to use the CLI and build IDE plugins in the near future.

[1] https://github.com/deepsourcelabs/cli/issues/15


So our CI pipelines are always set up so that failed linting means blocked merge capability. Your PR isn't ready for review if it's failing rubocop for example. Do you intend to integrate your tool into this type of workflow but by making the lint issues apparent via comment on the PR in GitHub vs in the CI?


DeepSource integrates with GitHub checks [1], and via the dashboard [2] you can select the issue types (anti-patterns, bug risks, performance and security issues, style, type checks, and documentation) which, when detected, will cause analysis runs to fail and pull requests to be blocked.

[1] https://pasteboard.co/IZfSThC.png [2] https://pasteboard.co/IZfT8uw.png


Cool! Can you provide some real world examples of issues you flag? I poked around on your site and didn’t see any.


A few issues from our Python analyzer:

* Dangerous mutable default argument passed in functions

```
def some_func(arg=[1,2,3]):
    ...
```

which should be

```
def some_func(arg=None):
    if arg is None:
        arg = [1,2,3]
    ...
```

* `yield` used inside a comprehension (which breaks code in Python 3.8)

* file opened with the "r" flag, but a write is attempted on the file

* i/o detected on a closed file descriptor

* providing an unexpected keyword argument in a function call
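Two more of those, written out as a small runnable illustration (hypothetical code, not analyzer output):

```
import io

with open("notes.txt", "w") as f:
    f.write("hello\n")

# File opened with the "r" flag, but a write is attempted on it.
try:
    with open("notes.txt", "r") as f:
        f.write("world\n")
except io.UnsupportedOperation as exc:
    print("flagged statically, fails at runtime:", exc)

# Unexpected keyword argument in a function call.
def greet(name):
    return f"Hello, {name}"

try:
    greet(name="Ada", excited=True)
except TypeError as exc:
    print("flagged statically, fails at runtime:", exc)
```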


Is there a list of the open source static analysis tools that you are using? Do you have any proprietary tools you have written?


When you offer support for C++, we'll talk. More challenging to parse and analyze, of course.


Sure. I've left you an email.


Congrats! Any plans to add support for more languages?


Ruby is already in beta, stable release in the next 3-4 weeks. Next up is JavaScript. Rust, Java, and PHP are further down the line.


How does it compare to the static analysis Rubocop already does? Especially in who decides what the anti-patterns are.


For our analyzers, we do use existing static analysis tools behind the scenes, in addition to the custom checkers we write by hand. So our Ruby analyzer, which is in beta at the moment, does use Rubocop behind the scenes. We're working towards the stable release of the Ruby analyzer, which uses augmentations to remove false positives and decrease the noise — since guaranteeing less than 5% false positives is one of the primary values DeepSource adds. As the analyzer moves towards stable, we'll add custom issues to it.

The general categorization of anti-patterns is based on the consensus of the community around the language, plus some obvious things that have objective reasons behind them. We understand that everyone has their own flavor of conventions, though, so it is very easy to triage and ignore specific issues in DeepSource.



