EvalAI: An Open-Source Alternative to Kaggle (eval.ai)
119 points by optimalsolver on June 25, 2021 | 21 comments



The paper "50 Years of Data Science" by David Donoho (2017) makes a sharp point about Kaggle-like competitions being undervalued by the statistics community:

  "To my mind, the crucial but unappreciated methodology driving predictive modeling’s success is what computational linguist Mark Liberman (Liberman 2010) has called the Common Task Framework (CTF). An instance of the CTF has these ingredients:
  (a) A publicly available training dataset involving, for each observation, a list of (possibly many) feature measurements, and a class label for that observation.
  (b) A set of enrolled competitors whose common task is to infer a class prediction rule from the training data.
  (c) A scoring referee, to which competitors can submit their prediction rule. The referee runs the prediction rule against a testing dataset, which is sequestered behind a Chinese wall. The referee objectively and automatically reports the score (prediction accuracy) achieved by the submitted rule.
  ...
  The general experience with CTF was summarized by Liberman as follows:
  1. Error rates decline by a fixed percentage each year, to an asymptote depending on task and data quality.
  2. Progress usually comes from many small improvements; a change of 1% can be a reason to break out the champagne.
  3. Shared data plays a crucial role—and is reused in unexpected ways.
  ...

  The author believes that the Common Task Framework is the single idea from machine learning and data science that is most lacking attention in today’s statistical training.
https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1...
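
As a toy illustration of the scoring referee in (c), here is a minimal sketch in Python; the names and the accuracy metric are placeholders, not any particular platform's API. The point is that the held-out labels stay with the referee and only a score comes back:

  # Toy sketch of a Common Task Framework "scoring referee".
  from typing import Callable, Dict, List

  class Referee:
      def __init__(self, test_features: List[Dict], test_labels: List[str]):
          self._features = test_features
          self._labels = test_labels  # sequestered: competitors never see these

      def score(self, prediction_rule: Callable[[Dict], str]) -> float:
          # Run the submitted rule on the hidden test set and report accuracy only.
          predictions = [prediction_rule(x) for x in self._features]
          correct = sum(p == y for p, y in zip(predictions, self._labels))
          return correct / len(self._labels)

  # Usage: a competitor submits a rule and gets back nothing but the score.
  # referee = Referee(test_features, test_labels)
  # print(referee.score(lambda x: "spam" if x["caps_ratio"] > 0.5 else "ham"))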


> which is sequestered behind a Chinese wall.

hm, hadn't heard this one before (https://en.wikipedia.org/wiki/Chinese_wall). Seems a bit anachronistic.


How does this relate to or differentiate eval.ai from kaggle?


To me this looks more like a competitor to CodaLab than to Kaggle, and it seems to target the ML research competition market. Unlike Kaggle, it doesn't seem to support things like notebooks, discussions, etc.

I might add that the comparison to Kaggle was added by the OP; I don't see it mentioned anywhere on the website.


> the comparison to Kaggle was added by the OP; I don't see it mentioned anywhere on the website

It's on their GitHub page:

https://github.com/Cloud-CV/EvalAI#platform-comparison


Ah that's unfortunate, I didn't know.

That said, my point still stands. Additionally, the comparison they did is massively unfair. I'm pretty sure Kaggle competitions have most of the features they claim it doesn't have (e.g. multiple phases, custom metrics, evaluation in environments).


> Ah that's unfortunate, I didn't know.

No sweat, we can't know everything, god save the interwebs!

> the comparison they did is massively unfair.

Agreed. This is all puff.


Kaggle has two parts: the more popular "featured" competitions, which have most of those features but cost money to host, and the self-hosted community competitions, which indeed lack all those features and are more relevant for this comparison IMO.


Their comparison chart is outright defamatory, as Kaggle features all of the below:

Custom metrics
Multiple phases/splits
Remote evaluation
Human evaluation
Evaluation in environments

The actual rankings are: #1 Kaggle, #2 DrivenData.

Honorable mention but poorly managed: AIcrowd

No one else has any level of funding to incent performance.


The comparison is indeed quite old and hasn't been updated. I work with AIcrowd and would love to know why you think it is poorly managed. I know we have a long way to go!


I'm an open-source supporter, but why make an open version of Kaggle? It's not needed. Stop cloning ideas and calling them an open-source version; a good product is much more than just the source code.


They make better points in their README: https://github.com/Cloud-CV/EvalAI ... I wonder about automating the entire pipeline and putting the winning solution straight into production, where it can be shared and openly available. Imagine you just describe a problem, let the crowd produce the best solutions, and the top solution is always available with zero work.


I agree that a good product is more than just the source code, but you shouldn't discourage cloning ideas, because building more specialised versions or enhancing features is much easier if the source code is available.


I am one of the members managing EvalAI. We started EvalAI to meet the demands of our own research lab (https://github.com/batra-mlp-lab). In 2016 and 2017, we were running the VQA competition on CodaLab and it was not a pleasant experience -- submissions would get stuck, evaluation was slow, etc. We had the in-house expertise (two grad students) to build something better. We could manage the entire stack ourselves and easily add custom features (code upload, custom metrics, private/remote evaluation, human-in-the-loop evaluation). It turns out a lot of folks in the computer vision community were looking for something similar (130+ challenges, 30+ organizations).

As we became more mature, multiple companies (eBay, IBM, Mapillary) cloned EvalAI to host their own versions. They often contributed their features back, and overall it was a net positive to have an open-source version. It also allows for faster iteration and experimentation (human-in-the-loop evaluation, Challenge Entries to Demo). For instance, PapersWithCode recently collaborated with us to deep-link leaderboard results on EvalAI with their own leaderboard tables (https://twitter.com/paperswithcode/status/134108528597578957...). I agree that the comparison to Kaggle is a bit old and we have removed it (https://github.com/Cloud-CV/EvalAI/pull/3502). :-)
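
To give a concrete picture of the "custom metrics" part: a challenge host typically provides a small evaluation script that the platform calls with the sequestered annotations and the participant's submission. Here is a minimal sketch; the function name, arguments, file formats, and return shape are assumptions for illustration, not necessarily EvalAI's exact interface:

  import json

  # Hypothetical host-provided evaluation script for one challenge phase.
  def evaluate(test_annotation_file, user_submission_file, phase_codename, **kwargs):
      # Ground truth lives on the evaluation server; participants only upload predictions.
      with open(test_annotation_file) as f:
          truth = json.load(f)          # e.g. {"img_001": "cat", ...}
      with open(user_submission_file) as f:
          predictions = json.load(f)    # same keys, predicted labels

      correct = sum(1 for k, label in truth.items() if predictions.get(k) == label)
      accuracy = correct / len(truth)

      # The leaderboard for this phase is populated from the returned metrics.
      return {"result": [{phase_codename: {"accuracy": round(accuracy, 4)}}]}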


Because Kaggle is owned by Google, so it's unpredictable how long it will exist.


Since 2017! I had no idea until now.

I guess that explains the access to computing power for all the notebooks.


What is closed about Kaggle that this does differently? I've always viewed that community as extremely open, even giving hosts the option to open-source solutions or not, thus attracting more challenges and competitors.

The about page [1] for this project says,

> With EvalAI, we want to standardize the process of evaluating different methods on a dataset

Standardization often does come from openness; I'm just not sure juxtaposing it against Kaggle is the right move.

[1] https://eval.ai/about


The web backend, web frontend, submission scoring service, and code execution service (Docker images). See also https://www.quora.com/What-is-the-technology-stack-used-in-K...


The service itself? Okay, so in theory anyone could host competitions with this backend.

The lion's share of Kaggle's effort lies in their expertise in (1) making deals, and (2) making sure the challenge is legit (no bias in training data, not too easy, etc.). That's not easily replicable, so while an "Open-Source alternative" sounds good up front, the bulk of the work remains.


Benchmarking SOTA solutions is an interesting differentiator from Kaggle. Kaggle's community and scale are unparalleled though, which is what actually attracts very talented and competitive data scientists to its challenges.

AIcrowd (https://www.aicrowd.com/) is interesting in that sense, because it hosts research challenges like the ones EvalAI does, and it also hosts more attractive Kaggle-like challenges. It's also the most modular platform for accommodating unorthodox evaluation and submission formats.


Is there a cheap way to run deep learning on the server but not in a Jupyter notebook?



