I would suggest that anyone considering entering a kaggle.com competition take a look at their Terms and Conditions (http://kaggle.com/Legals/terms). Note section 9, which appears to require the winner to assign copyright on the submission to the organizer.
Giving up all rights to your code has never seemed to me to be a good strategy for a developer (except possibly in the case of full time employment).
Submissions do not consist of code, merely predictions (at least for the competitions I'm familiar with). I don't think there is any way they could compel you to give up your code, but I'm no lawyer.
The winning entry has to be a general algorithm that can be implemented by the RTA. That means that it takes a timestamp as an input and can generate predictions for the next 15 mins, 30 mins, 45 mins, 60 mins etc. Results must also be replicated by the RTA before any prize money is awarded.
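To make the required shape concrete, here is a minimal sketch of that interface: a function that takes a cutoff timestamp and returns one prediction per horizon. The name `predict`, the `history` dictionary and the "repeat the latest observation" rule are all assumptions for illustration, not anything the RTA or Kaggle specifies.

```python
from datetime import datetime, timedelta

# Horizons named in the comment above; the real competition list may be longer.
HORIZONS_MIN = (15, 30, 45, 60)

def predict(now, history, horizons=HORIZONS_MIN):
    """Return {target_time: predicted_travel_time} for each horizon.

    `history` is a hypothetical dict mapping datetime -> observed travel
    time in seconds. The rule used here is naive persistence: repeat the
    latest observation made at or before `now`.
    """
    past = [t for t in history if t <= now]
    if not past:
        raise ValueError("no observations at or before the cutoff")
    latest = history[max(past)]
    return {now + timedelta(minutes=h): latest for h in horizons}
```

Any real entry would swap the persistence rule for an actual model, but the input and output would presumably look something like this.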
I'm not sure it's entirely useless. By comparing the traffic patterns before and after removing the tollbooth, you could probably quantify pretty accurately the influence it had on the traffic patterns, and then apply that to refine your models of other congestion points.
NSW has been suffering for the last decade from a totally clueless government, and an opposition totally owned by its fringe right-wing. Labor couldn't govern their way out of a paper bag and the Libs figure that instead of drifting to the centre to meet the voters, all they need to do is wait until the voters give up and move to them.
NSW doesn't have any money, so they can't do any major roads project without farming it out to private investors and putting up a toll booth to pay for it.
This has led to some really bad policy decisions. For one thing, most of the new roads are bypasses that allow drivers to skip congested areas like the middle of the city. The toll then discourages people from using the bypass and the city remains clogged. Then to bail out the investors, the government closes lanes on the free alternative routes to force people to take the tollway.
There was an amusing couple of years where you could get away with not paying tolls just by driving through the electronic gate and ignoring the infraction warnings. Because the RTA was doing all the work of collecting the money but not getting to keep any of it, they didn't care enough to follow up people who didn't pay.
I have no idea how accurate those times are. I always assumed they would just be basic approximations, but you may well be right that they could be more in depth.
When I lived there up to a couple of years ago, they were surprisingly accurate, even during fairly heavy traffic.
I can't recall the exact details, but from memory they use road counters at entries and exits to measure traffic numbers within sectors and, based on the theoretical maximum number of cars per sector, predict whether traffic is capable of flowing at the speed limit or at some calculated reduced speed for each sector. The speed calculations, along with each sector's distance, give you the predicted trip time to the various exits on the display.
As I said, don't quote me, but that was my understanding.
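A rough sketch of that sector calculation, under loud assumptions: the field names are made up, and the "reduced speed" rule (scale the limit by capacity divided by cars present) is a guess, since the description above doesn't say how the reduced speed is calculated.

```python
def sector_speed(cars_in_sector, capacity, speed_limit_kmh):
    """Speed limit if the sector is under capacity, otherwise a reduced speed.

    The reduction rule (limit * capacity / cars) is an assumption made
    for illustration only.
    """
    if cars_in_sector <= capacity:
        return speed_limit_kmh
    return speed_limit_kmh * capacity / cars_in_sector

def trip_time_minutes(sectors):
    """Sum distance / speed over the sectors between here and the exit."""
    total_hours = 0.0
    for s in sectors:
        speed = sector_speed(s["cars"], s["capacity"], s["limit_kmh"])
        total_hours += s["km"] / speed
    return total_hours * 60
```

In this picture the car count for a sector would come from the counters: cars counted in at the entries minus cars counted out at the exits.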
I'm a little bit sad to see that HN'ers are discussing the prize and money/time ratio instead of the challenge itself. What happened to the "hacker" in "Hacker News"?
I agree, money is a factor, but it doesn't have to decide your every move. This is an interesting challenge whether or not the prize offered is worth your time. Entrepreneurs are supposed to be a bunch of people who like solving interesting challenges, not a bunch of people who like solving interesting problems only if it makes them money.
I've fired up R and am doing some exploratory data analysis. I think it's an interesting problem and will definitely submit something if I can get decent predictions with the downloaded data.
I've looked at the literature, and research has tackled such problems using computer science (neural nets) and statistics (linear regression, Kalman filters). In my opinion, it's easier to approach it as a statistics problem than a computer science one, and this may explain why there is some hostility on HN. For instance, getting $10000 for a linear regression is easy money (although I would be surprised if the winning entry used this).
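For a sense of how little a linear regression baseline involves, here is a sketch that regresses the travel time `lag` steps ahead on the current travel time using ordinary least squares. It is in Python rather than R purely for illustration; the helper names `lagged_pairs` and `fit_line` are hypothetical, and nothing here is taken from the competition data.

```python
def lagged_pairs(series, lag):
    """Pair each observation with the one `lag` steps later."""
    return series[:-lag], series[lag:]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_ahead(current, intercept, slope):
    """Predicted travel time `lag` steps after `current`."""
    return intercept + slope * current
```

A separate line would be fitted per horizon (one lag for 15 minutes, another for 30, and so on), which is presumably where anything cleverer starts to pull ahead.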
That's fair enough, I guess. (I'm one of the people who commented about the money and IP factors.)
However, there are literally millions of problems like this that you could choose to solve for your own challenge. As I see it, this specific problem is interesting enough but not especially so compared to the total set of possible computer-oriented challenges. Except for the fact it's being offered up with competition, fame and fortune as a reward.
Also, the IP factor seems to imply you can't, say, release an open source project with your winning algorithm and let anyone go away and use it. It seems you have to assign it entirely to the NSW RTA when you're done. That's actually the part that grates on me the most; it seems a lot more "contractor ethic" than "hacker ethic".
I thought like this for much of my life until I realized how badly the system exploits those who create value in science and technology.
I suspect we live in a world where about 90% of the value is created by about 1-2% of the people who receive maybe 5-10% of the economic benefit. Personally I'm not very satisfied with that distribution but I don't see it improving unless we become less naive about economic/legal/political matters.
Unfortunately I don't have time to ponder this (explaining would be beyond the scope of a quick HN comment), but I think that using Markov chains/HMMs would be a nice way to attack this problem...
(Disclaimer: my area of expertise is not ML or AI, but I remembered this from one of the more interesting courses at university and always wanted to code something using it. If you know your way around these fields, I would love to read about other ideas/approaches...)
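As a rough illustration of the Markov chain half of that idea (a plain first-order chain, not an HMM), the sketch below assumes travel times have already been bucketed into a few traffic states. The state labels, the function names and the bucketing itself are all assumptions.

```python
from collections import Counter, defaultdict

def transition_probs(states):
    """Estimate P(next state | current state) by counting observed transitions."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in counts.items()
    }

def step(dist, probs):
    """Push a distribution over states one time step forward."""
    out = defaultdict(float)
    for s, p in dist.items():
        # Assumption: a state never seen as a source simply stays put.
        for t, q in probs.get(s, {s: 1.0}).items():
            out[t] += p * q
    return dict(out)

def forecast(start, probs, steps):
    """Distribution over states `steps` intervals after observing `start`."""
    dist = {start: 1.0}
    for _ in range(steps):
        dist = step(dist, probs)
    return dist
```

With, say, 3-minute observations, a 15-minute forecast would be `forecast(current_state, probs, 5)`, and the predicted travel time could be a probability-weighted average of each state's typical time.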
I've never looked at commute time data before, so I'm just an armchair analyst.
It seems to me though, as a driver, that commute times are dominated by accidents. And certainly while you could estimate the mean probability of an accident based on the number of travelers, the daily variance seems insanely high and it's unlikely that any amount of training data is going to decrease that much. I wonder how good these results are going to be.
Still, something could be done - for example, I know that the M4 heads west out of the city, and so is subject to poor visibility during afternoon peak due to sunset.
While the estimates for < 1 hour might be thrown off by an accident, you could get fairly accurate 1+ hour traffic estimates by accounting for accident probabilities and the time it takes for accidents to disperse.
I would assume they want to build a realtime system and can feed in accidents to change the prediction time for a web/mobile application and/or electronic boards on the motorway.
So in that case, you'd have to be able to input an accident at place x and have that reflect in the projected times.
This is slightly susceptible to result intervention - algorithm not doing too well? Drive a tractor slowly down the inside lane until reality matches your prediction!
The competition seems artificially restricted based on what it's trying to accomplish. Why limit yourself to past data? We have immense sources of data for events months before they actually happen. Why wouldn't you account for a cricket match/product release/concert that's scheduled to happen a month in the future?
"This competition requires participants to predict travel time on Sydney's M4 freeway from past travel time observations". This line seems to suggest that the past travel time is the most important part of the experiment; however, as another commenter (rrrhys) pointed out, the data is useless, since the road has changed, and the grandparent of this post mentioned sporting events affecting traffic.
All of that said, perhaps a strong model can be generated using just historical data.
Would you consider this kind of action as 'scraping the bottom of the barrel'? In the design community so-called 'competitions' are considered rorts and shunned by anyone worth their weight.
What is ironic is that the NSW govt. has probably spent 50 times the figure working with private companies attempting to develop an acceptable solution, and this stinks of a last-ditch effort.
There is a fundamental difference between working on a problem like this and doing design work. If you work on this problem you will probably advance our knowledge in some domain and can even apply the results to other problems (such as network routing). Look at the groups that worked on the Netflix Prize. Most of the leading teams had academics seriously working on them.
With a design competition you are basically doing work for free and the only one that benefits is the one giving the competition.
I'll say. There is a lot of work in this, and $10k is not going to attract many professionals. They're obviously out for the talented hobbyist or beginner.
Considering how much the RTA budget would be, it's almost insulting. Just having a policeman sit in a car by some roadworks (so-called 'specials') would cost them about $1000/shift.
The last time a new road project got opened near me the launch party would have cost nearly $10,000 by the time it was done.
The prize should be at least $50,000 or $100,000. It would be a tiny speck in the budget, but that would motivate people to spend some real time on the problem.
Take a look at some of the other projects on Kaggle - a lot of them have no money or roughly similar amounts. One of the main crowds Kaggle attracts is University students and staff, and they do these projects because they're interesting and fun. It's similar to a shared task at an academic conference, which is the sort of thing most of the active ML/KD&DM community are interested in.
Agreed. The Netflix Prize was $1M, and this doesn't seem like just 1% of the work. Not to mention Netflix only required a non-exclusive unrestricted license from the winning entry, rather than full reassignment of copyright (see tgflynn's comment).
Well Netflix also offered $50k each year for a "Progress Prize", but you're right.
I guess it is a question of "you get what you pay for", and I expect they'll get something quite usable but not great. Certainly good value for them, compared to contracting someone to do it.
I think you're mis-applying the example of design contests (which I agree are generally lame, but for a different reason). This is a chance for someone to do a public service and solve an interesting prediction problem. It's for the government and benefits commuters, not some for-profit corporation.
That said, I think this would be a more interesting contest if Sydney showed their current work on the problem and measured progress against it (a la Netflix).
Yeah. Probably 90% of the people who end up doing this competition would end up enjoying it anyway if there was no prize; the prize just provides the activation energy to actually sign up.
Not worth it. A better way to host a public contest is the way NYC is doing it with NYCBigApps. Take a look at their ToS (http://nycbigapps.com/rules) and note section D. Those terms look fair to me.