More

nicklo · on Aug 23, 2017

Check out http://reddit.com/r/cryptomarkets. Lots of good discussion (just ignore the HODL/moon trolls).

nicklo · on Aug 3, 2017

This article seems to be trying to disarm tech elitism by diving into the deep end of it, but by and large the humor is pretty forced and the tone condescending.

I'd also recommend that the author remove the attempt at being edgy at the end with the snide reference to the reports of sexual harassment at Uber. Especially the cheeky "too soon?", which simultaneously acknowledges the gravity of the situation for the victims while also dismissing it entirely in favor of an attempted joke.

imartin2k · on Aug 3, 2017

So it is not only that you don't find it funny (which is a common thing about jokes - many jokes are bad in the eyes of many people), but also that you want someone else to therefore remove this part.

Where does this idea come from that just because one doesn't like something or finds it stupid/silly/inappropriate/not funny, it has to disappear?

dreae · on Aug 3, 2017

Before you get too defensive consider that they said they'd 'recommend' the author remove the joke, not that it has to go. That's how constructive criticism works.

I think the comment can be better interpreted as 'the tone of the article could be improved, in my opinion, if the author would remove the joke about sexual harassment at the end'

nicklo · on June 13, 2017

Super exciting seeing on-par performance of RL tasks with dramatically less supervision.

Really looking forward to a follow-up where they explore 2.2.4 further. Sampling examples which provide maximal information game seems like it could result in another huge reduction in the amount of human oversight necessary. Could see an adversarial scheme which could learn to sample these examples optimally from the manifold. This kind of thing is powerful in human learning of complex tasks to ask for clarification or feedback in specific places of uncertainty.

nicklo · on Feb 2, 2017

Hey HN! One of the organizers here- let me know if you have any questions!

Omnipresent · on Feb 2, 2017

I'm very interested in these lectures and am looking forward to digging into it.

I was wondering if you could provide some feedback on whether deep learning would be useful in classifying images that have text or not. For example, looking at a set of images I wish to classify the ones that have text and the ones that don't have text. A dataset could be like this:

text: http://bit.ly/2k0IXPv http://imgur.com/SvyoEo9 http://imgur.com/mu7vHRa

no text: http://bit.ly/2kUt1wA http://bit.ly/2ku4eSh

Thanks for organizing this.

nicklo · on Feb 2, 2017

Yeah for sure - these images are pretty different in their composition so it should be pretty easy to classify them. How large is your dataset? Do you need to collect one?

With small amounts of data, transfer learning is the most effective approach. There's a great tutorial on retraining inception for your own categories in TensorFlow: https://www.tensorflow.org/how_tos/image_retraining/.

Omnipresent · on Feb 2, 2017

I don't have a dataset at the moment. I would have to build one. I was thinking of about 200 images in the dataset with 100 of text and 100 of non text. Would that be big enough dataset for transfer learning? Please let me know if there is a dataset you know of that I could leverage

I'll follow that tutorial. Thanks!

yogeshp · on Feb 2, 2017

Are the lectures going to be mathematically rigorous or they would be inclined towards applications/concepts.

ematvey · on Feb 2, 2017

Thank you for doing this openly. Will you be posting the recordings of project brainstorm sessions?

nicklo · on Nov 13, 2016

This problem has already been solved with decent success using deep learning.

See: https://homes.cs.washington.edu/~jxie/pdf/deep3d.pdf

mrfusion · on Nov 13, 2016

That's a good start. I was thinking you could generate unlimited training data by using a game engine. You'd have the actual 3D model for every single frame.

phorese · on Nov 13, 2016

Yup, the community is on it!

http://www.cv-foundation.org/openaccess/content_cvpr_2016/ht...

https://link.springer.com/chapter/10.1007/978-3-319-46475-6_...

And there's more every week... Blender, Unity Engine, Unreal Engine, you name it. (Disclaimer: am author on one of these papers)

davesque · on Nov 13, 2016

I'm aware that certain automotive companies are already doing this.

nicklo · on Oct 2, 2016

Would be super cool if someone applied deep generative models to synthesize new natural images from input drawings using this dataset.

xentripetal · on Oct 2, 2016

I believe one came out recently, I saw it on GitHub a few days ago. It mainly did landscape and searched for similar images then combined them to form a new one. It also supported image manipulation, eg turning a brown square purse into a green rounded one. I'll see if I can find it and edit the link in.

EDIT: https://github.com/junyanz/iGAN

Heres a good video showing it off https://www.youtube.com/watch?v=9c4z6YsBGQ0&feature=youtu.be

nicklo · on Sept 22, 2016

Deep image captioning is essentially what you are describing.

Check out:

http://cs.stanford.edu/people/karpathy/deepimagesent/

http://cs.stanford.edu/people/karpathy/densecap/

nicklo · on Sept 17, 2016

I use a plain pdf resume that is written in Latex. I host it on Overleaf so its always easy to update.

http://nicklocascio.com/resume.pdf

nicklo · on Sept 14, 2016

It absolutely is. DeepMind reported that 1 second of audio generation takes about 90 minutes to generate.

throwawaymsft · on Sept 14, 2016

Assuming it's computation bound, it's a factor of 5400 (~13 doublings in CPU power required to get to real-time, assuming no algorithmic improvements).

mdsteph · on Sept 15, 2016

If I'm not mistaken, it seems that the current limitation is that it needs to be produced sequentially for a dependent sequence of audio, perhaps some independent sentences can be run simultaneously using copies of the net assuming no memory limitations. I wonder if it's already possible to create an auidobook for instance in reasonable time.

mattnewton · on Sept 15, 2016

Do they mention it was CPU trained? I assumed GPU. If it was CPU trained, I wonder what the operations keeping it off the GPU were?

Houshalter · on Sept 15, 2016

Google has special neural net ASICs now.

ogrisel · on Sept 15, 2016

Google never stated they use those to train models as far as I know. It seems that they are primarily used to spare energy when deploying trained models at scale.

Houshalter · on Sept 15, 2016

Theres no reason they couldn't use them to train, as long as they can account for the lower precision operations. I think it would be much cheaper to train on them, at that scale anyway.

dharma1 · on Sept 15, 2016

Afaik the Google TPU does inference only, at 8 bits. I don't think it's possible to train a neural network at 8 bit precision at this point in time. FP16 works for training though, and is twice as fast as FP32 on certain nvidia chips

Houshalter · on Sept 15, 2016

Backpropagation can work with any precision, as long as you use stochastic rounding (so that the rounding errors are not correlated.) Without stochastic rounding even 16 bits will have rounding error bias.

http://arxiv.org/abs/1412.7024

dharma1 · on Sept 15, 2016

OK. I was going by this - https://petewarden.com/2016/05/03/how-to-quantize-neural-net...

I haven't seen 8bit training implemented in any (public) frameworks yet - that's not to say it's not possible. If it works then that's great, especially for specialised hardware.

jcannell · on Sept 15, 2016

That doesn't imply they can run WaveNet yet - for inference this net is sort of worst-case serial. Their TPU ASIC is almost certainly highly parallel, like a GPU - actually has to be that way for energy efficiency (which is it's claimed benefit).

Wavenet actually looks like it could possibly have been designed to run on CPUs in production, at least after they can further optimize it some. Sampling is super slow right now because it requires an enormous number of tiny dependent TF ops and thus kernels that have huge overhead for tiny amounts of work. A custom implementation could probably circumvent that by evaluating all the layers sequentially in local cache on a fast CPU.

Or they just designed it without much concern for production plausibility yet.

Houshalter · on Sept 16, 2016

I'm not sure how this algorithm is serial. The neural net layers still involve huge convolutions that can all be done in parallel.

jbpetersen · on Sept 15, 2016

Building an ASIC for it would be another option to speed things up on the computation side.

Itsdijital · on Sept 14, 2016

Was that in the paper? I was looking for a source for it last night but couldn't come up with it

nshm · on Sept 15, 2016

Why would a honest researcher mention downsides of his work in a paper. No, it was on twitter https://www.reddit.com/r/MachineLearning/comments/51sr9t/dee...

confluence · on Sept 15, 2016

https://news.ycombinator.com/item?id=12463263

Looks like the source deleted their tweet.

lallysingh · on Sept 15, 2016

Can we just use 90 cores?

rryan · on Sept 15, 2016

Unfortunately no, see Amdahl's Law.

https://en.wikipedia.org/wiki/Amdahl%27s_law

arcanus · on Sept 15, 2016

If we did, it is not likely that the strong scaling is perfect.

nicklo · on Aug 30, 2016

Super cool stuff in this paper.

At its heart, this is a new training architecture that allows parameter weights to be updated faster in a distributed setting.

The speed-up happens like so: instead of waiting for the full error gradient to propagate through the entire model, nodes can calculate the local gradient immediately and estimate the rest of it.

The full gradient does eventually get propagated, and it is used to fine-tune the estimator, which is a mini-neural net in itself.

Its amazing that this works, and the implications that full back-prop may not always be needed shakes up a lot of assumptions about training deep nets. This paper also continues the trend of this year of using neural nets as estimators/tools to improve the training of other neural nets. (I'm looking at you GANs).

Overall, excited to see where this goes as other researchers explore the possibilities when you throw the back-prop assumption out.