How to get started learning modern AI?
58 points by gofaifan on March 30, 2023 | 40 comments
Neural networks! Bah! If I wanted a black box design that I don't understand, I would make one! I want rules and symbolic processing that offers repeatable results and expected outcomes!

...and maybe there's still a place for that.

But for someone who has been, perhaps foolishly, ignoring neural-network-oriented AI, where's the best place to start learning?

Go back to the basics of recurrent neural networks, deep learning texts, and so forth? Or is there a shortcut into the hip and popular transformer-based technology at this point?




Here’s the thing about modern AI models: they are black boxes. Not in a negative way, they just are.

Logically, predictions are extremely simple: data (encoded in some way) goes in, answer (encoded in some other way) comes out. As far as training goes, the modern and “useful” models are so big that you cannot train them yourself anyway.

This simply means there are two very different ways to approach them:

-If you want to understand the internals, I absolutely suggest going the traditional way: start with linear algebra / vector spaces, then understand how MLPs work, then CNNs (by now I would skip RNNs), and finally transformers. Other important topics include latent spaces, embeddings, autoencoders, etc. (There's a tiny MLP sketch just below.)

This is almost academic knowledge though.
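
Still, a minimal PyTorch sketch makes the MLP step concrete; the layer sizes here are arbitrary placeholders, just to show that an MLP is nothing more than alternating affine maps and nonlinearities (which is where the linear algebra pays off):

    import torch
    from torch import nn

    # an MLP is alternating affine maps (nn.Linear) and nonlinearities;
    # the sizes (784 -> 128 -> 10) are arbitrary, e.g. MNIST-shaped
    class MLP(nn.Module):
        def __init__(self, d_in, d_hidden, d_out):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_in, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_out),
            )

        def forward(self, x):
            return self.net(x)

    print(MLP(784, 128, 10)(torch.randn(1, 784)).shape)  # torch.Size([1, 10])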

-On the other hand, if you want to play with them, really all you need is: first learn Python (if you don’t know it already) and one deep learning library (probably PyTorch). Then go to Hugging Face and download some models. You pretty quickly get a feeling for the common formats for models and data, and you can start putting them together.
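
A minimal sketch of that workflow, assuming the transformers library is installed; the model name is just one example checkpoint you can pull from the Hub:

    # pip install transformers torch
    from transformers import pipeline

    # pipeline() bundles tokenizer + model + pre/post-processing in one call;
    # the model name is an example checkpoint from the Hugging Face Hub
    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    print(classifier("Neural networks! Bah!"))
    # -> something like [{'label': 'NEGATIVE', 'score': 0.99}]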

Even many of the popular AI papers today are creative ways of plugging the output of one model into the input (or training objective) of another.

This is my 2 cents anyway!


I hear nothing but good things about Andrej Karpathy's videos and courses. I plan on working my way through this playlist when I've got some spare time:

https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...


I started on this about 24h ago and he has the perfect teaching style:

* Gets you excited about what you are going to learn through vivid descriptions

* Makes it accessible to anyone who can code and do high-school calculus.

* Makes good decisions about where to go deep and where to gloss over

* Lots of practice of the same concepts

* Pause and think opportunities (although arguably there should be more)

* Charisma

* Always relates things to the real world, so you know why you are learning them

* Uses the right tools, so you are focused on learning the topic, not on setup

* Follow-up exercises are available; they are challenging but completable.

* Discord community

It is very good!

I was going to do fast.ai, but now that's second in my queue. This one is addictive!


Great resource. Wish more CS-related educational content was made in a similar way.


https://d2l.ai (Zhang et al) is a very nice book with the right amount of theory and tons of code.

If you want a bit more than just DL, and better foundations, https://probml.github.io/pml-book (Murphy) is the place to begin.

Personally, I think a broader perspective, such as the one Murphy offers, is the way to go.


Anti-recommend Murphy's book as a place for anyone to begin. I can barely understand what it's trying to say about the topics I already know.


depends on what you think you know and what you want to learn. murphy focuses on a bayesian perspective, which is more useful from a theoretical pov. other books take a frequentist perspective, which is the dominant view in practice, unless you are using something like stan. think of it like theoretical vs experimental physics


What do you think of Murphy's approach in practice/engineering?


what do you mean?


You said Murphy is more Bayesian, so I was wondering: does the Bayesian approach work in practice?


oh it's certainly used in practice. you should look into frameworks like Stan[1] and pyro[2]. i think bayesian models are seen as more explainable, so they get used in industries that value that sort of thing

[1] https://mc-stan.org/

[2] https://pyro.ai/
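
to make that concrete, here is a minimal Pyro sketch of Bayesian linear regression; the priors and data are purely illustrative:

    # pip install pyro-ppl
    import torch
    import pyro
    import pyro.distributions as dist
    from pyro.infer import MCMC, NUTS

    def model(x, y=None):
        # explicit priors over slope and intercept are what make the model
        # "explainable": you get a full posterior, not just a point estimate
        w = pyro.sample("w", dist.Normal(0., 1.))
        b = pyro.sample("b", dist.Normal(0., 1.))
        with pyro.plate("data", len(x)):
            pyro.sample("obs", dist.Normal(w * x + b, 0.1), obs=y)

    # synthetic data with true slope 2.0 and intercept 0.5
    x = torch.linspace(0, 1, 50)
    y = 2.0 * x + 0.5 + 0.1 * torch.randn(50)

    mcmc = MCMC(NUTS(model), num_samples=500, warmup_steps=200)
    mcmc.run(x, y)
    print(mcmc.get_samples()["w"].mean())  # posterior mean of the slope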


Or when you have just moderate amounts of data.


What would be your recommendations?


Try Deisenroth et al.: https://mml-book.github.io


It seems quite simple… maybe a little too easy compared to “Understanding Machine Learning”?


it's a good foundational book. don't underestimate foundations


Thank you. I also learned that Caltech has a course and book called “Learning from Data” which also seems to focus on foundations.


I'm a math graduate student, so some of the earlier chapters are relatively easy for me, hence the comment.


yep i was in the same boat. and one of the things that would have saved me a lot of time was not being dismissive of the fundamental chapters. you will go through them quickly, but they might contain one or more insights that will be new, or a useful refresher. also, theoretical ml is rather math heavy (even for math graduates), so the word 'fundamentals' might be a bit deceiving


Yeah I agree. To be honest, I'm more into theoretical ML. Do you have any other similar recommendations?


maybe have a look at the courses released by the University of Tübingen

https://www.youtube.com/channel/UCupmCsCA5CFXmm31PkUhEbA?app...


Thank you for the suggestion.


Those both look really good, and cost-effective for those of us just wanting to explore. Thank you.


Murphy is really long…


You don't need to cover it all! Just the foundations.

The companion code is mainly in JAX, which is quite good for learning.


I’m still going through Dive Into Deep Learning. It’s long, but it seems to have a good balance between theory and code.


Some older books that are good reading:

https://www.amazon.com/Networks-Recognition-Advanced-Econome...

https://machinelearningmastery.com/neural-networks-tricks-of...

There is something to be said for downloading models from Hugging Face and just going from there. The fact is you are never going to train a foundation model, but you can do useful tasks with one in minutes, and if the one you use isn't good for your application, try another one. See

https://sbert.net/

particularly the "usage" example that is right there. If you have 1,000-10,000 short texts and put them through k-means clustering

https://scikit-learn.org/stable/modules/clustering.html#k-me...

your jaw will drop, or at least mine did. For years I had done clustering with bag-of-words, LDA, and methods like that; when I applied this to my RSS feed, all the sports articles ended up in one place, the Ukraine articles in another, deep learning here, and reinforcement learning there... in like 30 minutes of work, and faster than my LDA clustering engine. With DBSCAN clustering I get all the articles about the same news event clustered together... It's just amazing.
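
A minimal version of that pipeline, assuming sentence-transformers and scikit-learn are installed; the model name is the small general-purpose one the sbert docs commonly use, and the texts and cluster count are placeholders:

    # pip install sentence-transformers scikit-learn
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    texts = ["nba finals recap", "kyiv under fire", "new transformer paper"]  # your 1,000-10,000 short texts

    # encode each text into a dense embedding vector
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(texts)

    # cluster the embeddings; semantically similar texts land together
    labels = KMeans(n_clusters=3).fit_predict(embeddings)
    for label, text in sorted(zip(labels, texts)):
        print(label, text)

Swapping KMeans for sklearn's DBSCAN is what gives the per-news-event grouping mentioned above.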


grab a gpu and just start building and tinkering with pytorch.

find a project to work on.

make sure that whenever your pc would otherwise be idle, it is training something.

comma.ai has two good projects on their github, the speed and calibration challenges. there are many others around.
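
if it helps to see the shape of "always training something", here is a bare-bones pytorch loop; the data and model are toy placeholders to swap out for your own:

    import torch
    from torch import nn

    # toy data and model, just to show the shape of a training loop
    x = torch.randn(1024, 10)
    y = torch.randn(1024, 1)
    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

    # move everything to the gpu if one is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, x, y = model.to(device), x.to(device), y.to(device)

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for step in range(1000):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(step, loss.item())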


I’m thinking about getting something cheap like a 1080 Ti or 2080 Ti. However, I have an Intel MacBook Pro that’s 12 years old, so it definitely cannot support an Nvidia GPU.

I’m currently learning on Google Colab. I’m considering getting a second-hand ThinkPad laptop, but would running a GPU locally use a lot of energy, and thus electricity?


a second-hand gaming pc would be better yet. work on your macbook and ssh to the pc for training.

24/7 gpu usage does use energy. whether or not it is a lot depends on your current energy usage.
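
as a rough worked example: a 1080 Ti / 2080 Ti class card draws about 250 W under load, so 0.25 kW x 24 h = 6 kWh per day. at, say, $0.15/kWh (check your local rate) that is roughly $0.90 a day, or about $27 a month of continuous training.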


I’m still a grad student in math, so I cannot really buy a gaming PC.


If you don't have space for it, that's one thing, but PCs are actually cheaper than laptops on most metrics, and can be assembled piece by piece. GPU prices are insane nowadays, but you can find used cards from mining operations for pretty low prices. A Ryzen a few generations old is still way faster than any laptop CPU, RAM and SSDs are cheap as hell, and a PSU and case can be found for reasonable prices if you shop around.


On value for money, it's hard to beat Colab Pro if you're just learning.


What do you think of Lambda Labs or Paperspace compared to it?


Any suggestions on how to turn this second PC into a remote thing? (i.e. code on one device, and this second PC just runs training)


Roughly: change code locally, sync the project over, then kick off training remotely. Something like this, assuming an ssh host alias "pc" in your ~/.ssh/config:

    # edit code locally, then:
    rsync -ahvc "$(pwd)" pc: && ssh pc "cd $(basename "$(pwd)") && python train.py"


> Go back to the basics of recurrent neural networks, deep learning texts, and so forth? Or is there a shortcut into the hip and popular transformer-based technology at this point?

Sure there is, if you want a black box design that you don't understand.


You can take nanoGPT and work backwards from it, googling all concepts and papers. A book on PyTorch would be fine, too.


This gave me the theoretical foundation I needed to grok most of the recs on HN: the Grokking Machine Learning book.


What do you think of it compared to “Understanding Machine Learning” and Andrew Ng’s course?




