How to get started learning modern AI?
58 points by gofaifan on March 30, 2023 | 40 comments
Neural networks! Bah! If I wanted a black box design that I don't understand, I would make one! I want rules and symbolic processing that offers repeatable results and expected outcomes!

...and maybe there's still a place for that.

But for someone who has been, perhaps foolishly, ignoring neural-network-oriented AI, where's the best place to start learning?

Go back to the basics of recurrent neural networks, deep learning texts, and so forth? Or is there a shortcut into the hip and popular transformer-based technology at this point?




Here’s the thing about modern AI models: they are black boxes. Not in a negative way, they just are.

Logically, predictions are extremely simple: data (encoded in some way) goes in, answer (encoded in some other way) comes out. As far as training goes, the modern and “useful” models are so big that you cannot train them yourself anyway.

This simply means there are two very different ways to approach them:

-If you want to understand the internals, I absolutely suggest going the traditional way: start with linear algebra / vector spaces, then understand how MLPs work, then CNNs (by now I would skip RNNs), and finally transformers. Other important topics include latent spaces, embeddings, autoencoders, etc. (There's a tiny MLP sketch just below.)

This is almost academic knowledge though.
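
Still, a minimal PyTorch sketch makes the MLP step concrete; the layer sizes here are arbitrary placeholders, just to show that an MLP is nothing more than alternating affine maps and nonlinearities (which is where the linear algebra pays off):

    import torch
    from torch import nn

    # an MLP is alternating affine maps (nn.Linear) and nonlinearities;
    # the sizes (784 -> 128 -> 10) are arbitrary, e.g. MNIST-shaped
    class MLP(nn.Module):
        def __init__(self, d_in, d_hidden, d_out):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_in, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_out),
            )

        def forward(self, x):
            return self.net(x)

    print(MLP(784, 128, 10)(torch.randn(1, 784)).shape)  # torch.Size([1, 10])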

-On the other hand, if you want to play with them, really all you need is: first learn Python (if you don’t know it already) and one deep learning library (probably PyTorch). Then go to Hugging Face and download some models. You pretty quickly get a feeling for the common formats for models and data, and you can start putting them together.
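
A minimal sketch of that workflow, assuming the transformers library is installed; the model name is just one example checkpoint you can pull from the Hub:

    # pip install transformers torch
    from transformers import pipeline

    # pipeline() bundles tokenizer + model + pre/post-processing in one call;
    # the model name is an example checkpoint from the Hugging Face Hub
    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    print(classifier("Neural networks! Bah!"))
    # -> something like [{'label': 'NEGATIVE', 'score': 0.99}]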

Even many of the popular AI papers today are creative ways of plugging the output of one model into the input (or training objective) of another.

This is my 2 cents anyway!


I hear nothing but good things about Andrej Karpathy's videos and courses. I plan on working my way through this playlist when I've got some spare time:

https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...


I started on this about 24h ago and he has the perfect teaching style:

* Gets you excited about what you are going to learn through vivid descriptions

* Makes it accessible to anyone who can code and do high-school calculus.

* Makes good decisions about where to go deep and where to gloss over

* Lots of practice of the same concepts

* Pause and think opportunities (although arguably there should be more)

* Charisma

* Always relates things to the real world, so you know why you are learning them

* Uses the right tools, so you are focused on learning the topic, not on setup

* Follow-up exercises are available; they are challenging but completable.

* Discord community

It is very good!

I was going to do fast.ai, but now that's second in my queue. This one is addictive!


Great resource. Wish more CS-related educational content was made in a similar way.


https://d2l.ai (Zhang et al) is a very nice book with the right amount of theory and tons of code.

If you want a bit more than just DL, and better foundations, https://probml.github.io/pml-book (Murphy) is the place to begin.

Personally, I think a broader perspective, such as the one Murphy offers, is the way to go.


Anti-recommend Murphy's book as a place for anyone to begin. I can barely understand what it's trying to say about the topics I already know.


depends on what you think you know and what you want to learn. murphy focuses on a bayesian perspective, which is more useful from a theoretical pov. other books take a frequentist perspective, which is the dominant view in practice, unless you are using something like stan. think of it like theoretical vs experimental physics


What do you think of Murphy's approach in practice/engineering?


what do you mean?


You said Murphy is more Bayesian, so I was wondering: does the Bayesian approach work in practice?


oh it's certainly used in practice. you should look into frameworks like Stan[1] and pyro[2]. i think bayesian models are seen as more explainable, so they get used in industries that value that sort of thing

[1] https://mc-stan.org/

[2] https://pyro.ai/
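
to make that concrete, here is a minimal Pyro sketch of Bayesian linear regression; the priors and data are purely illustrative:

    # pip install pyro-ppl
    import torch
    import pyro
    import pyro.distributions as dist
    from pyro.infer import MCMC, NUTS

    def model(x, y=None):
        # explicit priors over slope and intercept are what make the model
        # "explainable": you get a full posterior, not just a point estimate
        w = pyro.sample("w", dist.Normal(0., 1.))
        b = pyro.sample("b", dist.Normal(0., 1.))
        with pyro.plate("data", len(x)):
            pyro.sample("obs", dist.Normal(w * x + b, 0.1), obs=y)

    # synthetic data with true slope 2.0 and intercept 0.5
    x = torch.linspace(0, 1, 50)
    y = 2.0 * x + 0.5 + 0.1 * torch.randn(50)

    mcmc = MCMC(NUTS(model), num_samples=500, warmup_steps=200)
    mcmc.run(x, y)
    print(mcmc.get_samples()["w"].mean())  # posterior mean of the slope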


Or when you have just moderate amounts of data.


What would be your recommendations?


Try Deisenroth et al.: https://mml-book.github.io


It seems quite simple… maybe a little too easy compared to “Understanding Machine Learning”?


it's a good foundational book. don't underestimate foundations


Thank you. I also learned that Caltech has a course and book called “Learning from Data” which also seems to focus on foundations.


I'm a math graduate student, so some of the earlier chapters are relatively easy for me, hence the comment.


yep i was in the same boat. and one of the things that would have saved me a lot of time was not being dismissive of the fundamental chapters. you will go through them quickly, but they might contain one or more insights that will be new, or a useful refresher. also, theoretical ml is rather math heavy (even for math graduates), so the word 'fundamentals' might be a bit deceiving


Yeah I agree. To be honest, I'm more into theoretical ML. Do you have any other similar recommendations?


maybe have a look at the courses released by the University of Tübingen

https://www.youtube.com/channel/UCupmCsCA5CFXmm31PkUhEbA?app...


Thank you for the suggestion.


Those both look really good, and cost-effective for those of us just wanting to explore. Thank you.


Murphy is really long…


You don't need to cover it all! Just the foundations.

The companion code is mainly in JAX, which is quite good for learning.


I’m still going through Dive Into Deep Learning. It’s long, but it seems to have a good balance between theory and code.


Some older books that are good reading:

https://www.amazon.com/Networks-Recognition-Advanced-Econome...

https://machinelearningmastery.com/neural-networks-tricks-of...

There is something to be said for downloading models from Hugging Face and just going from there. The fact is you are never going to train a foundation model, but you can do useful tasks with one in minutes, and if the one you use isn't good for your application, try another one. See

https://sbert.net/

particularly the "usage" example that is right there. If you have 1,000-10,000 short texts and put them through k-means clustering

https://scikit-learn.org/stable/modules/clustering.html#k-me...

your jaw will drop, or at least mine did. For years I had done clustering with bag-of-words, LDA, and methods like that; when I applied this to my RSS feed, all the sports articles ended up in one place, the Ukraine articles in another, deep learning here, and reinforcement learning there... in like 30 minutes of work, and faster than my LDA clustering engine. With DBSCAN clustering I get all the articles about the same news event clustered together... It's just amazing.
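
A minimal version of that pipeline, assuming sentence-transformers and scikit-learn are installed; the model name is the small general-purpose one the sbert docs commonly use, and the texts and cluster count are placeholders:

    # pip install sentence-transformers scikit-learn
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    texts = ["nba finals recap", "kyiv under fire", "new transformer paper"]  # your 1,000-10,000 short texts

    # encode each text into a dense embedding vector
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(texts)

    # cluster the embeddings; semantically similar texts land together
    labels = KMeans(n_clusters=3).fit_predict(embeddings)
    for label, text in sorted(zip(labels, texts)):
        print(label, text)

Swapping KMeans for sklearn's DBSCAN is what gives the per-news-event grouping mentioned above.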


grab a gpu and just start building and tinkering with pytorch.

find a project to work on.

make sure that whenever your pc would otherwise be idle, it is training something.

comma.ai has two good projects on their github, the speed and calibration challenges. there are many others around.
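
if it helps to see the shape of "always training something", here is a bare-bones pytorch loop; the data and model are toy placeholders to swap out for your own:

    import torch
    from torch import nn

    # toy data and model, just to show the shape of a training loop
    x = torch.randn(1024, 10)
    y = torch.randn(1024, 1)
    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

    # move everything to the gpu if one is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, x, y = model.to(device), x.to(device), y.to(device)

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for step in range(1000):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(step, loss.item())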


I’m thinking about getting something cheap like a 1080 Ti or 2080 Ti. However, I have an Intel MacBook Pro that’s 12 years old, so it definitely cannot support an Nvidia GPU.

I’m currently learning on Google Colab. I’m considering getting a second-hand ThinkPad laptop, but would running a GPU locally use a lot of energy, and thus electricity?


a second-hand gaming pc would be better yet. work on your macbook and ssh to the pc for training.

24/7 gpu usage does use energy. whether or not it is a lot depends on your current energy usage.
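
as a rough worked example: a 1080 Ti / 2080 Ti class card draws about 250 W under load, so 0.25 kW x 24 h = 6 kWh per day. at, say, $0.15/kWh (check your local rate) that is roughly $0.90 a day, or about $27 a month of continuous training.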


I’m still a grad student in math, so I cannot really buy a gaming PC.


If you don't have space for it, that's one thing, but PCs are actually cheaper than laptops on most metrics, and can be assembled piece by piece. GPU prices are insane nowadays, but you can find used cards from mining operations for pretty low prices. A Ryzen a few generations old is still way faster than any laptop CPU, RAM and SSDs are cheap as hell, and a PSU and case can be found for reasonable prices if you shop around.


On value for money, it's hard to beat Colab Pro if you're just learning.


What do you think of Lambda Labs or Paperspace compared to it?


Any suggestions on how to turn this second PC into a remote thing? (i.e. code on one device, and this second PC just runs training)


Roughly: change code locally, sync the project over, then kick off training remotely. Something like this, assuming an ssh host alias "pc" in your ~/.ssh/config:

    # edit code locally, then:
    rsync -ahvc "$(pwd)" pc: && ssh pc "cd $(basename "$(pwd)") && python train.py"


> Go back to the basics of recurrent neural networks, deep learning texts, and so forth? Or is there a shortcut into the hip and popular transformer-based technology at this point?

Sure there is, if you want a black box design that you don't understand.


You can take nanoGPT and work backwards from it, googling all concepts and papers. A book on PyTorch would be fine, too.


This gave me the theoretical foundation I needed to grok most of the recs on HN: the Grokking Machine Learning book.


What do you think of it compared to “Understanding Machine Learning” and Andrew Ng’s course?




