Ask HN: What are the foundational texts for learning about AI/ML/NN?
285 points by mfrieswyk on Jan 9, 2023 | 113 comments
I've picked up the following, just wondering what everyone's thoughts are on the best books for a strong foundation:

Pattern Recognition and Machine Learning - Bishop

Deep Learning - Goodfellow, Bengio, Courville

Neural Smithing - Reed, Marks

Neural Networks - Haykin

Artificial Intelligence - Haugeland




"Introduction to Statistical Learning" - https://www.statlearning.com/

(there's also "Elements of Statistical Learning" which is a more advanced version)

AI: A Modern Approach - https://aima.cs.berkeley.edu/


ISL is a legit good book. It has the right amount and balance of rigor and application.

The explanation, examples, projects, math- all are crisp.

As the name suggests, it is only an introduction (unlike CLRS). And it does serve as a great beginners' book, giving you a proper foundation for the things you will learn and apply in the future.

One thing people complain about is that it's written in R, but no serious hacker should fear R: it can be picked up in 30 minutes, and you can implement the ideas in Python.

As someone with industry experience in Deep Learning, I will recommend this book.

The ML course by Andrew Ng has no parallel, though. One must try and do that course. Not sure about the current iteration, but the classic one (w/ Octave/MATLAB) was really great.


The Elements of Statistical Learning, by Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie. I’ve seen it referenced quite a few times and the TOC looks good.


This was one of the first books my advisor told me to read when I started my ML phd a...long time ago. The fundamentals of machine learning haven't changed and it's a great book.


May I know what your dissertation was? :)


I agree. I read the first edition of Intro to Statistical Learning and it went into just the right level of mathematical depth. The authors also have YouTube lectures that accompany the chapters, and these are a great reinforcement of the material.


Do you have a link to the YouTube lectures? I'm taking a course and this is one of the books that we're using


It's linked off Trevor Hastie's webpage; see here:

https://www.youtube.com/playlist?list=PLoROMvodv4rOzrYsAxzQy...


Thank you, I appreciate it


Nice, I didn't realise they'd released a 2nd edition of this book, and a new website too! Thanks for sharing.


Haugeland is GOFAI/cognitive science, not directly relevant to the modern machine-learning variety of models unless you are doing reinforcement learning or tree-search stuff (hey, poker/chess/Go bots are pretty cool!). Russell and Norvig is the typical introductory textbook for those topics. Reed & Marks and Haykin are both severely out of date (they have solid content, but they don't cover the scale of modern deep learning, which has many emergent properties).

You are approaching this like an established natural sciences field where old classics = good. This is not true for ML. ML is developing and evolving quickly.

I suggest taking a look at Kevin Murphy's series for the foundational knowledge. Sutton and Barto for reinforcement learning. Mackay's learning algorithms and information theory book is also excellent.

Kochenderfer's ML series is also excellent if you like control theory and cybernetics

https://algorithmsbook.com/ https://mitpress.mit.edu/9780262039420/algorithms-for-optimi... https://mitpress.mit.edu/9780262029254/decision-making-under...

For applied deep learning texts beyond the basics, I recommend picking up some books/review papers on LLMs, Transformers, GANs. For classic NLP, Jurafsky is the go-to.

Seminal deep learning papers: https://github.com/anubhavshrimal/Machine-Learning-Research-...

Data engineering/science: https://github.com/eugeneyan/applied-ml

For speculation: https://en.m.wikipedia.org/wiki/Possible_Minds


A quick point about the "tree stuff" and Norvig & Russell:

While it does cover minimax trees, alpha-beta pruning, etc., it really only provides a very brief overview. The book is more of a survey of the AI/ML fields as a whole. Game-playing AI is dense with game-specific heuristics that the book scarcely mentions.

Not sure about books, but the best resource I've found on at least chess AI is chessprogramming.org, then just ingesting the papers from the field.


To your second point, I have a sneaking suspicion that whatever is recommended in this very thread will suddenly jump in estimation as a “classic.” History is made up as it goes along!


Well, GP's Neural Smithing is a solid example. There is nothing wrong with it; it is surprisingly well written and correct for something published before the millennium.

https://books.google.com/books/about/Neural_Smithing.html?id...

Take a look at the Google Books preview (click "view sample"). The basics are all there: an intro to the biological history of neural networks, backpropagation, gradient descent, partial derivatives, etc. It even hints at teacher-student methods!

The only issue is that it missed out on two decades of hardware development (and a bag of other optimization tricks); modern deep learning implementations require machine sympathy at scale. It also doesn't cover autoregressive networks like RNNs or image-processing tricks like CNNs.


Does the order matter for Kochenderfer? Any one of those put more emphasis on controls than the others?


Appreciate the comment very much. I feel like I need to build a foundational context in order to appreciate the significance of the latest developments, but I agree that most of what I posted doesn't represent the state of the art.


There are none anymore. We now know that throwing a bunch of bits into the linear algebra meat grinder gets you endless high quality art and decent linguistic functionality. The architecture of these systems takes maybe a week to deeply understand, or maybe a month for a beginner. That's really it. Everything else is obsolete or no longer applicable unless you're interested in theoretical research on alternatives to the current paradigm.


You are plainly exaggerating. You can't do all of this in a few weeks.

    Algorithms:
    Lin Reg -> Log Reg -> NN -> CNN + RNN -> GANs + Transformers -> ViT -> Multimodal AI + LLMs + Diffusion + Auto Encoders

    SVM, PCA, kNN, k-means clustering, etc.

    LightGBM, XGboost, Catboost, etc.

    Optimization and optimizers.

    Application-wise:
    Classification, Semantic Segmentation, Pose Estimation, Text Generation, Summarization, NER, Image Generation, Captioning, Sequence Generation (like music/speech), text to speech, speech to text, recommender systems, sentiment analysis, tabular data, etc.

    Frameworks:
    pandas, sklearn, PyTorch, Jax -> training, inference, data loading

    Platforms:
    AWS + GCP + Azure
    And a lot of GPU shenanigans + framework/platform specific quirks
All of these will take you around 2 years, or 1.5 years at the very least,

given that:

- you already know Python/any programming language properly

- you already know college-level math (many people say you don't need it, but I haven't met a single soul in ML research/modelling without it)

- you know Stats 101 at the level of a good uni curriculum, and have the ability to learn beyond it

- you know git, docker, cli, etc.

Every influencer and their mother promising to teach you Data Science in 30 days is plainly lying.

Edit: I see that I left out Deep RL. Let's keep it that way for now.

Edit2: Added tree-based methods. These are very important; XGBoost outperforms NNs every time on tabular data. I also once used an RF head appended to a DNN for the final prediction. Added optimizers.


> SVM, PCA, kNN, k-means clustering

Are these still relevant in the age of Deep Neural Networks?


Yes, there are all kinds of tasks where the appropriate solution is to use a DNN for much of the learning (either directly learning the correlations, or via transfer learning from some large-data self-supervised task) and then, once you have the results of that DNN inference, work with these methods: apply PCA to interpret the resulting vector, or to separate out specific dimensions and expose them for adjustment in some generative task; or perhaps the best final decision is a kNN on top of the DNN output, etc.
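
For concreteness, here is a minimal sketch of that pattern with scikit-learn; the embeddings array is random placeholder data standing in for real DNN inference outputs, and the labels are synthetic, so treat it as an illustration only:

    # Sketch: DNN embeddings -> PCA for interpretation/compression -> kNN for the final decision.
    # "embeddings" is random placeholder data standing in for real DNN outputs.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(500, 256))   # stand-in for DNN inference outputs
    labels = rng.integers(0, 3, size=500)      # stand-in task labels

    # PCA to interpret / compress the learned representation
    pca = PCA(n_components=10)
    reduced = pca.fit_transform(embeddings)
    print("variance explained:", pca.explained_variance_ratio_.sum())

    # kNN on top of the (reduced) DNN output for the final decision
    knn = KNeighborsClassifier(n_neighbors=5).fit(reduced, labels)
    print("train accuracy:", knn.score(reduced, labels))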


It's not in your list, but decision trees still outperform DNNs on many tabular problems and can be trained faster.
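
A quick, hedged way to sanity-check that kind of claim on your own tabular data is to cross-validate a tree-based model against a small neural net; this sketch uses synthetic scikit-learn data, so the exact numbers are only illustrative:

    # Cross-validate a tree ensemble against a small MLP on a synthetic tabular task.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)

    for name, model in [("random forest", RandomForestClassifier(random_state=0)),
                        ("small MLP", MLPClassifier(hidden_layer_sizes=(64, 64),
                                                    max_iter=500, random_state=0))]:
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")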


Also boosting.

But yes these algs are the basis of a lot of more modern algorithms.

A deep NN won't do unsupervised clustering, for example, and NNs perform worse than simpler models on small datasets.


Yes.

Different problems require different solutions.

Sometimes, an NN would be overkill.

And stakeholders in many situations would like insight into why the prediction is what it is. NNs are miles behind LogReg in terms of interpretability.


PCA is a foundational dimension reduction technique, and kNN can be used in conjunction with embeddings.

k-means is still great when you have prior/domain knowledge about the number of groups.


K-means is pretty poor when the clusters are not linearly separable, but it is the basis of a lot of more modern clustering techniques (kernel K-means if you have prior knowledge, spectral clustering, ...).
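
A small illustration of that point on the classic two-moons toy data (synthetic, purely for illustration): plain k-means splits the moons badly, while spectral clustering recovers them:

    # k-means vs spectral clustering on non-linearly-separable clusters.
    from sklearn.datasets import make_moons
    from sklearn.cluster import KMeans, SpectralClustering
    from sklearn.metrics import adjusted_rand_score

    X, y = make_moons(n_samples=500, noise=0.05, random_state=0)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(X)

    print("k-means ARI: ", adjusted_rand_score(y, km))   # low agreement with the true moons
    print("spectral ARI:", adjusted_rand_score(y, sc))   # typically near-perfect agreement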


A month to deeply understand?

I've been doing it since early 2019 and there are still subtleties that catch me off guard. Get back to me when you're not surprised that you can get rid of biases from many layers without harming training.

I broadly agree with you, but the timeline was just a little too aggressive. By about 10x. :)


This is separate from understanding how a language model or transformer works. You could read the major papers behind those ideas and read every line of code involved several times over in a month. I'd recommend it, if you're super curious.

You can figure out the bias thing after about a month (or so) of hands on practice. Do one Kaggle seriously and it'll become pretty clear, pretty quickly.


> I've been doing it since early 2019 and there are still subtleties that catch me off guard.

That's true of every non-trivial discipline. I often learn subtleties about programming languages and hobbies I've been dealing with for decades.


This is definitely a take that, on the one hand, ignores the massive amount of utility ML has outside of generative images and NLP, and on the other vastly misrepresents the time it takes to understand a model, assuming one does not already have a background in CS, linear algebra (in particular matrix calculus), probability, stats, etc.


You still need to understand some basic theory/math about probabilistic inference (along with some knowledge of linear algebra), or else you’ll get a bit overwhelmed by some of the equations and not understand what the papers are talking about. PRML by Bishop is probably more than enough to start reading ML papers comfortably though. (This would probably be too easy for a competent math major, but not all of us are trained that way from the beginning…)


I'm not sure why you're getting downvoted. I find it hard to believe that someone without a decently strong math background could make sense of a modern paper on deep learning. I have a math minor from a good school and had to brush up on some topics before papers started making sense to me.


What resources are there to understand in a month?


I personally consider linear algebra to be foundational for AI/ML: Introduction to Linear Algebra, by Gilbert Strang. His free course on MIT OCW is fantastic too.

While having a strong mathematical foundation is useful, I think developing intuition is even more important. For this, I recommend Andrew Ng's Coursera courses first, before you dive too deep.


Another interesting resource for Linear Algebra is the "Coding the Matrix" course.

http://codingthematrix.com/

https://www.youtube.com/playlist?list=PLEhMEyM9jSinRHXJgRCOL...


Strang is great, but he covers a lot of things that don't have much carryover to AI/ML and doesn't really cover things like Jacobians, which do. Maybe there's something more useful than what Strang teaches for someone who is only learning calculus and linear algebra for AI/ML.



Linear algebra, differential calculus (which needs linear algebra), and a bit of optimisation (at least get an understanding of SGD).

Also probability/statistics! Without those you can end up doing things pretty wrong.


I never took beyond Precalculus in school, thanks for the tip!


Many of the suggestions so far assume you have taken undergraduate linear algebra and calculus. I'd start with those two subjects; you really can't build a foundational understanding of modern AI techniques without them.


I did linear algebra and calculus using the Strang and Spivak textbooks. Those were the classes I enjoyed the most. But most of that stuff has atrophied from my brain over the years; do you recommend redoing those courses quickly, or can I learn what I need on an on-demand basis?


You can try a refresher on Jacobians. If you're following everything there well enough, you probably have what you need to move forward (and pick up the rusty parts that you need as you go). If you're completely lost then you probably want to go back for a quick refresher.


Review on an on demand basis.

The main concepts are matrix multiplication and derivatives and their significance. Then you can dig into the specifics and review or expand your knowledge as needed.
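
As a toy illustration of that "matrix multiplication plus derivatives" core, here is a sketch (random W and x, purely illustrative) that computes the Jacobian of y = sigmoid(Wx) analytically and checks it against finite differences:

    # Jacobian of y = sigmoid(W x): analytic (diag(y*(1-y)) @ W) vs central finite differences.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))
    x = rng.normal(size=4)

    def f(x):
        return 1.0 / (1.0 + np.exp(-(W @ x)))

    y = f(x)
    J_analytic = (y * (1 - y))[:, None] * W        # chain rule: dy_i/dx_j = y_i(1-y_i) W_ij

    eps = 1e-6
    J_numeric = np.zeros_like(J_analytic)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J_numeric[:, j] = (f(x + e) - f(x - e)) / (2 * eps)

    print("max abs difference:", np.abs(J_analytic - J_numeric).max())   # tiny, ~1e-10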


Oh, most recommendations here assume STEM college math knowledge. You should become comfortable with calculus, linear algebra, and probability/stats - those are the foundations of ML.


"Neural Networks and Deep Learning", by Michael Nielsen http://neuralnetworksanddeeplearning.com (full text)

The first chapter walks through a neural network that recognizes handwritten digits implemented in a little over 70 lines of Python and leaves you with a very satisfying basic understanding of how neural networks operate and how they are trained.
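
This isn't Nielsen's code, just a compressed sketch of the same idea: a tiny two-layer network trained with plain backpropagation, on an XOR-style toy problem instead of MNIST so that it stays self-contained:

    # A tiny two-layer network trained with backprop and gradient descent on XOR.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
    W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    lr = 1.0

    for step in range(5000):
        h = sigmoid(X @ W1 + b1)                 # forward pass
        out = sigmoid(h @ W2 + b2)
        d_out = (out - y) * out * (1 - out)      # backprop: chain rule on squared error
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

    print(out.round(3).ravel())                  # typically approaches [0, 1, 1, 0]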


This is the thing that made NNs "click" for me; I think it was very good. Before this I did Andrew Ng's old ML course on Coursera, which I thought was a good intro to old ML approaches and common terms/techniques, and it flowed nicely into NNs.

But they're both kinda old now, so there must be something newer that'll give you an equally good intro to transformers, etc.


+1 for this, when I was coming in as a complete newb to neural networks, this was the clearest and most accessible material I found.


+1 on Elements of Statistical Learning.

Here is how I used that book, starting with a solid foundation in linear algebra and calculus.

Learn statistics before moving on to more complex models (neural networks).

Start by learning OLS and logistic regression, cold. Cold means you can implement these models from scratch using only numpy ("I do not understand what I cannot build"). Then try to understand regularization (lasso, ridge, elasticnet), where you will learn about the bias/variance tradeoff, cross-validation, and feature selection. These topics are explained well in ESL.
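
As a rough sketch of what "cold" can look like (synthetic data, illustrative names only): OLS via the pseudo-inverse and logistic regression via plain gradient ascent, numpy only:

    # OLS via the pseudo-inverse, and logistic regression via gradient ascent, on synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    X = np.c_[np.ones(n), rng.normal(size=(n, 3))]          # design matrix with intercept
    true_beta = np.array([1.0, 2.0, -1.0, 0.5])

    # OLS: the least-squares solution, written with pinv for numerical robustness
    y = X @ true_beta + rng.normal(scale=0.1, size=n)
    beta_ols = np.linalg.pinv(X) @ y
    print("OLS estimate:", beta_ols.round(2))               # close to true_beta

    # Logistic regression: climb the gradient of the mean log-likelihood
    y_bin = (X @ true_beta + rng.normal(size=n) > 0).astype(float)
    w = np.zeros(X.shape[1])
    for _ in range(2000):
        p_hat = 1 / (1 + np.exp(-X @ w))
        w += 0.1 * X.T @ (y_bin - p_hat) / n                # gradient of mean log-likelihood
    print("logistic coefficients:", w.round(2))

The pinv there is also a small preview of the pinv-vs-inv question below: the pseudo-inverse keeps working when X'X is singular or badly conditioned.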

For OLS and logistic regression I found it helpful to strike a 50-50 balance between theory (derivations and problems) and practice (coding). For later topics (regularization etc.) I found it helpful to tilt towards practice (20/80).

If some part of ESL is unclear, consult the statsmodels source code and docs (top preference) or scikit (second preference, I believe it has rather more boilerplate... "mixin" classes etc). Approach the code with curiosity. Ask questions like "why do they use np.linalg.pinv instead of np.linalg.inv?"

Spend a day or five really understanding covariance matrices and the singular value decomposition (and therefore PCA which will give you a good foundation for other more complicated dimension reduction techniques).
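
A short sketch of that connection, assuming nothing beyond numpy and synthetic correlated data: PCA computed from the covariance matrix and, equivalently, from the SVD of the centered data:

    # PCA two ways: eigendecomposition of the covariance matrix vs SVD of the centered data.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))   # correlated synthetic data
    Xc = X - X.mean(axis=0)

    cov = Xc.T @ Xc / (len(Xc) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)                    # principal variances (ascending)

    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    svd_vals = S**2 / (len(Xc) - 1)                           # same variances via singular values

    print(np.allclose(sorted(eigvals), sorted(svd_vals)))     # True: the two routes agree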

With that foundation, the best way to learn about neural architectures is to code them from scratch. Start with simpler models and work from there. People much smarter than me have illustrated how that can go: https://gist.github.com/karpathy/d4dee566867f8291f086 https://nlp.seas.harvard.edu/2018/04/03/attention.html

While not an AI expert, I feel this path has left me reasonably prepared to understand new developments in AI and to separate hype from reality (which was my principal objective). In certain cases I am even able to identify new developments that are useful in practical applications I actually encounter (mostly using better text embeddings).

Good luck. This is a really fun field to explore!


I'm sitting ten feet from my copy of Artificial Intelligence, a modern approach – Stuart Russell, Peter Norvig. While I will say it has a lot of still worthwhile basic information, I really wouldn't recommend it. It's an enormous book, so physically difficult to read, but also the bulk of the content is somewhere between dated and terse. I went through school and studied AI ten years before it was written, and I'm glad I didn't use it as an undergrad textbook - would have been overwhelming.

One of the problems with AI is exactly what you noted above - there are a lot of subcategories, and my gut tells me these will grow. For the real neophyte, I'd say start with something that interests you or that you need for work - you likely aren't going to digest all of this in a month, and probably no single book will meet all your needs.


This is off the beaten path, but consider Abu-Mostafa et al.'s "Learning from Data". https://www.amazon.com/Learning-Data-Yaser-S-Abu-Mostafa/dp/...

I adore PRML, but the scope and depth is overwhelming. LfD encapsulates a number of really core principles in a simple text. The companion course is outstanding and available on EdX.

The tradeoff is that LfD doesn't cover a lot of breadth in terms of looking at specific algorithms, but your other texts will do a better job there.

My second recommendation is to read the documentation for Scikit.Learn. It's amazingly instructive and a practical guide to doing ML in practice.


I strongly second this. Abu Mostafa has videos and homework for this course too. This course was the one that made a LOT of fundamental things “click”, like, why does learning even work and what are some broad expectations about what we can and cannot learn.


LfD is a great book to get people to think about complexity classes and model families. We used that in my grad course and I can recommend it.


It’s probably a bit off the beaten path, but I can highly recommend Probability Theory, The Logic of Science, by E. T. Jaynes.

In the opening chapter Jaynes describes a hypothetical system he calls “The Robot”. He then lays out the mathematics of the “The Robot’s” thinking in detail: essentially Bayesian probability theory. This is the best summary of an ideal ML/AI system I’ve come across. It’s also very philosophically enlightening.


I'm so sad the editor chose not to publish Jaynes' C snippets because "they were too cryptic." They would've helped clarify the ideas greatly.

It's a good book, but I don't know how it's related to ML. My own answer would be "Just do it." Find an ML project you like and start tinkering around. But everyone learns differently, so maybe there's a book that can replace experience.


How is Jaynes (2003) related to ML? I guess in the same way probability theory is related to ML: it underpins just about every meaningful step forward in ML/AI research, as I see it.


Seconded! It's a great book.


I'd suggest these two by Kevin Murphy:

Probabilistic Machine Learning: An Introduction

https://probml.github.io/pml-book/book1.html

Probabilistic Machine Learning: Advanced Topics

https://probml.github.io/pml-book/book2.html


Working through these right now- definitely recommend them


I read parts of Murphy's "Probabilistic Machine Learning" (vol. 1), which is an update of his earlier ML book. It covers a broad range of topics, including very recent developments, and it also includes foundational topics such as probability, linear algebra, and optimization. It is quite aligned with the Goodfellow book. I found it quite challenging at certain points; what helped a lot was reading a book on Bayesian statistics. I used Think Bayes by Allen Downey for that (http://allendowney.github.io/ThinkBayes2/index.html).


I maintain a list of well-known or foundational papers in ML in a github repo that may be of interest to readers of this thread

https://github.com/daturkel/learning-papers


Are there obvious paths into these spaces for someone stuck over in devops/infrastructure/platform engineering? Or is it too far a hop to really find a direct path in?

Let me ask a slightly different way - can someone like me get into a job like these, without needing some more college?

My day job is wrapping up OS templates for people with ML software and I always wonder what they get to go do with them once they turn into a compute instance.


> Let me ask a slightly different way - can someone like me get into a job like these, without needing some more college?

It is a trendy area and in such areas there is always skepticism towards wannabe entrants. As for whether you know enough math, I would start by watching the fast.ai videos and seeing if you're comfortable with the explanations and tools.

I can say I have a stronger math background than most programmers (though less strong than that of real math geeks) and I don't think I know enough math to really grok this stuff, but I'm always after a more foundational understanding than it takes to just use the tools. I think there are opportunities that don't require the math, but are just about having gotten some practice with packages X, Y, or Z. In the end though, those are like web frameworks that become obsolete all the time. So it is worth spending time on foundations.


Why not ask them?

Call it cross functional training to increase your domain knowledge, tell your manager you need it to ensure you’re providing the best service possible, and get your coworkers to help you learn the framework they use…?


If you're already doing a job at a company that does this stuff, can you talk to people about wanting to change teams and learn?


I would like to know this, as well.


Not sure if foundational (quite a tall order in such a fast-moving field), but for sure a nice introduction into neural networks, and even mathematics in general (for a teenager, because it's nice to see numbers in action beyond school-level algebra):

→ Harrison Kinsley, Daniel Kukiela, Neural Networks from Scratch, https://nnfs.io, https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0Qu...

Somewhat foundational, if not in actuality, then in the intention to actually build a theory as in theory of gravitation, although not necessarily an introductory text:

→ Daniel A. Roberts, Sho Yaida, The Principles of Deep Learning Theory, https://arxiv.org/abs/2106.10165


- AIMA by Russell and Norvig is a classic, but I would say it is more of an overview of the field and for most topic areas isn't quite deep enough, imo.

- For deep learning specifically, a more applied text that is beautifully written and chock full of examples is Francois Chollet's Deep Learning with Python (there's a new second edition out with up-to-date examples using modern versions of TensorFlow). I would give the first 3 chapters as required reading for anyone interested in understanding some deep learning fundamentals.

- Deep Learning - Goodfellow and Bengio - seems like it would be hard to get through without a reading group; not exactly an APUE or K&R type reading experience, but I haven't spent enough time with it.

If you haven't taken a linear algebra or differential equations class, that material is useful to know for ML/DL theory but not strictly necessary for applied work with modern high-level libraries; a strong understanding of basic matrix math is definitely useful, though.

If you're interested in natural language processing, there are a couple of good books:

- Natural Language Processing with Python - Bird, Klein, Loper - is a great intro to NLP concepts and to working with NLTK, which may feel a bit dated to some but I would definitely recommend; it's online for free, with great examples. (https://www.nltk.org/book/)

- Speech and Language Processing - Dan Jurafsky and James H. Martin - is good, though I have only spent time with the pre-print.

And then there are a lot of papers that are good reads. Let me know if you have any questions or want a list of good papers.

If you just want to get off the ground and start playing with stuff and building things, I'd recommend fast.ai's free online course - it's pretty high level and a lot is abstracted away, but it's a great start and can enable you to build lots of cool things pretty rapidly. Andrew Ng's online course is also quite reputable and will probably give you a bit more background and fundamentals.

If I were to choose one book from the bunch it would be Chollet: it gives you pretty much all the building blocks you need to read some papers and try to implement things yourself, and I find building things a much more satisfying way to learn than sitting down and writing proofs or just taking notes, but that's just my preference.


Norvig-Russell has many chapters spanning hundreds of pages that are way out of date and not used anywhere.

And the newer things it covers are covered better, and in more depth, by other sources.

I read this book like a novel. Good for a basic overview, but the RoI is very low.


Agreed.


You may want to also consider this one:

Artificial Intelligence, a modern approach – Stuart Russell, Peter Norvig


Can't recommend this highly enough, if for no other reason than to provide some context to keep the OP from getting trapped in the "deep learning is all you need" echo chamber. Sure, ANNs and DL are great and do amazing things, but until it's proven that they really are the "be all, end all" (something I suspect we're far from), it makes sense to dedicate at least some cycles to considering other paradigms.


The big book of stuff that doesn't work.


Everybody is talking about the need to incorporate knowledge representation and reasoning into the statistical models currently in vogue. Russell & Norvig will forever be relevant. Those guys are at the forefront of research in academia and industry, respectively. They have a mature perspective.


Prop it up with a small stick and put some cracked walnuts below to catch mice with it.


If you're more inclined to theory, I would suggest "Learning Theory from First Principles" by F. Bach: https://www.di.ens.fr/~fbach/ltfp_book.pdf

The book assumes limited knowledge (similar to what is required for Pattern Recognition, I would say) and gives good intuition on foundational principles of machine learning (the bias/variance tradeoff) before delving into more recent research problems. Part I is great if you simply want to know the core tenets of learning theory!


Ironic, since the relatively recently discovered double descent makes it clear that the bias-variance tradeoff as we know it from statistical learning theory simply doesn't apply to "overparameterized" deep models.

Much of the old theory is barely applicable, and people are, understandably, bewildered and in denial.

If someone were to be inclined to theory, I'd just recommend reading papers that don't try to oversimplify the domain:

https://arxiv.org/abs/2006.15191

https://arxiv.org/abs/2210.10749

https://arxiv.org/abs/2205.10343

https://arxiv.org/abs/2105.04026


I don't believe it's oversimplifying the domain. Notably, the reference I pointed to has a section dedicated to double descent (sec. 11.2). You may also be surprised that such a phenomenon can be observed on toy convex examples from the "old theory" (sec. 11.2.3), as you call it.

Anyways, I still believe that learning foundational stuff such as the bias-variance tradeoff is useful before diving into more advanced topics. I even think that tackling recent research questions with old tools is insightful too. But that's only my opinion, and perhaps I'm in denial :)


To add to the great recommendations on this thread, I really like Moritz Hardt and Benjamin Recht's "Patterns, Predictions, and Actions". It's published by Princeton University Press here: https://press.princeton.edu/books/hardcover/9780691233734/pa...

But is also available online as a preprint here: https://mlstory.org/



Do you have Linear Algebra knowledge, and Stats 101 knowledge?

Then start with ISLR.

Then go and watch Andrew Ng Machine Learning course on Coursera (a new version was added in 2022 that uses Python).

Then read the sklearn book from its maintainers/core devs. It's from O'Reilly.

Then go do the Deep Learning Specialization from deeplearning.ai.

Then do fast.ai course.

If interested in Deep RL, watch David Silver lectures, then read Deep RL in Action by Zai, Brown. Then do the HF course on Deep RL.

This is how you get started. Choose your books based on your personality, needs, and contents covered.

And among MOOCs, I highly suggest the one by Canziani and LeCun from NYU. (I loved the 2020 version.)

The one taught by Fei-Fei Li and Andrej Karpathy is nice.

These two MOOCs can substitute for the classic books in terms of quality.

I have never read any of the famous books cover to cover. I read a lot from them, sticking to specific subjects.

Get to reading papers and finding implementations. Ng + ISLR will give you good grounding. Fast.ai + deeplearning.ai will give you the capability to solve real problems. NYU + Tübingen + Stanford + UMich (Justin Johnson) courses will bring you to the edge.

You need a lot of practical experience that isn't taught anywhere. So, get your hands dirty early. Learn to use frameworks, cloud platforms, etc.

Then start reading papers.

A crystal-clear grasp of the math foundations is a must. Get it if you don't have it already.


AIMA by Russell and Norvig is a must read IMO.


I’d posit we don’t understand AI/ML well enough to know its foundations with much certainty. Take, for example, the discovery of emergent zero-shot properties in the latest LLMs. My recommendation to a beginner would be to grok gradient descent, matrix multiplication, and the universal approximation theorem, then get on to engineering like the rest of us. You can’t go wrong with Jeremy Howard’s FastAI course and his “Deep Learning for Coders.”
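
For the gradient descent part, a minimal sketch (synthetic data, nothing framework-specific) of what grokking it amounts to: repeatedly stepping against the gradient of a mean squared error:

    # Gradient descent on a one-variable linear fit: step against the gradient of the MSE.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=100)
    y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(500):
        pred = w * x + b
        grad_w = 2 * np.mean((pred - y) * x)   # d(MSE)/dw
        grad_b = 2 * np.mean(pred - y)         # d(MSE)/db
        w, b = w - lr * grad_w, b - lr * grad_b

    print(round(w, 2), round(b, 2))            # close to the true 3.0 and 0.5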


I think a good start is to think about what you want to do. "Back in my day" AI was mostly academic and had more classic foundational parts with newer flashy bits. It wasn't, broadly, applicable to the real world - some parts were, but not a huge amount.

Now I think you've got a few key parts: how to use recent production-ready models/systems, how to train them, and how to make them. Is it in a research or business context?

The field is also broad enough that each section (text, images, probably symbols) and subsection (time series, bulk, fast online work) has a significant body of work behind it. My splits here will not be the best currently, so I'm happy for any corrections on a useful hierarchy, by the way.

Perhaps you're interested in the history and what's led up to today's work? That's more of a "brief history of time" style coverage, but illuminating.

I'm aware I've not helpfully answered, but I think the same question could have very different valid goals and wanted to bring that to the fore.


Coming from cognitive neuroscience, I'm surprised that Explorations in Parallel Distributed Processing by McClelland and Rumelhart doesn't get more attention as a classic in bridging old-school AI approaches with the modern paradigm.

https://psycnet.apa.org/record/1988-97441-000


This is nice; I am more interested in first understanding the origins/concepts/ideas behind AI/ML than in all the complicated mechanisms involved in implementing them (i.e. the simplest possible explanation/implementation), and hence these sorts of books really interest me.

Any more recommendations?

PS: You might find Vehicles: Experiments in Synthetic Psychology by Valentino Braitenberg interesting if you don't already know of it.


I was raised in this era - you could go down quite the rabbit hole branching out from McClelland and Geoff Hinton. They were trying to reflect the brain, and so backprop was initially seen as a shortcut that could be done away with once more complex models could be supported by processing power and inputs.


Blum, Hopcroft, and Kannan, Foundations of Data Science looks good:

https://www.cs.cornell.edu/jeh/book%20no%20solutions%20March...

Also in published form from Cambridge University Press:

https://www.cambridge.org/core/books/foundations-of-data-sci...


Learning From Data (https://amlbook.com) is a great introduction to ML from a more theoretical perspective. The language is easy to understand but the concepts that it deals with are very theoretical, a combination that is hard to find elsewhere.

For example, nearly everyone understands how to apply multivariable logistic regression in, say, NumPy; however, a good grasp of underlying concepts such as confidence bounds for overfitting, and being able to use formal proofs to explain concepts such as VC generalisation, will both help you stand out and provide a good foundation that makes further learning much easier.


Get a strong grasp on Linear Algebra and everything else falls into place more easily

https://math.mit.edu/~gs/learningfromdata/



I have not seen mentioned so far in this thread the following book, which I can't recommend more highly:

Understanding Machine Learning: From Theory to Algorithms – Shai Shalev-Shwartz and Shai Ben-David


I remember Carmack mentioning in a podcast a list of seminal papers that Ilya Sutskever (@ilyasut) gave to him to learn AI foundations. I would love to see that list.


You may also want to consider reading through some of the important (or highly cited) academic papers in AI/ML/NN. From these papers you may get a sense of the techniques researchers are using, and which topics are most important to learn.

I have not applied this technique to AI/ML/NN specifically, but it has been useful for me when trying to learn other topics.


The foundations of AI/ML are really linear algebra and statistics. But not the kinds of stats most people learn in undergrad: focus on linear models (there are tons of great books on just that; also look up “common statistical tests are linear models” for a great intro to what I’d call useful stats), Bayesian stats, ANOVA/MANOVA/PERMANOVA, etc.


I’m a big fan of learning through practice vs learning all the theory up front, and for anyone else who feels the same, the Fast AI course and book are very good: https://fast.ai

The authors are working on a new course that’ll dive deep into the modern Stable Diffusion stuff too, which I’m looking forward to.


I recommend Information Theory, Inference, and Learning Algorithms by David MacKay if you really want to understand the "learning" part, rather than being handed a methodology without knowing why, or assorted bounds that "guarantee" abstract things that may not match reality.


For a less technical history of the field and major players I'd recommend Genius Makers.


Kevin Murphy’s books (especially the new ones) are what I’d point anyone towards for ML.


The Quest for Artificial Intelligence: A History of Ideas and Achievements - Nils J. Nilsson

This is a good overview of the history of the field (up to SVMs and before deep NNs). I found this useful for putting all the different approaches into context.


If anyone is just starting out and wanting to do a study group, let me know.

I’m having trouble keeping my motivation up, but I really want to get up to speed on how LLMs work and someday make a career switch.


I'm down.


I recommend against DL by Goodfellow. At this point it is pretty much outdated. Actually, anything specific to NNs is already outdated by the time it's released.

You'd need the following background:

- Linear Algebra

- Multivariate Calculus

- Probability theory && Statistics

Then you need a decent ML book to get the foundations of ML, you can't go wrong with either of these:

- Bishop's Pattern Recognition

- Murphy's Probabilistic ML

- Elements of statistical learning

- Learning from data

You can supplement Murphy's with the advanced book. Elements is a pretty tough book; consider going through "Introduction to Statistical Learning"[1]. Bishop and Murphy include foundational topics in mathematics.

LfD is a great introductory book and covers one of the most important aspects of ML, that is, model complexity and families of models. It can be supplemented with any of the other books.

I'd also recommend doing some abstract algebra, but it's not a prerequisite.

If you would like a top-down approach, I recommend getting the book "Mathematics of Machine Learning" and learning as needed.

For NN methods, some recommendations:

- https://paperswithcode.com/methods/category/regularization

- https://paperswithcode.com/methods/category/stochastic-optim...

- https://paperswithcode.com/methods/category/attention-mechan...

- https://paperswithcode.com/paper/auto-encoding-variational-b...

For something a little bit different but worth reading, given that you have the prerequisite mathematical maturity:

- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges | https://arxiv.org/abs/2104.13478

[1] https://www.statlearning.com/

Many thanks to the user "mindcrime" for catching my error with Introduction to statistical learning.


> consider going through "Introductions to Elements of statistical learning"

Was that supposed to be An Introduction to Statistical Learning[1] or maybe Introduction to Statistical Relational Learning[2]? I don't think there is a book titled Introduction to Elements of Statistical Learning?

[1]: https://www.statlearning.com/

[2]: https://www.cs.umd.edu/srl-book/


I referred to [1]; thanks, I have corrected the GP.


(I can't wait until the myth that you need linear algebra and calculus to do ML finally dies. It's like saying that you need to understand assembly to do programming. It helps, but it's far from a requirement.)


I disagree strongly. In your analogy, if the compiler broke down all the time, you would probably need to understand assembly to do programming. ML is amazing today, but still kinda sucks. In general you’ll have a bunch of failures on the way to a successful novel application, so it’s more critical to understand what’s going on under the hood in ML than in your programming analogy.

If you just want to apply well known things to well known things, sure you’re right. But as soon as things go wrong, I couldn’t imagine how much more inefficient my iteration cycles would be trying to do novel work without understanding linear algebra (for some kinds of novel work) or calc (for other kinds of novel work). I think you kinda get at this when you say it’s not necessary but it helps. It’s not necessary, but it helps a lot with anything off the beaten track.


We agree, I think!

And certainly, if you're one of those people who can pull it off, studying ML from first principles is probably an advantage. I just wince every time since I wouldn't have gotten into ML in the first place if I had to start with a big Calculus tome. There are probably a lot of people like me out there.


OP asked for foundational, and I provided _foundational_. In my opinion, everyone should start from some sound foundations in LinAlg and Calculus.

Here are a couple of errors that stem from a single foundational problem:

- a linear regressor can not be more than the number of datapoints

- dimensionality reduction when you have NxM with M > N is bogus and you need a bigger dataset to do anything meaningful other than clustering

- input dimension of output layer is larger than the number of samples

The underlying issue in all of these is the rank-nullity theorem, which is pretty foundational for ML, and yet many practitioners don't know about it or haven't made the connection.
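
A quick numerical illustration of that point (random synthetic data, numpy only): with more parameters than data points, X has a non-trivial null space, so infinitely many coefficient vectors fit the data exactly:

    # With p > n, X'X (p x p) is singular: many coefficient vectors fit the data exactly.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 10, 50                                    # fewer data points than parameters
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)

    print("rank of X:", np.linalg.matrix_rank(X))    # at most n = 10, so the 50x50 X'X is singular

    beta = np.linalg.pinv(X) @ y                     # one solution (the minimum-norm one)
    v = rng.normal(size=p)
    v -= np.linalg.pinv(X) @ (X @ v)                 # project v onto the null space of X
    print(np.allclose(X @ beta, y),                  # True: beta reproduces the data
          np.allclose(X @ (beta + v), y))            # True: so does beta plus a null-space vector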

I am not saying that you should have gone through Spivak or built everything bottom-up. There are books like Mathematics of ML that condense everything you need, giving you a decent enough foundation.


Correction:

A linear regressor can not have more parameters than the number of data points.


> I can't wait until the myth that you need linear algebra and calculus to do ML finally dies.

This is such a dangerously absurd claim... but then, it speaks volumes about the abysmal state the non-research-heavy AI/ML field has fallen into.


As always on HN, the right answer is at the bottom.


At some point, UC Berkeley's course videos were available on the web, and they had a pretty good AI course.


ISLR for foundations and passing interviews. It also has lectures on YouTube; just search for "ISLR lectures".


Without running code, it's hard to grasp concepts, so I prefer texts with code.



