Ask HN: Is making a self taught transition to AI/ML related fields possible?
58 points by tayo42 on Aug 16, 2023 | 62 comments
Graduate school looks like way too much of a time and money commitment right now. A ton of the academic content seems to be free online anyway; I got into software with free content and classes online.

I'm wondering if anyone has had success moving into this field as a generalist engineer. I'd imagine advanced degrees aren't required for everything: ML infra, perf/optimization work, etc. Any learning materials, resume, or interview advice would be welcome. Thanks in advance if you have an interesting answer!




It’s very tough to do research-level ML without the whole track. There are exceptions, but if you want to publish papers at DeepMind, you probably want the graduate education.

But if you, like me, are happy to be an “understand and implement the paper” person instead of a “co-author the paper” person, that is eminently achievable via self-study and/or adjacent industrial experience. In fact, it’s never been easier as world-class education is more available on YouTube every day.

3Blue1Brown covers all the linear algebra, calculus, information theory, and basic analysis you need to get started. Ng and Karpathy have fantastic stuff. Hotz is writing a credible threat to PyTorch on stream/YT for the nuts and bolts accelerated computing stuff. Kilcher does accessible reviews of modern stuff. The fast.ai stuff is great.

This is all a lot easier if you can get a generalist/infrastructure role that’s ML-adjacent at a serious shop (that’s how I got exposed), but there’s enough momentum in open source now that even this isn’t a prerequisite.

I say fucking go for it, and if you want any recommendations on books or need someone to ping when you get stuck, feel free to email.


Yeah, I'm ok without doing real research, just looking for a space with new, interesting challenges. I've been seeing how much is on YouTube and such, which made me start thinking this all might be more accessible than I thought initially.

Thanks for the encouragement, yeah books would be interesting too.


One metric for impact is how much damage it causes to established players.

In the loop of voice -> agent -> voice, the cartel left only one edge out.

Voice to text/latent? Whisper is good.

LLMs? They’re everywhere. LLaMA 2 is practically state of the art and it’s licensed for commercial use.

Latent/text -> voice? The best you can get open is T5, which is ok, but it's no Voicebox or whatever. They launch new SOTAs every month or two, but no weights, no code.
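The loop itself is easy to sketch. A stdlib-only Python skeleton where every stage is a labeled stub (a real build would swap in Whisper for the speech-to-text stage, a LLaMA 2 chat model for the agent, and whatever open TTS you can get for the last edge):

```python
# Stub stages for the voice -> agent -> voice loop; each is a placeholder
# for the real component named above.
def speech_to_text(audio):
    # Real version: a Whisper transcription of the audio buffer
    return audio.get("transcript", "")

def agent_reply(text):
    # Real version: an LLM call (e.g. a LLaMA 2 chat model)
    return "echo: " + text

def text_to_speech(text):
    # Real version: an open TTS model returning a waveform
    return {"waveform": None, "transcript": text}

def voice_loop(audio):
    return text_to_speech(agent_reply(speech_to_text(audio)))

print(voice_loop({"transcript": "hello"})["transcript"])  # → echo: hello
```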

Stay dangerous. You want to work on that? I’ll do it with you.

On books, to get up and rolling I’d recommend Raschka, Liu, and Mirjalili.

The one even the PhDs should re-read is MacKay.


Could you maybe recommend some open source projects to contribute to as a way of learning ML?


If I had more time I’d be working on tinygrad every day.


Interesting, thanks for the recs. Will check them out!


I have an ML PhD - maybe slightly qualified to answer this question.

Yes, absolutely doable. Immerse yourself in learning things well. Learn the basics. Don't do course-hopping and book-hopping - pick a rock-solid book or lecture series, devour it, and get building. That last part is the most important.

People with an ML MS/PhD only have two things over you: (1) the degree and (2) networking. If you invest time, you can overcome (2) by asking good questions on Twitter/Reddit and making connections. I still do it after finishing my degree. Twitter is the LinkedIn of ML.

As for (1), YSK that most advisees are getting advised by professors who made it big before deep learning took off. So everyone is still on a learning curve of sorts - advisors, advisees, and your peers. Sometimes a student's intuitions can be better than the professor's. Don't sweat it. Focus on building.


As a hiring manager for AI/ML, if you can get in the weeds and talk the talk with data structures and pipelines, as well as advanced SQL, then you're in with a good chance of making the final list. If you're a humanities major who's just done a few Coursera courses in AI at the weekends, then sorry, no.

Software engineers, by and large, can make great AI/ML practitioners - the specialization is a smaller leap than, say, from business analysts or other folks who're less likely to be able to install their own OS or automate tasks with scripting.


Are MLOps salaries comparable to traditional software engineers', or less/more? I assume the 900k job offers are for the postgrad researchers only (MS/PhD in math, etc.).


Not just researchers - they're for experienced MLOps engineers. If you've built distributed training or inference pipelines at big tech, that's the sort of skillset that requires ML knowledge but has little to do with traditional research.


How do you get the opportunity to build training or inference pipelines at big tech?


Those roles that list "must have PhD in related field" mostly have it there as a MacGuffin; if you're good enough and obviously know your stuff, you're going to get moved on to the next step, by and large.


MacGuffin - In fiction, a MacGuffin (sometimes McGuffin) is an object, device, or event that is necessary to the plot and the motivation of the characters, but insignificant, unimportant, or irrelevant in itself.[1]

What a great use of that term.

[1] https://en.wikipedia.org/wiki/MacGuffin


If you don't mind, resume-wise, what would make someone stick out when hiring if they don't have professional experience yet but have done some serious learning? Projects with some substance?


The main thing to realize is that people may take a really short time to look over a resume - maybe a couple of minutes max. So any project you've worked on needs to be accessible right away so they can check it; in effect, you need to quickly spoon-feed them something that shows your capabilities.

So, what not to do - provide some 50-slide online project presentation with a complex summary description.

In the absence of professional experience, what I'd suggest is to build some impressive ML/AI projects, ideally something with a running data pipeline in the background, and put them up online along with their source code. Something like this (below) would really catch my eye on any resume, and would definitely generate some click-throughs and interest. If you could then go on to explain how you single-handedly built out the data pipeline, handled and parsed LARGE volumes of data and what you used to generate the output that would be a great step.

You could even game it, by doing a later public analysis of all the click-throughs and interactions, which would be really impressive.

"May - Aug. 2023 - Built out MyStockForecast.com, which delivers a daily forecast for 500 tech stocks using an XGBoost model, and a continued retrospective analysis.

Url: MyStockForecast.com Repo: GitHub/blah/MyStockForecast "
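For illustration, a toy stdlib-only sketch of the kind of daily forecast logic an entry like that implies - the function, data, and weighting here are all made up, and a real pipeline would train an actual XGBoost model on engineered features and live market data:

```python
# Toy stand-in for a daily forecast job: predict tomorrow's close as a
# recency-weighted blend of recent closes. A real pipeline would replace
# this with a trained XGBoost model fed by a running data pipeline.
def forecast_next_close(closes, window=3):
    recent = closes[-window:]
    weights = list(range(1, len(recent) + 1))  # newer days weigh more
    return sum(w * c for w, c in zip(weights, recent)) / sum(weights)

prices = [100.0, 102.0, 101.0, 103.0]  # made-up closing prices
print(round(forecast_next_close(prices), 2))  # → 102.17
```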


Ok cool appreciate the response!


Assuming you are good with a non-research role, then yes.

Easiest way to get into that work, in my opinion, would be to take a data engineering job on a team that has an AI/ML capacity, then start learning from that team and taking on some of the AI/ML tasks directly. Alternatively, you could take a role at a smaller business that needs a generalist but also wants to invest in AI/ML (though in this case you will be more on your own to self-learn, and it won't work quite as well as a stepping stone into a more pure AI/ML role).


This is the way. Provide value on an ML-adjacent team and expand your scope over time.

One thing to watch out for is being too far from ML at an ML-heavy company. Those places often have plenty of devs doing frontend work, integrations, etc., but those roles are not so different from similar roles at non-ML companies. Being awesome at TypeScript isn't going to help you shift to ML work. You want to be as close to the ML folks as possible: data engineering, data science, ML platform, etc. A common theme there is Python, so get very comfortable with the language and ecosystem.


I'd argue that the majority of people in the AI/ML field professionally are self-taught or learned on the job.


Maybe in ML ops but researchers are very PhD dense.


Yep, everyone else on my current team has a PhD - and we never touch SQL either. (If you do, you may be doing ML Ops...)


Yes, definitely. I am a generalist SWE, learned ML through Jeremy Howard's course and transitioned to the ML team at my company. 98% of ML in the real world is building a robust software system to support the models. The last 2% is actually modeling and simple models go a long way.


What kind of compensation are we talking, here?


At or above any other SWE role at the same company. Often above the ML researchers at the same company since most researchers are earlier in their career than the ML ops/platform folks, though that's much less true at FAANG-tier companies.


Yes definitely possible. I did this 6 years ago.

1. Find the best one or two courses for AI/ML. At the time, the Udacity self-driving course was great: it took you from a basic Udacity ML intro course to building a full system.

2. Allocate 2-4 hours a day for this. It's heavy material with a lot to learn, so you have to work really hard to get it done.

3. The final project should be impressive to people in the field. For example, I implemented a YOLO alternative from the paper. You'll have to do something similar and show results.

Then getting a job is a completely different skill and you’ll be looking at a job on the margins. Keep backup options in lower pay jobs in startups or in non tech companies if your dream jobs don’t pan out.

It could take 6-9 months to understand the content and then another 6-9 months to get a job. If you are okay with that, then do it.

Having an ML + software eng background puts you in a really good spot.


I was self-taught the entire way and am currently working on the application/inference architecture side of things. Started in 2017 or so just messing with tensorflow for proof of concept stuff, then years later now we’re using the inference architecture to deploy and build apps around models other teams have built.

It’s all very cool and you definitely don’t need anything at all to do any of it.


You can get in via data engineering, but in a sense you are relegated to a support position. In my experience, data science has been uncharacteristically strict on degrees compared to other CS-like fields. Unless you are very passionate about it, I'd avoid that part of the industry if you can.


I think this is a good and important point. I was a DE for a bit and grew tired of it for this reason; thankful to have escaped into MLE, though it was more luck/opportunity than an explicit goal.


A suggestion that's slightly different from the others. Because of all the emerging fields opening up with the new capability of near-instant "quality" text/art production, there are a lot of opportunities in areas that effectively did not exist a year ago. We rapidly went from "AI/ML will be really cool someday, yet it's only good for chess" to "AI/ML is better than most non-specialists at really hard tasks."

Arguably, it's one of the best times to get involved. You may not want to fight over making the 99.9++% accurate system, yet a bunch of non-specialists were the first to extend some of these models and actually apply them to non-toy problems.

There are also a lot of sites/guides/walkthroughs that did not exist 6 months ago where you can rapidly get a feel for "what can this actually do?" [1][2][3][4][5][6]

[1] https://huggingface.co/stabilityai/stable-diffusion-2-1?text...

[2] https://rerenderai.com/

[3] https://open-assistant.io/chat

[4] https://hacks.mozilla.org/2023/07/so-you-want-to-build-your-...

[5] https://writings.stephenwolfram.com/2023/07/generative-ai-sp...

[6] https://www.convex.dev/ai-town


Going against the grain here - you will have a much better time finding an actual research job, if that is your goal, with a research degree. A thesis-based master's or PhD shows you can organize and present even incremental work in a useful way, moving the "state of the art" slightly vs. engineering around existing tools. It's 100% a subtle sliding scale - some engineering becomes research, and much of research is engineering.

ML infra is just infra, with a different set of needs and issues than, say, Spring Boot infrastructure or something. Self-taught here is fine, and honestly I'd trust a software engineer with basically no ML experience to handle ML infra better than an ML researcher.


I made this transition in my late 20s after ~10 years of generalist engineering. I don't believe I could have done it without going back to school. After a decade writing code, I found stochastic thinking to be non-intuitive, and school helped me practice a lot of concepts. However, that was before a lot of online ML material became available. YMMV

I think of it like the gym. You could get in there and start something tomorrow, but unless you've been taught good form there's a good chance you'll injure yourself.


I'm working for one of the top energy companies. I have a BS and MS in Geology and worked in that field for over 10 years, but have since moved towards data science. My title is now data scientist, but I assure you I'm nowhere near a FAANG DS.

I started by just learning Python, learning and implementing Azure tools, taking more ML classes and bootcamps, working on projects at home, and working with auto-ML tools like DataRobot. I've only ever been able to actually implement simple analytics and basic NLP from scripts I wrote and packaged up into PowerBI or such. Our company isn't really a tech company, and it's easy to impress with simple improvements.

So I feel like a phony data scientist who couldn't pass a heavy-duty pure data scientist interview at another company. However, it seems clear that there is a big need to bridge this space: a pretty good understanding of DS concepts and approaches with subject matter expertise, plus a lot of user-friendly auto-ML or Azure tools. And it's pretty easy to get ChatGPT to build boilerplate code that I couldn't figure out myself. It's really not too hard to work with the OpenAI API in Python. I just need a decent team to work with to help bring things into a production system. Which is a horrible problem and mess here.

I'm sure I could float around inside this company for years, but now I wonder: what can I do outside? Am I a data-science-informed implementer? Am I a low-level data analytics generalist? I have a strong work ethic, great personal and project management skills, and can eventually do anything I set my mind to. I feel like any company could benefit from my skills, but I don't fit the bill for typical DS roles. Product management seems like a path, although not the most fruitful one if you want to actually do some work.

Maybe others find themselves in the same place or have made these self-taught transitions into the grey areas. Other thoughts?


The answers so far tend to be either "yes, it's very easy - just follow a boot camp or books, transition at work by working close to a DS/ML team, etc." or "no, you need lots of background knowledge, a track record of publications, etc."

I think the problem is that there's not a clear need for engineers with lower levels of expertise in some of these fields; that is, people with less than an MS degree in that specific ML subfield.

If the field's momentum lasts long enough and urgent demand becomes mainstream enough, then I imagine hiring managers will have to invest more resources in hiring people with less experience and training them on the job. Similar to how developers are hired fresh out of short boot camps these days.

Therefore the easiest way to transition would be by acquiring practical and theoretical knowledge using the strategies given in the other answers, and then applying to ML teams where there's enough demand for them to want to train you on the job. Of course, that's easier said than done. It might be interesting to hear some thoughts on whether this is already happening in certain fields.


Yes, it's possible - I've spent the last 4ish years of my career making this transition and have only recently "made it" into a role that affords me the opportunity to spend most of my time as an AI/ML practitioner. Unfortunately there is still a ton of gatekeeping in this industry, where hiring managers often want to see PhDs and work published in journals when the positions they are hiring for have little to do with those experiences. The upside is that the industry seems to be gradually changing as companies begin to understand that fundamental engineering skills are far more important than deep knowledge of the theory and math.


I talk to a bunch of engineers with this question at an SF AI meetup, and I'm really curious to hear what the community consensus is, but from my perspective, it's tough and getting tougher.

The issue isn't that it can't be done -- in fact, the greatest need right now is for engineers who can come in and build rock solid real world applications on top of commodified neural network architectures and weights, not PhD scientists. Your business might not even use its own ML model! You might just be calling an API.

The challenge is that for a few reasons, it's a very crowded market right now. A lot of people want to make a move into AI, yet for all the hype, the space of viable commercial applications that will survive without indefinite VC funding remains kinda small. Look how AVs are doing after billions and billions in funding chasing one of the most lucrative commercial possibilities imaginable. There's really cool stuff happening industry-wide, and commercial potential is growing, but nowhere near as fast as the cultural hype that has infected certain parts of tech space these past 12 months or so. Plus, many experienced ML engineers and scientists have been dumped back into the job market due to layoffs. So from the hiring side right now, for every AI posting there are tons of applications that have the cool portfolio, and then also a relevant degree and/or prior experience.

That's what you're competing against, so if you're going on portfolio alone it's got to be really outstanding. Way beyond doing the homework for a free course. Learning how to build an ML service that solves an actual problem in the real world reliably enough that you can actually use it should be the goal.

If you happen to be employed at a company where there is a need for an ML engineer in some capacity but no availability (hiring is expensive!), you can try stepping up to help out. Hiring challenges aside, it is absolutely possible to learn on the job the engineering skills needed to, say, build ML infra or work as part of an MLOps team. I recognize that's sort of just up to circumstance though. If you look for a new job, be a little wary of any that want to hire you for a more mundane task (like data entry/cleaning/labeling) with a promise of getting to do the ML engineering stuff, too, "eventually". Such roles do exist but it's also a bait-and-switch tactic.

Anyway, that's what I've got as someone who has been thinking about how to help people looking to do what you're doing, but I hope this thread turns up more ideas too.


> the space of viable commercial applications that will survive without indefinite VC funding remains kinda small

That's because the field of ideas on how to use ML looks like a desert with a lonely oasis of chatbots.


"There's really cool stuff happening industry-wide, and commercial potential is growing, but nowhere near as fast as the cultural hype that has infected certain parts of tech space these past 12 months or so."

This, all day, every day.


I started working at a prominent AI company in 2020 with no formal background in AI/ML. I was able to bring outside understanding of language and theory of mind into use with large language models and create a role as a prompt engineer. My résumé was basically doing interesting things and helping make their models more useful.

Even though I'm not a formal researcher, I've been able to contribute to research projects and be included in papers because the field is so new.

The most important criteria I look for when I interview applicants is what they have built. GitHub repos, papers, even cool Product Hunt projects can have impact.


> The most important criteria I look for when I interview applicants is what they have built.

How do you know the extent of their contribution? Maybe the project was cool but the applicant only had a marginal participation.

I prefer to ask a few comprehension questions that focus on concepts we use every day on the job, like what's the cost of doubling the sequence length for GPT models, or what is cosine similarity and its applications. They should demonstrate they can operate in this space and have a good grasp on the basics.
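For the second of those, cosine similarity is just the dot product of two vectors divided by the product of their magnitudes (and on the first: self-attention cost scales roughly quadratically with sequence length, so doubling it roughly quadruples attention compute). A stdlib-only sketch with made-up toy vectors:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (||a|| * ||b||): 1 for same direction, 0 for orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction (≈ 1.0)
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal (≈ 0.0)
```

In practice the vectors would be model embeddings, and the typical applications are semantic search, retrieval, and deduplication.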


I look at their personal GitHub repos (when they're relevant.)


It's even easier now than 12 years ago when I started. Better tooling, more efficient pre-trained models, and the whole prompting paradigm that makes a month-long task doable in a day. You don't have to label as much as before, or sometimes at all.

When you train small models on small datasets, you get very bad out-of-distribution results; but these LLMs have already seen everything on the web, so they are not OOD as often.


A side but related question: what are some interesting industrial applications of machine learning that are not related to advertising, consumption, back office efficiency, and other such things? In other words, what makes machine learning an interesting field to work in?


For me, from the outside, it seems like there are some different approaches to computing than what I have done already. New stuff to learn. Large network use and data requirements, I'd imagine, lead to problems to solve that might previously have only been an issue at large companies. With AI becoming more general, more people will run into new problems that might not have clear answers.

I was working in the web and infra and feel a lot of it is solved in a way.

> machine learning that are not related to advertising, consumption, back office efficiency,

I'm kind of at the point where I think most companies have products I'll never be thrilled about (as long as it's not a negative on the world/unethical). Technical problems are interesting though, so I look for that.


There are some bootcamps available online (EdX), for example:

https://www.edx.org/course/columbia-engineering-machine-lear...


10 weeks seems a bit short for transitioning. Have you taken this course?


No. The bootcamp is very new. I have previously taken graduate courses on the subject and now I'm taking some online courses to avoid getting rusty, as I don't work in an ML position. I'm about to start studying, and tinkering with, LLMs.


Just don’t get stuck thinking about it, that’s going to be your biggest barrier, realistically.


In ML Ops and Infrastructure? Very much so.

Actually coming up with a new model? I am not sure, maybe not.


There seems to be huge demand for MLOps now, and from my experience 80% of MLOps just seems to be sensible DevOps or software engineering practices.

I'd say someone with a decent background in that could make the transition easily, especially if they brush up on a few of the hot topics: "feature stores", "model repositories", and of course wandb.com - which in many people's minds is MLOps.

I don't know if that's how you get the big bucks; I don't work in that type of world. But having the skills or inclination to actually deploy an LLM as a working system, much in the same way you would, say, a normal web service, does seem to be in high demand right now.
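As a sketch of that "like a normal web service" point, here's a minimal stdlib-only HTTP endpoint around a stubbed model - `predict` is a hypothetical stand-in, and a real deployment would use a proper serving framework plus an actual model behind it:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text):
    # Hypothetical stand-in for a real model call (tokenize, forward pass, decode)
    return {"length": len(text), "label": "positive" if "good" in text else "neutral"}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, run the "model", return JSON
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve (blocks): HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

From there, the rest really is sensible DevOps: health checks, logging, containerization, and rollout, same as any other service.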


I only had AI at my bachelor's, but we learned all of it from YouTube/Coursera anyway, because the learning materials are simply better than the lectures 99% of professors give.


Making a self taught transition to any field is possible. You just have to be interested enough and motivated/disciplined enough to plow through the tough parts of the learning curve.


Worst case is you learn lots of interesting stuff that makes it easier to at least integrate other people’s work, so what’s the risk of diving in?


Wasting time. It's kind of a balancing act: do I watch a video here and there to satisfy my curiosity, or take it seriously, dedicate time, and cut back on other things going on, hoping for a serious ROI later on?


Don't accept any advice from anyone who hasn't done this transition themselves, no matter their credentials.

That being said, let's not fool ourselves into thinking that going from knowing nothing to being pretty good at this stuff requires formal, structured education. Programming is much more of a craft than it is either an art or a science, and the field of AI is no exception.


If you can reliably publish at NIPS all will be forgiven.


Haven't seen NIPS in a while! Think they changed the name a few years ago to make searching for the conference from work a bit less likely to send you to a NSFW site. For anyone reading and wondering, it's NeurIPS.


More alarming (and not talked about!) is that it is also a racial slur...


Yeah, I did this. Just work as a software engineer at an AI startup and you'll inevitably have to get your hands dirty with AI stuff.


Yes.



