The ironic thing here is that Hadoop is already mostly outdated.
Which is btw one of the depressing things for a lot of data engineers: we used to play with those cool distributed processing frameworks, and now? We are mostly writing some Terraform to deploy cloud resources, with most of the distributed part handled by the cloud providers.
> Which is btw one of the depressing things for a lot of data engineers: we used to play with those cool distributed processing frameworks, and now? We are mostly writing some Terraform to deploy cloud resources, with most of the distributed part handled by the cloud providers.
Sounds to me like switching one provider/tool for another. Or are data engineers feeling bummed because the job has become too trivial / less fun?
I am only speaking for myself here, but I really feel the switch from "data engineering" to "data ops", whatever that means.
In short, 5-10 years ago, writing MapReduce / Spark jobs (or even debugging / optimizing Hive jobs) was complex enough that it was often the data engineer's job (and not the data analyst's / scientist's). And I don't only mean writing the data processing logic but, more importantly, configuring it properly so that the resource footprint was acceptable. That required a good understanding of the underlying framework: analyzing the job's execution plan, tweaking the resource configuration, etc.
Now, writing distributed jobs is pretty trivial with most cloud providers, so it is now done purely by data analysts and scientists. Data engineers have shifted to more of a DevOps kind of work: doing the plumbing between the various cloud components and writing the IaC needed to provision those cloud resources for other data users. In short, you can be a data engineer with absolutely no clue how distributed systems actually work, and it will not be an issue in your daily job.
It is still ongoing, but the trend is really toward managed services. Most shops that still run a Hadoop distribution do so for legacy reasons (I used to work at one).
I mean, just look at job offers: how many list Hadoop experience as a plus vs. cloud experience?