Please state the location and include REMOTE, INTERNS and/or VISA
when that sort of candidate is welcome. When remote work is not an option,
include ONSITE.
Please only post if you personally are part of the hiring company—no
recruiting firms or job boards. Only one post per company. If it isn't a household name,
please explain what your company does.
Commenters: please don't reply to job posts to complain about
something. It's off topic here.
Readers: please only email if you are personally interested in the job.
Searchers: try https://hnhired.fly.dev, https://kennytilton.github.io/whoishiring/,
https://hnjobs.emilburzo.com, https://news.ycombinator.com/item?id=10313519.
Don't miss these other fine threads:
Who wants to be hired? https://news.ycombinator.com/item?id=33818035
Freelancer? Seeking freelancer? https://news.ycombinator.com/item?id=33818036
Internet Archive is a non-profit building a free library of all of the published works of humanity to share with the world. We're not there yet, but we've managed to accumulate some data along the way. Can you help us engineer it?
The Archiving and Data Services department provides services to mission-aligned organizations (primarily other libraries and cultural heritage institutions). These services include web crawling SaaS, managed large-scale crawls, long-term digital preservation, and, particularly relevant for this role, making use of these web archives and digital collections.
We're looking for a Data Engineer to help us with some of the following:
- Turn researcher Jupyter notebooks into robust systems (these notebooks are mostly in Scala)
- Develop data munging/wrangling/deriving workflows (we use Spark and Temporal.io)
- Help administer a 7.5 petabyte Hadoop cluster
- Potentially write jobs for our main, in-house long-term storage cluster
- There are always APIs that need work (these are mostly in Python)
ML experience is an interesting bonus.
We're fully remote; employees can be based anywhere in the US or Canada.
This is a new opening as of Dec 1, so new that we're still working on getting it posted. If interested, please reach out to Alex at avdempsey [at] archive [dot] org.