Using python and constraining yourself to only use a basic subset of the standar...

mazambazz · 2024-07-17T05:58:33 1721195913

Hard disagree. I've written plenty in both. They both have their strengths, but bash is just more efficient if you're working with the filesystem. The UNIX philosophy of "do one thing and do it well" shines here. Python is more powerful but it's a double-edged sword. If I want to read a file containing API endpoints, send a request to them for some JSON, and do some parsing, I don't want to need or want to deal with importing modules, opening file objects, using dictionaries, methods, functions, etc.

Why do that when I can literally just ``` N=0 while read -r URL; do curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data" N="$((N+1))" done < ./links.txt ```

The other thing is bash makes it exceptionally easier to integrate across different system tools. Need to grab something from a remote with `rsync`, edit some exif, upload it to a CDN? That's 3 commands in a row, versus god knows what libraries, objects, and methods you need to deal with in Python.

skydhash · 2024-07-17T06:21:49 1721197309

Libraries are nice, until you have to write the glue code between the modules and functions. But sometimes you already have the features you want as programs and you just need to do some basic manipulation with their arguments and outputs. And the string model can work well in that case.

wiseowise · 2024-07-17T07:00:15 1721199615

> Why do that when I can literally just ``` N=0 while read -r URL; do curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data" N="$((N+1))" done < ./links.txt ```

So that other people can read and modify this.

pxc · 2024-07-19T03:10:06 1721358606

? That code's meaning is extremely clear: it reads from a list of URLs that return JSON lists of JSON objects. For each URL, it pulls out some property and checks whether each line does not contain the string 'filter'. Those lines which clear the 'filter'-filter are written to a file whose name is the line of the original input file which contained the URL pinged suffixed by the extension '.data'.

It's very easy to read and modify if you just write it out longwise, which is what I'd always do in some actual script. (I also like to put reading data at the beginning of the pipeline, so I'd use a useless use of cat here.) To illustrate:

  N=0
  cat links.txt \
    | while read -r URL; do
        curl "$URL" \
          | jq '.data[].someProp' \
          | grep -v "filter" \
          > "$N.data"
        N="$((N+1))"
      done

It's a very simple pipeline with a single loop. It's not very different from pipelines you might write to transform some data using a thread macro in Clojure, or method chaining against a collection or stream in OOP languages like Java or Scala or Ruby or whatever you like.

qz_kb · 2024-07-17T06:09:31 1721196571

now add error handling.

ryapric · 2024-07-17T06:21:28 1721197288

That's really not that hard to add above. A lot of folks act like it's impossible to handle errors etc. in bash, but it's pretty straightforward -- certainly no more difficult than in any other language. The hard part, like with all languages, is deciding how to handle errors cases. The rest is just code.

wiseowise · 2024-07-17T07:01:40 1721199700

> That's really not that hard to add above.

Then show how it is going to look like?

ryapric · 2024-07-17T07:13:55 1721200435

On mobile so no idea if this a) looks good or b) runs (especially considering the command substitutions, but you could also redirect to temp files instead), but it's just something like this:

    N=0
    while read -r URL; do
      data="$(curl "$URL")" || { printf 'error fetching data\n' && exit 1 ; }
      prop="$(jq '.data[].someProp' <<< "$data" || { printf 'error parsing JSON response\n' && exit 1 ; }
      grep -v "filter" <<< "$prop" > "$N.data" || { printf 'error searching for filter text\n' && exit 1 ; }
      N="$((N+1))"
    done < ./links.txt

bash also has e.g. functions, so you could abstract that error handling out. Like I said, not that weird.

Edit: oh good, at least the formatting looks ok.

theamk · 2024-07-17T18:47:44 1721242064

You forgot "-f" flag to curl, which means it won't fail if the server returns an error. Also "jq" returns success on empty input, pretty much always. Together, this might mean that networking errors will be completely ignored, and some of your data files will mistakenly become empty when server is busy. Good luck debugging that!

And yes, you can fix those pretty easily.. as long as you aware about them. But this is a great illustration why you have to be careful with bash: it's a 6-line program which already has 2 bugs which can cause data corruption. I fully agree with the other commenter: switch to python if you have any sort of complex code!

1-more · 2024-07-17T19:34:28 1721244868

plus the delightful mechanism of

    set -euo pipefail # stop execution if anything fails
    cleanup () {
        rm $tmp_file
        popd
        # anything else you want to think of here to get back to the state of the
        # world before you ran the script
    }
    trap cleanup EXIT # no matter how the program exits, run that cleanup function.

A really good rundown of the different options for set https://www.howtogeek.com/782514/how-to-use-set-and-pipefail...

Too · 2024-07-18T05:08:01 1721279281

But why bother? The moment you start doing all that, all the arguments of "oh look how much i can solve with my cool oneliner" goes away. The python version of that code is not only safe by default, it's also shorter and actually readable. Finally it is a lot more malleable, in case you need to process the data further in the middle of it.

    for N, url in enumerate(Path("links").read_text().splitlines()):
        resp = requests.get(url)
        resp.raise_for_status()
        prop = resp.json()["data"]["someProp"]
        matches = (line for line in prop.splitlines() if "filter" not in line)
        Path(f"{N}.data").write_text("\n".join(matches))

I'm sure there is something about the jq [] operator i am missing but whatever. An iteration there would be a contrived use case and the difficulty to understand it on a glance just proves I'm not interested. As someone else mentioned, both curl and jq requires some extra flags to not ignore errors, i can't say if that was intentional or not. It would either way be equally easy to solve.

ryapric · 2024-07-18T22:06:57 1721340417

I never said anything about one-liners?

pxc · 2024-07-19T03:23:12 1721359392

There's real value in 'one-liners', though. A 'one-liner' represents a single pipeline; a single sequence of transformations on a stream of data— a single uninterrupted train of thought.

Chopping a script into a series of one-liners where every command in the pipeline but the first and/or last operate only on stdin and stdout, as far as is reasonable to do, is a great way to essentially write shell in an almost 'functional' style. It makes it easy to write scripts with minimal control flow and easy-to-locate effects.

Such scripts are ime generally much easier to read than most Python code, but I don't find Python especially easy to read.

cassianoleal · 2024-07-17T12:32:26 1721219546

    set -euo pipefail; N=0 while read -r URL; do curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data" N="$((N+1))" done < ./links.txt

aulin · 2024-07-17T05:17:57 1721193477

Choice is a privilege you rarely have in a day to day job. Bash most of the time is already there and you have to live with it.

Also, I've been forced to work on a huge SCons based project and I guarantee python can make your life quite miserable when used for something it's not supposed to.

qz_kb · 2024-07-17T05:25:26 1721193926

I'm not suggesting you build a a whole build system with python (which is basically bazel and it seems to be good enough for google.)

A lot of originally little automation/dev scripts bloat into more complicated things as edge cases are bolted on and bash scripts become abominations in these cases almost immediately.

udev4096 · 2024-07-17T05:28:03 1721194083

Bash is native. You won't find python pre-installed on all distros, like alpine

hi_hi · 2024-07-17T06:49:36 1721198976

Bash may be native, but alot of the programs you'll want to call may not be, or will differ between platforms in subtle ways. Although this won't be a concern for small/trivial scripts, but if we're talking about python as an alternative, my point probably still applies.

a-french-anon · 2024-07-17T08:17:48 1721204268

This. People using bash extensions and util-linux as if they're standard are my bane.

If you can't do it in POSIX (sh and utilities) and don't want to do an extensive investigation of portability for what you need, pony up for Python/Tcl/Perl (all in MacOS base, by the way).

lloeki · 2024-07-17T07:21:09 1721200869

> You won't find python pre-installed on all distros, like alpine

The same can be said of bash, especially since you mention Alpine (also FreeBSD)

Perl, though... ;)

(if Perl is not there it gets pulled in very quickly as a dependency of something, e.g typically you pull git, you get perl)

yawpitch · 2024-07-17T05:42:33 1721194953

True, though there’s a whole world of people who will yell at you for using Bash-isms rather than pure posix precisely because Bash (at least up to date versions) isn’t everywhere either.

rendaw · 2024-07-17T05:09:57 1721192997

I agree, but I've been having a hard time even with python recently. I had a small script (50-100 lines) to format a data drive on boot I refactored, 3 or 4 obvious undeclared variables and who knows how many more I didn't notice - mypy found 0 issues.

I was looking up statically typed alternatives and stumbled upon Ammonite and Scala-CLI for scala. I haven't used them much, but Ammonite bundles some basic libraries including command line parsing, http, and json, which are probably 99% of what I used in Python too? And Scala seems like an interesting language too with decent editor integration.

wiseowise · 2024-07-17T07:02:56 1721199776

> I had a small script (50-100 lines) to format a data drive on boot I refactored, 3 or 4 obvious undeclared variables and who knows how many more I didn't notice - mypy found 0 issues.

What does pyright/pylance say?

macrael · 2024-07-17T05:12:30 1721193150

use mypy!

cinntaile · 2024-07-17T05:29:29 1721194169

> mypy found 0 issues

It didn't help him. Which is a bit strange?

ok_dad · 2024-07-17T07:04:13 1721199853

To make mypy strict enough to compare your dev experience to a typed language, you have to declare all sorts of configurations, otherwise there are huge swaths of things it’ll allow compared to most typed languages.

I use below, and only when necessary use ignore comment pragmas when third party libraries are not typed.

    #? message format settings
    show_column_numbers = true
    show_error_codes = true
    #? strictness settings
    disallow_any_unimported = true
    disallow_any_expr = true
    disallow_any_decorated = true
    disallow_any_explicit = true
    disallow_any_generics = true
    disallow_subclassing_any = true
    disallow_untyped_calls = true
    disallow_untyped_defs = true
    disallow_incomplete_defs = true
    disallow_untyped_decorators = true
    no_implicit_optional = true
    warn_redundant_casts = true
    warn_unused_ignores = true
    warn_return_any = true
    warn_unreachable = true
    strict_equality = true
    strict = true

runjake · 2024-07-17T05:31:56 1721194316

As a person who’s been doing shell programming for 35 years and Python for 15 years, I completely disagree.

Bash scripts and Bash control flow has been and is used in highly critical scripts all over the place, including other planets.

We’ve been writing reliable, well-designed scripts for many decades. Some of my scripts are several hundred lines long and older than the system engineers currently occupying their positions.

Python is fine too. Use the right tool for the right job.

NegativeLatency · 2024-07-17T05:05:50 1721192750

Had decent luck using chatgpt to translate some crufty old bash deploy scripts to python

guappa · 2024-07-17T05:24:09 1721193849

> Using python and constraining yourself to only use a basic subset of the standard library modules

Used to be a viable strategy until they started to drop modules from the standard library at every single release.

yawpitch · 2024-07-17T05:40:09 1721194809

> Used to be a viable strategy until they started to drop modules from the standard library at every single release.

That’s a bit of a ridiculous statement, there’s a small number of very-long deprecated modules removed in 3.12, and some more recently deprecated modules in 3.13. And these things are old, largely or completely unmaintained, and usually complete obsolete.

I’d be surprised if anyone has a script that’s been adversely effected by this, and if they did it’s because they stopped maintaining it years ago (and also chose to both silence warnings and upgrade interpreter versions without reading the release notes).

guappa · 2024-07-17T06:46:52 1721198812

> largely or completely unmaintained

Consider that the python foundation absolutely has the resources to put a developer to maintain them.

If they don't is because they don't want to.

> and usually complete obsolete

The amount of modules I've had to patch to keep working on 3.12 tells me they aren't as obscure and unused as you think they are.

> I’d be surprised if anyone has a script that’s been adversely effected by this

I'd say that over 99.9999% of python users do not download python from python.org. They use whatever is on their system. Which means that updating an LTS distribution will create mayhem. And that's considering that most modules have already been patched by the distribution maintainers to fix all the brokenness introduced by the new python version.

Also, a bash script from 30 years ago still works fine. A python script from 5 years ago doesn't start.

yawpitch · 2024-07-17T12:23:27 1721219007

>Consider that the python foundation absolutely has the resources to put a developer to maintain them.

The resources to pay someone doesn’t mean that someone with interest and knowledge exists, especially for modules that were formally deprecated in Python 2 and which will never be reinstated. Lots of this stuff is just cruft, most of which has an obvious replacement, and if it doesn’t there’s a decent chance it’s not been used in years by anyone and if it ever had a reason to be in the standard lib, that reason is long gone.

> The amount of modules I've had to patch to keep working on 3.12 tells me they aren't as obscure and unused as you think they are.

If that number is at all significant, where are the issues pushing back against deprecation and removal? It’s not like there hasn’t been a formal process for all these modules. What got deleted in 3.12 was well documented and easily caught just by catching DeprecationWarning… anyone getting surprised by these modules going missing isn’t doing due diligence.

> I'd say that over 99.9999% of python users do not download python from python.org. They use whatever is on their system. Which means that updating an LTS distribution will create mayhem.

And I’ll pretty much guarantee you that 99.9999% of those users haven’t heard of, much less imported, any of the modules that have been removed.

> And that's considering that most modules have already been patched by the distribution maintainers to fix all the brokenness introduced by the new python version.

But again where are the issues and hands being waved that these issues are wide-spread enough to halt or reverse the deprecation process? If distro maintainers are simply patching everything for users who are constantly advised to leave their system Python alone and they’re not reporting the issues then those distro maintainers are harming everyone.

> Also, a bash script from 30 years ago still works fine. A python script from 5 years ago doesn't start.

I’ve written plenty of Python scripts that are still running on the interpreter and stdlib they were authored for, decades later. I’m also keenly aware that most of those scripts could not be written in Bash without reimplementing a significant portion of the Python standard lib and ecosystem, none of which was materially affected by the 3.11>3.12 removals.

marcusramberg · 2024-07-18T08:50:26 1721292626

For instance, some fairly commonly used Linux apps like ulauncher, autokey, fail2ban and xpra depend on pyinotify which hasnt been maintained for the last 6 years or so, which is why fedora, arch and nixos now includes patches to make it 3.12 compatible. I don’t find it very unlikely that your inhouse script could be using it too.

guappa · 2024-07-22T05:08:50 1721624930

> The resources to pay someone doesn’t mean that someone with interest and knowledge exists

That's why you can pay people. So that despite their disinterest they will read the code and acquire the knowledge needed.

> especially for modules that were formally deprecated in Python 2

??? I'm talking about modules removed now, in 2024. They were not deprecated since python2. Please don't steer the conversation to different topics.

> Lots of this stuff is just cruft, most of which has an obvious replacement

distutils? Is it cruft? The thing used to install modules? Can you tell me which stdlib replacement it has?

> it’s not been used in years by anyone

Why did I have to patch over 10 modules?

> and if it ever had a reason to be in the standard lib, that reason is long gone

Is installing modules no longer a thing then?

> those distro maintainers are harming everyone.

Aaah yes, the evil distro maintainers that keep things working instead of breaking everything. They're the real culprits here… really?

> I’ve written plenty of Python scripts that are still running on the interpreter and stdlib they were authored for, decades later.

Decades later? That's at least 20 years. If that were true they'd be written in python2 and I can promise you they wouldn't work with python 3.12. So I'll consider this statement a lie.

Please try to be more honest when you interact with me the next time. This hasn't been pleasant.

isbvhodnvemrwvn · 2024-07-17T05:09:39 1721192979

I would agree if python dependency management wasn't a dumpster fire.

rendaw · 2024-07-17T05:11:08 1721193068

I think the point GP was making was that you restrict yourself to only the bundled standard library, which covers most of the basics needed for scripting.

qz_kb · 2024-07-17T05:12:52 1721193172

This is why you force yourself to use nearly zero dependencies. The standard library sys, os, subprocess, and argparse modules should be all you need to do all the fancy stuff you might try with bash, and have extremely high compatibility with any python3.x install.

guappa · 2024-07-17T06:48:28 1721198908

And it's a dumpster fire because they refuse to make any decision and decide which is the supported way.

Instead they removed distutils, so now there is no way to install any module without using a 3rd party installer.