Another way to look at it is the functional core, imperative shell pattern.
Wrapping up your dict in a value object (a dataclass, or whatever that is in your language) early on means you handle the ugly stuff first. Parse, don't validate. Resist the temptation of optional fields. Is there really anything you can do if the field is null? If not, don't make it optional. Let it crash early on. Clearly define your data.
If you have put your data in neat value objects, you know what is in them. You know the types. You know all required fields are there. You will be so much happier. No checking for null throughout the code, no checking for empty strings. You can just focus on the business logic.
Seriously so much suffering can be avoided by just following this pattern.
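A minimal sketch of what that boundary might look like in Python; the Repo type and the payload field names here are made up for illustration:

from dataclasses import dataclass

@dataclass(frozen=True)
class Repo:
    # Value object: constructed once at the boundary, trusted everywhere else.
    name: str
    stars: int

def parse_repo(raw: dict) -> Repo:
    # Crashes here, at the edge, if the payload is malformed --
    # not three layers deep inside the business logic.
    return Repo(name=raw["name"], stars=int(raw["stargazers_count"]))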
I like to think about this as securing the perimeter. Inside, everything is typed, static analysis constrains what can happen, and I am never surprised as long as the code type checks. Outside, data is probably garbage. All the effort goes into locking down the interface. Pydantic is ok for this, although I find it too intrusive for my taste, and I think mixing arbitrary validity predicates with structural correctness is a mistake. Still, I’d much rather walk into a codebase that uses Pydantic than one that assumes its inputs are valid, because confidently writing business logic that can assume its inputs are correct is incredibly liberating.
The "loosey-goosey" approach to data in coding is one of my biggest pet peeves. Some people absolutely insist on making everything as dynamic as possible, and then wonder why we end up with a buggy mess. I always found it very natural to move as much as possible into the type system, because why wouldn't I want the machine to find all my inevitable mistakes for me?
Here’s an out-there take, but one I’ve held loosely for a long time and haven’t shed yet: dicts are not appropriate for what people mostly use them for, which is named access to member attributes.
dict is an implementation of a hash table. Hash tables are designed for O(1) lookup of items. As such, they are arrays much bigger than the number of items they store, to allow hashing items into integers and sidestepping collisions. They're meant to act like an index that contains many records, not a single record.
A single record is more like a tuple, except you want named access instead of title = movie[0], release_year = movie[1], etc. And Python had that, in NamedTuple, but it was kinda magical and no one used it (shoutout Raymond Hettinger).
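For illustration, the movie record above as a typing.NamedTuple (the field names are mine):

from typing import NamedTuple

class Movie(NamedTuple):
    title: str
    release_year: int

movie = Movie("Alien", 1979)
movie.title    # named access
movie[0]       # positional access still works too, for better or worse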
Granted, this rant is pretty much the meme with the guy explaining something to a brick wall, in that dicts are so firmly entrenched as the "record" type of choice in Python (but not so in other languages: struct, case class, etc., and there JSON doesn't just deserialize to a weak type; but I digress).
NamedTuples are great, but they let you do too much with the objects. You probably don't want users of your GitHubRepo class to be able to do things like `repo[1]` or `for foo in repo`. Dataclasses have more constrained semantics, so I reach for them by default. In my ideal world they would default to frozen=True, kw_only=True, slots=True, but even without those they're a big improvement.
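What that might look like, assuming Python 3.10+ for kw_only and slots (GitHubRepo is the hypothetical class from above):

from dataclasses import dataclass

@dataclass(frozen=True, kw_only=True, slots=True)
class GitHubRepo:
    name: str
    stars: int

repo = GitHubRepo(name="cpython", stars=50_000)
repo.name             # fine
# repo[1]             # TypeError: not subscriptable
# for x in repo: ...  # TypeError: not iterable
# repo.name = "x"     # FrozenInstanceError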
Dicts in Python are for when you have a thing and you aren't sure what the keys are. Dataclasses are for when you have a thing and you're sure what the keys (attributes) are. The trouble is when you have a thing and you're sort of sure, but not entirely sure, and some things are definitely there but not everything you might be thinking of.
I think most modern Python codebases are using dataclasses or something like Pydantic. I think dicts are mostly seen, like the author suggests, because something you hacked up to work quickly ends up turning into actual software and it's too much work to refactor the types.
dicts are used internally in the language to look up class and module attributes. They are optimized for this use case. How can it be wrong to use them that way when the very fabric of the language depends on it?
namedtuple is widely used in Python code, especially before the introduction of dataclasses.
A hash function will always be more expensive than a pointer lookup, especially considering a pointer lookup is still needed after the hash function runs.
No matter what you do, an array lookup will always be quicker than a hash lookup; in a lot of cases even a linear search will be quicker.
Structs in other languages are a pointer-plus-offset lookup, which to my knowledge is also true of Python classes using __slots__. There's no reason to use a dict if you know the contents of the data; use a dataclass with slots=True, purely because there's no hash function run on every lookup into the data structure.
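A quick sketch of how you might check this yourself with timeit; absolute numbers will vary by interpreter and version, and whether the gap matters is workload-dependent:

import timeit
from dataclasses import dataclass  # slots=True needs Python 3.10+

@dataclass(slots=True)
class Point:
    x: int
    y: int

d = {"x": 1, "y": 2}
p = Point(1, 2)

# Slotted attribute access is a fixed-offset load; the dict lookup
# has to hash the key first.
print("dict  :", timeit.timeit(lambda: d["x"]))
print("slots :", timeit.timeit(lambda: p.x))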
It’s not wrong to use dicts, it’s just bad practice when you could use something like a dataclass or pydantic model instead.
Dicts are useful for looking things up; if you have a bunch of objects that you need to access and modify, you should use a dict.
If you are using the dict as a container, like car = {"make": "honda", "color": "red"}, you should use a proper object instead: a class, dataclass, or Pydantic model, based on whether you need validation, type safety, etc. This drastically reduces bugs and code complexity, helps others reason about your code, gives you access to better tooling, etc.
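For instance, the car example as a Pydantic model (v2 API):

from pydantic import BaseModel

class Car(BaseModel):
    make: str
    color: str

car = Car.model_validate({"make": "honda", "color": "red"})
car.make                                  # typed attribute access
# Car.model_validate({"make": "honda"})   # raises ValidationError: color is missing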
Subclassing NamedTuple is very ergonomic, and given they're immutable unlike data classes I often reach for them by default. I still use Pydantic when I want custom validation or when it ties into another lib like FastAPI.
And yet we are overwhelmed by JavaScript nonsense... I get it: it's so easy to get up to speed with tiny snippets, but it quickly becomes a hot mess.
Yes, decades ago I was also fascinated by Python and its ease of doing stuff (the compiler doesn't complain that I missed something), but with time I grew fond of statically typed languages... they simply catch swaths of errors earlier...
I had to deal with it ~1-2 years ago, so for my fairly ancient self it feels "recent", though considering that there were JS frameworks popping up every couple of months and getting dropped a year later, my timeframe may be off ;)
> Dynamic languages demand self discipline, they teach you to respect runtime and think ahead of execution time.
Ah... yes... because static languages don't do that by forcing you to properly model everything. And as a bonus you can easily navigate between everything and not fear that you'll miss something while refactoring...
For better or for worse, Python doesn't do typing well. I don't disagree that I prefer well defined types, but if that is your desire then I think Python is perhaps not the correct choice of language.
I tried using it, but beartype quickly became a pain with having to decorate things manually. Then I found typeguard, which goes even further, and never looked back. Instead of manually decorating each individual function, an import hook can be activated that automatically decorates any function with type annotations. Massive QoL improvement. I have it set to only activate during testing, though, as I'm unsure of the overhead.
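If I have the typeguard API right, the import-hook setup is roughly this (the package name is a placeholder):

# conftest.py -- activate typeguard for the test run only,
# so the runtime checks never slow down production.
from typeguard import install_import_hook

# Must run before "myproject" is imported anywhere; every annotated
# function inside it then gets checked automatically.
install_import_hook("myproject")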
That's only because your list has different types. It's a badly formed API, and if you really need to support that use case then you can use maps and reflection to handle it.
But the same issue exists as other dynamic languages, how do you know what the type is of the item you are accessing?
If you know the array will be laid out exactly like that before you make the request, you can always create a custom parser to return a struct with those fields named what they actually are instead of arbitrary data.
The only valid way to parse that dynamically is to try and fail in a loop which is inefficient enough that you should stop using whatever API returns that monstrosity.
And if you got that JSON back in Python, how would you do anything with it? This API is essentially useless. You can deserialise it, sure, but then what?
Can someone educate me on why dicts are uncool for the explained reasons, while Clojure (which seems to be highly recommended on HN) seems to suffer the same issues when dealing with a map as a parameter (Ring requests etc.)?
I know how to deal with missing values or variability in maps, and so do a lot of people.. what am I missing here?
Dicts are great when the data is uniform and dynamic, like an address book mapping names to contact info. You never assume that a key must be in there. Lookups can always fail. That's normal for this kind of use-case.
When the data is not uniform (different keys point to differently-typed values), and not as dynamic (maybe your data model evolves over time, but certain functions always expect certain keys to be present), a dict is like a cancer. Sure, it's simple at first, but wait until the same dict gets passed around to a hundred different functions instead of properly-typed parameters. I just quit my tech job at a company that shall remain nameless, partially because the gigantic Ruby codebase I was working on had a highly advanced form of this cancer, and at that point it was impossible to remove. You were never sure if the dict you were supplying to some function had all the necessary keys for the function it would eventually invoke 50 layers down the call stack. But changing every single call-site would involve such a major refactor that everybody just kept defining their functions to accept these opaque mega-dicts. So many bugs resulted from this. That was far from the only problem with that codebase, but it was a major recurring theme.
This should be the top answer. It's not about using dicts in their primary use case, it's about abusing them as a catch all variadic parameter for quick prototyping and "future expansion"
I think the problem is that different data containers have completely different interfaces.
If getting a field of your object had the same syntax as getting a value from a dict, you could easily replace dicts with smarter, more rigid types at any point.
My dream is a language that has the containers share as much interface as possible so you can easily swap them out according to your needs without changing most of the code that refers to them. Like easily swap dict for BTreeMap or Redis.
I think the closest is Scala, but it fell out of favor before I had a chance to get to know it.
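In Python you can approximate this convergence by hand; a hypothetical sketch of a record type that still answers dict-style key access, so existing call sites keep working while you swap the raw dict out:

from dataclasses import dataclass

@dataclass
class Request:
    path: str
    method: str

    # Keep old dict-style call sites working during the migration.
    def __getitem__(self, key: str):
        return getattr(self, key)

req = Request(path="/users", method="GET")
req.path        # new, typed access
req["path"]     # legacy dict-style access still works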
Maps aren't nearly as problematic in Clojure because data is immutable by default, on top of a functional paradigm where your program is basically a big composition of functions and the language is built around using maps. In Python I largely agree with the author. In Clojure I love my maps.
Here is Rich Hickey with an extreme counter example although I would argue he's really demonstrating against getters and setters.
https://www.youtube.com/watch?v=aSEQfqNYNAc
In Clojure, maps don’t have either of the flaws highlighted in the article. They are neither opaque (they are self-describing, with namespaced keys) nor mutable.
As a result, they are very powerful and simple to use.
The issue is that the concrete types are implicit. Depending on the language, runtime, or type system, expressing the type in a “better” way might be very hard or un-ergonomic.
I think one really nice thing about Python is duck typing. Your interfaces are rarely asking for a dict as much as they’re asking for a dict-like. It’s pretty great how often you can worry about this kind of problem at the appropriate time (now, later, never) without much pain.
There’s useful ideas in this post but I’d be careful not to throw the baby out with the bath water. Dicts are right there. There’s dict literals and dict comprehensions. Reach for more specific dict-likes when it really matters.
Duck typing is so fragile… Once you have implementations that depend on your naming or property structure, you can’t update the model without breaking them all.
If you use a real type, you never have to worry about this.
If you use type checking, the breakage occurs when you introduce the change: the author of the change is the one who can figure out what it means if 'foo' is no longer being passed into this function.
If you're duck typing, you find this out in the best case when your unit tests exercise it, and in the worst case by a support call when that 1/1000 error handling path finally gets exercised in production.
Exactly… with strong typing, you can do the refactor automatically, because the IDE knows everywhere that symbol is used. (For codebases in your control—for third party users, you can indicate that something has been deprecated or renamed via a warning or other language feature)
Dicts can be a problem, but this particular example isn't that great, like in this diagram from the article:
External API <--dict--> Ser/De <--model--> Business Logic
Life's all great until "External API" adds a field that your model doesn't know about, it gets dropped when you deserialize it, and then when you send it back (or around somewhere else) it's missing a field.
There's config for this in Pydantic, but it's not the default, and isn't for most ser/de frameworks (TypeScript is a notable exception here).
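In Pydantic v2 that switch is the extra setting; the default ("ignore") silently drops unknown fields, while extra="allow" keeps them on the instance so they survive the round trip:

from pydantic import BaseModel, ConfigDict

class Repo(BaseModel):
    model_config = ConfigDict(extra="allow")  # default is extra="ignore"
    name: str

repo = Repo.model_validate({"name": "cpython", "new_api_field": 42})
print(repo.model_dump())   # {'name': 'cpython', 'new_api_field': 42}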
Unfortunately some APIs assume that they will get all the fields as part of the update. If a field doesn't exist in the input they receive, they drop the original value during the update.
I don't deal with external APIs often, but this is a development nightmare. You can't just magically let data flow through your system without knowing about it, because this is not how programming works. Your API has a contract and your code is written to support that contract; if the contract changes, it should either be a very consciously decided breaking change that is versioned somehow, or an unversioned non-breaking change. Apparently whatever data is added like this is completely meaningless to your program, so why do you need to be in charge of passing it back to the API?
Changing your API and assuming everything just keeps working is a nonsense cowboy attitude to software compatibility, even if some frameworks bend over backwards to support it through magic that's hidden from the developer. Furthermore, many programming languages are simply incapable of doing this, and this approach to APIs is immediately restricting those languages from use.
Finally, transforming objects to an internal domain model is really the cornerstone of a lot of recent well-thought-out programming discipline, and this API design is throwing that in the garbage. It's explicitly asking you to mess up your service architecture, spreading bad architecture like a virus to all systems that interact with the API.
In typescript using plain JS objects is very straightforward.
Of course you have to validate the schema at your system boundaries.
But you'll have to do this either way.
So: if this works very well in TS, it can't be dicts themselves; it must be the way they integrate into, and are handled in, Python.
This leads me to the conclusion that arguments presented in the article might be the wrong ones.
(But I still think the conclusion the article arrives at is okay. I just don't think the article makes a strong case about whether to prefer dataclasses or typed dicts.)
TypedDicts solve the linting problem, but refactoring tools haven't caught up (unlike e.g. ForwardRef type annotations, which are strings but can be transformed alongside type literals).
TypedDicts "aren't real" in the sense that they're a compile-time feature, so you're getting typing without any deserialization cost beyond the original JSON. Dataclasses and Pydantic models are slow to construct, so that's not nothing.
Pydantic v1 was slow enough for them to write a lot of the core logic in Rust for Pydantic v2, and for the previous sloth to have been an argument people launched against it if you look back at threads on here and Reddit comparing it to other libraries.
Pydantic's JSON parsing is faster than the built-in module, on par with orjson, but creating model instances and run-time type checking net out to be much slower. I linked msgspec's benchmarks in the previous post.
I generally support this. When dealing with API endpoints especially, I like to wrap them in a class that ends up looking something like this. I also like having nested data structures as their own class sometimes too. Depends on complexity and need, of course.
class GetThingResult
  def initialize(json)
    @json = json
  end

  # single thing
  def thing_id
    @json.dig('wrapper', 'metadata', 'id')
  end

  # multiple things
  def history
    @json['history'].map { |h| ThingHistory.new(h) }
  end

  # ... two dozen more things
end
Python has made its rise as an antithesis to Java thinking. Classes used to be seen by some in the community as an anti-pattern. [0]
The coding style used to focus on "Pythonic-ness," which meant using Python's expressiveness to write code in such a way that type information could be inferred without explicitly stating the type.
Most developers will carry their previous language paradigms into their new ones. But if types, DDD (Domain-Driven Design), and classes are what you're looking for, then Python isn't the best fit. Python doesn't have compiler features that work well with those paradigms, such as dead code removal/tree shaking.
However, starting out with dictionaries and then moving over to dataclasses is a great strategy.[1]
As a small note, it's kind of ironic that the statically typed language Go took inferred typing with their := operator, while there is now a movement in Python to write foo: str = "bar".
This has merit in some cases but let me try to make a counterpoint.
You lose the algebra of dicts, and it's a rich algebra to lose, since in Python it's not just all the basic obvious stuff but also powerful things like dict comprehensions and ordering guarantees (3.7+ only).
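A sketch of some of that algebra, all built in for dicts and with no direct dataclass equivalent (the merge operator needs Python 3.9+):

a = {"x": 1, "y": 2}
b = {"y": 3}

merged = a | b                                   # {'x': 1, 'y': 3}
upper = {k.upper(): v for k, v in a.items()}     # dict comprehension
subset = {k: a[k] for k in a.keys() & {"x"}}     # key-set algebra
print(list(a))                                   # ['x', 'y'] -- insertion order guaranteed (3.7+)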
You tightly couple to a definition - in the simple GitHubRepository example this is unlikely to be problematic. In the real world, coupling like this[1] to objects trying to capture domain data with dynamic structures is regularly the stuff of nightmares.
The over-arching problem with the approach given is that it puts code above data. You take what could be a schema, inert data about inert data, and instead use code. But it might also be an interesting case to consider as a slippery slope - if you can put code concerns above data concerns then maybe soon you will see cases where code concerns rank higher than the users of your software?
[1] - by coupling like this I mean the “parse don’t validate” school of thought which says as soon as you get a blob of data from an external source, be it a file, a database or in this case a remote service, you immediately tie yourself to a rocket ship whose journey can see you explosively grow the number of types to accurately capture the information needed for every use case of the data. You could move this parsing operation to be local to the use case of the data (much better) rather than have it here at the entry point of the data to the system but often times (although not always) we can arrive at a simpler solution if we are clever enough to express it in a style that can easily be understood by a newbie to programming. That often means relying on the common algebra of core types rather than introducing your own types.
You also make a nightmare of dynamically adding middleware, which could piggyback on a generic dict but has no meaningful way to insert itself into your type maze.
Python dataclasses are a good start for internal use. They are just a bit of a pain to serialize/deserialize natively. When it comes to that, I prefer to use Pydantic objects and have all the goodies, at the cost of some complexity.
I'm a big fan of using Protobuf for the third-party API validation task. After some slightly finicky initial schema definition (helped by things like json-to-proto.github.io), I can be sure the data I'm consuming from an external API is strongly typed, and the functions included in Protobuf which convert JSON to a Proto message instance blow up by default if there's an unexpected field in the API data being consumed.
I use it to parse and validate incoming webhook data in my Python AWS Lambda functions, then re-use the protobuf types when I later ship the webhook data to our Flutter-based frontend. Adding extensions to the protobuf fields gives me a nice, structured way to add flags and metadata to different fields in the webhook message. For example, I can add table and column names to the protobuf message fields and have them automatically be populated from the DB with some simple helper functions. It avoids me needing to write many lines of repetitive field-mapping code.
Knew it was Python before the first line of code. Python lacks ceremony-free data syntax; that's why people use dicts. Dataclasses have to be named, initialized, and imported, which is tedious. Much easier to just foo({name, age}) and let the typings match, but Python doesn't have that. The lack of "POPOs" is a design mistake.
Yup! I find Elixir makes it really intuitive to know when to represent a collection as a map and when to use a list of tuples. And it's easy to transform between the two when needed.
Yes, usually my APIs in Elixir receive their arguments as a well-typed map, not stringly keyed, and transform them to structs which the core business logic expects.
It's a bit of an odd article because the second part kind of shows why dicts aren't a problem. You basically just need to apply the most old school of OO doctrines: "recipients of messages are responsible for how they interpret them", and that's exactly what the author advocates when he talks about treating dict data akin to data over the wire, which is correct.
If you're programming correctly and take encapsulation seriously, then whatever shape incoming data in a dict has isn't something you should take issue with; you just need to check whether what you care about is in it (or not) and handle that within your own context appropriately.
Rich Hickey once gave a talk about something like this talking about maps in Clojure and I think he made the analogy of the DHL truck stopping at your door. You don't care what every package in the truck is, you just care if your package is in there. If some other data changes, which data always does, that's not your concern, you should be decoupled from it. It's just equivalent to how we program networked applications. There are no global semantics or guarantees on the state of data, there can't be because the world isn't in sync or static, there is no global state. There's actually another Hickey-ism along the lines of "program on the inside the same way you program on the outside". Dicts are cool, just make sure that you're always responsible for what you do with one.
I assume you're basically referring to this quote from the article?
"Ignore fields coming from the API if you don’t need them. Keep only those that you use."
IMO this addresses only one part of the problem, namely "sanitize your inputs".
But if you follow this, and therefore end up with a dict whose keys are known and always the same, using something "struct-like" (dataclasses, attrs, pydantic, ...) is just SO much more ergonomic :)
> Ignore fields coming from the API if you don’t need them. Keep only those that you use.
This is great if you know what you need from the start. If you only find out what you need after passing your data through multiple layers and modules of your system then you need to backtrack through all your code to the place of creation.
If you have immutable data structures, then you have to backtrack through every place where a new structure is created from a previous one, so you can pass your additional data through all of them.
So if your data travels through, let's say, 3 immutable types to reach the place you are working on, then even if you know exactly where the new field you need originates, you have to alter 3 types and 3 places where data is read from one type and crammed into another.
If you have a dict that you fill with all you got from the api there's zero work involved with getting the new piece of information that you thought you didn't need but you actually do. It's just there.
> convert [dicts] immediately to data structures providing semantics [...] You can simplify your work by employing a library that makes “better classes” for you
Python seems to have many different kinds of "better classes" - the article mentions `dataclass` and `TypedDict`, and AFAIK there are also two different kinds of named tuple (`collections.namedtuple` and `typing.NamedTuple`).
What are the advantages of these "better classes" over traditional classes? How would you choose which of the four (or more?) kinds to use?
To me, the proliferation of "better classes" implies there's a problem with Python's built-in classes - but what's wrong? Are they just too flexible and/or too verbose? Or actually deficient in some way?
People enjoy the flexibility and many Python systems rely on duck-typing via dicts, etc.
So people are trying to force Python to be something it isn't, in adherence to their ideology, but it fails to gain consensus because there's a sizable cohort that use Python precisely because it isn't those things.
So we get repeated implementations, from each ideologically motivated group.
Hard disagree on most of this. The immutability dogma for one (changing data is "the worst felony you can commit to your data"). Computing IS manipulation and transformation of data. The contortions people go through to try and sidestep that seem delusional.
Plus all this 1995-era OOP and domain-driven-design crap, "business logic" and data layers and all this other architectural rigidity and usually-needless complexity, layers of boilerplate (and then tools to automate the generation of that), etc.
If your function takes a dict, and is called from many different places, document the dict format in the function comment. Or yes, create a dataclass if it saves more trouble than its additional boilerplate and code and maintenance causes. But take it case by case and aim for simplicity. Most of the time I call out to an API in python, I process its JSON/dict response right after the call, using maybe 10% of the data returned. That's so much cleaner and simpler than writing a whole Data Object Layer, to be used by my API Interface Layer, to talk to my Business Logic layer, etc.
I work with people who are ambivalent about this and believe using random dicts in a variety of places is a valid way to write Python code.
For these kinds of people, no amount of rational evidence or argument is going to convince them this is bad. They practically make an identity out of eschewing anything that seems too orderly or too designed.
(Luckily, at work, most of us on our team like `Pydantic` and also (some of us more than others) type-checking, so these people are dragged along)
In bioinformatics, one of our main dataflow platforms, Nextflow, is built with unnamed tuples in mind. Implementing the ability to conveniently pass data with HashMaps instead of unnamed tuples was a huge boost to usability for me.
i really want to go on a rant about the general state and historical choices regarding data formats and data structures in bioinformatics, plus all the wheel reinvention.
but i’m also trying to move on and do things differently today.
let’s just say the situation is displeasing and leave it at that.
I largely moved away from dictionaries when switching to swift. Usually we only use them now when going from legacy code.
For the example in the article with JSON, Swift has the Codable protocol, which cleaned all my client code for our back end from the old NSJSONSerialization code.
I don't agree with the premise, but I have to admit that the rest of the article is well written in listing the different ways to give types to dicts when needed.
If you want an example of a language where the exact opposite advice is taken at all times (with all the pitfalls described in this blog post), give Clojure a whirl.
This isn't arguing against them in general, but against the unfortunate Javascript-esque abandonment of specified semantics.
In particular, whenever anyone thinks that "deep clone vs shallow clone" is a meaningful distinction, that means their types are utterly void of meaning.