Arguments against JSON-driven development (okigiveup.net)
257 points by afroisalreadyin on Aug 25, 2016 | 294 comments



> The fundamental advice on Unicode is to decode and encode on system boundaries. That is, you should never be working on non-Unicode strings within your business logic. The same should apply to JSON. Decode it into business logic objects on entry into the system, rejecting invalid data. Instead of relying on key errors and membership lookups, leave the orthogonal business of type validity to object instantiation.

This right here is the correct approach. Serialisation formats should be serialisation formats, whether they be JSON, S-expressions, protobufs, XML, Thrift or what-have-you; application data should be application data. There are cases where it makes sense to operate on the serialised data directly, for performance or because it makes sense in context, but in the general case operate on typed application values.
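For example, a minimal sketch of that boundary in Python might look roughly like this (the Book type and its fields are invented for illustration): decode once at the edge, reject bad input there, and let the rest of the program work with typed values.

    import json
    from collections import namedtuple

    # Book and its fields are made up for the sake of the example.
    Book = namedtuple('Book', ['title', 'author', 'count'])

    def parse_book(raw):
        """Decode one JSON document at the boundary, rejecting invalid data."""
        data = json.loads(raw)
        missing = {'title', 'author', 'count'} - set(data)
        if missing:
            raise ValueError('missing fields: %s' % ', '.join(sorted(missing)))
        return Book(title=data['title'], author=data['author'], count=int(data['count']))

    book = parse_book('{"title": "SICP", "author": "Abelson", "count": 3}')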


Part of the problem is that JSON and Sexprs AREN'T serialization formats. They've been pressed into service as such, but they are actually notation for data structures: in Python, it may not be idiomatic to crawl dicts like this, but in JS, those aren't dicts, they're objects. If they've been de-serialized to some degree, they may even have their own methods.

By the same token, in Lisp, Sexprs aren't a serialization format. They're a notation for the linked cons cells that Lisp data is made of. In Lisp, that Sexpr will be crawled for data, or maybe even executed.

So while in Python, both may seem to be serialization formats, they aren't.

Either way, if the application programmer has any sense, they'll abstract away the format of their data. In a lisp app, you won't be cdring down a sexpr, you'll be calling a function to grab the necessary data for you, usually from a set of functions that abstract away the underlying sexpr implementation, and treat whatever it is as a separate datatype.

Of course, the sexpr might have been fed to an object constructor. Heck, it might be an object constructor, or a struct constructor. All of those types typically provide O(1) access, and autogenerated access functions, so it's the same story.


A notation for a data structure is a serialization format.

> In Lisp, that Sexpr will be crawled for data, or maybe even executed.

An s-expression cannot be executed; it's just text.

The object which it denotes can be walked or executed via eval.

Before that happens, the s-expression must be converted to that object.

In other words, deserialized by the reader.


Good point kaz. I meant that the notation signifies a specific set of general data structures, unlike XML, which doesn't specify the data-structure used in memory afaik, only the actual tree structure of the data itself.


In that s-expressions are a notation for computation, they can be executed by an interpreter. This is what we refer to as execution. Even assembly is like this; there's no other reasonable way for it to work right now.

This feels like you're hairsplitting to no obvious benefit other than increasing confusion. I could as easily say "mathematical notation isn't math, it's just text; you can't evaluate '1 + 2' without a human being because otherwise those are just marks on the page." This is true(ish) (with the correct escaping), but it's difficult for me to see how it's relevant to the discussion? We could imagine the situation where I've created the Texpr, which has slightly different notation but the same properties. I don't know that we would necessarily classify it differently or treat it differently.

This leads to the alternate conclusion that maybe the s-expression is the underlying set of objects in the interpreter/compiler. (S-expressions are a special type of linked list, in that case.) This rings true(er) to me, because of the way that we talk about s-expression manipulation in lisps. We most certainly are not using string operations to generate them. In which case the thing with the many parentheses is merely standard lisp syntax, not s-expressions. This is further justified by the existence of different syntax in Dylan or Clojure, and the availability of reader macro manipulation as its own entity.

Tldr; it most certainly is not text! Nor can it be executed.


> mathematical notation isn't math it's just text

That is correct. It's just text which talks about concepts that don't have text, like transcendental numbers, infinities, infinitesimals and so on.

There are functions in mathematics that can't be written down in symbols at all, like the integrals of certain functions (which themselves can be written down).

Math text has some useful properties in that certain transformations you can think of as typographical (manipulations of the text) actually preserve semantic properties in a useful way. So for instance addition commutes, semantically; and in the text, this lets us swap the left piece of text for the right one, around the plus sign.

There can be a very close correspondence between typography and semantics (like in Douglas Hofstadter's "TNT": typographical number theory, which he uses to explain Gödel's incompleteness theorem).

> This leads to the alternate conclusion that maybe the s-expression is the underlying set of objects in the interpreter/compiler.

I assure you that it isn't; not in any main-stream Lisp interpreter or compiler.

(Where by "main-stream Lisp interpreter or compiler", I intend to rule out cute hacks like this:

https://news.ycombinator.com/item?id=7956246 )


No, kaz has a point. Do not confuse the shadow for that which casts it: sexprs are a serialization format/notation for sets of conses. Most of the time, saying so is splitting hairs, but it bears mentioning here, as we're discussing serialization formats.


JSON is a serialization format that was based on the data structure notation for Javascript. It is, however, a serialization format. Javascript objects are a superset of JSON, as they can contain arbitrary objects and functions, which JSON can not, and "true" Javascript object notation can elide quote marks or use apostrophes for keys, whereas JSON strictly specifies double-quotes around keys.

The problem that arises in Python and other dynamically-typed languages is that there exists a default deserialization that is so good it very strongly tempts the programmer to use that exclusively. However, as good as the default deserialization may be, it's also quite dangerous, for the reasons you mention and more. In strongly-typed languages there's a stronger focus on parsing the JSON instead, which has the advantage of producing objects with stronger guarantees (which isn't quite the same as pointing out there's stronger typing here; you could theoretically get the same guarantees in Python with some sort of JSON schema library or something), but has the disadvantage of generally being more challenging, since it's hard to beat the conciseness of "json.loads(s)" in Python. There generally is a default serialization in strongly-typed languages, but it's far more likely to become inconvenient if you need anything beyond simple numbers and strings, and people generally learn to prefer true deserialization in my experience, unless they, alas, start their program out from day 1 inputting and outputting JSON, accidentally structure their entire program around the default JSON structures, and end up with the exact same problems as you'd get in Python. But as long as JSON isn't the very first thing to go in, you're generally in better shape.

(I have witnessed a Java program primarily written as a map of string to map of string to map of string to string. It was unsalvageable. And for all the cynicism I may occasionally muster, I don't say that often, because refactoring can be pretty powerful in Java, but this was beyond help. It actually had no JSON in sight, but the same fundamental forces were in play.)

Personally, despite preferring the more strongly typed approach in most ways, I must confess that when I'm working in Perl I am generally unable to resist the temptation to just JSON::XS::decode_json, and cover over the differences with unit testing rather than dealing with "true" deserialization. I make myself feel better by also telling myself that if I do anything else, I will confuse my fellow programmers who don't generally expect to see fancy deserialization routines when dealing with JSON, which is true enough, but in my heart I still know guilt.


> In strongly-typed languages there's a stronger focus on parsing the JSON instead

I think you mean _statically_ typed languages here. Python is a strongly typed language.


The definition of "strongly-typed" that Python conforms to is a useless one, because very few weakly-typed languages exist anymore, and even those are only very, very partially "weakly typed" by having operators that are defined to do automatic coercion, generally only between strings and numbers, which isn't even the way the original "weakly typed" was meant. The only truly "weakly typed" language I know of that is still extant is assembler/machine language, which can never really go away, where all you ever have are numbers, and thus absolutely nothing stops you from adding a string pointer to the first element of some structure. (Modulo some distinctions that still may exist between floats and integers and such... even assembler isn't as weak as it used to be, though it is still by no means a strongly-typed language.)

So I don't use the useless definition. By any useful definition of strong vs. weak typing, that is, one that actually creates two or more non-empty, non-trivial sets of members in the universe of discourse, Python is a weakly-typed language.


You can use whatever definition you want in your own head, but you cannot expect anyone else to accept it.

Furthermore, one of the most popular languages in use today is weakly typed: JavaScript!


Here's a plausible definition: Strongly typed languages disallow programs to escape the type system (e.g. put an integer into memory, then treat it as a float, or vice versa, as in the famous Quake fast-inverse-sqrt hack). Oh, look, Python is strongly-typed and C is weakly-typed.

Note that I do not endorse using "strongly-typed" to mean this definition, or any other. There are no useful definitions of this phrase, don't use it, except when correcting people.


I guess you need a term for "typing of moderate strength"


I'm not sure I prefer the strongly typed approach: I come from Lisp, so the approach is: "read it, validate it, and then wrap it in functions to hide the implementation in case we change it."

This works well in Lisp, where the line where objects end, and structs, lists and functions begin, is hazy at best. Besides, with a bit of wrangling, you could probably just pass your validated serialized data to the object constructor as the arguments. Or you could just write a struct, which is simpler than an object, provides O(1) access, and you can still probably easily pass your data structure, or something close, into the constructor as the arglist.


What are serialization formats, then? What makes them different from notations for data structures?


Not gp, but one difference is you can have a notation that can't capture some state: think of the Date class in JavaScript. JSON can't serialize this without resorting to string encoding, whereas something like a protobuf or pickle could.
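The same gap shows up with Python's datetime, to pick a rough analogue:

    import datetime
    import json
    import pickle

    now = datetime.datetime.now()

    # json.dumps(now) raises TypeError; you have to fall back to a string
    # encoding such as ISO 8601 and remember to parse it again on the other side.
    encoded = json.dumps({'when': now.isoformat()})

    # pickle round-trips the object itself, type and all.
    assert pickle.loads(pickle.dumps(now)) == now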


What's really being described there, I think, is the notion that you can serialize and unserialize some data and get, to some extent, the "same" data back.

Whilst that's more possible with protocol buffers or pickling (or whatever your language calls it), I can't think of any languages offhand which can round-trip any data. It's generally not possible to serialize objects denoting external resources - such as open file handles or network sockets. It's also often not possible to serialize closures, weak references (without dereferencing them), and not necessarily possible to serialize self-referencing objects: e.g. a list which contains itself - pickle can handle it, but I don't believe protocol buffers can do it.
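The self-referencing case is easy to see in Python, for what it's worth:

    import json
    import pickle

    lst = []
    lst.append(lst)                 # a list that contains itself

    copy = pickle.loads(pickle.dumps(lst))
    assert copy[0] is copy          # the self-reference survives the round trip

    try:
        json.dumps(lst)
    except ValueError as err:
        print(err)                  # "Circular reference detected"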


Some extended sexpr notations (particularly those used in Common Lisp, although many Schemes also support it, as do other lisps) support self-referential data structures.

fds and sockets CAN, IIRC, to some degree, be sent to other processes on the same machine, but it's fairly limited.

As for serializing closures, CHICKEN Scheme's s11n egg is the most prominent, although not the only, example. It's fairly limited, once again, to avoid sending the forest with the banana, as Joe Armstrong would put it.

This has nothing to do with our discussion, I just thought it was cool.


That means JSON is missing support for some kinds of data, that doesn't mean that it's not a serialization format at all. Just one with less descriptive power.


That's fair, I guess it's not really a different class but a different shade of the same thing.


First off, make no mistake, JSON and sexprs ARE serialization formats. I didn't establish this well in my original comment. However, they're designed to be a notation for specific structures: dicts and arrays in JSON's case, and cons cells, specifically Lisp code, in the case of sexprs. This is unlike, say, XML, which defines a tree hierarchy, but NOT what the underlying structures are. That's the parser's job.


From the official source: "JSON (JavaScript Object Notation) is a lightweight data-interchange format. "

http://json.org/

It says nothing about being pressed into service. This is the authoritative source.


From the official source: "Democratic People's Republic of Korea"

http://www.korea-dpr.com/

It's democratic, and for the people. It says nothing about being a totalitarian dictatorship.

Not everything is/does what it says on the tin. You don't have to agree with official or otherwise authoritative sources without question.

(not that I wish in any way to compare Mr Crockford or whoever runs json.org with DPRK or its leadership - I'm just using a deliberately extreme example to highlight that what is written may not be what is, at least not from absolutely everyone's point of view)


I hope you feel clever, because in context, this is far too inapplicable to the discussed reality of JSON to be an actual point.


> I hope you feel clever

I generally do, thanks, though I'm not any sort of genius by any measure.

> because in context, this is far too inapplicable to the discussed reality of JSON to be an actual point.

You seem to be missing a bit of an intentional context switch. The comment was more about the logic of the GP's response to "<something> isn't really X" which was "yes <something> is X, it says so on <something>'s home page", than it was about <something> or X in particular.

So it was relevant in the context of the discussion and the facts being used for reference but not, as you call out, in the context of the subject of the discussion (hence my somewhat defensive clarification of intent in the last sentence)


"lightweight data interchange format" is not the same as "serialization formats".

For one, serialization usually handles binary data as well. And types, lots of types. Serialization includes class definitions, etc., which are used to create a (in this case) Python object. JSON is literally JavaScript Object Notation, and has nothing to do with pickling Python.


Yes it is.

Serialization doesn't have to handle a particular type system in its entirety to be serialization.

A serialization scheme can dictate its own type system; which can be smaller than that of the programming languages which support that serialization scheme.

JSON has a simple type system; it serializes that system.

(Might you be confusing serialization for other concepts like object store databases, or image saving?)


That's the source that pressed it into service: JSON is exactly what it says it is: Javascript Object Notation. Specifically, it's a subset of javascript's, well, object notation.

The point is, the syntax behind JSON was originally designed for a specific language, as a textual representation of that language's objects. It just happened to make a convenient serialization format.


Just to be technically correct, because that's the best way to be correct...

JSON isn't a strict subset of JS. JSON strings can contain the line terminators U+2028 and U+2029 unescaped. JS string literals cannot.


When using parsers like e.g. Jackson or Gson for Java, this process is completely transparent and does not require any active thought from the developer - well, maybe if there are very specific formats that don't map 1:1 with the class that should be instantiated or generated from the JSON object.

It's a bit more tricky in JS, both client-side and in Node. You can't really work with the JSON string there; after parsing you work directly with the resulting object. They're not OOP languages, really. I wouldn't want to work with too much untyped / unstructured JSON in back-end land myself, to be fair.


I've never had good experiences with automated serialisation -- even though it sounds like other people do it with success. What's the secret?

To give you a flavour of the kind of problem: in C# (or rather .NET), Json.NET reads JSON and calls setters on the target class.

That means the setters have to be public, and you don't know what order they will be called in, and you have no real signal about when it is all done. The constructor is no longer enough to guarantee the object's invariants are met.

Most awkward.


It sounds like you're parsing the JSON straight into your business objects, which is the source of the problem. You need an intermediate class which represents a strongly-typed version of the JSON message. So JSON.net goes from a JSON string into this message object, then you write your own code (or, if it works for you, use a tool like automapper), to go from that into your business classes.


This is what I settled on -- at least in the hard cases. And if I understand his acronyms, it's also what @mythz is recommending.

Perhaps I should have done it for the easy cases as well (where the business objects are struct-like enough that it doesn't matter) and just lived with the boilerplate.

But I see little advantage in this over just having a dictionary that I can inspect to initialise my real business object. True that is not strongly-typed, but the stage between the message-object and the business object can have validation errors anyway, so why not treat typechecking as part of that?


In Java land with Jackson/Gson they can use the getters/setters or reflection to find the private fields. The only time it is not completely automatic is when some JSON object mixes cases, e.g. myField1 vs. my_field1. Even then, just adding an annotation fixes it. For any special formats, for example ISO 8601 dates, you can quickly define a serializer/deserializer and be done.

Is it really that hard in c#? It is not something I ever think about in Java.


Even beyond that, Jackson can use a private constructor if you use the @JsonCreator annotation on the constructor and @JsonProperty annotations on each parameter.


That's because you should be serializing purpose-specific DTOs or clean POCOs not Business Objects with behavior.


JSON.NET can use private setters. They just have to exist.

You do have to use thin constructors, but in JSON.NET there's a way to call a method post-deserialization.


Yes, the automatic serialization is not a solution for the most pressing problems presented by the article -- it's just the first of all the things that have to be done at the boundary.

You have some DTO class that is your system's typed idea of the structure of the JSON -- this class is quite useful as implicit documentation, but it really has to stay internal to the boundary. You will use an autodeserializer into such a class and then continue by constructing a real object from the deserialized data that can be presented to the rest of the application. During such construction you can validate state and return errors.

This step can be eased by some validating attributes on the boundary DTO properties, but there is always some custom logic that describes what is acceptable and what is not.


Automated serialization has gotten much better than it was in the bad old days of RPC and COM!


I have nothing good to say about COM, but I'm seriously thinking about gRPC [0] to get away from the sloppy json endpoints we code around today, at work. Before I dive in I would love to hear, what it is that makes that architecture a bad one.

[0] http://www.grpc.io/


Automated serialization is the devil. Gson and Jackson require you to write EJB-style objects to get automatic serialization - default constructors, with getters and setters for each field.

The problem with this approach is that you've completely abdicated the power of the type system to ensure that your objects are valid. What happens if a field is missing from the JSON? Well, that field just becomes null. So now you have one of two options:

1) Write highly defensive code with null-checks everywhere. This is a pain to write, a pain to read, and almost impossible to get right and actually prevent null pointer exceptions. This is a nightmare. Switching to a null-safe language like Kotlin doesn't really help you beyond making sure that you actually code in all the null checks - the code is still ugly and a pain to maintain.

2) Call a (potentially) expensive verification method at the beginning of each method call for your object. This is less error prone than having null checks everywhere, but it's not much of an improvement. Because verification happens not at object creation time but rather when it's used, you'll find yourself with a verification exception at the entrance to some business logic where the JSON was passed to your system a week ago, immediately stored in a schema-less ORM, retrieved now, so you kind of have an idea that you have some client which didn't populate the field, but you have no idea which of the many, myriad versions of the client is responsible. So now you're fucked, and you're doubly fucked if you're losing data because of it.

Or you could just take advantage of type safety and write immutable object factories which refuse to instantiate invalid objects. Then you can write clean code using objects which you know must be valid because of type system guarantees. Libraries like immutables.github.io make this a piece of cake.


> Gson and Jackson require you to write EJB-style objects to get automatic serialization

Not the case. I successfully used Jackson combined with Lombok to achieve some really nice DRY class definitions that Just Worked with Jackson. It took a little figuring out and a couple bugfixes to Jackson but it worked. That said, part of the hassle was that I insisted on being able to do this with @Wither so we could have the objects be immutable too.

Then you can write stuff roughly like

    @Value
    public final class Thing {
        String name;
        int age;
        boolean boiling;
    }

Although IIRC I had to use some other random set of lombok annotations instead of @Value to get it to work right with Jackson (this was a while ago, don't recall details)

> The problem with this approach is that you've completely abdicated the power of the type system to ensure that your objects are valid.

Yeah, this was a big problem with my approach. The other choice would have been to use @Builders instead of @Withers. Then you get a little more boilerplate (having to type .build()), but you can guarantee the built objects meet consistency requirements. (In retrospect, I doubt I chose the right tradeoff there)


Automatic serialization is not the devil, overly forgiving automatic serialization is the devil. I use JSON serialization libs all the time in Scala which properly support optional vs required fields. For Java devs, Gson has bad required/optional support [0] but Jackson does have it for creator properties [1]. It is important to qualify statements like your initial one to include the specific situation in which it is bad, instead of using a broad brush.

0 - https://github.com/google/gson/issues/61 1 - http://static.javadoc.io/com.fasterxml.jackson.core/jackson-...


It's a language/framework issue. C# supports first-class properties with get/set semantics. In your methods (actions) in the controller you would write something like this:

    public List<CustomerModel> Get(SearchRequest request)
    {
        .....
        return customers;
    }

Somewhere else in the (configurable) pipeline the framework can decide how to deserialize the SearchRequest and how to serialize the List<customer> based on the Accept-Header.

(CustomerModel/Request would not be business objects. They would only be used on the API layer.)

As for validation, you could just put attributes on the properties of the Request, like [Required], and they would automatically be validated before your Get method is called. Of course if the types don't match, the framework would send the appropriate error.


> Automated serialization is the devil. Gson and Jackson require you to write EJB-style objects to get automatic serialization - default constructors, with getters and setters for each field.

Gson does not require this. The following class will serialize and deserialize fine with Gson:

    class Example {
      private final int foo;
      private final String bar;
    
      private Example(final int foo, final String bar) {
        this.foo = foo;
        this.bar = bar;
      }
    }
More complicated cases will require custom serializers and deserializers, but any class that defines only basic data types (including collections) works just fine.


> Automated serialization is the devil. Gson and Jackson require you to write EJB-style objects to get automatic serialization - default constructors, with getters and setters for each field.

I've used Gson fine with Scala case classes.

Although now I just use Scala JSON libraries, which do not suffer from the two problems you list at all.


Jackson allows you to create immutable objects without any problems, with final fields initialized in the constructor. It also supports numerous annotations that will throw an exception when a field is missing, etc. You just have to know your tool and use it properly, that's all.


If you're just using Python for simple scripting and a random failure now and again isn't going to ruin your day, it's fine to just use json.loads, IMO. I've written quite a few scripts where the time it would take to do it 'right' wouldn't be worth the effort.


I think this might be where tools like Flow or TypeScript can become useful, since you can have typed javascript objects.

On the other hand, the fact that there's no runtime typecheck renders static analysis somewhat impotent when it comes to the result of a network call.


...unless your application is doing an in-place edit.

For instance, if your image compression application throws out my EXIF data that it doesn't understand, I'm going to be pissed. (Unless you give me an option to preserve it.)


> This right here is the correct approach. Serialisation formats should be serialisation formats, ... application data should be application data.

True, although the OP seems to be advocating having your app pretty much ignore serialization altogether in favor of object-oriented design. In particular the author objects to use of dictionaries and lists instead of objects.

It is true that if you're designing an application with a json api in mind, you're likely to stick with the data structures that are easiest to serialize.

Personally, I started writing programs that way before json became so common. I did it simply to take full advantage of the native data structures and to avoid prematurely confining myself into an object hierarchy that wasn't a good fit for the problem domain. It also winds up making code more generic and easier to rewrite in a different language if necessary (for example, moving server-side code to client javascript).


That's the approach I took with my DNS library (https://github.com/spc476/SPCDNS)---extract the DNS packet into a C structure that's easier to deal with (for instance, the A RR structure: https://github.com/spc476/SPCDNS/blob/master/src/dns.h#L270).


I'm not entirely in agreement.

Use of object-oriented programming paradigms here would merely distribute the logic that is necessary to achieve the desired mapping over multiple points in the code.

The example function presented is only marginally too complicated. I'd split it in two: one to obtain the book list given the same arguments as the example function, and one taking the result as its only argument to build the mapping.

I find myself shying away from rigorous adherence to encapsulation more and more these days. I prefer small functions that operate on data explicitly.

Edit: and I'm a bit confused how the example has anything to do with "JSON-driven development", other than the coincidence that a hash/dictionary is the core data structure being manipulated here. This example function could exist and be (mostly) reasonable had JSON never existed. I'd expect to see an argument that the JSON serialization schemes that abound are problematic, given the title.


This. I've been programming this way for over a decade, long before JSON was a thing. I find I rarely need anything more than a list or a dict for most of the data manipulation I do. Being on the web has only strengthened my tendency for this, since everything ends up being stringly typed anyways. Nearly every function/API I write is: get some data from somewhere (hopefully serialized), manipulate the data, return data (very possibly serialized). Nearly every time I've seen coworkers try to improve things with classes, it complicates the code, and often adds little encapsulation given how much we do is reliant on external data sources.

Every once in a while I think how nice it would be to be able to use typed data and smart setters to avoid much of the bounds checking I have to do, but I find there's never enough code between the boundaries of serialization to make it worth the added complexity that this introduces (also my problem domain involves mostly copy so most things are basically strings, ints, or datetimes anyways).


I can't stand the development style that takes the associative array values from a web action (POST, JSON parameter...), creates a custom class, dutifully copies stuff into it (with varying degrees of library support and manual make-work), then immediately passes it somewhere else to copy and throws it away thereafter.

Such a waste of typing and perpetual reading.

The alternative deserves a "top level" comment...


Lists are lists, and I have no issues there. It's dicts as objects (often nested) where things get hairy. I often see folks end up relying on internal implementation details of other libraries, or other data sources, and things can subtly break (or explode in a ball of fire).

The more systems I've built, the more I've wanted to have very well-defined seams between "inside" and "outside" -- well-defined interfaces with external APIs, libraries, databases, systems that may be maintained at a different speed, etc. Having an explicit translation/serialization/etc. step forces you to do this. It's not the only way, but pretty much every codebase I've interacted with that doesn't do this gets careless, and it can get really messy when things get any longer than a "script".


> I often see folks end up relying on internal implementation details of other libraries, or other data sources, and things can subtly break

I'm not sure how you can get around this as a consumer of a service? How do you know what is an internal implementation detail? Why are they exposing implementation details?

As a producer of a service there are lots of techniques. E.g.

- Only expose data and provide a spec for that data.

- Provide "helper" classes in target languages for consumers to use

- Publish an API spec + guarantees

- Always maintain backwards compatibility


As a consumer, you're not going to get around it. But what you can do is quarantine it to a specific area within your application. I find dependencies to be a very natural place to divide up an application -- at the very least, to consider "What would have to change if I ripped this thing out entirely?". "What are the methods I would need to implement?", etc, and writing a translation layer implementing that interface.

As a hypothetical, let's say you needed to implement a binary persistence layer in your application. Rather than interacting with S3 in 10 different places, you define a "BlobStore" interface with the methods that you need, code against that interface, then implement an S3BlobStore that handles the calls to Amazon using whatever library necessary.
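A rough Python sketch of that shape, assuming boto3 for the S3 calls (the interface itself is the part that matters):

    import abc

    import boto3  # assumed dependency, only needed by the S3 implementation

    class BlobStore(abc.ABC):
        """The only interface the rest of the application codes against."""

        @abc.abstractmethod
        def put(self, key, data):
            raise NotImplementedError

        @abc.abstractmethod
        def get(self, key):
            raise NotImplementedError

    class S3BlobStore(BlobStore):
        """The one place in the codebase that knows about Amazon."""

        def __init__(self, bucket):
            self._bucket = bucket
            self._s3 = boto3.client('s3')

        def put(self, key, data):
            self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

        def get(self, key):
            return self._s3.get_object(Bucket=self._bucket, Key=key)['Body'].read()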


With the added benefit that you can reuse these functions for new data shapes more often than I expected when switching to this style.


And better testability.


I think I failed to represent the scale in the example. That's a heavily modified function from a code base I'm working on, and the inventory, book, cell etc. dictionaries and the lists containing them are all over the place, with similar looping logic (e.g. find the item with a given label) and combinations repeated everywhere. Adding the objects to the above function would of course complicate it in the sense that it would get longer, but it would improve the actual example considerably. I will try to come up with better sample code that represents my worries better.


Still, if there's way too much looping logic strewn around the code, that's probably because the code lacks good abstractions for the most routine data access tasks in this code.

If the code used more objects, each data type would be packaged up with its own methods for looping over the data. For code that doesn't wrap every bit of data in a specialized type, you may find that a few small utility abstractions will clean up your code substantially.

The tradeoff is that you must import these functions in every module that needs to use them, since they aren't passed to your code with the object.


I'm also confused about this example. I've often seen similar code in C using structs. What's really missing for me is some context about why he wants these data structures and what he's going to do with them. Essentially he's doing a join on 2 tables and you are left with the thought, "Why do you need to do that join?"

I think what he's really trying to get at is that he dislikes the style of programming espoused by one of the child posters: make everything a dict/hash and write filters that manipulate those dicts/hashes. I think the reason he dislikes it is for exactly the reason I'm confused about his example: you can lose track of why you need the types in the first place.

One thing you often see in Javascript (and I presume Python, although I don't have much experience in that ecosystem) is the idea that types don't matter. You have an object (essentially a hash) and you can transform it any way you want. If it is slightly more convenient to access your data in a different way, then transform, transform, transform.

Now all your functions have different signatures: "No, in this function we use the store inventory, which is exactly the same as a book list, but grouped by store". And then you have 25 different functions all doing slightly different versions of the same thing to keep track of all the weird mutations of types along the way.

Again, this isn't new stuff. We've been writing crappy code like this for decades. One of the nice things about languages like C++ is that it's such a PITA to define arbitrary data structures that you avoid doing it, but you still see variations of that theme even there.

As for OO or not OO, I think it's a red herring. If I have functions: make_foo(bar, baz), print_foo(foo), manipulate_foo(foo), or if I have a class called Foo with a constructor(bar, baz) and 2 methods called print() and manipulate(), it's exactly the same thing. Even if you write the equivalent code functionally, mostly all you are doing is moving the context (bar and baz) out of the heap and putting it on the stack (yeah... I know... lack of mutability is a pretty important bit too ;-) ).

This is almost as long as the original rant, but I'll jam one more thing in. Serialization, I think, has little to do with the problem except that people don't know how to separate their concerns at layer boundaries. The main bad idea that perpetuates is that I should have the same data structure in my database as is in my business logic as is in my UI views as is in my UI wigits as is in my communication protocols. Back in my day, we even thought that it was a good idea to serialize entire objects (with executable code!) from one end to the other, so I guess it's getting slightly better ;-)

To sum up: you can't ignore types even when it is easy to morph types in your language. At your layer boundaries you also need to transform your data from one set of types to the other set of types (and you should never expect that a 1:1 mapping is automatically going to be a good idea). Within your layers you should never mutate your types and you should write functions with clear signatures. OO helps you do this. Non-mutating state is also a really good idea and functional helps you do this.


I disagree with the anemic object argument. If an object is just there to store data and no behaviour, then that's fine - don't add behaviour if it doesn't need it. A large portion of back-end services are CRUD and data wrangling operations anyway - as in, convert data format A to data format B (which I guess could be a constructor or factory method if you're comfortable with having the conversion logic in a data class).


Especially true if your business objects are generated code, e.g., protocol buffers.

Combining business logic with business objects is a mistake. That's a textbook example of tight coupling.


> Combining business logic with business objects is a mistake. That's a textbook example of tight coupling.

Isn't that a textbook example of object oriented programming? Whether OOP is a mistake in and of itself is another question ...


Textbook examples of object oriented programming are notoriously bad. Stuff like,

  Dog dog = new Dog("Spot");
  dog.bark();
These are toy examples meant to quickly introduce the mechanics of objects, not teach good software engineering patterns.


Where else are the business rules of an object supposed to go? Textbook? It's what you're supposed to do!

And if you say in a bloody "service", which just has the object passed as the first variable into every bloody method, I swear I'll come down there and strangle you with your own keyboard cord.

That's the actual textbook definition of tight coupling.

I'm working on a project which has 5 layers: business objects with no methods, a DTO layer, a DAL layer, a service layer and finally the website. All incredibly tightly coupled and utterly pointless.

It's an utter nightmare. Every time I touch the code half of it disappears, right now I've still deleted more lines than I've added while adding a ton of functionality.

That idea is so broken because it violates KISS, YAGNI and DRY all in one in a futile attempt to "decouple" things which are obviously coupled because the data is needed to perform the business rules.

There's a malaise in modern programming and it's the dogmatic pursuit of decoupling over simple and clear code.


I agree, but the problem is that usually the representation of the data comes before the logic you need around it, which can accumulate over a period of months or years. Depending on the application, depending on the programmer(s), that logic can turn into a real mess since there's no obvious place for it to live. This reduces code reuse, which leads to bugs.

It's not always appropriate, but building some language-idiomatic encapsulation around data from the very start makes it much less likely that the inevitable addition of hundreds or thousands of lines of logic will descend into incomprehensible spaghetti hell. This doesn't have to be OOP; it could just as easily be e.g. a module in a purely functional language.


Very good point. I would say that if your object is doing e.g. validation, or if/then/else'ing on field values to normalize them somehow, it's already far from anemic. But the key point is that you should not put data in objects, and then put the business logic, as in the small code sample, into some routine that simply accesses fields. That's the anti-pattern.


> If an object is just there to store data and no behavior

Then why do you have it at all?


To define a valid shape for related data.


I suppose because the language doesn't have typed records. You have to simulate them with objects.


Sometimes many different values are related to each other and it is useful to keep them close together for explicitness (and other reasons).


... to store data?


The main reason this happens in Python is that creating actual datatypes is incredibly clunky (by Python standards) because of the tedious "def __init__(self, x): self.x = x". The solution here is to have a very lightweight syntax for more specific types, e.g. Scala's "case class".
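Spelled out, the clunkiness looks like this (a throwaway Point type, invented for illustration); every field gets repeated in __init__, __repr__ and __eq__, which is roughly what a Scala case class, or the libraries mentioned in the replies, would generate for you:

    class Point(object):
        """What a small 'record' type costs in plain Python."""

        def __init__(self, x, y):
            self.x = x
            self.y = y

        def __repr__(self):
            return 'Point(x=%r, y=%r)' % (self.x, self.y)

        def __eq__(self, other):
            return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)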

I'd also argue for using thrift, protobuf or even WS-* to put a little more strong typing into what goes over the network. Such schemata won't catch everything (they have to have a lowest-common-denominator notion of type) but distributed bugs are the hardest bugs to track down; anything that helps you spot a bad network request earlier is well worth having.


An article about the "attrs" library was posted here a couple weeks ago. Really highlighted the tedium of Python objects while offering a neat solution.

https://glyph.twistedmatrix.com/2016/08/attrs.html
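For a quick taste, a minimal attrs class looks roughly like this (Book and its fields are made up):

    import attr

    @attr.s
    class Book(object):
        title = attr.ib()
        count = attr.ib(default=0, validator=attr.validators.instance_of(int))

    Book(title="SICP")                # Book(title='SICP', count=0)
    # Book(title="SICP", count="3")   # would raise TypeError from the validator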

Regarding protobuf, I'm a bit disappointed with the direction of version 3. Fields can no longer be marked as required - everything is optional; i.e. almost every protobuf needs to be wrapped with some sort of validator to ensure that necessary fields are present. I understand the arguments, but I did enjoy letting protobuf do the bulk of the work making sure fields were present.


Required fields are bad; don't use them.

> You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead.

https://developers.google.com/protocol-buffers/docs/proto#sp...


They're a tradeoff. Sometimes you really are confident enough that this attribute will be required forever that the saving of not having to write custom validation is worth it.


That's a really interesting article, thank you!

EDIT: I liked it so much that I posted it here:

https://news.ycombinator.com/item?id=12359522


How, if at all, does attrs interact with serialization and deserialization?


> Named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of position index.

https://docs.python.org/2/library/collections.html#collectio...
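Roughly, with a made-up record type for the article's book-inventory example:

    from collections import namedtuple

    Row = namedtuple('Row', ['shop_label', 'cell_label', 'book_id', 'count'])

    row = Row('shop-1', 'A3', 42, 7)
    row.count                          # 7, accessed by name
    row[3]                             # 7, still usable wherever a plain tuple is expected
    shop, cell, book_id, count = row   # unpacks like a tuple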


Named tuples are pretty great! I used to think their immutability was a drawback, but I'm starting to come around to the opposite point of view.


I've not yet made use of it, but using the `type` builtin to create new classes quickly looks promising.

https://docs.python.org/3.5/library/functions.html#type
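For instance (a throwaway Point class, just to show the mechanics):

    # type(name, bases, namespace) builds a class on the fly.
    Point = type('Point', (object,), {
        'x': 0,
        'y': 0,
        'coords': lambda self: (self.x, self.y),
    })

    p = Point()
    p.x = 3
    p.coords()    # (3, 0)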


At one company I worked at, we used Avro to transfer data over the network. It's strongly typed with schemas, and it has both a compact binary form for transfer over the network and a text-based form for storage on disk that looks like JSON except field order matters (the schema and data are stored in separate files).


aeruder already posted the awesome glyphobet post on attrs; I agree with everything in there. The Python object protocol is great, but difficult to use for small classes. If you are not doing some kind of schema validation on REST endpoints, you're doing it wrong, I would say. But JSONSchema is also really sucky; writing more JSON to validate JSON is not my idea of simplicity. Will have to look at the alternatives at some point.


> The main reason this happens in Python is that creating actual datatypes is incredibly clunky

It's not clunky, it's outright impossible. Datatypes are inhabited by compound values (data constructors applied to arguments), but Python simply doesn't have compound values. All it has is object identities, which are primitive and indecomposable values no matter how compound the object is.

Sadly, the same is true in Scala.


This basically repeats the ORM arguments/counter-arguments, but now it's a slightly more complex data structure instead of the DB-row-as-hash/array you get there. "row-driven" in this context often leads to barely wrapped DAO Objects.

On the other hand, sometimes (surprisingly often) a hash is good enough and the effort spent in modeling the database (...) doesn't need to be replicated.

And as with ORMs/SQL generators/DAOs/etc., there's a whole spectrum of solutions and you really have to look at the task to see what's appropriate...


This isn't JSON-driven development; it's just choosing to apply logic over loosely-typed data structures instead of named constructs. It's more awkward in Python because it doesn't have syntactic sugar for indexing into an object the way JavaScript has.

But using clean built-in data structures instead of named types has its benefits, especially if you need to serialize for persistence or communication, as it doesn't require any additional knowledge of Types in order to access serialized data, so you can happily consume data structures in separate processes without the additional dependency of an external type system that's coupled to and needs to be carried along with your data.

This is why Redux uses vanilla data structures in its store and why JSON has become popular for data interchange: any valid JSON can be converted into a JavaScript object with just `JSON.parse()`, which saves a tonne of ceremony and manual effort compared to the old school way of having to extract data from data formats with poor programmatic fit, like an XML document, into concrete types.

If your data objects don't need to be serialized or accessed outside of the process boundary then there's little benefit to using loosely-typed data structures, in which case my preference would be using classes in a static type system to benefit from the static analysis feedback of using Types.


> as it doesn't require any additional knowledge of Types in order to access serialized data

You still need to know the shape of the data you're working with, or you won't get anything useful done. So you can't skip defining types or a format, you're just skipping the tools that help you follow said format.


You only need to know how to access the data you need, not the entire class structure that's coupled to the monolith that created it.


Maybe they use untyped hashes and arrays just because there are no other data structures in JS?


Anemic objects, and whether they are harmful or harmless, have been debated in software engineering for a long time.

I find over-relying on encapsulation more harmful than useful nowadays, especially if you are going to write scalable software that is inherently distributed. For example, hiding database access behind a simple getter function makes another programmer ignore the performance implications and other issues that may arise.


Yes, but OTOH, it lessens the likelihood of errors, and means you'll have to rewrite a minimum amount of code when you, say, switch from MySQL to Postgres.

Abstraction always lessens awareness of that which is abstracted. Decide where to draw the line for your app.


It sounds to me like these arguments aren't so much against JSON, per se. They're against using JSON.parse() (or json.loads() in Python, json_decode() in PHP, or whatever) as your entire data-import process.

Instead, the argument goes, one should load the JSON, walk the resulting structure, and use it to build your native data structure/objects/whatever. Similarly, when the time comes to save, you crawl through your native structure to build a dict/array/primitive structure, then call JSON.stringify() (or the analogous function) to serialize that.

Uncoupling your data structure from the serialization format, though, is really just basic good software design anyway, is it not? Does anyone argue in favor of what this article calls "JSON-driven development" as a design principle? Or is it just a shortcut that developers - and I am no less guilty of this than anyone else - sometimes take in the interest of getting a quick-and-dirty solution out the door?

Yes, working directly on the output of JSON.parse() is a code smell. But I'm not sure that claiming there's a rising trend of "JSON-driven development" is entirely founded. It's just people taking shortcuts.


This. "${PRACTICE}-driven development" suggests a practice that someone actively pursues because of perceived merit rather than a shortcut taken because of time/resource constraints.


While I see the point that Ulaş is getting at, I wouldn't call this JSON-driven development. I think JSON-driven development would use abstraction layers that are based on JSON, like JSON Schema, and perhaps an OOP library that leverages it.

What I'd actually call this problem is a lack of abstraction. In functional programming, simple data structures are often preferred, and composable functions are used to manage complexity. A functional programmer might declare a function `to_structured_dict(enumerable, path)` and call it with `to_structured_dict(book_list, path=('shop_label', 'cell_label', 'book_id', 'count'))`.
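One possible sketch of such a function (the grouping behaviour is my guess at the intent: nest by the leading keys, keep the last key's value):

    def to_structured_dict(enumerable, path):
        """Group flat dicts by the leading keys in `path`; the last key supplies the value."""
        *group_keys, leaf_key = path
        result = {}
        for item in enumerable:
            node = result
            for key in group_keys[:-1]:
                node = node.setdefault(item[key], {})
            node[item[group_keys[-1]]] = item[leaf_key]
        return result

    book_list = [
        {'shop_label': 'A', 'cell_label': 'A1', 'book_id': 7, 'count': 3},
        {'shop_label': 'A', 'cell_label': 'A2', 'book_id': 9, 'count': 1},
    ]
    to_structured_dict(book_list, path=('shop_label', 'cell_label', 'book_id', 'count'))
    # {'A': {'A1': {7: 3}, 'A2': {9: 1}}}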


And then EVERY programmer would further abstract that, as you really want the bare minimum of your code depending on your data structure internals.


Yeah, I was seeing code like that in Java long before the term "JSON" was coined.


If you're in Python, and are afraid of "anemic" objects, I would recommend checking out collections.namedtuple. It's a fantastic lightweight and performant object-like data structure.

You also get a few additional features, such as in-order iteration, the parameters are fixed at run time, and there's a method for turning it into an ordered dictionary (which is serializable in, wait for it, JSON).
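For example:

    import json
    from collections import namedtuple

    Book = namedtuple('Book', ['title', 'count'])

    # json.dumps(Book('SICP', 3)) would emit a plain JSON array;
    # _asdict() keeps the field names.
    json.dumps(Book('SICP', 3)._asdict())   # '{"title": "SICP", "count": 3}'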


If you're not limited to the standard library, https://github.com/hynek/attrs is also worth taking a look at.


I just found out about it in this article, and already posted a new story (it looks that good):

https://news.ycombinator.com/item?id=12359522


> Once you go dict, you won't go back. This style of development is too easy, since dictionaries are baked into Python, and there are many facilities for working effectively with them.

How is this an argument against using dictionaries?

After 10 years of Python development, I do find myself using dictionaries rather than objects, in just the way that the author proscribes, but I'm finding it to be a genuine pleasure.


REPENT!!! :-)

Yeah, get shit done, YAGNI, and all that.

If it needs a wrapper, make one for the repeated access types. If it needs one or more bona fide objects for passing around, updating, and general good behavior, then make them as needed. Otherwise, there are 15 other little jobs that need coding, and we gotta move on.


And what happens when one of those keys changes?


You change your code. Loose coupling is a nice goal to aim for, but at the end of the day, somewhere deep down inside the code, you have to tightly couple to actually get anything done. Where that transition occurs is entirely at the programmer's discretion.


But there's a difference between changing it once when you serialize/deserialize it, and changing it every time you try to access the key.


Writing the code with the assumption that a key will change is on the same level with premature optimization, in my opinion.

It might be reasonable to make the assumption for some keys. In these cases a function taking the data and key as arguments and returning the value is sufficient and appropriate.

Frankly, if the code base is so littered with references to that specific data and key combination it might actually indicate a poor design.


"Writing the code with the assumption that a key will change is on the same level with premature optimization, in my opinion."

Considering how easy it is, and how often I've had keys change on me, I have to strongly disagree.

"Frankly, if the code base is so littered with references to that specific data and key combination it might actually indicate a poor design."

That's kinda the point of the article.


A single constant will fix that problem.


Or even just a search and replace...


I think this article is somewhat off-base. The problem isn't JSON, it's lack of respect for separation of duties. JSON is just a data exchange format.

Want to program in an OO way using JSON? Easy. Just build a factory to generate objects from JSON input. Put your validation and error handling right there. Now you can get a known valid object from the JSON, a class instance with all the encapsulation and business logic your heart desires. Need to share it with the outside world? Provide a JSON output method.
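Something like this, as a rough Python sketch (the Order class and its fields are invented):

    import json

    class Order(object):
        def __init__(self, order_id, quantity):
            if quantity < 1:
                raise ValueError('quantity must be positive')
            self.order_id = order_id
            self.quantity = quantity

        @classmethod
        def from_json(cls, raw):
            """The factory: the only place that knows about the wire format."""
            data = json.loads(raw)
            return cls(order_id=data['order_id'], quantity=int(data['quantity']))

        def to_json(self):
            """The JSON output method for sharing with the outside world."""
            return json.dumps({'order_id': self.order_id, 'quantity': self.quantity})

    order = Order.from_json('{"order_id": "A-17", "quantity": 2}')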

Translating data formats is at the heart of day-to-day programming. It ain't rocket surgery. Fix the problem, not the blame.

(And if you think JSON sucks, believe me, you never dealt with data file formats from the pre-XML days!)


The author isn't saying the problem is JSON.

The same programming style exists quite a bit in older PHP code as well. This is because one of its primary data types is a list/hashtable hybrid. And JSON is similar -- it promotes those same structural types: the list and the hashtable (array and object, respectively). So programmers are using them, not just for building structures for data interchange, but for actual programming logic.

The fix for the problem is just education.


There is so much wrong with this blog post that I don't even know where to begin. He appears to have included Python-specific details in his list of why he hates lists and dictionaries. Apparently Python throws exceptions if a key doesn't exist and seemingly has no Maybe/Option alternative? I don't know if that is true or not.

He claims using lists and dictionaries means you lose encapsulation - does it? A smarter programmer would realise that actually it entirely depends on the _types_ you are storing in those data structures.


The rule of thumb I've always used for when to use OO is "will there be more than one extant object at once or not?" If yes, and especially if these objects need real behavior, then use OO.

If you're essentially going through one object at a time, then discarding them, you may just be doing conduit data processing, and so there's little advantage to using objects. I think what's missing in this (well-written) analysis is this distinction; if you're slurping data from one place, making a few changes (or especially if you're not making any), then sticking it into a DB or vice versa, OO may be the wrong choice.

Ask yourself while writing the code: "are these active, behavior-driven objects that need encapsulation and relatively sophisticated behaviors, or is this just data I'm doing some relatively simple processing on?"


The author has lots of good points. Because I write most of my Python in the style he is advising against, I recognize that style has issues. The main issue for me is that a dict of dicts of dicts is not an interface. It doesn't have any constraints. It doesn't communicate expectations for use for the actual intent of the code. The best you can do is a comment explaining what to expect and a lot of error checking.

That said, almost all of the python I write these days is in the form of functional transforms on built-in data structures. And I love it!

There was a great Pycon2012 talk titled "Stop Writing Classes". You can find it linked and discussed here https://news.ycombinator.com/item?id=3717715



On the one hand I agree with the OP that directly interacting with JSON is not really a good idea, but on the other hand I completely disagree that behavior should be shoved into data objects. Also I think part of the problem is Python doesn't have much typing (I know they recently added optional typing in Python but I don't think many use it).

As more of an FP guy I'm a firm believer in the separation of behavior and data. Clojure's Hickey sort of has a valid point... it's freaking data... stop making it complicated to access it.


I'm surprised no-one has linked Steve Yegge's Universal Design Pattern – http://steve-yegge.blogspot.co.uk/2008/10/universal-design-p...

It argues that loosely defined objects are an excellent design pattern, but I'm too tired to decide if it is directly relevant to this.


It's called a hash. Or a dict. Or a map. Not JavaScript Object Notation, FFS.


At this point everyone should be using an evolvable schema format (Thrift, Protocol Buffers, Avro, etc) when they are storing or transmitting their data if they want to run an always-on service - there is no downtime for migrations in the real world. Trying to do this ad hoc with JSON is a lost cause and will eventually lead to failure at runtime or, worse, data-loss situations.


JSON isn't un-evolvable. In fact, thrift can serialize to JSON.

What makes thrift evolvable in practice is that we don't remove fields and don't add mandatory fields. The same discipline can be applied to JSON definitions.

Well thrift also tags all fields with integers, so a consumer with an older schema can parse a record with a newer schema, skipping the new fields. Of course JSON trivially has this property.

Maybe the key here is "ad-hoc"; something like JSON-schema is needed.


Yep, I mentioned below that using JSON as a serialization format is fine but you still need to specify a schema and understand what happens when you read data written by newer/older code.


Anyone have a good blog post handy on this?


This is a related post that talks about the entity store that I built for my startup that was ultimately acquired by Twitter. Twitter internally had their own similar store called ThriftStore. Google also builds their systems like this using protocol buffers. It is a pretty easy pattern and could theoretically be done with JSON if you provide some kind of schema and evolution strategy on read.

https://javarants.com/havrobase-a-searchable-evolvable-entit...


At least lists and dictionaries map relatively well to a tabular (SQL) format. Objects don't map well at all! Anyone who's spent enough time with "mature" ORMs knows this. Especially when there's a deadline and you have to write "native" SQL just to get whatever the hell you needed in the first place. "Well maybe you should have read everything and understood the ORM to its most minute detail..." NO! That's the whole point of abstraction! If I understood everything about that code, I'd be better off re-writing it to better suit MY specific problem. Look, I don't want to be another OO basher. OO definitely has a place in complex systems like game development, where the lives of the objects are longer than a page refresh. But in web dev, it's becoming increasingly obvious to me that the OO paradigm is a huge time suck. /rant


I feel like we have this discussion at work daily involving nhibernate. It is an abstraction that makes 80% of work quicker and easier, but what it makes easier and cleaner would have been trivial anyways.


I find that code that uses dictionaries a lot ends up with mysterious unnamed types disguised as dictionaries in various places throughout the system. They have required fields, and must be interacted with using business logic that is not obvious. This becomes a real problem when the original authors of the system are gone, and new maintainers have taken over and have to implement new features.

By using objects, or lightweight objects like namedtuple, which have already been mentioned in other comments a bunch, you formally document the data-structure. You give it a name, expected fields, and required behaviours when interacting with it. The code becomes much easier to follow and understand clearly. Bugs don't creep in when a new maintainer forgets about the mysterious undocumented required business logic.
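
For example, even a namedtuple gives the structure a name and a fixed set of fields for almost no ceremony (field names here are just illustrative):

    from collections import namedtuple

    # The structure now has a name and documented fields, instead of being
    # "some dict that is expected to have shop_label and count in it".
    InventoryEntry = namedtuple(
        'InventoryEntry', ['shop_label', 'cell_label', 'book_id', 'count'])

    entry = InventoryEntry(shop_label='A', cell_label='3F', book_id=42, count=7)
    entry.count  # 7; a typo like entry.cout fails loudly with AttributeError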


Lots of religion in this thread!

I think the point is that if JSON is your data exchange format, it can be bad to let that structure propagate into your application.

so in general you should prefer:

json => <chosen languages best form for dealing with data>

over json => <chosen languages tools for dealing with json>

Different languages are going to have different mechanisms. Some languages you may abstract from json completely, some languages may natively deal with json, and your persistence layer may deal with json also.

So what you need is a well considered design that takes advantage of your chosen languages philosophy / mechanics, whatever that may be. There is no one way to design anything. The thing to avoid is - not working out how to structure your code to make things easy / appropriate to the task at hand. That path leads to messy code.


For crying out loud, you don't have to build an object hierarchy around the thing (although it would make sense to in this case), but at least have the common sense, or the sense of shame, to abstract away data lookups into separate functions. That's data structure abstraction 101.
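
Even something this small localises knowledge of the structure to one place (names invented for illustration):

    def book_count(inventory, shop_label, cell_label, book_id):
        # Only this function knows the inventory is a dict of dicts of dicts.
        return inventory[shop_label][cell_label][book_id]

    def set_book_count(inventory, shop_label, cell_label, book_id, count):
        inventory.setdefault(shop_label, {}).setdefault(cell_label, {})[book_id] = count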


I take a couple of issues with the author here. I don't personally worry much about breaking away from strict OOP. When a pattern like this develops it is usually because the data is too dynamic for a static property list. An obvious example is for creating reports. No one is going to sit down and hand design every single multi column report in a large project (I tell a lie, people do, it just makes the code base a horror show). By letting the data be more dynamic (Usually with JSON) it is trivial to create generic report structures and populate them.

Additionally, if you are using NoSQL as a backing store then solutions like class serialization don't make any sense since you will need to communicate in JSON anyway.


The code in the example has poor encapsulation, but I do not think it's the "JSON" style that causes that.

Much OOP code includes a hodgepodge of exposed internal state and methods that offer a combination of derived state and behavior to mutate that state.

Often, using data literals (like JSON) can make code clearer by making it explicit what is going on with state (when/if it is being mutated), and making the system easier to snapshot, test, etc.

While much code that uses JSON-like constructs is overly verbose and error prone, adding a bit of structural typing (with Flow) or creating schemas to ensure system invariants (jsonschema) can lead to a system that is easy to reason about and maintain.
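
A rough Python sketch of the jsonschema half of that idea (the schema and field names are invented for the example):

    from jsonschema import validate, ValidationError  # pip install jsonschema

    schema = {
        'type': 'object',
        'required': ['shop_label', 'book_id', 'count'],
        'properties': {
            'shop_label': {'type': 'string'},
            'book_id': {'type': 'integer'},
            'count': {'type': 'integer', 'minimum': 0},
        },
    }

    try:
        validate({'shop_label': 'A', 'book_id': 42, 'count': 7}, schema)
    except ValidationError as exc:
        print('rejected at the boundary:', exc.message)

Once a document has passed validation, the rest of the system can rely on the invariants the schema states instead of re-checking them everywhere.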


Use GraphQL, or really stick with RESTful routes. The more predictable the schema of these dictionaries/hashes/JSON is, the less likely you are to see a mess like the one above. This is true whether you are using an FP approach or an OO approach. Using an imperative coding style when doing ETL will always be hairy.

That function also violates the single responsibility principle. I wouldn't even know where to begin writing a unit test for it, other than by breaking it down into smaller parts. There are design patterns in dynamically typed languages, beyond just OO, that would avoid that mess altogether.


Coupling your code to the JSON you receive over the web can lead to some interesting problems. If the system on the other end decides to make some change you are not expecting, it can lead to errors.

In JavaScript, a simple thing that helps is to use lodash.get and provide a path to the property you are wanting.

  lodash.get(someObject, 'path.to.a.property')
If the path isn't there, lodash.get returns undefined. This is much nicer than getting the error "Cannot read property 'to' of undefined" when "path" isn't there.


Yeah, this is currently pretty bad in Python, which led me to create the jsane library:

https://pypi.python.org/pypi/jsane

  >>> j = jsane.loads('{"foo": {"bar": {"baz": ["well", "hello", "there"]}}}')
  >>> j.foo.bar.baz[1].r()
  u'hello'


Simple rule of thumb: Don't Repeat Yourself. If part of the data structure is accessed in multiple places, create a wrapper routine (or even an object/class) around it. Otherwise, if it's in one place, perhaps "You Aren't Gonna Need It".


Yes, any useful technology will get over-applied, at the detriment of better ways to do things.

Not the fault of the technology, but of the developer who failed to consider alternate ways of accomplishing the same task.


JSON is best when it's solely used for serialization (or config files).

Using it deep into the project makes no sense, the first step in handling JSON should always be to code it into native data structures.


The point made here seems to be that often, these native data structures are dicts and lists, because that is JSON. Meanwhile, you'd often want something else when looking at only the internals.


First step is validation, then conversion, then logic. This keeps the json structure errors and changes from affecting the business logic.
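
As a minimal sketch of those three stages in Python (names invented):

    def validate(raw):
        # Stage 1: reject structurally bad input before anything else sees it.
        if 'count' not in raw:
            raise ValueError('count is required')
        return raw

    def convert(raw):
        # Stage 2: turn the wire shape into an internal value.
        return int(raw['count'])

    def restock_needed(count):
        # Stage 3: business logic never touches the JSON structure.
        return count < 10

    restock_needed(convert(validate({'count': '3'})))  # True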


I think the "anemic objects[domain model]" is a red herring in this case. It would be much cleaner to create separate serialized and deserializers that convert your actual domain models to JSON and back. By the time you are doing something as shown in his example like building a book inventory it should be all proper objects and no primitives dictated by what should be the serialization layer.

Edit: Fixing phone auto correct typo - "property objects" -> "proper objects"


I actually saw this coding style way before JSON, for example at Apple. I even jokingly created DUKE: Developers United against Keyed Everything.

Another place I see this is in the eternal dynamic/static typing debate. A lot of the criticism of dynamic typing comes with examples from JavaScript, Ruby, Python and maybe even PHP. Hardly ever from Smalltalk (or Objective-C), because Smalltalk code tends not to have the types of problems cited. This puzzled me for a while, because I also find these languages somewhat less "solid", yet couldn't quite put my finger on why.

That is, until I realised that all of these languages use hashes as their basic object representation. Coincidence? I think not. So I coined the term "hash language" for these languages, both because they are hash-based and it appears to be easy to make a hash of things in them, possibly for precisely that reason.

That said, I think it's also a mistake to disregard the power this sort of very generic programming brings, especially once you consider objects composed of multiple facets that are interpreted in different contexts.

IMNSHO, the way to combat hash-programming is to provide powerful and convenient metaprogramming facilities for object representation, so dealing with objects generically is just as easy and obvious as dealing with dictionaries.

Not entirely surprisingly, my own language ( http://objective.st ) has some facilities for this, mostly by making identifiers into first class entities. More research needed ;-)


Is the OP familiar with Object.keys()?

You don't have to hard-code explicit dot notation into your code when you're processing JSON or any other hierarchical object serialization format, which is what JSON is.

If you want to make your code more robust, you should process the structure of the JSON document and infer meaning from its keys based upon your position in the tree and of the values of the key names that are meaningful to your application.

This makes it possible to accept any kind of JSON, even if the original format changes, and you won't get uncaught exceptions and your application can decide what to do in a more graceful manner.
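
The same idea sketched in Python: visit every leaf along with its position in the tree, and let the application pick out the keys it recognises and ignore the rest (a hypothetical helper, not a library function):

    def walk(node, path=()):
        # Yield (path, value) for every leaf of a decoded JSON tree.
        if isinstance(node, dict):
            for key, value in node.items():
                for item in walk(value, path + (key,)):
                    yield item
        elif isinstance(node, list):
            for index, value in enumerate(node):
                for item in walk(value, path + (index,)):
                    yield item
        else:
            yield path, node

    for path, value in walk({'book': {'title': 'Dune', 'tags': ['sf']}}):
        print(path, value)  # e.g. ('book', 'title') Dune / ('book', 'tags', 0) sf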

You should also centralize the code that is responsible for serializing and deserializing your JSON wire format and creating objects. There's no reason to have ad-hoc code in each object constructor like his example. A good example of such a thing is dnode https://www.npmjs.com/package/dnode. It handles all the JSON abstraction (in this case for RPC) and you don't even need to worry about the JSON ever again.

This has nothing to do with JSON and more to do with poor design and tight coupling of interfaces.


I may be missing something, so I'd appreciate a correction, but why all that effort when you can use collections.namedtuple and a custom object_hook for json.loads?

    import json
    from collections import namedtuple

    # object_hook converts each decoded JSON object into a namedtuple
    # whose fields are that object's keys (the keys must be valid identifiers).
    data = '{JSON string goes here}'
    fancy_data = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))


I agree with this, and I've raised it several times in Objective-C codebases, which can sometimes end up littered with NSDictionaries everywhere, taking no advantage of the Objective-C type mechanics. I don't think it's necessarily as bad in Python or JavaScript. Because these languages are dynamically typed, the benefits of deserializing JSON to a native model are diminished. It's still valuable, though, because you can guard against bad data by rejecting it at the boundary of your program.

In statically typed languages, however, the added bonus is a lot more significant because the type system increases the benefits of deserializing JSON to native models. Take this Swift example:

    import Foundation
    
    enum SerializationError: ErrorType {
        case InvalidData
    }
    
    struct Thing {
        let a: Int
        let b: String
        
        static func deserialize(fromDictionary data: [String:AnyObject]) throws -> Thing {
            guard let a = data["a"] as? Int, let b = data["b"] as? String else {
                throw SerializationError.InvalidData
            }
            
            return Thing(a: a, b: b)
        }
        
        static func deserialize(fromArray data: [[String: AnyObject]]) -> [Thing] {
            return data.flatMap {
                try? deserialize(fromDictionary: $0)
            }
        }
    }
    
    let data: [[String: AnyObject]] = [
        [
            "a": 10 as NSNumber,
            "b": "Hello" as NSString
        ],
        [
            "a": "10" as NSString,
            "b": 10 as NSNumber
        ]
    ]
    
    let models = Thing.deserialize(fromArray: data)
Not only do you end up with a native array of models, you can also be certain that the type information is correct, because invalid results have been thrown away during parsing.


It's good to use lots of NSDictionaries in Objective-C, in my opinion. The alternative is to create lots of different object types, which takes much more code for very little benefit. If you're just shifting data around then there's no need to define objects for it.


This is what keeps dragging me back to Moose & Perl 5. You describe the attributes of a class, the constructor for that class is created for you, and you can pass in hashes and it will automatically instantiate (and fail if the rules you have set for attributes are not met).

I've found that you can kinda sorta do the same with other languages (Python/Ruby/JavaScript) by writing static factory builders inside the class that do this checking for you and raise an exception or return an object, but to me it still doesn't compare to Moose/Moose::Util::TypeConstraints::coerce/subtype and attributes with the coerce option set. It makes it so easy to coerce a deep JSON object into a deep class structure.

I always try to hunt down things that are similar in other languages (python allows named arguments from a dict, IIRC, and that allows similar things, but you still have to write the constructor yourself), but I've yet to find something that makes it as simple.
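
A sketch of that "static factory that checks or raises" pattern in Python, coercing a nested dict into nested classes (class and field names are invented):

    class Address(object):
        def __init__(self, city):
            self.city = city

        @classmethod
        def from_dict(cls, data):
            if 'city' not in data:
                raise ValueError('address needs a city')
            return cls(city=data['city'])

    class Person(object):
        def __init__(self, name, address):
            self.name = name
            self.address = address

        @classmethod
        def from_dict(cls, data):
            # Coerce the nested dict into a nested object, failing loudly.
            return cls(name=data['name'],
                       address=Address.from_dict(data['address']))

    p = Person.from_dict({'name': 'Ada', 'address': {'city': 'London'}})
    p.address.city  # 'London'

It works, but every class has to spell out its own checking, which is exactly the boilerplate Moose generates for you from the attribute declarations.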


If he used tuples as keys to the dict, he wouldn't have this absurd code and the article wouldn't have been written
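
i.e. something along these lines (field names taken from the code in the article, the rest invented):

    books = [{'shop_label': 'shop-1', 'cell_label': 'A3', 'book_id': 42, 'count': 7}]

    inventory = {}
    for book in books:
        key = (book['shop_label'], book['cell_label'], book['book_id'])
        inventory[key] = book['count']

    inventory.get(('shop-1', 'A3', 42), 0)  # 7

A flat dict keyed on tuples removes all the nested "is this level there yet?" bookkeeping.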


And this is where we end up without easy to use and well supported schemas...

If this was XML, you'd write a very simple RELAX NG grammar (use the compact syntax: http://relaxng.org/compact-tutorial-20030326.html ) that describes the structure of the incoming data, then use it to validate the input data before processing it.

After that, you know data is valid and in the right structure, so you can throw away most of the "is this in the right place?" checks.

JSON and YAML's various schema implementations can't hold a candle to this, and it's been around for over a decade.

The XML ecosystem does have some very bad parts, but it's not all bad, so it's worth learning from places where it actually works well.


In Java land, swagger + dropwizard validation do a rather good job of this with json. I wouldn't be excited to return to using soap/xml all the time.


This is space that the Clojure community has already visited.


And, somewhat, tamed using tools like get-in and update-in.

Clojure.spec is looking promising as a route for making this general style more robust without losing the ease and flexibility.


And every other Lisp community.


And what was their outcome?


heat death.


I agree that the supplied code is improvable. Consider this:

  def set_r(adict, keypath, val):
    key = keypath[0]
    if len(keypath) == 1:
      adict[key] = val
      return
    if key not in adict:
      adict[key] = {}
    set_r(adict[key], keypath[1:], val)

  def build_book_inventory(book_ids, shops):
    shop_labels = [shop['label'] for shop in shops]
    books = Persistency_books_table_read(
      shop_labels=shop_labels,
      book_ids=book_ids)
    inventory = {}
    keys = 'shop_label cell_label book_id'.split()
    for book in books:
      keypath = [book[k] for k in keys]
      set_r(inventory, keypath, book['count'])
    return inventory
First, the author clearly needed "autovivification" as supplied by Perl. We supply a substitute with set_r().

Second, I'd avoid creating local variables like "book_id". It creates mess. We never had the slightest interest in the book_id; it's just part of the wine we are pouring from one bottle into another.

Third, I've preserved (modulo names) the interface of this function but I suspect the surrounding code could also be improved. Also call a list of books "books", not book_list; list is the assumed sequence container in Python. "books=book_ids" is unfortunate; to thrive in a weakly typed language we need variable names that distinguish objects from ids.

Larger point: the author wants to create classes for the various business objects, which is a common enough pattern, but ultimately just makes extra work and redundant lines of code. A relational database can handle a wide variety of objects, with some knowledge of their semantics, without any custom code per-class.

As you know, the difference between dicts and objects in Python is mostly syntactic sugar. We can easily enough make a class that gives dot-notation access to values in a dict, if one objects to the noisiness of foo['bar'].
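
A minimal (and deliberately naive) version of such a wrapper:

    class AttrDict(dict):
        # Dot access on top of an ordinary dict: foo.bar instead of foo['bar'].
        def __getattr__(self, name):
            try:
                return self[name]
            except KeyError:
                raise AttributeError(name)

        def __setattr__(self, name, value):
            self[name] = value

    d = AttrDict({'bar': 1})
    d.bar      # 1
    d.baz = 2  # stored as d['baz']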

If you want to enforce object schema at system boundaries, there are better ways (more compact, expressive and maintainable) than writing elaborate "classes" for each type of object.


More generally, using data structures well-suited to your problem is really important but often underappreciated in software engineering. Elegant code and algorithms naturally follow.

In the example in the article, OO seems like a good way to go.


From an OO perspective I can completely agree with this, and the encode/decode pattern the author is suggesting seems to be the right way to deal with this problem, as it will also hopefully translate into having an esperanto data type which can be used for all sorts of APIs and formats.

But from a functional perspective I would disagree. The code example wouldn't even make sense in that world. You would query it as is and compare it with other data or create new structures as with any other data you handle in your language.


I agree 100% with the article.

I've just written a script which makes every one of the mistakes listed in the article. I am consuming a JSON based API in a long ugly mash of code, exactly as described. It doesn't look pretty.

In my defence, I wrote the code as I was trying to understand the API. I had not read ahead and didn't really know which APIs I would need or how fiddly it would be to bring it all together.

I am now exposed to pain if the API changes or if anything breaks. Time to go back and tidy up the code!


I went back and fixed the script to turn a portion of the JSON API into a set of useful Python objects. The code looks mostly OK now.

It's no surprise that it took about the same amount of time again to tidy it up. Apart from making me feel better, and a promise of less work in the future for updates, it's hard to justify the extra effort.


JSON is the equivalent of a UNIX pipe; it is for passing data between applications on the Internet, just like a pipe is a way for passing data between applications on a machine.


In PHP we call it "Array Oriented Programming" ( http://www.epixa.com/2012/04/array-oriented-programming.html ). So it looks like Python developers have finally discovered this paradigm too. Let's wait for JS developers now.

By the way, PHP has type hints for functions that help you understand the types of the arguments and the return type.


Pretty much the statically typed vs. dynamically typed discussion. It depends on your particular case, and on whether there is a sufficiently defined schema or not.


I don't see how this is related to statically typed vs dynamically typed. This is more a case of Primitive Obsession http://c2.com/cgi/wiki?PrimitiveObsession and a lack of an adapter/serialization layer.


I meant typed vs not-typed. Someone mentioned Clojure, and for me that's the whole point: on one hand, for a whole category of problems the best abstraction is not to have a strong schema, given the variable parts. On the other hand, there are scenarios where you already know the structure sufficiently well. A matter of abstraction IMO.


Just an aside, I'm not sure I agree with the title of "JSON-driven" development.

The problem being described is about using parts of a language in an inefficient or ineffective way to solve a problem.

The solution to this particular problem can be summed up in concepts like encapsulation or DRY. More to the point, let's not blame components like JSON for basically poor implementation.


I prefer XML + schema (for which I use RelaxNG + jing) because this makes validating the input very straightforward and normalizes that process. But I understand the appeal of JSON and use it for some APIs myself. What experiences do people have with JSON.net Schema? Does anyone know of a json schema + validator system that is cross-language?


There's no particular reason objects can't be serialized as JSON and deserialized back into strongly typed objects.


I like to wrap the relevant data in a type that is responsible for creating, validating, serializing, deserializing and mutating that data.

But taking a JSON and passing it around, with no ownership, no predictability... it's the mindset of the tech debt programmer.


I think the general rule is "validate anything coming in from the internet, ever."


Thankfully there isn't really JSON-driven development as a methodology. It's just a bit of a crappy pattern (for many tasks) that far too many people use. Hopefully no one is advocating it as the way to develop software.


I couldn't agree more. My life these days consists 90% of marshalling json around. Hooray for microservices.


So we just need references? Let's all switch to XML (just kidding) or YAML (maybe not kidding)!


Although I coincidentally mostly agree with the conclusion (decode on entry, encode on exit), I disagree substantially with the rest of the article.

> I know that it's now en vogue to sneer at OO

I don't sneer at it. OO has been the dominant coding religion for most of my career and I rage against the educators and propagandists who spend a decade smothering everyone with it. I curse them for all the time I wasted trying to build classes for my data only to realize that if I had used a simple dictionary or list my code would be shorter, simpler, more robust, and more flexible.

The logic in the article above is all premised on object-oriented religion. OO for OO's sake because OO. Using the power of dictionaries is bad, using the power of objects is good.

> It completely defeats object orientation.

You could just as easily say using objects defeats the point of having lists and dicts.

> It offers nothing of the abstraction powers of object orientation.

Most of the abstraction power of object orientation happens when you create the methods. If you aren't defining new classes, you write functions instead. You still have abstractions; what you don't have is the encapsulation of code with data.

> It doesn't say what it's doing.

Sure it does, and better yet, since you are using the standard data types, it will be said in a language that any other Python developer is likely to understand immediately.

> The above code is filled with auxiliary logic that has nothing to do with what it actually tries to achieve.

The above code is filled with auxiliary logic because the author of it apparently didn't write any useful functions for operating on dictionaries.

I'm not sure that any of the auxiliary logic in that code has anything to do with the choice to use dictionaries. It's a standard data munging problem that comes from having data from different sources. You have the exact same problem with objects if you didn't write any useful methods for operating on them.

> but done right, it can be very powerful, especially in big and complex codebases.

And in my opinion, the right way to do Object-Orientation in Python is to not do it until you really know you need it: your code is heading towards big and complex and you need to lock it down and organize the data and methods into well-encapsulated classes. (Although maybe at that point you realize it's not a big deal and don't bother)

Designing around lists and dicts from the start is a much more flexible strategy than trying to get all the encapsulation exactly right on the first try. If you don't have lots of time to spend up front UML'ing an object hierarchy for your big, complex application, you're probably better off sketching and iterating with JSON in mind (and YAML if humans need to edit it). As your application takes shape, it will become apparent where it makes sense to lock down functions and data into objects.

This is especially true given that all of Python's standard types are inheritable classes.


Arguments for dict/list driven development:

NOTE: This isn't JSON-driven development; JSON just mimics the base types (dict, list, string, numeric, etc.) of all languages, which is a big reason it is so common, especially in Python/JavaScript.

- Large, unknown list/dictionary data structures can be deserialized into dicts/lists in any language without issue.

Sometimes keys/data are unknown, such as large attribute data sets that may always have new keys. In that case strong typing to an OO object will always be broken. Example: a Facebook attribute set; keys/data that aren't set will not appear and new ones are added all the time, which would create a cat-and-mouse serialization/deserialization game. A binary structure (offsets) has the same problem when you really need a flexible keyed structure.

One missing key doesn't break your whole serialization/deserialization system the way it would if it were built with strongly typed OO. Validation can be done on accept and, if necessary, the data converted into an OO system.

- If needed, classes backed by (extending, inheriting from, or composing) a dict/list/set that can load in the JSON/dicts/lists and only expose the needed values after validation are useful.

i.e. a class that inherits from or composes a Dictionary<string,object> in C#, for instance, would only fill the keys that are necessary for the view data, not a bunch of extra null fields for keys/properties it might not have. It also has the ability to deserialize objects that may have new keys. Not everything is a perfect world where data structures are known beforehand.

- It reduces complexity many times, no need for an OO serialize/deserialize layer when you are passing back as basic dict/list or JSON.

Why add complexity to something simple?

- Unless you control both the server and the client, real-world data structures aren't a perfect map of keys/values to OO properties.

Assuming they are is asking for breakage in the real world. Someone adds a field to the DB object and suddenly all clients that use it can't serialize/deserialize. Real-world serialization/deserialization has to accept basic types (dict/list), validate, and then use them as needed (some convert to OO objects behind the scenes). I see too many systems where people just have an EF object, expose it over a web API and expect it to work; that is a prime example of poor encapsulation. Some fields don't need to be serialized to public APIs. In Microsoft land, MVVM was created to help stop this practice, but it still creates two sets of OO objects and breaks on any new keys/data (though breaking here may be desired for strong typing).

- Dict/list data structures can easily be set up to have cleaner naming and keys without tons of attributes/helpers.

i.e. a first-name key instead of first_name, FirstName, or firstName. This is friendlier to the naming that is common on the web and in URLs.

- As noted in the article, basic lists/dicts often use less memory and have highly optimized access times.

There are many more reasons...

Dicts, lists and basic string/numeric types are the base types of all languages and of computer science. The reason this style is common is that it is simple to work with these types, without the added cruft of OO when it isn't needed.

OO does add complexity, and if it isn't necessary you are just upping complexity (and memory use) for no reason. It is similar to the complaints of C coders about C++: basic structs and sets are sometimes less complex than C++ OO objects. The same goes for dicts/lists versus some monstrosity of an OO serialization/deserialization system that breaks on every new key or field, forcing you to update the server and the client rather than just increment a version and validate. The longer you code, the more you see this.

OO objects should not be used all the time just like dict/lists shouldn't be used all the time.


Completely agree.


With dynamic languages (certainly Python, Ruby, and JS), there's a definite lack of nudge from the tooling to translate from the "interchange" format into an internal "smart" format (procedural code + everything is a hash = easy hacks). Whereas with something like Scala, the tools make it very clear that unless you serialize/deserialize at the periphery, you're in for a bag of hurt (or, at the very least, fighting with one hand tied behind your back).

This is not to say that one tool is better than the others, but tools do have opinions, and while being more permissive/ambivalent makes throwing together a quick script easier, a tool that nudges you in the direction of building something in a more-sustainable way is useful when building nontrivial systems.


> If an object is just there to store data and no behaviour, then that's fine - don't add behaviour if it doesn't need it.

In that case, you want values rather than objects. Alas, Python doesn't have compound values.


I'm not sure what the dynamic is behind this weird flamewar, but it's definitely not the sort of discussion we want on HN, and your comments seem, presumably unintentionally, to have trolling effects. Please don't.

We detached this subthread from https://news.ycombinator.com/item?id=12358875 and marked it off-topic.


> and your comments seem, presumably unintentionally, to have trolling effects.

I have no idea why stating facts would constitute “trolling”. But, anyway.


It's entirely possible to communicate other things while - first order - only stating uncontroversial facts. What, you don't think your shop is nice? You don't think it would be a shame if something happened to it?

Given that, it's also entirely possible for people to perceive other things as communicated in cases where there's maybe not the intent.


I agree that it's strange, but the discussion has gone badly, and it has something to do with how you engage people. That's what I mean by unintentional trolling.


As tantalor reminds us in a sibling comment, the named tuple works nicely in this role.


All you need to do is use the `is` operator to see how even tuples (named or otherwise) are objects, not values.
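
For instance, in CPython (interning of immutables is implementation-dependent, so results can vary):

    a = (1, 2, 3)
    b = tuple([1, 2, 3])  # built at runtime, so a distinct object in CPython
    a == b                # True  -- same "value"
    a is b                # False -- two separate objects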


> All you need to do is use the `is` operator to see how even tuples (named or otherwise) are objects, not values.

That is not a language feature of Python but an implementation detail. Python implementations are permitted (but not required) to intern any immutable value (which tuples, contrary to your description, are), and "is" is a mechanism for revealing what the implementation has done.

You seem to be equivocating between talk of values as a logical thing (where Python absolutely has compound values, including tuples) and as an implementation feature (where individual Python implementations may or may not implement certain logical values through interning.)


That sounds like basically the same newbie pitfall that exists with code-literal Strings in Java: just because a certain comparison operation can work in some circumstances (where the underlying platform makes an optimization) doesn't mean it is a safe/sane choice in general.


> That is not language feature of Python but an implementation detail.

So you're saying that the behavior of the `is` operator is implementation-defined?

> values as a logical thing (where Python absolutely has compound values, including tuples)

The (informal, unspecified) metalanguage that you're using to reason about Python programs has compound values. Python itself doesn't.


> So you're saying that the behavior of the `is` operator is implementation-defined?

No, the behavior has a standard definition: it reveals whether the operands refer to the same in-memory construct.

Whether immutable values are stored in the same in-memory construct is, however, AIUI, implementation dependent.


> Whether immutable values are stored in the same in-memory construct is, however, AIUI, implementation dependent.

If I can't bind it to a variable, it isn't a value. You can't bind the list [1,2,3] to a variable, because Python has no such thing as the list [1,2,3].


I don't know why we are talking about lists, here, since (implementation details aside) lists in Python aren't conceptually values; for one thing, they aren't even immutable.

If you meant not to change the subject from tuples, yes, it's true that whether the tuple (1,2,3) -- which is logically a value -- has a unique in-memory representation is not guaranteed at the language level in Python (and, in fact, it does not in the most common implementation.)


Err, sorry, yes, pretend I said “the tuple (1,2,3)”.

Regarding in-memory representations, the whole point to using values is that you don't care about the representation. A value may be represented in a myriad different ways, but from within the language (as opposed to, say, using a memory debugger), you can't observe the difference. If you're allowed to probe differences between two representations of the same value, the value abstraction is leaky.


That's a useless distinction. Everything is an object in Python. Does that mean Python doesn't have values?


It has primitive values:

(0) small enough numbers

(1) True, False, None, etc.

(2) object references

But it doesn't have compound values.

And the distinction isn't useless. Values have a richer equational theory than objects, enabling lots of automatic optimizations.


...And outside FP, nobody cares, or uses that definition of value. As noted in my above post.


It's useful when writing Prolog programs too.


Logic programming's actually pretty close to FP.


Not really. Prolog is first-order. Functional languages, just like object-oriented ones, are higher-order.


"Close," not "is."


[flagged]


Don't mock me. And yes, I did mean the notion of mathematical value and variable. That's one of the core tenets of both.


[flagged]


They are somewhat close in paradigm: They both favor declarativism, and have mathematical values. Given, they're pretty far apart in paradigm, but those are some strong similarities.


> declarativism

I'm not familiar with that term. Could you give a rigorous definition?

Anyway, after some googling, I found a very plausible definition that makes functional programming not a declarative paradigm: http://semantic-domain.blogspot.com/2013/07/what-declarative...


While that's a good point, the namedtuple is value-like enough to address the "anemic objects" problem.

Could you expand on the advantages of this:

    (a, b, c) is (a, b, c)
being True instead of False?


The runtime system's memory manager could automatically hash-cons equal compound values, reducing their memory footprint. It's like the flyweight pattern, except the runtime system gives it to you for free.
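
Python's string interning is a narrow, opt-in version of the same idea (CPython shown; sys.intern is the Python 3 spelling):

    import sys

    a = sys.intern('some value ' * 10)
    b = sys.intern('some value ' * 10)
    a == b  # True -- equal values
    a is b  # True -- deduplicated to a single representation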


I agree that's nice for memory use. It's totally consistent with the immutability of tuples, too. I suppose Python has elected to take the memory hit to reduce complexity in the interpreter?


As things stand now, the main reason why Python can't do it is because it could potentially break programs.


> As things stand now, the main reason why Python can't do it is because it could potentially break programs.

Since the documented behavior of Python is compatible with what is suggested as a change, any program that relies on the behavior not reflecting as described in the "change" is asking to be broken (and quite possibly already broken across different implementations -- including different versions of the same implementation.) Immutable types in python already are defined with the semantics of values, in that "for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed." [0]

[0] https://docs.python.org/2/reference/datamodel.html


So, which is the case?

(0) Python provides a compound value abstraction, but this abstraction leaks, because we can distinguish between multiple representations of the same compound value.

(1) Python doesn't provide a compound value abstraction.

With a few exceptions (e.g., garbage collection, which pretends memory is infinite, even though it's not), I tend not to consider so-called “leaky abstractions” actual abstractions, but if you know of a good reason to do otherwise, please tell me.


Python provides a value abstraction that includes (but is not distinct for) compound values.

It also provides an independent mechanism for examining physical (rather than logical) identity -- which is object identity; for all compound values (and all but a very narrow subset of simple values) there is no guarantee that the logical identity of values is equivalent to physical/object identity.

I don't see that this makes the value abstraction "leaky", though; for values, value identity is tested by equality. For objects, equality tests object equality but not object identity. Because Python is an "everything is an object" language, you can also test object identity for values, but for values in general there is no defined relationship between object identity and value identity, so this has no defined general logical (as opposed to physical) meaning for value types. (There are some values which are defined to also be singleton objects so that value and object identity are equivalent for these values; but that's a feature not of the value abstraction in Python but of the value-object mapping in Python.)

So I don't see the value abstraction itself as leaky.

OTOH, I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.


> I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.

Isn't that what == is?


> > I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.

> Isn't that what == is?

Not really.

"is" is logical identity for mutable objects (where logical and storage identity are equivalent), and "==" is logical identity for values (where that is equivalent to structural equivalence) but (only) structural equivalence for mutable objects.

What I was thinking of is an operator that would be logical identity for both values and mutable objects.


Somehow Python lets you distinguish “logically identical” things, though.


If you're interested in properties other than logical identity, yes.


By definition identity checking is the finest-grained distinction you can make between things. If two things are identical, they can't possibly be different in any way. This is what philosophers call the “indiscernibility of identicals”, and, if it sounds like a tautology, it's because it is!


> By definition identity checking is the finest-grained distinction you can make between things.

Yes, but there are different identities. "is" implements storage identity checking between python objects (a term broader than the sense in which you use "objects", which includes both what you call "objects" and representations of values).

Logical identity checking between Python values (as well as object equivalence, but not identity, checking between Python "objects" in the sense you use the term) is done by "==".

For most python value representations, the relationship between storage identity of the representation and logical identity of the value is undefined.


> which includes both what you call "objects" and representations of values

This is exactly what I'm saying is wrong: branching on the representations of values.

> For most python value representations, the relationship between storage identity of the representation and logical identity of the value is undefined.

If it's undefined, then how come I can query it?


> This is exactly what I'm saying is wrong: branching on the representations of values.

Well, yes, it's wrong in that (except in the cases where value identity and representation storage identity are guaranteed to be equivalent) you generally shouldn't do it (the exception being if you are building code to do something omphaloskeptic, where the purpose of the code is to answer questions about what its own implementation is doing.)

But the fact that doing that is possible in Python does not change the fact that Python does, in fact, support values (including compound values) with value-oriented semantics.

> If it's undefined, then how come I can query it?

The relation between physical identity of the storage representations and logical identity of the values they represent is undefined by the language specification.

At runtime, every representation of a value has some storage identity, and you can query the relationship between that and the storage identity of another representation of a value. But the answer you get has no guaranteed correlation to whether the values represented by those representations are the same value (which you can also query.)


> Well, yes, its wrong in that (except in the cases where value identity and representation storage identity are guaranteed to be equivalent) you generally shouldn't do it

A language's semantics doesn't tell me what I “should” do. It tells me what I can do, and what other people who write code that interacts with mine can do.

> At runtime, every representation of a value has some storage identity

Not in the semantics of the source language. Value representations are purely an implementation artifact.


> By definition identity checking is the finest-grained distinction you can make between things

By your definition, perhaps. The rest of us don't always use the word "identity" with this meaning. And since you don't own the English language, you don't get to tell the rest of us how to use words.

> If two things are identical, they can't possibly be different in any way.

Then by your definition, two Python tuples (1, 2, 3) are not identical. Which says absolutely nothing about how I can write programs using them, or what concepts of "identity" the operators in those programs can implement.

> This is what philosophers call...

We're talking about programming here, not philosophy.


> Which says absolutely nothing about how I can write programs using them,

I can write a program that treats all Python objects identically, by simply doing nothing. That doesn't make all Python objects actually identical.

> We're talking about programming here, not philosophy.

Programming is applied logic, which is a branch of philosophy.


> I can write a program that treats all Python objects identically, by simply doing nothing. That doesn't make all Python objects actually identical.

Yes, you can. Thank you for proving my point that Python programs don't have to use your definition of identity.

But that's a silly example. Here's an example that's not silly: in an earlier discussion you said dictionary keys have to be values--which would imply that dictionary key lookup must use your concept of "identity", so only two objects that are identical by your definition will behave as identical keys in a dictionary. But two distinct Python tuples (1, 2, 3), which are not identical by your definition, do behave as identical keys in a Python dictionary. In other words, Python dictionary key lookup does not use your concept of "identity".

To everyone else, this means your concept of identity is simply not relevant. To you, it appears to mean that Python is violating Western logic.

> Programming is applied logic, which is a branch of philosophy.

To you, perhaps. Not to me. And not, I suspect, to most of the other programmers in this discussion.


> But two distinct Python tuples (1, 2, 3), which are not identical by your definition, do behave as identical keys in a Python dictionary.

The numbers 2 and 4 behave identically when passed to a function that tests whether its argument is an even number. Are 2 and 4 identical now?

> To you, it appears to mean that Python is violating Western logic.

This is the third time I'm saying I never said Python is violating Western logic. It's like saying I made a machine that violates gravity - it's literally impossible! Python just doesn't have compound values.


> The numbers 2 and 4 behave identically when passed to a function that tests whether its argument is an even number. Are 2 and 4 identical now?

They are with respect to the even/odd property. So if that's the property I'm interested in, they're identical. They might not be if I'm interested in some other property.

In other words, as dragonwriter has already pointed out, there is more than one concept of identity.


> there is more than one concept of identity.

There's only one: Two entities are equal if nothing can distinguish them. This is the “identity of indiscernibles”.

In a language with abstract data types, such as Standard ML, you could define a new type whose internal representation is an integer, but which provides no operations that would distinguish between two even or two odd numbers. But Python doesn't have this.


>There's only one: Two entities are equal if nothing can distinguish them. This is the “identity of indiscernibles”.

This doesn't exist. If you can distinguish past and present, you can distinguish anything by time. An object in one moment is not identical to an object in another moment because the moment changed? No. You have to decide on what invariants you care about to have a consistent notion of identity. Yours is inadequate to do anything, as you demand the whole universe be invariant in all aspects; such a thing is trivial and vacuous. And it surely is not the structure encoded in any usage of the term "identity", as that word has non-trivial structure.


> There's only one

Maybe to you. Not to me. And not, I suspect, to most of the programmers in this discussion.

> Python doesn't have this.

Sure it does:

  class NerfedInteger(object):
      
      def __init__(self, i):
          self.__i = i
      
      def __repr__(self):
          return "<NerfedInteger({})>".format(self.__i)
      
      def __str__(self):
          return str(self.__i)
      
      @property
      def i(self):
          return self.__i


This simply doesn't make sense. It violates the indiscernibility of identicals, which is one of the cornerstones of Western logic. I can accept different opinions on several matters (e.g., the extent to which using objects is a good idea), but throwing logic out of the window is just too much.


> It violates the indiscernibility of identicals, which is one of the cornerstones of Western logic.

Oh, please. Python isn't violating any principles of Western logic. It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value. Python does this for some types but not for others. Yet computers manage to run Python just fine, Western logic notwithstanding.

If you want to argue that Python ought to change its implementation to guarantee that, for example, any two references to the tuple (1, 2, 3) must refer to the same in-memory representation (so the 'is' operator would always return True), because that would save memory, or make the runtime faster, or whatever, that's fine. Then we can talk about the tradeoffs involved, such as increasing complexity in the interpreter code. But trying to claim that Python is violating "one of the cornerstones of Western logic" is just too much.


> It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value.

I never said this! There can be multiple representations of the same value. For example, the same ordered set may be represented as two distinct red-black trees, balanced differently.

Values are different from their memory representations. Different memory representations of the same value are okay. Branching on the difference is not. Of course, to hide representation differences from users, you need abstract data types [0], and, sadly, Python doesn't have these.

[0] https://www.cs.cmu.edu/~rwh/introsml/modules/sigstruct.htm

> If you want to argue that Python ought to change its implementation to guarantee that, for example, any two references to the tuple (1, 2, 3) must refer to the same in-memory representation (so the 'is' operator would always return True), because that would save memory, or make the runtime faster, or whatever, that's fine.

I never said this either. I said that it should be a valid optimization, not that implementations absolutely have to do it. As things stand now, it's not a valid optimization, because it would break existing programs.

In actuality, I don't care much about the optimization itself. However, it's a good benchmark for assessing whether Python has compound values. Values can be deduplicated, because they may be represented more than once. Objects can't be deduplicated, because by definition an object exists exactly once in memory.

> But trying to claim that Python is violating "one of the cornerstones of Western logic" is just too much.

It seems appropriate to me. But, to be clear, what I was claiming violates the indiscernibility of identicals is dragonwriter's explanation, not Python itself. A simple explanation that doesn't violate the indiscernibility of identicals is to admit that Python doesn't have compound values. Which takes us back to square one [1].

[1] https://news.ycombinator.com/item?id=12358968


> I never said this!

As Rhett Butler said to Scarlett O'Hara, you gave a very good imitation.

> Different memory representations of the same value are okay. Branching on the difference is not.

Branching on a difference other than a difference in logical identity, if you are intending to test for logical identity, is obviously an error. The fix for that in Python is simple: if you want to test for logical identity, use the == operator.

Branching on a difference other than a difference in logical identity, if you're interested in a difference other than a difference in logical identity, is perfectly reasonable. Maybe you've never had occasion to do that in programs you write, but others might.

> I said that it should be a valid optimization

Consider my comment suitably modified. The rest of what I said still stands: there are tradeoffs involved, and reasonable people can differ on where the tradeoff ends up. You have one opinion; the Python developers have another. That doesn't mean the Python developers are violating Western logic.

> what I was claiming violates the indiscernibility of identicals is dragonwriter's explanation, not Python itself. A simple explanation that doesn't violate the indiscernibility of identicals is to admit that Python doesn't have compound values.

And to me this is a distinction without a difference. Nothing is stopping you from defining "compound values" so that a Python tuple isn't a compound value. But nothing is stopping the rest of us from not caring about your definition. Which, as you say, takes us back to square one.

What would be helpful is if you could give some reason why your definition of compound values is so important, other than talking about identity of indiscernibles and violating Western logic. What makes a language that allows compound values, by your definition, better than a language that doesn't? And if your only answer (other than preserving Western logic) is "better runtime efficiency" (less memory, faster operations), then it seems to me you would do much better to make that argument directly, instead of cloaking it with all this talk about "compound values". But maybe that's just me.


> Branching on a difference other than a difference in logical identity, if you're interested in a difference other than a difference in logical identity, is perfectly reasonable.

It's unreasonable to have a finer-grained distinction than logical identity. If two things are the same, they are the same. If they are not, well, they are not. Again, tautologies!

> That doesn't mean the Python developers are violating Western logic.

I never said Python developers are violating Western logic. I said dragonwriter's explanation of how so-called “compound values” work in Python is inconsistent with Western logic. I have an explanation of how Python works that's consistent with it, but it requires rejecting the assumption that Python has compound values.

> Nothing is stopping you from defining "compound values" so that a Python tuple isn't a compound value.

The definition of compound value is simple: a value that you can tear apart into its constituent parts, then reassemble, getting the original value. You can't get the original tuple object by creating a new tuple object with the original tuple's components. The `is` operator will brush the difference on your face.

> What makes a language that allows compound values, by your definition, better than a language that doesn't?

The easiest way to reason about programs that's completely rigorous (i.e., not just testing on a couple of sample inputs) is equational reasoning, which is basically showing that two syntactically different expressions evaluate to the same value. (For example, one expression might be an obviously correct but unacceptably inefficient program, and the other expression might be the program you actually intend to deliver.) In practice, realistic problem domains have to be modeled using compound entities (values or objects), so if you want to use equational reasoning, you need compound values.


> It's unreasonable to have a finer-grained distinction than logical identity.

Maybe to you. Not to me. And not, I suspect, to most of the other programmers in this discussion.

> The definition of compound value is simple

Your definition. Why should I care? See below.

> if you want to use equational reasoning, you need compound values.

Ok, so this would mean that, according to you, it is impossible to use equational reasoning with Python programs. Whereas, according to me, it is only impossible to do this with Python programs that use mutable objects (where "mutable" here includes tuples whose slots point to mutable objects). And of course this wouldn't just apply to Python; the general distinction here could be drawn, in principle, in any language. Or even independently of choosing a language.

So I have another tradeoff here: I can use mutable objects, which can make programs easier to write, but then I can't reason rigorously about them; or I can restrict myself to immutable objects (which means that any time I would have mutated an object writing programs the other way, I instead have to construct a new immutable object with the same logical properties that the mutated object would have had), which makes programs harder to write, but allows me to reason rigorously about them.

Can you show me any real-world examples where using the latter programming style has paid dividends?


> Ok, so this would mean that, according to you, it is impossible to use equational reasoning with Python programs.

You can use equational reasoning on the values that Python actually gives you: small numbers, special constants and object references.
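To illustrate with a small CPython sketch (the variable names are just for the example): what a Python variable holds is a reference, and it's these reference values that behave equationally.

    xs = [1, 2, 3]
    ys = xs           # copies the reference value, not the list object
    zs = [1, 2, 3]    # a distinct object whose state happens to be equal
    print(ys is xs)   # True: the same reference value
    print(zs is xs)   # False: a different object, a different reference
    print(zs == xs)   # True: == compares object states, not references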

> Whereas, according to me, it is only impossible to do this with Python programs that use mutable objects (where "mutable" here includes tuples whose slots point to mutable objects).

You can in principle use equational reasoning on programs that manipulate mutable objects. What you can't do is treat two distinct objects as equal. But, of course, in practice, you can do whatever you want. Whether your reasoning is sound is a-whole-nother matter.

> I instead have to construct a new immutable object with the same logical properties that the mutated object would have had), which makes programs harder to write, but allows me to reason rigorously about them.

No, this would be wrong. You can't use equational reasoning on objects themselves. You can only use equational reasoning on values. The value itself might be a program that manipulates objects, but you can't equate two distinct objects.


> You can't use equational reasoning on objects themselves.

I didn't say you could. I said you could use equational reasoning on programs that only manipulate immutable objects. "Immutable" means that the object's value can never change, so you can substitute the object's value for the object itself everywhere it appears in the program. Then you can apply equational reasoning to the values so obtained.

In the particular case I described, you would end up using equational reasoning on the value transformation implemented by the code that constructs the new immutable object. You would obtain that transformation, as above, by substituting object values for objects in the syntactic statement of the code.

> You can in principle use equational reasoning on programs that manipulate mutable objects.

How do you reason using an object's value if that value can change?


> "Immutable" means that the object's value can never change, so you can substitute the object's value for the object itself everywhere it appears in the program.

No, you can't. Immutable objects still have physical identities. If you want to do equational reasoning, you need to get a hold of the value itself.

> How do you reason using an object's value if that value can change?

Objects don't have “values”, they have “states”. And I said you can reason about programs that manipulate objects, not about the states of these objects. (Though maybe in a few special cases you can do the latter too.)


Is there some kind of reference that explains the terminology and theory you're using? Because it makes no sense to me as you're explaining it. It would be nice if such a reference also gave some real world examples where the terminology and theory you're using actually pays dividends, as I asked before.


> Is there some kind of reference that explains the terminology and theory you're using?

On values:

(0) “In call by value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function” (https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_va...)

(1) “Most languages use a call by value [evaluation] strategy, in which only outermost redexes are reduced and where a redex is reduced only when its right-hand side has already been reduced to a value - a term that has finished computing and cannot be reduced any further.” (Types and Programming Languages, p. 57)

For the definition of redex, see: https://en.wikipedia.org/wiki/Reduction_strategy_(code_optim...

On objects:

(0) “In computer science, an object can be a variable, a data structure, or a function or a method, and as such, is a location in memory having a value and possibly referenced by an identifier.” ( https://en.wikipedia.org/wiki/Object_(computer_science) ) In some languages, object states aren't values, though.

(1) “object: a cell (unless otherwise explicitly stated)” (p. 325), “cell: a number of contiguous memory fields forming a single logical structure” (Garbage Collection: Algorithms for Automatic Dynamic Memory Management, p. 322)

> It would be nice if such a reference also gave some real world examples where the terminology and theory you're using actually pays dividends, as I asked before.

Compiler authors take advantage of the notion of value all the time. For example, if two expressions are guaranteed to evaluate to the same value, a compiler may emit code that evaluates the expression once and reuses the result.
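A hand-done sketch of that kind of rewrite in Python, assuming a hypothetical pure function f so that both calls are guaranteed to evaluate to the same value:

    def f(x):
        # hypothetically pure: same argument, same value, no side effects
        return x * x + 1

    x = 3

    # Before: f(x) is evaluated twice.
    a = f(x) + f(x)

    # After: evaluate once, reuse the result. This is sound only because
    # f(x) denotes a value, not because of anything about objects.
    fx = f(x)
    b = fx + fx

    assert a == b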

---

Sorry, I can't reply to you directly because I'm “submitting too fast”, but:

These blog posts show how to do equational reasoning on Haskell programs. (Technically, the subset of Haskell that doesn't contain nonproductive infinite loops.)

http://www.haskellforall.com/2013/12/equational-reasoning.ht...

http://www.haskellforall.com/2014/07/equational-reasoning-at...

http://www.haskellforall.com/2013/10/manual-proofs-for-pipes...

Although Haskell is particularly well suited for using equational reasoning, it can also be used in other languages, provided your program primarily manipulates values, rather than objects.


> > It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value.

> I never said this!

The standards you have set require either

(1) That there is only one in-memory representation of a given value, or

(2) The language has no facilities that allow you to inquire about the in-memory representation used by a particular reference to a value.


> That there is only one in-memory representation of a given value, or

No.

> The language has no facilities that allow you to inquire about the in-memory representation used by a particular reference to a value.

s/reference to// , otherwise yes.


I would argue that, because Python requires its users to reason about storage and interpreter state, we shouldn't consider (a,b,c) and (a,b,c) identical under is unless they're aliases for the same structure in memory. I'm not sure I understand dragonwriter's idea for a third equality operator, however. Isn't == adequate for that purpose?

By the way, I'd like to thank both catnaroek and dragonwriter for an excellent discussion; this sort of thing is why I come to HN.


> we shouldn't consider (a,b,c) and (a,b,c) identical under is unless they're aliases for the same structure in memory.

Which is the entirety of my point.


> In that case, you want values rather than objects.

In dynamic OOP languages, value/object distinctions are often not exposed to the language user (they may actually exist in the underlying implementation, but from the PoV of the programmer using the language, there may be no discernible distinction between an "immutable object" and "value".)

Conceptually (and ignoring the implementation details, which may have performance implications), immutable objects are equivalent to values, anyhow.


> Conceptually ... immutable objects are equivalent to values, anyhow.

Nope. Since values don't reside in computer memory other than through their representations, the language implementation (compiler, runtime system, etc.) is free to apply optimizations such as:

(0) Determine whether a value is represented more than once in memory, and eliminate the redundant representations.

(1) Store multiple values in a single dynamically allocated memory block.

If object identities matter, all of this is unsound.
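CPython's string interning is one concrete instance of (0), with the caveat that this is an implementation detail rather than something the language promises:

    import sys

    a = "".join(["val", "ue"])  # built at runtime: one representation of "value"
    b = "".join(["val", "ue"])  # a second representation of the same value
    print(a == b, a is b)       # typically: True False

    a = sys.intern(a)           # ask for the canonical representation
    b = sys.intern(b)
    print(a is b)               # True: the redundant representation is gone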

> ... (and ignoring the implementation details, which may have performance implications), ...

The performance implications can be constrained by equipping the language with a cost semantics.


Please stop it.

Every single time anything so much as slightly related comes up, you talk about compound objects. Incessantly. And then you act superior when nobody knows what you mean, or cares about compound values. They're not really relevant to this discussion anyways, as storing data as described in GPP is perfectly reasonable.

The worst part is, your definition of "value" and "object" in this context is so far from common usage that people can't be expected to understand it, and you should provide an explanation pre-emptively.

To quote Randall Munroe, "Communicating badly and then acting smug when you're misunderstood is not cleverness"


You've repeatedly become uncivil in this thread. That's not ok, regardless of how wrong or provocative someone else may be. If you can't remain civil, please don't comment here.

https://news.ycombinator.com/newsguidelines.html

https://news.ycombinator.com/newswelcome.html


You are, of course, correct. I apologize, and will go to greater lengths to remain civil in threads in the future.


We very much appreciate it. Thank you.


It's all part of being a good commenter. My irritations, frustrations, and nitpicks are my problem, not yours, and I'll do my best to ensure it stays that way in the future.


It's relevant. If it weren't, there wouldn't be a debate regarding the usefulness of behaviorless objects.


Well then, FFS, express yourself clearly.


[flagged]


Actually, OOP, particularly statically-typed OOP, seems to be a major domain in which the distinction between "value types" which contain "values" which do not evolve over time and "object types" which contain "objects" which may evolve over time is used frequently [0].

There are some OOP languages that take an "everything is an object" approach and provide no clear object/value distinction (this seems to be more the case with dynamic OOP languages, and even there, while there may be little ergonomic distinction beyond the lack of mutability, there often are immutable "objects" that are stored without indirection and which are for all intents and purposes values. Though in some cases the distinction between these and simple immutable objects is obscured from the programmer.)

[0] Though in most such languages, the distinction between object and value involves more than just mutability, and is more deeply associated with indirection of storage, and it's quite possible to have immutable objects which might be values from a technology-neutral conceptual perspective, but which are objects from an in-language implementation perspective.


C# is an interesting case, in relation to your comment. It has `struct`, which defines value-semantic types instead of reference-semantic types. However, `struct` types still inherit from `System.Object` and may still encapsulate and provide behavior. So you don't necessarily need a dynamic type system to blur these lines.


> Though in most such languages, the distinction between object and value involves more than just mutability, and is more deeply associated with indirection of storage

Values, unlike objects, don't reside in memory, but rather in the semantics of the language. Of course, representations of values reside in memory. But that's an implementation detail. What matters is the abstraction the programmer is exposed to.


So your assertion is that objects are not an abstraction or defined by language semantics?


Objects are a different abstraction: they're defined by the language's semantics to have a unique identity and be stored in memory exactly once at runtime. I won't blame objects for not being values.


I think we might have just found the Ken M. of hacker news.


>Nobody other than an object-oriented programmer would think a value is something that evolves over time.

I don't think you fully read qwertyuiop924's post to get to the Randall Munroe quote, which makes this comment priceless.


I'm not an object oriented programmer. The word "value" is used in many contexts. Functions take values and return values, sometimes mutating the values they took; rvalues, which are almost anything, are assigned to lvalues, which are locations. You can probably think of more.

OTOH, your definition of object is almost entirely unique outside FP, AFAIK.


> assigned to lvalues, which are locations.

lvalues aren't values.


...And here you are, completely demonstrating my point: Your definition of "value" is not the same as the one in common usage: if you're going to go around using strange definitions of words we all know, at least tell us beforehand.


> lvalues aren't values.

lvalues are values in the same way that object references (not objects themselves) are values; perhaps more precisely, lvalues are values the same way that pointer values are values.


An object reference is totally an rvalue: you can't mutate the object reference, only the object it refers to. AFAICT, lvalues are more akin to objects themselves, in that they can be mutated.


Here, look, I can do overly-broad, potentially-insulting generalizations too:

Only a functional programmer would think that recreating the entire universe is necessary to change a value.

See? Not very helpful or particularly insightful, is it?


> Only a functional programmer would think that recreating the entire universe is necessary to change a value.

That's wrong. You can't change a value at all. What you can change is the state of an object. And functional languages have mutable objects too:

https://docs.microsoft.com/en-us/dotnet/articles/fsharp/lang...

http://hackage.haskell.org/package/base-4.9.0.0/docs/Data-IO...


It wasn't meant to be correct... That's the whole point.

"What we've got here, is a failure to communicate."

Your definition of a "value" seems to be a strictly immutable, functional-oriented definition. Which is fine; that's a valid definition and there's nothing wrong with it. The issues come from the fact that you seem to refuse to accept that that is one definition of many, and continue to push it without compromise.


[flagged]


It's correct only using your definitions and axioms. Other definitions and axioms can and do come to other conclusions. Your refusal to acknowledge that other definitions and axioms even exist is what is earning you the ire you are experiencing.


[flagged]


> I'm deliberately provoking it.

I'll just start flagging you for trolling then. It's one thing to attempt to have a meaningful discussion and unintentionally provoke ire. Intentionally doing it by being obtuse and smug... I guess qwerty was spot on with the Randall Munroe quote.


> Intentionally doing it by being obtuse and smug...

I don't think I was being obtuse. I just expected (in the statistical sense of the term “expectation”) the reaction I got, and decided that I don't mind it.


Why do you not mind being poorly understood? The goal of effective communication is to have others understand you. Since you're not trying to communicate effectively, what exactly is it that you think you are doing?


> Why do you not mind being poorly understood?

What I said I didn't mind is the “ire you are experiencing”. It somewhat saddens me that programmers, supposedly logical thinkers, can't see the distinction between a value and an object.

> Since you're not trying to communicate effectively, what exactly is it that you think you are doing?

I'm pointing to the distinction between values and objects pretty clearly:

(0) Objects exist in memory. Values exist in the language's semantics, not in computer memory, but representations of values exist in computer memory. Furthermore, objects exist in memory exactly once, but a value may be represented in memory any number of times.

(1) It doesn't make sense to distinguish between representations of the same value. If a program treats two memory blobs differently, their contents represent different values, period. (The converse is not true, of course.)

(2) The language implementation has absolute freedom to represent values as it wishes, as long as (1) isn't violated.
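One CPython illustration of (0) and (2), with the usual caveat that the caching behaviour shown is an implementation detail, not part of the language's semantics:

    a = int("7")
    b = int("7")
    print(a == b, a is b)  # True True: CPython keeps a single representation
                           # of small integers and hands out references to it

    c = int("1000003")
    d = int("1000003")
    print(c == d, c is d)  # True False: one value, represented twice in memory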

But this isn't the first time I've said all of this in this thread.


>It somewhat saddens me that programmers, supposedly logical thinkers, can't see the distinction between a value and an object.

We can see the distinction, we just don't use the same terminology you use. We use standard terminology by which what you call an object is a value. Your definition of the terms only shows up in an obscure part of FP, which is still a bit obscure in itself. Use alternate terminology, or explain yourself the first time you use it in a thread - otherwise, we'll all be confused.

I've tried to point this out to you at least three times, and I'm losing my patience.

This brings to mind another Munroe quote: "You're like the religious zealots who are burdened by their superiority with the sad duty of decrying the obvious moral decay of each new generation. And you're just as wrong."


> We can see the distinction, we just don't use the same terminology you use.

No, most of the people I've talked to here clearly couldn't see the distinction between a value and its representation. (FWIW, I'm not fully convinced you can see it either.) They're not used to thinking about values without thinking about how they're represented. They're abstraction-challenged.

> We use standard terminology by which what you call an object is a value.

That's not consistent with some of the things other people have said here. But let's ignore that. What do you call what I call a value? Not that I'm a nominalist, but in practice, I've seen that people don't understand concepts they don't have a name for. (Though I probably have the causality backwards. It would be more accurate to say that, as soon as people understand a new concept, they rush to give it a new name.)

> Your definition of the terms only shows up in an obscure part of FP, which is still a bit obscure in itself.

It's not obscure. It's in the operational semantics of any call-by-value language, which for practical purposes means any language other than Haskell.

> I've tried to point this out to you at least three times, and I'm losing my patience.

Well, it's not like you have to deal with me if you don't want to. You do so of your own volition. shrug


There's a difference between being abstraction-challenged and wishing to understand the abstraction.

We don't have a name for what you call values, as they're either irrelevant or nonexistent in most contexts, and barely mentioned. However, we can't call them values, as that name is taken. Most CBV languages call them atoms, but that doesn't fit because they aren't necessarily atomic.

Let's try... Symbolic value? It seems to work: a given symbolic value represents all other equivalent symbolic values, and it's a value that behaves like the symbol type in most languages. So that works unless it's already taken.

The point is, your definition of value needs a name that doesn't collide with the names of similar concepts and ideas. It doesn't really matter what it is. Unfortunately, this is one name clash that gensym can't handle for us. :-)


> We don't have a name for what you call values, as they're either irrelevant, or nonexistant in most contexts, and barely mentioned.

They do exist. Python has va... errr... the-thing-for-which-we-don't-yet-have-a-name: small enough numbers, special constants and object references. And they are relevant, because these are the things that you can bind to variables, pass to and return from functions, etc. If you think they are irrelevant, you don't understand them well.

> However, we can't call them values, as that name is taken.

I won't quibble about names.

> Most CBV languages call them atoms,

C is call-by-value. No atoms. Java is call-by-value. No atoms. ML is a call-by-value language. No atoms. Maybe by “call-by-value”, you mean “inspired by Common Lisp”? That's not what call-by-value means, though.

> but that doesn't fit because they aren't necessarily atomic.

Right. In fact, the whole point is to use compound ones whenever we can!

> Let's try... Symbolic value? It seems to work: a given symbolic value represents all other equivalent symbolic values, and it's a value that behaves like the symbol type in most languages. So that works unless it's already taken.

A symbol is supposed to represent something else, but a va... errr... the-thing-for-which-we-don't-yet-have-a-name doesn't represent anything else. If anything, what I call representations are the ones representing something else.

I propose “mathematical value”. Not everything you call a value can be plugged into an equation, but mathematical values by definition can.


Why do you reference call-by-value? Call-by-value vs. call-by-reference (and the related value- and reference-type semantics) is entirely about the in-memory representation of parameters, and you have been very adamant that your definition of value does not depend on in-memory semantics.


I was comparing call-by-value with non-strict evaluation strategies like call-by-name and call-by-need.


...And that should have taught you something: it should have taught you that the language you're using isn't being readily understood, and that you need to either explain it, or change it.


> I just expected (in the statistical sense of the term “expectation”)

If you're going to argue here about programming language terminology, it behooves you to get terminology from other fields correct.


I got the terminology right. The term “value” means what I mean by it, not what qwerty means by it. Check TAPL, pages 34 and 57.


Both definitions of "value" are accurate. That's what makes this whole thing so confusing.

Officially, yes, cat is right, but in common usage, value leans more towards my definition.


It doesn't make sense to talk of value as a location. A value is a piece of data, plain and simple. So lvalues and objects aren't values.

OTOH, I can agree with the imperative programmer's intuition of a variable as a location where you can store a value (rather than a symbol that can be consistently substituted with a value). It's not a mathematical variable, but it's a sufficiently established meaning to be taken into consideration in serious discussion. (Furthermore, the connection between imperative variables and mathematical variables can be restored using Hoare logic.)
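For instance, the textbook Hoare triple below (not tied to any particular language) relates the imperative variable x to a mathematical variable n that stands for a value:

    % If x holds the value n before the assignment,
    % it holds the value n + 1 afterwards.
    \{\, x = n \,\} \quad x := x + 1 \quad \{\, x = n + 1 \,\}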


>It doesn't make sense to talk of value as a location. A value is a piece of data, plain and simple. So lvalues and objects aren't values.

It doesn't have to make sense (I think it makes perfect sense, but that's neither here nor there): people do it, and the default assumed definition of a value is broad enough that it allows for it, IME.

I don't object to your definition, but can you please just tell everybody what you mean by value in your comment if it's not what people expect, so that people like me don't have to build a deeply nested discussion thread to establish what you mean?

If I was sure of its legality by the rules of HN, I'd be of half a mind to actually write a bot to insert the definition below your posts, and save people a lot of time trying to ascertain what you mean, so that we could all have a more interesting discussion about the ideas, rather than the terminology.


> I think it makes perfect sense, but that's neither here nor there

Would you conflate a word with the piece of paper in which it's written?


I'm talking about "expectation".


Oh, sorry. I was implicitly making the following assumptions:

(0) Reactions can be quantified - assigned numerical values, roughly corresponding to our intuition of a “positive”, “neutral” or “negative” reaction.

(1) The possible reactions can be meaningfully averaged, and the result can be interpreted as a reaction value as well.

So by “expectation”, I meant “expected value”, in the usual sense. If your objection is that “expectation” can't be used in this sense, I have evidence that suggests otherwise:

(0) http://ocw.mit.edu/courses/mathematics/18-05-introduction-to...

(1) https://www3.nd.edu/~rwilliam/stats1/x12.pdf
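In symbols, the usual definition for a discrete random variable X:

    % Expected value of a discrete random variable X
    \mathbb{E}[X] = \sum_{x} x \cdot P(X = x)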


It's rather non-standard to say "I expected" in this sense but since you've gone to the trouble to define your terminology and back up your claim, fair enough!


Yeah, to clarify, my initial gripe was that he didn't clarify his terminology to begin with. His definition is correct, it's just uncommon, and confusing as a result. He really should have clarified this in the head comment.


Given that "object-oriented programmer" probably constitutes most of the people here you ought to know your audience and explain what you mean to say.


It seems that you're really saying that your values are immutable.

For the rest of us, it seems we follow the line of thinking that, if our mental model (or "struct", if you will) has values (or types with fields/properties, if you will), that our mental model can be changed to reflect lessons learned (or, our values are inherently mutable, if you will).


Mutable objects are totally fine. They're just not values.


It's funny how catnaroek was completely vendetta-downvoted in this thread for speaking valid things. For those who don't understand the difference between objects and values, this video may be of help: https://www.youtube.com/watch?v=-6BsiVyC1kM



