Couldn't agree more. If the behaviour or outcome of a function depends on the current time, then the current time is an input to that function. Inputs to functions should be declared explicitly as arguments where possible.
The existence of Time.now and the fact that it's possible to stub this method are not in themselves a good enough justification for excluding the current time from the list of explicit inputs. The length of the list of a function's inputs is one of the indicators of whether a function has too many disparate inputs and therefore too many responsibilities. Obscuring this number is likely to lead to more cluttered code.
In DHH's example, Time.now should be called somewhere higher up the call stack where its return value doesn't matter to the outcome of the function in which it resides. It should look like this:
def publish!(time)
  self.update published_at: time
end
That way the inputs are clear. It also happens to mean that you can test it without requiring the linguistic flexibility provided by Ruby, but that doesn't mean that testability in inflexible languages is the only reason why it's the right thing to do.
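A minimal sketch of what that looks like in a test, assuming a stand-in `Post` class (hypothetical; a real Rails model would persist the change through `update`):

```ruby
# Hypothetical stand-in for an ActiveRecord model; `update` only assigns here.
class Post
  attr_reader :published_at

  def update(attrs)
    @published_at = attrs.fetch(:published_at)
    true
  end

  # The current time is an explicit input.
  def publish!(time)
    update published_at: time
  end
end

# The caller, higher up the stack, decides what "now" means.
post = Post.new
post.publish!(Time.now)

# A test needs no stubbing: pass a known value and compare.
fixed = Time.utc(2013, 1, 3, 12, 0, 0)
test_post = Post.new
test_post.publish!(fixed)
```

The production caller and the test differ only in the value they pass in.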
Totally agree. The oddest thing is the example DHH picked and the language he used:
> If your Java code has Date date = new Date(); buried in its guts, how do you
> set it to a known value you can then compare against in your tests? Well, you
> don't. So what you do instead is pass in the date as part of the parameters
> to your method. You inject the dependency on Date. Yay, testable code!
It suggests that the whole of functional programming has completely passed him by. To re-write that paragraph:
If your Java function has Date date = new Date(); buried in its guts, then you have a side-effect which makes it much harder to reason about what that function is doing. Pass the date in as a function argument, and you now have (or are closer to having) a pure function: pure functions have no observable effect on the execution of your program, other than to compute a result given their inputs. A pure function like this is easier to reason about, easier to test, and easier to parallelize.
If DHH spent some time building something with Clojure/Haskell/OCaml/Rust/Scala/F# and less time racing cars, he might come up with some interesting views again.
Do you pass Random in? Or how about new file streams? Hell, why should we even bother with local variables, let's just promote everything up the chain? Sure, our method signatures would get longer and longer. We'd have to keep declaring variables again and again even though they're all used in one place.
There's a line where it gets absurd as it makes the code more complex. There's no good reason to be adding a date variable to your method signature just to make everything functional. Because you then have to declare and initialize that variable extra times.
That's the opposite of code reuse, that's bad programming. What happens when you need to refactor the method? What if the method call is in a different site to the date declaration? Why do that to yourself?
This hasn't got anything to do with functional programming. It's just simply about DRY and pointing out dogmatic coding approaches like DI being applied without actually understanding why you're doing it.
If you need to pass the same set of multiple things to multiple functions, you declare a type that represents all these arguments. At least this is the approach used in Haskell, with semi-explicit (explicit in type, implicit in the value) passing of parameters.
If you take a time argument, you have to declare an extra argument, not an extra variable, and in a powerful language, you can abstract its passing around.
If you move the method to a new location where its input is not available, you propagate up the input requirement in the form of argument passing, which conveys the new dependency up the call chain.
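A rough Ruby approximation of the "declare a type for the shared arguments" idea. This is not Haskell's mechanism, only a sketch; `RequestContext`, `publish` and `archive` are hypothetical names:

```ruby
# Hypothetical context type bundling the inputs several functions share.
RequestContext = Struct.new(:now, :current_user, keyword_init: true)

def publish(post, ctx)
  post.merge(published_at: ctx.now, updated_by: ctx.current_user)
end

def archive(post, ctx)
  post.merge(archived_at: ctx.now, updated_by: ctx.current_user)
end

# Built once near the top of the call chain, then handed down as one argument.
ctx = RequestContext.new(now: Time.utc(2013, 1, 3), current_user: "alice")
published = publish({ title: "Hello" }, ctx)
archived  = archive(published, ctx)
```

Adding a new shared input means adding a field to the context type, not widening every signature along the way.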
> If you need to pass the same set of multiple things to multiple functions, you declare a type that represents all these arguments. At least this is the approach used in Haskell, with semi-explicit (explicit in type, implicit in the value) passing of parameters.
In Haskell, they call this a Reader. This parallels the Writer, which lets code just 'write' to the provided values but not use them (more useful than it sounds, for instance, logging). But it is basically a context value, yes.
Well, the nice thing about this abstraction in Haskell, is that you can give a name to the notion of receiving a bunch of values (Reader), outputting a bunch of values (Writer), threading state that can be both Read and Written (State), non-determinism (List), etc. You can thus compose all these behaviors you want into a single type constructor and build your programs as composites of this type.
> There's a line where it gets absurd as it makes the code more complex. There's no good reason to be adding a date variable to your method signature just to make everything functional. Because you then have to declare and initialize that variable extra times.
Exactly, and that's why DI is useful: it forces you to make a distinction between dependencies and real parameters. You really want your method signature to only accept parameters that are going to change throughout the life of your application while being injected with the dependencies it needs. Something like:
@Inject
private Database db;

public void updateName(String id, String name) {
    User u = db.find(id);
    u.setName(name);
    db.save(u);
}
Your logic is now clearly separated from the dependencies and trivial to test.
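The same separation can be sketched in Ruby with plain constructor injection and no container. `UserService` and `InMemoryDatabase` are hypothetical names, and the in-memory database stands in for whatever a test would supply:

```ruby
User = Struct.new(:id, :name)

# Hypothetical in-memory stand-in for the Database dependency.
class InMemoryDatabase
  def initialize(users)
    @users = users.to_h { |u| [u.id, u] }
  end

  def find(id)
    @users.fetch(id)
  end

  def save(user)
    @users[user.id] = user
  end
end

class UserService
  # The dependency arrives once, at construction time.
  def initialize(db)
    @db = db
  end

  # Only the values that vary per call appear in the signature.
  def update_name(id, name)
    user = @db.find(id)
    user.name = name
    @db.save(user)
  end
end

db = InMemoryDatabase.new([User.new("42", "Old Name")])
UserService.new(db).update_name("42", "New Name")
```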
I tend to think that sometimes guys choose (or are forced to choose) solutions like configuration xml files over source code files just to keep quiet all these inquisitors.
I didn't see any concise answers there so I'll say why I use this a lot at work:
- If I use Spring.Net I never have to put stupid trace logging on any functions but I get it for free. I make one change in the XML config files and suddenly every function for every injected object has full tracing turned on.
- Getting new code installed into production is a huge drama/pain but changing a config file doesn't have the same level of bureaucracy.
Finally, somebody talking sense. You define a class to encapsulate rules (data, permissions) specific to a class of problems in your domain so that you don't have to repeat those rules elsewhere, which has all sorts of problems (the worst of which become impossible to address in some scenarios, e.g. where one executes potentially malicious code).
If you're passing in DateTime.Now to your Publish method then the caller (probably a UI element or a web callback) is implementing business logic that does not belong to it, which violates the single responsibility principle and will eventually cause your code to become unmaintainable.
Part of the problem with these posts is the authors seldom have experience building software in some of the more tricky real-world software development environments. While DHH is rightfully respected for many things, I can find nothing in his history [1] to suggest he has the relevant experience to criticise "enterprise development": as far as I can see, he's never had to work on a millions-of-lines code base with 3+ years of cruft written and maintained by large disparate teams of mediocre developers using poorly chosen technologies to solve problems within time, quality and scope constraints outside his control.
In these sorts of environments, saying "start over" or "choose better" is not an option and things like DI and well designed interfaces make it possible for a small team of architects to ensure the mediocre developers actually accomplish something and software gets released.
I'll repost what I posted yesterday on the other post:
- the consumer of the service doesn't need to know anything about the publish time: e.g. whether it's UTC or local time, whether it needs to have +1 second added to deal with a bug in some legacy interface, or perhaps a switch to database time, etc. The problem of "publish time" is strictly limited to the ArticleService, which is an expert on the matter of publish times.
- any new business rules regarding publish times or Publish-related activities are strictly limited to this service and none of the consumers have to be modified when the rules change
- the ArticleService is easily testable by swapping in one or more testing ITimeService implementations (e.g. one that returns a fixed test date time, one that returns bad times, etc.)
- your software has a consistent, single view of time (which is critical in a time dependent application, for example)
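The points above can be sketched roughly as follows. This is a Ruby approximation under assumptions: `SystemTimeService` and `FixedTimeService` are hypothetical implementations of the `ITimeService` role, and articles are plain hashes:

```ruby
# Hypothetical time services; any object responding to `now` will do.
class SystemTimeService
  def now
    Time.now.utc
  end
end

class FixedTimeService
  def initialize(time)
    @time = time
  end

  def now
    @time
  end
end

class ArticleService
  def initialize(time_service)
    @time_service = time_service
  end

  # Consumers never supply or see the publish time; the rule lives here.
  def publish(article)
    article.merge(published_at: @time_service.now)
  end
end

# In a test, swap in a fixed clock.
fixed = Time.utc(2014, 4, 1, 9, 30)
service = ArticleService.new(FixedTimeService.new(fixed))
article = service.publish({ title: "Hello" })
```

If the publish-time rule changes (UTC versus local, database time, a legacy offset), only the time service or `ArticleService` changes.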
In which Peter Van Roy states that: "In our experience, true state (destructively assignable entities) is essential for reasons of program modularity (which for this discussion I take as meaning: the ability to change one part of a program in a significant way without changing the rest of the program). Threaded state, e.g., monads as used by Haskell or DCGs as used by Prolog, cannot substitute for it."
He goes on to give an example that is also published in CTM. The evidence in the example is conclusive in support of his statement (I think). By using a mutable variable he is able to have communication of information between two indirectly related modules in a system without modifying any existing interfaces.
In a functional program you would be required to redefine all the method signatures in the call stack between the two modules (or make everything monadic, or pass around the state of the world, both of which edge toward defining a stateful language on top of your functional one). Anyway, the pure functional solution is not modular. But some functional programmers would argue that this is a good thing, because what Van Roy does in his example is introduce a side effect, with all the downsides of that. The functional programmer would say that this mutation of a variable behind everyone's back is a kind of sneaky behaviour. It's action at a distance. The functional approach is more explicit and transparent. I don't think anyone who has spent a significant amount of time chasing down bugs would disagree with more transparency.
However, I think it's a tradeoff myself. It's fine to think that we should make everything explicit, but often it just isn't practical in large systems to have to change the whole call stack for a minor update. We often do want genuine modularity. In Van Roy's example (a call counter) it's easy to verify that the change is harmless. That's the point, I think. Some side effects are harmless; maybe we need to relax a bit. Or as another comment on the thread said, the main thing to remember is that "local state is nice; global state isn't" (which is what the author is arguing in "Dependency Injection is a Virtue").
To me it seems that you (in your code example) have just split the function in two. Two thirds of the parameters are given to the constructor and the last one to the method. It might be what your application needs, but it seems arbitrary taken out of context.
In some other language one could implement this with a function returning a function and it would be about the same as declaring the fields in your example final:
Your "function returning a function" is a closure, and yup the class's instance variables act much like a (mutable) closure here.
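The closure reading can be sketched in Ruby, assuming a hypothetical `make_publisher` and posts represented as hashes: the two "constructor" arguments are fixed once, and the returned lambda takes the remaining one.

```ruby
# Fix the dependencies once; get back a function of the remaining parameter.
def make_publisher(clock, current_user)
  ->(post) { post.merge(published_at: clock.call, updated_by: current_user) }
end

fixed = Time.utc(2013, 1, 3)
publish = make_publisher(-> { fixed }, "alice")
result = publish.call({ title: "Hello" })
```

Unlike instance variables, the captured `clock` and `current_user` cannot be reassigned from outside, which is roughly what declaring the fields final buys you.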
The clever bit is that the consumer (e.g. a web callback or UI event handler) of IArticleService is not the constructor of ArticleService. The consumer simply receives a reference to an IArticleService and uses it, while the constructor of the service knows how to wire things up and hand out service references to consumers that need it so that the consumers don't need to know how to do the wiring up; this is precisely what a dependency injection engine does.
It's the difference between you walking to a Post Office and saying "mail(this envelope)" and you saying "mail(this envelope, put it into truck x, route it via [A, B, C, D], find it in the corner of truck D weeks later and route it through E, give it to Alice to give to Bob to place it in the customer's post office box)". Somebody constructed the Post Office with all these rules encapsulated and you just consume the service.
It's the reason society can scale despite nobody knowing how everything works exactly, and the same principles apply to creating maintainable code that scales among many developers of varying degrees of skill.
>as far as I can see, he's never had to work on a millions-of-lines code base with 3+ years of cruft written and maintained by large disparate teams of mediocre developers using poorly chosen technologies to solve problems within time, quality and scope constraints outside his control.
But he doesn't say that it's bad to use DI in those situations, does he? Just that it's probably a bad idea to do it when you're working on a ruby project.
If you're building a simple system then it probably won't matter what you do. My opinion is it's good practice to follow sound architectural principles and just because your language lets you get away with crazy stuff doesn't necessarily mean it's the best choice. I listed a number of features of the approach I took - I'm not sure why I'd want to give all that up just to save a few lines of code.
> If you're building a simple system then it probably won't matter what you do.
Nothing could be farther from the truth.
When you're building simple systems you're up against hard limits for efficient team size - you need the system to remain simple. The reason is that a totally flat team structure only remains incredibly productive until you get to 5-8 people or so. Then you're stuck. Adding a person adds more communication overhead than useful work. Adding process makes everyone less efficient. The result is that you need to do both - and according to published literature don't get your old level of productivity back until your team has 20+ people on it.
Therefore the prime rule is to maximize efficiency. If you're going with the "small team of competent people" approach you can assume competence, but need to do what is efficient.
Efficiency here means long-run efficiency. This isn't just "throw together spaghetti and hope it works". This is throwing together stuff fast, and making it maintainable by a small team (partly through keeping it simple enough that people can hold program state in their heads).
> I listed a number of features of the approach I took - I'm not sure why I'd want to give all that up just to save a few lines of code.
As long as the code is not crazy, lines of code is approximately the same as effort. Therefore adding lines of code reduces efficiency. You also added complexity - which gives more to think about during debugging which also reduces efficiency.
It is true that you gain a number of advantages. However the advantages you name fall under the YAGNIY principle. You Ain't Gonna Need It Yet. If you do need it, rewriting stuff to add that is likely to be a reasonable amount of work. In the meantime we save effort in writing, save effort in comprehending, and get to move on faster if we just leave it out.
That's what companies like Blizzard do to reproduce any game state in games like Warcraft III. For example, a replay file is tiny: the only things saved are the player inputs and the times at which they happened, as well as the seed to the PRNG.
You're likely confused because you think "random" means "true randomness" while typically we're using seeded PRNGs.
In addition to that, in languages supporting higher-order functions it's very common to pass a function to/from a function.
Passing in a PRNG function and the current value the PRNG is at is a very good way to get a 100% deterministic function inside a functional programming language.
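A small sketch of the replay idea, with hypothetical `roll_damage` and `replay` functions. Ruby's `Random` object is stateful, so this is deterministic rather than pure in the strict sense, but the same seed and inputs always produce the same results:

```ruby
# Hypothetical game step: the PRNG is an explicit input.
def roll_damage(rng, base)
  base + rng.rand(1..6)
end

# Replaying needs only the seed and the recorded inputs.
def replay(seed, inputs)
  rng = Random.new(seed)
  inputs.map { |base| roll_damage(rng, base) }
end

inputs = [10, 20, 30]
first  = replay(1234, inputs)
second = replay(1234, inputs)
```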
def publish!(time = Time.now)
  self.update published_at: time
end
keeps the simple case simple (self.publish!) but allows injection if needed (custom publication time or testing stub).
My language of choice being Python, I tend to do that rather often for injecting "classes" which will have to be instantiated: the function or object constructors take callables (since Python's instantiation is just calling) with defaults provided.
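The same habit translated to Ruby might look like this. It is only a sketch: `Scheduler` and `FakeClock` are hypothetical, and the default is the `Time` class itself, since any object responding to `now` will do.

```ruby
class Scheduler
  # `clock` defaults to Time; anything responding to `now` can replace it.
  def initialize(clock: Time)
    @clock = clock
  end

  def stamp(event)
    { event: event, at: @clock.now }
  end
end

# A Struct with a `now` member is enough of a clock for a test.
FakeClock = Struct.new(:now)

fixed = Time.utc(2013, 1, 3)
stamped = Scheduler.new(clock: FakeClock.new(fixed)).stamp("deploy")
default_stamped = Scheduler.new.stamp("deploy")
```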
I think most of you are missing the point. I think the example DHH used should be interpreted as calling
post.publish!
is easier than
post.publish! Time.now
Since when you publish, the time of publication is the only value that makes sense. Now, you can argue that you can set a default argument to Time.now, but that's weird code that only makes sense if you need to supply a value other than Time.now, which you will only do in testing, so the code in DHH's example is optimal.
The point of DHH's blog post is, in a flexible enough language like JS, Ruby and Python, you can still test pretty much anything without DI. I honestly don't know any other reason why one would want to use DI other than making Java testable. In fact, you don't even need that in Java anymore because you can use Mockito these days and it can mock private stuff.
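For reference, this is roughly what testing without DI looks like in Ruby. The `with_stubbed_now` helper is hand-rolled for illustration (an assumption; test libraries provide their own equivalents), and `Post` is a hypothetical stand-in:

```ruby
class Post
  attr_reader :published_at

  # No time parameter: the method reads the clock itself.
  def publish!
    @published_at = Time.now
  end
end

# Temporarily replace Time.now, then restore the original method.
def with_stubbed_now(fixed)
  original = Time.method(:now)
  Time.define_singleton_method(:now) { fixed }
  yield
ensure
  Time.define_singleton_method(:now, original)
end

fixed = Time.utc(2013, 1, 3, 12, 0, 0)
post = Post.new
with_stubbed_now(fixed) { post.publish! }
```

The stub is global for the duration of the block, which is the trade-off the rest of the thread argues about.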
The stupidest reason I keep hearing over and over again is DI makes it possible to change an implementation at runtime. Wait what? Don't you have to change the application.xml file in Spring and restart the app? AFAIK there's nothing in Javaland that lets you listen to config file changes and then magically reconfigure the app context and make a copy of said context as a thread local in every new thread. Correct me if I'm wrong please. Even if there were, what magical alien land would require such a "feature", and how could it be so important that littering your entire code base with all these indirect abstractions becomes acceptable?
DI to swap out an implementation at runtime is an extreme edge case, though "runtime" may mean "the app can determine at startup which implementation makes sense". If you want to have a look at a nice usage of DI, peek into the source of Elasticsearch. You can swap out pretty much any relevant part of the implementation. You can, for example, change the HTTP acceptor to provide authentication [1] or add SSL [2]. So there are cases where changing the implementation makes sense.
For which any simple configuration mechanism will do, and as long as you stick with coding to the interface, the problem is solved. So while technically this is called DI in Java, it's simply code taken for granted in many dynamic languages because of duck-typing. So yes, this is a totally valid use case, but one with no need for a DI container in dynamic languages to solve.
First, a DI container is not necessary to do DI in any of the discussed languages. This is shown by the fact that you describe using duck-typing as a method of doing DI in Ruby.
>the time of publication is the only value that makes sense.
Which time of publication? The time on the database server? the web server? The client's computer? What time zone? There is probably only one choice that makes sense to you, but I have spent too many hours dealing with bugs resulting from different people having different ideas on which one is the best one.
There is only 1 time acceptable - the ISO-8601 or equivalent time of publication in UTC, at the highest resolution you can get. Format to whatever you need on display. Give the user an option to adjust the display timezone, defaulting to the user's own timezone if you know it a priori.
All your servers should use the same time server to sync time.
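A short sketch of that policy in Ruby: store one UTC instant as ISO-8601, convert only for display. The timestamp and the `+09:00` offset are made-up illustration values:

```ruby
require "time" # adds Time#iso8601 and Time.iso8601

# Store: one UTC instant, serialized as ISO-8601 with microseconds.
published_at = Time.utc(2013, 12, 31, 23, 59, 59, 123_456)
stored = published_at.iso8601(6)

# Display: parse the stored value and convert only at the edge.
parsed  = Time.iso8601(stored)
display = parsed.getlocal("+09:00").strftime("%Y-%m-%d %H:%M %:z")
```

Here `stored` is "2013-12-31T23:59:59.123456Z" and `display` is "2014-01-01 08:59 +09:00": the same instant, presented for a reader nine hours ahead of UTC.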
> The stupidest reason I keep hearing over and over again is DI makes it possible to change an implementation at runtime.
No, DI is about setting up your application at start time, not at runtime.
Configuration frameworks only cover a tiny fraction of this problem (basically by forcing you to pass a giant "Config" object everywhere). DI generalizes this process by making the information passed around more granular.
Your example is not DI. Your argument is in general sound, but it's tangential to the problem of DI. Your time object needs to be created at some point. DI would allow you to swap out the implementation of the Time class at that point.
In general, the time example is stupid. A better example is maybe that you have an app that wants to make http calls. Depending on the interpreter and the os you might want to switch the http library. Net/HTTP is a solid default that runs everywhere, but there are faster alternatives if you're on MRI only. In java, you need DI to solve this problem. In ruby, you don't.
This is just another example of how we're learning the lessons of FP about 20 years late.
I'm no expert - I've only muddled with FP languages (mostly Haskell). But what I learned there deeply affected my everyday imperative and OOP coding. Pure functions make for testable, clean code.
Just because someone keeps hitting his fingers instead of nails, doesn't mean a hammer is not a good tool. In other words, it is the programmer who leaks state, not the language.
I never directed my comment to Python. That's because what I said applies to any language, from C to Python, including ObjC, Pascal, JS, Ruby, etc. From WP:
> In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data.
FP is a paradigm as OO is too. One thing does not exclude the other. Nothing stands between you and using the best of both worlds, in any language. You just have to think outside of the box. ;)
Agreed. However I don't need to waste my breath advocating for Ruby over Java. DHH apparently has plenty of time to devote and never seems to tire of it, even after nearly a decade.
It has generators, for instance, which makes it a lot easier to implement lazy data structures. Or context managers (though Java 7 has it too). And first-class functions, which beats defining one-method interfaces to get a callback any day of the week.
I still agree with the general point about valuable insights from FP. Perhaps the one applicable here is: impure functions can still make for testable and clean code as long as they are explicitly so.
Well put! Another reason to grab the time once and distribute it is that you typically don't want the passage of time during processing to affect results, that is, you want your system to perceive "atomic" time.
Would suck if half your update is December 31st 23:59, and the other half January 1st, 00:00.
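A minimal sketch of "grab the time once and distribute it", assuming a hypothetical `touch_all` that stamps plain hashes:

```ruby
# Hypothetical batch update: every record gets the same timestamp.
def touch_all(records, now)
  records.map { |record| record.merge(updated_at: now) }
end

now = Time.now.utc # read the clock exactly once
updated = touch_all([{ id: 1 }, { id: 2 }, { id: 3 }], now)
```

Had each record called `Time.now` itself, a batch straddling midnight could end up split across two dates.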
Which if we're going to be honest is a hugely overhyped issue.
Note: I base this on how often I've swapped out the database for a project in the past 15 years (maybe twice?) of writing mostly database backed software.
I particularly like the example he gives near the end of always thinking they need to add the database and finally, after shipping to a bunch of customers, figuring out that they didn't actually need it.
In terms of test performance, I frequently see blog posts about reducing the time for a couple of hundred tests from half an hour to 15 minutes.
marcel@localhost[~]time testlogger -a MPWFoundation ObjectiveSmalltalk ObjectiveXML EGOS EGOS_Cocoa MPWSideWeb
tests:1: warning: 1055 tests (of 1055) executed, 0 failures (100 % success rate)
real 0m2.939s
user 0m1.607s
sys 0m0.438s
At some point you will need to test the unit that creates the time object. With DI you could swap out the implementation of time that gets created and use some implementation that returns a constant time. But that still pulls the information about the implementation to use from some global scope: the DI library needs to know which implementation to use, and that information will come from some global scope - a configuration of some kind, thus making that argument a moot point. In ruby you can modify the classes at runtime to mock such functionality. You could argue that modifying the time implementation is some form of DI - you're changing the implementation at runtime.
DI is still relevant for dynamic languages, for two reasons:
- it lets you put your dependencies all at the same place
- it lets you define clearly what the dependencies of a piece of code are, without having to look at internals which may change tomorrow without warning
One thing I find aggravating with dynamic languages is that metaprogramming is a crutch people tend to rely on because they can't be bothered to write clean APIs. In my experience, it's particularly prevalent in Ruby.
Of course, the lack of dynamism can be a problem in statically typed languages, giving Spring's or Hibernate's stacktraces of hell with piles of proxy objects.
Regarding the 'publish!' method, the post linked to in DHH's example (http://weblog.therealadam.com/2013/01/03/design-for-test-vs-...) explains this as a 'trade off' - the summary of the other side being that if you don't pass 'time' to it you get a simpler API.
This is somewhat more nuanced than your and DHH's conflicting opinions, and perhaps I show my inexperience (as primarily a ruby programmer), but this seems very sensible to me. It also reasonably describes why DHH's opinions may differ from others because of the focus of his work (in addition to the language used).
I'm a Ruby developer who disagrees with DHH strongly on this issue, as well as his stance on SOLID and testing. I also have years of experience in C#, using TDD and using dependency injection and IoC containers everywhere. However, I think there's a point to be pulled from DHH's post about unnecessary ceremony around dependency inversion (the more important thing) and static languages.
If I were to write this "publish" method in C#, I couldn't even put it in the Post object. It all looks fine and dandy now, but what happens when the first wave of change comes in? Say, if we were to have to stamp an "updated_by" field? Then we get to this:
def publish(time, updated_by)
  self.update published_at: time, updated_by: updated_by
end
And so on, and so on. The dependencies have to be passed to the object, so every dependency flows up to become dependency on any object that uses it. And since having method calls like ".publish(Date.now, CurrentUser.name)" everywhere really stinks for readability and DRY, and since I can't make Date.now a constructor requirement to create a Post, I always found myself having to do this (in C#):
public class PostPublisher {
    private IDate date;
    private ICurrentUser currentUser;

    // Both dependencies are supplied by the IoC container.
    public PostPublisher(IDate date, ICurrentUser currentUser) {
        this.date = date;
        this.currentUser = currentUser;
    }

    public void Publish(Post post) {
        post.updated_by = currentUser.name;
        post.published_on = date.now();
        // save????
    }
}
... and let the IoC container wire this sort of stuff up. Dependencies tied to this operation are isolated (as well as every other operation). But UGH, is this any better? So many classes and LOC just to do simple things.
I just don't see a good solution to this, and I'm not satisfied writing "good" code that is a pain to handle. I decided to try Ruby as it seemed to have answers for this, and it does.
You can call the ability to use the Date object and still change it a "global variable" or "linguistic flexibility," but the truth is: In the applications we write, published_at is only going to be set to Date.now and updated_by to the currently logged in user. And as soon as we write the application and deploy, it will serve the business needs wonderfully. As a C# developer, I justified the costs of doing these end-arounds as necessary to maintain a TDD'd codebase that could easily be changed later. As a programmer, though, I can't justify the costs of writing dozens of lines of code when I can TDD one line of code. And I sure as heck don't want to eat the costs myself. So today, I just do it with one line of code really fast and make the client happy.
'the publishing will only happen now' is the kind of assumption that ends up being false very rapidly. Or immediately, if you're writing automated tests. That's kind of the point.
I tend to write stuff similar to your example in C# (though I tend to make them 'public readonly' instead, because I hate accessors), and I agree that it's kind of tiresome. But that is really just a deficiency in C# - it lacks a concise way to express such patterns.
And to be fair, it's absolutely true that if you wanted you could just use global variables. You are absolutely paying an up-front cost when you do this, but you should look at that cost as an investment: In the long run it will pay off by making your code easier to understand, easier to maintain, and easier to extend, because its dependencies on global state are painfully obvious. Sometimes that investment may not be worth it, so you have to make the judgement call.
> 'the publishing will only happen now' is the kind
> of assumption that ends up being false very rapidly.
I think you're right, and part of the problem here is the time example clouding people's judgement. The fact that we as humans experience time as a continuous global phenomenon is irrelevant from a software engineering perspective. The current time is data no different from any other data.
It's not even difficult to think of an example of your point about the assumption quickly becoming false. Presumably our "publish" method exists within a blogging application. An extremely common feature in blogging software is some tool for importing posts from other blogging applications. In that situation, you need to be able to pass in the time as input.
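A sketch of the import case, assuming a hypothetical `Post` and a made-up export format from the other blogging tool:

```ruby
require "time" # for Time.iso8601

class Post
  attr_reader :title, :published_at

  def initialize(title)
    @title = title
  end

  # Defaults to now, but an importer can supply the original timestamp.
  def publish!(time = Time.now)
    @published_at = time
  end
end

# Hypothetical rows exported from another blogging application.
legacy_rows = [
  { title: "First post",  published_at: "2009-06-01T08:00:00Z" },
  { title: "Second post", published_at: "2010-02-14T17:30:00Z" }
]

imported = legacy_rows.map do |row|
  post = Post.new(row[:title])
  post.publish!(Time.iso8601(row[:published_at]))
  post
end
```

With a `publish!` that always reads the clock, every imported post would claim to have been published at import time.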
I write automated tests (using TDD), and it's not an assumption that would end up in my tests. Testing would not force me to introduce the date as an argument, at least not in Ruby.
That's the thing about Ruby, or perhaps another true thing that can be gathered from DHH's post: Ruby lets you make the judgment call you're talking about, without sacrificing the testing. C# would.