Embracing Swift for Deep Learning (fast.ai)
127 points by zeristor on April 21, 2019 | 58 comments



If I'm correct, the key issue is the need for a deeper level of control, and Python as a scripting language gluing things together doesn't afford that.

They talked about using Swift for differentiable programming; is there something unique in Swift that enables that? From the research I've done, it seems the techniques of Deep Learning are much more adaptable than the standard sklearn fit/predict template.

I was a bit surprised since fast.ai seemed to be big on PyTorch; this looks like a paradigm switch to TensorFlow, but that's what research is all about.

In addition to the TensorFlow / PyTorch dichotomy there seems to be Julia's Zygote, still in its early stages. Deep Learning is like The Thing: you look away from it for a few minutes and it's sprouted legs out of its head and walked off into a different room: https://en.wikipedia.org/wiki/Differentiable_programming

"Some day this Singularity's goin' end..."


PyTorch and TensorFlow are not that different, particularly with TF eager execution.


Question for Jeremy:

See links below. Did you look at or consider Julia?

--

https://julialang.org/blog/2018/12/ml-language-compiler

https://arxiv.org/abs/1810.07951


The Google Brain team said it considered Julia as well for TensorFlow: "since its compilation approach is based on type specialization, it may have enough of a representation and infrastructure to host the Graph Program Extraction techniques [they] rely on" [1].

[1]:https://github.com/tensorflow/swift/blob/master/docs/WhySwif...


It sounds like they're saying they think it can be used, but "The Julia community shares many common values as with our project, which they published in a very like-minded blog post[1] after our project was well underway." I think the key point is "after our project was well underway" -- that if they were starting now, they'd likely consider Julia much more seriously than they did at that time, which was before projects like Zygote got started.

Later on from your link: "[We] picked Swift over Julia because Swift has a much larger community, is syntactically closer to Python, and because we were more familiar with its internal implementation details - which allowed us to implement a prototype much faster."

[1]: https://julialang.org/blog/2017/12/ml&pl


As always, a disclaimer, I am deeply involved in the Julia community so beware of my biases.

Honestly, I think the best argument (and the only one I buy) for Swift is the latter part: “…we were more familiar with its internal implementation details - which allowed us to implement a prototype much faster.” – just like I said about a year ago [1]. If you are sitting on a team deeply familiar and passionate about a language – Swift – what kind of managerial fool would not let them take a stab at it? Especially with Lattner’s excellent track record.

[1]: https://news.ycombinator.com/item?id=16939525

The ideas behind Zygote date to somewhere around spring 2017, but I think it took about a year to hammer out compiler internals and find time to hack, so you are still right that nothing was public when Google settled on Swift – I think there has been at least one Mountain View visit, though, over XLA.jl, but do not quote me on that one.

The race is still on and I am looking forward to seeing what all the camps bring to this budding field. I have worked with an excellent student on SPMD auto batching for his thesis project and we now have some things to show [2]. This is still a great time to be a machine learning practitioner and endlessly exciting if you care about the intersection between Machine Learning and programming languages.

[2]: https://github.com/FluxML/Hydra.jl

My only request would be for Jeremy to explain “Swift for TensorFlow is the first serious effort I’ve seen to incorporate differentiable programming deep in to the heart of a widely used language that is designed from the ground up for performance.” to me. Is it the “serious” and/or “widely used” subset where the Julia camp is disjoint? =)


Nice comment, thanks. While I have several years of professional TensorFlow under my belt, since last fall I have spent a fair amount of time at home trying Julia + Flux. I find Flux very nice to use but it is a chicken and egg problem: I think Flux needs a lot more users in its community to really take off. TensorFlow and Pytorch dominate mindshare.


It is indeed a chicken and egg problem, but that is always the case and is mostly an argument for the status quo. When I teach there is always a handful of students asking “Why examples in Julia? Python is what is used in industry.” and my response is always the same: “Yes, true, and I could have used the same argument against Python in favour of Java in 2004. I think Julia has a reasonable potential of being ‘the future’ based on my experience as a scientist and programmer, so my hope is that I am giving you a glimpse of the future. Besides, are you not at a university to learn a diverse toolbox rather than being cast into an employee mould?”.

There must be points at which a technology takes off and becomes mainstream; what precedes those points? That, to me, is the interesting question. In 2015 Torch (Lua) dominated the mindshare, so why did TensorFlow succeed? I think Lua itself caused it – lack of a coherent object model, etc. – it sure as heck was not the speed, as I joked around by writing `import TensorFlow as TensorSlow` in my scripts for at least a year past the initial release. There was resistance against writing Python bindings for Torch, but in the end it happened and PyTorch was born; at this point TensorFlow dominated the mindshare. Why has PyTorch now become the favoured framework among all my colleagues and students, despite them being solidly in the TensorFlow camp prior to this? I think the answer is eager execution, and the move in TensorFlow 2.0 to mimic exactly this speaks in my favour. So what would the Julia moment be then? If I knew, I would tell you, but I and several others in the Julia community are at least hard at work cracking this nut.

Just like I said a year ago, I am biased in favour of the bazaar and big tent that Julia represents. Swift will not have physicists, mathematicians, ESA employees, HPC people, etc. present and I would miss them as they bring wonderful libraries and viewpoints to the Julia bazaar. For example, I think TensorFlow with its initial graph model was designed that way precisely because it fit the mindset of non-practitioners with a compiler background – or perhaps it was the desire to “write once, deploy anywhere”? PyTorch could then take a big piece of the pie because they saw eagerness to be essential due to their academic/practitioner background. Only time will tell who is right here and I think me favouring Julia is not a safe bet, but certainly a reasonable one given the options available.


Great response, thanks! I am sure Julia will keep growing in use and influence. BTW, really off topic, but I am sometimes inconvenienced by the opposite problem of being very used to older languages like Common Lisp (and to a lesser degree Prolog) that are still very relevant but are losing mindshare. On the other hand, just this hour an old customer contacted me to do work in Common Lisp and Prolog, so there is still some interest in older languages.


Do you think this might just be people wanting something familiar "at the bottom"?

E.g. Swift and Julia (and Rust, and Clang ...) share a common underlying backend (LLVM), whereas in Common Lisp, it's ... Common Lisp all the way down, and it could potentially be a much better world, but it would just require duplicating too much stuff to be a viable candidate?


Julia is a GC-based language. If you are OK with that, then there is a whole world of other languages with similar disadvantages. For high-performance numerical computing, most GC-based languages won’t cut it, and that fact is typically swept under the rug by creating a C++ implementation wrapped in your GC-based language’s bindings.


High-performance numerical computing is one area where garbage collection doesn't really matter, though, unlike real-time applications that need consistency more than raw speed. An occasional milliseconds-long (or even seconds-long) stop-the-world GC won't really affect anything in an hours-long or days-long simulation, training run, or calculation. Not to mention you don't really allocate that much in many of those applications; in ML, for example, it is not unusual to allocate all tensors once and just update the values during the training loop (usually on GPUs or TPUs), so there won't be anything to even trigger GC.
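As a toy illustration of that preallocate-and-mutate pattern (a sketch in plain Swift, with arrays standing in for tensors and no ML library assumed):

    // Allocate the buffers once, up front.
    var weights = [Float](repeating: 0, count: 1_000)
    var gradients = [Float](repeating: 0, count: 1_000)

    for _ in 0..<10_000 {                       // the "training" loop
        for i in weights.indices {
            gradients[i] = 2 * weights[i] - 1   // stand-in for a real gradient
            weights[i] -= 0.01 * gradients[i]   // in-place update, nothing new allocated
        }
    }

Nothing inside the hot loop allocates, so there is nothing for a collector (or ARC) to do per iteration.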

And for Julia in particular, there isn't really a need for bindings over C/C++ (the whole purpose of the language is not requiring another language for performance). Many of the most popular libraries for numeric computing are 100% written in Julia, including Flux.jl [1] for Machine Learning and DifferentialEquations.jl [2]. Plus, even real-time applications can be done with careful programming, since Julia allows writing programs with either no or minimal allocations, for example [3].

[1] https://github.com/FluxML/Flux.jl

[2] https://github.com/JuliaDiffEq/DifferentialEquations.jl

[3] https://juliacomputing.com/case-studies/mit-robotics.html


Just like Swift is one.


One can argue that automatic reference counting is a form of garbage collection. It is, however, deterministic and that’s why some engineers tend to prefer it.
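Concretely, "deterministic" here means an object's deinit runs at a known point in the program rather than whenever a collector gets around to it. A toy sketch:

    final class Resource {
        init() { print("acquired") }
        deinit { print("released") }   // runs exactly when the last strong reference goes away
    }

    do {
        let r = Resource()
        _ = r                          // ... use the resource ...
    }                                  // "released" prints here, not at some later GC pause

That predictability is what makes reference counting attractive when the object guards something like a file handle, a lock, or a GPU buffer.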


It is not deterministic in the presence of deeply nested data structures like acyclic graphs, where the amount of time spent running cascading deletes cannot be fully calculated. Likewise the amount of stack space cannot be fully determined. Finally, sharing pointers across threads introduces non-deterministic cache locks and delays.

Herb Sutter has a quite good CppCon talk about it.

Many engineers prefer it due to cargo cult and anti-tracing GC bias, despite years of CS papers proving the contrary since the early 80's.


It is deterministic wrt. the object lifetime which is exactly what you need if your "delete" is actually controlling access to some shared resource that must be freed promptly when it goes out of use. The drawbacks you mentioned are largely-unavoidable consequences of that requirement, and they mostly matter when the allocation graph is large and complex. It's why obligate RC as found in Swift is likely a bad idea, and even inferior to obligate tracing GC - but other languages can do better than that, e.g. introducing arenas that can be used to allocate or free objects as a group, in a single operation.


Plenty of tracing GCs also offer the same control over arenas and deterministic destruction; one just needs to actually use them, e.g. D, Modula-3, Mesa/Cedar, Active Oberon, Eiffel, System C#, C# 7.3/8...


There is a proper explanation of why Swift makes sense for TensorFlow at https://github.com/tensorflow/swift/blob/master/docs/WhySwif...
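The gist, as I read it (a rough sketch; the exact names have shifted between Swift for TensorFlow toolchain versions), is that differentiation is wired into the language and compiler rather than bolted on as a runtime tensor-tracing library:

    import TensorFlow                        // Swift for TensorFlow toolchain, not stock Swift

    // The compiler checks that this is differentiable and synthesizes its derivative.
    @differentiable
    func model(_ x: Float) -> Float {
        return 3 * x * x + 2 * x
    }

    let dydx = gradient(at: Float(4), in: model)   // 6*4 + 2 = 26.0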


Which is in part written by the author, who is the creator of Swift. Keep that in mind while reading it. Not to sound judgmental, but I would be wary if Bill Gates were explaining to me why Windows makes sense for X.


Given that he explains his thinking and reasoning in that post, one can either accept or challenge his logic and discuss it accordingly. There is no need for unprovoked attacks on his credibility. Doing so just suggests you don't feel qualified to evaluate the arguments he makes, in which case it's probably better to ask questions than attack.


It does not seem like an attack to me, rather informing readers of possible blind spots or conscious/unconscious bias in the author's point of view. I think it's only human nature to cheerlead something that one has put a significant amount of effort into making. I, as someone who knows neither Swift nor who its author is, would be more inclined to seek out alternate points of view after learning that the author of this article is also the creator of Swift.

Just as one would take any marketing/promotional material for why a company's products are better than competitors' with a grain of salt, and would want such material to be clearly marked as marketing material, I think bringing to light the author's relationship to Swift helps, and does not hinder, further discussion.


If presented in the context of a substantive comment about the assertions, I'd agree with you. Without any such substantive comments it's simply assuming the worst about others. The HN guidelines [0] are actually pretty darn good guidance for encouraging intelligent discussion of complex topics, and by my read advocate for more thoughtful approaches to commenting on multiple fronts here, including not posting shallow dismissals or assumptions about astroturfing and shillage. Again, if there is a concrete objection to the logic the author took time to detail in their post, by all means that's a great topic for discussion, and that's where we would all actually learn something positive instead of just insinuating something negative.

[0] https://news.ycombinator.com/newsguidelines.html


To be fair, the author should have disclosed this fact as a disclaimer in the article.


He's not attacking his credibility, just bringing up the point that there may be bias in reasoning.


We should NOT do this. Swift is not cross-platform! As nice a language as it is, Swift support for Linux and Windows seriously lags behind Mac.

If we choose to elect Swift as an ML language, we're handing keys to a certain platform vendor. I don't want to be tied down like that.

Keep using Swift for app development--it's great at that. But keep it far away from research and development.


Swift on Linux is great! I use it for a ton of projects, both standalone as well as for microservices development. Aside from cryptographic operations (provided by CommonCrypto on the Mac) I haven't really encountered any scenarios where it can't get the job done. And, for those, I could theoretically make use of OpenSSL (if I felt like going through that exercise).


Just one data point: trying Swift + TensorFlow was as easy on Ubuntu Linux as it was on macOS.


I don’t think Google has any interest in letting Swift remain Mac-only, so in the long run I think we’re safe.

Not to mention the fact that the Mac doesn’t have any presence on the server, so if any code is supposed to reach production one day, it’ll have to at least run on Linux.


> we're handing keys to a certain platform vendor

Swift is open source


Swift is fully supported on Linux IIRC, not counting macOS- or iOS-specific UI libraries. Windows support is probably not a priority because... why would it be?


When this article came out in March, I tried Swift and TensorFlow on both macOS and Linux - no problems but it seemed like early days, something to check in on occasionally.

EDIT: there is a new drop for macOS from April 18 which I am now downloading.

I really like modeling with TensorFlow but I have a like-sometimes/hate relationship with Python. If Swift support gets stable and well supported then that would float my boat.


Likewise, I tried Swift back when it was first announced. I had trouble finding tools in the standard library to handle simple file operations. There are many resources on using the Python interop to get the job done, but that kind of defeats the purpose. Maybe these are early days.

That said, I remain optimistic given Jeremy's involvement — he/Rachel/fast.ai have been nothing but a source of good for ML in general.


You have access to all of libc and Foundation; was this not enough for you?


I am familiar with libc (I will have to look up Foundation). My opinions were based on cursory browsing at best.

To be clear, coming from mostly Python/C++, I was looking for something like `import pathlib` or `#include <filesystem>`. A lot of the modeling work I do involves boring things like moving files around, plotting them, etc.

Again, it's likely that I did not look around enough.


Yeah, you are probably looking for Foundation.
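For the pathlib-style chores, something along these lines works on Linux as well (a quick sketch; the directory names are made up):

    import Foundation                        // on Linux this comes from swift-corelibs-foundation

    let fm = FileManager.default
    do {
        try fm.createDirectory(atPath: "results", withIntermediateDirectories: true)
        // Move every .csv out of logs/ into results/
        for file in try fm.contentsOfDirectory(atPath: "logs") where file.hasSuffix(".csv") {
            try fm.moveItem(atPath: "logs/\(file)", toPath: "results/\(file)")
        }
    } catch {
        print("file shuffling failed: \(error)")
    }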


At least Python and Julia embrace Windows.


I just hope fast.ai adopts some kind of style guide for Swift at least. The Python version of fast.ai is an unreadable mess, and it doesn't need to be.


It has a style. It is OK if you don't like it. But it is not OK to say it doesn't have one.

By the way, Jeremy Howard spends several minutes in one of his videos explaining the style and why, despite seeming strange, it is the right choice for fast.ai. I just can't remember which video, but I would guess it is the first video of the 2019 part 2 deep learning course.


Large Swift projects take a huge amount of time to compile, whereas an interpreter wins there. Swift for TensorFlow does offer a unique model-building layer at the compiler level, but apart from that I don't think it is going to make a dent in the Python community.


Sounds really interesting, but it seems like this is all due to the graph extraction algorithm needing to use static analysis. Is this such an important requirement that it justifies switching languages? I'm not too familiar with how PyTorch works under the hood, but I suspect it doesn't build a graph (correct me if I'm wrong), and they seem to get good perf. I'm all for new tech, but just want to get a sense of how impactful this change would be.


> But Python is not designed to be fast, and it is not designed to be safe.

When decisions made by an algorithm can result in real monetary costs, one must always err on the side of caution. Only an irresponsible person would choose Python or other unsafe, dynamically typed language when implementing a crucial piece of business logic.


Python is a fine, accessible language for small programs. ML applications need to be written by data scientists, who are often trained in statistics or math, not software development. For this reason Python is suitable. ML applications built using tensorflow and other ML libs tend to be small -- hundreds of lines, not tens of thousands.


Python 3.6+ does have type annotations and multiple static analysis tools for type checking. Only an ignorant or biased person would yell at Python for flaws it doesn't have. For crucial business logic, one also writes unit tests.


From the link:

> recent versions of Python have added type annotations that optionally allow the programmer to specify the types used in a program. However, Python’s type system is not capable of expressing many types and type relationships, does not do any automated typing, and can not reliably check all types at compile time. Therefore, using types in Python requires a lot of extra code, but falls far short of the level of type safety that other languages can provide.


Unit tests are not an adequate replacement for compile-time checks performed by a compiler. The former is lax and expensive, the latter is thorough and inexpensive.
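As a toy illustration (the types are invented for the example), the whole class of "passed the wrong thing" bugs is rejected before any test ever runs:

    struct Dollars { var amount: Double }
    struct Euros   { var amount: Double }

    func charge(_ price: Dollars) {
        print("charging $\(price.amount)")
    }

    charge(Dollars(amount: 9.99))      // fine
    // charge(Euros(amount: 9.99))     // does not compile:
    //   cannot convert value of type 'Euros' to expected argument type 'Dollars'

A unit test only covers the inputs someone thought to test; the compiler checks every call site.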


Language flamewars are silly.

Sometimes python is appropriate for production code, weak type system or no. Sometimes it isn't. As a project gets larger, more complex, and more interconnected with other things, static type systems become more useful.

End of story, surely? Didn't we all know that already?


But you lose so much flexibility. When you start working on a problem, you want your language to give you as little headache as possible, with minimal rules and constraints. Then, as you iterate, you can improve things by adding type annotations, unit tests, etc.

You can write perfectly fine production-level code in Python. It is like Lego blocks: start simple and then add on.

Adding compile-time type checks comes with its own downsides. I guess Swift here is trying to offer more compiler-level tooling for model building rather than just being type safe.


I’ve used mypy and pyre on large Python 3.6 codebases and they aren’t even close to type systems in conventional imperative languages (C++, Java, etc) let alone the type systems in the ML family of languages where type system innovations are usually born. I’m sure they’ll get better with time, to an extent.

Currently, programmers are forced to pepper the code with lots of ignore annotation comments and redundant union types, just because the type checkers aren’t smart enough.


When you start seeing comments like these, you know the whole static typing thing has gone peak cargo cult.


Have you written a significantly sized code base (at least 50K LOC) in a dynamic language? If so, how do you quickly and effortlessly find errors like a bad typo or passing around bad types? How do you refactor your code efficiently? My experience is that dynamic languages are great for small projects, but as the size grows, the tax of maintaining things grows much faster than for statically typed languages. People who have worked on large projects in dynamic languages have to pay very careful attention to unit testing, for example. A little bit of a lax attitude and you are looking at dreaded issues that will be uncovered only at runtime.

So I genuinely wish to know what best practices you follow that make developing large projects in dynamic languages as inexpensive as in typed languages, to back up your comment.


From my perspective, I don't need to write a 50k LOC monolith codebase in clojure, which has just a few core types. I have become a fan of its `clojure.spec` data validation system if I do ever end up in such a world.

I've written those huge codebases in java, where our weekly github commits pushed like 5-10k lines of code per user. It would have been worse if it weren't for Lombok, and totally untenable if it weren't for IDE-assisted refactoring. I've since worked on much larger and more impactful problems with smaller clojure teams and seen work consistently get done in 1/10th of the LOCs.

I used to be a fan of static types -- I still am in limited contexts, but I've come to realize that statically-typed languages (especially object-oriented ones) usually lead to projects that _require_ static typing to even be maintainable. In contrast, a philosophy common to functional languages (which don't always have to be dynamic) is:

> "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." —Alan Perlis

I think discussions focused around type systems often miss the context of what languages' type systems we're actually talking about, and that context is way more important than making sweeping judgments about types.

I can count _on one hand_ the number of times I've defined a custom type in clojure in the 3 years I've been writing it professionally. Yes, sometimes I've called a hash map function on some kind of a list, but that cost has felt really small (and caught early) compared to the cognitive cost of keeping track of class hierarchies, interfaces, access rules, and annotations. And for what win? I still get NPEs, ClassCastExceptions, and RuntimeExceptions in Java, but now I have a whole class of errors and processes introduced around playing well with its inheritance model.

So, great, it's stopping me from directly instantiating an abstract ApiClient class, and quickly catching that I tried to get an HttpApiClient from an HttpsApiClientBuilder.build() call -- but that pattern would not have existed to present an issue were it a different language.


>Have you written significant sized code base with dynamic languages (at least > 50K LOC)? If so how do you manage quickly and effortlessly finding errors like bad typo, passing around bad types etc?

Have you written 50K LOC of C/C++? How do you manage finding errors like buffer overflows, use-after-free, and so on? And did the program have the same functionality as 50K lines of Python, or 1/10th the features?

(Also: you can run a linter and find typos, and you can have tests and know about bad types. And that's assuming those are actual errors people have and matter in the first place...)


I am curious what you think when you see some JavaScript developers moving to TypeScript, or the talk about Ruby 3 having types?


Type errors represent such a small class of errors. It’s just puzzling to see what all the hype is about.


In large projects, one of the most frequent activities is refactoring. For every refactoring, type errors often constitute the largest class of errors by volume during the initial development phases.


Then again, a language like Python keeps projects smaller, by not taking 100 lines to do what you can do in 10.


Nope, all computation is representable in terms of types.

https://en.wikipedia.org/wiki/Curry–Howard_correspondence


Which is neither here nor there, as a contrived case.

In actual languages people use (e.g. ones that have more than 2% of the job market), type errors caught by the compiler don't catch "all computation" errors.

Not even in Haskell...



