Show HN: A dependently-typed programming language with static memory management (github.com/u2zv1wx)
456 points by u2zv1wx on May 23, 2020 | 78 comments



You are almost 3500 commits into this project already, with no other contributors? The dedication to this is incredible. It sounds super compelling and I wish you luck!

I hope you get more recognition to encourage you to continue. These new languages are so important in pushing forward our tooling and our understanding of workable abstractions as an industry. I am only recently getting into functional programming and it has already fundamentally changed a lot of my perspective on OO, composition vs inheritance, immutability, pure functions, etc.

Have you considered trying to make this a little more accessible (a bit of a focus on the marketing side)? I would really like to digest the main benefits of your language more easily. One example: when skimming your readme, the first thing my eye is drawn to is the bulleted section that talks about other, limited memory management solutions. But if I didn’t read the small text before it that says “neut doesn’t use these”, I wouldn’t understand the compelling feature of the language well enough to dive in further.

Have you followed Zig? Andrew Kelley is doing something similar (insofar as he’s building a new language focused on memory management), and even though I don’t use it, I see the value in this work and support him on Patreon.

I would be happy to help you with reviewing the copy on the readme from the perspective of someone who is technical but not super knowledgeable in this domain to help you summarize the key concepts and advantages up front. Reach out to me with the email in my profile if you would like to discuss!


Personally, I found the initial intro surprisingly easy to read and digest. I was doubly surprised when I noticed this is written in Haskell (IIUC) - for a thing created by a Haskell guy, this is incredibly readable and easy to digest. Typically you'd get swamped by monads, morphisms, equations and deep CompSci references. I can only say thank you for really putting an effort into making the readme approachable. I know how much work a reader-oriented readme takes. And another huge congrats for releasing and publishing on HN! This looks super interesting.


I'd be interested to hear what about it made it easy to read and digest. For example, it starts with

> The (with identity.bind (...)) is the same as the do-notation in Haskell or other languages, specialized to the identity monad.
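
For readers who haven't seen it, do-notation specialized to the identity monad is just sequential let-binding; a minimal Haskell illustration (not Neut code):

    import Data.Functor.Identity (Identity (..))

    -- In the Identity monad, each bind is just a let-binding;
    -- no effects are involved.
    example :: Int
    example = runIdentity $ do
      x <- Identity 1
      y <- Identity (x + 1)
      return (x + y)  -- 3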

The next section is called "Types as Exponentializers". The section after that explains

> That is, a lambda-abstraction is translated into a tuple consists of (0) the type of its closed chain, (1) its closed chain, and (2) a pointer to an appropriately-arranged closed function
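
That passage describes classic closure conversion. A rough Haskell model of the representation (a sketch only: here the environment's type is hidden behind an existential, whereas Neut keeps it around explicitly):

    {-# LANGUAGE ExistentialQuantification #-}

    -- A closure packs its captured environment together with a code
    -- pointer expecting that environment plus the argument.
    data Closure a b = forall env. Closure env (env -> a -> b)

    apply :: Closure a b -> a -> b
    apply (Closure env f) x = f env x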

As a Haskell fan this sounds like perfectly normal Haskell-speak. Is there something about it that makes it more digestible to you than the average Haskell-speak?


There are various reading/learning styles. For an "understand each step before moving to the next" style, local incomprehensibility has strong implications for global comprehensibility. But for a "didn't understand that bit, so skip ahead" style, what matters is recoverability - robustness to localized incomprehension. How "lost" you become when you don't understand some bit.

Here, if you don't understand identity.bind, or even do-notation, you can read on without losing much. Not understanding the section title is not an obstacle to understanding the section. And so on.

The intro has nice non-local clarity: the dialog structure, the summaries.

I'd not stereotype a "Haskell writing style", but I certainly encounter writing elsewhere which seems to reflect a mindset of "since you've reached this paragraph N, you obviously fully understand, remember, and appreciate the implications of, everything that has been said earlier, so we can happily take this next step without any distracting redundancy, context, motivation, or other annotation". Which... needs to be approached in a particular way to avoid degrading nongracefully.

I also appreciated the intro's "here's how I suggest approaching learning the language".


Heh, sorry, you're totally right, parts are definitely 100% Haskell-speak for sure (and I totally jumped over them with 0 understanding); what I really meant and failed to clearly convey is that there are other parts, including esp. the intro paragraphs, that (at least to me personally) were very approachable. Looking over again:

"Neut is (...) memory management" - first para OK, incl. the bullet list - specifically what the original commenter complained about; for me it was no problemo

"Theoretically (...) terms of the type" - 100% Haskell-speak, not the slightest idea what it means, skipped; but it's relatively short optically, and the author still has some credit after the first para, so I still take a look further

"Practically, this means (...)" - cool, I can understand again! I think that's where I got hooked: now I know the author sprinkles their Haskell-speak because of an unstoppable inner need to be precise (and probably also so that the tribe doesn't expel them), but they also seem to have this amazing spark of humanity and empathy towards common programmer folk, so that if I skip over the mysterious ivory-tower spells, I will find more nuggets of commoner-speak for me. Knowing a bit more about writing, I believe they additionally probably put a lot of conscious effort into writing in an approachable way. Which is a surprisingly tough skill. This is a mix I honestly bow to. I can try and list more paras that were cool for me if you'd specifically like me to.


That is a great summary from an outsider perspective. I hope that when I post my in-progress language release (on July 1), someone like you is around to give it such an honest read.

P.S. Also, I only put that July 1 date as further personal reinforcement that I am going to meet my self imposed release deadline. Gotta keep the pressure on or I’m sure I’ll hold it as a private project forever.


Thank you for the kind words! Yes, I think 3500 commits are enough to make things that might need further explanation too obvious to the author. Feedback from someone like you is really helpful. Please feel free to open an issue (or send me an email if you'd like).

I know the Zig language (I've seen it before in an article comparing the binary sizes of hello world in various languages, IIRC), though I don't know it in detail. Sounds interesting, I'll take a look at it.


We've entered an era where new languages are almost never used. The Go and Rust stories are exceptions. This is the long thin tail of new language adoption.


> We've entered an era where new languages are almost never used

I disagree. The same was said when perl dominated before Ruby and Python came along. And Pascal, C and C++ before that. Nowadays Nim, Crystal, Rust, Go, F#, D, Zig, JavaScript, Haskell, and more are all viable options for application development.

We have more viable programming languages than ever.


I live in a moderately large urban area in France (~800k inhabitants), and for your list ("Nim, Crystal, Rust, Go, F#, D, Zig, JavaScript, Haskell") I've never seen any local job ad for any of those except JS and Go, and only saw a bit of Haskell at the university.

From a quick glance over a few dozen pages of job ads, it's mostly Java & PHP, with a bit of JS, Python and C# here and there and some C/C++ in embedded. I saw 2 Node.js ads, as well as a COBOL and a Kotlin one.

So, yes, maybe they are viable. But... used? They're blips on the radar next to the big ones.


In my city in the US:

+ Node.js is plentiful

+ Golang is up and coming

+ Some Elixir

+ Haskell is pretty rare but it shows up as a secondary language

+ Java and PHP are plentiful, but these tend to be large, older corporate gigs

+ Rust is rare

+ No Crystal/Nim/F#/Zig/D that I've seen

Obviously anecdotal, and your case may differ.


Sadly, just because a language has a quality implementation and lots of libraries available, and so is suitable for application development from an engineering point of view, doesn't mean that PHBs are going to allow it to happen at the workplace, for various business reasons.


Languages like this are usually not meant for mainstream adoption.

It's more of a research language useful as a vehicle for exploring new concepts and approaches. Target audience is probably other programming language researchers.


Jean Yang made a great analogy here: https://twitter.com/jeanqasaur/status/1262833050473259009

> Programming languages researchers are like fashion designers and "research languages" like Haskell and Idris are like runway looks. Nobody expects people to go around with snakes on their bodies. They're pushing the boundaries of art and science and showing what's possible.

And just like in fashion, the runway looks eventually change what people are wearing. Rust’s memory management would never exist in its current form if Cyclone hadn’t already shown it was possible.


Interesting analogy, but people are using Haskell, Prolog, etc. in production... if I were to stick to the comparison, I'd suggest that projects like Lighttable are more akin to the runway outfit.


People also wear high fashion for certain events.


Hm. Interesting, but I’d consider the languages with actual production use to be the “ready to wear” lineups. Whereas things like this might be the haute couture runway looks.


That'd probably be your languages with large/cohesive libraries (I'm thinking ready to wear as in large retail).

Production history without expansive libs is probably bespoke, and half a complete library (e.g. Rust/Go, having half of what you want and missing the other half) is boutique :-)


Sure. But rust was not possible without all the research and little prototype languages. I think it’s good that many people are trying many things (as long as they clearly report their findings).


People have been saying this for many decades, and they've been wrong ever since. (I ran across, at one point, someone mentioning in the 1960s or so that they were loath to design a new programming language, because how could it possibly compete with existing languages like assembler, FORTRAN, or Lisp?) There are constantly new niches opening up and new areas of growth that a programming language can grow up within. It's a big world out there. There's probably a language stumbling backwards into success right now in China that neither you nor I have heard of.

As so often, Alan Perlis said it best: "In a 5 year period we get one superb programming language. Only we can't control when the 5 year period will begin."


I'd say this claim falls victim to survivorship bias. I suppose new languages are now used with more or less the same uptake as in any earlier "era" of software. (I.e. tons of "old" languages, which were new when published, also died with basically no users. Conversely, there's also Nim, Zig, TypeScript, Crystal... just off the top of my head. Totally making their rounds and worth watching, and with people trying to use them.)


Kotlin, Dart, Typescript, F# and Swift. Although all of them (like Go) are backed by big companies.


This looks amazing, albeit waaay over my head.

The introduction says it "is made possible by translating the source language into a dependent variant of Call-By-Push-Value". What makes such a translation impossible in the existing languages you mention (Haskell/OCaml/etc)? Are there restrictions on expressivity not present in those languages, or augmentations to their type systems needed?


Thank you for your interest. I think this is not a possible-or-impossible problem, but a what-to-choose problem. Both Haskell and OCaml happen to choose GC today, but they could have chosen an alternative memory management system. Memory management of a lambda-calculus based language can be realized in a few ways, and my project can be thought of as one that suggests an alternative way.

Having said that though, I don't think the approach in Neut can be directly applied to Haskell or OCaml. This is because mutable variables won't make sense in this memory management system since every use of a variable (theoretically) creates a distinct object.

I hope this answers your question.


Isn't CBPV more of a way of accounting for both strict (call-by-value) and lazy (call-by-name) evaluation in the same programming language? Not sure how that would help wrt. static memory management.


Call by Push Value does account for both CbV and CbN language semantics, but the reason it can do so is that it bases the language on a rather particular categorical semantics.

Based on the linked intro, it would seem that the language is leveraging the ‘computational’ types that are an intrinsic part of the CBPV semantics to force the ‘thunking’ of the dependent types. Effectively, all of the types become ‘functions’ through the CBPV lens, and those functions are linear by construction (it is a categorical, as in category theory, feature of the underlying semantics). Although not cited, it seems like the underlying type theory takes notice of, and inspiration from, Levy’s work on adjunction models for CBPV.

I tried to find a way to make this comment that wasn’t too academic-sounding, but I think I missed the mark.
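
To make it slightly more concrete: the value/computation split at the heart of CBPV can be caricatured in Haskell (a loose sketch assuming computations are modeled by some monad m; Levy's actual semantics is categorical and far richer):

    -- U suspends a computation as a value (a "thunk"); F is the
    -- computation that merely returns a value. CBV and CBN then differ
    -- in where U and F get inserted around function types.
    newtype U m b = Thunk { force :: m b }
    newtype F a   = Return a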


You give excellent commentary


I would guess the main problem is that this approach will be slow, since it keeps making copies of all data. If you have a garbage collector you can just pass pointers around.


Which Neut gets around with its borrowing facilities, as described in the README.


The introduction at the top of the Readme is great – it succinctly explains what the project does, how it relates to existing languages, and why the reader should care.

Way too many projects on GitHub and the like don't do this well (or at all).


Have you considered submitting this work to any computer science conferences? PLDI is the obvious first candidate.


Possibly, but I'm not sure. Currently my life is in a peculiar state (?), so I think I first need to stabilize it somehow.


This is the most interesting language introduction I've seen in years. Work like this gives life meaning. You have a larger purpose for stabilizing your life; our hopes are with you.


If everyone had to wait until their life settled down before publishing, science would be a hundred years behind.


If I understand correctly (and I'm not sure I do), neut achieves its memory management by not sharing data between structures, and instead copying it. This works well when all data structures are immutable.

However, I feel like it would be more performant to just use reference counting here. After all, incrementing a counter must be faster than a memcpy, no? Since immutable values can't create cycles, no memory will be leaked.


I haven’t done a deep dive into the implementation, but based on the theory employed, particularly the linear nature of CBPV’s computational types, the copying would most likely be elided in all cases except for when a programmer writes a function which explicitly copies data to a new term.


I can't believe my good fortune to have a wonderful reader like you, by the way.


> Since immutable values can't create cycles, no memory will be leaked.

this is not generally true. in a lazy language, you can certainly say:

    main = mdo
      y <- f x
      x <- g y
      return y
the requirement is simply that you don't inspect the value of x until later (f makes something, y, to use later; when you use y, it inspects x). x and y now have references to each other.


> in a lazy language, you can certainly say:

Not in Haskell. Which language do you specifically have in mind?


Yes indeed, in Haskell. You have to enable RecursiveDo for that particular example to work. I'm not sure why ferzul chose a recursive monadic computation when

    let { x = 0:y; y = 0:x } in x
seems to demonstrate the same thing.


The concept of linearity is referenced 21 times in various sections, but not in the introduction. As a first-time reader, I would appreciate if the introduction were to mention the role of linearity in this language before I encounter it in a subsection.


Thank you for your kind feedback. I'll try to come up with a concise way to mention it in the introduction.


Very cool, I'll dig into this later. I've been meaning to gain a better understanding of dependent typing. Been working on an SML clone with first class modules (ala 1ML/F-ing modules), and I understand that the module language is typically modeled with dependent types. It's been challenging to try to enable modules-as-existentials without too much compiler-side hackery going on.


Since this seems to support inductive types, would it be correct to say that it's based on the calculus of inductive constructions (CIC), not the plain CoC? Basing a dependently-typed language on pure CoC plus some other features weaker than inductive types (induction has been proven underivable in CoC alone) is an open research problem (see e.g. Cedille and Formality).


What do you mean? CoC definitely supports impredicative encodings, and as far as expressivity goes, this allows one to implement a lot of programs. Proving them correct is another matter, but that's not what you implied. Also, CIC is notably not Turing-complete.


I was referring to https://link.springer.com/chapter/10.1007/3-540-45413-6_16; induction is not derivable in pure CoC. I suppose yes technically you could base a dependently-typed language on it if you didn't mind not supporting induction, but I can't imagine many people wanting to use it, as it would be quite limited compared to one supporting inductive proofs. Or at least when people think of "dependently typed programming language", they're usually thinking of something at least as expressive as CIC.


Wow, this is mind-blowing. Great respect to the author.

May I ask which preliminary knowledge or directions I need to look at, in order to have a decent understanding of the codebase?


Super cool! Any idea how big the runtime is? Is it easy to run a libc-free version? I honestly feel like the embedded/firmware slice of the stack desperately needs a next-gen language.


What you need are the two functions `malloc` and `free` that have the following signatures:

  declare i8* @malloc(i64)

  declare void @free(i8*)
So if you have implementations of them written in LLVM IR, I think that's enough.

Disclaimer: I'm not very good at low-level concepts. Correct me if I'm wrong here.


Is there a big runtime that’s linked to support all the language features? Like what is the fully statically linked size of

int main(void) { return 0; }

If it’s just providing free/malloc symbols, that’s wonderful!


This reminds me a lot of Carp: https://github.com/carp-lang/Carp


TL;DR: a dependent lambda calculus using linear-type-based memory management, with an LLVM backend. Oddly, the source language, a [dependent] lambda calculus or a [dependent] Cartesian closed category, would seem more restrictive than the linear types, or closed monoidal category, used to implement the compile-time memory management system.


If you're not familiar with linear types, a fun paper to read is "Linear Types Can Change The World!"[1] although I might just be partial to it because of the wonderful name.

[1]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.5... by Philip Wadler


For anyone interested in Wadler’s publications on Linear Logic, his homepage has a lot of pdf versions of his work in the area.

[1]: https://homepages.inf.ed.ac.uk/wadler/topics/linear-logic.ht...


This is great, thanks!


As I mentioned in another comment, the particular category he is deriving the semantics in is dictated by the CBPV semantics. I certainly agree that a closed monoidal category is the appropriate ambient category for the standard linear type theory, but the structure required to reach the full semantics of CBPV, as designed by Levy, requires a narrower framing. Although, it should be noted that there is only a small amount of formalism to work through, and the adjunction model of CBPV can be seen as right adjoint to the linear/non-linear systems Benton describes in his '94 paper with Wadler.


I didn't find an explanation of how it deals with variables (e.g. strings) allocated on the stack. Does borrowing work for those, and if so, how?


The ability to use the stack is a very important performance-oriented feature. Similarly the ability to nest structs or other data structures without intermediate pointers. I hope this is solvable!


It's kind of hard to decode the explanation given that it spends a lot of text on useless formalism instead of substance, but it seems like this language has three fatal problems:

1. Borrowed pointers are not a first class concept, but just syntax sugar over returning a parameter from a function, i.e. &T or &mut T are not actual types in Rust parlance

2. There is no mention on how to achieve safe shared mutable data or even just shared read-only data, i.e. the equivalent of Rust's Arc<Mutex<T>> or Arc<T>, which probably means the language has no support

3. It seems there is no way for a struct/record/tuple to contain another non-primitive data type without the latter being allocated on the heap

So as far as I can tell, aside from the dependent types, this language is much less powerful than Rust and cannot fully utilize the CPU, and hence is far from the goal of having a perfect Rust+dependent-types language.


I don't think it aims to be "as powerful as Rust", and I think that's okay. It's decidedly a narrow, pure, opinionated language, while Rust has practicality as one of its main priorities. This language is more like Haskell, or even more Haskell than Haskell, in that while some people might want to do real projects in it, its main purpose (from what I can tell) is to explore a concept. Nothing wrong with that.


Just because you're not capable of interpreting the formalism doesn't make it useless or not substantive. This readme wasn't written for you; it was written for a type theorist.


Eh, I'm not sure about that. A type theorist would expect to see a clear description of what translations are performed as part of compiling this language, and a rigorous argument that this helps solve a real issue, e.g. wrt. memory management. It's hard to see either in the linked readme - it reads like a description of some promising, rough experiment, but not quite fully worked out in a way that would make it clearly understandable to uninvolved folks. I'm not saying that there's anything wrong with that, and it's definitely on par with many Show HN's. Just trying to call for some perspective.


The parent comment comes off as arrogant but so does this one. Oftentimes concepts like these are presented in language that's more impenetrable than necessary. I don't know if that's true here but it's a genuine problem.


From a Rust perspective, this may be true.

But from a Haskell perspective, this language promises to completely eliminate the garbage collector, which is a big deal.


Is it something that could potentially be back-ported to Haskell and friends? I'm afraid it was a bit over my head too


Well, they say that:

> Practically, this means that you can write your program in the ordinary lambda-calculus without any extra restrictions or annotations, and at the same time are allowed to control how resources are used in the program.

You can rewrite any Haskell program into the ordinary lambda-calculus, and this can be automated. The only possible problem might be that Haskell uses lazy evaluation, and I don't expect this to interfere with the control of resources but I could be wrong.
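
For instance, data constructors can themselves be lambda-encoded. A tiny Haskell illustration with Church pairs (a standard trick, nothing Neut-specific):

    {-# LANGUAGE RankNTypes #-}

    -- Church encoding: a pair is just a function awaiting a consumer,
    -- so tuples reduce to plain lambda-calculus.
    type Pair a b = forall r. (a -> b -> r) -> r

    pair :: a -> b -> Pair a b
    pair x y = \k -> k x y

    first :: Pair a b -> a
    first p = p (\x _ -> x)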


Haskell also allows you to share heap-allocated immutable data structures between threads, right? I think someone said this language doesn't support reference sharing across threads, but that could've been wrong
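
Something like this, which works fine in Haskell today (a minimal sketch; two threads read the same heap-allocated list, with no copying involved):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

    main :: IO ()
    main = do
      let shared = [1 .. 1000000] :: [Int]  -- immutable, heap-allocated
      done <- newEmptyMVar
      _ <- forkIO (putMVar done (sum shared))  -- worker thread reads it
      total <- takeMVar done
      print (total + length shared)  -- main thread reads it too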


Yes, indeed threads could be problematic.

But as long as you're not storing closures in those shared data-structures, these structures will always be acyclic, and could be collected through simple reference counting. Perhaps that's an acceptable compromise.


isn't:

    let x = 1:x
cyclic, but without closure? unless closure is so broad as to capture essentially any haskell value. it's not going to be obvious when there is a cyclic reference (e.g. if you use `let x = repeat 1` instead of the definition above)


You could have easily made a comment with the same substance without the arrogance.


Why should any of what you talk about be the responsibility of the language as opposed to the linter, an annotation library and static analyzer, a library, or clean design? Remember the Unix philosophy.


The results of that philosophy in terms of CVEs shipped to end users can be seen on https://www.cvedetails.com/vulnerability-list.php?vendor_id=...


So you have linked me to a list of security vulnerabilities - are you suggesting all security vulnerabilities need to be addressed by the language spec and the compiler? I doubt such a wide and diverse range of problems can be addressed at such a low level.

At a glance, most of those look like bounds checking problems. tcc (Tiny C Compiler) allows you to compile in bounds-checking with all pointer dereferences.

C++ STL implementations generally have bounds checking all over the place - this requires compiling in debug mode IIRC, but it is evidence of the possibilities of safety without language or compiler support.

Lisp-style meta-programming can get you basically anything you want from a language without language or compiler support.


> So you have linked me to a list of security vulnerabilities - are you suggesting all security vulnerabilities need to be addressed by the language spec and the compiler?

These vulnerabilities are typically in code implemented in C or C++.

I am suggesting that if you want to have a secure system, they have to be addressed by the language; if you are happy with systems that have exploitable memory management bugs, there are lots of existing UNIX variants to choose from.

> I doubt such a wide and diverse range of problems be addressed at such a low level.

May I suggest reading about some low-level languages that have been used in production and address these problems to varying degrees of success, for example ESPOL (from 1960s), Mesa/Cedar, Modula-2, Ada, or Rust.

https://en.wikipedia.org/wiki/Executive_Systems_Problem_Orie...

> tcc (Tiny C Compiler) allows you to compile in bounds-checking with all pointer dereferences.

This sounds impossible; there's not enough information in the C type system to know in the general case what the bounds are.

> C++ STL implementations generally have bounds checking all over the place

This wasn't true last I checked for libc++ (the most "modern" implementation), and isn't applicable for any non-STL container in a program anyway; nothing prevents using a plain C array.

> this requires compiling in debug mode IIRC

I'm not aware of anybody who enables this in production because the culture of C++ is all about performance.

> but it is evidence of the possibilities of safety without language or compiler support

Language or compiler support would reduce the performance overhead, because it makes it easier for the compiler to elide checks that can be statically proven to always succeed.

> Lisp-style meta-programming can get you basically anything you want from a language without language or compiler support.

Meta-programming cannot remove features from a language, which is typically the problem here.


> This sounds impossible; there's not enough information in the C type system to know in the general case what the bounds are.
Good point. My thought was that the runtime could keep track of all the mallocs and then make dereferences check whether the memory accessed was in a valid region of the stack or the heap. But this doesn't sound workable. You'd have to resolve every single dereference to a list of ranges in memory (`O(log n)` for every `foo[bar]` I would imagine).

Perhaps every pointer could be implemented not just as a raw pointer but as one with a "valid" range attached, indicating how many bytes before and after the pointer belong to the original block the pointer was calculated as an offset from. Any dereference would check that it is within the range, and that is only O(1) for every `foo[bar]`.
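
In other words, something like this (a model of the idea in Haskell, just to show the shape; the names Fat and deref are made up, and as the reply below notes, doing this in real C runs into ABI problems):

    import Data.Array.IO (IOArray, readArray)

    -- A "fat pointer": the backing block plus its valid index range.
    -- Every dereference does the O(1) bounds check described above.
    data Fat a = Fat (IOArray Int a) Int Int  -- block, lo, hi

    deref :: Fat a -> Int -> IO a
    deref (Fat block lo hi) i
      | i < lo || i > hi = ioError (userError "out-of-bounds dereference")
      | otherwise        = readArray block i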

> This wasn't true last I checked for libc++ (the most "modern" implementation)
> ...
> I'm not aware of anybody who enables this in production because the culture of C++ is all about performance.
This conversation is happening in the context of what is possible without language (and maybe without even compiler) support. The original comment I replied to was saying neut had fatal flaws because it didn't address borrowing and lacked some features that Rust has. The fact that some people refuse to use C++ in certain ways is interesting but doesn't detract from my original point.

> Meta-programming cannot remove features from a language, which is typically the problem here.
The simplest type of Lisp macros only add features, but it is possible to create a new kind of "top-level context" (for want of a better term). Your macro system does have to be aware of all the primitives in your Lisp dialect, though, for this to work. There is a certain term for this that I can't recall at the moment.


> Perhaps every pointer could be implemented not only as a raw pointer but one with a "valid" range attached

The main problem with this is that it's incompatible with every existing system C ABI.

There's also the problem of real-world C code converting pointers to integers and back again, but the compiler could define uintptr_t and intptr_t accordingly and code that uses other integer types is broken anyway.

> it is possible to create a new kind of "top-level context"

I'm not familiar with that, it sounds like quite some effort but I'll grant you that likely it can achieve what you claim.


I'm pretty sure you're right on all counts; I'm not even sure that shared immutable borrows would work under the very simple transform given in the README.

Edit: Also, the user-facing language doesn't seem to have linearity at all, i.e. all values may be cloned. But linearity is something that people use to guarantee certain invariants all the time in Rust (and ATS, I've heard), so this seems like a misstep. It also means that memory allocation is implicit, which makes things less predictable for users.


yes, i found the first example quite surprising. there doesn't seem to be any request to copy the string; it is simply unintuitively copied/produced three times. i would prefer the language have some kind of `dup` or something, which gives you an additional reference to the object.

as it stands, it has transferred allocation from the runtime to the compiler. but the objective is to put it into the hands of the developer, since only then is it known when reading and writing code.

i begin to suspect that a typed joy may be more practical (or rather: match my desired improvements) than clarifying allocations in haskell.
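
for the record, roughly the kind of `dup` i mean, sketched with GHC's LinearTypes (the names are hypothetical, not neut syntax):

    {-# LANGUAGE LinearTypes #-}

    -- hypothetical explicit-duplication primitive: under linear typing,
    -- `dup` is the only way to turn one reference into two, so every
    -- copy is visible at the use site. no blanket instance can exist,
    -- since unrestricted copying is exactly what linearity rules out.
    class Dup a where
      dup :: a %1 -> (a, a)

    -- a consumer that needs a value twice must request the copy:
    twice :: Dup a => (a %1 -> b) -> a %1 -> (b, b)
    twice f x = case dup x of (x1, x2) -> (f x1, f x2)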



