> In Rust, a boxed pointer sometimes includes an extra word (a “vtable pointer”) and sometimes doesn’t. It depends on whether the T in Box<T> is a type or a trait. Don’t ask me more, I do not know more.
For those wanting to know more about this, the idea is that types whose size is unknown at compile-time receive this two-word representation. I tend to refer to these as "fat pointers", which is terminology from Cyclone (though Cyclone's fat pointers serve a different purpose). More documentation on these can be found at https://doc.rust-lang.org/beta/nomicon/exotic-sizes.html#dyn... and in the section in the book on slices (terminology taken from Go, whose slices are similar though with an extra word) https://doc.rust-lang.org/book/second-edition/ch04-03-slices...
> terminology taken from Go, whose slices are similar though with an extra word
Interesting; I've always thought of Rust slices as being rather different from Go slices. A Rust slice is always used through a reference and doesn't own its data, whereas a Go slice is not generally used through a pointer and sometimes points to a heap allocated section of memory, so it's basically the union of Rust's vectors and slices.
Tangentially, the inability to tell whether some value is heap allocated or not from the type is one of my main gripes when working with Go as opposed to Rust; in Rust, I can be sure that `Vec`, `String`, `Box`, `Rc`, and `Arc` are all heap allocated and that slices, arrays, `str`, `&T`, and `&mut T` are not. In Go, slices and pointers might be heap allocated--or they might not.
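To make that concrete, here's a minimal sketch (the variable names are just for illustration): the owning types at the top each allocate, while the borrowed and fixed-size types below never allocate on their own.

    fn main() {
        // Owning types: each of these performs a heap allocation.
        let v: Vec<i32> = vec![1, 2, 3];     // buffer on the heap
        let s: String = String::from("hi");  // bytes on the heap
        let b: Box<i32> = Box::new(42);      // one i32 on the heap

        // Borrowed / fixed-size types: none of these allocate by themselves.
        let arr: [i32; 3] = [4, 5, 6];       // lives on the stack
        let slice: &[i32] = &arr;            // pointer + length, no allocation
        let text: &str = "hello";            // points into the read-only binary
        let r: &i32 = &*b;                   // borrows the heap value, no new allocation

        println!("{:?} {} {} {:?} {} {}", v, s, b, slice, text, r);
    }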
> in Rust, I can be sure that `Vec`, `String`, `Box`, `Rc`, and `Arc` are all heap allocated and that slices, arrays, `str`, `&T`, and `&mut T` are not.
That's not really true, because you can easily create e.g. a `&str` out of a `String` or a `&[T]` out of a `Vec<T>`.
The String and Vec objects live on the stack in the same way, but they contain pointers to heap data. I think the grandparent was trying to get at the fact that &str etc don't force things onto the heap nor do they keep things on the heap alive.
Yep, from looking at all the comments in response to my original comment, it seems like I didn't do a great job explaining what I meant here; the basic idea is that if I'm trying to optimize my program by minimizing heap allocations, I can safely ignore any instances of &str, &[T], etc. and just focus on String, Vec, etc.
`String` copies are always `String`s and will live on the heap. AFAIK there are no stack strings (plain `str` which I guess is what you mean?)
IIRC small fixed-size byte arrays (`[u8; N]`) are sometimes allocated on the stack depending on their size, but they are plain bytes and not full-fledged UTF8 strings. You can convert them into `String`, but that would heap-allocate them.
To expand on this: in Rust, copy strictly means there is a new owner that is tasked with deallocating that data once it goes out of scope. Move is the same, but the ownership is transferred (thus the old owner is no longer responsible for deallocating anything) instead of having a new copy and an additional owner. Copy always results in a new object of the same type.
& types are always borrowing. You can't copy into a reference since references are just borrowing of data, and owners of references won't deallocate anything (since they assume that, as long as they can hold an &, the data they reference is still alive, which is true because owners can't deallocate anything if there is a borrow in place).
EDIT: I was wrong. As sibling comment says, you can convert a stack-allocated fixed-size array slice into `&str` with `str::from_utf8`.
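For example, something like this (a small sketch; only the last step allocates):

    fn main() {
        // A fixed-size byte array on the stack; no heap allocation involved.
        let bytes: [u8; 5] = *b"hello";

        // Borrow it as a &str: this validates UTF-8 but does not allocate.
        let s: &str = std::str::from_utf8(&bytes).expect("valid UTF-8");
        println!("{}", s);

        // Only this step allocates, copying the bytes into an owned String.
        let owned: String = s.to_string();
        println!("{}", owned);
    }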
> whereas a Go slice is not generally used through a pointer
I think this is referring to a syntactic difference more than an implementation difference. In Rust, a &[u8] (usually called a "slice" but maybe more technically a "slice reference") is a pointer + a length. This is basically the same as Go's []byte, which is a pointer + a length + a capacity.
Rust also sometimes uses the [u8] type (without the &). This is an "exotic" type, in that it has no fixed size. It refers to the bytes inside the slice, but it's not really a pointer to them -- it is the bytes themselves, however many of them there might be. This mostly comes up when you're dealing with generic traits like AsRef or Deref, which will put the & back in all of their method signatures.
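For instance, this is roughly how the unsized `[u8]` shows up through `Deref` and `AsRef` (a sketch; the type annotations are only there to make the point visible):

    use std::ops::Deref;

    fn main() {
        let v: Vec<u8> = vec![1, 2, 3];

        // Vec<u8>'s Deref target is the unsized [u8]; the & comes back in the
        // method signature: fn deref(&self) -> &Self::Target.
        let a: &[u8] = v.deref();

        // Same story with AsRef: the trait parameter is the unsized [u8],
        // but what you actually get back is a &[u8].
        let b: &[u8] = v.as_ref();

        println!("{:?} {:?}", a, b);
    }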
I'm not sure that I'd agree that `[]byte` in Go is "basically the same as" `&[u8]` in Rust; making a `&[u8]` will never cause a heap allocation, which is what I was trying to get at with my original comment. If I want to track down all of the heap allocations in my program in Rust, I can safely ignore all of my slices, whereas in Go, I have to carefully reason about each usage of one.
Got it. In that sense a Go slice and a Go array pointer (a *[n]byte) could have the same effect, is that right?
What I wanted to emphasize was that whether you're reading bytes through a Go []byte or a Rust &[u8], the same "number of hops" is happening at runtime.
Normally, it isn't! Unfortunately, "good enough" is relative, and for some applications this is vitally important.
I'm not trying to knock Go's performance here; from a naive standpoint, GC'ing only some pointers is better than GC'ing all of them as in more traditional garbage-collected languages, but it's still easier to know exactly what is being heap allocated and what isn't in a language like Java, precisely because you know that all objects are on the heap.

From what I've seen of low-level optimizations in Go code, they rely heavily on techniques like generating flame graphs to analyze where allocations are occurring, which IMO isn't a very good workflow, whereas in Rust you could do this much more easily by just looking at the types that are used. I don't think this approach is necessarily incompatible with garbage collection; theoretically a language like Go could have separate vector and slice types like Rust does, and I think that would make these types of optimizations much easier!
(I'm also not sure why you were downvoted for asking this; it's a perfectly reasonable question)
I wonder if you have seen the OpenHFT project written in pure Java for high frequency trading (https://github.com/OpenHFT). Would 175 _million_ trading transactions per second on modest hardware be considered good enough? Check out the Chronicle log in the same project, that persists tens of millions of records on disk.
All it takes is a basic understanding of cache architecture and of generational GC, and simple data structures.
Sure, for high frequency trading, I think that's good enough! On the other hand, if I'm writing an operating system or a device driver, getting a GC'd language to be "good enough" is a very different type of problem.
As an aside, I don't think Java actually suffers from the specific problem that I was mentioning in my original comment, namely that it's hard to tell what's on the heap or not. I was under the impression that all objects in Java are on the heap, which makes it trivial to determine whether something is heap-allocated or not based on the type, like in Rust.
> On the other hand, if I'm writing an operating system or a device driver, getting a GC'd language to be "good enough" is a very different type of problem
Niklaus Wirth's Oberon OS (written in Oberon), Microsoft's Singularity OS (written in a variant of C#), the Mirage unikernel written in OCaml, these are all examples of OSs written in GC'd languages. I am not aware of performance being an issue in any of these cases. Oberon was extensively used at ETH, and the components of Mirage that I am aware of (such as their TLS and DNS) are competitive in performance with their C counterparts.
> I don't think this approach is necessarily incompatible with garbage collection; theoretically a language like Go could have separate vector and slice types like Rust does, and I think that would make these types of optimizations much easier!
Absolutely. It's not a GC issue, it's a design issue. Adding this kind of control would make the language harder to use.
Making the hard case easier to handle for experts makes the easy case harder to handle for everyone.
There are problems when you're taking a slice and storing it for a potentially long time. If the slice is actually a subslice of a super large array, the backing array will be kept around until the subslice goes away (assuming the implementation doesn't try to be clever about this, and it probably doesn't).
If the Go compiler can't tell anything about the length of storage of a slice, it'll end up on the heap. The stack is usually only used when escape analysis determines that the value does not survive the function call, at least IIRC.
Slices should be capable of being partially deallocated so long as the backing arrays are not referenced anymore.
Go doesn't have a generational & compacting GC (where allocation can be made really efficient: just bumping a pointer), hence heap allocations are expensive in Go (but idk the details, and maybe they're not as expensive as they are in C or Rust).
Then to avoid performance penalties, you need to reduce allocations to the minimum, but since Go uses escape analysis to decide whether to allocate on the heap or not, you don't have full control over what is heap-allocated or not, and avoiding allocations can be quite tricky.
Rather than being a comment on where each was stored, my comment was intended to highlight how both Rust and Go slices are fixed-size pointer+metadata "windows" into some underlying array.
Fair enough! Not having done any PL work in a while, I tend to think of types in terms of the properties of how I use them rather than their implementation, so I was just surprised to see them compared this way.
Isn't it the case that the compiler allocates objects on the heap if the reference escapes the function, and allocates on the stack if the reference doesn't?
The difficulty is knowing when that happens, and you're probably guessing wrong (the -m gcflag will tell you).
Furthermore, fitting Go's theme, the escape analysis is pretty simplistic, so there are many cases where it will somewhat unexpectedly assume escape (note: the link is from 1.15; some cases have been fixed since, like the …arg one or the slice assignment): https://docs.google.com/document/d/1CxgUBPlx9iJzkz9JWkb6tIpT...
As I mentioned in a couple sibling comments, it looks I didn't do a great job explaining what I meant here, but my basic point was that if I want to optimize my program by minimizing heap allocations, I can ignore all `&str`, `&[T]`, `&T`, etc. and just focus on `Vec`, `String`, `Box`, etc. In Go, there isn't any clear boundary like this, so I'm forced to reason about every slice and pointer to determine if they're doing heap allocations or not.
They are both pointers, but `b` is an owned pointer (it "owns" a heap allocated value) and `r` is a borrowed/reference pointer (a "pure" reference to something (heap or stack) that it doesn't own and isn't responsible for).
`b` cannot be moved, mutated, or deallocated until `r` is gone. When `b` goes out of scope the heap value it points to will be automatically deallocated, unless `r` still exists somewhere (saved off in a struct for example), in which case the program won't compile.
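A tiny sketch of that (the variable names just mirror the discussion, not the article's actual code):

    fn main() {
        let b = Box::new(5);      // b owns a heap-allocated i32
        let r = &*b;              // r borrows that same i32; it owns nothing

        println!("{} {}", b, r);  // both read the same value

        // drop(b);               // uncommenting this won't compile:
                                  // b can't be destroyed while r is still used
        println!("{}", r);
    }                             // r going out of scope frees nothing;
                                  // b going out of scope frees the i32 exactly once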
They both point to the same place. However, r doesn't have ownership, so when r goes out of scope, the memory won't be freed. b does have ownership, so when b goes out of scope, the memory will be freed.
OK, but how does it link to where the value is allocated?
My question was, is it wrong to say that `b` isn't heap allocated either, since: «Here, b is on the stack (or in a register), but is pointing to something on the heap.»
It's not wrong: `b` itself isn't heap allocated, because the pointer is not stored on the heap (only the value it owns is). `&T`s can refer to something anywhere, heap or stack, and can also be anywhere, heap or stack. A Box<&i32> is going to have a &T on the heap.
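As a concrete sketch of that last point:

    fn main() {
        let x: i32 = 7;                          // x lives on the stack
        let boxed_ref: Box<&i32> = Box::new(&x); // the &i32 itself is stored on the heap
        println!("{}", **boxed_ref);             // heap -> stack: two hops to reach 7
    }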
It's a very important topic to understand to be productive in Rust.
My knowledge of C made learning Rust so much harder for me. It's really hard to stop thinking in pointers. While Rust's references are technically implemented as pointers, for the purpose of "fighting with the borrow checker" it makes more sense to think of them as read/write locks for regions of memory.
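Here's a rough sketch of that mental model (the error text in the comments is paraphrased, not the exact compiler output):

    fn main() {
        let mut v = vec![1, 2, 3];

        let read = &v;        // shared borrow: roughly a read lock
        // v.push(4);         // a write while a "read lock" is held is rejected:
                              // "cannot borrow `v` as mutable because it is
                              //  also borrowed as immutable"
        println!("{:?}", read);

        let write = &mut v;   // exclusive borrow: roughly a write lock
        write.push(4);        // fine: no other reader or writer exists right now
        println!("{:?}", v);  // the exclusive borrow has already ended here
    }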
Yeah, interestingly I think it's hard to understand what is going on with C because `T*` pointers can be used for many things. I found it easier to go to C after doing Rust because I had a deeper understanding of the semantics behind them. Was frustrating though because it never caught my mistakes!
I spent a whole day struggling with a bug because I equated &[T] with *T and thought you could cast one to the other (for interop with C++ code). It took me too long to figure out that &[T] is two words long, but now it's obvious. I'm not sure where I thought the "length" part was being stored.
For reference, the proper way to get a *const T from a &[T] is the .as_ptr() method. The way &[T] (and any other type not marked with #[repr(C)]) is laid out is implementation-defined, so accessing its internals isn't recommended.
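Something like this, for example (a sketch; the C function in the comment is hypothetical, so the unsafe call is left commented out):

    fn main() {
        let v: Vec<u8> = vec![1, 2, 3];
        let s: &[u8] = &v;

        // A &[T] is (data pointer, length); hand the two parts to C separately.
        let ptr: *const u8 = s.as_ptr();
        let len: usize = s.len();

        // Hypothetical C signature: void consume(const uint8_t *data, size_t len);
        // unsafe { consume(ptr, len) };
        println!("{:p}, {} bytes", ptr, len);
    }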
As someone who has just spent the last 15 minutes escaping to HN from Rust due to reference errors, this was amazingly useful and actually helped me fix the error I was getting.
If you ever find yourself stuck for too long, drop a code snippet into https://play.rust-lang.org/ and share a link to the code in the #rust or #rust-beginners IRC channels (irc.mozilla.org). Very friendly community.
I second this. I asked a bunch of dumb questions on IRC, Reddit, and the forums and every single time the responses were so patient and helpful.
I work at GitHub and I’ve been telling people that for the future of open source we really ought to be looking at the Rust community, both the amount of automation they have and also their general communication style.
> These 3 types all have equivalent reference types (again: a reference is a pointer to memory in an unknown place): &[T] for Vec<T>, &str for String, and &T for Box<T>.
This seems to accidentally imply that these reference types are for things on the heap, i.e., that &T is the borrowed equivalent of Box<T>, which is not true. All three of these reference types can point to memory not on the heap. The former two 'usually' don't, while the latter will vary wildly depending on the application.
&[T] are commonly created from stack allocated arrays, and &str are even more commonly created from read only string literals... so I don't think it's correct so say that those "usually" point to things on the heap. (But of course the definition of "usually" could vary, it wouldn't shock me to find out they did 60% of the time).
Or did you mean &T usually points to things on the heap, in which case I should just say it very very commonly points to stack allocated things as well.
> &[T] are commonly created from stack allocated arrays,
Really? I would say that in my typical Rust code &[T] is created from a heap-allocated array >90% of the time. Most functions that do not require ownership of an argument will use &[T] and not &Vec<T> (or perhaps S: AsRef<[T]>), since &[T] works for stack and heap memory and &Vec<T> is automatically converted to &[T] through Deref coercion.
E.g.:
    fn main() {
        let v = vec![1, 2, 3, 4, 5];
        blah(&v);
    }

    fn blah<T>(s: &[T]) {
        println!("{}", s.len());
    }
When you pass a `Vec<T>` directly to a non-mutating function or method not implemented on `Vec<T>` itself, you pass it as a `&[T]`. But more often I pass it as part of a struct, so it remains (indirectly) a `&Vec<T>`. However, pretty much whenever you use a stack allocated array you use it as a &[T], part of a struct or not. I'm sure I use a heap allocated &[T] more often, but I doubt it reaches 90%.
For &str you have to remember that every string literal in your program is one. When you do `some_String.starts_with("/mnt")`, `println!("hi there {}", name)`, etc you are using a new &str. I suspect most programs use more static strings than dynamic Strings (particularly since Rust isn't heavily used in GUIs yet).
> The most important thing about Rust (and the thing that makes programming in Rust confusing) is that it needs to decide at compile time when all the memory in the program needs to be freed.
> ...
> When the function blah returns, x goes out of scope, and we need to figure out what to do with its my_cool_pointer member. But how can Rust know what kind of reference my_cool_pointer is? Is it on the heap?
> ...
> If we knew that my_cool_pointer was allocated on the heap, then we would know what to do when it goes out of scope: free it!
The way this is written kind of seems to suggest that Rust will sometimes free heap memory when a reference to that memory goes out of scope, which I think is misleading.
As I understand it, this is not the case, and the point is just that Rust needs to be able to prove that nothing else freed the referenced heap memory at any point where the reference may be used.
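In other words, something like this (a sketch): the heap buffer is freed when the owner goes out of scope, never when a reference does.

    fn main() {
        let s = String::from("hello");  // s owns a heap-allocated buffer
        {
            let r = &s;                 // r merely borrows it
            println!("{}", r);
        }                               // r goes out of scope: nothing is freed
        println!("{}", s);              // s is still perfectly usable
    }                                   // s goes out of scope: the buffer is freed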
Great post! I appreciate the socratic style. I agree with other posters that stuff like this is important to be comfortable with when writing Rust, and more material like this blog post is fantastic. I think if I were to write a part 2 of this blog post, it would be about learning how to read Rust code such that you know what is a reference and what isn't, and more pointedly, when something is behind two references. These things are important for effectively using pattern matching among other things.
With that said, I'd like to add some advice by spring-boarding off a part of the post.
> Converting from a Vec<T> to a &[T] is really easy – you just run vec.as_ref(). The reason you can do this conversion is that you’re just “forgetting” that that variable is allocated on the heap and saying “who cares, this is just a reference”. String and Box<T> also have an .as_ref() method that converts to the reference version of those types in the same way.
While on the surface this is absolutely correct, there is a subtle point missing here: as_ref on Vec/String/Box is implemented as part of the AsRef[1] trait, which is _intended_ for use in generic programming. Aside from intent, practically speaking, using as_ref in a non-generic context can often be somewhat unergonomic, since depending on how you use it, it might require a type annotation (because it's generic!).
Where AsRef is useful is in making the types of parameters to functions a bit more liberal. One particularly convenient place where it's used in the standard library is for defining functions that accept file paths. For example, the type signature of the function that opens a file is[2]:
fn open<P: AsRef<Path>>(path: P) -> Result<File>
Basically, this function says that it accepts a parameter `path` with a type `P` that can be infallibly converted into a `Path`. Why is that convenient? Because lots of useful types implement `AsRef<Path>`. They include OsStr, Cow<'a, OsStr>, OsString, str, String, PathBuf, and of course, Path itself. This is what lets you write `File::open("foo/bar")`. Without the generic `AsRef<Path>` constraint, the signature would look like this:
fn open(path: &Path) -> Result<File>
Which would mean that you'd need to write something like `File::open(Path::new("foo/bar"))` instead.
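To illustrate the flexibility (a sketch; the path is made up, so each call will just return an Err on most machines):

    use std::fs::File;
    use std::path::{Path, PathBuf};

    fn main() {
        // All of these compile because each argument type implements AsRef<Path>:
        let _ = File::open("foo/bar");                 // &str
        let _ = File::open(String::from("foo/bar"));   // String
        let _ = File::open(Path::new("foo/bar"));      // &Path
        let _ = File::open(PathBuf::from("foo/bar"));  // PathBuf
    }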
So what's the alternative to using `as_ref` if I'm here poo-pooing it? In my experience, the typical thing to do here is to rely on something called deref. That is, if `s` is a `String` then `*s` is a `str` and `&*s` is a `&str`. In many cases, the explicit dereference (so that's `&s` instead of `&*s`) can be elided and the compiler will "auto-deref" for you. For example, given a function like the following
fn repeat(string: &str, count: u64) -> String
and a string `s` with type `String`, then
repeat(&s, 5)
will "just work." If you prefer the explicit, then I think the recommendation is to use type specific conversion methods. For `Vec<T>`, `as_slice` will give you a `&[T]`. For `String`, `as_str` will give you a `&str`.
OK, that's enough for now! This rabbit hole goes deeper, but I'll stop here. :)
> One question I have (that I think I will just resolve by getting more Rust experience!) is – when I write a Rust struct, how often will I be using lifetimes vs making the struct own all its own data?
If I were forced to give a pithy answer to this question, then I think I would say (predominantly from the perspective of a library writer): "It's a healthy mix, but if I don't care about performance for $reasons, I can usually ignore lifetimes in the types I define."
I recommend watching this excellent rustconf 2017 talk for more information; it heavily features information on how zero-sized types can be used: https://www.youtube.com/watch?v=wxPehGkoNOw
By opening with "Not true" you're establishing a contrarian position, which puts people -- likely the author, potentially even the reader -- on the defensive, emotionally.
It's sufficient and actually a lot nicer to simply state your point: e.g. "Zero-sized structs are quite useful too."
I agree from the author's point of view, but as a reader I enjoy contradiction and argumentation, because that's where I learn most. So when I see someone starting with `not true` or `I disagree`, I'm immediately interested in reading more. YMMV though.
> I know in Java you have boxed pointer versions of primitive types, like Integer instead of int. And you can’t really have non-boxed pointers in Java, basically every pointer is allocated on the heap.
That's not true. In Java, pointers can very well be allocated on the stack, but the objects that they point to will be on the heap.
So the article is pretty consistently misleading/incorrect. A pointer is a data structure like any other; in fact, Java is pass-by-value: the pointer values are copied when objects are passed as function arguments.
To me it seems it's just using different terminology than you expected. I've heard and used the article's version plenty of times and it generally works in context.
For me, the question in Rust is not, what's a reference. But how do I find all functions applicable to a given type? In C/C++, I can just grep the header files for the type name and voilà. I find header-less languages like Rust or Swift really obscure in that way.
Most crates have documentation available as well (generally linked directly from their entry on crates.io) and if it's not online for some reason you can just run "cargo doc" to generate it locally. Randomly taking the "image" crate as an example: https://docs.rs/image/0.17.0/image/
Who greps header files in 2017 (or even 2010)? Just fuzzy search a few characters that more or less look like what you want in your IDE's search box.
I still grep header, as well as implementation, files a lot.
I miss being able to fuzzy search sometimes, but I keep coming back to vim. IDEs just don't cut it for me. They are too slow (Visual Studio 2017 on my desktop from 2011 is unbearable for even starting a new project). And most things I really need to do - in vim they are a few memorized keypresses or a plain shell command in a Makefile away, while in IDEs I have to dig through wizards which really brings me out of the zone.
Not relying on API search much has the huge advantage of not relying on external APIs, which leads to good modularization. As a general rule, a module shouldn't call into other modules much.
And by the way it's the same for OOP: OOP has the advantage of supporting IDE member/method autocomplete (noun first syntax), but it's just the wrong mindset for me and leads to really broken architectures.
> Not relying on API search much has the huge advantage of not relying on external APIs, which leads to good modularization. As a general rule, a module shouldn't call into other modules much.
When writing Rust, you'll likely use the standard library a lot; this rule might not be as applicable as in other languages/environments.
Data structures (vectors, hashmaps, trees), I/O, etc. are all part of the stdlib, and their rich feature sets make an API reference essential. You can certainly write Rust without it, but you’d be missing out on a lot of useful functionality.
I'm a bit confused, can you be more specific as to what you're asking for? grepping for types works just as well for headerless languages as it does for C++, though "finding all functions applicable to a given type" can't be done via grep for either C++ or Rust given that generics exist.
Here, you have sections for: 'Methods', 'Methods from Deref<Target=[T]>' and 'Trait Implementations', and then it seems that if you look through all these sections, you can see everything that can be called on this type, highlighted in the same light brown colour.
It would be quite nice to get an alphabetically ordered list of just these method names, too.
Header diving certainly works well for some C/C++ codebases, but not all I've sadly discovered. The rough analog in Rust might be grepping rustdoc generated documentation, which should at least generally tell you what exists / is publicly exposed. Grepping the full source with extra filters like \bfn\b or \bpub fn\b might be another option.
Like C++, you can also (ab)use intellisense to find a lot of them as well. I should hack more on Visual Rust to improve the situation there...
Generally most inherent methods are listed in the same file as the type.
Trait implementations may bring in other methods and may be listed elsewhere, but C++ doesn't help with this either (C++ doesn't have traits, but there are common patterns that provide similar functionality).
Most folks use the autogenerated docs (cargo doc), which list all the methods. But also when reading code it's not hard to grep for impls.
So in comparison to C++, would it be correct to say that Box<T> is like unique_ptr<T>, Vec<T> is like vector<T>, and that references are the same in both languages?
Rust's Box and Vec are analogous to C++'s unique_ptr and vector, yes. But references in Rust really aren't anything like C++ references, given that Rust references 1) are first-class, 2) come in two varieties (mutable/exclusive and immutable/shared), 3) feature mechanically-checked lifetimes, and 4) will be two words in size (rather than one) if the underlying type is dynamically-sized.
What makes them more first-class than C++ references? E.g. in C++, given a type T, you can use `std::add_lvalue_reference<T>`, `std::remove_reference<T>`, overload on references, check if a type is a reference to another...
C++ reference types are first-class. But instances of reference types are not first-class values. References are not objects, in standard speak: they do not have a memory location, you can't take their address, and you can't pass them to functions (a reference parameter means passing a value by reference, not a reference by value). And so on. Rust references are more like C/C++ pointers or Java references in that they are actual values, and AFAIK Rust functions, like Java and C functions, are strictly pass-by-value.
I think they just meant that Rust references are like normal first-class generic types. E.g., you can nest them to get a &mut &T, since they behave more like a pointer in that regard.
C++ references on the other hand are more like modifiers of a type, e.g. you can have a T or a T&, but having a (T&)& does not make sense. (Outside of templates, where it gets folded down to a T&.)
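A short sketch of that nesting on the Rust side (the names here are made up):

    fn retarget(slot: &mut &str) {
        // Re-point the outer reference at a different str; nothing is copied.
        *slot = "second";
    }

    fn main() {
        let mut s: &str = "first";
        retarget(&mut s);       // a &mut to a &str: a reference to a reference
        println!("{}", s);      // prints "second"
    }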
> I’ve written a few hundred lines of Rust over the last 4 years, but I’m honestly still pretty bad at Rust and so my goal is to learn enough that I don’t get confused while writing very simple programs.
This makes me feel hopeless, as I'm only about to start using Rust in my hobby projects after reading the essential book chapters. I hope it's just excessive humility on her part? At the same time, I'm excited because if I commit myself to mastering such a language it can make me stand out. I still have an opportunity to be an early adopter, and have a head start in a promising new language.
A few hundred lines over the course of four years would imply that the author is idly dabbling with Rust rather than using it in anger. (It also implies that she's been using Rust since before its 1.0 release, which would probably make it harder to get a handle on modern Rust, as it changed significantly back then.) Trust me, it won't take you anywhere close to four years to get proficient in Rust. :)
Don't forget Stack Overflow! It doesn't have all the answers easy to google, but /u/shepmaster is doing an amazing job as a curator there. You usually get an answer in less than half an hour (assuming he's awake, but I'm not even sure he sleeps :p)
Really, just this year is the first time I've been using it for larger side projects. While I still run into some things with the borrow checker, I find I'm much better at predicting them and figuring out a strategy around them. Really, once you get through the book and are comfortable with the types, you just need to start working on something bigger. You will need to change things and refactor as your original ideas don't pan out, but you learn from it.