| 1 | +% A 30-minute Introduction to Rust |
| 2 | + |
| 3 | +Rust is a systems programming language that combines strong compile-time correctness guarantees with fast performance. |
| 4 | +It improves upon the ideas of other systems languages like C++ |
| 5 | +by providing guaranteed memory safety (no crashes, no data races) and complete control over the lifecycle of memory. |
| 6 | +Strong memory guarantees make writing correct concurrent Rust code easier than in other languages. |
| 7 | +This tutorial will give you an idea of what Rust is like in about thirty minutes. |
| 8 | +It expects that you're at least vaguely familiar with a previous 'curly brace' language, |
| 9 | +but does not require prior experience with systems programming. |
| 10 | +The concepts are more important than the syntax, |
| 11 | +so don't worry if you don't get every last detail: |
| 12 | +the [tutorial](tutorial.html) can help you out with that later. |
| 13 | + |
| 14 | +Let's talk about the most important concept in Rust, "ownership," |
| 15 | +and its implications for a task that programmers usually find very difficult: concurrency. |
| 16 | + |
| 17 | +# The power of ownership |
| 18 | + |
| 19 | +Ownership is central to Rust, |
| 20 | +and is the feature from which many of Rust's powerful capabilities are derived. |
| 21 | +"Ownership" refers to which parts of your code are allowed to read, |
| 22 | +write, and ultimately release, memory. |
| 23 | +Let's start by looking at some C++ code: |
| 24 | + |
| 25 | +```notrust |
| 26 | +int* dangling(void) |
| 27 | +{ |
| 28 | + int i = 1234; |
| 29 | + return &i; |
| 30 | +} |
| 31 | + |
| 32 | +int add_one(void) |
| 33 | +{ |
| 34 | + int* num = dangling(); |
| 35 | + return *num + 1; |
| 36 | +} |
| 37 | +``` |
| 38 | + |
| 39 | +**Note: obviously this is very simple and non-idiomatic C++. |
| 40 | +You wouldn't write it in practice; it is for illustrative purposes.** |
| 41 | + |
| 42 | +The `dangling` function allocates an integer on the stack, |
| 43 | +and stores it in a variable, `i`. |
| 44 | +It then returns a pointer to the variable `i`. |
| 45 | +There's just one problem: |
| 46 | +stack memory becomes invalid when the function returns. |
| 47 | +This means that in the second line of `add_one`, |
| 48 | +`num` points to some garbage values, |
| 49 | +and we won't get the effect that we want. |
| 50 | +While this is a trivial example, |
| 51 | +it can happen quite often in C++ code. |
| 52 | +There's a similar problem when memory on the heap is allocated with `malloc` (or `new`), |
| 53 | +then freed with `free` (or `delete`), |
| 54 | +yet your code attempts to do something with the pointer to that memory. |
| 55 | +This problem is called a 'dangling pointer,' |
| 56 | +and it's not possible to write Rust code that has it. |
| 57 | +Let's try writing it in Rust: |
| 58 | + |
| 59 | +```ignore |
| 60 | +fn dangling() -> &int { |
| 61 | + let i = 1234; |
| 62 | + return &i; |
| 63 | +} |
| 64 | + |
| 65 | +fn add_one() -> int { |
| 66 | + let num = dangling(); |
| 67 | + return *num + 1; |
| 68 | +} |
| 69 | + |
| 70 | +fn main() { |
| 71 | + add_one(); |
| 72 | +} |
| 73 | +``` |
| 74 | + |
| 75 | +Save this program as `dangling.rs`. When you try to compile this program with `rustc dangling.rs`, you'll get an interesting (and long) error message: |
| 76 | + |
| 77 | +```notrust |
| 78 | +dangling.rs:3:12: 3:14 error: `i` does not live long enough |
| 79 | +dangling.rs:3 return &i; |
| 80 | + ^~ |
| 81 | +dangling.rs:1:23: 4:2 note: reference must be valid for the anonymous lifetime #1 defined on the block at 1:22... |
| 82 | +dangling.rs:1 fn dangling() -> &int { |
| 83 | +dangling.rs:2 let i = 1234; |
| 84 | +dangling.rs:3 return &i; |
| 85 | +dangling.rs:4 } |
| 86 | +dangling.rs:1:23: 4:2 note: ...but borrowed value is only valid for the block at 1:22 |
| 87 | +dangling.rs:1 fn dangling() -> &int { |
| 88 | +dangling.rs:2 let i = 1234; |
| 89 | +dangling.rs:3 return &i; |
| 90 | +dangling.rs:4 } |
| 91 | +error: aborting due to previous error |
| 92 | +``` |
| 93 | + |
| 94 | +In order to fully understand this error message, |
| 95 | +we need to talk about what it means to "own" something. |
| 96 | +So for now, |
| 97 | +let's just accept that Rust will not allow us to write code with a dangling pointer, |
| 98 | +and we'll come back to this code once we understand ownership. |
| 99 | + |
| 100 | +Let's forget about programming for a second and talk about books. |
| 101 | +I like to read physical books, |
| 102 | +and sometimes I really like one and tell my friends they should read it. |
| 103 | +While I'm reading my book, I own it: the book is in my possession. |
| 104 | +When I loan the book out to someone else for a while, they "borrow" it from me. |
| 105 | +And when you borrow a book, it's yours for a certain period of time, |
| 106 | +and then you give it back to me, and I own it again. Right? |
| 107 | + |
| 108 | +This concept applies directly to Rust code as well: |
| 109 | +some code "owns" a particular pointer to memory. |
| 110 | +It's the sole owner of that pointer. |
| 111 | +It can also lend that memory out to some other code for a while: |
| 112 | +that code "borrows" the memory, |
| 113 | +and it borrows it for a precise period of time, |
| 114 | +called a "lifetime." |
| 115 | + |
| 116 | +That's all there is to it. |
| 117 | +That doesn't seem so hard, right? |
| 118 | +Let's go back to that error message: |
| 119 | +`error: 'i' does not live long enough`. |
| 120 | +We tried to loan out a particular variable, `i`, |
| 121 | +using a reference (the `&` operator) but Rust knew that the variable would be invalid after the function returns, |
| 122 | +and so it tells us that: |
| 123 | +`reference must be valid for the anonymous lifetime #1...`. |
| 124 | +Neat! |
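For contrast, here is a minimal sketch of a borrow that the compiler accepts, because the owner outlives the loan. It assumes a current Rust toolchain, where the integer type is spelled `i32` rather than `int`; this `add_one` takes a parameter and is a hypothetical variant of the function above, not the original.

```rust
// Borrow an integer for the duration of the call and return a new value.
// `i32` is an assumption: it is the modern spelling of a 32-bit integer.
fn add_one(num: &i32) -> i32 {
    *num + 1
}

fn main() {
    let i = 1234; // `main` owns `i`
    let result = add_one(&i); // lend `i` out; the loan ends when `add_one` returns
    println!("{} + 1 = {}", i, result); // the owner can still use `i` afterwards
}
```

The borrow lasts only as long as the call, so the reference can never outlive the variable it points to.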
| 125 | + |
| 126 | +That's a great example for stack memory, |
| 127 | +but what about heap memory? |
| 128 | +Rust has a second kind of pointer, |
| 129 | +an 'owned box', |
| 130 | +that you can create with a `~`. |
| 131 | +Check it out: |
| 132 | + |
| 133 | +``` |
| 134 | +fn dangling() -> ~int { |
| 135 | + let i = ~1234; |
| 136 | + return i; |
| 137 | +} |
| 138 | + |
| 139 | +fn add_one() -> int { |
| 140 | + let num = dangling(); |
| 141 | + return *num + 1; |
| 142 | +} |
| 143 | +``` |
| 144 | + |
| 145 | +Now instead of a stack allocated `1234`, |
| 146 | +we have a heap allocated `~1234`. |
| 147 | +Whereas `&` borrows a pointer to existing memory, |
| 148 | +creating an owned box allocates memory on the heap and places a value in it, |
| 149 | +giving you the sole pointer to that memory. |
| 150 | +You can roughly compare these two lines: |
| 151 | + |
| 152 | +``` |
| 153 | +// Rust |
| 154 | +let i = ~1234; |
| 155 | +``` |
| 156 | + |
| 157 | +```notrust |
| 158 | +// C++ |
| 159 | +int *i = new int; |
| 160 | +*i = 1234; |
| 161 | +``` |
| 162 | + |
| 163 | +Rust infers the correct type, |
| 164 | +allocates the correct amount of memory and sets it to the value you asked for. |
| 165 | +This means that it's impossible to allocate uninitialized memory: |
| 166 | +*Rust does not have the concept of null*. |
| 167 | +Hooray! |
| 168 | +There's one other difference between this line of Rust and the C++: |
| 169 | +The Rust compiler also figures out the lifetime of `i`, |
| 170 | +and then inserts a corresponding `free` call after it's invalid, |
| 171 | +like a destructor in C++. |
| 172 | +You get all of the benefits of manually allocated heap memory without having to do all the bookkeeping yourself. |
| 173 | +Furthermore, all of this checking is done at compile time, |
| 174 | +so there's no runtime overhead. |
| 175 | +You'll get (basically) the exact same code that you'd get if you wrote the correct C++, |
| 176 | +but it's impossible to write the incorrect version, thanks to the compiler. |
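As a rough sketch of the same idea on a current Rust toolchain, where (as an assumption about your compiler version) an owned box is written `Box::new(1234)` instead of `~1234` and the integer type is `i32`:

```rust
// Allocate an integer on the heap and hand ownership to the caller.
// `boxed` is a hypothetical name standing in for `dangling` above.
fn boxed() -> Box<i32> {
    let i = Box::new(1234);
    i // ownership of the heap allocation moves to the caller
}

fn add_one() -> i32 {
    let num = boxed();
    *num + 1
    // `num` goes out of scope here, so the heap memory is freed automatically
}

fn main() {
    println!("{}", add_one());
}
```

No explicit `free` or `delete` appears anywhere: the allocation is released when its single owner goes out of scope.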
| 177 | + |
| 178 | +You've seen one way that ownership and borrowing are useful to prevent code that would normally be dangerous in a less-strict language, |
| 179 | +but let's talk about another: concurrency. |
| 180 | + |
| 181 | +# Owning concurrency |
| 182 | + |
| 183 | +Concurrency is an incredibly hot topic in the software world right now. |
| 184 | +It's always been an interesting area of study for computer scientists, |
| 185 | +but as usage of the Internet explodes, |
| 186 | +people are looking to improve the number of users a given service can handle. |
| 187 | +Concurrency is one way of achieving this goal. |
| 188 | +There is a pretty big drawback to concurrent code, though: |
| 189 | +it can be hard to reason about, because it is non-deterministic. |
| 190 | +There are a few different approaches to writing good concurrent code, |
| 191 | +but let's talk about how Rust's notions of ownership and lifetimes contribute to correct but concurrent code. |
| 192 | + |
| 193 | +First, let's go over a simple concurrency example. |
| 194 | +Rust makes it easy to create "tasks", |
| 195 | +otherwise known as "threads". |
| 196 | +Typically, tasks do not share memory but instead communicate amongst each other with 'channels', like this: |
| 197 | + |
| 198 | +``` |
| 199 | +fn main() { |
| 200 | + let numbers = ~[1,2,3]; |
| 201 | + |
| 202 | + let (tx, rx) = channel(); |
| 203 | + tx.send(numbers); |
| 204 | + |
| 205 | + spawn(proc() { |
| 206 | + let numbers = rx.recv(); |
| 207 | + println!("{}", numbers[0]); |
| 208 | + }) |
| 209 | +} |
| 210 | +``` |
| 211 | + |
| 212 | +In this example, we create a boxed array of numbers. |
| 213 | +We then make a 'channel', |
| 214 | +Rust's primary means of passing messages between tasks. |
| 215 | +The `channel` function returns two different ends of the channel: |
| 216 | +a `Sender` and `Receiver` (commonly abbreviated `tx` and `rx`). |
| 217 | +The `spawn` function spins up a new task, |
| 218 | +given a *heap allocated closure* to run. |
| 219 | +As you can see in the code, |
| 220 | +we call `tx.send()` from the original task, |
| 221 | +passing in our boxed array, |
| 222 | +and we call `rx.recv()` (short for 'receive') inside of the new task: |
| 223 | +values given to the `Sender` via the `send` method come out the other end via the `recv` method on the `Receiver`. |
| 224 | + |
| 225 | +Now here's the exciting part: |
| 226 | +because `numbers` is an owned type, |
| 227 | +when it is sent across the channel, |
| 228 | +it is actually *moved*, |
| 229 | +transferring ownership of `numbers` between tasks. |
| 230 | +This ownership transfer is *very fast* - |
| 231 | +in this case simply copying a pointer - |
| 232 | +while also ensuring that the original owning task cannot create data races by continuing to read or write to `numbers` in parallel with the new owner. |
| 233 | + |
| 234 | +To prove that Rust performs the ownership transfer, |
| 235 | +try to modify the previous example to continue using the variable `numbers`: |
| 236 | + |
| 237 | +```ignore |
| 238 | +fn main() { |
| 239 | + let numbers = ~[1,2,3]; |
| 240 | + |
| 241 | + let (tx, rx) = channel(); |
| 242 | + tx.send(numbers); |
| 243 | + |
| 244 | + spawn(proc() { |
| 245 | + let numbers = rx.recv(); |
| 246 | + println!("{}", numbers[0]); |
| 247 | + }); |
| 248 | + |
| 249 | + // Try to print a number from the original task |
| 250 | + println!("{}", numbers[0]); |
| 251 | +} |
| 252 | +``` |
| 253 | + |
| 254 | +This will result in an error indicating that the value has been moved: |
| 255 | + |
| 256 | +```notrust |
| 257 | +concurrency.rs:12:20: 12:27 error: use of moved value: 'numbers' |
| 258 | +concurrency.rs:12 println!("{}", numbers[0]); |
| 259 | + ^~~~~~~ |
| 260 | +``` |
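The same move can be sketched on a current Rust toolchain. This version assumes the modern standard library, where channels live in `std::sync::mpsc`, tasks are started with `std::thread::spawn`, and a growable array is a `Vec`; the helper name `first_via_channel` is hypothetical.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Send a vector to another thread and return the first element it sees.
fn first_via_channel(numbers: Vec<i32>) -> i32 {
    let (tx, rx) = channel();
    tx.send(numbers).unwrap(); // `numbers` is moved into the channel here
    // Using `numbers` past this point would fail to compile: use of moved value.

    let handle = thread::spawn(move || {
        let numbers = rx.recv().unwrap(); // the new thread now owns the vector
        numbers[0]
    });
    handle.join().unwrap()
}

fn main() {
    println!("{}", first_via_channel(vec![1, 2, 3]));
}
```

Only a pointer-sized handle crosses the channel; the elements themselves are never copied.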
| 261 | + |
| 262 | +Since only one task can own a boxed array at a time, |
| 263 | +if instead of distributing our `numbers` array to a single task we wanted to distribute it to many tasks, |
| 264 | +we would need to copy the array for each. |
| 265 | +Let's see an example that uses the `clone` method to create copies of the data: |
| 266 | + |
| 267 | +``` |
| 268 | +fn main() { |
| 269 | + let numbers = ~[1,2,3]; |
| 270 | + |
| 271 | + for num in range(0, 3) { |
| 272 | + let (tx, rx) = channel(); |
| 273 | + // Use `clone` to send a *copy* of the array |
| 274 | + tx.send(numbers.clone()); |
| 275 | + |
| 276 | + spawn(proc() { |
| 277 | + let numbers = rx.recv(); |
| 278 | + println!("{:d}", numbers[num as uint]); |
| 279 | + }) |
| 280 | + } |
| 281 | +} |
| 282 | +``` |
| 283 | + |
| 284 | +This is similar to the code we had before, |
| 285 | +except now we loop three times, |
| 286 | +making three tasks, |
| 287 | +and *cloning* `numbers` before sending it. |
| 288 | + |
| 289 | +However, if we're making a lot of tasks, |
| 290 | +or if our data is very large, |
| 291 | +creating a copy for each task requires a lot of work and a lot of extra memory for little benefit. |
| 292 | +In practice, we might not want to do this because of the cost. |
| 293 | +Enter `Arc`, |
| 294 | +an atomically reference counted box ("A.R.C." == "atomically reference counted"). |
| 295 | +`Arc` is the most common way to *share* data between tasks. |
| 296 | +Here's some code: |
| 297 | + |
| 298 | +``` |
| 299 | +extern crate sync; |
| 300 | +use sync::Arc; |
| 301 | + |
| 302 | +fn main() { |
| 303 | + let numbers = ~[1,2,3]; |
| 304 | + let numbers = Arc::new(numbers); |
| 305 | + |
| 306 | + for num in range(0, 3) { |
| 307 | + let (tx, rx) = channel(); |
| 308 | + tx.send(numbers.clone()); |
| 309 | + |
| 310 | + spawn(proc() { |
| 311 | + let numbers = rx.recv(); |
| 312 | + println!("{:d}", numbers[num as uint]); |
| 313 | + }) |
| 314 | + } |
| 315 | +} |
| 316 | +``` |
| 317 | + |
| 318 | +This is almost exactly the same, |
| 319 | +except that this time `numbers` is first put into an `Arc`. |
| 320 | +`Arc::new` creates the `Arc`, |
| 321 | +and `.clone()` makes another `Arc` that refers to the same contents. |
| 322 | +So we clone the `Arc` for each task, |
| 323 | +send that clone down the channel, |
| 324 | +and then use it to print out a number. |
| 325 | +Now instead of copying an entire array to send it to our multiple tasks we are just copying a pointer (the `Arc`) and *sharing* the array. |
| 326 | + |
| 327 | +How can this work though? |
| 328 | +Surely if we're sharing data, we can cause data races if one task writes to the array while others read? |
| 329 | + |
| 330 | +Well, Rust is super-smart and will only let you put data into an `Arc` that is provably safe to share. |
| 331 | +In this case, it's safe to share the array *as long as it's immutable*, |
| 332 | +i.e. many tasks may read the data in parallel as long as none can write. |
| 333 | +So for this type and many others `Arc` will only give you an immutable view of the data. |
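A minimal sketch of read-only sharing on a current Rust toolchain, assuming `Arc` is found in `std::sync` and threads are started with `std::thread::spawn`; `read_shared` is a hypothetical helper, and it hands each thread its `Arc` directly instead of going through a channel.

```rust
use std::sync::Arc;
use std::thread;

// Each thread reads one element of the shared vector and returns it.
fn read_shared(numbers: Vec<i32>) -> Vec<i32> {
    let numbers = Arc::new(numbers);
    let handles: Vec<_> = (0..numbers.len())
        .map(|i| {
            // Cloning the `Arc` copies a pointer and bumps a counter;
            // the vector itself is shared, not duplicated.
            let numbers = Arc::clone(&numbers);
            thread::spawn(move || numbers[i])
        })
        .collect();
    // Join in spawn order so results line up with the indices.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", read_shared(vec![1, 2, 3]));
}
```

Trying to write through the `Arc` inside the closure, for example `numbers[i] += 1`, is rejected at compile time, which is the immutable view described above.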
| 334 | + |
| 335 | +Arcs are great for immutable data, |
| 336 | +but what about mutable data? |
| 337 | +Shared mutable state is the bane of the concurrent programmer: |
| 338 | +you can use a mutex to protect shared mutable state, |
| 339 | +but if you forget to acquire the mutex, bad things can happen, including crashes. |
| 340 | +Rust provides mutexes but makes it impossible to use them in a way that subverts memory safety. |
| 341 | + |
| 342 | +Let's take the same example yet again, |
| 343 | +and modify it to mutate the shared state: |
| 344 | + |
| 345 | +``` |
| 346 | +extern crate sync; |
| 347 | +use sync::{Arc, Mutex}; |
| 348 | + |
| 349 | +fn main() { |
| 350 | + let numbers = ~[1,2,3]; |
| 351 | + let numbers_lock = Arc::new(Mutex::new(numbers)); |
| 352 | + |
| 353 | + for num in range(0, 3) { |
| 354 | + let (tx, rx) = channel(); |
| 355 | + tx.send(numbers_lock.clone()); |
| 356 | + |
| 357 | + spawn(proc() { |
| 358 | + let numbers_lock = rx.recv(); |
| 359 | + |
| 360 | + // Take the lock, along with exclusive access to the underlying array |
| 361 | + let mut numbers = numbers_lock.lock(); |
| 362 | + numbers[num as uint] += 1; |
| 363 | + |
| 364 | + println!("{}", numbers[num as uint]); |
| 365 | + |
| 366 | + // When `numbers` goes out of scope the lock is dropped |
| 367 | + }) |
| 368 | + } |
| 369 | +} |
| 370 | +``` |
| 371 | + |
| 372 | +This example is starting to get more subtle, |
| 373 | +but it hints at the powerful compositionality of Rust's concurrent types. |
| 374 | +This time we've put our array of numbers inside a `Mutex` and then put *that* inside the `Arc`. |
| 375 | +Like immutable data, |
| 376 | +`Mutex`es are sharable, |
| 377 | +but unlike immutable data, |
| 378 | +data inside a `Mutex` may be mutated as long as the mutex is locked. |
| 379 | + |
| 380 | +The `lock` method here returns not your original array or a pointer to it, |
| 381 | +but a `MutexGuard`, |
| 382 | +a type that is responsible for releasing the lock when it goes out of scope. |
| 383 | +This same `MutexGuard` can transparently be treated as if it were the value the `Mutex` contains, |
| 384 | +as you can see in the subsequent indexing operation that performs the mutation. |
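The guard pattern looks much the same on a current Rust toolchain. The sketch below assumes the modern `std::sync::{Arc, Mutex}` API, in which `lock()` returns a `Result` that is unwrapped here; `increment_all` is a hypothetical helper name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn one thread per element; each thread increments "its" element.
fn increment_all(numbers: Vec<i32>) -> Vec<i32> {
    let len = numbers.len();
    let numbers_lock = Arc::new(Mutex::new(numbers));

    let handles: Vec<_> = (0..len)
        .map(|i| {
            let numbers_lock = Arc::clone(&numbers_lock);
            thread::spawn(move || {
                // Take the lock; the guard gives exclusive access to the vector.
                let mut numbers = numbers_lock.lock().unwrap();
                numbers[i] += 1;
                // The guard is dropped here, which releases the lock.
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the final contents out while briefly holding the lock.
    let result = numbers_lock.lock().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", increment_all(vec![1, 2, 3]));
}
```

There is no way to reach the vector without going through `lock`, so forgetting to acquire the mutex is a compile error instead of a runtime bug.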
| 385 | + |
| 386 | +OK, let's stop there before we get too deep. |
| 387 | + |
| 388 | +# A footnote: unsafe |
| 389 | + |
| 390 | +The Rust compiler and libraries are entirely written in Rust; |
| 391 | +we say that Rust is "self-hosting". |
| 392 | +If Rust makes it impossible to unsafely share data between threads, |
| 393 | +and Rust is written in Rust, |
| 394 | +then how does it implement concurrent types like `Arc` and `Mutex`? |
| 395 | +The answer: `unsafe`. |
| 396 | + |
| 397 | +You see, while the Rust compiler is very smart, |
| 398 | +and saves you from making mistakes you might normally make, |
| 399 | +it's not an artificial intelligence. |
| 400 | +Because we're smarter than the compiler - |
| 401 | +sometimes - we need to override this safe behavior. |
| 402 | +For this purpose, Rust has an `unsafe` keyword. |
| 403 | +Within an `unsafe` block, |
| 404 | +Rust turns off many of its safety checks. |
| 405 | +If something bad happens to your program, |
| 406 | +you only have to audit what you've done inside `unsafe`, |
| 407 | +and not the entire program itself. |
| 408 | + |
| 409 | +If one of the major goals of Rust was safety, |
| 410 | +why allow that safety to be turned off? |
| 411 | +Well, there are really only three main reasons to do it: |
| 412 | +interfacing with external code, |
| 413 | +such as doing FFI into a C library; |
| 414 | +performance (in certain cases); |
| 415 | +and to provide a safe abstraction around operations that normally would not be safe. |
| 416 | +Our `Arc`s are an example of this last purpose. |
| 417 | +We can safely hand out multiple pointers to the contents of the `Arc`, |
| 418 | +because we are sure the data is safe to share. |
| 419 | +But the Rust compiler can't know that we've made these choices, |
| 420 | +so _inside_ the implementation of the Arcs, |
| 421 | +we use `unsafe` blocks to do (normally) dangerous things. |
| 422 | +But we expose a safe interface, |
| 423 | +which means that the `Arc`s are impossible to use incorrectly. |
| 424 | + |
| 425 | +This is how Rust's type system prevents you from making some of the mistakes that make concurrent programming difficult, |
| 426 | +while still giving you the efficiency of languages such as C++. |
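To make the "safe interface over unsafe internals" idea concrete, here is a deliberately tiny sketch for a current Rust toolchain. It is far simpler than `Arc` and is not how `Arc` is implemented; the function `first_checked` is hypothetical, and it relies on the standard slice method `get_unchecked`, which skips the bounds check and is therefore `unsafe` to call.

```rust
// Return the first element of a slice, or `None` if the slice is empty.
// Callers never write `unsafe`: the function checks the precondition itself.
fn first_checked(values: &[i32]) -> Option<i32> {
    if values.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so index 0 is in bounds.
    Some(unsafe { *values.get_unchecked(0) })
}

fn main() {
    println!("{:?}", first_checked(&[7, 8, 9]));
    println!("{:?}", first_checked(&[]));
}
```

If this function ever misbehaved, the single `unsafe` block is the only place that would need auditing.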
| 427 | + |
| 428 | +# That's all, folks |
| 429 | + |
| 430 | +I hope that this taste of Rust has given you an idea of whether Rust is the right language for you. |
| 431 | +If it is, |
| 432 | +I encourage you to check out [the tutorial](tutorial.html) for a full, |
| 433 | +in-depth exploration of Rust's syntax and concepts. |