From fc6372ea1f3ab98ab1a0252843a93a7045e95849 Mon Sep 17 00:00:00 2001
From: Steve Klabnik
Date: Tue, 12 May 2015 13:34:29 -0400
Subject: [PATCH] TRPL: Rust inside other languages

---
 src/doc/trpl/SUMMARY.md                     |   1 +
 src/doc/trpl/rust-inside-other-languages.md | 353 ++++++++++++++++++++
 2 files changed, 354 insertions(+)
 create mode 100644 src/doc/trpl/rust-inside-other-languages.md

diff --git a/src/doc/trpl/SUMMARY.md b/src/doc/trpl/SUMMARY.md
index de7ded76280f6..29ec66861f153 100644
--- a/src/doc/trpl/SUMMARY.md
+++ b/src/doc/trpl/SUMMARY.md
@@ -6,6 +6,7 @@
     * [Hello, Cargo!](hello-cargo.md)
 * [Learn Rust](learn-rust.md)
     * [Guessing Game](guessing-game.md)
+    * [Rust inside other languages](rust-inside-other-languages.md)
 * [Effective Rust](effective-rust.md)
     * [The Stack and the Heap](the-stack-and-the-heap.md)
     * [Testing](testing.md)
diff --git a/src/doc/trpl/rust-inside-other-languages.md b/src/doc/trpl/rust-inside-other-languages.md
new file mode 100644
index 0000000000000..a1ae50a0c5396
--- /dev/null
+++ b/src/doc/trpl/rust-inside-other-languages.md
@@ -0,0 +1,353 @@
+% Rust Inside Other Languages
+
+For our third project, we’re going to choose something that shows off one of
+Rust’s greatest strengths: a lack of a substantial runtime.
+
+As organizations grow, they increasingly rely on a multitude of programming
+languages. Different programming languages have different strengths and
+weaknesses, and a polyglot stack lets you use a particular language where
+its strengths make sense, and use a different language where it’s weak.
+
+A very common area where many programming languages are weak is in runtime
+performance of programs. Often, using a language that is slower, but offers
+greater programmer productivity, is a worthwhile trade-off. To help mitigate
+this, these languages provide a way to write some of your system in C, and then
+call the C code as though it were written in the higher-level language.
+This is called a ‘foreign function interface’, often shortened to ‘FFI’.
+
+Rust has support for FFI in both directions: it can call into C code easily,
+but crucially, it can also be called _into_ as easily as C. Combined with
+Rust’s lack of a garbage collector and low runtime requirements, this makes
+Rust a great candidate to embed inside of other languages when you need
+some extra oomph.
+
+There is a whole [chapter devoted to FFI][ffi] and its specifics elsewhere in
+the book, but in this chapter, we’ll examine this particular use-case of FFI,
+with three examples, in Ruby, Python, and JavaScript.
+
+[ffi]: ffi.html
+
+# The problem
+
+There are many different projects we could choose here, but we’re going to
+pick an example where Rust has a clear advantage over many other languages:
+numeric computing and threading.
+
+First, many languages, for the sake of consistency, place numbers on the heap,
+rather than on the stack. Especially in languages that focus on object-oriented
+programming and use garbage collection, heap allocation is the default. Sometimes
+optimizations can stack allocate particular numbers, but rather than relying
+on an optimizer to do its job, we may want to ensure that we’re always using
+primitive number types rather than some sort of object type.
+
+Second, many languages have a ‘global interpreter lock’, which limits
+concurrency in many situations. This is done in the name of safety, which is
+a positive effect, but it limits the amount of work that can be done at the
+same time, which is a big negative.
+
+To emphasize these two aspects, we’re going to create a little project that
+uses both of them heavily. Since the focus of the example is the embedding
+of Rust into the languages, rather than the problem itself, we’ll just use a
+toy example:
+
+> Start ten threads. Inside each thread, count from one to five million. After
+> all ten threads are finished, print out ‘done!’.
+
+I chose five million based on my particular computer. Here’s an example of this
+code in Ruby:
+
+```ruby
+threads = []
+
+10.times do
+  threads << Thread.new do
+    count = 0
+
+    5_000_000.times do
+      count += 1
+    end
+  end
+end
+
+threads.each {|t| t.join }
+puts "done!"
+```
+
+Try running this example, and choose a number that makes it run for a few
+seconds. Depending on your computer’s hardware, you may have to increase or
+decrease the number.
+
+On my system, running this program takes `2.156` seconds. And, if I use some
+sort of process monitoring tool, like `top`, I can see that it only uses one
+core on my machine. That’s the global interpreter lock (the GIL) kicking in.
+
+While it’s true that this is a synthetic program, one can imagine many problems
+that are similar to this in the real world. For our purposes, spinning up some
+busy threads represents some sort of parallel, expensive computation.
+
+# A Rust library
+
+Let’s rewrite this program in Rust. First, let’s make a new project with
+Cargo:
+
+```bash
+$ cargo new embed
+$ cd embed
+```
+
+This program is fairly easy to write in Rust:
+
+```rust
+use std::thread;
+
+fn process() {
+    let handles: Vec<_> = (0..10).map(|_| {
+        thread::spawn(|| {
+            let mut _x = 0;
+            for _ in (0..5_000_001) {
+                _x += 1
+            }
+        })
+    }).collect();
+
+    for h in handles {
+        h.join().ok().expect("Could not join a thread!");
+    }
+}
+```
+
+Some of this should look familiar from previous examples. We spin up ten
+threads, collecting them into a `handles` vector. Inside of each thread, we
+loop five million times, and add one to `_x` each time. Why the underscore?
+Well, if we remove it and compile:
+
+```bash
+$ cargo build
+   Compiling embed v0.1.0 (file:///home/steve/src/embed)
+src/lib.rs:3:1: 16:2 warning: function is never used: `process`, #[warn(dead_code)] on by default
+src/lib.rs:3 fn process() {
+src/lib.rs:4     let handles: Vec<_> = (0..10).map(|_| {
+src/lib.rs:5         thread::spawn(|| {
+src/lib.rs:6             let mut x = 0;
+src/lib.rs:7             for _ in (0..5_000_001) {
+src/lib.rs:8                 x += 1
+                 ...
+src/lib.rs:6:17: 6:22 warning: variable `x` is assigned to, but never used, #[warn(unused_variables)] on by default
+src/lib.rs:6             let mut x = 0;
+                             ^~~~~
+```
+
+That first warning is because we are building a library. If we had a test
+for this function, the warning would go away. But for now, it’s never
+called.
+
+The second is related to `x` versus `_x`. Because we never actually _do_
+anything with `x`, we get a warning about it. In our case, that’s perfectly
+okay, as we’re just trying to waste CPU cycles. Prefixing `x` with the
+underscore removes the warning.
+
+Finally, we join on each thread.
+
+Right now, however, this is a Rust library, and it doesn’t expose anything
+that’s callable from C. If we tried to hook this up to another language right
+now, it wouldn’t work. We only need to make two small changes to fix this,
+though. The first is to modify the beginning of our code:
+
+```rust,ignore
+#[no_mangle]
+pub extern fn process() {
+```
+
+We have to add a new attribute, `no_mangle`. When you create a Rust library, the
+compiler changes the names of functions in the compiled output. The reasons for
+this are outside the scope of this tutorial, but in order for other languages to
+know how to call the function, its name has to be left unchanged. This attribute
+turns the renaming off.
+
+The other change is the `pub extern`. The `pub` means that this function should
+be callable from outside of this module, and the `extern` says that it should
+be able to be called from C. That’s it! Not a whole lot of change.
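As a sketch of the function with both edits applied, the listing below repeats the `process` body from earlier (minus the redundant parentheses around the range) under the new signature. It spells the ABI out as `extern "C"`, which is what a bare `extern` defaults to; that spelling is a choice made here, not something the chapter requires. One assumption to flag: this targets editions up to 2021, since the 2024 edition expects the attribute to be written `#[unsafe(no_mangle)]`.

```rust
use std::thread;

// `#[no_mangle]` keeps the exported symbol name as plain `process`.
// `pub` makes the function visible outside the crate, and `extern "C"`
// (the explicit form of a bare `extern`) selects the C calling convention.
#[no_mangle]
pub extern "C" fn process() {
    // Spin up ten threads that each count to waste CPU cycles.
    let handles: Vec<_> = (0..10)
        .map(|_| {
            thread::spawn(|| {
                let mut _x = 0;
                for _ in 0..5_000_001 {
                    _x += 1
                }
            })
        })
        .collect();

    // Wait for every thread to finish before returning to the caller.
    for h in handles {
        h.join().ok().expect("Could not join a thread!");
    }
}
```

Nothing else in the body changes; only the first two lines differ from the earlier version.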
+
+The second thing we need to do is to change a setting in our `Cargo.toml`. Add
+this at the bottom:
+
+```toml
+[lib]
+name = "embed"
+crate-type = ["dylib"]
+```
+
+This tells Rust that we want to compile our library into a standard dynamic
+library. By default, Rust compiles into an ‘rlib’, a Rust-specific format.
+
+Let’s build the project now:
+
+```bash
+$ cargo build --release
+   Compiling embed v0.1.0 (file:///home/steve/src/embed)
+```
+
+We’ve chosen `cargo build --release`, which builds with optimizations on. We
+want this to be as fast as possible! You can find the compiled library in
+`target/release`:
+
+```bash
+$ ls target/release/
+build deps examples libembed.so native
+```
+
+That `libembed.so` is our ‘shared object’ library. We can use this file
+just like any shared object library written in C! As an aside, this may be
+`embed.dll` or `libembed.dylib`, depending on the platform.
+
+Now that we’ve got our Rust library built, let’s use it from Ruby.
+
+# Ruby
+
+Open up an `embed.rb` file inside of our project, and do this:
+
+```ruby
+require 'ffi'
+
+module Hello
+  extend FFI::Library
+  ffi_lib 'target/release/libembed.so'
+  attach_function :process, [], :void
+end
+
+Hello.process
+
+puts "done!"
+```
+
+Before we can run this, we need to install the `ffi` gem:
+
+```bash
+$ gem install ffi # this may need sudo
+Fetching: ffi-1.9.8.gem (100%)
+Building native extensions. This could take a while...
+Successfully installed ffi-1.9.8
+Parsing documentation for ffi-1.9.8
+Installing ri documentation for ffi-1.9.8
+Done installing documentation for ffi after 0 seconds
+1 gem installed
+```
+
+And finally, we can try running it:
+
+```bash
+$ ruby embed.rb
+done!
+$
+```
+
+Whoa, that was fast! On my system, this took `0.086` seconds, rather than
+the two seconds the pure Ruby version took. Let’s break down this Ruby
+code:
+
+```ruby
+require 'ffi'
+```
+
+We first need to require the `ffi` gem.
+This lets us interface with our Rust library like a C library.
+
+```ruby
+module Hello
+  extend FFI::Library
+  ffi_lib 'target/release/libembed.so'
+```
+
+The `ffi` gem’s authors recommend using a module to scope the functions
+we’ll import from the shared library. Inside, we `extend` the necessary
+`FFI::Library` module, and then call `ffi_lib` to load up our shared
+object library. We just pass it the path where our library is stored,
+which, as we saw before, is `target/release/libembed.so`.
+
+```ruby
+attach_function :process, [], :void
+```
+
+The `attach_function` method is provided by the FFI gem. It’s what
+connects our `process()` function in Rust to a Ruby function of the
+same name. Since `process()` takes no arguments, the second parameter
+is an empty array, and since it returns nothing, we pass `:void` as
+the final argument.
+
+```ruby
+Hello.process
+```
+
+This is the actual call into Rust. The combination of our `module`
+and the call to `attach_function` sets this all up. It looks like
+a Ruby function, but is actually Rust!
+
+```ruby
+puts "done!"
+```
+
+Finally, as per our project’s requirements, we print out `done!`.
+
+That’s it! As we’ve seen, bridging between the two languages is really easy,
+and buys us a lot of performance.
+
+Next, let’s try Python!
+
+# Python
+
+Create an `embed.py` file in this directory, and put this in it:
+
+```python
+from ctypes import cdll
+
+lib = cdll.LoadLibrary("target/release/libembed.so")
+
+lib.process()
+
+print("done!")
+```
+
+Even easier! We use `cdll` from the `ctypes` module. A quick call
+to `LoadLibrary` later, and we can call `process()`.
+
+On my system, this takes `0.017` seconds. Speedy!
+
+# Node.js
+
+Node isn’t a language, but it’s currently the dominant implementation of
+server-side JavaScript.
+
+In order to do FFI with Node, we first need to install the `ffi` library:
+
+```bash
+$ npm install ffi
+```
+
+After that installs, we can use it:
+
+```javascript
+var ffi = require('ffi');
+
+var lib = ffi.Library('target/release/libembed', {
+  'process': [ 'void', [] ]
+});
+
+lib.process();
+
+console.log("done!");
+```
+
+It looks more like the Ruby example than the Python example. We use
+the `ffi` module to get access to `ffi.Library()`, which loads up
+our shared object. We need to annotate the return type and argument
+types of the function, which are `'void'` for the return type, and an
+empty array to signify no arguments. From there, we just call it and
+print `done!`.
+
+On my system, this takes a quick `0.092` seconds.
+
+# Conclusion
+
+As you can see, the basics of doing this are _very_ easy. Of course,
+there’s a lot more that we could do here. Check out the [FFI][ffi]
+chapter for more details.
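One direction such an extension might take is passing values across the boundary. The sketch below is purely illustrative and not part of the chapter’s project: the function name `count_in_threads` and its two parameters are hypothetical, and the fixed-width integer types are assumed to map onto C’s `uint32_t` and `uint64_t`. As with `process`, the plain `#[no_mangle]` form assumes an edition up to 2021.

```rust
use std::thread;

// Hypothetical variant of `process`: the caller picks the number of threads
// and the per-thread limit, and receives the combined count back.
#[no_mangle]
pub extern "C" fn count_in_threads(threads: u32, limit: u64) -> u64 {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // `move` copies `limit` into each thread's closure.
            thread::spawn(move || {
                let mut count: u64 = 0;
                for _ in 0..limit {
                    count += 1;
                }
                count
            })
        })
        .collect();

    // Sum the per-thread counts; a thread that panicked contributes nothing.
    handles
        .into_iter()
        .map(|h| h.join().unwrap_or(0))
        .sum()
}
```

From Python’s `ctypes`, a caller would also need to set `argtypes` and `restype` on the function, because `ctypes` assumes a C `int` return type by default.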