diff --git a/2020-global/README.md b/2020-global/README.md index f610e0d..2c96020 100644 --- a/2020-global/README.md +++ b/2020-global/README.md @@ -19,13 +19,13 @@ Thanks! ### UTC Block -0. [Learnable Programming with Rust](./talks/02_UTC/01-Nikita-Baksalyar.txt) - Nikita Baksalyar -0. [Build your own (Rust-y) robot!](./talks/02_UTC/02-Aissata-Maiga.txt) - Aïssata Maiga -0. [Rust for Safer Protocol Development](./talks/02_UTC/03-Vivian-Band.txt) - Vivian Band -0. [Rust as foundation in a polyglot development environment](./talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.txt) - Gavin Mendel-Gleason & Matthijs van Otterdijk -0. [Rust for Artists. Art for Rustaceans.](./talks/02_UTC/05-Anastasia-Opara.txt) - Anastasia Opara -0. [Miri, Undefined Behavior and Foreign Functions](./talks/02_UTC/06-Christian-Poveda.txt) - Christian Poveda -0. [RFC: Secret types in Rust](./talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.txt) - Diane Hosfelt & Daan Sprenkels +0. [Learnable Programming with Rust](./talks/02_UTC/01-Nikita-Baksalyar.md) - Nikita Baksalyar +0. [Build your own (Rust-y) robot!](./talks/02_UTC/02-Aissata-Maiga.md) - Aïssata Maiga +0. [Rust for Safer Protocol Development](./talks/02_UTC/03-Vivian-Band.md) - Vivian Band +0. [Rust as foundation in a polyglot development environment](./talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.md) - Gavin Mendel-Gleason & Matthijs van Otterdijk +0. [Rust for Artists. Art for Rustaceans.](./talks/02_UTC/05-Anastasia-Opara.md) - Anastasia Opara +0. [Miri, Undefined Behavior and Foreign Functions](./talks/02_UTC/06-Christian-Poveda.md) - Christian Poveda +0. 
[RFC: Secret types in Rust](./talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.md) - Diane Hosfelt & Daan Sprenkels ### LATAM Block diff --git a/2020-global/talks/02_UTC/01-Nikita-Baksalyar.md b/2020-global/talks/02_UTC/01-Nikita-Baksalyar.md new file mode 100644 index 0000000..94af009 --- /dev/null +++ b/2020-global/talks/02_UTC/01-Nikita-Baksalyar.md @@ -0,0 +1,136 @@ +**Learnable Programming with Rust** + +**Bard:** +Nikita makes Rust interactive +so if learning it is your directive +you won't need to fight +to see what's inside +to become a debugging detective + +**Nikita:** + +Hi, I'm Nikita, and today I'm going to share with you a different approach to writing documentation and educating people about programming in Rust. +Let's start by defining the problem. +What exactly is learnable programming? It is defined by Bret Victor, in his essay of the same name, as a set of design principles that help programmers understand and see how their programs execute. +Some of the defining characteristics of this idea are seeing the state of a running program, and giving the programmer tools to easily play with their code. +These ideas become especially important when applied to the problems of education and writing documentation, because they can help to lower the entry barriers for people who are new to programming, and today I'm going to show you how we can apply these ideas to Rust and systems programming in particular. +So, let's see how it looks in action. +Let's imagine you're someone new to asynchronous programming in Rust and you want to learn more about it. +So you go to Tokio's website, and you see they have really nice tutorials. +Before you can start experimenting with asynchronous code, you need to follow a number of steps, because they have prerequisites. +You need to set up a new Rust project and then add dependencies through Cargo.
+Then you can take a look at the code provided in this example. This code connects to a Redis-like server, sets the key "hello" to the value "world", then gets the same key back and verifies that it has the expected value. +Before you can run this example, you need to make sure that a mini-Redis server is running on your machine. + +Why don't we try including all the necessary dependencies right there on the same page, so that a user can immediately run this code and see the results? This is how it might look. +I go to the same example and I click on the Run button. +The code is compiled, and, on the right side, I can see that the key "hello" has been set to the value "world". +This gives the user immediate context and feedback on what happens with their code, and the state of the mini-Redis server can be observed right there in the browser. + +But we can take this idea a little bit further and make this code example editable, so I can change the value of the key and run this code again, and I can see that the state has also changed. +Or I can add a new key-value pair, "hey, RustFest", and you can see there is a new key now. +This approach requires minimal set-up, so I can start exploring the Tokio API straightaway, and I believe this is something we should do to make Rust more accessible to more people. +Of course, setting up a new project is also an important skill, but the first impression, showing that this whole thing is not really that complicated, is also important, I think. + +And, well, it's not some kind of magic, because we can already make it happen. +We are able to run Rust code in the browser by using WebAssembly, and the platform support is constantly being expanded and improved in the compiler, so we can make it work quite easily. +When you click on the Run button, the code that you have in the editor is sent to the server, where it is compiled by the Rust compiler, and a WebAssembly module is produced as a result.
+This module is then executed in your browser, and you can see the output. + +This approach can be expanded way beyond Tokio or any other single library, because we can use it to enhance the documentation that Rustdoc produces automatically. +This documentation can be turned into an interactive playground that you can run in your web browser. +And it already kind of works like that with the standard library documentation, because if you go, for example, to the page that documents ... +we can click on the Run button in any of the examples, and you will be directed to the Rust Playground, where we can see the output. + +Of course, you can also play with the code there, but what if we make it simpler by running examples in the browser on the same page, and showing the output beside the code? This will make it easier for people to immediately see the output without switching context. +But there is a problem if we go beyond the standard library. + +The problem is: how do we deal with dependencies, and especially those dependencies which won't compile? The thing is, we don't really have full access to dependencies on the Rust Playground either: the Rust Playground has only a limited set available, because it's an isolated environment. That's expected, because the Rust Playground executes the code on their server, and they want to make sure that the code is not malicious, so they limit things like input and output and loading external dependencies. +On the other hand, this makes it harder for us, and practically impossible, to run examples from our own codebases, or even from public crates, and it makes it harder for us to mentor and educate people through examples. +WebAssembly does not have this problem. + +The sandboxed environment is not someone else's machine or server, but your own web browser, and this sandbox is secure by definition. +But the main problem remains: not all Rust code compiles to WebAssembly yet.
+Even if your code doesn't compile to WebAssembly yet, we can argue that this is a good thing, because if you want to add this kind of interactivity to your project, it will also incentivise you to make your code more portable, and compilable for a new target. + +When you write tests for code that, for example, sends network requests, usually what you want to do is mock those requests, meaning your code won't really send real network requests, but will instead pretend to do so. +This way, you can test the behaviour of your code in isolation from the external environment, and that's exactly the same thing that we want to do with our interactive documentation, because we want to give the user the option to run the code independently of the external environment. +The main thing to look out for here is API conformance, because we want to make sure that the mock functions that we call have the exact same interface as the real code. +You can do this manually, by using feature flags, for example, or you can use one of the many Rust libraries for automatic mocking, but the main idea remains: you want to provide the same interface, both in your real code and in the mocked version. + +And this idea of running the playground code in the browser can go a lot further, because we have lots of techniques for visualising data. +So, for example, on this slide you can see some code from the documentation of an HTTP client library called reqwest. +It sends a GET request and outputs the response. + +This example demonstrates only the abstract idea of sending an HTTP request, but how do we know how it actually works? And what kind of request does it actually send?
In this case, we can output the HTTP request itself, and it is really helpful to see what kind of request gets sent when you run this code, because it can help you learn more about HTTP, and, more than that, it can also help you debug problems, because with an interactive playground like this, you can replace the code with your own code and observe the changes in its behaviour. +And the good news is that a lot of Rust libraries already do something like that with tracing and logging, but to use that, you also have to choose a logging library and enable it just to get the log output. + +With a WebAssembly-enabled playground, we just need to provide our own logging library that will redirect output from the logs to the user's browser. +And the user's code can also be seen as data that we can visualise and inspect: some of the components of the Rust compiler are already available as libraries, and there are even third-party crates that can be used for parsing Rust code. +Almost all of them can be built for WebAssembly, so we can parse the code and extract information that we can use to show hints to our user, even before they execute their code. +It can make our examples even clearer, because these hints can be shown as a user types code, and they can provide contextual information, almost like in an IDE. +On this slide, you can see the reqwest documentation again, where it encodes parameters as HTTP form data, and, as you look at this code in the editor, you can see what result this or that function returns via inline hints, because the parsed code provides us with information about the function names and types that are used there, and all the kinds of expressions. And it is really straightforward to work with code as data, because all expressions are actually variants of one large enum, so you can use pattern matching to work with this enum as you normally would in Rust.
+There is one more thing that we can do here, and that is highlighting the code execution flow. +It can be compared to how you work with a debugger, which can go through a program step by step, and, while it goes through the program in this fashion, it can also show the internal state of the program at that particular point in time. We can do the same thing for our interactive playgrounds, because it can really help in understanding cases like, for example, ... +and on this slide we can see an example from the Rust standard library documentation that constructs iterators in a functional style, and, while it makes some sense, it's hard to see what kind of output we will get when, for example, we call the filter function on the third line. + +But we can give a user the ability to go through the example line by line, while also showing what each function will do. +And we can also combine this technique with the others we have discussed so far, because this time we have access not only to the static context information that is provided by the compiler, but also to the dynamic state at runtime. +We can display data like local variables or function call results, and, as a user steps through the code, it becomes really easy to see how the state changes with each function call. +With asynchronous code, this approach can really make a difference in understanding how it works. + +If we treat each step as its own function call, we can do an interesting thing here. +We can record the state at each step and move it forwards and backwards in time. +Since we are talking mainly about short snippets of code and examples, instead of large codebases, it's possible to give the user a kind of slider to go back and forth to really see the execution flow, or they can just jump straight to a point in time that is interesting to them, just by clicking on the slider.
+And, again, this is not some sort of magic, because even if we don't have immediate access to the WebAssembly execution engine in the browser, and we don't have fine-grained control over the execution, we can transform the code during the compilation step, and we can do that even without modifying the Rust compiler. + +This transformation technique is actually pretty well known, and even the Rust compiler itself uses it for asynchronous code. +It works by breaking each block down into an individual function that represents a single execution step. + +This is known as a continuation, and it means that we can continue the execution of a function from the point where we left it. +Rust has an unstable feature called generators, which is used to implement the async/await syntax. +It works almost exactly as you see on the slide: we have a struct that holds the state and local variables, and each block is transformed into a separate function. + +So, when you want to execute this snippet, all you have to do is call these functions one by one, and the state changes. +These functions can be called from the browser, and we are very flexible in choosing in what order we call them, and what kind of state we record. + +So far, we have discussed some implementation details for these ideas, but, more broadly, how do we actually make this work? And how do we make it accessible to everyone? There are several problems to solve here, and not all of them are technical. +First of all, the Rust compiler itself cannot be built as a WebAssembly module yet, so this requires us to have a compiler service that will build arbitrary Rust code into WebAssembly for everyone.
+We already have something like that in the Rust Playground, so it's not that big of a deal. I tried implementing this idea in production for a systems programming course, and, surprisingly, this infrastructure is not really hard to provide, and it's not expensive, because a single CPU-optimised virtual machine can easily handle hundreds of thousands of compilation requests per day. But, still, we need to make sure that this infrastructure is easily available, and it should be possible to deploy this compilation server locally and privately. + +There is another problem that we have discussed briefly. +If we start to include dependencies in our small code snippets, the compilation will take a large amount of time and resources, and the resulting module will have a huge size, easily taking up several megabytes, making it harder to scale. +The obvious solution here is to compile these dependencies as separate WebAssembly modules instead, and link them in when we need them, but this is made worse by the fact that there is no dynamic linking standard for WebAssembly. + +So you're basically left with the only choice of statically compiling the dependencies in. +But, technically, linking is still possible, even though there is no standard way of doing it. +It's possible to make two WebAssembly modules work together. +Each module consists of functions that it exports, and those would be our Rust functions, and it also has functions that are declared as imports. These imported functions are provided by the caller, and usually they're used for calling JavaScript functions from the Rust code; this is what makes it possible for Rust programs to interact, for example, with the DOM and browser functions. + +We can use this trick.
+When Rust module A calls some imported function, what we are going to do is call an exported function from Rust module B. This works, but there is another problem with it. +Each module has its own separate memory, and this memory is not shared between the modules. This means that if an imported Rust function tries to access its state when it is executed, it will fail, because its memory region does not contain the expected data. + +What we need to do is basically copy memory from module A to module B before we can call an imported function. +The main disadvantage of this approach is, of course, that it is not standard, and currently it requires manual implementation. +Ideally, the Rust compiler should take care of this for us, but, for now, to make it work in the general case, we will need to find a way to automatically link Rust dependencies which are compiled as WebAssembly modules. + +Now that we have covered all this, what is the conclusion? I think that it is well worth the effort to make our documentation interactive, because it will help us bring more people into Rust. Part of the reason why JavaScript is so popular is that it is so easy to use and access. +You don't need any complicated set-up. + +All you have to do is open a console in your web browser, and you can start writing and experimenting with code. +With WebAssembly, we have a really good chance of bringing this simplicity into the Rust world. +But we still have a long way ahead of us, because the WebAssembly ecosystem has not matured yet. +But, still, we can try our best to make it as easy as possible to add these sorts of interactive elements to any project, because we have tools like Rustdoc which can generate documentation from the source code. +What we need to do is improve Rustdoc to automatically generate playgrounds for our documentation, and we also need a toolkit to simplify building these interactive playgrounds.
+The good news is that we don't have to start from scratch. +The most crucial part is, actually, the compiling of the code into WebAssembly, and the Rust compiler has got us covered there. + +We just need to build the tooling around it. +I have started a project that contains the Docker image for the compiler service and some front-end components. +You can find the repository at the address you see on the slide, so, if you're interested, you can join the development effort, and help to make this a reality. +And that is all from me, and thanks for your attention, and I will be answering your questions in the chat room. + +Thank you! diff --git a/2020-global/talks/02_UTC/01-Nikita-Baksalyar.txt b/2020-global/talks/02_UTC/01-Nikita-Baksalyar.txt deleted file mode 100644 index b9b5e2a..0000000 --- a/2020-global/talks/02_UTC/01-Nikita-Baksalyar.txt +++ /dev/null @@ -1,88 +0,0 @@ -Welcome, and Learnable Programming with Rust - Nikita Baksalyar -[Music] -> Hello! -> Good morning, everyone. -> Hello, good morning. -> So how did you all like part one? I mean, I saw there was a lot of activity on the chat, so I'm guessing people stayed up for that, and, of course, APAC enjoyed that. -> Some people tried the snake game. You managed to crash it after 2.5 hours! -> I mean, that is a lot of dedication, though! -> Some lasted 2.5 hours. -> I'm pretty proud, and it was Rust, so, hey, I crashed here. It's fixed again, so, please, crash it again! [Laughter]. -> Not the snakes, just the game. -> Just the game. -> All right, so we have Stefan: would you like to do a bit of an intro? We have some stuff to go through before we get started with the talks. Got to get the preparatory caffeine in first. -> Ready! All right! Here we go. -> That's just me! -> What's up! So, welcome. So this will be fun, because there are three of us, so we will keep talking all over each other. Welcome to the second block of RustFest. Welcome to the UTC, or Europe, or euro-African time zones. 
What is RustFest all about? It started out as the community conference in Berlin, went to Kiev, to Zürich, to - I'm getting this wrong - Paris, then Rome, then Barcelona, and now we are in the Netherlands somewhat. -> Kind of! -> Or I am! -> Jeske presenting! -> Yes, so, it's about meeting people, connecting with people. That's why we have a huge amount of chat rooms. I don't know if you've seen that, but if you're watching us on RustFest Global, there is a house icon, and then on the right side, it opens a box, and the chat, and you can enter each - for each talk, there is a chat room that you can enter. Click on it, and then Ferrous, our trustee admin bot will invite you to the room, and you can ask questions there. There are also moderators. -> Yes, there are moderators, and we also have - I mean, for this round, we have life scope of noting. If you go - *we have live sketch noting. You can follow Malwine doing live sketch notes which is super cool - not just presenting it at the end, but as far as I understand, you get to see stroke by stroke, and like the erasing and everything. I know I'm going to be watching that. I'm going to be watching that when I'm not presenting someone! -> Yes. -> You can see yourself being drawn! -> Hmm. -> That's pretty cool. -> So what - there are also buttons on the bottom. One is the snake game, the other is the sponsor, and there is a little "e" with a slash which has the live transcriptions, which I hope people that cannot hear me find on their own. -> I think it will be announced on the chat anyway, and it's been shared, so, hopefully, the people that need it can find it. I actually really prefer using - you know, it's a global conference, not everyone is a native speaker, so I find it so cool that we have this, because not everyone can understand accents, or - you know? It's so helpful to have. Thank you, White Coat for joining us! -> Thank you. All right. We talked about this, multiple communities. 
If you allow me this political side note, thanks to certain countries and their politics, Europe has benefited from that, since we have speakers from all continents from the very start, which is great for us. And now, it's live. All over the planet! So, RustFest isn't just for super high-tech people that work on the compiler, it's for everybody, actually. The next talk, which Jeske will present, is very beginner-friendly, in my opinion. You were going to say something afterwards? -> From the three of us and the most senior level, so very suitable, I would say, it's a good start to the early morning here in Europe. -> I mean, absolutely, and I think that's part of the reason why this is such a - like even the technology might be complex when you go deeper. The community is so wonderful. I didn't know anything about Rust when I first attended RustFest in Paris, and, like, oh, such a welcoming, amazing community, and it's so sad that we can't all be together to celebrate and hug, and just have an amazing time together, but we are all here, so, you know, definitely join the chat, tweet out stuff. And let us know how you're enjoying it. I don't know, take a selfie of you watching it at home in your PJs! -> I think the screens are our Twitter handle, so if you have any questions to us, just tag us, mention us, and we will respond. -> Or ask @RustFest and then the whole team can respond. -> If you have something super cool to share with everyone, they might retweet it too. -> Yes, 21 talks, all confirmed. 24 speakers, 12 artists, three teams. I think we are in eight time zones right now, so, ... - -> Thank you, day light saving time! -> Yes. Vote for normal time if you get the chance too! So these all the artists. In our block, it's Earth to Abigail after, and Aesthr. -> DJ Dibs. We are super lucky. This is so cool. I'm so excited about this, especially because it's global, we get to have artists from around the world, and that it's much easier for everyone to join. 
Not everyone has to drag around their whole kit. You've seen when artists attend conferences, that they have to bring everything with them, they get stopped at the airport because of the electronics, and everything. Now we just all get to share, and enjoy, so that is super cool. I'm really excited about this. I hope you all really enjoy it. This is appropriate. There are some/ing lights. -> Artists have lights that might be triggering for photosensitive people. If you are not comfortable, if it is unsafe for you, it's better to opt out. We just want you all to be okay. You could also listen in, because there will be musical performances too. -> Like minimise the tab, or something. -> Stay away from the blinky stuff! -> Pilar already said it would be that we have a wonderful community, and part of that is because we have a code of conduct, and, lucky us, we have hardly ever had to enforce it, because people think about it, and they say, "Okay, I can agree with that." It's basically be very kind, and assume the best of people. Especially keep in mind that most people like us have English as their second, third, or fourth language, so - -> For sure, be considerate. Because this is a global edition, you might be impacting people from a different culture, upbringing, different understanding of the world. -> Completely different sleep level! -> [Laughter]. That's true! If you've been watching from the beginning, oh, boy! I'm going to be extra kind! [Laughter]. We urge you please to read the code of conduct. If there are any misunderstandings, if you need any help, please reach out to us. Us three will be probably a bit busy, but for sure, the team are out on the chat. Please reach out. We're there for you. -> We don't have any - like when you scroll down in the chat system, and you can see the administer or moderator. If you text Ferrous, you may not get an answer! 
-> I think it would be good to read it out so we can have it in our transcription because these slides cannot be screen read, as far as I know. -> Would you mind? -> So our code of conduct: RustFest is dedicated to provide a harassment-free conference experience for everyone. Harassment includes, amongst other things, offensive comments related to gender, gender identity, and expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion, technology choices, deliberate intimidation, stalking, and unwelcome sexual attention. Thank you all for listening to that. Hopefully, you've read it on the code of conduct, and you're familiarised with it. Yes, be kind to each other. Please be respectful. We are all here to have a good time. -> Thank you. -> I like that we have these so ingrained that we keep on going over things that we've already said! -> Wonderful. So, we go ahead. The schedule is online. On a side note, if your browser detects the wrong time zone, you can change that on the top of the page. But, on certain browsers, this dropdown menu takes quite some time to generate, so if you click and nothing happens, just wait, let's say up to 15 seconds, and then you can select your time zone, and then it will recalculate all the times to your local time zone so you don't have to do the math in your head. -> The chat that we've been talking about over and over Chen, rustch.at, which is funny if you're in the German area, but it's really fun. -> It's matrix-based, and most people have been able to log on. Otherwise drop us an email or tweet. -> So this is the - I have to drink something! -> Hydrate! So this is the APAC team who just signed off. We're so grateful to each and every one of them. You know, they've been pulling all of their weight, and more, for this edition, and they're hopefully taking it easy now. If not, you know, indulging in watching a little bit of the UTC timeline. Stefan, did you want to say something? 
-> No, no. Same. Thank you very much! -> Big applause to them this they did a lot of hard work. -> So this is us. -> This is us. -> Maybe we can say also, because I think we already said, that we're all on the same time zone, for sure, but Pilar, I'm in Amsterdam, so whereabouts are you? -> You can see I'm the stand-out name in the team, originally from Chile, but I'm based near Vienna. I live in - if you know Lord of the Rings, I live in the shire, in the middle of nowhere in Austria. -> You have a lot of dogs there? -> They might make an appearance every now and again. The mail is coming. So I've banned them from the streaming room! But if they behave ... maybe they can pop on later and say hi. What about you, Stefan? -> I'm currently in Switzerland, in Zürich. It's quite nice here. We have fibre optic cable to the house, so -. -> The luck of some of us. -> The luck of some of us. The internet is really fast and close here. -> That is also the reason why you chose this place, right? -> Yes, yes! Well, not just ... -> If you're going to be locked down somewhere ... -> Better have 6.5 gigabits per second. -> I think you also have to think a lot of the other members here, thank the other members here, Alexander, Jan-Erik, Flaki, and everybody else, they've all been real troupers here. -> Jan-Erik has not slept yet! -> Won't. You see him in varying levels of drowsiness. Naah, he's a superstar! -> The European time zone, that is the three of us, so we are looking forward to it. [Alarm sounds]. -> We should have finished-by-now-time. The upcoming is from Latin America. I think they're mostly on the east coast, right? -> That time zone? -> I'm not that sure. I mean, let's leave it up to them, and they will introduce themselves! -> Curious already! -> So, huge thanks to our sponsors. Thank you very much. Thanks to Coil, Embark, Parity, Mux - Mux is also part of the streaming infrastructure that you're seeing in this one. 
Thanks to Mozilla, Centricular, OpenSuse, Mullvad VPN, OBB, the Rail Company - Redis, TerminusDB, Nervos, Truelayer, Technolution, Iomentum, and Ferrous Systems from Berlin which you might know from the embedded Rust conference, Oxidize. Thank you very much! Now, enjoy and have fun. -> We've taken up enough time. It was lovely to get to set the stage for you all. You will see all three of us throughout. -> If you allow me, there may be T-shirts on the horizon! [Laughter]. -> We're very jealous of his T-shirt. -> I turned on my purple light. You all are way too ahead of me. Purple T-shirt ... the green screen. I just can't compete! We will leave it up to you. -> I think, indeed, as the process will follow, like every presenter will have a short intro, and then we will have a ... and then we will drive into the talk. The first talk is going to be Nikita Baksalyar from Scotland, push Rust developer tool kits forward. I'm especially into this, because as we already mentioned, I'm a beginner of Rust, so this is going to be a talk that is really good for that. So it is system engineering and decentralised systems. If we can start, I will speak to you afterwards. -> Technical problems. You know it! -> Nikita makes Rust interactive, so if learning it is your directive, you won't need to fight to see what is inside to become a bugging detective. -> Hi, I'm Nikita, and today I'm going to share with you a different approach to writing documentation and educating people about programming in Rust. Let's start by dephenotyping the problem. What exactly is learnable programming? It is defined by Brett Victor in essay of the same name, as a set of design principles that helps programmers to understand and see how their programs execute. And some of the defining characteristics are of this idea are seeing the state of a running program, and giving the programmer tools to easily play with their code. 
These things become valuable and important when applied to the problems of education and writing documentation, because they can help to lower the entry barriers for people who are new to programming, and, today, I'm going to show you how we can apply these ideas to Rust and systems programming in particular.
So, let's see how it looks in action.

Let's imagine you're someone new to asynchronous programming in Rust and you want to learn more about it.
So you go to the Tokio website, and you see they have really nice tutorials.
Before you can start experimenting with asynchronous code, you need to follow a number of steps because they have prerequisites.
You need to set up a new Rust project and then add dependencies through Cargo.
Then you can take a look at the code provided in this example.
This code connects to a Redis-like server, sets the key "hello" to the value "world", then gets the value for the same key and verifies that it has the expected value.
Before you can run this example, you need to make sure that mini-Redis is running on your machine.

Why don't we try including all the necessary dependencies right there on the same page, so that a user can immediately run this code and see the results?
This is how it might look.
I go to the same example and I click on the Run button.
The code is compiled, and, on the right side, I can see that the key "hello" has been set to the value "world".
This gives the user immediate context and feedback on what happens with their code, and the state of the mini-Redis server can be observed right there in the browser.

But we can take this idea a little bit further and make this code example editable, so I can change the value of the key and run this code again, and I can see that the state has also changed.
Or, I can add a new key-value pair, "hey, RustFest", and you can see there is a new key now.
This approach requires minimal set-up, so I can start exploring the Tokio API straightaway, and I believe this is something we should do to make Rust more accessible to more people.
Of course, setting up a new project is also an important skill, but the first impression, showing that this whole thing is not really that complicated, is also important, I think.

And, well, it's not some kind of magic, because we can already make it happen.
We are able to run Rust code in the browser by using WebAssembly, and the platform support is constantly being expanded and improved in the compiler, so we can make it work quite easily.
When you click on the Run button, the code that you have in the editor is sent to the server, where it is compiled by the Rust compiler, and a WebAssembly module is produced as a result.
This module is then executed in your browser, and you can see the output.

This approach can be expanded way beyond Tokio or any other single library, because we can use it to enhance the documentation that is produced by Rust Doc automatically, turning examples into interactive playgrounds that you can execute in your web browser.
And it already kind of works like that with the standard library documentation, because if you go, for example, to the page that documents ... we can click on the Run button in any of the examples and you will be directed to the Rust Playground, where we can see the output.
Of course, you can also play with this code there, but what if we make it simpler by running examples in the browser on the same page, and showing the output beside the code?
This will make it easier for people to just immediately see the output without switching context.

But there is a problem if we go beyond the standard library.
The problem is: how do we deal with dependencies, and especially those dependencies which won't compile?
The thing is, we don't really have access to dependencies on the Rust Playground either, because the Rust Playground has only a limited set of dependencies available to us; it's an isolated environment.
And that's an expected thing, because the Rust Playground executes the code on their server, and they want to make sure that the code is not malicious, so they limit things like input and output and loading external dependencies.
On the other hand, this makes it harder, and practically impossible, for us to run examples from our own codebases, or even from public crates, and it makes it harder for us to mentor and educate people through examples.

WebAssembly does not have this problem.
The sandboxed environment is not someone else's machine or server, but your own web browser, and this sandbox is secure by definition.
But the main problem remains: not all Rust code compiles to WebAssembly yet.
Even if it doesn't, we can argue that this is a good thing, because if you want to add this kind of interactivity to your project, it will also incentivise you to make your code more portable, and make it compilable for a new target.

When you write tests for code that, for example, sends network requests, usually what you want to do is mock it, meaning your code won't really send real network requests, but instead it will pretend to do so.
That way, you can test the behaviour of your code in isolation from the external environment, and that's exactly the same thing that we want to do with our interactive documentation too, because we want to give the user the option to run their code independently of the external environment.
The main thing to look out for here is the API surface, because we want to make sure that the mock functions that we call have the exact same interface as the real code.
You can do this manually, by using feature flags, for example, or you can use one of the many Rust libraries for automatic mocking, but the main idea remains: you want to provide the same interface, both in your real code and in the mocked version.

And this idea of running Playground code in the browser can go a lot further, because we have lots of techniques for visualising data.
For example, on this slide, you can see some code from the documentation of an HTTP client library called reqwest.
It sends a GET request and outputs the response.
This example demonstrates only the abstract idea of sending an HTTP request, but how do we know how it actually works? And what kind of request does it actually send?
In this case, we can output the raw HTTP request, and it is really helpful to see what kind of request gets sent when you run this code, because it can help you learn more about HTTP, and, more than that, it can also help you debug problems, because with an interactive Playground like this, you can replace the example with your own code and observe the changes in its behaviour.

And the good news is that a lot of Rust libraries already do something like that with tracing and logging, but to use that, you also have to choose a logging library and enable it just to get the log output.
With a WebAssembly-enabled Playground, we just need to provide our own logging library that will redirect output from the logs to the user's browser.

And the user's code can also be seen as data that we can visualise and inspect.
Some of the components of the Rust compiler are already available as libraries, and there are even third-party crates that can be used for parsing Rust code.
Almost all of them can be built for WebAssembly, so we can parse the code and extract information that we can use to show hints to our user, even before they execute their code.
This can make our examples even clearer, because these hints can be shown as a user types code, and they can provide contextual information, almost like in an IDE.
On this slide, you can see the reqwest documentation again; it encodes parameters as HTTP form data, and, as you look at this code in the editor, you can see what result this or that function returns, shown as inline hints, because the compiler provides us with information about the function names and types that are used there, and all kinds of expressions.
And it is really straightforward to work with code as data, because all expressions are actually variants of one large enum, so you can use pattern matching to work with this enum as you normally would in Rust.

There is one more thing that we can do here, and this is highlighting the code execution flow.
It can be compared to how you work with a debugger, which can go through a program step by step, and, while it goes through the program in this fashion, it can also show the internal state of the program at each particular point in time.
We can do the same thing for our interactive Playgrounds, because it can really help in understanding cases like, for example, ...
On this slide we can see an example from the Rust standard library documentation that constructs iterators in a functional style, and while it makes some sense, it's hard to see what kind of output we will get when, for example, we call the filter function on the third line.
But we can give a user the ability to go through the example line by line, while also showing what each function will do.
And we can also combine this technique with the others we have discussed so far, because this time we have access not only to the static context information that is provided by the compiler, but also to the dynamic state at runtime.
We can display data like local variables or function call results, and, as a user steps through the code, it becomes really easy to see how the state changes with each function call.
With asynchronous code, this approach can really make a difference in understanding how it works.

If we treat each step as its own function call, we can do an interesting thing here.
We can record the state at each step and move it forwards and backwards in time.
Since we are talking mainly about short snippets of code and examples, instead of large codebases, it's possible to give a user a kind of slider to go back and forth to really see the execution flow, or they can just jump straight to a point in time that is interesting to them, just by clicking on the slider.

And, again, this is not some sort of magic, because even if we don't have immediate access to the WebAssembly execution engine in the browser, and we don't have fine-grained control over the execution, we can transform the code during the compilation step, and we can do that even without modifying or touching the Rust compiler.
This transformation technique is actually pretty well known, and even the Rust compiler itself uses it for asynchronous code.
It works by basically breaking down each block into an individual function that represents a single execution step.
This is known as a continuation, and it means that we can continue the execution of a function from the point that we left it at.
Rust has an unstable feature called generators, and this is used to implement the async/await syntax.
It works almost exactly as you see it on the slide: we have a struct that holds the execution state and local variables, and each block is transformed into a separate function.
So, when you want to execute this snippet, all you have to do is call these functions one by one, and the state changes.
These functions can be called from the browser, and we are very flexible in choosing in what order we call them and what kind of state we record.

So far, we have discussed some implementation details for these ideas, but how do we actually make it work? And how do we make it accessible to everyone?
There are several problems to solve here, and not all of them are technical.
First of all, the Rust compiler itself cannot be built as a WebAssembly module yet, so this requires us to have a compiler service that will build arbitrary Rust code into WebAssembly for everyone.
We already have something like that with the Rust Playground, so it's not that big of a deal.
I tried implementing this idea in production for a systems programming course, and, surprisingly, this infrastructure is not really hard to provide, and it's not expensive, because a single CPU-optimised virtual machine can easily handle hundreds of thousands of compilation requests per day.
But, still, we need to make sure that this infrastructure is easily available, and it should be possible to deploy this compilation server locally and privately.

There is another problem that we have discussed briefly.
If we start to include dependencies in our small code snippets, the compilation will take a large amount of time and resources, and the resulting module will have a huge size, easily taking up several megabytes, making it harder to scale.
The obvious solution here is to compile these dependencies as separate WebAssembly modules instead, and link the dependencies when we need them, but this problem is made worse by the fact that there is no dynamic linking standard for WebAssembly.
So you're basically left with the only choice of statically compiling the dependencies.
But, technically, linking is still possible, even though there is no standard way of doing it.
It's possible to make two WebAssembly modules work together.
Each module has functions that it exports, and those would be our Rust functions, and it also has functions that are declared as imported.
These imported functions are provided by the caller, and usually they're used for calling JavaScript functions from the Rust code; this is what makes it possible for Rust programs to interact, for example, with the DOM and browser functions.
We can use this trick: when Rust module A calls some imported function, what we are going to do is call an exported function from Rust module B.
This works, but there is another problem with it.
Each module has its own separate memory, and this memory is not shared between the modules, which means that if an imported Rust function tries to access its state when it is executed, it will fail, because its memory region does not contain the expected data.
What we need to do is basically copy memory from module A to module B before we can call an imported function.
The main disadvantage of this approach is, of course, that it is not standard, and currently it requires manual implementation.
Ideally, the Rust compiler should take care of this for us, but, for now, to make it work in the general case, we will need to find a way to automatically link Rust dependencies which are compiled as WebAssembly modules.

Now that we have covered all this, what is the conclusion?
I think that it is well worth the effort to make our documentation interactive, because it will help us to bring more people into Rust.
Part of the reason why JavaScript is so popular is that it is so easy to use and access.
You don't need any complicated set-up.
All you have to do is open a console in your web browser, and you can start writing and experimenting with code.
With WebAssembly, we have a really good chance of bringing this simplicity into the Rust world.
But we still have a long way ahead of us, because the WebAssembly ecosystem has not matured yet.
But still, we can try our best to make it as easy as possible to add these sorts of interactive elements to any project, because we have tools like Rust Doc, which can generate documentation from the source code.
What we need to do is improve Rust Doc to automatically generate playgrounds for our documentation, and we also need a tool kit to simplify building these interactive playgrounds.
The good news is that we don't have to start from scratch.
The most crucial part is, actually, compiling the code into WebAssembly, and the Rust compiler has got us covered there.
We just need to build the tooling around it.
I have started a project that contains the Docker image for the compiler service and some front-end components.
You can find the repository at the address you see on the slide, so, if you're interested, you can join the development effort and help to make this a reality.
And that is all from me. Thanks for your attention, and I will be answering your questions in the chat room. Thank you!

-> Hello, everybody. Thanks for the interesting talk. I liked the idea of interactive documentation with mocking and visualisation with the help of WebAssembly. As Nikita already mentioned in his talk, he's online in the chat, so, if you have any questions, or you want to follow up on some issues, he is active in the chat right now, and I do encourage you to ask any questions over there. Rest assured that the next talk is going to be in five minutes, at 10:50 CST, I would say. Thank you all for listening in to the first talk, and I will see you in the next one. Thank you.
diff --git a/2020-global/talks/02_UTC/02-Aissata-Maiga.md b/2020-global/talks/02_UTC/02-Aissata-Maiga.md
new file mode 100644
index 0000000..e289676
--- /dev/null
+++ b/2020-global/talks/02_UTC/02-Aissata-Maiga.md
@@ -0,0 +1,324 @@
+**Build your own (Rust-y) robot!**
+
+**Bard:**
+Aïssata Maiga lets me know
+how to make bots without Arduino
+writing Rust to move
+her bot to my groove
+Sure there will be some cool stuff to see, no?
+
+**Aïssata:**
+
+Hello, I'm Aïssata Maiga, just your regular computer science student, and I live in Sweden.
+I discovered Rust this summer and fell in love with it.
+
+So, let's just start.
+This presentation will be about making a robot in Rust and working with no_std.
+It is a fun project for you to try for yourself and with children.
+It's also very easy.
+
+The most intimidating part is to get started and order stuff from the internet.
+I will show you the robot, and then explain everything you need to know about every part of the code, and share a lot of mistakes I've made, and then there will be a little surprise.
+I used the avr-hal, and I got a lot of help.
+
+It has great documentation, a lot of templates for how to start your own project, and of course how to set up your Cargo file, but it also has many examples for every supported board.
+For example, I'm using an Uno, and, if you go to "examples", you can see how to blink an LED, which is the "hello, world" of Arduino systems.
+It really works great, and I would recommend it wholeheartedly.
+
+The components first.
+With time, you will notice that all components are standard and pretty much the same, but the easiest way to get started is just to buy a kit.
+Many are available online, on Amazon.
+If you google "smart car", you will see a bunch of suppliers that you can choose from.
+The cheapest start at ten or 15 euros.
+If you want to assemble it yourself, it's the same.
+
+If you just google "assembly instructions smart car Arduino", you will find a lot of good videos, and I link them in my repository.
+A word of caution, and a good opportunity to share my big error number one.
+In most assembly videos, and in the repo, there are schematics to follow.
+You must be careful to follow them, and you must plug everything in as it looks on the image, but the most important thing is to make sure that the circuit is grounded; that means that all ground cables are connected, that the circuit has a common ground.
+If not, bad things will happen, bad things also called "undefined behaviour", so if you're there and nothing is working and you're getting frustrated, just check if everything has a common ground.
+
+Arduino is ideal to begin this kind of project.
+They're relatively affordable, and there are tons of robot-making tutorials for Arduino.
+What is great with the brand is the ecosystem.
+
+All the libraries and the IDE - but we can't use that in Rust, and that is where avr-hal has you covered.
+We're using an eight-bit microcontroller, the one on the Uno and the Nano.
+It has general-purpose input and output pins - those little holes - and all the protocols.
+What do I mean by that? You're not programming the whole board; you are programming the microcontroller.
+
+This one.
+You can see it here on this board.
+There is Arduino and "Arduino", as we can say: when you order your kit, it will come with a board.
+It is a clone, and it works as well as a regular one for this kind of project.
+
+I will now tell you about the Servo motors, but also the timers, which are very important in this kind of project.
+I will of course show you what I mean with an animation at the end of the slide.
+So, the Servo motor is a simple rotating motor, but we do not always need it to go all the way to the end of the rotation, so we can think about it as a light dimmer: we use something called pulse width modulation.
+If you have a romantic night with your partner, you need to control the light, right? This is what we are going to do.
+
+We're going to control the duty cycle.
+Duty cycles are nothing mystical.
+A duty cycle is the fraction of time where the signal is active.
+
+In other words, we're going to tell the microcontroller how long we want the signal to be active.
+Now, all microcontrollers have an internal clock, and here it is 16 megahertz.
+It defines a period of time of one divided by the frequency.
+One divided by 16 million is really fast, about 62.5 nanoseconds.
+We can't work with that.
+
+Even if you multiply by two to the power of eight, which is the size of the timer register, it will overflow in a few microseconds.
+And now is a good time to share big error number two: on the microcontroller, the timers do not all have the same size, and you must make sure that you are doing the calculation with the right timer, or the right size.
+If nothing is working and your calculations are wrong, this might be the cause.
+We are going to reduce the frequency by dividing it by 1,024, and we can then work with a cycle of 16 milliseconds.
+
+Why do we do that? Most Servo motors work with a frequency of 60 Hertz and short duty cycles.
+For example, to centre it, you need to set the signal high for 1.5 milliseconds, so 1.5 divided by 16, times the size of the register, is 24 ticks. So, let's go and look at some code.
+So this is for the Servo motor.
+
+First, the magic numbers that we calculated together.
+To centre it, we define the duty as the time divided by the period, times the size of the register.
+Then we declare a mutable timer and a mutable pin.
+
+You notice that the timer is prescaled with a factor of 1,024, and the pin is d3, and then we enable it.
+So how do we know how to do that? We can go into the documentation.
+You can see here that I just follow the documentation.
+
+Please note that it is very important to choose the pins right, because they're hard wired, actually. So this is my big error number three: timer 2 that I'm using is hard wired to pin d3.
+
+If you're going to use a timer for the Servo, you have to make sure you're using the right pin.
+If we go back to the code here, you see that I just have a mutable delay to make the rotation not too fast, and then we just set the duty to 24 - sorry, to 16 - we wait a bit, and we centre it again, and then we wait a bit, 400 milliseconds, and then we put it to the left, and I'm going to show what that looks like.
+Now, the sensor.
+
+You can think of the sensor as a bat.
+A bat sends sound waves every now and then and waits for them to return, to calculate how far it is from an obstacle.
+This is exactly how it works.
+
+We need to send sound waves about every 100 milliseconds.
+The sensor has a trigger and an echo.
+The trigger turns it on and sends the sound wave.
+
+When an obstacle is met, the sound wave will bounce off it and return as the echo, and we will measure the length of it.
+There are many details, but I will just cover the ones I will show in the code.
+
+We use another timer here, timer 1.
+This one does not need as much prescaling as timer 2 that we used for the Servo, so we will just make it 64 times slower.
+
+So, another magic number that I would like to explain is the 58 that you will see in the code.
+The sound is travelling 340 metres - 34,000 centimetres - in one second, and it will bounce off an obstacle and come back, so we need to calculate with 34,000 divided by 2 centimetres per second.
+Per microsecond, that's going to be 0.017 centimetres, which is the same as one divided by 58.
+
+Also, every tick is four microseconds, so you will see a multiplication by four that looks suspicious.
+You do not need to pay attention to all those details.
+I just explained the magic numbers because I know some of you are interested in them.
+This is the code for the sensor.
+
+So, we are using timer 1, which is 16-bit, and we are prescaling it with a factor of 64 here.
+We declare a mutable trigger that I connected to pin d12 and configured as an output.
+All pins are inputs by default, and that's why you need to configure it as an output. Then you need to declare an echo.
+I connected it to pin d11.
+
+You don't need to configure it as an input, because all pins are inputs by default, and you don't need it to be mutable, because we are just going to monitor how long it is high.
+The lines in the comments are commands to get your console working.
+
+You can get the address of your Arduino, and then, if you type "screen" and then the tty, you can see everything showing on the screen.
+To see things on the screen, you will need the serial: we have a receiver, a sender, and a baud rate.
+This is nothing to worry about.
+This is described in the documentation.
+If you just copy-paste the declaration of the serial, it's going to work.
+
+Then, in an infinite loop, we are going to write zero to the timer, set the trigger high for ten microseconds, and set the trigger low again.
+This is going to send the sound wave.
+
+Then we have to manage an eventual error with the hardware. If we have waited for more than 50,000 ticks, it means that we have waited for more than 200 milliseconds, so this is probably an error, and we need to exit the loop. Since Rust allows us to name loops, we continue with the outer loop.
+If not, if we have detected something, we just write zero to the timer register, and then we monitor how long the echo is high.
+That means we don't do anything while the echo is high.
+
+Then we get the number of ticks in the timer register, divide it by the magic number 58, and multiply it by four, because the unit is four microseconds.
+And then we wait 100 milliseconds between the sound waves; 100 milliseconds corresponds to 25,000 ticks.
+And, at last, we print on the screen how far we are from the target.
+
+Now, I want to show the motor driver.
+The first time you see a motor move because you decided it, you will be hooked.
+
+The Arduino does not have enough power to move the motors, so we connect them to the driver, and connect the logical pins to the Arduino.
+This means that you will be plugging the cables for the wheels in those two.
+Those are to communicate with the battery, and this is a five-volt logical pin that I'm going to use in the demo.
+There is also an enable pin to control the speed, but that will be for Rust-y 2.0.
+
+So now it's time for the code walk-through, and talking a bit about no_std.
+Why do we have to work with no_std, which is Rust without the standard library? On the microcontroller, there is no OS, which means we need to do things ourselves, and to indicate to the compiler that we are going to work with no_std and no_main.
+We also need to build with cargo +nightly to indicate that we are going to use the nightly toolchain.
+
+To get the Cargo configuration, you can again go to the documentation here; everything is explained in 0.5.
+This is what you need for your Cargo file, so we can go back to the code.
+Because we are in no_std, we are going to need to import a panic handler, `panic_halt` here, and those two are the crates I'm importing to make it work.
+
+Those three are modules that I used to separate my code when I was refactoring, because I felt that it would be clearer, and also because I was training with Rust's data structures.
+To make it work, you will need some constants:
+
+How long do you want your bot to wait between actions, the minimum distance you want to have between it and an obstacle, and what is an acceptable distance to make an alternative choice?
+So this macro is an attribute macro; since we are working with no_std, it marks the entry point of the code, and the exclamation mark here is the never type, which means nothing should return from this function.
+
+So we start by downloading everything.
+We download everything we have on the MCU.
+And then we collapse all the pins into a single variable that we are going to use here.
+This is the general timer that has been prescaled by a factor of 64, which we are going to use mostly with the sensor, but also as a general time-checker for the whole project; and then timer 2 and pin d3, which we are going to use for the Servo.
+
+I created a Servo unit struct because I wanted to work with Rust structures.
+You do not need to do that.
+It's going to work fine without it.
+
+Then I connected those logical pins to d7, d5, d6, and d4, and I gave them long variable names to refer to each wheel.
+Then those pins can be downgraded.
+Downgraded means they can be put in a mutable array that we can send to other modules to modify them.
+But `wheels` is still the owner of those pins.
+
+And then there is the infinite loop that is going to control the robot; it is again labelled `'outer: loop`.
+It starts with the Servo unit rotated to the front, and then the wheels are going to move forward.
+We are reading the value with the sensor continuously, but if the value is smaller than the minimal distance that we decided on, then we are going to stop the wheels - I'm going to show a bit later how to stop the wheels - and then check the distance on the right.
+We are going to turn the Servo to the right, get the value here, wait between interactions, and then do the same for the left. The rest is just: if the value is bigger on the left than on the right, and it's an acceptable distance - there is not another obstacle there - then we're going to turn the wheels left and then continue to the outer loop, that is, go forward.
+Else, if the value on the right is better, then we are going to turn right, and then continue to the outer loop.
+Else, we're just going to go backwards and turn right.
+
+Going back to show the module, I think this is the only thing that I didn't show.
+Here we can decide the constant for how long we want our car to turn.
+Moving forward just receives a mutable reference to the wheels.
+This type seems really, really long, but you know how you do with Rust: when you don't know a type, you just declare another type, the compiler will complain and give you the right type, and you can just copy-paste it.
+
+I did some unpacking here.
+I put the wheels into a new array to make sure that I wrote it correctly, and then, to go forward, you just need to set the left and right forward-motion pins high, and the left and right backward-motion pins low.
+To turn right, you need to stop the wheels first. To stop the wheels, you just need to set all the pins low, right? I just removed that from the presentation for clarity.
+And to turn right, you set the left forward wheel high and the right forward wheel low for an amount of time; if you move only the left wheel, the robot is going to turn to the right.
+
+You need to know where to find help.
+Actually, if there is one thing you must get from this talk, it is where to find help.
+The Rust community is very welcoming, and one of their core values is to provide a safe, friendly, and welcoming environment.
+This is a community in which I felt safe and comfortable from day one.
+
+You can ask any question on the community forum.
+Overall, people have been providing me with technical consultancy as well as psychological support since the start of my adventures in Rust.
+When I arrived on Matrix, people realised I hadn't done anything yet and sent me off to do homework, but helped me anyway.
+I want to thank my mentor and avr-hal.
+
+That's it.
+Thank you very much for your attention.
+The whole project is on GitHub, so please don't hesitate to do whatever you want with it, and show me what you did.
+You can ask me any question you like.
+
+That's it, and thank you again, and now it's time for the surprise.
+
+**Lyrics:**
+
+* they see me rollin'.
+* see me riding Rust-y.
+* wanna see me riding Rust-y!
+* thinking cool to ride Rust-y. See you riding Rust-y.
+* wanna see me riding Rust-y.
+* showing, moving, grooving, want to see me riding Rust-y.
+* wanna see me riding Rust-y.
+* now that I'm riding Rust-y.
+* want to see me - want to see me riding Rust-y.
+
+**Pilar:**
+
+That was incredible! It's so good that you all could not see me during the talk, because it was just me grinning from ear to ear and clapping my hands off! I said I was excited about this talk, but, wow!
+
+Thank you so much, Aïssata. That was an incredible talk, and, yes, you mentioned the community at the end, and your love for it, and your being such an integral part of it shows so much, at least to me, because it's that spirit - you just held our hand through all of that, "I messed up here and there" moments included, for anybody who wants to try this out. Thank you so much.
+
+That was really, really great.
+And as a special treat, I mean, besides the amazing ending, Aïssata is here to join us for a live Q&A, so I'm going to add her on now. We had a lot of questions in the chat, so we will try to get through as many of those as we can.
+
+So, hi, you're live with us now.
+
+**Aïssata:**
+Thank you very much. Fantastic.
+
+**Pilar:**
+You had to watch your own talk. I don't know how you could do that.
+Personally, I can't! Don't ask me to! What an amazing talk, really. I was so excited.
+I absolutely called dibs on introducing you and being here for this talk.
+
+**Aïssata:**
+I said something about the sound where I jumped from 340 to 34,000, and it was really weird.
+It's because I was talking in metres, and then centimetres, and I forgot to explain that. And then, oh, I don't even know what to say - I saw in my comments that I wrote something about the "God bat".
+That was embarrassing. Like, you know ... how to be a God bat.
I meant that the sensor works like a bat, and, well, let's just forget that.
+
+**Pilar:**
+I mean, to be fair, I think it was very clear. I know we are multi-cultural, and everyone might not be on the same technical or English-speaking level,
+but I thought the two things you mentioned were fairly clear. Thank you for clearing that up too.
+
+**Aïssata:**
+Speaking about the community - thinking about the good things instead of the bad things - it is true that when I joined the Rust community,
+it was the only time I have ever used my own name and my own picture on the internet. I never do that, because, you know, you're always afraid of mean comments and abuse, but, yes, from day 1, it never happened.
+It never happened, and I've never felt so welcome anywhere else. I think I made my first pull requests after a week.
+
+**Pilar:**
+That's amazing. I know people who have been in the industry for years and years and they're too scared to make PRs, to put their stuff out there, so that is really cool. It's so great that you actually went for it, and that you felt safe and comfortable to do so. That's why I love your mentoring work as well: by mentioning that and sharing so much with us, you're encouraging other people to feel safe, go for it, and also try it, and that is amazing.
+Thank you so much.
+Do you mind if I go into a couple of questions that were on the chat?
+
+**Aïssata:**
+No, please, yes. My child is here!
+
+**Pilar:**
+Don't worry about it. We are all at home and in it together.
+I think a couple of fun ones for you first.
+There was a whole line of questions along the theme of: why robots?
+Is it hard to start off with something that is embedded? And what is your next planned robot?
+I think people are very fascinated by the topic of embedded, and robots, and I saw how excited you are about it.
+
+**Aïssata:**
+Yes, okay, so what I really wanted to convey with this talk was that it is not that hard.
+You have to have help, and, if you have the right help, it's not that difficult. The most difficult part is really: how do I start? This is what I wanted to show in my talk: how do I get started?
+I'm pretty sure you have your own ideas and objectives, your own crazy stuff that you want to implement - talking robots, I don't know.
+So I just wanted to show the components that I'm using and how they work, so we can put them together. I don't know, maybe you want to have a fridge that comments if you open it at night.
+This is something you can do with this stuff. But, when ...
+
+**Pilar:**
+I like how you mentioned where to get things and how to spot fakes. Watching your talk, I want to do something too. I want to buy components.
+
+**Aïssata:**
+It's easy, and it's really cool and fun. Please go for it. You have to show me. I want to see.
+
+**Pilar:**
+Thank you for that. Any future plans for more robots?
+
+**Aïssata:**
+I've brought my robot skull. As you can see, it's not done; it can't be closed yet.
+But this is also something you can do with whatever small board you want, and a small sensor, and then it reacts to sound, but also to shock.
+You have to turn the battery on first, or the demo is not going to work. And you can put some LEDs into it, and program the thing with for loops.
+
+**Pilar:**
+That is so cool. Thank you for bringing it! Wow! That's amazing! I hope people are tuning into the Q&A to get to see this! That's so cool! Wow. Thank you.
+
+**Aïssata:**
+Thank you, too. I'm all sweaty, and happy!
+
+**Pilar:**
+That's just part of being in the community, and part of getting to partake in this. I wanted to jump on to more technical questions, but I think we actually have to go to the next talk. But you showed us the next project, which is super cool.
Thanks so much for joining us. Thank you for everything. + +**Aïssata:** +Thank you, everyone! + +**Pilar:** +It's been amazing. Thank you for being here. We will see you in the chat, right? + +**Aïssata:** +Yes, I will stay here. We will be in the chat. + +**Pilar:** +We will see you in the chat. Exactly. Oh, darn, technical thing. I'm so sorry! [Laughter]. + +**Aïssata:** +It's okay. + +**Pilar:** +See you, then. + +**Aïssata:** +See you. diff --git a/2020-global/talks/02_UTC/02-Aissata-Maiga.txt b/2020-global/talks/02_UTC/02-Aissata-Maiga.txt deleted file mode 100644 index 3689f92..0000000 --- a/2020-global/talks/02_UTC/02-Aissata-Maiga.txt +++ /dev/null @@ -1,38 +0,0 @@ -Build your own (Rust-y) robot - Aissata Maiga -PILAR: Hello, everyone. So, I hope you - wait, I'm going to remove this off my face. That's the talk that's coming up. I hope you all enjoyed that first talk. I thought it was a really great way to set the day. It was a good way to start the mood and get us excited for what is coming up. So, up next, as you may have seen in the chat, on the schedule, and in our little announcement bubble over here - I'm not very used to that! - is Aïssata Maiga. She is giving a talk which I'm so, so excited about. Excuse my robotic delivery! Aïssata is a computer science student at the Royal Institute of Technology, and she, like, if you've seen her activity, she's just absolutely passionate about getting people excited about code, which is something that just it is we are on the same wavelength there. She mentors a programming club for women, and it's so cool because it's for mothers and daughters, and you will be dazzled by her talk. I'm going to let her get down to it. I'm also going to bring our bard to share more about Aïssata's talk. I will see you after the talk for Q&A, and I hope you're enjoying. -> Aïssata Maiga lets me know how to make bots without Arduino, making Rust to move to the groove, sure, there will be some cool stuff to see, no? 
-AÏSSATA: Hello, I'm Aïssata Maiga, just your regular computer science student and I live in Sweden. I discovered Rust this summer and fell in love with it. So, let's just start. This presentation will be about making a robot in Rust and working with no std. It is a fun project for you to try for yourself and children. It's also very easy. The most intimidating part is to get started and order stuff from the internet. I will show you the robot, and then explain everything you need to know about every part of the code, and share a lot of mistakes I've made, and then there will be a little surprise. I used the avr-hal, and I got a lot of help. It has great documentation, a lot of templates, how to start your own project, and of course how to use your cargo file and basic templates, but it also has many examples for every avenue of work. For example, I'm using uno, and, if you go to "examples", you can see how to blink an LED, which is the "hello, world" of Arduino systems. It really works great, and I would recommend it heartfully. The components first. With time, you will notice that all components are standard and pretty much the same, but the easiest way to get started is just to buy a kit. Many are available online, Amazon. If you Google "smart car", you will see a bunch of suppliers that you can choose. The cheapest start at ten or 15 euros. If you want to assemble, it's the same. If you just Google "assembly instructions smart car Arduino", you will have a lot of good videos, and I link them in my repository. A word of caution, and a good opportunity to share my big error number one. In most assembly videos and on the rep on. There are schematics that follow. You must be careful to follow them and you must plug them as they look on the image, but the most important is to make sure that the circuit is grounded, that means that all ground cables are connected, that the circuit has a common ground. 
If not, bad things will happen, and bad things also called "undefined behaviour", so if you're here and nothing is working and you're getting frustrated, just check if everything has a common ground. Arduino is ideal to begin to kind of project. They're relatively affordable, and there are tons of robot-making with Arduino. What is great with the brand is the ecosystem. All the libraries and the IDE, but we can't use that in Rust and that is where avr-hal has you covered. We're using an eight-bit microcontroller. One of the Uno and the Nano is to make it - it has a general-purpose input and output pins - those little holes, and all the protocols. What do I mean by that? Is you're not programming all the board, but you are programming the microcontroller. This one. You can see here on this board. There is Arduino and Arduino, as we can say, when you order your kit, it will come with a board. It is a clone, and it works as well as a regular to show for this kind of project. I will now tell you about the Servo motors, but also the timers which are very important in this kind of project. I will of course show you what I mean with an animation at the end of the slide. So the Servo motor is a simple rotating motor, but we do not need it to go all the way to the end of the rotation, so we can think about it as a light dimmer, we use something called pulse modulation. If you have a romantic night with your partner, you need to control the light, right? This is what we are going to do. We're going to control the duty cycle. They're nothing mystical. It's the fraction of time where the signal is active. In other words, we're going to tell the microcontroller how long do we want the signal to be active? Now, all microcontrollers have an internal clock and it is 16 megahertz. It defines a period of time of one divided by frequency. So, one divided by 16 million is really fast, 16 nanoseconds. We can't work with that. 
Even if you multiply two of the power of eight which is the size of the timer register, it will overflow in a few milliseconds. And now is a good time to share a big error number two, on the micro control, the timers do not all have the same size, you must make sure that you are doing the calculation with the right timer, or the right size. If nothing working and your calculations are wrong, it might be the cause. We are going to reduce the frequency by dividing it by 1,024, and we can then work with a cycle of 16 milliseconds. Why do we do that? Most Servo motors have a frequency of 60 Hertz in short duty cycles. For example, to centre it, you need to set it high for 1.5 milliseconds, so 1.5 divided by 16, times the size of the register is 24 ticks, so, let's go and look at some code. So this is for the Servo motor. First, the magic numbers. First, the magic numbers that we calculated together. To centre it, we define the time times the register. Then we declare immutable timer, immutable pin. You notice that it is prescaled with a factor of 1,024, and the pin is d3 and then we enable it. So how do we know how to do that? We can go into the documentation. You can see here that I just follow the documentation. Please note that it is very important to choose them right, the pins, because they're hard wired, actually, so this is my big error number three: so timer two that I'm using it hard wired to win d3. If you're going to use a timer for the Servo, you have to make sure you're using the right pin. If we go back to the code, here, you see that I just have a mutable delay to make the rotation not too fast, and then we just set the duty to 24 - sorry, to 16, we wait a bit, and we centre it again, and then we wait a bit, 400 milliseconds, and then we put it to the left, and I'm going to show how it looks like. Now the sensor. So you can think of the sensor as a bat. 
A bat sends sound waves every now and then and waits for them to return to calculate how far is it from an obstacle. This is exactly how it works. We need to send sound waves about every 100 milliseconds. The sensor has a trigger and an echo. The trigger turns it on, and senses the sound wave. When the obstacle is met, the sound wave will bounce on it and return as the echo, and we will measure the length of it. There are many details, but I will just cover the ones I will show in the code. We would use a timer, and another timer, timer 1. This one does not need as much prescaling as timer 2 that we use for the Servo, so we will just make it 64 times slower. So, another magic number that I would like to explain is 58 that you will see in the code. When the sound is travelling 340 metres in one second, it will bounce on an obstacle and come back, so we need to calculate 34,000 divided by 2 in milliseconds. So, it's going to be 0.017 which is the same as one divided by 58. Also, every tick is four milliseconds, so you will see a multiplication by four that is suspicious. You do not need to pay attention to all those details. I just explained the magic numbers because I know some of you are interested in them. This is the code for the sensor. So we are using timer 1 which is 16 bit, and we are prescaling it with a factor of 64 here. We declare mutable trigger that I connected to pin d12 and configured into output. All pins are input by default, and that's why you need to configure it to an output, and then you need to declare an echo. I collected it to pin d11. You don't need to configure it into input, because they're all input by default, and then you don't need to have it mutable, because we are just going to monitor how long it is high. Those in the comments are commands to get your console working. So you can get the address of your Arduino, and then, if you type "screen", then the tty, and you can see how everything is showing on the screen. 
To see things on the screen, you will need the serial, so, we've a receiver, a sender, and a because rate, so this is - *and a baud rate. This is nothing to worry about. This is described in the documentation. If you just copy-paste the declaration of a serial, it's going to work. And then, in an infinite loop, we are going to write zero to the timer, set the trigger high for ten microseconds and set the trigger low again. This is going to send the sound wave. Then we have to manage an eventual error with the hardware, so, if we have waited for more than 50,000 ticks, it means that we have waited for more than 200 milliseconds, so this is probably an error, so we need to use, to exit the loop, since Rust is allowing us to name loops, we continue to work, we continue with the outer loop. If not, if we have detected something, we just write zero to the timer register, and then we monitor how long the echo is high. It means that we don't do anything while the echo is high. And then we get the number of ticks in the timer register divided by the magic number 58, and mull apply it by four, because the unit is four milliseconds. And then we wait 100 milliseconds between the sound waves, so 100 milliseconds is corresponding to 25,000 ticks. And, at last, we print on the screen how far we are from the target. Now, I want to show the motor driver. The first time you will see a motor move, because, you decided it, you will be hooked. Arduino does not have enough power to move the motors, so we connect it to the driver. And connect the logical pins to the Arduino. Which means that you will be plugging the cable for the wheels in those two. Those are to communicate with the battery, and this is a five-volt logical pin that I'm going to use in the demo. There is also an enable pin to control the speed, but this will be for Rust-y 2.0. So now it's time for the walk-through, and talking a bit about no-std. Why do we have to work with no-std, which is Rust non-standard code? 
On the, there is no OS, which means we need to do it ourselves, and to indicate to the compiler that we are going to work with a no-std and no name. We also need to build with cargo the nightly build to indicate that we are going to use Nightly. To get the cargo from the configuration, you can again go to the documentation here, and everything is explained in 0.5. This is what you need for your cargo file, so we can go back to the code. So, we are going to, because we are in no-std, we are going to need to import a panic handler, and panic_halt here, and those two are the crates I'm importing to make it work. Those three crates are modules that I used to separate my code when I was refactoring, because I felt that it would be more clear, and also because I was training with Rust's data structures. To make it work, you will need some constants. How long do you want euro bot to wait between actions, the mini distance you want to have between itself and an obstacle, and what is an acceptable distance to make an alternative choice? So this macro is an attribute macro, since we are working no-std, we have to assume that is the point of entry of the code, and the exclamation mark here is never type, which means nothing should return from this function. So we start by downloading everything. We download everything we have on the MCU. And then we collapse all the pins into a single variable that we are going to use here. This is the general timer that has been prescaled by a factor of 64 that we are going to use mostly with the sensor, but also as a general time-checker for the whole project, and then the timer2, and it is pin d3 that we are going to use for the cell. I created the Servo unit which was to work with Rust structures. You do not need to do that. It's going to work fine. But then I connected those logical pins to d7, d5, d6, and d4, and that I have them long variable names to refer to each wheel. Then those pins can be downgraded. 
Downgraded means they can be put in a mutable array that we can send to other modules to modify them. But, wheels is still the owner of those wheels. And then the infinite loop that is going to control the robot, it is still called outur: loop. It starts with the Servo unit that is rotated to the front, and then the wheels are going to move forward. We are reading the value with the sensor continuously, but if the value is smaller than the minimal distance that we decided, then we are going to stop the wheels, and I'm going to show a bit later how to stop the wheels, and then check the distance on the right. We are going to turn the Servo to the right, get the value here, wait between to interaction and then do the same for the left, and the rest is just - if the value is bigger on the left than the right, and it's an acceptable distance, like there is not another obstacle here, then we're going to turn the wheels left and then continue to the outer, that is, go forward. Else, if the value on the right is better than, then we are going to turn right, and then continue to the outer loop. Else, we're just going to go backwards. And turn right. Going back to show the model, I think this is the only thing that I didn't show. For we can decide the constant for how long do we want our car to turn? And moving forward, it just receiving a mutable reference to wheels. This type seems really, really long, but you know how you do with Rust, when you don't know a type, you just declare another type, the compiler will complain and give you the right type, and you can just copy-paste it. I did some unpacking here. I put the wheels into a new array to make sure that I wrote it correctly, and then you just need, when you go forward, you just need to set forward the motion, the left and right forward motion high, and the right and back motion low. To turn right, you need to stop the wheels, that is exactly the - to stop the wheels, you just need to set all the pins low, right? 
I just removed it from the presentation for clarity. And to turn right, you have to set the left forward wheel high, and the right forward wheel low for an amount of time, so, if you move the left wheel, the robot is going to turn to the right. You need to know where to find help. Actually, if there is one thing you must get from this talk, it is where to find help. The Rust community's very welcoming, and one of their core values is to provide a safe, friendly, and welcoming environment. This is a community in which I felt safe and comfortable from day one. You can ask any question on the community forum. Overall, people have been providing me with technical consultancy as well as psychological support since the start of my adventures in Rust. When I arrived to matrix, people realised I didn't do anything and sent me to do homework but helped me anyway. I want to thank my mentor and avh-hal. That's it. Thank you very much for your attention. All the project is on GitHub, so please don't hesitate to do whatever you want with it, and show me what you did. You can ask me any question you like. That's it, and thank you again, and it's time for the surprise. -* they see me rollin'. -* see me riding Rust-y. -* wanna see me riding Rust-y! -* thinking cool to ride Rust-y. See you riding Rust-y. -* wanna see me riding Rust-y p -* showing, moving, grooving, want to see me riding Rust-y. -* wanna see me riding Rust-y. -* now that I'm riding Rust-y. -* want in a see me - *want to see me riding Rust-y. -PILAR: That was incredible! It's so good that you all could not see me during the talk, because it was just me grinning from ear to ear and clapping my hands off! I said I was excited about this talk, but, wow! Thank you so much, Aïssata. 
That was an incredible talk, and, yes, like, you mentioned the community at the end, and your love for it, and your being such an integral part of it, at least to me, shows so much, because it's that spirit - like you just held our hand through all of that. If somebody wanted to try that out, it's like I messed up here, there and, you know? Thank you so much. That was really, really great. And as a special treat, I mean, besides the amazing ending, Aïssata is here to join us for a live Q&A, so I'm going to add her on now, and we had a lot of questions in the chat, so we will try to get through as many of those as we can. So, hi, you're live with us now. -> Thank you very much. Fantastic. -PILAR: You had to watch your own talk. I don't know how you could do that. Personally, I can't! Don't ask me to! What an amazing talk, really. I was so excited. Absolutely called dibs on introducing, and being here for this talk. -> I say something about the sound, and I jump from 340 to 334,000, and because, it was really weird. It's because I was talking in metres, and then centimetres, and then I forgot to do that, to explain it? And then, oh, I don't even know what to say, and I saw that in my comments, I write something about the God bat. That was embarrassing. Like, you know ... how to be a God bat. I meant the sensor is working like a bat, and well, let's just forget that. -PILAR: I mean, if it is fair at all, I think it was very clear. I know we are multi-cultured, and everything, and everyone might not be on the same technological or English-speaking level, but I thought you two things you mentioned were fairly clear, but thank you for clearing that up too. 
-> Speaking about the community, because, instead of thinking about the bad things, the good things, that is true, when I joined the Rust community, it was the only time that I used my own name and my own picture on the internet, and I never do that, because, you know, you're always afraid of mean comments, and abuse, but, yes, like from day 1, it never happened. It never happened, and that I've felt so welcomed anywhere. I think I made pull requests after a week. -PILAR: That's amazing. I know people who have been in the industry for years and years and they're too scared to make PRs to put their stuff out there, so that is really cool. It's so great that you actually went for it, and that you felt safe and comfortable to do so. I hope, you know, that's why I love your mentoring work as well, you mentioning that and sharing so much with us because you're encouraging other people to feel safe, go for it, also try it, and that is amazing. Thank you so much. Do you mind if I go into a couple of questions that were on the chat? -> No, please, yes. My child is here! -PILAR: Don't worry about it. We are all at home and in it together. I think a couple of fun ones for you first. So, you know, there was kind the line of thinking why robots? Is it hard to start off with something that is embedded, and what your next planned robot is? I think people are very fascinated by the topic of embedded, and robots, so, please, like I saw how excited you are for it. -> Yes, okay, so what I really wanted to von way with this talk was that it is not that hard. You have to have help, and, if you have the right help, it's not that difficult, and the most difficult is really do you I start with it? This is what I wanted to show in my talk: how do I get started? I'm pretty sure you have your own ideas and objectives, you have your own crazy stuff that you want to implement, talking robots, so I don't know. So I just wanted to be sure. 
The confidence that I'm using, how do they work, so we can put them to this. I don't know, maybe you want to have a fridge that comments if you open it at night. This is something you can do with this stuff. But, when ... -> I like how you mention where to get things and how to spot fakes. Watching your talk, I want to do something too. I want to buy components. -> It's easy, and it's really cool and fun and easy. Please go for it. You have to show me. I want to see. -PILAR: Thank you for that. Any future plans for more robots. -> I've brought my robot skull. As you can see, it's not done, it can't be closed. But this is also, you know, you can do that with whatever small board you want, and then a small sensor, and then it reacts to the sound, but also to shock. If you turn the battery on first, because then the demo, the demo is not going to work. And you can put some LED into it, and program that thing with for loops. -PILAR: That is so cool. Thank you for bringing it! Wow! That's amazing! I hope people are tuning into the Q&A to get to see this! That's so cool! Wow. Thank you. -> Thank you, too. I'm all sweaty, and happy! -PILAR: That's just part of being in the community, and part of getting to partake in this. So, I think I wanted to jump on to more technical questions, but I think we actually have to go to the next talk, but you showed us the next project, which is super cool. Thanks so much for joining us. Thank you for everything. -> Thank you, everyone! -PILAR: It's been amazing. Thank you for being here. We will see you in the chat, right? -> Yes, I will stay here. We will be in the chat. -PILAR: We will see you in the chat. Exactly. Oh, darn, technical thing. I'm so sorry! [Laughter]. -> It's okay. -PILAR: See you, then. -> See you. -PILAR: Just wanted to add a quick note. Our replays are already working, so hop on to the next talk, and we will see you there in a bit. Enjoy, everyone. 
diff --git a/2020-global/talks/02_UTC/03-Vivian-Band.md b/2020-global/talks/02_UTC/03-Vivian-Band.md
new file mode 100644
index 0000000..cb456bc
--- /dev/null
+++ b/2020-global/talks/02_UTC/03-Vivian-Band.md
@@ -0,0 +1,246 @@
+**Rust for Safer Protocol Development**

+**Bard:**
+
+Vivian wants us to be safe
+and our code on the web to behave
+use Rust to generate
+code that will validate
+risky inputs, no need to be brave
+
+**Vivian:**
+
+Hello, my name is Vivian Band.
+I'm a second-year PhD student at Glasgow University studying network security.
+I was on the safer protocol development project.
+
+So, improving protocol standards: the Internet Engineering Task Force standardises network protocols.
+These are initially presented as drafts to working groups and then become official standards after further review.
+However, even after all of this peer review from several different sources, mistakes still sometimes appear in these documents.
+
+For example, in the image on the right, the ASCII diagram shows the relay port option fields as 13 bits and 19 bits long, but the text description says these should be 16 bits in length.
+Inconsistencies like these create ambiguity for people implementing protocols.
+
+What the Improving Protocol Standards project aims to do is to provide a machine-readable ASCII format in which these inconsistencies can be detected much more easily.
+The machine-readable documents are minimally different from the existing format, with authors using consistent label names and certain specific stock phrases in their descriptions.
+These machine-readable documents allow us to build a typed model of protocol data.
+
+We call the custom type system developed as part of our project the network packet representation.
+Network packet representation is programming-language-agnostic.
+
+I had used Rust a few years earlier to implement a bare-bones version of the protocol in my final-year undergrad project, and I was impressed by how much safety it added to systems programming.
+Our first automatically generated libraries would be Rust files, because we wanted the resulting parsers to have a good level of type safety.
+
+Okay, so, first of all, let's take a step back and look at which types we need to describe network protocols before we can start building parsers and parser combinators.
+
+I use a lot of analogies when learning new concepts, so I like to think of these basic types like Lego bricks.
+There are several basic types that we can identify from protocol standard documents, and we will use a TCP header to demonstrate this.
+Fields which contain raw data can be modelled as bit strings: the source port is just a 16-bit unsigned integer.
+That's just raw data.
+
+Fields which could contain one or more of the same element can be modelled as an array.
+Some fields only appear under certain conditions, and rely on values from other fields within the same protocol data unit - in this case, the TCP header - to establish whether we're using them or not.
+We can call these constraint types, since they need to be checked.
+
+Some fields require information not contained within this packet, like an encryption key, or a value from another data unit somewhere in this draft.
+We can hold this information in a context data type, which can be accessed by the other protocol data units that feature in this draft if required.
+A field which can take on one of a limited set of possible values can be modelled as an enum, indicated in each draft with the stock phrase "a TCP option is one of" - so "is one of" is the key phrase we need to use in the modified standard documents.
+
+Packets and protocol data units as a whole can be considered structure-like types, given they contain field types as constituent members.
+One type that doesn't feature in TCP is the function data type.
+Functions are required to express transformations between different data units - in this case, encryption and decryption of packet headers.
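As a rough illustration only - my own sketch, not the project's actual representation - the kinds of type just listed could be modelled along these lines:

```rust
// Hypothetical sketch of the network packet representation type kinds the
// talk describes. The real project's definitions will differ; this just
// shows how the "Lego bricks" compose into a (heavily trimmed) TCP header.

#[allow(dead_code)]
enum Kind {
    BitString { width_bits: usize },         // raw data, e.g. a 16-bit port
    Array(Box<Kind>),                        // one or more of the same element
    Constraint { depends_on: &'static str }, // present only under a condition
    Enum(Vec<Kind>),                         // "is one of" a limited set
    Struct(Vec<(&'static str, Kind)>),       // a whole header / data unit
    Context,                                 // out-of-band data, e.g. a key
    Function,                                // transformations between units
}

fn main() {
    // A trimmed-down TCP header built from the kinds above.
    let tcp = Kind::Struct(vec![
        ("source_port", Kind::BitString { width_bits: 16 }),
        ("destination_port", Kind::BitString { width_bits: 16 }),
        // options: an array whose elements are one of the TCP option structs.
        ("options", Kind::Array(Box::new(Kind::Enum(vec![Kind::Struct(
            vec![("eol_kind", Kind::BitString { width_bits: 8 })],
        )])))),
    ]);
    if let Kind::Struct(fields) = &tcp {
        assert_eq!(fields.len(), 3);
    }
    println!("ok");
}
```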
+
+We've got seven types in total in our network packet representation system:
+bit strings, arrays, structs, enums, constraints, contexts, and functions.
+
+Let's get to the fun stuff.
+Automatic Rust parser generation.
+
+We've got our basic building blocks sorted out.
+How can they be used to build the more complex combinators in Rust?
+
+Let's go back to the bit string we used when explaining our custom types.
+We can automatically generate this as a wrapper around an unsigned 16-bit integer in a Rust output file easily.
+Immediately after that, we can generate a nom-based parser for that type.
+This is a little bit more difficult to generate.
+
+There is a lot going on here, so we will highlight a few key details.
+The first argument for all our parser functions is an input tuple, a borrowed array of bytes which would be an incoming packet from some source.
+Our parsers work at the bit level, so the second tuple element is how many bits we've read in the current byte.
+
+Our second argument is a mutable borrow of the context instance, since we might want to update it. Our outputs are a nom-specific result type containing the remaining bytes left to be parsed, an updated bit counter, and the custom type instantiated with the correct value read from the byte array.
+We also return a mutable reference to our possibly updated context.
+
+The parser function itself takes a defined number of bits from the input byte array; in this case, it will take 16 bits.
+It assigns the value of those taken bits to the custom Rust type as needed.
+
+The order in which we generate these custom types and parsers in the Rust output file is determined by a depth-first search.
+We generate a custom type and parser whenever we reach a leaf node, and generate the combinator when there are no more leaf nodes found for that parent.
+The overall protocol data unit is a TCP header, which is a struct type in our custom network packet representation system, so this is the root of the depth-first search tree, and will generate the parser combinator.
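The shape of these generated parsers can be sketched without the nom crate, so the example stays self-contained. All names below are illustrative; the real generated code uses nom's bit-level parsers, returns a nom result type, and also threads a mutable context argument through each call, which this sketch omits.

```rust
// Hand-rolled sketch of the generated parser shape described above.
// Input is a (byte slice, bit offset) pair; output is the remaining
// input plus the decoded value. No error handling, for brevity.
fn take_bits(input: (&[u8], usize), count: usize) -> ((&[u8], usize), u16) {
    let (bytes, mut offset) = input; // offset = bits already read in bytes[0]
    let mut idx = 0;
    let mut value: u16 = 0;
    for _ in 0..count {
        let bit = (bytes[idx] >> (7 - offset)) & 1; // MSB-first
        value = (value << 1) | u16::from(bit);
        offset += 1;
        if offset == 8 {
            offset = 0; // crossed a byte boundary: keep going in the next byte
            idx += 1;
        }
    }
    ((&bytes[idx..], offset), value)
}

// Wrapper type standing in for a generated field type.
struct SourcePort(u16);

// Parser for one field: returns the remaining (bytes, bit offset) input
// plus the instantiated custom type.
fn parse_source_port(input: (&[u8], usize)) -> ((&[u8], usize), SourcePort) {
    let (rest, value) = take_bits(input, 16);
    (rest, SourcePort(value))
}

fn main() {
    let packet = [0x1F, 0x90, 0xAB]; // first 16 bits encode port 8080
    let ((rest, bit_offset), port) = parse_source_port((&packet[..], 0));
    println!("source port = {}", port.0); // 8080
    assert_eq!((rest, bit_offset), (&packet[2..], 0));
}
```

Because the bit offset is threaded through every call, a field that is not a whole number of bytes simply leaves the offset mid-byte for the next parser, which matches the non-byte-aligned behaviour discussed in the Q&A.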
+
+The first parser will be for source port, which is a 16-bit-long bit string; this was the parser we walked through earlier.
+Bit strings are leaf nodes, so we move to the next child, destination port.
+This is also a bit string and therefore a leaf node, so we write a custom type and a 16-bit parser for this.
+
+The first non-bit string we encounter in the TCP header is options, which is an array type.
+The elements which could be present in the options array are TCP options.
+TCP options is an enum type with a limited range of possible choices.
+Each of those enum variants is described in its own small ASCII diagram in another section of the same document.
+This makes each enum variant a struct type in our network packet representation system; in this case, EOL option is a struct.
+
+The value of the field in this ASCII diagram is a bit string.
+This means we have finally reached a leaf node and we can write a custom Rust type definition and a custom parser, and then a Rust type definition and a parser for its parent node, EOL option.
+We find that there are more TCP option variants, so we repeat this process for each one.
+Once we have written parsers for all of the variants, we can write the Rust type definition and parser combinator for the parent node, TCP options.
+The last field in the packet is the payload, which we can parse as a bit string.
+
+Finally, we write the Rust type definition ... in one function call.
+We also create a context object which all parser functions have access to.
+
+So, to recap the system that we developed in this project, we have the machine-readable protocol document at stage 1 with our minimal changes to ASCII diagrams and text descriptions.
+We have the custom protocol typing system developed in stage 2, our network packet representation language, and in stage 3, we have the results of the internship ...
+a Rust library file automatically generated from the information we have in stage 2.
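The depth-first emission order described above can be pictured with hypothetical generated types: leaf field types come first, then each option struct and the option enum, and finally the root PDU. All names here are invented for illustration; the generated file also emits a parser alongside each type and a combinator for each parent.

```rust
// Hypothetical shapes for the generated types, listed in roughly the
// order a depth-first search would emit them. Illustrative names only.
struct SourcePort(u16);                // leaf: 16-bit bit string
struct DestinationPort(u16);           // leaf: 16-bit bit string
struct EolOption { kind: u8 }          // each option variant is a struct

enum TcpOption {                       // "a TCP option is one of ..."
    Eol(EolOption),
}

struct TcpHeader {                     // root of the depth-first tree; its
    source_port: SourcePort,           // combinator calls the field parsers
    destination_port: DestinationPort, // in declaration order
    options: Vec<TcpOption>,
    payload: Vec<u8>,                  // trailing payload, parsed last
}

fn main() {
    let header = TcpHeader {
        source_port: SourcePort(8080),
        destination_port: DestinationPort(443),
        options: vec![TcpOption::Eol(EolOption { kind: 0 })],
        payload: vec![],
    };
    println!(
        "src={} dst={} options={}",
        header.source_port.0,
        header.destination_port.0,
        header.options.len()
    );
}
```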
+
+Remember earlier when I mentioned that I think of these basic types and parsers as building blocks? To go further with that analogy, a TCP header is like this Lego model.
+It is difficult to build manually without making mistakes.
+
+Our generated parser libraries are not only a manual explaining how this data should be parsed, they also allow protocol developers to build the struct with extracted values with a single function call.
+This is ideal for protocol testing.
+The picture on the left is a genuine sample of our generated TCP parser code from our modified TCP document.
+
+So, conclusions: initially, I decided on Rust as our first parser output language, because I enjoyed Rust for systems programming on a previous project.
+Using parser combinators turned out to be an ideal fit, since applying them to network protocols also relies on depth-first search.
+Parsers can be difficult to write manually and are prone to containing errors.
+
+Automatically generating parsers minimises the chance of some of these errors occurring; for example, the number of bits being read will always match the specification.
+The typing guarantees offered by Rust will help us ensure that the machine-readable specification document and our network packet representation system agree.
+If there are errors, the Rust compiler will alert us to this.
+
+The next steps: this project is still ongoing, and there are more directions that this research can go in.
+We are aiming to show our system to the IETF.
+We need to put in more work on function types so we can create encryption and decryption functions for protocols like QUIC which heavily rely on this.
+We would like to use the Rust libraries for protocol testing and error correction, and to support more output languages in the future.
+Resources for this project can be found at these links.
+We have a peer-reviewed publication which goes into more detail about our network packet representation typing system, and a GitHub repository containing the code for our automatic Rust parser generator.
+
+Thank you for your time, and I would be happy to answer any questions.
+
+**Vivian:**
+That was brilliant. Loved it! [Laughter]. Thanks so much.
+
+**Stefan:**
+Thank you.
+I know we have a 25-to-40-second delay to the stream, so, just to get ahead of time, I have two questions if you don't mind.
+The first one is, there is a push for a native implementation of the networking types, so the Rust standard library doesn't use libc any more but directly operates with system calls. Do you think that will affect you in any way, like in developing new types?
+
+**Vivian:**
+Potentially.
+So, the whole point of us developing the network packet representation system was to have something that was completely agnostic of any programming languages, or output libraries we want to use in the actual parser files themselves, so it should be fairly easy for us to adapt to these things, I think.
+I think we could maybe have to consider, like, how we can convert from network packet representation to the different types featured in the output code, but that's relatively straightforward, I think.
+
+
+**Stefan:**
+Wonderful. So, this feeds into my other question: so, I guess you can use the higher-level parsers for TCP, UDP, what not, regardless of the underlying types of IPv4 versus version 6?
+
+**Vivian:**
+Yes, so what we are aiming to do is have these run through a single protocol specified in a draft.
+It's very rare that you would have an RFC that specifies multiple protocols, so if you wanted to make an IPv6 generator, go ahead, run it on the RFC.
+We are aiming to introduce our machine-readable ASCII format to future IETF drafts, and hopefully we will see more adoption of that so we can see automated testing going forward.
+What we've done for the TCP example is go through an older RFC and make minimal changes to it to generate parsers, so, if you wanted to do that with other protocols, that's absolutely fine as well.
+So, again, in answer to your question: sorry, the question was about multiple protocols nested?
+
+**Stefan:**
+Yes, if you can use the parser coming out of the RFC for IPv6, and what the -
+
+**Vivian:**
+Yes, we can use this for all sorts of different protocols. The nice thing about parser combinators, you can have a ... if you like. Maybe one day in the future.
+
+**Stefan:**
+Yes. Cool. Wonderful. There is also a question from the audience: how do you deal with non-byte-aligned structures, so, if like a five-bit word crosses the eight-bit alignment?
+
+**Vivian:**
+So, we had - I think I had a small file of test cases when I was doing the internship about what happens if this occurs, and non-byte-aligned words was one of them.
+What we found was that the bit-level parsers tend to go straight into the next byte - if the counter exceeds seven, it will just run forwards happily.
+We haven't found any issues with that so far. It's been very good to us.
+
+**Stefan:**
+Yes, it has been released. Version 6 has been out since Tuesday, I think?
+
+**Vivian:**
+Yes, I haven't had time to update that yet, and this was written on five, so we will see if it works with six and see if there is anything that needs changing.
+
+**Stefan:**
+Wonderful. If this were a physical conference, we would probably meet Geoffroy, who wrote nom.
+
+**Vivian:**
+Sure, we would love to.
+
+**Stefan:**
+Wonderful. Do you want to add something, or say something that came to mind just now?
+
+**Vivian:**
+No, I think I've kind of said everything that I want to say in the presentation, mostly.
+So what we've - it's mostly a proof-of-concept at the moment.
+So I posted a link to the repository and our paper explaining our system in the conference room chat, so if people want to take a look at our library and have a play about with it, see how the generated Rust code looks,
+we will happily take feedback if people want to improve our parsers - I consider myself a novice at Rust.
+We used nom functions as opposed to macros so we knew what was going on. If people want to talk about how to optimise that, make it cleaner, or suggest more improvements, that would be great. We would love that.
+
+**Stefan:**
+Wonderful. So, to the lovely people in the stream, this is about the last chance you get to ask more questions. Has the IETF been receptive to the machine-readable diagram format?
+
+**Vivian:**
+So, the problem with the IETF is there are so many different groups, it's impossible to get a group consensus for the whole organisation, so what we've got at the moment is a small side meeting with the formal description techniques side group, I think, which is aiming to say, okay, how can we deploy this?
+So Stephen and Colin Perkins, two people involved in this project, are heavily involved with the IETF, so I think they're having discussions to see how we can get this deployed.
+There have been past attempts along the lines of: okay, we can have custom tooling to do this and this, all singing and dancing, but we tried to make something relatively simple and unintrusive that could work for multiple workflows.
+
+**Stefan:**
+Cool.
+
+**Vivian:**
+So the answer is nobody has published using it yet, but watch this space.
+
+**Stefan:**
+I guess you will be trying to investigate like the correctness of the middle boxes and what-not, or maybe try to circumvent them?
+
+**Vivian:**
+Yes. So one of the examples that we are working on at the moment is QUIC, QUIC being a high-profile and complex protocol, I think. If we can successfully parse this, and we can successfully use it for testing, then we think that's quite a good promotion, I suppose.
+
+**Stefan:**
+Definitely. Having an actually correct implementation that is done when the specification is finished ...
+
+**Vivian:**
+This was one of the main motivations. You get protocols that are becoming increasingly more complex, like QUIC. It's not surprising that there will be flaws in it. Say you got a packet generated by a C implementation and we fed it through our Rust parsers, we could potentially find flaws - so it can be written in other languages, we just need the output that they generate.
+
+**Stefan:**
+So tools like cargo expand on the generated code, and maybe check out the state machine that has been generated to see ...
+
+**Vivian:**
+Yes.
+
+**Stefan:**
+To see if the specified behaviour makes any sense, right? Or if there are, like, obvious flaws in the -
+
+**Vivian:**
+Yes, to catch the subtle bugs, which, okay, you know, essentially, what our parsers are testing is: is your output on the wire correct, doing what you think it's doing? We could maybe come up with more advanced testing, and automated error correction later on possibly, but that's going to take some time to develop.
+
+**Stefan:**
+Yes. Looks like a long ongoing project.
+
+**Vivian:**
+For sure. Hopefully, yes!
+
+**Stefan:**
+Wonderful. So, I'm currently not seeing any more questions. I hope I haven't missed any.
+
+**Vivian:**
+It seems like that's all of them.
+
+**Stefan:**
+Wonderful. Thank you again very much.
+
+**Vivian:**
+Thank you for having me.
+
+**Stefan:**
+Yes, you're welcome. So please stick around. I think I will let you go, so you can enjoy the next act. Thank you.
diff --git a/2020-global/talks/02_UTC/03-Vivian-Band.txt b/2020-global/talks/02_UTC/03-Vivian-Band.txt deleted file mode 100644 index 30b0388..0000000 --- a/2020-global/talks/02_UTC/03-Vivian-Band.txt +++ /dev/null @@ -1,40 +0,0 @@ -Rust for Safer Protocol Development - Vivian Band. -STEFAN: Hello, again. Welcome back from our quick break. So, the next talk is about to start.
Vivian Band will tell us all about safer protocol development. She's doing her PhD in Scotland, and I'm really excited to see the talk. Remember, just like last time, please join the room that's for the questions - that's number 10. And without further ado, let's see the intro. -> Daan and Diane get us to the hype of keeping secrets in a type, thus allowing creation of optimisation that just might tell the FEDs what you type. -> Hello, my name is Vivian Band. I'm a second-year PhD at Glasgow University studying network security. I was on the safer protocol development project. So, improving protocol standards: the Internet Engineering Task Force standardises network protocols. These are initially presented as drafts to working groups and then become official standards after further review. However, even after all of this peer review from several different sources, mistakes still sometimes appear in these documents. For example, in the image on the right, the ASCII diagram shows the option real port is 13 bits long and 19 bits long, but the text description says these should be 16 bits in length. These create ambiguity for people implementing protocols. What the improving protocols standard aims to do is to provide a machine-readable ASCII format to detect these inconsistencies much more easily. These are minimally different from the format using existing diagrams with authors using consistent label names and certain specific stock phrases in their descriptions. These machine-readable documents allow us to build a typed model of protocol data. We call this custom typed system developed as part of our project, network packet representation. Network packet representation is program-agnostic. I had used Rust earlier to implement a bear-bones version of the protocol a few years ago on my final-year undergrad project and I was impressed how much safety it added to the system's programming. 
Our first automatically generated libraries would be rainfalls files because we wanted resulting is have a good level of type safety. Okay, so, first of all, let's take a step back and take a look at which types we need to describe network protocols. Before we can start building parsers and parser combinators. I use a lot of analogies when learning new concepts, so I like to think of these basic types like Lego bricks. There are several basic types that we can identify from protocol standard documents and we will take a TCP header to demonstrate this. Fields which contain raw data can be modelled, so source port is just 16-bit unsigned integer. That's just raw data. Fields which could contain one or more of the same element could be modelled as an array. Some fields only appear under certain conditions, and rely on values from other fields within the same protocol data unit, in this case, the TCP header, to establish whether we're using that or not. We can call these constraint types since they need to be checked. Some fields require information not contained within this packet, like an encryption key, or a value from another data unit somewhere in this draft. We can hold this information in a context-data type which can be received by other protocol data units which also feature in this draft if required. A field which can take on one of a limited possible set of values can be modelled as an enum, indicated in each drafts with the stock phrase, a TCP option is one of, so "is one of" is the key phrase we need to use in the modified standard documents. Packets and protocol units as a whole can be considered as structure-like types given they contain field type as constituent type members. One that doesn't feature in TCP is the function data type. These are required to form congresses between different data unit, in this case, encryption and decryption of packet headers. We've got seven types in total in our network packet representation system. 
Bit strings, arrays structs - arrays, contexts, and functions. Let's get to the fun stuff. Automatic Rust parser generation. We've got our basic building blocks sorted out. How can they be used for the complex combinators in Rust. Let's go to the bit string when we were explaining our custom types. We can automatically generate this as a wrapper under an unsigned 16-bit integer in a Rust output file easily. Immediately after that, we can generate a non-based parser for that type. This is a little bit more difficult to generate. There is a lot going on here, so we will highlight a few key details. Our first argument for all our parser function assist an input tuple, a borrowed array of bytes which would be an incoming packet of some source. Our parsers work at the bit level so our second tuple level is how many bits we've read in the current bite. Our second argument is the mutable borrow from the context instance since we might want to update.- our outputs are non-specific result type containing the remaining bytes left to be parsed, an updated bit counter, instantiated with the correct value read from the bite array. We also return mutable reference to our possibly updated context. The parser function itself takes a defined number of bits from the input byte array, this this case, it will take 16 bits. It assigns the value of those taken bits to the custom bus type as needed. The order in which we generate these custom types and parsers in the Rust output file is determined by the search. We generate a custom type and parser whenever we reach a leaf node and generate the combinator when there are no more leaf nodes found for that parent. The overall protocol data unit is a TCP header which is a struct type in our custom network packet representation system, so this is the root of the depth of the search tree, and will generate the password combinator. 
The first parser will be for source port which is a 16-bit long bit string which was the parser we walked through earlier. Bit strings are leaf nodes so we move to the next child destination port. This also a bit string and therefore a leaf node so we write a custom type in a 16-bit parser for this. The first non-bit string being counter in TCP header is options which is an array type. The elements which could be present in the options array are TCP options. TCP options is an enum type with a limited range of possible choices. Each of those enum variants are described in their own small ASCII diagrams in another section of the same document. This makes each enum variant a struct type in our network packet representation system in this case EOL option is a struct. The value of the field in this ASCII diagram is a bit string. This means we are finally reached the leaf node and we can write a custom Rust-type definition and a custom parser, and a Rust-type definition and a parser for its parent node, EOL option. We find that there are more TCP option variants so we repeat this process for each one. Once we have written parsers for all of the variants, we can write the Rust type definition and parser combinator for the parent nodes and TCP options. The last in the packet is the pay loads which we can parse as a bit string. Finally, we write the Rust-tripe definition ... in one function call. We also create a context object which all parser functions have access to. So, to recap the system that we developed in this project, we have the machine-readable protocol document at stage one with our minimal changes to ASCII diagrams and text descriptions. We have the custom protocol typing system developed in teenage 2, our network packet representation language, and in stage 3, we have the results of the internship ... a Rust library file automatically generated from the information we have in stage 2. 
Remember earlier when I mentioned that I think of these basics types and parsers as building blocks? To go further with that analogy as quickly as possible a TCP header is like this Lego block. It is difficult to build manually without making mistakes. Our generated parser libraries are not only a manual explaining how this data should be parsed, they also allow protocol developers to build the struct with extracted values with a single function code. This is ideal for protocol testing. The picture on the left is a genuine sample of our generated TCP parser code from our modified TCP document. So, conclusions: initially, I decided on Rust as our first parser output language, because I enjoyed Rust for systems programming on a previous project. Using parser combinators turned out to be an ideal fit since assigning them to network protocols both used depth of search. Parsers can be difficult to write manually and are prone to containing errors. Automatically generating parsers minimises the chance of some of these errors occurring, for example, the number of bits being read will always match the specification. The typing guarantees offered by Rust will help us ensure we get the machine-readable specification document, and in our network packet representation system. If there are errors, the Rust compiler will alert us to this. The next steps: this project is still ongoing, and there are more directions that this research can go in. We are aiming to show our system to the IIETF. We need to put in more work on function types so we can create encryption and decryption functions for protocols like QUIC which heavily rely on this. We would like to use the Rust libraries for protocol and error correction to support more protocol languages in the future. Resources for this project can be found at these links. 
We have a peer-reviewed publication which goes into more detail about our network packet representation typing system and a GitHub repository containing the codes for all automatic Rust parser generator. Thank you for your time, and I would be happy to answer any questions. -STEFAN: Thank you, Vivian. I, clicked the wrong intro. -> Vivian wants us to be safe and our code on the web to behave. Use Rust to generate code that will validate risky inputs - no need to be brave! -> That was brilliant. Loved it! [Laughter]. Thanks so much. -STEFAN: Thank you. I know we have 25 to 40 second-delay to the stream, so, just to get ahead of time, I have two questions if you don't mind. The first one is, there is a push for native implementation of the networking type, so the Rust standard library doesn't Lewis LIPSE any more but directly operates with system calls. Do you think that will affect you in any way like in developing new types? -> Potentially. So, the whole point of us developing the network packet representation system was to have something that was completely agnostic of any programming languages, or output libraries we want to use in the actual parser fields themselves, so it should be fairly easy for us to adopt to these things, I think. I think we could maybe have to consider, like, how we can convert from network packet representation to different codes - different types featured in the output code, but that's relatively straightforward, I think. -STEFAN: Wonderful. So, this feeds into my other question: so, I guess you can use the higher level parsers for TCP, UDP, what not, regardless of the underlying types of IPV4 versus version 6? -> Yes, so what we are aiming to do is have these run through a single protocol specified in a draft. It's very rare that you would have an RFC that specifies multiple protocols, so if you wanted to make an IPV6 generator, go ahead, run it on the IFC. 
We are aiming to introduce our machine-readable ASCII format to feature ITF drafts and hopefully we will see more adoption of that so we can see automated testing going forward. What we've done for showing the TCP example, we've gone through an older RFC, and made minimal changes to it to generate parsers, so, if you wanted to do that with protocols, that's absolutely fine as well. So, again, in answer to your question: sorry, the question was about multiple protocols nested? -STEFAN: Yes, if you can use the parser coming out of the RTC for PC6, and what the - -> Yes, we can use this for all sorts of different patrols Coles. The nice thing about parser combinators, you can have a ... if you like. Maybe one day in the future. -STEFAN: Yes. Cool. Wonderful. There is also a question from the audience: how do you deal with non-bite aligned structures, so, if like a five-bit word crosses the eight -bit alignment? -> So, we had - so I think I had a small file for test when that I was doing the internship about what if this happens and non-bite aligned words was one of them. What we found was with the bit-level parsers, it tends to go straight into the next byte if you happen to - if the counter exceeds seven, so it will just run forwards happily. We haven't found any issues with that so far. It's been very good to us. -STEFAN: Yes, it has been released. Version 6 has been out since Tuesday, I think? -> Yes, I haven't had time to update that yet, and this was written on five, so we will see if it works with six and see if there is anything that needs changed. -STEFAN: Wonderful. If this were a physical conference, we would probably meet Jeffrey who wrote the thing. -> Sure, we would love to. -STEFAN: Wonderful. Do you want to precise something, or say this is something that came to mind just now? -> No, I think I've kind of said everything that I want to say in the presentation, mostly. So what we've - it's mostly a proof-of-concept at the moment. 
So I posted a link to the repository and our paper explaining our system in the conference room chat, so if people want to take a look at our library and have a play about it, see how the generated Rust code looks, we will happily take feedback if people want to improve our parsers, so I consider myself a novice at Rust. We used using num functions as opposed to macros so we knew what was going on. If people want to talk how to optimise that, make it cleaner or more improvements, that would be great. We would love that. -STEFAN: Wonderful. So, to the lovely people in the stream, this is about the last chance you get to ask more questions. Has the ITF been receptive to the machine-readable diagram format? -> So, the problem with the ITF is there are so many different groups, it's impossible to get a group consensus for the whole organisation, so what we've got at the moment is a small side meeting at the formal descriptions technique and side groups, I think, which is aiming to say, okay, how can we deploy this? So Stephen and Paul Perkins, two people involved in this project are heavily involved with the ITF, so I think they're having discussions to see how we can get this deployed. So it's been past attempts about okay, we can have custom tooling to do this and this, all singing and dancing, but we tried to make something relatively simple and unintrusive that could work for multiple workflows. -STEFAN: Cool. -> So the answer with somebody haven't published using it yet, but watching this space. -STEFAN: I guess you will be trying to investigate like the correctness of the middle boxes and what-not, or maybe try to circumvent them? -> Yes. So one of the examples that we are working on at the moment is QUIC. QUIC being high-profile, and a complex protocol, I think. If we can successfully parse this, and we can successfully use it for testing, then we think that's quite a good promotion, I suppose. -STEFAN: Definitely. 
Having an actually correct implementation that is done when the specification is finished ... -> This was one of the main motivations. You get protocols that are becoming increasingly more complex, like QUIC. It's not surprised, and there will be flows with it. Say you got a package generated by C, and we fed it through our Rust parsers, we could potentially find - so it is written in other languages, we just need the output that they generate. -STEFAN: So tools like cargoes, expand, the generated code, and maybe check out the state machine that has been generated to see ... -> Yes. -STEFAN: To see if the specified behaviour makes any sense, right? Or if there is, like, obvious flaws in the - -> Yes, to catch the subtle bugs, which, okay, you know, essentially, what our parsers are testing is your output on the wire correct, doing what you think it's doing? We could maybe come up with more advanced testing, and automated error correction later on possibly, but that's going to take some time to develop. -STEFAN: Yes. Looks like a long ongoing project. -> For sure. Hopefully, yes! -STEFAN: Wonderful. So, I'm currently not seeing any more questions. I hope I haven't missed any. -> It seems like that's all of them. -STEFAN: Wonderful. Thank you again very much. -> Thank you for having me. -STEFAN: Yes, you're welcome. So please stick around, because now it's at all the people, hello? I think I will let you go, so you can enjoy the next act. Thank you. 
\ No newline at end of file diff --git a/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.md b/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.md new file mode 100644 index 0000000..19b80bd --- /dev/null +++ b/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.md @@ -0,0 +1,208 @@ +**Rust as a foundation in a polyglot development environment**
+
+
+**Bard:**
+Gavin and Matthijs show how one might
+a large project in Rust rewrite
+start out small, let it grow
+until stealing the show
+from whatever was there before, right?
+
+**Gavin:**
+That's an excellent introduction.
+
+I'm Gavin Mendel-Gleason, the CTO of TerminusDB, and I wanted to talk a little bit today about Rust as a foundation in a polyglot environment.
+First, I'm going to give a little outline of the talk with the motivation, challenges and solution - and why we used Rust as a foundation in our environment.
+
+First, you have to know about our problem: we are an in-memory, revision-control graph database, and we have slightly unusual features which have driven some of the toolchain requirements we have.
+Our software is a polyglot house, so we have clients written in JavaScript and in Python.
+We have Rust, and we have Prolog, which is somewhat unusual in the modern day.
+There is also C involved there as well.
+
+Some of the unusual features that drive our design requirements: we're an in-memory database, which enables faster querying.
+It's also simpler to implement, and I have some experience in implementing ACID databases, so I know a lot about the difficulties that you can encounter when paging data in.
+
+We chose this time to leave it in memory for the simplicity of design and performance.
+We are, however, also ACID, so we use a backing store.
+We actually write everything to disk, but we leave things in memory.
+
+We also use succinct data structures which approach the information-theoretic minimum size whilst allowing query in the data structure.
+This allows us to get large graphs in memory simultaneously, but this requires a lot of bit-twiddling.
+They're relatively complicated data structures, and they're compact but not so transparent to the developer, so you really need to be able to do effective bit-twiddling, which, of course, is where Rust comes in.
+
+We have a bunch of git-like features like revision control, push, pull, clone, and all of the things that you know from git.
+We do those on databases.
+So that also drives a lot of our requirements.
+
+We have a Datalog query engine, and we also have complex schema constraint management.
+So, first, why did we look into Rust in the first place? So, we were not initially a Rust house.
+We didn't have any Rust in our development at all.
+I didn't come from a Rust background, and although I have a lot of experience in different programming languages, Rust was not one of those programming languages.
+Our earlier prototype was actually in Java.
+
+It was hard to write, and it had mediocre performance, and so I started prototyping in prolog.
+Because prolog is very logical, it was extremely fast for us to write, especially the schema-checking parts; however, it had poor performance, so obviously it is not the best for bit-twiddling.
+
+Later, we moved to a C++ library called HDT, and we used that as our storage layer, which radically improved the performance of the application.
+However, we had a lot of trouble with this, and it was a persistent source of pain: the C++ was crashing regularly.
+This is partly because we had requirements that we had to be multithreaded for performance reasons, because we were dealing with very, very large databases in the billions of nodes, and the code was not re-entrant - it was supposed to be written with the intent of being re-entrant, but it wasn't in practice - and this would show up when the server crashed.
+
+It was really, really hard to find the source of these crashes, and that was a persistent source of problems for us.
+So then there was a secondary problem, which is that HDT was not designed for write-transactions.
+It was really designed for datasets and not databases, so we were using orchestration logic on top of it where we would journal transactions and stuff like that.
+It wasn't designed that way.
+So we had feelings about what the interface should be for a library, HDT wasn't it, and it also had these crashing problems, and we were finding it hard to find the source of them.
+Matthijs off his own bat went out and wrote a prototype in Rust of the succinct data structures that we needed to replace HDT, and a simple library around it, and it looked really very promising.
+I had heard of Rust, but I had not written anything in Rust.
+This drove me to take a look at Rust.
+
+I know a lot of languages - I have learned OCaml, C++, Haskell, prolog, Lisp, I've been through the gamut of all of these - and I don't try to learn a new language unless there is something peculiar that drives it as something you might need in your tool kit.
+Rust had this kind of incredible aspect to it, which is this ability to avoid memory problems whilst still being an extremely low-level programming language.
+
+So thread safety was one of our major headaches.
+We were getting segfaults, and we were finding it difficult and time-consuming to sort them out.
+
+This library was exhibiting none of these problems, and this was really promising.
+We decided we were just going to take the plunge and rewrite the foundations of our system in Rust.
+
+It also gave us the chance to re-engineer our data structures, simplify code, improve fitness for purpose, change the low-level primitives, and cater to write-transactions in particular, but it also enabled us to do some performance enhancements that we would like to have done but were afraid to do, because in C++ there is kind of a fear factor where, if you add anything new, you might add something that causes it to crash.
+
+So, of course, in terms of challenges, I'm sure everyone in the Rust community knows about the challenges of FFI, but I don't want to belabour the point.
+We have to interact through the C stack, and this is annoying, because if we're interfacing with Rust, we're actually interfacing with it through a C FFI, and that kills some of the nice guarantees you get from Rust, but at least the problems are isolated to the interaction surface rather than spread throughout.
+So, we also ended up trampolining through a light C shim, which is not the best approach.
+We are evaluating a more direct approach currently.
+
+I don't want to tell everybody we've done it right - we've done some things right, but we can improve a lot here.
+Now, what we would really like, though, is a Rust prolog, because then we could have a nice clean Rust FFI, and everything would be beautiful and perfect.
+There's some progress being made on Scryer prolog, which has cool features that you should look at if you're interested in a Rust prolog project.
+
+Then there are some of the challenges that we ran into that I would like to go through really quickly.
+So we initially expected to write a lot more of the product in Rust: we started off replacing the HDT layer,
+and then we expected to write a lot more from the ground up, so it's essentially like we had this building,
+we went in, we replaced the foundations, and then we were going to start replacing the walls. Unfortunately, developer-time constraints have favoured a different approach for us, so we're doing rapid prototyping in prolog.
+We essentially write the kind of feature that we are interested in there, and then, instead of immediately going to Rust from there, we actually wait, so we're much more selective about what we put into Rust than we had initially imagined.
+
+Partly this is due to the learning curve of borrow-checking semantics, meaning there is a difficulty in getting our developers to understand how this stuff works, so that takes some time.
+And there is a higher upfront cost here, but you win it back - if you're replacing C++, you win it back very quickly.
+You win it back very quickly because seeking out those bugs dominates in terms of time, so that upfront learning cost is nothing compared to the cost of some horrible segfault that you can't find.
+But, if you're replacing prolog, the sort of amortized costs are more important, so you have to worry about where you replace it, and you have to be more careful about that.
+
+Once you've gotten the knack of the borrow checker, things go a lot faster, but it's still slower than writing prolog, because Rust is a lower-level language - which is why we use it, but it's also why we don't always use it.
+So, our solution has been a late-optimisation approach: we developed the low-level primitives in Rust for our low-level storage layer, and then we designed the orchestration of these in prolog.
+When we find a performance bottleneck, we think about how to press that orchestration, or what unit of that orchestration, to press down, and try to find good boundaries - module boundaries, essentially - so that we can press it down into Rust to improve performance.
+
+We have really been performance-driven on this, so the things that get pressed into Rust are those things that need performance enhancements.
+So we started with this storage layer in Rust and have extended this to several operations that proved to be slow when they were in prolog and needed to be faster.
+These include things like patch application and squash operations, things of that nature.
+So these are larger orchestrated operations - they're not as low-level, so they have logic in them.
+
+We have also done some bulk operations; for instance, CSV loading has been written completely in Rust as well,
+because, if you have hundreds of thousands of rows in your CSV, we get a ten- to twenty-times speed-up going from prolog to Rust using the same algorithm, because there's a kind of constant per-row cost that you can imagine expanding out,
+and for hundreds of thousands of lines that becomes a really significant time sink, so CSV load has now been moved completely into Rust, and we imagine large-scale bulk operations will all have to be moved into Rust eventually.
+
+So there are some features that we know we're going to add directly to the Rust library - specific feature enhancements that we are never going to even bother trying to do in prolog.
+They generally have to do with low-level manipulation.
+It would be silly to write them in prolog.
+There's no point in even prototyping them there.
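The bulk-operation pattern described above - a tight, row-by-row load pressed down from prolog into Rust - might look roughly like this. This is a hypothetical sketch, not TerminusDB code: the function name and the simple `(id, value)` row shape are invented for illustration.

```rust
// Hypothetical sketch (not TerminusDB code): a bulk CSV-style load
// pressed down into Rust. The whole pass runs as one tight iterator
// chain instead of per-row glue logic in a higher-level language.
fn bulk_load(csv: &str) -> Vec<(u64, String)> {
    csv.lines()
        // Skip blank lines rather than failing on them.
        .filter(|line| !line.trim().is_empty())
        // Split each row into at most two columns: id, value.
        .filter_map(|line| {
            let mut cols = line.splitn(2, ',');
            let id = cols.next()?.trim().parse::<u64>().ok()?;
            let value = cols.next()?.trim().to_string();
            Some((id, value))
        })
        .collect()
}

fn main() {
    let data = "1,alice\n2,bob\n\n3,carol";
    let rows = bulk_load(data);
    assert_eq!(rows.len(), 3);
    assert_eq!(rows[0], (1, "alice".to_string()));
}
```

Because the loop body is a handful of branchless string operations, the constant per-row cost stays small, which is where the order-of-magnitude speed-up over an interpreted per-row path comes from.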
+
+However, there are a lot of features that we expect will end up in Rust as we move forward, and it's really going to be a slow replacement strategy,
+and it's not clear that we will ever replace all of prolog, although we may,
+but even in the distant future where this product is well developed, ten years from now, and very solid,
+we can imagine that probably some of the schema checking, et cetera, will be done in prolog, even though it will perhaps be prolog embedded in Rust, or using Scryer prolog, or something along those lines.
+
+One of the things, though, that we ran into was the unexpected bonus - we kind of knew this was here, but are amazingly impressed with it.
+This is the unexpected bonus round.
+We got data parallelism from switching to Rust at a very low cost, using Rayon, and it really blew our minds.
+We had things we hardly changed at all.
+
+We had the logic written there, and we used these magic incantations like into_par_iter, and others, and everything is way, way faster,
+and we didn't have to think about it the hard way, and I love that, because I'm lazy!
+So anything that can reduce the amount of time we spend writing things while also improving performance is a huge win.
+I can't impress upon people enough how awesome this is, and how much we need other people to start using it.
+So the borrow-checker: there is a cost, but huge benefits come from it - not just safety, but also potentially speed.
+So, if you're interested in an open-source solution, you should give TerminusDB a try.
+And that's it!
+
+**Jeske:**
+Yes, thank you so much for the talk. That was really interesting.
+
+**Gavin:**
+Thank you. Let me check the chat. I don't think there are open questions yet. I have a question: do you always build in release mode, or is the speed in debug mode also good enough?
+
+**Matthijs:**
+No, debug is definitely not fast enough.
+Well, I mean, it is fast enough when we're just testing out things,
+and it's great sometimes to be able to use a debugger, or something, but for actual general use, also when we are developing and not developing the low-level library,
+we definitely always build in release mode, and there is a tremendous speed-up between them.
+
+**Jeske:**
+Cool. Thank you so much.
+I see a lot of clapping hands in the chat right now.
+Thank you for joining in.
+Matthijs, is there a last thing that you would like to add, because we have a few minutes still left?
+
+**Matthijs:**
+Wow, no. [Laughter]. I don't know if I could add anything to that!
+
+**Gavin:**
+People should try Rayon - that is definitely one thing.
+
+**Matthijs:**
+Rayon was a great thing to try.
+We were scared to try it, because, oh, data parallelism, scary, but it's literally just replacing a few calls, and it just works.
+We got so much speed out of it, so, yes, Rust's ecosystem is just amazing. We love it.
+
+**Jeske:**
+It is a welcoming community, I have to say, also.
+
+**Matthijs:**
+It's really great. It's a good community.
+
+**Jeske:**
+I see a question happening. Do you have any idea what hinders productivity in Rust besides the borrow-checker?
+
+**Gavin:**
+Well, types just introduce extra overhead.
+In prolog, you don't have to worry about garbage collection or how you allocate things.
+There are just fewer things to worry about.
+It costs you later in terms of performance, but it's really helpful in terms of developer time, and for lots of things it doesn't matter what the constant-time cost is, because it's just glue.
+Most software is just glue code, and, if you're just writing glue, you don't want to be worried about lots of details, I think.
+
+**Matthijs:**
+There is another thing here, which is to compare with prolog.
+In prolog, you would have a running instance, and then you do live recompilation of parts of that program, so it is a very short loop between writing your code and seeing it in action.
+With Rust, you have to compile, and then you can run the unit tests, and I mean it's not a big thing, but it is a thing. So having that kind of REPL experience really does help development.
+
+**Jeske:**
+Thank you. There are some questions popping up about use cases: what are the applications of TerminusDB at the moment? Can you elaborate a little bit on that?
+
+**Gavin:**
+It's things like machine learning, where you need to have revision control of your data sets, and any kind of large-scale graph manipulation where you want to keep revisions and be able to pipeline your data - that's where we would use it.
+We scale up to quite large graphs. You would be able to stick something large in there if you would like.
+
+**Jeske:**
+I think we are running out of time. Will you both be active in the chat to help around? I see, Matthijs, you're in the chat already.
+
+**Matthijs:**
+Yes.
+
+**Jeske:**
+We had some technical difficulties sometimes, which one does with this online experience, I would say - also, it's kind of a fun experience, I have to say.
+I want to thank you both so much for your time and interesting presentation, and please do check out the chat.
+And I see that in eight minutes the next speaker will start already. For the people watching the live streams, please stick around for that. We will be back in eight minutes, I would say. Thank you so much, again, Gavin and Matthijs.
+
+**Gavin:**
+Thanks for having us.
+
+**Jeske:**
+See you in the chat.
+
+**Matthijs:**
+Thank you for having us. I'm looking forward to the rest of the talks.
+
+**Jeske:**
+Ciao!
+
+**Matthijs:**
+Bye-bye!
diff --git a/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.txt b/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.txt deleted file mode 100644 index 2ab9a05..0000000 --- a/2020-global/talks/02_UTC/04-Gavin-Mendel-Gleason-and-Matthijs-van-Otterdijk.txt +++ /dev/null @@ -1,59 +0,0 @@ -Rust as a foundation in a polyglot development environment - Gavin Mendel-Gleason and Matthijs van Otterdijk. -JESKE: And we are back! -PILAR: That was a great start to the day! -JESKE: Loved it. -PILAR: I should not have had coffee this morning, between the caffeine and the adrenaline, I'm like super jittery! But it's amazing. And, like, wow, I don't - a lot of people are very sceptical still of online conferences, but, wow, the engagement in the chat, and online, it's been really, really cool. -JESKE: Really lovely to see tweets popping up, and like people are really actively participating, so thank you all for that, and just keeping mentioning it, and keep tagging everybody, and so we can also see your questions as they pop up. Stefan you wanted to introduce our next act? -STEFAN: Wait, wait ... . One second! -PILAR: We threw you off with the hype. We were just so excited! It's just been really cool and really great to be here. Amazing speakers. Amazing audience. -STEFAN: Yes, sorry. This was the classic having a bug in the ear. I just heard the update from the next act will start on time, which is in seven-ish minutes from now. -JESKE: Perfect. -STEFAN: At 12 Europe time, UTC11. My head doesn't work any more! -JESKE: Where can people find the next act as well? -STEFAN: So, for you out there watching the stream, stay tuned, we will have the same break as before, which means a couple of minutes nothing, and then the artist Dibs with immersive Afro-beats will start. This is one of the good occasions to stand up and dance. -JESKE: I know I will. -PILAR: I didn't think of that. 
That's a great break activity as well, because we probably are all sitting at our computers at home, so, yes, maybe you cannot, like it won't annoy your surrounding people if you blast the conference for a couple of minutes while we have our break. -STEFAN: If you have daytime, stomping helps me to calm all the caffeine down, because, yes. -JESKE: I think having a festival feeling, right? In Amsterdam, the sun is shining, so it's a little bit like picking up again. -PILAR: Absolutely. Get your friends in on it. Get your family in on it. If I start stomping, my dogs will get the zoomies. The mail man came in while we were streaming, and we were all saved from the barks there! It's been really cool to have the live Q&A as well. Thank you so much to our speakers for being here, for providing amazing talks, like, oh. I couldn't even pick my favourite one. They're all so amazing, and there are more amazing talks coming up too. -JESKE: It's really nice they're active in the chat, and indeed online, and a few of them are here live in our "studio" so thanks for that as well, and I'm looking forward to what is coming after the break. But, I think we will give everybody a few minutes to get a coffee, or get their dancing outfit on, maybe. -STEFAN: And eat something, if you have some warm food, also it's nice. And then, after this, we will commence at 1250 local time. There will be a 20-minute break between the performance and the next talk. I think that's all. -PILAR: The replays are working again, snake game is working again unless you managed to crash it. -JESKE: Enjoy the dancing, and I will see you in 55 minutes. -PILAR: See each other then. Have a great break. Enjoy our artists. -STEFAN: See you soon! -PILAR: See you! [Break]. [Music]. -JESKE: Hello. -STEFAN: Hello. -PILAR: Welcome back, everyone. How - let us know in the chat what you felt about the amazing artist break we had. I hope you all had a - -JESKE: I did a little dance! -STEFAN: More than one! 
-PILAR: I couldn't help it, and it did give my dogs zoomies, but that's just part of the energy. -JESKE: Thanks for the for the music, and the nice introductions, and looking forward to a lot of nice content again. For the purposes of not running too late, I would like to dive immediately into the intro. -PILAR: We will leave you to it. -STEFAN: See you in the chat room for the talk. See you later. -JESKE: Rust as a foundation in a polyglot environment. Please join the chat room, ask questions over there, so the following speaker will be Gavin and Matthijs, with and they will be having this talk live. I will hand over to Gavin after the short introduction, but please also ask questions in the chat in the meantime, and will he will save them for after the talk of Gavin, and Matthijs will join to answer those questions. -BARD: Gavin and Matthijs show how one might, a large project in Rust rewrite. Start out small, let it grow, until stealing the show from whatever was there before, right? -GAVIN: That's an excellent introduction. I'm Gavin Mendel-Gleason, the CTO for TerminusDB, and I wanted to talk a little bit today about Rust as a foundation in a polyglot environment. First, I'm going to give a little outline of the talk with the motivation, challenges and solution. - and why we used Rust as a foundation in our environment. First, you have to know about our problem, we are an in-memory, revision control graph database, and we have slightly unusual features which has driven some of the tool chain requirements we have. Our software is a polyglot house, so we have clients written in JavaScript and in Python. We have Rust, and we have prolog which is somewhat unusual in the modern day. There is also C involved there as well. Some of the unusual features that drive our design requirements, so we're an in-memory database which enables faster query. 
It's also simpler to implement, and I have some experience in implementing on ACID databases and so I know a lot about the difficulties that you can encounter when trying to page in. We chose this time to leave it in memory for the simplicity of design and performance. We are, however, also ACID, so we use backing store. We actually write everything to disk, but we leave things in memory. We also use succinct data structures which approach the information theoretic minimum size whilst allowing query in the data structure. This allows us to get large graphs in memory simultaneously, but this requires a lot of bit-twiddling. They're relatively complicated data structures, and they're compact but not so transparent to the developer, so you really need to be able to do effective bit-twiddling, which, of course, is Rust comes in. We have a bunch of git-like features like revision control, push, pull, clone, and all of the things that you know from git. We do those on databases. So that also drives a lot of our requirements. We have a data log query engine, and we also have complex schema constraint management. So, first, why did we look into Rust in the first place? So, we were not initially a Rust house. We didn't have any Rust in our development at all. I didn't come from a Rust background and although I have a lot of experience in different programming languages, Rust was not one of those programming languages. Our earlier prototype is actually in Java. It was hard to write, and it had mediocre performance, and so I started prototyping in prolog. Because prolog was very logical, especially the schema-checking parts of it, it was extremely fast for us to write it in prolog, but however it had poor performance, so obviously it is not the best for bit-twiddling. Later, we moved to Library and C++ called HDT, and we used that as our storage layer which radically improved the performance of the application. 
However, we had a lot of trouble with this, and it was a persistent source of pain, so C++ was crashing regularly, and this is partly because we needed - we had requirements that we had to be multithreaded for performance reasons, because we were dealing with very, very large databases in the billions of nodes, and the code was not re-entrant, although it was supposed to be written with the intent of being re- entrant but it wasn't in practice and this would come up with the server crashed. It was really, really hard to find the source of these crashes, and that was a persistent source of problems for us. So then there was a secondary problem which is that HDT was not designed for write-transactions. It was really designed for datasets and not databases so we were using orchestration logic on top of it where we would journal transactions and stuff like that. It wasn't designed that way. So we had feelings about what the interface should be for a library, HDT wasn't it, and it also had these crashing problems, and we were finding it hard to find the source of them. Matthijs off his own bat went out and wrote a prototype in Rust of the succinct data structures that we needed to replace HDT and like a simple library around it, and it looked really very promising. I had heard of Rust, but I had not written anything in Rust. This drove me to take a look at Rust. I know a lot of languages have learned Kam, C++, Haskell, prolog, Lisp, I've been through the gamut of all of these, and I don't try to learn a new language unless there is something peculiar that drives it as something you might need in your tool kit. Rust had this kind of incredible aspect to it which is this ability to avoid memory problems whilst still being extremely low-level programming language. So thread safety was one of our major headaches. We were getting seg faults and we were finding it difficult to time-consuming to sort it out. 
This library was exhibiting none of these problems, and this was really promising. We decided we were just going to take the plunge and rewrite the foundations of our system in Rust. So, it also gave us the chance to re-engineer our data structure, simplify code, improve fitness for purpose, change the low-level primitives, and cater to write-transactions in particular, but also enabled us to do some performance enhancements that we would like to have done but were afraid to do because in C++ there is kind of a fear factor where, if you had anything new, you might add something that causes it to crash. So, of course, in terms of challenges, I'm sure everyone in the Rust community knows about challenges of FFI, but I don't want to belabour the point. We had - we had a comfortable interaction with C stack, and this is annoying, because if we're interfacing with Rust, we're actually interfacing it through a C FFI, and that kills some of the nice guarantees you get from Rust, but at least they're isolated to the interaction surface rather than completely. So, we also ended up trampolining through a light Cshim which is not the best approach. We are evaluating a more direct approach currently. I didn't want to tell everybody we've done it right, we've done some things right, but we can improve a lot here. Now, what we would really like, though, is a Rust prolog because then we could have a nice clean Rust FFI, and everything would be beautiful and perfect. There's some progress being made on Scryer prolog which has cool features that you should look at if you're interested in a Rust prolog project. Then some of the challenges that we ran into, I would like to go through really quickly. 
So we initially expected to write a lot more of the product in Rust, so we started off replacing the HDT layer, and then we expected to write a lot more from the ground up, so it's essentially like we had this building, we went in, we replaced the foundations, and then we were going to start replacing the walls, so, unfortunately, developer-time constraints has favoured a different approach for us, so we're doing rapid prototyping in prolog. We essentially rewrite the kind of feature that we are interested in there, and then instead of just immediately going to Rust from there, we actually wait, so, we're much more selective about what we put into Rust than we had initially imagined. Partly this is due to the learning curve of thorough checking semantics meaning there is a difficulty in getting our developers to understand how this stuff works, so that takes some time. And there is a higher front cost here, and you win it back, and, if you're replacing C++, you win it back very quickly. You win it back very quickly because seeking out those bugs dominates in terms of time, so that upfront learning cost is nothing compared to the cost of some horrible seg fault that you can't find. But, if you're replacing prolog, the sort of amortized costs are more important, so you have to worry about where you replace it, and you have to be more careful about that. Once you've gotten the knack of the checker, things go a lot faster but they're still writer than writing prolog, because it's a lower-level language which is why we use it, but it's also why we don't always use it. So, our solution has been a late optimisation approach, and the way that we do this is we developed the low-level primitives in Rust for our low-level storage layer, and then we designed the orchestration of these in prolog. 
When we find a performance bottleneck, we think about how to press that orchestration, or what unit of that orchestration, to press down, and try to find good sort of boundaries, module boundaries, essentially, so that we can press it down into Rust to improve performance. We have really been performance-driven on this, so the things that get pressed into Rust are those things that need performance enhancements. So we started with this storage layer in Rust and have extended this to several, like operations that have proved to be slow when they were in prolog and needed to be faster. These include things like, you know, patch application, and squash operations, things of that nature. So these are larger orchestrated - they're not as low-level, so they have logic in them. We also have done some bulk operations that, for instance, in csc loading has been written completely in Rust as well, because, if you have hundreds of thousands of rows in your csv, we get a ten- to 20-times speed-up going from prolog to Rust using the same algorithm because there's some kind of constant time that you can imagine expanding out, but the cost of these operations, and for hundreds of thousands of lines, that becomes a really significant time sink, so csv load has now been moved completely into Rust and we imagine large-scale bulk operations will all have to be moved into Rust eventually. So, the - so there are some features that we know we're going to add directly to the Rust library, so we have specific feature enhancements that we are never going to even bother trying to do in prolog. They generally have to do with low-level manipulation. It would be silly to write them. There's no point in prototyping them even there. 
However, there are a lot of features that we expect will end up in Rust as we move forward. It's going to be a slow replacement strategy, and it's not clear that we will ever replace all of the Prolog, although we may. Even in a future where this product is well developed, ten years from now, and very solid, we can imagine that some of the schema checking, et cetera, will still be done in Prolog, even though it will perhaps be Prolog embedded in Rust, or using Scryer Prolog or something along those lines.

One of the things we ran into, though, was an unexpected bonus. We kind of knew this was here, but we are amazingly impressed with it. This is the unexpected bonus round: we got data parallelism from switching to Rust at a very low cost, using Rayon, and it really blew our minds. We had things we hardly changed at all. We had the logic written there, we used these magic incantations, into_par_iter and friends, and everything is way, way faster, and we didn't have to think about it the hard way. I love that, because I'm lazy! Anything that can reduce the amount of time we spend writing things while also improving performance is a huge win. I can't impress upon people enough how awesome this is, and how much we need other people to start using it.

So, the borrow-checker: there is a cost, but huge benefits come from it - not just safety, but also potentially speed. If you're interested in an open-source solution, you should give TerminusDB a try. And that's it!

**Jeske:** Yes, thank you so much for the talk. That was really interesting.

**Gavin:** Thank you. Let me check the chat. I don't think there are open questions yet.

**Jeske:** I have a question. Do you always build in release mode, or is debug mode also good enough?

**Matthijs:** No, debug is definitely not fast enough. Well, I mean, it is fast enough when we're just testing things out, and it's great sometimes to be able to use a debugger, but for actual general use, also when we are developing and not working on the low-level library itself, we definitely always build in release mode. There is a tremendous speed-up between them.

**Jeske:** Cool. Thank you so much. I see a lot of clapping hands in the chat right now. Thank you for joining in. Matthijs, is there a last thing that you would like to add, because we have a few minutes still left?

**Matthijs:** Wow, no. [Laughter]. I don't know if I could add anything to that!

**Gavin:** People should try Rayon, that is definitely one thing.

**Matthijs:** Rayon was a great thing to try. We were scared to try it - data parallelism, scary - but it's literally just replacing a few calls, and it just works. We got so much speed out of it. Rust's ecosystem is just amazing. We love it.

**Jeske:** It is a warm community, I have to say, also.

**Matthijs:** It's really great. It's a good community.

**Jeske:** I see a question coming in. Do you have any idea what hinders productivity in Rust besides the borrow-checker?

**Gavin:** Well, types just introduce extra overhead. In Prolog, you don't have to worry about garbage collection or how you allocate things. There are just fewer things to worry about. It costs you later in terms of performance, but it's really helpful in terms of developer time, and for lots of things it doesn't matter what the constant-factor cost is, because it's just glue. Most software is just glue code, and, if you're just writing glue, you don't want to be worried about lots of details, I think.

**Matthijs:** There is another thing here, comparing with Prolog. In Prolog, you have a running instance, and you can do live recompilation of parts of that program, so there is a very short loop between writing your code and seeing it in action. With Rust, you have to compile, and then you can run the unit tests. It's not a big thing, but it is a thing. Having that kind of REPL experience really does help development.

**Jeske:** Thank you. There are some questions popping up about use cases: what are applications of TerminusDB at the moment? Can you elaborate a little bit on that?

**Gavin:** It's things like machine learning, where you need to have revision control of your data sets, and any kind of large-scale graph manipulation where you want to keep revisions and be able to pipeline your data - that's where you would use it. We scale up to quite large graphs, so you would be able to stick something large in there if you would like.

**Jeske:** I think we are running out of time. Will you both be active in the chat to help out? I see, Matthijs, you're in the chat already.

**Matthijs:** Yes.

**Jeske:** We had some technical difficulties, as one does with this online experience - it's been a kind of fun experience too, I have to say. I want to thank you both so much for your time and the interesting presentation, and please do check out the chat. The next speaker will start in eight minutes, so for the people watching the live streams, please stick around for that. We will be back in eight minutes. Thank you so much, again, Gavin and Matthijs.

**Gavin:** Thanks for having us.

**Jeske:** See you in the chat.

**Matthijs:** Thank you for having us. I'm looking forward to the rest of the talks.

**Jeske:** Ciao!

**Matthijs:** Bye-bye!
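Matthijs says the Rayon change is "literally just replacing a few calls". As a hedged sketch of what that looks like - not TerminusDB's actual code, and the numeric workload is made up - with Rayon the serial call below would become `v.par_iter().map(|x| x * x).sum()` after `use rayon::prelude::*;`. To stay dependency-free, the "parallel" version here spells out the chunk-and-join work that Rayon hides behind that one-call change:

```rust
// Sketch only: serial vs. hand-rolled parallel sum of squares.
// Rayon collapses the parallel version into a single `par_iter()` swap.

fn sum_of_squares_serial(v: &[u64]) -> u64 {
    v.iter().map(|x| x * x).sum()
}

fn sum_of_squares_parallel(v: &[u64]) -> u64 {
    let n_threads = 4;
    // Split the slice into roughly equal chunks, one per thread.
    let chunk_len = (v.len() + n_threads - 1) / n_threads;
    std::thread::scope(|s| {
        let handles: Vec<_> = v
            .chunks(chunk_len.max(1))
            .map(|c| s.spawn(move || c.iter().map(|x| x * x).sum::<u64>()))
            .collect();
        // Join all workers and combine their partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let v: Vec<u64> = (1..=1000).collect();
    assert_eq!(sum_of_squares_serial(&v), sum_of_squares_parallel(&v));
    println!("sum of squares: {}", sum_of_squares_serial(&v));
}
```

The point of the speakers' remark is precisely that you never write the scoped-thread boilerplate yourself: Rayon's work-stealing scheduler does the chunking and joining for you.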
\ No newline at end of file diff --git a/2020-global/talks/02_UTC/05-Anastasia-Opara.md b/2020-global/talks/02_UTC/05-Anastasia-Opara.md new file mode 100644 index 0000000..90e3903 --- /dev/null +++ b/2020-global/talks/02_UTC/05-Anastasia-Opara.md @@ -0,0 +1,210 @@ +**Rust for Artists, Art for Rustaceans**
+
+**Bard:**
+Anastasia plays Rust like a flute
+or maybe a magical lute
+to then simulate
+things that art may create
+and this art does really compute
+
+**Anastasia:**
+Hi, and welcome to Rust for Artists, Art for Rustaceans.
+
+This talk suffers from a bit of a multiple personality disorder: it talks about similarities between art, programming, and science, while at the same time showing you complicated drawing algorithms, and it also has practical tips.
+If you're only interested in the Rust and algorithm parts, go get yourself a coffee and come back in about ten minutes, since we are going to go through the meta part first.
+Without further ado, let's start with introductions.
+
+My name is Anastasia Opara, and I'm a procedural artist at Embark Studios, a Stockholm-based game development company.
+You might be wondering: what is a procedural artist? Procedural art's distinguishing feature is that it is executed by a computer following a set of procedures designed in code - that's why it is procedural.
+As a procedural artist at Embark, I spend most of my time meditating on workflows in games from player and developer perspectives.
+There is head scratching and even more head banging on the wall, which is fun, exhausting, and sometimes outright terrifying, but never boring.
+
+To give you an example, I would like to show you two recent projects we did at Embark.
+These will be tl;dr versions, and we have separate talks about both of them if you want to learn more.
+
+The first one is texture synthesis, where I got introduced to Rust.
+It's an example-based algorithm for image generation, meaning you can give it an example image and it will generate more similar-looking images.
+You can also do things like guided generation, style transfer, filling in missing content, and simple geometry generation.
+
+The second project is called Kittiwake, which is a game-like environment where we explore a feeling of co-creation with an example-based algorithm embodied in this little creature.
+You create a small scene, like a dialogue, and Kitti tries to mimic the way you create by analysing how you place things and using it as an example.
+One of the key similarities between these projects is that both of them heavily rely on performing some kind of search.
+In both of them, given an example - an object arrangement in Kittiwake, or pixels in texture synthesis - we search for a new configuration that looks perceptually similar to the example while not being a copy.
+You can think of the search process in a simplified way: we try a bunch of configurations and present you with the most promising one.
+From the user perspective, it can happen so fast that it might not seem like a search.
+
+For example, in texture synthesis, you can see pixels appearing from possible neighbourhoods in the example.
+In example-based placement, you can see objects magically moving around until they settle into positions that satisfy the example.
+Even though the notion of search in these projects is arguably purely algorithmic, it's closer to a genuine art process than we might initially think.
+It is easy to perceive the final artwork as the outcome of a linear, pre-calculated path, as if an artist just sits down and does art.
+
+However, if we dig deeper, we will discover there is always an underlying network of trial and error.
+For example, Picasso's famous painting Les Demoiselles d'Avignon is the result of hundreds of sketches.
+You can see how earlier sketches were in a different style than the final work.
+We can argue that of course this work required a lot of exploration because of how stylised it is, and therefore the search was purely about finding the stylisation and reimagining what we see into something completely different.
+However, even when painting from reference, with the goal of copying reality, it is never a passive observation but an active interpretation and engineering of visual forms, which together construct a representation.
+
+For example, if we look at digital photo studies, we can see a process of searching for textures, colours, and forms that conjure a similar perceptual response to the target photograph.
+It is a dynamic problem-solving of simplifying the object of depiction while keeping its perceptual essence.
+Any painting, if you look close enough, is just an amorphous jumble of brushstrokes, but they magically come together and make you believe that what they represent is real.
+That, I believe, is what any art is: a search for a representation that conveys a target experience, and the human ability to comprehend similarity between a representation and the thing it aims to represent is an astonishing example of the abstract thinking that comes to us so naturally.
+And through the lens of representations, art can invite us to perceive the same object in different ways.
+
+Let's consider this sheep, for example.
+As artists, we might emphasise the way the wool curls repeat the pattern that trees make when they sway - or we might adopt a different perspective,
+and explore the sheep not the way we see it, but invite the viewer to experience the concept of a sheep through its hoof marks as it walks on the canvas.
+Both works aim to capture the sheep, but the outcomes, or representations, differ drastically.
+Art is not alone in its pursuit to construct things into representations.
+
+In programming, we are often faced with the challenge of translating the language of our thoughts into the language of implementation.
+If we were asked to represent the sheep in code, we might adopt an inheritance pattern of thinking and inherit it from an animal class, or we might say that animal is just one of its traits, and there are other traits we are interested in, such as adorable or fluffy.
+
+In mathematics, we can transform and map it into a new space.
+Even when dealing with data, we are faced with a choice of a model, a representation, that will explain it.
+In the end, these are all representations.
+They don't change the way the sheep is, but they change the way we think about it.
+And quite often, there is no one representation that just works, and it becomes an iterative search for the pieces needed to design a new representation.
+The pieces might be different, but the process of search in art, programming, and science is similar.
+That's why art, programming, and science are actually much closer to each other than we usually portray them.
+
+If science helps us to reason about the external world and deconstruct a problem into a processing flow, I think art is about looking inward: a self-introspection and observation of one's perceptions.
+Art is not just about the recreation of reality, even if you do choose to make it your focus; it's an invitation to co-experience something from your perspective, something that used to be bodiless until you invented a representation for it, and from that perspective, I genuinely think anyone can be an artist.
+
+Today, there are plenty of art media to choose from, and code is a particularly fascinating one, as it invites us to convey not just the final destination of the art process, but the author's workflow, the search itself.
+It invites us artists to reverse-engineer our own thought process and deconstruct it into an algorithmic form,
+pushing the process into the background and making the experience of the search THE main art piece.
+And that is what computational drawing aims to capture.
+
+Computational drawing is my hobby project, designed to imitate traditional drawing from reference by searching for a deconstruction of the target into discrete brushstrokes, inviting the viewer to experience the becoming of the work from its rough stages to its final details.
+And just like many paintings are not a faithful representation of reality, computational drawing is not meant to be a recreation of an artist.
+It was originally inspired by the many implementations of genetic algorithms available on the internet.
+
+A genetic algorithm is a search algorithm loosely inspired by natural selection.
+But what was more interesting, in my opinion, was those projects' objective: to represent a target image within a budget of, say, 50 polygons, or a few hundred for something like the Eiffel Tower.
+
+It was the summer of 2017, I had just finished my studies, and I had no idea about search algorithms.
+Seeing those genetic algorithm experiments really intrigued me.
+The process, as brute force as it was, reminded me of my own experience during life-drawing classes: having to translate what I see into a set of discrete motions with a pencil or a charcoal.
+With that personal experience combined with discovering an algorithmic representation I could use, I just had to try it out.
+In the end, I modified the search quite a bit, which actually made it redundant to frame it as a genetic algorithm, and I will touch upon that later in the presentation.
+So, this is a result from 2017.
+
+At the time I was just learning Python, so this was written in Python.
+And it was very slow to calculate: a couple of hours, almost up to a day, for one image.
+It was ugly, and I never really showed it to anyone except a couple of friends.
+I thought it was unsophisticated, not worth sharing, so it just collected virtual dust on my hard drive until the summer of 2020 when, thanks to Covid, there was a lot of free time, and I went through my old hard drive and rediscovered it.
+And the reception was beyond my expectations.
+Which motivated me to clean it up a bit and open source it.
+And it was super rewarding to see people getting inspired and trying out their own versions.
+
+While cleaning up the Python code, I started having a lot of new ideas, coming from the experience I had accumulated.
+So I decided to start anew and, of course, for my sanity, I rewrote the whole thing in Rust, and, yes, we are finally getting to the part where I talk about Rust! How does the algorithm work? Let's imagine we can draw a single brushstroke, which is parameterised by its position, scale, rotation, and value.
+There are functions for rotation as well as scale, and to change the value, we can simply access the pixel data.
+
+Now imagine we drew this brushstroke on a canvas; just like we thought of scale and rotation as parameters of the brush, we can think of the brush as a parameter of the canvas.
+Representing our brush configuration as one dimension is just a mental shortcut - in fact, that one dimension encapsulates five: scale, rotation, value, and the x and y position.
+Now suppose we added a second brush, extending our brush space to be two-dimensional; this new space encodes both brushes, and thus the appearance our canvas would have.
+So far, it might seem like quite a redundant transformation.
+Cool - but it gets more conceptually interesting as we add more brushes.
+It just becomes messier to visualise.
+
+Let's imagine this 2D space is a space defined by 100 brushes, and a dot in the space represents a particular canvas appearance defined by how our 100 brushes are configured.
+To move in the space, all we have to do is change our canvas.
+If we just take a stroll and aimlessly wander around in the space, we might discover that with just 100 brushstrokes we can depict a lot of interesting stuff, but also a lot of random stuff.
+In fact, the proportion of interesting stuff to random stuff is insanely low.
+It is very unlikely that we will just stumble upon a good painting.
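The five parameters per brushstroke described above can be sketched as a plain struct (hypothetical names, not the project's actual types); a canvas of 100 brushes is then a point in a 500-dimensional search space:

```rust
// Hypothetical sketch: one brushstroke is five numbers, so a canvas of
// N brushes is a point in a 5*N-dimensional search space.
#[derive(Clone, Copy, Debug)]
struct Brush {
    x: f32,        // position on the canvas
    y: f32,
    scale: f32,    // size of the stroke
    rotation: f32, // orientation
    value: f32,    // greyscale value
}

const PARAMS_PER_BRUSH: usize = 5;

/// A canvas appearance is fully determined by its brush configuration.
struct Canvas {
    brushes: Vec<Brush>,
}

impl Canvas {
    /// Dimensionality of the search space this canvas lives in.
    fn dimensions(&self) -> usize {
        self.brushes.len() * PARAMS_PER_BRUSH
    }
}

fn main() {
    let brush = Brush { x: 0.0, y: 0.0, scale: 1.0, rotation: 0.0, value: 0.5 };
    let canvas = Canvas { brushes: vec![brush; 100] };
    // 100 brushes -> moving the canvas means moving a point in 500 dimensions.
    println!("search space dimensions: {}", canvas.dimensions());
}
```

"Moving in the space" is then nothing more than changing one of these numbers and observing how the rendered canvas changes.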
+So, the question becomes: how do we steer the search towards something interesting? We provide a target that guides our search towards a region of the space containing similar images, and the way we can define similarity is simply the difference between the pixels of the target and the pixels of the drawn image.
+So, if, after mutating a brushstroke, our pixel difference is smaller, that means we are moving towards a region with more similar images and we should keep the mutation; then we just do it again, again, and again.
+Here, you can see the beginning of a search as brushes move around, trying to position themselves in such a way that the canvas looks more like the target, and if we continue this guided search long enough, we will eventually reach a canvas configuration whose brushstroke arrangement looks similar to the target.
+In general, that is pretty much it.
+
+There are many ways one might implement this search.
+It can be a genetic algorithm, gradient descent, simulated annealing.
+I will show you how I approached it by taking my art education and incorporating it into the search.
+I was greatly inspired by my fine-art classes: especially when doing oil still lifes, we were never taught to solve all the detail frequencies at once.
+Most of the time, you would get yourself into a corner you can't solve, so you deconstruct the object of depiction into big shapes first.
+Only once you've got them do you go into details.
+
+I wanted to incorporate the same kind of wisdom into the way my algorithm does the search.
+Therefore, I broke it down into multiple stages, and each stage only solves a particular level of detail, starting with very big brushstrokes that force the search to generalise the shape, and then adding more details.
+Each of those stages is a completely separate search, so when the first stage is done, the second stage just draws on top of what has been drawn before.
+Here, you can see the search process happening for different stages.
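The loop just described - mutate a brush, keep the mutation only if the pixel difference to the target shrinks - is essentially hill climbing. A minimal, hypothetical sketch (not the project's code): a flat `Vec<f32>` of "pixels" stands in for the rendered canvas, and a tiny inline RNG keeps it free of external crates:

```rust
// Hypothetical sketch of the search loop: mutate one value, keep the
// mutation only if the difference to the target image shrinks.

fn difference(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Tiny deterministic RNG so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_f32(&mut self) -> f32 {
        // Constants from Knuth's MMIX linear congruential generator.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 40) as f32 / (1u64 << 24) as f32 // uniform in [0, 1)
    }
}

fn hill_climb(target: &[f32], steps: usize) -> Vec<f32> {
    let mut rng = Lcg(42);
    let mut canvas = vec![0.5f32; target.len()];
    let mut best = difference(&canvas, target);
    for _ in 0..steps {
        // Mutate one randomly chosen "brush parameter".
        let i = (rng.next_f32() * canvas.len() as f32) as usize % canvas.len();
        let old = canvas[i];
        canvas[i] = rng.next_f32();
        let d = difference(&canvas, target);
        if d < best {
            best = d; // closer to the target: keep the mutation
        } else {
            canvas[i] = old; // worse: revert and try again
        }
    }
    canvas
}

fn main() {
    let target: Vec<f32> = (0..16).map(|i| i as f32 / 15.0).collect();
    let start = difference(&vec![0.5f32; 16], &target);
    let result = hill_climb(&target, 10_000);
    println!("difference: {:.3} -> {:.3}", start, difference(&result, &target));
}
```

The real project layers the staged brush sizes, the edge distance, and the sampling mask on top of this basic accept-or-revert loop.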
+And when you see a sudden jump, that's when new brushes are added and the algorithm is using them to better approximate the target.
+During every stage, the algorithm needs to place 100 brushes, and it has to do so in 10,000 search steps.
+10,000 steps might seem like a lot, but if you need to place 100 brushes, that is 500 parameters, and remember, the algorithm cannot remove them.
+It is forced to place strokes even if they are not perfect.
+And one of the reasons to limit the step number is to encourage happy accidents, mistakes, and imperfections.
+
+If something goes bad, it can be fixed in later stages, giving a perception of continuous problem-solving in the brush layout itself.
+As brushstrokes become smaller, I use a sampling mask to guide the brush placement towards places of higher frequencies.
+I do so to preserve the loose brushwork while giving a perception of deliberate intent.
+
+We're not just splattering a uniform brush texture; there is a specific thought process manifested in the brushes' non-uniform sizes and the way they visually interact with each other.
+There is an expression, "Don't overbake the painting", meaning don't overwork it and kill the playfulness.
+That's what I'm trying to avoid by having the sampling mask.
+You may notice the sampling mask is generated based on edges, and edges play a very important role in drawing.
+
+If you have taken drawing classes, you probably recognise this.
+Drawings like these are used as homework to copy and learn from, teaching how to deconstruct a 3D shape into simplified contours.
+As humans, we are sensitive to sharp transitions between darker and lighter elements, and a small deviation can make something look wrong.
+Therefore, one of the new additions to the Rust version was using contours to guide the search.
+This is done by comparing the edges of the target versus the drawn image, and computing their distance.
+
+Here's a comparison of using versus not using edges to guide the search.
+Notice how much better defined the face is, and how it looks perceptually closer to the original.
+It's subtle, but I think it gives an extra push towards believability - that there is an artist's thought process behind each brush placement.
+Since we need to perform this comparison every search iteration, it needs to be fast.
+
+Here are some comparisons of how long different edge-detection implementations available in Rust crates take.
+In the end, I went with a custom implementation that uses chunks to make it parallel.
+And here is a time and quality comparison for Canny versus Sobel versus no edge detection.
+Edge detection takes a huge bulk of the generation time: going from five minutes without it to 20 minutes, and even to 1.5 hours with Canny.
+The facial features are captured so much more precisely with Canny or Sobel.
+Canny gives a crisper result, but comes at a four-times slower generation cost.
+In case you are interested in how the parallel Sobel is done, here is the code.
+Feel free to pause when this goes online.
+
+Another way I'm using edges is to drive the brushstrokes' orientation.
+Brushstrokes follow along the edges and don't cross them, because that would violate the perceptual contour border.
+To guide the brushstrokes, I generate an image-gradient field.
+The stronger the direction, the more influence it has on a brushstroke that might be placed in that region.
+Computational drawing is still very much work in progress, and I hope to open source it once it's done. +One thing I still haven't gotten to is figuring out a good strategy for searching for a colour solution. +That is still on my to-do list. + +Right now, I'm directly targets from the target image. +At the moment been the algorithm is running on CPU. +The code is parallelised, and drawing on CPU is quite slow, and it just so happens, recently Embark has announced Rust on GPU project. +I'm really looking forward to its development, so please go and contribute so I can do the paintings on the GPU! + +We are reaching the end of the presentation now, so let's summarise: +we have talked about art as a search process for new representations and how representations can invite us to view the same object from different perspectives. +Art, science, and programming are similar in that regard. + +We discussed how code as an art medium invites us to convey our search process in an algorithmic form, making it the main art piece, +and how computational drawing tries to capture that search in the context of traditional drawing from reference. + +And lastly, we have covered the algorithm details as well as how we can use our artistic intent to guide the search by translating it into code. +And if you're an artist, and you are interesting getting into Rust, I really recommend that you stop considering and just do it. + +First of all, the ecosystem is great, the package management is heavenly, and if you just want to get started ASAP, learning about ownership is all you really need, which is literally like reading the first four chapters of the Rust Book. +It's very rare I find myself in the need of advanced features when doing this kind of art tools. + +I also can't recommend enough getting Rust Analyzer, it will show you types, tips, it is absolutely amazing. 
+When I prototype, I often write very messy code, and it's a breath of fresh air to have the language guarding me against stupid mistakes, +and Clippy shout at me for making a variable and never using it! Having that confidence that if my code compiles, it works. +that really frees my brain to focus exclusively on the algorithm design and logic flow, and I don't envision myself prototyping in any other language now. + +Before we wrap up, a quick thought out to Thomas and Maik for dealing with my Rust programming on at that daily basis. +A lot of things I learned about Rust, I learned from them, and I would like to thank Embark giving me time to discuss this report. +I would like to thank you for listening. +I hope it was useful, and you learn something new, and, if you have any questions, write a comment if you're watching it off line, or post a message in the chat if you're watching it live. + +Thank you. +Have an awesome remainder of RustFest. diff --git a/2020-global/talks/02_UTC/05-Anastasia-Opara.txt b/2020-global/talks/02_UTC/05-Anastasia-Opara.txt deleted file mode 100644 index d583d54..0000000 --- a/2020-global/talks/02_UTC/05-Anastasia-Opara.txt +++ /dev/null @@ -1,5 +0,0 @@ -Rust for Artists, Art for Rustaceans - Anastasia Opara -PILAR: Welcome back, everyone. I hope you've all been having a great day. I'm trying to not be over hyped! I'm going to wear myself out! The next talk is again something very near and dear to my heart. You might sense how excited we all are for all these talks. Our next speaker is Anastasia Opara. Yes, and, so to preface this talk, if there is anything cool that has come from tech, it is that we can express ourselves creatively through it, and that is what Anastasia Opara is here for. She comes from a family of artists, continuing the family tradition by swapping out a paintbrush for code. She is a procedural artist at Embark, and her passion is to enable people to blur the lines between algorithmic language and art language. 
So, it's going to be a fantastic talk. I'm going hand it over to our Bard, and I hope you enjoy. -BARD: Anastasia plays Rust like a flute, or magical lute, to simulate things that art may create, and art really does compute! -ANASTASIA: Hi, and reck to Rust for artists, art for Rustaceans. This talk covers from multiple personality disorder like talk about similarities between art, programming, and science, and at the same time showing you complicated drawing algorithms and also has practical tips. If you're only interested in the Rust and algorithm parts, go get yourself a coffee and come back in about ten minutes, since we are going to go through the meta part first. Without further ado, let's start with introductions. My name is Anastasia Opara, and I'm a procedural artist at Embark Studios, a Stockholm-based game development company. You might be wondering what is a procedural artist? Procedural arts' distinguishing feature is that it is executed by a computer following a set of design procedures designed in code - that's why it is procedural. As a procedural artist at Embark, I spend most of my time meditating on workflow in games from player and developer perspectives. There is head scratching and more head banging on the wall, which is fun, exhausting, and sometimes outright terrifying, but never boring. To give you an example, I would like to show you two of the recent projects we did at Embark. It will be tl;dr versions, and we have separate talks about both of them if you want to learn more. The first one is texture synthesis where I got introduced to Rust. It's an example-based algorithm for image generation, meaning you can give it an example image and it will generate more similar-looking images. You can also do things like guided generations, style transfer, fill in missing content and simple geometry generation. 
The second project is called Kittiwake which is a game-like environment where we explore a feeling of co-creation with an example-based algorithm which is embodied into this little creature. You create a small screen, like a dialogue, and Kittis tries to minimise the way you create by analysing how you place things and using it as an example. One of the key similarities between these projects is that both of them heavily rely on performing some kind of search. Both of them give an example like object arrangement in Kittiwake, or pixel image synthesis, we search for a new configuration that looks perceptually similar to the example while not being a copy. You can think of a search process in a simplified way, that is we try a bunch of configurations, we present you with the most promising one. From the user perspective, it can happen so fast that is might not seem like a search. For example, in texture synthesis, you can see pixels appearing from possible neighbourhoods in the example. In the example-based placement you can see objects magically moving around until they settle into the positions that satisfy the example. Even though the notion of search in this project can be argued purely algorithmical, it's closer to a genuine art process than we might initially think. It is easy to perceive the final artwork as the outcome of a linear pre-calculated path, like an artist just sits down and does art. However, if we dig deeper, we will discover there is always an underlying network of trial and error. For example, Picasso's famous painting The Ladies of Avignon is the result of hundreds of sketches. You can see how earlier sketches were in a different style than the final work. We can argue that of course this worked required a lot of exploration because of how stylised it is, therefore the search was purely about finding the stylisation and re imagining what we see into something completely different. 
However, even when painting from reference, with a goal of copying reality, it is never a passive observation but active interpretation, and engineering of usual forms, which together construct a presentation. For example, if we look at digital photo studies, we can see a process of searching for textures, colours, forms, that conjure a similar perceptual response to the target photograph. It is a dynamic problem-solving of simplifying the object of depiction while keeping the perceptual essence. Any painting if you look close enough is just an amorphous jumble of brushstrokes but they magically come together and make you believe what they present is real. That I believe is any part is a search for a presentation that conveys a target experience, and the human ability to comprehend similarity between a representation and a thing that it aims to represent is an astonishing example of abstract thinking that comes to us so naturally. And through the lens of representations, art can invite to us perceive the same object in different ways. Like let's consider this sheep, for example. As artists, we might emphasise the way the wool curls are repetition of a pattern that trees make when they sway, abandoning - or we might adopt a different perspective. And explore the sheep not the way we see it, but invite to experience the concept of a sheep through its hoof marks as it walks on the canvas. Both works aim to capture the sheep, but the outcomes or representations differ drastically. Art is not alone in its pursuits to construct things into representations. In programming, we are often faced with a challenge of translating the language of our thoughts into the language of implementation. If we were asked to represent the sheep in code, we might adopt an inheritance pattern of thinking and inherit it from an animal class, or might say that animal is just one of the traits and there are other traits we are interested in such as adorable or fluffy. 
In mathematics, we can transform and map it into a new space. Even if when dealing with data, we are faced with a choice of a model or a representation that will explain it. In the end, these are all representations. They don't change the way the sheep is, but they change the we think about it. And quite off, there is no-one representation that just works for a - and it becomes an iterative search for pieces needed to design a new presentation. The pieces we minute might be different but the pieces of search in art, programming, science, are similar. That's why arts, programming, and science are actually much closer to each other than we usually portray. If science helps us to reason about the external world and deconstruct a problem into processing flow, I think art is about looking inward. A self-introspection and observation of one's perceptions. Art is not just about recreation of reality, even if you do choose to make it your focus; it's an invitation to co-experience something from your perspective, something that used to be bodiless, but you invented a representation for it, and from that perspective, I genuinely think anyone can be an artist. Today, there are plenty of art media to choose from, and code is a particularly fascinating one as it invites to convey not just a final destination of the art process, but the author's workflow, the search itself. It invites us artists to reverse-engineer our own thought process and deconstruct it into an algorithmical form. Pushing back the process into the background and putting the experience of the search to be THE main art piece. And that is what computational drawing aims to capture. Computational drawing is my hobby project which is designed to imitate traditional drawing from reference by searching for a Deacon instruction of the target into discrete brushstrokes, inviting the experience to become of the work to its rough stages to final details. 
And just like many paintings are not a faithful representation of reality, so is computational drawing not meant to be a recreation of an artist. It was originally inspired by many implementations of genetic algorithms available on the internet. Genetic algorithm is a search algorithm loosely inspired by natural selection. But what was more interesting in my opinion is the way this project's objective, that is to represent a target image in a budgets of 50 Polygon slides, or hundreds like the Eiffel Tower. It was the summer of 2017, and I finished my user the, and I had no idea about searching algorithms. Seeing the genetic stuff really triggered me. The process, as brute force as it was, reminded me of my own experience during life-drawing classes, having to translate what I see into a set of discrete motions with a pencil or a charcoal. That personal experience combined with discovering an algorithmic representation I could use, I just had to try it out. In the end, I modified the search quite a bit which actually made it redundant to frame it as a genetic algorithm and I will touch upon it later in the presentation. So, this is a result from 2017. At the time I was just learning Python, so this was written in Python. And it was very slow to calculate, like a couple of hours almost to a day for one image. It was ugly, and I never really showed it to anyone except a couple of friends. I thought it was unsophisticated, not worth sharing, so it just kind of collected virtual dust in my hard drive, until this summer of 2020, when, thanks to Covid, there was a lot of free time, and I went through my old hard drive and rediscovered it. And the reception was beyond my expectations. Which motivated me to clean it up a bit and open source it. And it was super rewarding to see people getting inspired and trying out their own versions. And while cleaning up the Python code, I started having a lot of new ideas coming from the experience I accumulated. 
So I decided to start anew and, of course, for my sanity, I rewrote the whole thing in Rust, and, yes, we are finally getting to the part where I talk about Rust! How does the algorithm work? Let's imagine we can draw a single brushstroke, parameterised by its scale, rotation, and value. There are functions for rotation as well as scale, and to change value, we can simply access pixel data. So now imagine we drew this brushstroke on a canvas, and just like we thought of scale and rotation as parameters of the brush, we can think of the brush as a parameter of the canvas. Representing our brush configuration as one dimension is just a mental shortcut; in fact, that one dimension encapsulates five: scale, rotation, value, and position x and y. Now suppose we added a second brush, extending our brush space to be two-dimensional, and this new space encodes both brushes and thus the appearance our canvas would have. So far, it might seem like quite a redundant transformation, but it gets more conceptually interesting as we add more brushes. It becomes messier to visualise. Let's imagine this 2D space is a space defined by 100 brushes, and a dot in the space represents a particular canvas appearance defined by how our 100 brushes are configured. To move in the space, all we have to do is change a brush parameter on our canvas. If we just take a stroll and aimlessly wander around in the space, we might discover that with just 100 brushstrokes we can depict a lot of interesting stuff, but also a lot of random stuff. In fact, the proportion of interesting stuff to random stuff is insanely low. It is very unlikely that we will just stumble upon a good painting. So, the question becomes: how do we guide the search towards something interesting? We provide a target that guides our search to a space containing similar images, and the way we can define similarity is simply a difference between the pixels of the target and the pixels of the drawn image. 
So, if after mutating a brushstroke our pixel difference is smaller, that means we are moving towards a space with more similar images, and we should keep the mutation. And then we just do it again, again, and again. Here, you can see the beginning of a search as brushes move around, trying to position themselves in such a way that looks more like the target, and if we guide the search long enough, we will eventually reach a canvas configuration whose brushstroke arrangement looks similar to the target. In general, that is pretty much it. There are many ways one might implement this search: it can be a genetic algorithm, gradient descent, simulated annealing. I will show you how I approached it from my art education, incorporating it into the search. I was greatly inspired by fine-art classes; especially when doing oil still lifes, we were never taught to solve all the detail frequencies at once. Most of the time, you will get yourself into a corner you can't solve. So you deconstruct the object of depiction into big shapes first, and only once you've got them do you go into details. I wanted to incorporate the same kind of wisdom in the way my algorithm would do the search. Therefore, I broke it down into multiple stages, and each stage only solves a particular level of detail, starting with very big brushstrokes, forcing it to generalise the shape, and then applying more details. Each of those stages is a completely separate search, so when the first stage is done, the second stage has to just draw on top of what has been drawn before. Here, you can see the search process happening for different stages. When you see a sudden jump, that's when new brushes are added, and the algorithm is using them to better approximate the target. And during every stage, the algorithm needs to place 100 brushes, and it has to do so in 10,000 search steps. 
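The mutate-and-keep-if-better loop described above can be sketched roughly like this. This is a toy, self-contained version: the five `Brush` parameters follow the talk, but the renderer and the deterministic placeholder "mutation" are stand-ins for the real implementation, which is not shown in the transcript.

```rust
// Mutate one brush per step; keep the mutation if the canvas got closer
// to the target, revert it otherwise.
#[derive(Clone)]
struct Brush {
    x: f32,
    y: f32,
    rotation: f32,
    scale: f32,
    value: f32,
}

// Toy renderer: each brush only brightens the pixel nearest its x position.
fn render(brushes: &[Brush], len: usize) -> Vec<u8> {
    let mut canvas = vec![0u8; len];
    for b in brushes {
        let i = (b.x as usize).min(len - 1);
        canvas[i] = canvas[i].saturating_add((b.value * 255.0) as u8);
    }
    canvas
}

// Similarity = sum of absolute per-pixel differences (lower is better).
fn difference(a: &[u8], b: &[u8]) -> u64 {
    a.iter().zip(b).map(|(&p, &q)| (p as i64 - q as i64).unsigned_abs()).sum()
}

fn search(target: &[u8], brushes: &mut Vec<Brush>, steps: u32) {
    let mut best = difference(target, &render(brushes, target.len()));
    for step in 0..steps {
        let i = step as usize % brushes.len();
        let saved = brushes[i].clone();
        // Placeholder mutation: try a new x position. The real search
        // perturbs a randomly chosen parameter by a random amount.
        brushes[i].x = (step % target.len() as u32) as f32;
        let candidate = difference(target, &render(brushes, target.len()));
        if candidate < best {
            best = candidate; // keep: we moved towards more similar images
        } else {
            brushes[i] = saved; // revert: the canvas got less similar
        }
    }
}
```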
10,000 steps might seem like a lot, but if you need to place 100 brushes, that is 8,000 parameters, and remember, the algorithm cannot remove brushes. It is forced to place strokes even if the placement is not perfect. And one of the reasons to limit the step number is to encourage happy accidents, mistakes, and imperfections. If something goes bad, it can be fixed in later stages, giving a perception of continuous problem-solving in the brush layout itself. As brushstrokes become smaller, I use a sampling mask to guide the brush placement towards places of higher frequencies. I do so to preserve the loose brushwork while giving a perception of deliberate intent. We're not just splattering a uniform brush texture, but have a specific thought process manifested in the way the brushes have non-uniform sizes and visually interact with each other. There is an expression, "don't overwork the painting", meaning don't overdo it and kill the playfulness. That's what I'm trying to avoid by having the sampling mask. You can notice the sampling mask is generated based on edges, and edges play a very important role in drawing. If you have had drawing classes, you probably recognise this. This is used as homework to copy and learn from, on how to deconstruct a 3D shape into simplified contours. As humans, we are sensitive to sharp transitions between darker and lighter elements, and a small deviation can make something look wrong. Therefore, one of the new additions to the Rust version was using contours to guide the search. This is done by comparing the edges of the target versus the drawn image, and computing their distance. Here's a comparison of using versus not using edges to guide the search. Notice how much better defined the face is, and how it looks perceptually closer to the original. It's subtle, but I think it gives an extra push towards believability, that there is an artist's thought process behind each brush placement. Since we need to perform this comparison on every iteration, it needs to be fast. 
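The edge-guided scoring just described can be sketched as follows: the search minimises the pixel difference plus a weighted difference between the edge maps of the target and the drawing. The grayscale `u8` buffers and the weight value are assumptions for illustration, not the talk's actual numbers.

```rust
// Sum of absolute per-pixel differences between two grayscale buffers.
fn pixel_difference(a: &[u8], b: &[u8]) -> u64 {
    a.iter().zip(b).map(|(&x, &y)| (x as i64 - y as i64).unsigned_abs()).sum()
}

// Lower score = closer pixels AND closer contours.
fn score(target: &[u8], drawn: &[u8], target_edges: &[u8], drawn_edges: &[u8]) -> u64 {
    let edge_weight = 2; // assumed relative importance of matching contours
    pixel_difference(target, drawn) + edge_weight * pixel_difference(target_edges, drawn_edges)
}
```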
Here are some comparisons of how long different edge-detection implementations available in different Rust crates take. In the end, I went with a custom implementation that uses chunks to make it parallel. And here is a time and quality comparison for Canny versus without edge detection. Edge detection takes a huge bulk of the generation time, going from five minutes to 20, or even 1.5 hours with Canny. The facial features are captured so much more precisely with Canny or Sobel. Canny gives a crisper result, but comes at a four-times-slower generation cost. In case you are interested in how the parallel Sobel is done, here is the code. Feel free to pause when this goes online. Another way I'm using edges is to drive the brushstrokes' orientation. Brushstrokes follow along the edges and don't cross them, because that would violate the perceptual contour border. To guide the brushstrokes, I generate an image-gradient field. The stronger the direction, the more influence it has on a brushstroke that might be placed in that region. For example, here, there is almost no gradient information, and therefore the brushstroke might have any orientation. Here, closer to the edges, there is a strong directionality, indicated by the length of the lines, and the brushstrokes placed here are more likely to follow along the field's direction. The reason why I made it always probabilistic is that I don't want to exclude any happy accidents. Perhaps a brushstroke placed completely perpendicular might actually be a very good solution; in the end, the pixel and edge difference is what matters. Computational drawing is still very much a work in progress, and I hope to open source it once it's done. One thing I still haven't gotten to is figuring out a good strategy for searching for a colour solution. That is still on my to-do list. Right now, I'm directly sampling colours from the target image. At the moment, the algorithm is running on the CPU. 
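The slide with the parallel Sobel code is not captured in this transcript. As an illustration only, a row-parallel Sobel over a grayscale buffer in the spirit of the "chunks" approach mentioned above could look like this, splitting the output into bands of rows with `chunks_mut` and filling each band on its own scoped thread (this is a sketch, not the talk's actual code):

```rust
fn sobel(input: &[u8], width: usize, height: usize) -> Vec<u8> {
    let mut output = vec![0u8; width * height];
    let threads = 4; // assumed worker count
    let band = (height + threads - 1) / threads; // rows per band
    std::thread::scope(|s| {
        // Each thread gets exclusive access to one band of output rows,
        // while all threads read freely from the shared input.
        for (b, chunk) in output.chunks_mut(band * width).enumerate() {
            let y0 = b * band; // first row of this band
            s.spawn(move || {
                for (dy, row) in chunk.chunks_mut(width).enumerate() {
                    let y = y0 + dy;
                    if y == 0 || y == height - 1 {
                        continue; // skip image borders
                    }
                    for x in 1..width - 1 {
                        // Fetch the neighbour at offset (ox, oy) as i32.
                        let p = |ox: isize, oy: isize| {
                            input[(y as isize + oy) as usize * width
                                + (x as isize + ox) as usize] as i32
                        };
                        // Horizontal and vertical Sobel kernels.
                        let gx = -p(-1, -1) - 2 * p(-1, 0) - p(-1, 1)
                            + p(1, -1) + 2 * p(1, 0) + p(1, 1);
                        let gy = -p(-1, -1) - 2 * p(0, -1) - p(1, -1)
                            + p(-1, 1) + 2 * p(0, 1) + p(1, 1);
                        row[x] = (gx.abs() + gy.abs()).min(255) as u8;
                    }
                }
            });
        }
    });
    output
}
```

The talk also credits the Rayon crate; with Rayon, the scoped-thread boilerplate collapses into a `par_chunks_mut` call over the same row bands.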
The code is parallelised, but drawing on the CPU is quite slow, and it just so happens that Embark has recently announced the Rust GPU project. I'm really looking forward to its development, so please go and contribute so I can do the paintings on the GPU! We are reaching the end of the presentation now, so let's summarise: we have talked about art as a search process for new representations, and how representations can invite us to view the same object from different perspectives. Art, science, and programming are similar in that regard. We discussed how code as an art medium invites us to convey our search process in an algorithmic form, making it the main art piece, and how computational drawing tries to capture that search in the context of traditional drawing from reference. And lastly, we have covered the algorithm details, as well as how we can use our artistic intent to guide the search by translating it into code. And if you're an artist and you are interested in getting into Rust, I really recommend that you stop considering and just do it. First of all, the ecosystem is great, the package management is heavenly, and if you just want to get started ASAP, learning about ownership is all you really need, which is literally like reading the first four chapters of the Rust Book. It's very rare that I find myself in need of advanced features when doing this kind of art tool. I also can't recommend enough getting rust-analyzer; it will show you types and tips, it is absolutely amazing. When I prototype, I often write very messy code, and it's a breath of fresh air to have the language guarding me against stupid mistakes, and Clippy shouting at me for making a variable and never using it! Having the confidence that if my code compiles, it works really frees my brain to focus exclusively on the algorithm design and logic flow, and I don't envision myself prototyping in any other language now. 
Before we wrap up, a quick shout out to Thomas and Maik for dealing with my Rust programming on a daily basis. A lot of what I learned about Rust, I learned from them, and I would like to thank Embark for giving me time to work on this talk. I would like to thank you for listening. I hope it was useful and you learned something new, and, if you have any questions, write a comment if you're watching it offline, or post a message in the chat if you're watching it live. Thank you. Have an awesome remainder of RustFest.

**Pilar:** Thank you so much, Anastasia. Wow. That was incredible. A lot of people were commenting on how natural everything looked, and were asking some really, really great questions, so, if you have any questions for Anastasia, she's over at the chat right now. Please go. It was everything, wasn't it? It was, you know, Bob Ross wholesome, and so, I'm blown away. It was so cool to see things I absolutely love mesh so well and blend so beautifully. And Anastasia also asked to please do a shout out to the Rayon crate. She did mention it in the recording, but, yes, just wanted to let you all know: it helped a lot in the project. That's it for this talk. Stay tuned for our next great speaker.
\ No newline at end of file
diff --git a/2020-global/talks/02_UTC/06-Christian-Poveda.md b/2020-global/talks/02_UTC/06-Christian-Poveda.md
new file mode 100644
index 0000000..791c1b9
--- /dev/null
+++ b/2020-global/talks/02_UTC/06-Christian-Poveda.md
@@ -0,0 +1,481 @@
+**Miri, Undefined Behaviour and Foreign Functions**
+
+**Bard:**
+
+Miri is Rust's interpreter
+And Christian will gladly debate'er
+On how to bequeath
+her the stuff underneath
+so she can run until much later
+
+**Christian**:
+
+Thank you.
+This talk is called Miri, undefined behaviour and foreign functions.
+So let me introduce myself.
+I'm Christian Poveda.
+I'm Colombian.
+I'm a PhD student.
+
+Occasionally, I contribute to the Rust compiler project.
+I don't work full-time, I just do it when I have free time.
+
+So, first of all, I want to say why I want to give this talk, and why I think it is important.
+The first thing is that unsafe is a controversial topic in our community,
+but, at the same time, it's something super special that Rust needs to work and be able to do the awesome stuff that it already does.
+So, basically, every program you have is unsafe in one way or the other, even if you don't know it.
+It is important to be aware of the implications of what happens when you misuse unsafe, or if someone else does.
+
+I also want to show you a super cool tool that can help you write better code,
+because it embodies the philosophy the Rust community has of building reliable software, and at the same time, having a super helpful community with tools that help you to build that.
+
+This talk will have four parts, basically.
+First, I'm going to show you a bit of what is safe and what is unsafe,
+what is undefined behaviour and how everything works in Rust,
+and then we're going to talk about Miri, which is the super cool tool I'm talking about.
+Then I'm going to talk a bit about foreign functions.
+If you think this is interesting for you, I can give you some ideas at the end on how you can contribute to all of this.
+
+So, let's begin by talking about unsafe Rust and undefined behaviour.
+Before even talking about undefined behaviour, I think it's super important to discuss why people use unsafe Rust in the first place.
+
+There are two main reasons.
+The first one is that some people use unsafe because they're interested in performance:
+they want their programs to run super fast,
+so they take it upon themselves to ensure their programs run correctly,
+even if those programs don't do any runtime checks, because not doing a lot of checks lets you squeeze out a little bit of performance when you're writing your programs.
+And there is a lot of controversy around this one.
+People say yes, performance matters, but safety first.
+There are a lot of trade-offs you can make there.
+
+But the second reason is a little less controversial, in the sense that in many projects we have in Rust,
+you need to interact with other languages, or with your operating system, or with a bunch of resources that aren't written themselves in Rust,
+so most likely you will have to interact with a C or C++ library, or a crate that interacts with a C or C++ library,
+and C doesn't have the same guarantees that Rust has about safety, having sound programs, and so on.
+All those functions that interact with C libraries are unsafe too.
+
+So, now we can discuss what unsafe can do.
+Inside unsafe functions or unsafe blocks, there's not much you can do, actually.
+You can do only five things, no more.
+You can dereference raw pointers, and you can call functions that are marked as unsafe:
+if you have a function that is marked unsafe, you need an unsafe block or an unsafe function to call it.
+
+There are some traits that are marked as unsafe too.
+If you want to implement those traits, like `Send` from the standard library, you have to use unsafe.
+If you want to mutate statics, because you're sure that your program needs some sort of mutable global state, even though some people don't like it, you can use unsafe to do that.
+And you can use unsafe to access fields of unions.
+Unions are like enumerations, but they don't have a discriminant to distinguish each variant,
+so you can literally join two types into a single one, and use every value of that type as any of the possible variants at the same time, so you need unsafe to access those fields.
+
+However, for the purposes of this talk, we are going to focus on the first two, because those are the most common ones; it is likely we've all been exposed to them at one point.
+And the first one, dereferencing raw pointers, is worth discussing for a moment.
+What are raw pointers?
+Many of you, if you have already used Rust, know that we have references, the shared `&` and mutable `&mut` ones, and these have two siblings.
+They are called raw pointers.
+We have `*const` and `*mut`, and they exist because they don't follow the same rules as references.
+They don't have these liveness constraints.
+
+For example, if you have some data, and you create a raw pointer to it, and then you drop the data, it goes out of scope, it's deleted;
+you can still have the raw pointer to it, even though it is pointing to something that doesn't exist any more.
+Additionally, you can offset those pointers using integers,
+so, if you have a pointer to a particular memory area, you can add an integer to it, offsetting it so you read a part of memory that maybe you're not supposed to.
+For those two reasons, those pointers might have a lot of problems and might misbehave in several ways.
+You can have null pointers that don't point to anything, really.
+They can be dangling:
+pointers that, let's say, are pointing to something that doesn't belong to us,
+so if you have a vector, and you use a pointer derived from inside the vector to access something outside the vector, that is a dangling pointer.
+Also, you can have a pointer that you offset, but didn't offset correctly:
+for example, you offset it by 16 bits when the values are 64 bits wide, so you end up reading in between values - that's an unaligned pointer.
+
+So, those are raw pointers.
+You can do a lot of messy stuff with them.
+We're not discussing why that is wrong just yet; we will get into that later.
+
+But let me show you an example of how to use these raw pointers, how to use unsafe, and so on.
+Here in my terminal, I have this tiny crate.
+It has a single struct called `ByteArray`, which has a mutable pointer to `u8`.
+You can think of this type like a slice or, if you want, like a vector, but one where we only have two simple functions.
+We can only read stuff from it.
+We cannot grow it or make it smaller.
+
+Usually what happens in this kind of system is that you have these two functions: the unsafe, unchecked version of a function, and then the safe version of it.
+Here, we have the unsafe function called `get_unchecked`.
+It receives an index, takes the pointer, casts it to a `usize`, adds the index to it, casts that integer back to a pointer, and then dereferences it.
+Actually, the first of those lines are not required to be inside an unsafe function.
+The only thing that is unsafe is reading from the pointer, calling the dereference star operator.
+So you can manipulate raw pointers however you want, but to dereference them, you have to use unsafe.
+Then we have the safe counterpart of this function, where we guarantee that,
+if the index you're reading is out of bounds given the length of this array, we return `None`, and if we are sure that we are in bounds, we return `Some` with the result of the `get_unchecked` function.
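The crate from the demo is not included in the transcript; a reconstruction from the description above might look like this. The field names and the `zeroes` constructor (mentioned later in the demo) are assumptions, and the sketch deliberately leaks its allocation so the raw pointer stays valid for as long as the `ByteArray` is used:

```rust
pub struct ByteArray {
    ptr: *mut u8,
    len: usize,
}

impl ByteArray {
    /// Allocate `len` zeroed bytes.
    pub fn zeroes(len: usize) -> Self {
        let mut bytes = vec![0u8; len].into_boxed_slice();
        let ptr = bytes.as_mut_ptr();
        std::mem::forget(bytes); // leak on purpose: keep the pointer alive
        ByteArray { ptr, len }
    }

    /// No bounds check: reading out of bounds is undefined behaviour,
    /// which is why the whole function is marked `unsafe`.
    pub unsafe fn get_unchecked(&self, index: usize) -> u8 {
        let addr = self.ptr as usize; // pointer-to-integer cast: safe
        let offset = addr + index; // integer addition: safe
        let ptr = offset as *mut u8; // integer-to-pointer cast: safe
        unsafe { *ptr } // only the dereference itself needs `unsafe`
    }

    /// Safe counterpart: checks the index against the length first.
    pub fn get(&self, index: usize) -> Option<u8> {
        if index < self.len {
            Some(unsafe { self.get_unchecked(index) })
        } else {
            None
        }
    }
}
```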
+
+When you run this, for example, let's say this is a crate in the Rust ecosystem, published on crates.io.
+Users might do something like this:
+they import our library and call a constructor function that I didn't show, called `zeroes`.
+They might decide they need to use unsafe, because they need to go super fast with this thing,
+so they just use `get_unchecked`, and, if we run this, it returns zero.
+It works as intended.
+Some of you might be asking what happens if you call this function with an out-of-bounds index.
+We will get to that later.
+Yes, that's the demo.
+
+And the big question now is, well, actually, what can go wrong when you use unsafe?
+You might have heard answers like: if you're using it wrong, you're causing undefined behaviour, and undefined behaviour is super bad; anything can happen when you invoke undefined behaviour.
+
+So let's discuss undefined behaviour a little bit.
+The Rust compiler was written under certain assumptions about the programs we write:
+our programs need to meet certain conditions so the compiler can actually compile them into what we want.
+If we break any of these rules, we say we are causing undefined behaviour.
+
+As Stefan said, this is a way of saying that if there is something that is not specified in a clear way,
+and the compiler is trusting that to happen, and you break that rule, then you're causing undefined behaviour.
+Something super important here is that undefined behaviour is different in each language.
+C has a lot of rules for undefined behaviour, and those rules are not the same as Rust's.
+
+For example, whatever Stefan told you about adding too much to an integer and overflowing it,
+because it cannot fit a number that big - that's not undefined behaviour in Rust, but it is undefined behaviour in C.
+That's because both compilers were built with different guarantees in mind.
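To make the overflow contrast concrete: in Rust the overflow case is well defined, and the standard library even lets you pick the semantics you want explicitly (a small illustration, not from the talk):

```rust
// Signed overflow is UB in C; in Rust it panics in debug builds,
// and these methods give explicit, fully defined behaviour.
fn main() {
    let x = i32::MAX;
    assert_eq!(x.wrapping_add(1), i32::MIN); // two's-complement wrap, defined
    assert_eq!(x.checked_add(1), None); // overflow reported, not UB
    assert_eq!(x.saturating_add(1), i32::MAX); // clamp at the maximum
}
```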
+
+Actually, the list of rules that need to be considered when we are dealing with undefined behaviour is a little bit tricky, so I'm just going to mention some of them.
+Your program has undefined behaviour if you dereference a pointer that is dangling or unaligned.
+Also, if you try to produce a value that is invalid for its type.
+So, for example, Booleans: when you look at the actual memory, Booleans are represented by bytes.
+They take one byte exactly, so you have a one or a zero, but a byte has eight bits, so there are a lot of values that you could use.
+One is true, zero is false, but if you take a three and you try to put that into a Boolean, that is undefined behaviour, because three is not a valid Boolean.
+The compiler does not know what to do if it sees a three instead of a one or a zero.
+So causing that is also undefined behaviour, and there are many more rules that need to be taken into account here.
+
+So what happens if you break these rules? Basically, Rust cannot work correctly.
+We lose the guarantee that Rust has of producing programs that do what we want them to do.
+Rust can no longer compile that program correctly. What this means is that, in the best case, your program might not run; maybe it crashes with a segmentation fault, a memory-out-of-bounds error, or something like that.
+In the worst case, it might run, but not as you intended, so that program might do anything.
+For that reason, it's pretty common to see this kind of psychedelic image with unicorns and a lot of colourful stuff when people discuss undefined behaviour, because when we deal with undefined behaviour, we lose track of what our program is doing at the most basic level.
+We don't even know any more.
+So there is good news for us in the Rust community.
+
+If we are using safe Rust, if we promise never, ever to use unsafe, we don't have to worry about undefined behaviour, because undefined behaviour should not be possible inside safe Rust.
+And if you use unsafe but are super sure you're not causing undefined behaviour, you get the performance benefits, or you can interact with C libraries correctly, and that is also good.
+But there is also not such good news, and it concerns a super important part of our ecosystem:
+even if we're not causing undefined behaviour ourselves, someone else in our dependencies might be.
+
+Here, I have interesting statistics about this.
+24% of all the crates on crates.io use unsafe directly.
+And of those crates, 74% make unsafe calls to functions that are in the same crate - so are our crates using unsafe only to call functions in the standard library, or in other crates?
+If you want more information about these metrics, you can Google or use your favourite web-search engine to look for the paper "How do programmers use unsafe Rust?"
+My point is that unsafe is everywhere, not because people aren't good at doing their job, but because we actually need it.
+It's everywhere.
+
+I also have good news.
+There is a tool that we can use to detect undefined behaviour in our programs, called Miri.
+If you want to take a look at the Miri repository now or later, this is the URL.
+You can find all the code there.
+
+So, what is Miri? It is a virtual machine for Rust programs.
+Miri doesn't compile your program, it interprets it, in the same sense that the JVM interprets Java bytecode, or the Python interpreter runs Python.
+Miri is like that, but for Rust.
+It has a super cool feature that none of the other interpreters has, and it is that it can detect almost all cases of undefined behaviour while running code.
+And there is something else that is interesting.
+
+The code used in Miri is also used in the engine that does compile-time function evaluation, so, if you have any constants in your programs, or you have a const function, part of this code is used to evaluate that constant.
+
+But here we are talking about Miri as a standalone tool outside the compiler that can interpret your programs.
+So how do you use Miri? You need a nightly version of the compiler to do this, so you have to install the nightly toolchain.
+You can do this by running `rustup toolchain install nightly`.
+Then you can install the Miri component; you just have to do `rustup component add miri`. After Miri installs - it takes a while compiling - you can run binaries, you can run your whole program with Miri, or you can run just your test suite if you have tests.
+
+Let's do a demo with the same code I was showing you before.
+Again, we have this super tiny program using an external crate, let's say.
+And maybe the person writing this program doesn't know about the guarantees that the crate needs in order to be sure these functions don't cause undefined behaviour.
+You might be tempted to do something like: can I read the 11th position of an array with ten elements? Who is stopping me? The compiler is not complaining.
+It works.
+It actually returns zero.
+That looks like a perfectly good value, because it returns the same as before.
+
+If you run this with Miri, you will find this super cool error that says: undefined behaviour, pointer to allocation was dereferenced after this allocation got freed.
+It points to the part of the code that causes this undefined behaviour, at the pointer dereference.
+You can see more information and so on.
+
+What we are looking at here, what is happening in the execution under Miri, is that this function is creating a pointer that is dangling.
+You created a pointer that is outside the actual range of the vector, so, when the vector gets deleted - because it is deleted after everyone has used it - you still have this pointer pointing to nothing.
+But, for example, if we go back to the perfect case where we didn't have any undefined behaviour, we can just do `cargo miri run`, and Miri won't complain and will return the same as your regular program.
+So that is how we can use Miri to detect undefined behaviour.
+
+Now I want to show you a little bit of how Miri actually works.
+To talk about how Miri works, we have to dig into how the Rust compiler works.
+This is a super high-level overview of the Rust compilation pipeline, the steps that a program follows when it is getting compiled.
+
+We always start from source code, our .rs file, and end up with machine code, a binary, or a dynamic library, something like that.
+And what happens in the middle are four stages.
+
+The first one is parsing.
+Rust reads the text of your source code, let's say, and parses it to produce an abstract syntax tree, or AST.
+Then this AST is transformed to produce a high-level intermediate representation, or HIR.
+This stage is where type checking happens, so a lot of the work with types is done here.
+
+Then the HIR is lowered to another representation, the mid-level intermediate representation, or MIR.
+This is where the borrow checking happens.
+And after that, we start interacting with LLVM, the framework for compilers that Rust uses; the LLVM project has its own intermediate representation, so we lower MIR to LLVM IR.
+And finally, LLVM does the code generation to produce your binary file, or your library, and so on.
+
+Miri works almost the same way.
+The only difference is that the code generation stages don't run, so we don't get to talk with LLVM.
+What happens is that Miri lets the compiler run until you have the MIR of your program, and then it interprets that.
+Where the JVM has bytecode, Miri has MIR.
+That's why Miri is called Miri: because it's an MIR interpreter.
+
+Here is something super important: Miri cannot interpret programs that aren't Rust programs.
+So if you have a C library that you link into your Rust program, we can't interpret that in any way.
+That program doesn't have the same syntax; the compiler doesn't even understand that program.
+You can't interpret that.
+
+And there are more limitations, actually.
+Miri is not perfect.
+It's not a silver bullet for your undefined behaviour problems.
+
+Another limitation is that Miri is slow, so, if your test or program is performance-sensitive, it can take a while to run your program, even if you can do it.
+This happens because Miri has to do a lot of runtime checks on your pointers and on how memory is managed to be able to tell you when undefined behaviour is happening.
+The other important point is that Miri can only detect undefined behaviour as it is happening: if the undefined behaviour doesn't occur during the interpreted run, Miri won't report it.
+Miri cannot detect data races yet.
+And, again, Miri can only interpret Rust programs.
+This one is super important.
+
+You might be wondering why this matters. It is because, well, you know, programs don't run in isolation.
+We tend to access files, get resources over the network, interact with databases; we need to go to the primitives of our operating system, whatever.
+And the mechanism that Rust uses to interact with those are foreign functions.
+That is what this last part is about: foreign functions.
+
+Some of us might think that we don't need foreign functions at all.
+Maybe we have never used external functions in our projects.
+But I'm sure everyone, or almost everyone, has interacted with the standard library to do standard operations, reading files, whatever, and that means somehow you're using foreign functions.
+For example, this is the stack trace when you call `File::open`, the standard library function for opening files.
+
+There are six functions here.
+The first two are Rust functions that are in the standard library; they are platform-independent.
+Then we have four functions that are specific to Unix-like platforms, so those only run on Linux or MacOS.
+And then we have this `open64` function at the end.
+The odd part about this `open64` function is that it's not a Rust function - it's a Linux system function used to open a file.
+So this is a foreign function, written in C.
+And it is an unsafe function, and Miri cannot interpret it, so what happens if anywhere in this process we have undefined behaviour?
+Can we run that with Miri?
+It cannot interpret the `open64` function.
+
+The good news is that Miri can actually run your program, in a particularly interesting way.
+Yes, Miri cannot interpret your foreign function, but it can intercept the call: when you're running your program,
+if someone calls `open64`, Miri knows that it's a foreign function, not a Rust function, and then contributors can write whatever code they want to emulate that function.
+We call the code that emulates a function a shim.
+And if a shim needs to interact with the operating system, or with any of the resources that the standard library provides, we use the standard library for that.
+So it is funny, because the standard library uses foreign functions, but Miri uses the standard library to emulate some of those foreign functions.
+
+Let me show you.
+We are still in our example with the `ByteArray` crate.
+We have a user that reads the index it wants to use from a config file.
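As an aside before the demo: the shim mechanism just described can be sketched, very loosely, like this. This is a toy illustration, not Miri's actual code; the `Value` type and the foreign-function names handled here are assumptions.

```rust
// An interpreter hits a call it has no MIR for, so instead of interpreting
// C code it dispatches by name to a "shim" written against the host std.
use std::time::{SystemTime, UNIX_EPOCH};

type Value = i64; // a drastically simplified interpreter value

fn emulate_foreign_call(name: &str, args: &[Value]) -> Result<Value, String> {
    match name {
        // Shim for a `time`-like foreign function: ask the host OS via std.
        "time" => {
            let secs = SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .map_err(|e| e.to_string())?
                .as_secs();
            Ok(secs as Value)
        }
        // Shim for a trivial `abs`-like foreign function.
        "abs" => Ok(args[0].abs()),
        // No shim written yet: report the gap instead of guessing.
        other => Err(format!("can't call foreign function: {other}")),
    }
}
```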
+It uses `File::open`, so eventually, it will use `open64`.
+And we are doing the same as before: reading a value through the unsafe function and printing it.
+If you try to run this with Miri, we will get an error, but it's not because we are causing undefined behaviour.
+It says this `open` is not available when isolation is enabled.
+This is the `open64` function I was talking about.
+Open is not available; please pass the flag to disable this isolation.
+So if we do that, and set the Miri flags to disable isolation, we can actually run it.
+
+In this case, it seems the config file is causing undefined behaviour.
+It says the memory access pointer must be in bounds at offset 11, but it is outside the bounds of the allocation, which has size 10.
+It seems like someone is reading the 11th position of an array with ten elements.
+Yes, it is really position 10 for the 11th element, if you want to think in zero-based indexes.
+And that's the whole problem.
+
+So, yes, we can use Miri to detect undefined behaviour, even in programs that use foreign functions.
+That's super cool.
+And, actually, beyond handling files, Miri can do a lot of stuff.
+You can create directories, delete files, create symbolic links, you can spawn threads, use locks and atomics, you can get the current time, so you can run clocks inside your Rust program running in Miri,
+and you can handle environment variables. Each of those operations is possible because someone decided to write a shim for that specific foreign function.
+
+And this has a super cool side effect - well, not so much a side effect, because some people worked hard to get this working.
+The standard library works across many platforms: you can use `File::open` no matter which platform you are on.
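The commands behind this part of the demo, sketched from the talk (the crate layout is assumed; the flag names are Miri's own):

```shell
# One-time setup: Miri ships as a rustup component on nightly.
rustup toolchain install nightly
rustup +nightly component add miri

# A plain `cargo miri run` stops at `open64`: file access goes through a
# foreign function, and Miri isolates the program from the host by default.
# Disabling isolation lets the shims touch the real file system.
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri run
```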
+So this means that you can emulate foreign functions even if you are not on the platform the program is going to be compiled for.
+For example, if you have a program that is supposed to run on Windows but you don't have a Windows machine, you can use Miri to interpret that program as if it were a Windows program.
+
+Let me show you.
+So here we have another user of our library.
+This time, it is using environment variables to set the size and index it wants to read.
+Miri can emulate an environment inside it, so we can use the size environment variable to set the size of the array.
+We set the index to 1, and because we want to run that, we disable the isolation.
+And I'm using a Linux machine, but I'm going to run it for a target that is Windows.
+I don't have Windows installed here.
+And it works.
+
+If I want to run it on anything else, I can do it.
+I'm running it on my regular Linux target, and it is working.
+This is super cool, because you can use Miri in the situation where you're not sure the code you wrote specifically for Windows is working correctly.
+Even if you're not using unsafe, you just want to be sure your program runs as intended.
+
+And, yes, basically, that's the hard content of my talk, and I want to spend the last few minutes talking about contributing to Miri.
+If anything of this caught your attention, I encourage you to contribute to Miri, for many reasons.
+My personal reason is that I always wanted to work on compilers because I find them super interesting.
+I really like Rust and I didn't know where to start.
+So I found Miri.
+
+I said, like, okay, I could implement maybe some foreign functions for opening files, whatever; it sounds not too hard.
+It took a while until I understood the Miri-specific stuff, but it helped me a lot to understand how the compiler works and to get involved in other stuff that I wouldn't have been able to do otherwise.
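The cross-platform run described above, as a command line. `SIZE` and `INDEX` are hypothetical variable names for this example, and the Windows target triple is one plausible choice:

```shell
# Interpret the program as a Windows binary from a Linux host.
# Miri emulates the environment variables and the Windows shims,
# so no Windows installation is needed.
SIZE=10 INDEX=1 MIRIFLAGS="-Zmiri-disable-isolation" \
  cargo +nightly miri run --target x86_64-pc-windows-msvc
```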
+Even if you don't feel comfortable contributing yet, you can help this project by just using it.
+Maybe you want to use it because you actually write unsafe code, and you want to be sure you're not causing undefined behaviour.
+Or some of your dependencies use unsafe, and you want to be sure that they don't cause any undefined behaviour.
+So there is that.
+
+You can save yourself, and a lot of others, many, many hours of debugging and figuring out how the undefined behaviour works.
+Maybe you're expecting that Miri catches something, and it doesn't, or maybe it is the other way round:
+you think your program is correct, and Miri is complaining.
+You can open an issue, or you can contact the contributors to discuss it, or the people from the Unsafe Code Guidelines Working Group.
+There is something super important here: this is not an obligation you have to the community.
+If your program runs really slowly in Miri and you don't want to wait, that's fine.
+You don't owe anyone anything here.
+
+But if you're super interested in contributing to Miri, writing shims is a super easy way to start.
+If you want to try it yourself, it is super cool, because what you actually have to learn is your platform's specification of how that foreign function works.
+The stuff that you need to learn about Miri itself is really small.
+You don't need to know how Miri works completely to do that.
+I don't know how Miri works completely.
+I use some little parts here and there, and I implemented a lot of things because I like them.
+Even if you don't need that shim, maybe someone else needs it, and then you're not just testing for undefined behaviour, you're helping everyone write better and safer code, because a lot of people use these things.
+If you want something specific for Windows, many of the shims haven't been implemented yet, and that is fine, because you can cross-interpret as if you were on Windows while on Linux.
+For example, if I go back to the program that opens the file, and I try to run it with the Windows target, it will fail, but it will fail because this function `CreateFileW` hasn't been implemented yet.
+Maybe one of you wants to do it.
+There's a bunch of stuff that hasn't been implemented yet.
+
+That's all, so, thank you for your time.
+I hope you found this interesting, and I think we can do some questions now if you want.
+
+
+**Stefan:**
+Yes, a question about the 11th element: it said something about accessing an allocation that was freed, or that it was out of bounds, so I guess this is the question:
+how far can it track stuff, right?
+
+**Christian:**
+Yes, this is not clear for me, actually.
+Sometimes, this program fails because, when Miri interprets it, it frees the memory for the array before you read the pointer.
+So it complains about memory being freed. And sometimes the array is not deleted yet, so it hasn't been dropped,
+so instead it complains about an out-of-bounds access, because the array is still there.
+The good news is that both of those are undefined behaviour.
+Miri tries to be as deterministic as it can, but when you disable the isolation,
+for example, it's really hard to be deterministic, because if you change your file, that might change how everything works internally.
+
+**Stefan:**
+So there is a second question:
+when Miri's engine is used to execute const code during compilation, does it run in a fast mode with less validation, and how do I assess the difference?
+
+**Christian:**
+I don't know all the details, but during const evaluation Miri runs without a lot of the validations it runs when you use it standalone.
+The const evaluation is faster than what I showed you, but that's because it does fewer checks.
+Let's say the engine is the same - the same engine but in a different car, let's say.
+
+**Stefan:**
+We don't have dynamic evaluations in const eval.
+
+**Christian:**
+There is a flag ...
+in Rust, something about unleashing Miri.
+You can run, let's say, unrestricted constant evaluation, and most of the time it breaks the compiler, but, yes, you can actually run whatever you want
+using Miri inside the compiler.
+But that is super experimental.
+
+**Stefan:**
+So long-term, one could have a VM, like a full-functioning VM, in Miri?
+
+**Christian:**
+In principle, yes, but there are a lot of questions.
+For example, you read a file, and you use the file to, I don't know, create some const, or define a generic type, but that makes your compilation unsound, because every time you read the file,
+it might change. Or using random-number generators.
+
+**Stefan:**
+Maybe I can introduce my own question here:
+do you think, in a very distant future, it will be possible to have Miri included in a binary, to have Rust as a scripting language inside your Rust program?
+
+**Christian:**
+Oh, wow. I have no idea!
+I remember I read that someone was writing an interpreter so you can use it like it was a REPL.
+I don't know what happened with that project.
+
+**Stefan:**
+Was this the Python-like thing?
+
+**Christian:**
+No, it was a little bit different, because it didn't run Rust but MIR; you had to write the MIR of your program together with the Rust code.
+
+**Stefan:**
+Okay. Interesting. Another question from the audience:
+Would it be possible to do this kind of analysis on general LLVM IR?
+
+**Christian:**
+I'm tempted to say yes, you could.
+The thing is that you don't have a lot of the type information you have when you're interpreting the MIR.
+In the MIR, you have a lifetime for every single value.
+I don't know if you can do that in LLVM IR.
+In principle, yes, you could build, for example, a stack model for LLVM IR, but the difference is that you would have to build it.
+
+**Stefan:**
+You would have to add a lot of metadata, because the types maybe got lost in ...
+
+**Christian:**
+Yes, it's harder, but I believe it's possible to do that.
+
+**Stefan:**
+Is there anything you would like to show off, like a final use case, or an idea - hey, if someone is bored, maybe give a shot at this project?
+
+**Christian:**
+Yes, actually, let me ...
+let me open a new Firefox window here.
+If you're bored and you want to do something inside Miri, we have a lot of issues here.
+And we have this label - the shims label - with a lot of tiny problems.
+For example, Miri doesn't support custom allocators, and in the latest version, `Box` now allows for a custom allocator,
+so it is super important now to have a way for Miri to test with different allocators, for example.
+If you're bored, you can grab any of those issues.
+
+**Stefan:**
+Cool.
+I'm looking forward to a stable `Box` with custom allocator support.
+That will be very interesting.
+Wonderful.
+I think we have reached the end.
+I don't see any more questions.
+It was very well received, and a great talk.
+Thank you again.
+Ferrous thanks you as well.
+
+**Christian:**
+Thanks so much.
+
+**Stefan:**
+Will you be in the chat afterwards?
+
+**Christian:**
+Yes, I will hang out a little bit in the chat.
+
+**Stefan:**
+Wonderful.
+So, thanks, everyone, for listening, and the final talk will commence in ten-ish minutes.
+There will be some announcements before and after, so stick around.
+Also, we have two artists coming up after the last talk.
+Right, thank you, everybody.
+Bye.
diff --git a/2020-global/talks/02_UTC/06-Christian-Poveda.txt b/2020-global/talks/02_UTC/06-Christian-Poveda.txt
deleted file mode 100644
index a18c4a6..0000000
--- a/2020-global/talks/02_UTC/06-Christian-Poveda.txt
+++ /dev/null
@@ -1,29 +0,0 @@
-Miri, Undefined Behaviour and Foreign Functions - Christian Poveda.
-STEFAN: Hello. Welcome to the second-last talk of the Africa-Euro time zone. Joining me now is Christian. He is doing his PhD on refinement types.
He is in Spain, in Madrid, originally from Colombia, and I think you will go into more details. Also, I'm very happy to see that we will learn about undefined types - undefined behaviour. For those of you who don't know what undefined behaviour is, it's a very old term coming from from C, and it talks about that the standard of the library does not specify, so one example is if you have annum and you count it up, what happens if you reach the end of the range that number can hold, does it wrap around, or does it do something else? Can the compiler optimise? Stuff like that. That is undefined behaviour. I think the rest is up to you. Don't forget to go to the chat system. On the right side to chat, and then - press 17, and you will see views of the room, so you can ask questions there. -> Thank you. This talk is called Miri, undefined behaviour and foreign functions. So let me me introduce myself. I'm Christian Poveda. I'm Colombian. I'm a PhD student. Occasionally, I contribute to the Rust compiler project. I don't work full-time, I just do it when I have free time. So, first of all, I want to say what I want to give this talk, what I think is important somehow. The first thing is that unsafe is a controversial topic in our community, but, at the same time, it's something that we need something super special that Rust needs to work and be able to do the awesome stuff that it already does. So, basically, every program you have is unsafe in one way or the other, even if you don't know it. It is important to have awareness of the implications of what happens when you use misuse unsafe correctly, or if someone else does. I also want to show you a super cool that can help you write better code, because it the empower philosophy of the Rust community has of being reliability software, and at the same time, having a super helpful community with tools that helps you to build that. 
This talk will have four parts, basically, first, I'm going to show you a bit what is safe, and what is unsay, what is undefined behaviour and how everything works in Rust, and then we're going to talk about Miri which is a super cool tool I'm talking about. Then I'm going to talk bit about functions. If you like this, if you think this is interesting for you, I can give you some ideas at the end on how can you help contribute in all of this. So, let's begin by talking about unsafe Rust and undefined behaviour. Before even talking about undefined behaviour, I think it's super important to know or to discuss why people use unsafe Rust in the first place. There are two main reasons. The first one is that some people use unsafe because they're vested in performance, they want their programs to run super fast, so they are ensuring everyone of their programs is running correctly, even if those programs don't have any check to be sure that they're running correctly, and not doing a lot of checks lets you squeeze a little bit of performance when you're writing your programs. And there is a lot of controversy around this one. People say yes, performance matters, but safety's first. There are a lot of trade-offs you can do there. But the second reason is a little less controversial in the sense that many projects we have in Rust, you need to interact with other languages, or with your operating system, or with a bunch of resources that aren't Brian themselves in Rust, so most likely you will have to interact with a C, or C++ library, or create that interacts with a C++ library, and that doesn't have the same guarantees that it has about safety, and having sound programs and so on. All those functions that interact with C libraries are unsafe too. So, now we can discuss what unsafe can do. Inside unsafe functions, or unsafe blocks, when you have the two, there's not much you can do, actually. You can do only five things, not any more. 
You can de-reference raw pointers, you can call functions that are marked as unsafe, so, if you have a function that is called unsafe and in general the name of the function, you need unsafe to call it. You have to do an unsafe block or function. There are some traits that are marked as unsafe too. If you want to implement those traits like send from the standard library, you have to use unsafe. If you want to mutate the statics because you're sure that the program needs some sort of mutable global state, even though some people don't like it, you can use unsafe to do that. You can use unsafe to access fields of unions. Unions are like enumerations, but they don't have the consistent back to distinguish each variant, so you can literally join two types in a single one, and use every value of type as any of the possible variance at the same time, so you need unsafe to access those fields. However, for the purposes of this, we are going to focus on the first two, because those are like the more likely - one of the more demon, likely we've been exposed to this at one point. And, the first one is the de-referencing raw pointers is worth discussing at the moment. What are raw pointers? Many of you, if you have already used Rust, you know that we have references, we have ampersand mute and sand mutable references, and these are like two brothers or sisters, siblings, whatever. They are called raw pointers. We have star const and star mut, and they exist because they don't follow the same rules as references. They don't have this liveness constraints. For example, you have some data, and you create a pointer to it, and you drop the data, it goes out over the scope, it's deleted. You can have the raw pointer to it even though it is pointing to something that doesn't even exist any more. 
So for these reasons, there is something else, and you can also offset those pointers using integers, and, if you have a pointer to a particular memory area, you can add it like an integer, and you can offset it so you can read a part of the memory, and maybe you're not supposed to. For those two reasons, those pointers might have a lot of problems and might misbehave in several reasons. You can have null pointers that don't point to anything, really. They can be dangling. There are pointers that, let's say, are pointing to something that doesn't belong to us, so, if you're inside a vector, and you saw a pointer from inside the vector to access something outside the sector, that is a - *vector, that is a dangling pointer. Also, if you have a pointer that you offset it a bit but you didn't do it correctly, so, for example, you have a pointer between, I don't know, you use 64s, you use 16 bits instead of 64 bits, you will end up reading, like, in between values - that's an unaligned pointer. So, those are real pointers. You can do a lot of messy stuff with them. We're not sure why that is wrong really, right now. We will go into that later. But let me show you an example of how to use these raw pointers, and how to use unsafe, and so on. Here in my terminal, I have this tiny crate. It has a single struct called ByteArray which has a mutable pointer to u8. You can think of this type like a slide, or if you want, like a vector, but we are we only have two simple functions. We can only read stuff from it. We cannot grow it or make it smaller. We can just read stuff from it. Usually what happens is the system, like you have these two functions, you have like the unsafe unchecked version of a function, and then you have the you have the safe version of it. Here, we have the unsafe function called get_unchecked. 
It receives an index, it takes this pointer, casts it to a new size, and then adds the index to it and casts that integer back to a pointer, and offsets a pointer by adding index to it and then we reference it. Actually, all of this code, all of these three lines are not required to be done inside an onsite function. The only thing that is unsafe is reading from the pointer, calling the reference star operator. So you can use raw pointers however you want, but the reference then, you have to use unsafe. Then we have like the safe counterpoint of this function, so we guarantee that, if the index you're reading is out of bounds from the length of this array, then we would return none, and if we are sure that we are in bounds, then we return "some", and then do a get_unchecked function. When you run this, for example, let's say this is a crate in the Rust ecosystem, using crates.io. They might just do something like this. They just import our library, by type, colour function that I didn't show, but it's called zeroes. They might need to use unsafe, because they need to go super fast with this thing. They will just use "get_unchecked", and, if we run this, it returns zero. It works as intended. Some did you might be asking if you do this, you call this function with a ... index. We will get to that later. Yes, that's the demo. And the big question now is, well, actually, what can go wrong when you use unsafe? You might have answers if you're using it wrong, you're causing undefined behaviour, or undefined behaviour is super bad. Anything can happen when you deliver undefined behaviour. Let's discuss a little bit undefined behaviour. Let's say the Rust compiler was written under the same assumption how programs work, about the programs we write, we write programs that need to meet certain conditions so the compiler can actually compile them into what we want. If we break any of these rules, we say we are calling undefined behaviour. 
As Stefan said, this is like a way of saying if there is something that is not specified in a clear way, if the compiler is trusting that to happen and you're breaking that rule, then you're causing undefined behaviour. There is something super important in that undefined behaviour is different in each language. C has a lot of rules for undefined behaviour, and those rules are not the same. For example, whatever Stefan told you about adding an integer, and going out of bounds and adding too much to your integer because it can feed a number too big, that's not undefined behaviour in risk, but that's undefined behaviour in C. Because both compilers were built with different guarantees in mind. Actually, the list of things that was considered important rules when we are dealing with undefined behaviour is a little bit tricky, so I'm just going to mention some of them. Your program might have undefined behaviour if you're the referencing pointer that is dangling unaligned. Also, if you try to produce or produce a value that is incorrect for their type, so, for example, Booleans, when you look at the actual memory, let's say, Booleans are represented by bytes. They take one byte exactly, so you have a one or a zero, but a byte has, like, eight bits, so you have a lot of values that you could use. So, one is true, zero is false, but if you take a three, and you try put that an into a Boolean, doing that is ... behaviour because three is not specified as a Boolean. The Boolean should not know whether a to do if it sees a three, on one, or a zero. Causing that is also undefined behaviour, and there are lots of rules that need to be taken into account here. So what happens if you break these rules? Basically, Rust cannot work correctly. We lose this guarantee that Rust has that of producing programs that do what we want them to do. 
Rust can no longer compile that program correctly, so what this means is that, in the best case, your program might not run, maybe it pros receives them into a folder, memory out of bounds error, or something like that. In that case, it might run, but not as you intended to, so that program might do anything. For that reason, it's pretty common to see this kind of psychedelic image with unicorns, and a lot of colourful stuff when people discuss undefined behaviour because when we deal with undefined behaviour, we lose track of what our program is doing in the most basic level. We don't even know any more. So there is good gnaws for us in the Rust community. If we are using safe Rust, if we promise never, ever, ever to use unsafe, we don't have to worry about undefined behaviours because undefined behaviours should not be happening inside Rust. If you are super sure you're not causing undefined behaviour and you get performance benefits, or you can interact with C libraries correctly, and you've got undefined behaviour, that is also good. There are also not such good news, and that is the super important part of our ecosystem. If we're not causing undesirable behaviour ourselves, someone else in our dependencies might be doing. Mere, I have interesting statistics about this. 24% of all the crates that had in crates.io uses them safe directly. And - of those 20% crates, all those crates, 74% of them do unsafe calls to functions that are in the same crate, so our crates using unsafe to Saul function that in the standard library, or, in other crates? If you want to get more information about this matrix, you can Google or use your favourite web-search engine to look for this paper about how do programmers use unsafe Rust? My point is that unsafe is everywhere, not because people aren't good at doing their job, because we actually need it. It's everywhere. I also have good news. There is a tool that you can use to detect undefined behaviour in our programs, called Miri. 
If you want to take a look at the Miri repository now or later, this is the URL. You can find all the coding there. So, what is Miri? It is a virtual machine for Rust programs. Miri doesn't compile your program, it interprets it in the same sense that the JVM interprets the other code, or byte code, or the Python interpreter runs Python, or the ... Miri is like that but for Rust. It has a super cool feature that none of the other interpreters has, and it is that it can detect almost cases of undefined behaviour while running code. What is interesting is that am so. Code used in Miri is used in the engine that does compile time function evaluation, so, if you have any c assistant in your programs, you have a const function, part of the code is used to run. It is used to evaluate that scant. "Yes, ... but here we are talking as Miri is just a standalone tool outside the compiler that can interpret your programs. So how to use Miri? You need the version of the fire to do this. You have to install the nightly tool chain. You can do this by running the Rustup tool chain install nightly. You can install the component. You just have to do rustup - and then, after Miri installs, it takes a while compiling but you can run binaries, you can run your whole program if you want with Miri, or you can run just your test suite if you have a test. Let's do a demo with the same code I was showing you before. Again, we have these super tiny program using an external crate, let's say. And maybe the person that is writing this program doesn't know about the garden at this time that that crate has to be sure that these functions don't cause undefined behaviour. You might be attempted to do something like can I read the 11th precision of an array with ten limits? Who is stopping me? The compiler is not complaining. It works. It actually returns zero. That is a perfectly good good value because it returns the same as before. 
If you run this with Miri, you will find this super cool error that says undefined behaviour. Pointer to allocation was de-referenced after this allocation got freed. It points to the part of the code that causes this undefined behaviour and is appointed a reference. You can see more information and so on. What we are looking for here is what it happening in the execution of Miri is that this function is creating a pointer that is dangling. You created a pointer that is outside the actual range of the vectors, so, when the vector gets deleted because it is deleted after everyone has used it, you still have this pointer pointing to nothing. But, for example, if we go back to the perfect case where we didn't have any undefined behaviour, we can just do cargo Miri run, and Miri won't complain and return the same as your regular program. So that is how we can use Miri, use it to detect undefined behaviour. But, now I want to show you, I want to, because it's a little bit how Miri works, actually. To talk about how Miri works, we have to dig into how the Rust compiler works. So this is like a super high-level overview of the Rust compilation pipeline. This is like the lists that a program follows when it is getting compiled. So we always start a source code with our .rs file and end up in machine code, or in a binary, or dynamic library, something like that. And what happens in the middle are like four stages. The first one is parsing. So Rust reads the text of your source code, let's say, and parses it to produce an abstract syntax three or AST. Then this AST is transformed and produces a high-level intermediate representation, or HIR. In this stage, it is where the process happens, the typing happens, so a lot of types are here at this stage. Then the HIR is lowered to another representation into the MIR presentation, but this is a mid-level representation, MIR. This is where the borrow-checking happens. 
And after that, we start interacting with LLVM, that is for compilers that Rust uses, and the LLVM project has their own intermediate representation, so we lowered MIR to the - and finally, LLVM does the code generation to produce your binary file, or your library, and so on. Miri works almost in this way. The only difference is the code generation stages don't run so, we don't get to talk with LLVM. What happens is that Miri lets the compiler run until you have the review program, and we interprets that. When it has byte coat in the JVM, Rust has MIR when running Miri. That's why Miri is called Miri, because it's an MIR interpreter. Here is something super important, that Miri cannot interpret programs that aren't Rust programs. So you have a C library that you run with your Rust program. We can't interpret that in any way. That program doesn't have the same syntax, the compiler doesn't even understand that program. You can interpret that. And there are many limitations, actually. There are some limitation that is Miri has. Miri is not perfect. It's not a silver bullet for your undefined behaviour problems. Another limitation is that Miri is slow, so, if your test or program is performance-sensitive, it consolidate can take a while to run your program, even if you can do it. This happens because Miri has to do a lot of runtime checks about your pointers, and how memory is managed to be able to tell you when undefined behaviour is happening. And the other important point is that Miri can only detect undefined behaviour as it is happening. If that doesn't happen, Miri won't be you-of-use in this case. Miri cannot detect data races yet. And, again, Miri can only interpret Rust programs. This one is super important. You might be wondering why does this matter? And it is because, well, you know, programs don't run in isolation. 
We tend to use files, we tend to access files, get resources over the network, interact with databases, we need to go to the primitive of our system, whatever. And the mechanism that Rust uses to interact with, those are for foreign functions. That is what this last part is about, foreign functions. Some of us might be, let's say, might be think that we don't need foreign functions at all. Maybe we have never used external functions in our projects. But I'm not sure - I'm sure everyone, or almost everyone, has interacted with the standard library to do standard operation reading files, whatever, and that means somehow you're using foreign functions. For example, this is the stack trace when you call file::open. It's on the library for opening files. There are six functions here. The first two are like Rust functions that are in the standard library. They are platform-independent. Then we have like four functions that are specific for Unix-like platforms, so those only run on Linux, MacOS. And then we have this open 64 function at the end. The only part about this open '64 function - the open64 function it's a Rust function - it's an Linux system used to open a file. So this is a foreign function written in C. And it is an unsafe function, and Miri cannot interpret it, so what happens if in any of this process would be we have undefined behaviour? Can we run that? It can interpret the 64 function. The good news is that Miri can actually run your program in a particularly interesting way. And, yes, Miri cannot interpret your foreign function, but it can intercept this call, so, when you're running your program, if you call open64, meaning it will be someone calling open64, that's a foreign function that I don't know, that's not a Rust function, and then contributors can write whatever code they want to emulate that function. We call the code that emulates a function an shim. 
And if a shim needs to interact with the operating system, or with any of the resources that the standard library provides, we use the standard library for that. So it is funny: the standard library uses foreign functions, but Miri uses the standard library to emulate some of those foreign functions. Let me show you. We continue our example with the byte-array crate. We have a user that reads the size and index it wants to use from a config file. It uses File::open, so eventually it will use open64. And we are doing the same as before: we're just printing something in unsafe code. If we try to run this with Miri, we will get an error, but it's not because we are causing undefined behaviour. We get: open is not available when isolation is enabled. This is the open64 function I was talking about. Open is not available; please pass the flag to disable this isolation. So if we do that, and set the Miri flags to disable isolation, we can actually run it. In this case, it seems the config file is causing undefined behaviour. It says: memory access failed, pointer must be in bounds at offset 11, but it is outside the bounds of the allocation, which has size 10. It seems like someone is reading the 11th element of an array that only has ten. Yes, the 11th element really is at position 10 if you want to think in zero-based indexes. And that's the whole problem. So, yes, we can use Miri to detect undefined behaviour, even in programs that use foreign functions. That's super cool. And, actually, beyond opening files, the shims can do a lot of stuff. You can manage directories, delete files, create symbolic links, you can spawn threads, use locks and atomics, you can get the current time, so you can run clocks inside your Rust program running under Miri, you can handle environment variables, and each of those operations is possible because someone decided to write a shim for that specific foreign function. And this has a super cool side effect, well, not just a side effect. 
People worked hard to get this working the way the standard library does: across many platforms. You can use File::open no matter which platform you're on. So this means that you can emulate foreign functions even if you are not on the platform the program is going to be compiled for. For example, if you have a program that is supposed to run on Windows but you don't have a Windows machine, you can use Miri to interpret that program as if it were a Windows program. Let me show you. Here we have another user of our library. This time, it is using environment variables to set the size and index it wants to read. Miri can emulate an environment inside it, so we can use the size environment variable to set the size of the array. We set the index to 1 because we want the run to succeed, and we disable the isolation. I'm using a Linux machine, and I'm going to run it for a target that is Windows. I don't have Windows installed here. And it works. If I want to run it on anything else, I can do it: I'm running it on my regular Linux target, and it is working. This is super cool, because maybe you can use Miri in a situation where you're not sure the code you wrote specifically for Windows is working correctly. Even if you're not using unsafe, you just want to be sure your program runs as intended. And, yes, basically, that's the hard content of my talk, and I want to spend the last few minutes talking about contributing to Miri. If anything here caught your attention, I encourage you to contribute to Miri, for many reasons. My personal reason is that I always wanted to work on compilers because I find them super interesting, I really like Rust, and I didn't know where to start. So I found Miri. I said, okay, I could implement maybe some foreign functions for opening files, whatever; it sounds not too hard. 
It took a while until I understood the Miri-specific stuff, but it helped me a lot to understand how the compiler works, and to get involved in other stuff that I wouldn't have been able to do otherwise. Even then, if you don't feel comfortable contributing yet, you can help the project by just using it. Maybe you want to use it because you actually write unsafe code and you want to be sure you're not causing undefined behaviour, or some of your dependencies use unsafe and you want to be sure they don't cause any undefined behaviour. So there is that. You can save yourself, and a lot of others, many, many hours of debugging, and learn how undefined behaviour works. Maybe you're expecting Miri to catch something and it doesn't, or maybe it is the other way round: you think this program is correct, and Miri is complaining. You can contact the contributors to discuss it, or the Unsafe Working Group as well. There is something super important, and this is not an obligation you have to the community: if your program runs really slowly in Miri, that's fine; you don't have to justify that to anyone. But if you're super interested in contributing to Miri, writing shims is a super easy way to start. And if you want to try it yourself, it is super cool, because what you actually have to learn is your platform's specification: how should that foreign function work? The stuff you need to learn about Miri itself is really small. You don't need to know how Miri works completely to do that. I don't know how Miri works completely; I use some little parts here and there, and I implemented a lot of things because I liked them. Even if you don't need that shim, maybe someone else needs it, and then you're not just testing for undefined behaviour; you're helping everyone write better and safer code, because a lot of people depend on these things. 
If you want something specific: for Windows, many of the shims haven't been implemented yet, and that is fine, because you can cross-interpret as if you were on Windows while actually on Linux. For example, if I go back to the program that opens the file, and I try to run it with the Windows target, it will fail, but it will fail because this function, CreateFileW, hasn't been implemented yet. Maybe one of you wants to do it. There's a bunch of stuff that hasn't been implemented yet. That's all, so thank you for your time. I hope you found this interesting, and I think we can do some questions now if you want. -STEFAN: We have some questions. I will quickly optimise the audio. So, the first one - oh, no, before we forget, we have an intro for you. I'm sorry, I forgot to share it. Here is your introduction. -BARD: Miri is Rust's interpreter, and Christian will gladly debate her, on how to bequeath the stuff underneath, so she can run until much later. -> That's really cool. -STEFAN: Glad you like it. Where were we? Yes, the question: the error about the 11th element sometimes talks about an allocation that was freed, sometimes about being out of bounds, so I guess the question is: how far can it track stuff? -> Yes, this is not entirely clear to me either, actually. Sometimes this program fails because, when Miri interprets it, it frees the memory for the array before you read the pointer, so it complains about memory being freed; and sometimes the array has not been dropped yet, so it complains about an out-of-bounds access, even though the array is still there. The good news is that either of those is undefined behaviour. Miri tries to be as deterministic as it can, but when you disable isolation, for example, it's really hard to be deterministic, because if you change your file, that might change how everything works internally. 
-STEFAN: So there is a second question: when Miri's engine is used to execute const code during compilation, does it run in a fast mode with less validation, and how do you assess the difference? -> I don't know a lot about that part, but Miri inside the compiler runs without a lot of the validations it runs when you use it standalone. The in-compiler version is faster than what I showed you, but that's because it does fewer checks. Let's say the engine is the same, the same engine but in a different car. -STEFAN: But we don't have dynamic evaluations in const eval. -> There is a flag in Rust, something with "unleash" in the name, that lets you run, let's say, unrestricted constant evaluation, and most of the time it breaks the compiler, but, yes, you can actually run whatever you want using Miri inside the compiler. But that is super experimental. -STEFAN: So, long-term, one could have a VM, like a fully functioning VM, in Miri? -> In principle, yes, but there are a lot of questions. For example, if you read a file and use it to, I don't know, create some const or define a type, that makes your compilation unsound, because every time you read the file it might change; the same goes for using random-number generators. -STEFAN: Maybe I can introduce my own question here: do you think, in a very distant future, it will be possible to include Miri in a binary, to have Rust as a scripting language inside your Rust program? -> Oh, wow. I have no idea! I remember reading that someone was writing an interpreter so you could use Rust like a REPL. I don't know what happened to that project. -STEFAN: Was this the Python-like thing? -> No, it was a little bit different, because it didn't run Rust but MIR; you had to write the MIR of your program together with the Rust code. -STEFAN: Okay. Interesting. Another question from the audience: would it be possible to do this kind of analysis on general LLVM IR? -> I'm tempted to say yes, yes, you could. 
The thing is that you don't have a lot of the type information you have when you're interpreting the MIR. In the MIR, you have a lifetime for every single value. I don't know if you can do that in LLVM IR. In principle, yes, you could build, for example, a stack-borrows-style model for LLVM IR, but the question is whether you can actually build it. -STEFAN: You would have to add a lot of metadata, because the types maybe disappear in ... -> Yes, it's harder, but I believe it's possible to do that. -STEFAN: Is there anything you would like to show off, like a final use case, or an idea - hey, if someone is bored, maybe give this project a shot? -> Yes, actually, let me open a new Firefox window here. If you're bored and you want to do something inside Miri, we have a lot of issues here. We have this label, the shims label; there are a lot of tiny problems there. For example, Miri doesn't support custom allocators, and in the latest version Box now allows a custom allocator, so it is super important now to have a way to use a custom allocator in Miri, to test with different allocators, for example. If you're bored, you can grab any of those issues. -STEFAN: Cool. I'm looking forward to a stable Box with custom allocator support. That will be very interesting. Wonderful. I think we have reached the end. I don't see any more questions. It was very well received, and a great talk. Thank you again. Ferrous thanks you as well. -> Thanks so much. -STEFAN: Will you be in the chat afterwards? -> Yes, I will hang around a little bit in the chat. -STEFAN: Wonderful. So, thanks, everyone, for listening, and the final talk will commence in ten-ish minutes. There will be some announcements before and after, so stick around. Also, we have two artists coming up after the last talk. Right, thank you, everybody. Bye. 
\ No newline at end of file diff --git a/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.md b/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.md new file mode 100644 index 0000000..a29cd03 --- /dev/null +++ b/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.md @@ -0,0 +1,233 @@ +**RFC: Secret Types in Rust**
+
+**Bard:**
+Daan and Diane get us to the hype
+Of keeping secrets in a type
+Disallowing creation
+of some optimization
+that just might tell the feds what you type
+
+**Daan:**
+Hello, everybody.
+I'm here with Diane Hosfelt, and we will be talking about secret types in Rust.
+The main gist of this talk: some of you may know that cryptographic engineers tend to write a lot of their code in assembly,
+and there is a good reason for that, which I will explain, but, as a cryptographic engineer, or aspiring cryptographic engineer, I would rather write it in Rust instead.
+Because of some of the compilation quirks in Rust, that's not always a good idea, and I will explain what needs to be done to make Rust a programming language we can use for cryptographic code.
+Both Diane and I are here at the conference and in the chat, so feel free to ask any questions at the end of the talk, or put them in the chat during the talk, and we will take care of them.
+
+**Diane:**
+Hi, I'm Diane Hosfelt, and this is Batman.
+Before we get started, I have a short disclaimer.
+All of this work was done while I was a Mozilla employee and it in no way reflects Apple's views.
+
+**Daan:**
+First, we will talk about how timing side channels work, what they are, and why they are dangerous,
+and then we will talk about why Rust is not suitable for writing code that actually prevents these channels.
+We will look at a couple of hacks that we could use to prevent some of these channels in Rust,
+but then we will go more in depth and look at the RFC on secret types to see how we could make Rust suitable for such code. 
+
+So, first, ...
+
+**Diane:**
+A side channel is any attack based on information gained from the implementation of a crypto system, not a weakness in the system itself.
+In this case, we are concerned about timing side channels, which occur when attackers analyse the time taken to execute a cryptographic algorithm, which can be seen as an implicit output.
+Imagine it takes less time to execute part of the code when a bit is zero than it does when a bit is one.
+That difference is measurable, and it can lead to key recovery attacks.
+These attacks are a threat in the post-Spectre world, primarily used to attack secrets that are long-lived and extremely valuable if compromised,
+where each bit compromised provides incremental value and a covert compromise is desirable.
+The fix is constant-time code, or, to be more precise, data-invariant code, where the time it takes to execute the code doesn't depend on the input.
+
+**Daan:**
+Let me explain why, at this point, it's really hard for us to guarantee that compiled code is constant time.
+This story will be true for basically any programming language that is compiled.
+There are some exceptions.
+But we are at a Rust conference, so let's focus on Rust.
+
+So the main problem here is that compilers are in some sense problematic.
+They are allowed to optimise anything they believe does not change the program.
+And the behaviour - the runtime of a program, or stuff like that - is not considered to change the program in the view of the compiler, so the compiler might actually optimise stuff that we don't think should be possible.
+For example, there is this thing that LLVM can do, which is to eliminate conditional moves like the one we are about to see.
+Let me show you an example of this.
+
+So what you see here on the left is that I have written this nice little CMOV function: if the conditional value is true, it should move the value in B into A. 
+And if the conditional value is false, then A should just keep the same value, and B should just be dropped, by the way.
+But the important thing here is that the conditional value is secret.
+
+We don't want to leak any information about the secret value, so the runtime of this function should always be the same duration, regardless of the value of the conditional.
+So what we do first is generate a mask from the conditional value: if the conditional value is true, the mask will be all ones, and if the conditional value is false, it will be a mask of only zeros.
+And then we use this mask.
+
+So the first line here: if this mask is all ones - so if the conditional was true - it XORs the value in A with itself, which sets the value in A to zero.
+Then, only if the conditional value was true, it XORs again with B.
+A will get the value of B.
+And if the conditional value was false, then this mask is all zeros,
+and both of these AND operations make each XOR operand zero, so both XORs are no-ops, and A keeps the same value.
+
+What we see when LLVM compiles this - when Rust compiles it - is that the compiler is smart.
+The compiler sees that the behaviour of this function completely depends on the conditional value,
+so first it checks whether the conditional value is actually zero: is it false?
+If it sees that the conditional value is true, it jumps over this instruction, so it skips it completely.
+And if the conditional value was false, it just executes the move instruction and falls through, and that's it. 
+
+Basically, in one of the two cases, an instruction is skipped, and the important thing to see here is that,
+depending on the value of the conditional, the runtime of the algorithm changes, and so here we have a case where the compiler introduced a timing side channel.
+The interesting thing is that if we only look at the source code in Rust, it looks like code that could, or should, be compiled to constant-time machine code.
+We have these masking operations, yet you don't even see them in the compiled code, because LLVM is smart enough to see that they're not needed.
+And this is actually a pretty big danger for us.
+So that is what we mean when we say compilers are problematic.
+
+**Diane:**
+Obviously, we're at RustFest, so we've all bought into Rust, but the question remains: if we can do secret-invariant programming with assembly, why do we need to do it in Rust at all?
+Writing cryptographic code in high-level languages like Rust is attractive for numerous reasons.
+
+First, they're generally more readable and accessible to developers and reviewers, leading to higher-quality, more secure code.
+Second, it allows the integration of cryptographic code with the rest of an application without the use of FFI.
+Finally, we are motivated to have a reference implementation for algorithms that is portable to architectures that might not be supported by highly optimised assembly implementations.
+
+**Daan:**
+So, then, why do we focus on Rust?
+If we can't write secure code in it, why do we want to use Rust in the first place?
+
+Obviously, everybody here has their own idea of why they would use Rust, and in our case it's much the same.
+The reason we want to use Rust is that we have all these types, and all these nice checks in the compiler, that make it easier to write secure code. 
+And we want to utilise these checks and tools as much as possible, because writing plain assembly is really hard and super error-prone. And then there's the other thing: if you only write assembly, you've written assembly for an Intel processor.
+
+When you want to run the same code on an ARM processor, you have to rewrite the whole thing.
+We don't want to do that, because it also allows you to make more mistakes, and we want our crypto code to be really secure, so we would like to use a high-level language if that is at all possible.
+So it is not all that bad.
+There are some ways in which Rust can be side-channel resistant.
+In Rust, one can make these newtype-style wrappers around integers: a struct that only holds some integer type, with operations implemented on it that are presumably constant time.
+
+There are two main examples in the wild.
+The first one is the subtle crate: if you ever need to do some stuff in constant time, use this crate.
+This is the current state of the art that we have.
+This is probably what you should use, and we don't have anything better at the moment. The other example that I would like to mention is the secret-integers crate, which is a bit more academic in nature.
+What it does is look at what would happen if we replaced all of the integer types with constant-time integer types: would that work?
+The secret-integers crate provides side-channel resistance at the language level, so, at the language level, you're sure that your code looks like something that should be side-channel resistant, but it does not actually prevent these compiler optimisations.
+The subtle crate does try to do that, and that's why I recommend that crate.
+
+Both of those crates are only a best effort, and they don't fix all of the compiler optimisation issues.
+So, that is the language level. 
+We can also look more at the compiler level: what do we need to do in a compiler to actually do this right? It turns out we need to add some kind of optimisation barrier for the secret data.
+
+Let me go back to the example really quickly.
+It turns out that the problem here is that LLVM is able to completely eliminate this mask variable. This mask variable is secret, because it directly depends on the conditional value, which we said was secret.
+And because LLVM can just analyse through this mask variable, it can do this really nice optimisation of adding a branch and then eliminating all these bitwise operations.
+We need to add an optimisation barrier to this mask variable.
+
+There are a couple of ways we can add optimisation barriers. The first is to add an optimisation barrier by adding an empty assembly directive.
+We construct an assembly directive which takes this mask value as an input and also returns this mask value as an output.
+LLVM is not able to reason about what happens inside of an assembly directive.
+
+We know that nothing happens inside the assembly directive, but LLVM cannot reason about that.
+Because of this, it will keep this mask value completely intact, and it will not be able to optimise through that variable.
+However, the assembly directive doesn't work on stable Rust: for the assembly directive, you need a nightly Rust version to compile, so that is not really optimal.
+
+The other trick that we can use is to do a volatile read of the secret data,
+and what this does is guarantee that at some point this mask value will have existed on the stack,
+and because of that, LLVM is not able to optimise through this read.
+
+Both tricks work in maybe 90% of the cases.
+They do not have a 100% success rate for all our cases.
+I won't go into why that is at this moment, but it's important to know that they don't always work.
+They're best-effort tricks. 
+
+The most important part is that although these tricks might work at the moment, they are not guarantees,
+and the compiler might change in the future.
+Perhaps in five years the compiler will actually be able to look into this assembly directive, see that nothing happens,
+and eliminate the assembly directive completely. We have no guarantee that this kind of thing won't happen in the future,
+so it might be that, in a couple of years, a version of some software that is completely secure now becomes insecure with a new compiler version, which I find very scary.
+So, yes, we would like to have guarantees, and we don't want to have just hacks.
+For the next part, I will give the floor to Diane, and she will be talking about how we can use secret types in Rust to make our lives a little bit better.
+
+**Diane:**
+Why aren't these language-level protections good enough?
+Because of the compiler and the instructions themselves.
+It turns out that the general-purpose instructions on various platforms take a variable number of cycles,
+so for us to truly have secret-independent runtimes, we need to eliminate the problematic instructions.
+This can only be done at the compiler level.
+
+Enter RFC 2859.
+This defines secret primitive types and the instructions that should be defined for each type.
+For all the secret types, we implement all of the normal acceptable operations.
+When we know that a value is safe, we can use the declassify operation to turn it back into a public value.
+
+For example, a secret key may be an array of secret u8 bytes, and we keep it secure by disallowing any Rust code that would result in insecure binary code.
+For example, we don't allow indexing based on secrets, we don't allow using a secret bool in if statements,
+we don't allow division, which is a non-constant-time operation, and we don't allow printing of secret values,
+and we say that every time we combine a public value with a secret value, the result is also a secret. 
+
+To give you an example of how this would work, here's a mock-up error message of what would happen if we broke one of these rules.
+Here, the programmer chose to branch on a secret bool.
+In this case, the compiler should give us an error, because that is not allowed.
+
+There are two parts to this problem: an LLVM part and a Rust part.
+There has been some work in the LLVM realm to propose an RFC similar to this one, which we worked on together at Hacks 2020.
+We're not sure what the status of that work is at the moment, but what LLVM needs to do is make sure that our constant-time Rust code is also compiled safely.
+LLVM needs to guarantee that what we wrote down in the code stays safe in the emitted binary: that means no branching on secrets, no indexing with secret values, and no variable-time instructions.
+At the moment, zeroing memory is out of scope, but once we have this information about public and secret values, we've laid the groundwork to support that as well.
+
+Thank you so much for your attention.
+If you have any questions, feel free to ask us.
+While this is a recorded talk, we are currently present and ready to answer questions.
+
+**Pilar:**
+All right.
+Thank you so much, Daan and Diane.
+We're lucky enough to have you both here for a Q&A.
+All right, so you've been joined by your friend too! [Laughter].
+
+**Diane:**
+Batman came back.
+
+**Pilar:**
+Entirely ignored during the day.
+We do have a couple of questions from the audience, which is great.
+The first one we've got: this is all very complex.
+How do you discover these kinds of problems, and how do you even begin to think of a solution?
+Very broad, but I think it would be great to hear your insight on this. 
+
+**Diane:**
+So there are tools that you can use, verification tools, that can determine whether different inputs lead to different runtimes, so that is one of the ways you can determine whether a program's runtime is secret-independent. That covers part of it. Daan?
+
+**Daan:**
+Yes, the way we discover these kinds of issues is that, at some point, I sometimes have to write a piece of assembly,
+and the first thing I do before writing it is to program it in C or Rust and see what the compiler tells me to do,
+and those are the moments I stumble on these cases: "Wait, if I did this, this would not be secure."
+And that's how I first discovered this for myself, so, yes.
+
+**Pilar:**
+Cool. So the insight is: go with what the compiler says first, and then you can discover it.
+Be curious about what the compiler tells you, not just, all right.
+Someone has asked if there is a working group working on a solution?
+
+**Diane:**
+There isn't a working group.
+There is just the RFC, which has been a little bit stale, because, you know, life gets busy.
+So if anyone's interested in commenting on the RFC and trying to help me bring it back to life, that is definitely welcome.
+
+**Pilar:**
+If there is interest in a working group, then, yes, maybe someone will hop on from the audience.
+
+**Diane:**
+That would be great.
+One of the things that needs to happen, on the Rust side and on the LLVM side, is that we are eventually going to have to do some implementation work.
+You know, it's not enough just to define what has to happen. We have to implement these instructions on the secret types, so that will actually be a lot of work.
+
+**Pilar:**
+So we have very little time left, but there was a lot of chatter in the chat room, so I guess people can find you in there and ask a few more questions.
+There were lots of questions, and we just didn't have enough time, but thank you so much for joining us. 
+It was great to have you here. +She's asleep. She's melted into a puddle! + +**Diane:** +Say bye to your new friend. + +**Pilar:** +See you both. Thank you for joining us. + +**Diane:** +Thanks so much! diff --git a/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.txt b/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.txt deleted file mode 100644 index 9f47895..0000000 --- a/2020-global/talks/02_UTC/07-Diane-Hosfelt-and-Daan-Sprenkels.txt +++ /dev/null @@ -1,51 +0,0 @@ -RFC - Secret Types in Rust - Diane Hosfelt and Daan Sprenkels. -PILAR: Welcome back, everyone. I would say that I'm disappointed that it's our last talk of the day, but I can reassure you that it's an amazing one, and I mean, there's a whole lot still coming up, so we will be together for many hours to come, and it's already been a great day, so it's a bit much to ask for more! So, our last talk of the UTC block, our last - it is by Daan Sprenkels, and Diane Hosfelt. I hope I've pronounced those correctly, as someone with a difficult, in the meantime, I'm like ... Daan is a PhD student, and he considers himself an aspiring cryptographic engineer. Diane Hosfelt is a privacy and security researcher in Pittsburgh, has an enormous love for cats and castles, cats in castles, and Rust, and I can think of no two better people to tell us and educate us why our binaries are insecure, and why we need secret types. I let our Bard give us our last limerick of the UTC block. Let's take it away. -BARD: Daan and Diane get to us the hype of keeping secrets in a type, thus allowing creation of some optimisation that just might tell the FEDs what you type. -> Hello, everybody. I'm here with Diane Hosfelt, and we will be talking about secret types in Rust. 
So the main gist of this talk will be that some of you may know that cryptographic engineers tend to write a lot of their code in assembly, and there is a good reason for that, and I will explain why that is, but, as a cryptographic engineer, or aspiring cryptographic engineer, I have to write it in Rust instead. Because of some of the compilation quirks in Rust, that's not always a good idea, and what needs to be done to make Rust programming language we can use for cryptographic code. Both Diane and me are here in the at a conference and in the chat, so feel free to ask any questions at the end of the talk, or put them in the chat during the talk, and we will take care of them. -> Hi, I'm Diane Hosfelt, and this is Batman. Before we get started, I have a short disclaimer. All of this work was done while I was a Mozilla employee and it in no way reflects Apple's views. -> First, we will talk about timing side channels work, what they are, why are they dangerous, and then we will talk about how Rust is not suitable to write code that is actually - that actually prevents these channels. We will look at a couple of hacks that we could use to prevent some of these channels in Rust, but then we will go more in depth and look at the RSC on secret types to see how we could make Rust for suitable for such code. So, first, ... -> A side channel is any attack based on information gained from the implementation of a crypto system, not a weakness in the system itself. In this case, we are concerned about timing side chapels which occur when attackers analyse the tame taken to execute a cryptographic engineer algorithm - *a cryptographic algorithm which can be seen as an implicit output. Imagine it takes less time to execute part of the code when a bit is zero than when it does when a bit is one. That difference is measurable, and it lead to key recovery attacks. 
These attacks are a threat in the post-Spectre world, primarily used to attack secrets that are long-lived and extremely valuable if compromised, where each bit compromised provides incremental value and where concealment of the compromise is desirable. The fix is constant time code, or to be more precise, data invariant code, where the time it takes to execute the code doesn't depend on the input. -> Let me explain to you why at this point it's really hard for us to guarantee that compiled code is constant time. This story will be true for basically any programming language that is compiled - there are some exceptions - but we are at a Rust conference, so let's focus on Rust. The main problem here is that compilers are in some sense problematic. They are allowed to optimise anything they decide does not change the program, and the behaviour - like the runtime of a program, or stuff like that - is not considered to change the program in the view of a compiler, so the compiler might actually optimise stuff that we don't think should be possible. For example, there is this thing that LLVM can do, which is eliminate conditional moves. Let me show you an example of this. Okay. So what you see here on the left is this nice little cmov function I have written: if the conditional value is true, it should move the value in B into A, and if the conditional value is false, then A should just keep the same value and B is just dropped, by the way. But the important thing here is that the conditional value is secret. We don't want to leak any information about the secret value, so the runtime of this function should always be the same duration, regardless of the value of this conditional value.
So what we do first is we generate a mask from this conditional value, and the value that comes out of this will be either a mask of all ones, if the conditional value is true, or a mask of all zeros, if the conditional value is false. Then we use this mask. The first line here: if this mask is all ones - so if the conditional was true - it will XOR the value in A with itself, which sets the value in A to zero. Then, only if this conditional value was true, it will XOR again with B, and A will get the value of B. And if this conditional value was false, then this mask would be all zeros, both of these AND operations will make sure the operand is zero, both of these XOR operations would be no-ops, and A keeps the same value. What we see when LLVM compiles this - when Rust compiles it - is that the compiler is smart. The compiler sees that the behaviour of this function completely depends on this conditional value, so first it checks whether this conditional value is actually zero - is it false? If it sees that the conditional value is true, it jumps past this instruction, so it skips this complete instruction. And if this conditional value was false, then it just does this instruction and then moves on to the next instruction, and that's it. Basically, in one of the two cases an instruction is skipped, and the important thing to see here is that depending on the value of the conditional, the runtime of the algorithm changes, so here we have a case where the compiler introduced a branch, which would be a side channel. The interesting thing is that if we only look at the source code in Rust, it looks like code that could, or should, be implemented in constant time.
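The masked cmov Daan walks through can be sketched in Rust roughly like this (a minimal sketch; the name `ct_cmov` and the `u32` width are our choices, and, as the talk stresses, LLVM may still compile even this into a branch):

```rust
// Constant-time conditional move, as described above:
// `mask` is all ones when `cond` is true and all zeros when false,
// so the same AND/XOR instructions run regardless of the secret.
fn ct_cmov(cond: bool, a: &mut u32, b: u32) {
    // 1u32.wrapping_neg() == 0xFFFF_FFFF; 0u32.wrapping_neg() == 0.
    let mask = (cond as u32).wrapping_neg();
    // Step 1: if mask is all ones, this zeroes `a`; otherwise a no-op.
    *a ^= *a & mask;
    // Step 2: if mask is all ones, this copies `b` into `a`; otherwise a no-op.
    *a ^= b & mask;
}
```

The point of the example is exactly that this source-level discipline is not enough: LLVM can prove the result depends only on `cond` and replace the whole dance with a conditional branch.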
We have these operations, and you don't even see them in the compiled code because LLVM is smart enough to see that they're not needed. And this is actually a pretty big danger for us. So that is what we mean when we say compilers are problematic. -> Obviously, we're at RustFest, so we've all bought into Rust, but the question remains: if we can do secret invariant programming with assembly, why do we need to do it in Rust at all? Writing cryptographic code in high-level languages like Rust is attractive for numerous reasons. First, they're generally more readable and accessible to developers and reviewers, leading to higher quality, more secure code. Second, it allows the integration of cryptographic code with the rest of an application without the use of FFI. Finally, we are motivated to have a portable reference implementation for algorithms, as some architectures might not be supported by highly optimised assembly implementations. -> So, then, why do we focus on Rust? If we can't write secure code in it, why do we want to use Rust in the first place? Obviously everybody here has their own idea of why they would use Rust, and in our case, it's kind of the same. The reason we want to use Rust is that we have all these types and all these nice checks in the compiler that make it easier to write secure code. We want to utilise these checks and these tools as much as possible, because writing plain assembly is really hard and super error-prone. And then there's the other thing: if you only write assembly, then you've written assembly for an Intel processor, and when you want to run the same code on an ARM processor, you have to rewrite the whole thing. We don't want to do that, because it also allows you to make more mistakes, and we want our crypto code to be really secure, so we would like to use a high-level language if that is at all possible. So it is not all that bad.
So there are some ways in which Rust can be made side-channel resistant, and there are a couple of these. In Rust, you can make these newtype-style wrappers around integers: a struct that only contains some integer type, and implement on it some operations that are presumably constant time. There are two main examples in the wild. The first one is the subtle crate: if you ever need to do some stuff in constant time, use this crate. This is the current state of the art that we have, this is probably what you should use, and we don't have anything better at the moment. The other example that I would like to mention is the secret-integers crate, which is a bit more academic in nature. What it does is look at: what if we replaced all of the integer types with a constant time integer type - would that work? What the secret-integers crate provides is side-channel resistance on the language level, so on the language level you're sure that your code looks like something that should be side-channel resistant, but it does not actually prevent these compiler optimisations. The subtle crate does that, and that's why I recommend that crate. Both of those crates are only a best effort - they don't fix all of the compiler optimisation issues. So that is the language level. We can also look more at the compiler level: what do we need to do in a compiler to actually do this right? It turns out we need to add some kind of optimisation barrier for the secret data. Let me go back to the example really quickly. It turns out that the problem here is that LLVM is able to completely eliminate this mask variable. This mask variable is secret, because it directly depends on this conditional value, which we said was secret.
And then, because LLVM can just analyse through this mask variable, it can do this really nice optimisation of adding a branch and then just eliminating all these bitwise operations. So we need to add an optimisation barrier to this mask variable, and there are a couple of ways that we can add optimisation barriers. The first example is that we can add an optimisation barrier by adding an empty assembly directive. We construct an assembly directive which takes this mask value as an input and also takes this mask value as an output. LLVM is not able to reason about what happens inside of an assembly directive. We know that nothing happens inside this assembly directive, but LLVM cannot reason about that, so it will actually keep this mask value completely intact and it will not be able to optimise through that variable. However, the assembly directive doesn't work on stable Rust: for the assembly directive, you need a nightly Rust version to compile, so that is not really optimal. The other trick that we can use is to do a volatile read of the secret data. What this does is guarantee that at some point this mask value will have existed on the stack, and because of that, LLVM is not able to optimise through this read. Both tricks kind of work in 90% of the cases. They do not have a 100% success rate for all our cases. I won't go into why that is at this moment, but it's important to know that they don't always work. They're best-effort tricks.
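The volatile-read trick can be sketched like this (a best-effort illustration; the names `value_barrier` and `ct_cmov_with_barrier` are ours, and, as Daan says, none of this is guaranteed by the language - on newer stable Rust, `std::hint::black_box` plays a similar best-effort role):

```rust
// Best-effort optimisation barrier: forcing the value through a
// volatile read means LLVM must assume it could be anything, so it
// cannot constant-fold the mask away or turn it into a branch.
fn value_barrier(x: u32) -> u32 {
    let v = x;
    // Safe: `v` is a valid, aligned, initialised local.
    unsafe { std::ptr::read_volatile(&v) }
}

// The masked cmov from before, with the barrier applied to the mask.
fn ct_cmov_with_barrier(cond: bool, a: &mut u32, b: u32) {
    let mask = value_barrier((cond as u32).wrapping_neg());
    *a ^= mask & (*a ^ b);
}
```

Functionally this is identical to the unprotected version; the barrier only changes what the optimiser is allowed to prove about `mask`.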
The most important part is that although these tricks might work at the moment, they are not guarantees, and the compiler might change in the future. Perhaps in five years, the compiler is actually able to look into this assembly directive, see that nothing happens, and eliminate that assembly directive completely. We don't have any guarantee that this kind of stuff won't happen in the future, so it might be that, in a couple of years, a completely secure version of some software might actually be insecure with a new compiler version, which I find very scary. So, yes, we would like to have guarantees, and we don't want to have just hacks. For the next part, I will give the floor to Diane, and she will be talking about how we can use secret types in Rust to make our lives a little bit better. -> Why aren't these language-level protections good enough? Because of the compiler and the instructions: it turns out that the general purpose instructions on various platforms take a variable number of cycles, so for us to truly have secret independent runtimes, we need to eliminate the problematic instructions. This can only be done at a compiler level. Enter RFC 2859. This defines secret primitive types and the instructions that should be defined for each type. For all the secret types, we implement all of the normal acceptable operations. When we know that a value is safe, we can use the declassify version to turn it back into a public value. For example, a secret key may be an array of secret u8 bytes. This keeps us secure by disallowing any Rust code that would result in insecure binary code. For example, we don't allow indexing based on secrets, we don't allow using a secret bool in if statements, and we don't allow division, which is a non-constant time operation. We don't allow printing of secret values, and we say that every time we combine a public value with a secret value, the result is also a secret.
To give you an example of how this would work, here's a mock-up error message of what would happen if we broke one of these rules. Here, the programmer chose to branch on a secret_bool. In this case, the compiler should give us an error, because that is not allowed. There are two parts to this problem: an LLVM part and a Rust part. There has been some work in the LLVM realm to propose a similar RFC to this one, which we worked on together at HACS 2020. We're not sure what the status of that work is at the moment, but what LLVM needs to do is make sure that our constant time Rust code is also compiled safely. LLVM needs to guarantee that what we wrote down in the code is safe in the emitted binary: that means no branching on secrets, no memory accesses indexed by secrets, and no variable time instructions. At the moment, zeroing memory is out of scope, but when we have this information about public and secret values, we've laid the groundwork to support that as well. Thank you so much for your attention. If you have any questions, feel free to ask us. While this is a recorded talk, we are currently present and ready to answer questions. -PILAR: All right. Thank you so much, Daan and Diane. We're lucky enough to have you both here for a Q&A. All right, so you've been joined by your friend too! [Laughter]. -> Batman came back. -PILAR: Entirely ignored during the day. We do have a couple of questions from the audience, which is great. The first one we've got is: this is all very complex - how do you discover these kinds of problems, and how do you even begin to think of a solution? Very broad, but I think it would be great to hear your insight on this. -> There are verification tools that you can use that can determine if, on different inputs, there are different runtimes, so that is one of the ways you can determine whether a program has runtimes that are not secret independent. For part of it. Daan?
-> Yes, the way we discover these kinds of issues is that, at some point, I have to write a piece of assembly, and the first thing I do before I write it is just program it in C or Rust and see what the compiler makes of it, and those are the moments that I stumble on these: "Wait, if I did this, this would not be secure." That's how I first discovered this for myself. -PILAR: Cool. You gave us an insight there: you go with what the compiler says first, and then you can discover it. Be curious about what the compiler tells you. Someone has asked if there is a working group working on a solution? -> There isn't a working group. There is just the RFC, which has been a little bit stale, because, you know, life gets busy. So if anyone's interested in commenting on the RFC and trying to help me bring it back to life, that is definitely welcome. -PILAR: If there is interest in a working group, then, yes, maybe someone will hop on from the audience. -> That would be great. One of the things that needs to happen on the Rust side and on the LLVM side is that we are eventually going to have to do some implementation work. You know, it's not enough just to define what has to happen. We have to implement these instructions on the secret types, so that will actually be a lot of work. -PILAR: So we have very little time left, but there was a lot of chatter in the chat room, so I guess people can find you in there, and we can get a few more questions. There were lots of questions, and we just didn't have enough time, but thank you so much for joining us. It was great to have you here. She's asleep. She's melted into a puddle! -> Say bye to your new friend. -PILAR: See you both. Thank you for joining us. -> Thanks so much! -PILAR: So I will bring in our co-MCs. What a great day, right? -STEFAN: Yes. -JESKE: Such a great day. -PILAR: Mine has also fallen asleep out of excitement.
-STEFAN: I felt the need to be cute when you brought out your animals. -PILAR: Yes, we should probably wrap up. That was our last talk for the day, or at least the UTC block. There is more coming up in the LATAM block. -JESKE: Please stay on board for the upcoming stuff if you can. And also for the next block. -PILAR: If you're feeling a bit like Fiona, take a little nap. Come back re-energised. -STEFAN: We have two artists coming up. -JESKE: Our beautiful next performer, Earth to Abigail, creates music with computer code, voice and various instruments. I'm excited for that. She integrates electronic soundscapes into her songwriting. I think it will be beautiful, and also relaxing, but also with a little bit of warmth in it for this day. -PILAR: And the other artist following up after that is - I can't pronounce the name - Aesthr. It's just written interestingly, but that's really cool. Aesthr. So, yes. The description I got, which sounds really cool, is that Aesthr is piloting a spaceship station made from wires, transistors, and a little witchcraft. We are in for a huge treat. We should thank all our lovely speakers, our amazing sponsors, all of you for being so great during the day, our sketchnoter as well, Malwine, and our captioner, Andrew - thank you so much. -JESKE: Thank you, everybody, for tuning in to the UTC time block. Everybody can check the recordings afterwards, and we wish everybody in the next block a lot of fun as well. -PILAR: Thanks again to our sponsors as well. -STEFAN: Yes. I think we have the LATAM team coming. We wish them all the best fun. -JESKE: I had a lot of fun today. You two? -PILAR: It was really great. -STEFAN: I hope we can keep this on.
If I may, I have this idea: so the next time, maybe we're allowed to meet in the hall again, right? We do this with the 24 hours, we do this again, but next time, we have two walls on each side of each building - like three buildings around the planet - and the walls project the camera feed from the next venue in that direction. Yes, ... -PILAR: It feels like we're in one building. That also sounds tiring, but sounds really great. -JESKE: We will do it somewhere where you can bring your dogs. -PILAR: I have three. This is just the calm one. -JESKE: I have zero, so that will be fine. I will adopt one for that day! -PILAR: You can borrow one of mine! You too, Stefan. Three dogs for three emcees. -STEFAN: Perfect. -PILAR: Thank you both to you two as well. It's been really great emceeing with you. -STEFAN: I think we will hand over now. -JESKE: Stefan, you're the technician of all three of us. Thank you, everybody, and we will see you, hopefully. -PILAR: See you at the LATAM block [Spanish spoken]. Ciao! \ No newline at end of file