Cody Goodman and Taylor Fausak explore the event log that GHC can produce when compiling or running. Episode 21 was published on 2019-10-07.
Hi! Welcome to the Haskell Weekly podcast. I'm your host Taylor Fausak. I'm the lead engineer at IT Pro TV. With me today is Cody, one of the engineers on my team. Thanks for joining me today, Cody.
>> Yeah, no problem. I'm excited to talk about our topic today.
>> Yeah what topic is that? What are we gonna be talking about today?
>> We came across the "Eventful GHC" blog post and hope to dig into that.
>> Sounds good to me! Let's jump right in. So what is this blog post about, Cody?
>> It's about the event log and how you can use it to get a picture of what GHC is doing not only about the events it already throws but adding your own user-defined events to get a complete picture of the things that you do care about in terms of time and space.
>> Sounds cool! I think I could probably guess based on context clues but what exactly is the event log?
>> It is just a generic list of events written to a file inside of the binary.
>> Alright so that sounds like a pretty lightweight file format. The way that I'm most used to looking at the performance of Haskell programs is using profiling,. How does the event log compare with profiling?
>> So a big advantage it has they mentioned it in a different order in the blog post but I think the last thing mentioned is: the biggest reason to use it; it has a lot less impact on performance and you can keep all your same optimizations you'd be running anyway.
>> That sounds nice.
>> And get an eye on those.
>> Yeah because I know that with profiling you have to rebuild basically everything, right?
>> Yeah, and it can really take quite a bit. Then you have something that you're uncomfortable without putting in production.
>> Right because we use Haskell here at IT Pro TV and if you try to rebuild our project with profiling, I think you're probably going to want to go get a coffee, maybe even go get lunch while that happens because it's got to build you know two hundred some-odd dependencies and rebuild the whole project. It's a lot of work for the compiler to do.
>> Yeah last time I did it it took at least a good 30 minutes.
>> And then once you've got that built like you said I'm not super comfortable throwing that in production because profiling defeats so many of the optimizations that we kind of rely on to have a fast service.
>> Right and if you're defeating those optimizations that means you're getting a fuzzier picture of what the performance was like anyway. So in that way the event log here let's just have a lot more clear look at that and take more clear actions as a result.
>> Right and to put a fine point on that, when you enable profiling you can change the behavior in terms of performance of the thing that you're looking at. So you may identify some part of your program that's slow and turn on profiling to figure out why it's slow and then once you do that some other part of your program turns out to be the hot spot with profiling enabled. So the event log seems like a good way to avoid that problem but still get a peek into the performance characteristics.
>> Right, and if you do go ahead and need more information from profiling and you can keep in mind what you learn from the event log and have at least an idea of things that maybe are just noise in the profile.
>> Yeah it's you know by no means one or the other. You can definitely use both. So with the event log I think GHC has some custom tooling around making events but there's some stuff in Debug.Trace which is a handy little module that provides a lot of stuff for like logging out from pure functions. That's the way that I most commonly use it. But it seems like it's recently gotten some other stuff for logging events, could you talk about that?
>> Yeah they recently added the withTraceIO function and I believe a couple others. Oh sorry it was traceEvent and traceEventIO, traceMarker and traceMarkerIO. That lets you have user-defined markers and that's really powerful. That lets you see, you know ,when that spike in heap usage happens, what marker is it near. What logical thing you defined is happening is going on.
>> Right, so we've been talking a lot about events which are you know just kind of ad hoc something happened, but a marker it sounds like is a bigger picture than that?
>> Right. This is amongst all those events that are happening, what function just happened. Or not even what function but at a higher level what unit of work is happening.
>> Right, so like right now we're in the, you know, collecting dependencies part of the job, and then now we're on the optimizing part, and now we're on the linking part. Those are those might be markers.
>> Cool. So Debug.Trace has these functions that anybody can use, but GHC itself uses these in kind of a neat little way. Could you talk about that a little bit, Cody?
>> Yeah sure. They have a function they use called withTiming now, and they create their own user-defined events of systool:linker, CMM pipeline, chasing dependencies, stuff like that. And they're gonna be recording these by sampling all of Hackage which is amazing.
>> Yeah, that's a huge data set to work with. You know obviously not every Haskell program is on Hackage but a huge subset of them are and it seems like a good way to get a peek at the performance characteristics of GHC in the real world.
>> Right and something so exciting to me is I'm thinking that is a representative sample.
>> Yeah, for sure! And I think it's kind of in stark contrast to in this blog post they give a little "hello world" program as an example and they show the event log for that. And that's the type of thing that I'm more used to seeing in like my own personal benchmarks or whatever's in GHC. It's a little program that's meant to show either a typical use case or like some weird corner case, but not representative of programs at large.
>> Right those have their place but this is gonna give, you know, there's more complicated behaviors and gives a chance to get insights on those.
>> Right which is super exciting. And one of the ways that we can look at the results of this data set is using a tool they mentioned called ghc-events-analyze, which reads that binary event log and spits out a human readable kind of table of information along with an SVG chart, so you can visualize where things are spending its time. And I think they mentioned a couple other tools, could you talk about some of those?
>> Yeah one of the other ones was eventlog2html, which is like an interactive look at your heap, and also when you add those markers in it will it will tell you what units of work happened near a particular place and the graph you're looking at like a spike or something or a drop.
>> That sounds super useful. So you could use ghc-events-analyze to look at timing information, where are we spending our time, and then look at eventlog2html to say are we, you know, doing too much garbage collection there, are we allocating too much, or is it probably CPU bound instead, or maybe something else, you know maybe it's IO bound.
>> Right and we don't want to leave that threadscope, which will be the non web app it's a GTK app. I believe it offers some of the same features. If memory serves correctly maybe, a couple more.
>> Nice, so we've got a lot of tools to look at these event logs. And one of the things that I really like about these event logs is it sort of separates when you're running the program versus when you're analyzing it. So the way that I'm most used to using Haskell is with web apps, so typically you want to analyze it while it's running, but lots of tools like the GHC compiler itself or a lot of other programs you know you run them once and then they're done and you can look at this event log to see over the course of running it where did it spend its time. Which can be super useful for like a command line tool that you know if you wanted to write a grep competitor in Haskell you could do that and run it over some data set and then look at the event log and spend some time poking around with that. So for GHC they've kind of built this in to like you said earlier there withTiming function to give a picture of where time is spent. They don't quite have everywhere figured out yet still missing a few spots, right?
>> Yaeh they were missing things like when it's parsing information at startup as well as at the end after code generation hopefully they'll build that, add those in too, so we can get that full picture, make sure we're not missing a hotspot.
>> Yeah I'd be really nice to have that picture, complete end to end you know. From when GHC starts to when it's done and everything in between.
>> I agree.
>> And looking at this you know kind of sample output they have of building a "hello world" program, it was shocking to me maybe not shocking but they they sort where it spends all of its time and the number one thing is the linker. Because obviously hello world is an executable, so you want to run it, so it's got to be linked, and of it's like three and a half seconds of total run time, two and a half seconds are spent in the linker. That's crazy!
>> It really is and you know after this I remember reading some old Stack Overflows about the gold linker being faster. I think you've mentioned lld before. I want to specify those manually and benchmark these. That is crazy!
>> Yeah because if we can save you know 80 percent of our time on the linking step, that's huge. And I know that here at IT Pro we actually changed that we build our executables. We used to we have many services, so we used to have an executable for each service, and we would spend a huge amount of our build time just linking all of those executables. So we actually changed it to where we only have one executable and it has a bunch of like quote unquote sub executables, so we only spend that linking, or we only pay for that linking thing once, and then we get all of the executables we want out of it. So a little hack that hopefully wouldn't be necessary if we could switch to something like the gold linker.
>> Right and then there's there's some things I've heard happening like Bazel and adding more advanced Haskell support for that, so hopefully we can make some headway in that space.
>> Yeah it would be a huge gain for for everybody that builds executables, which I think is most people working with Haskell. So yeah you mentioned earlier that one of the things you could do with this timing information is build all of Hackage and get an accurate assessment for some real-world code. And they talk about doing that in here with head.hackage, which is a I guess subset or snapshot a Hackage that is built with the current HEAD version, like what's currently on master, for GHC. So that would be really cool for identifying, you know, "Oh we made this change that we thought was going to be improvement but it didn't actually speed things up in Hackage," stuff like that. Anything else we could do with that information?
>> Yeah we can also track that performance over time and get a fine-grained picture at where regressions are happening exactly.
>> Yeah which could be really useful for people working on GHC to say, "oh this commit slowed things down unintentionally," or "this performance improvement wasn't as big as we thought." But also for people like me who don't hack on GHC all that often we could look at you know the big question of is GHC 8.10 faster than 8.8, or you know other big markers like that.
>> Right maybe I maybe I want to skip out on 8.10 because it doubles my compile time.
>> Exactly. Hopefully that's not true. We don't have any data to back that up right now.
>> Yes, luckily.
>> But yeah you know that's a common thing that pops up you know. With these new versions of GHC that are adding a bunch of extensions and features and fixing a bunch of bugs and stuff, are they slowing things down or getting faster? We don't really know right now. Another thing you talked about earlier was profiling our own code. So we've been talking about GHC and obviously that's for the kind of build step of our pipeline. But if we're building stuff and then running it in production, we're interested in how fast that stuff is running and which parts are the hot spots there. So could you talk a little about using these events to profile your own code.
>> Right so something we recently had an issue with is a producer-consumer app that runs across many threads that we call Geoffrey. And we were looking at its performance. Something that you can do with the event log is to tag each thread ID, you can look at things like is the heap usage per thread even, you can get an insight on if you need to do things like apply back pressure, or if you know your consumers are getting starved.
>> Yeah all super useful information and hard to gather through other means like profiling, so the event log is a great way to get some insight into your running code like that.
>> One hundred percent.
>> All right well I think that about covers this "Eventful GHC" blog post. Any closing thoughts, Cody?
>> Uh, no. Start using an event log. Watch GHC.
>> It's a great project, that GHC.
>> Yep. Thank you to the developers. Even though we talked about slow compile times, we appreciate all the work. We know it's a struggle.
>> Oh yeah, it's hard. I would not want to write my own GHC. Seems very challenging to do. So yes thank you to the GHC devs. And thank you Cody for joining me today it's been great talking with you.
>> Great talking with you too, Taylor. Thank you.
>> And as always, thank you for listening to the Haskell Weekly podcast. Please rate and review us on iTunes, if you liked what you heard. If you didn't like what you hear, I don't know hit us up on Twitter and we'll try to do better. Thanks again and we'll see you next week.