Podcast
Strategic Deriving
Cameron Gera and Taylor Fausak discuss the pros and cons of various deriving strategies.
Episode 25 was published on 2020-10-08.
Links
- https://kowainik.github.io/posts/deriving
- https://hacktoberfest.digitalocean.com
- https://dev.to/tfausak/how-to-define-json-instances-quickly-5ei7
- https://www.parsonsmatt.org/2019/11/27/keeping_compilation_fast.html
- https://www.youtube.com/watch?v=pwnrfREbhWY
Transcript
>> Hello and welcome to the Haskell Weekly podcast. As you might have guessed this show is about Haskell, a purely functional programming language. I'm your host Taylor Fausak and with me today is Cameron Gera. Welcome Cam.
>> Hey how's it going man?
>> It's going good. It's been a while.
>> It has, yeah. I was looking and it's been about a year since we've done a podcast. So really glad to be back with you and just give all of our viewers some new content to listen to.
>> So just a little off of our weekly schedule but we're getting there. And we're coming back at an exciting time because it's October which means it's Hacktoberfest. And I understood that you found an interesting repo for Haskell Hacktoberfest shenanigans. Can you tell us about it?
>> Yeah yeah so you know Hacktoberfest is an opportunity for us to all contribute to the open source world and to contribute to public repos and create PRs against these public repos. So you put four PRs together and get those submitted and you're going to get a t-shirt. And so with this new project that I'm going to be talking about we're going to be able to do that and learn Haskell at the same time. So funny enough the name is Learn4Haskell. So it's an opportunity to kind of --
>> With a number four in the middle: Learn4Haskell
>> Exactly 4 Haskell yeah. And you know we're gonna provide the link for you there's gonna be two places: in the show notes you can check that out and also on this week's edition of Haskell Weekly which Taylor could you remind me what version we're on?
>> Oh that's a good question. We're on version 232 I think
>> Wow so 232 weeks you've been doing this
>> Yeah that's a lot of weeks almost four years ... four, five years? I can't do math
>> Well hey we're super thankful you've taken on that project and you know now we're spinning it out a little bit more doing some more podcasting
>> Yeah man so yeah look for that link to Learn4Haskell in the show notes and if you're interested in Hacktoberfest but maybe not interested in Haskell then I wonder what are you doing listening to this podcast but you can go to hacktoberfest.com and read more about it there
>> Yep, awesome man. Well, I mean, without further ado I think we should kind of dive into our subject for the day, huh?
>> I agree and today we're going to be looking at a blog post called "Strategic Deriving". It comes to us via Kowainik and it's authored by Veronika Romashkina and Dmitrii Kovanikov and I'm sure I butchered their names so I apologize in advance for that. But yeah this is a post talking about how deriving works in Haskell and why you might want to use it.
>> Yeah I think that's awesome and you know I want to give a shout out to these two authors they did a very thorough job on this post and so you know I'm glad we can kind of you know just use it as a jumping off point because there's a lot of content in it so you know feel free to find the article and check it out yourself. But for us yeah we're super excited to talk about it. So Taylor what's the big idea?
>> The big idea is that we can avoid writing a bunch of boilerplate and we can have the compiler do the work for us and prove that it's correct. And this is super convenient because it'll save us a bunch of typing and it'll make us feel more confident in the code that we produce. But as you mentioned this blog post is -- it's a giant resource and it's great but there's so much here that we're just going to start scratching the surface on it and if you're listening to this and you think it sounds interesting I encourage you to go read it. We'll put the link in the show notes. It is very thorough, very good resource.
>> Yeah so you know we're talking about deriving. You know we have these type classes in Haskell that offer us you know some out of the box functions and functionality for different types. You know what are some of the pros and cons of deriving? I know you kind of mentioned a little bit but can you go a little more detail?
>> For sure. To me the main pro to deriving -- and when we talk about pros and cons here we're going to be contrasting it with writing these instances by hand -- and for me the biggest pro with deriving is that it reduces the amount of code that you have to write. It reduces boilerplate. So I think the typical example is like a show instance where maybe you defined a record and it has a bunch of fields on it and if you wanted to write a show instance by hand you would have to write out all of those field names and then write show that field name of that record. And if your record has you know 20 fields on it all of a sudden that becomes 20 more lines of code that you have to maintain and keep in sync with that data type. It's just -- it's no fun. It's no fun to write that.
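To make the boilerplate point concrete, here is a minimal sketch comparing a derived `Show` instance with a hand-written one. The `Episode` and `EpisodeManual` types and their fields are hypothetical and do not come from the episode; with only two fields the manual version is short, but every new field would need another line in it.

```haskell
-- Hypothetical record for illustration: the compiler writes Show and Eq.
data Episode = Episode
  { number :: Int
  , title :: String
  } deriving (Show, Eq)

-- The same shape with a hand-written instance. Every field is spelled out
-- again, and nothing forces this code to stay in sync with the record.
data EpisodeManual = EpisodeManual
  { numberM :: Int
  , titleM :: String
  }

instance Show EpisodeManual where
  show e =
    "EpisodeManual {numberM = " ++ show (numberM e)
      ++ ", titleM = " ++ show (titleM e) ++ "}"
```

In GHCi, `show (Episode 25 "Strategic Deriving")` gives `Episode {number = 25, title = "Strategic Deriving"}`, and the manual instance was written to mimic that format.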
>> Yeah and I know from personal experience we've had some of those issues where you know we've added a new field to a record and we didn't update a JSON instance and you know we went through PR and it ended up into production and people were like "hey where's this field" you know because they were relying on that being there and then it would you know create issues for us that we'd have to you know jump to resolve
>> Right yeah it makes our job harder because we can't rely on the compiler to tell us that our instance matches our type declaration or you know the record we're defining. And like you're saying leaving out fields is a super common one. And if you have a bunch of instances -- you mentioned JSON. So like if you define a data type you have to define how that thing can be parsed from JSON how it can be converted to JSON. And if you have other formats that you convert in and out of those are listed as well. So like BSON or YAML or to the database or you know Protobufs, whatever you're using. If you have all of those suddenly you have all this code to maintain. So you go update one little field and GHC says "yeah I'm happy" but you forgot to update it in 10 other places.
>> Yeah that that's definitely one big downside to you know deriving your own instances and not using deriving
>> Yeah downside to writing them by hand upside to deriving sorry yeah upside to deriving
>> Yeah, sorry, I flipped the pro/con list there, I apologize. But yeah, another one that I've run into a few times, and kind of been baffled by when my code crashes and everything kind of freezes, is because I've created an infinitely recursive instance that's not properly translated, and the compiler says "yeah you're good" but it doesn't detect or see that infinitely recursive bit of code.
>> Yeah, and to give a more concrete example of this: if you're defining a JSON instance for something that's a newtype wrapper, what you probably want to do is remove the newtype, so unwrap it, and then call toJSON on the thing on the inside. But what you might accidentally do is forget to unwrap it and just call toJSON directly. That's actually going to type check, because you are defining an instance for how to convert this thing to JSON, but then in your instance you're calling itself over and over again. GHC won't catch this error, and you'll be mystified in production, like "we're seeing 100% CPU usage, why is that? Everything passed the tests, everything passed code review". Well, hopefully it doesn't pass the tests, but the compiler was happy.
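The accidental self-call described above might look roughly like the following. To keep the sketch dependent on base only, a hypothetical `Encode` class stands in for a class like Aeson's `ToJSON`; `UserId` and `OrderId` are made-up types as well.

```haskell
-- Hypothetical stand-in for a serialization class such as ToJSON.
class Encode a where
  encode :: a -> String

instance Encode Int where
  encode = show

newtype UserId = UserId Int

-- Intended version: unwrap the newtype, then encode the Int inside.
instance Encode UserId where
  encode (UserId n) = encode n

newtype OrderId = OrderId Int

-- Accidental version: the argument is never unwrapped, so the `encode` on
-- the right-hand side is this same instance. It type checks, but calling
-- it never produces a result.
instance Encode OrderId where
  encode orderId = encode orderId
```

Both instances compile without complaint; only `encode (UserId 7)` returns (`"7"`). A derived instance sidesteps the problem because there is no hand-written right-hand side to get wrong.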
>> Yeah, exactly, and that's a great thing for tests. We can always be about plugging tests here at Haskell Weekly. And you know, another nice one, and I kind of think you've touched on it, is consistency: between different modules and different types there's a consistent code feel, right? So you look at a type and you've seen all the record accessors, and then you say "oh yeah, let me see what instances it has", and it's right there, easy to see, easy to understand. So I really think that's a nice pro for this deriving tool.
>> I agree. I think there are two parts to this, even. On the one hand, across your code base, as people get acclimated to it, you can look at a data type and just know what all of the instances are going to look like. So again with JSON as an example, maybe for every field you make all the characters lowercase and put hyphens between the words for the field names. You can just look at the record declaration and see "oh, this field is called whatever it is, and I know that when it gets converted to JSON it's going to look like this instead". You don't have to actually read through the instance to see how it's doing any of those conversions. And then the other side of this is that by using deriving, visually the amount of code that you have to parse in order to understand which instances are provided is very little. You can see deriving ToJSON, FromJSON, Show, Eq, Generic, Ord. Rather than having, you know, 50 lines of code with five different instances there, you can just read it all in one line.
>> Yeah, I think that's a super big benefit, and it really helps keep that code more maintainable. I mean, that's the big thing. So we've heard a lot of pros, and there are a lot of good reasons to use deriving, but there are also some drawbacks, more cons, if you will. So what are some of those?
>> Yeah, it's not all roses with deriving, unfortunately. One of the ones that we run into frequently, and I should probably specify that for you and me, Cam, we're working on a Haskell application at ITProTV day in, day out, so most of our experience is colored by that. We don't maintain many open source libraries; we mostly focus on this one application. So some of these things are more applicable to apps versus libs, but just so everyone knows where our biases are. But yeah, one of the downsides for us with deriving is that as soon as you want to do something a little bit different, you have to kind of scrap all of the deriving stuff and go back to doing it completely manually. And again, to keep using JSON as an example here, imagine that for any API that we get to define we use our "lowercase letters with hyphens between the words" scheme, but we have to integrate with a third party, and they wrote their thing in C# and it wants camel cased words. So when we want to talk to them we can't use that derived type anymore; we have to write it ourselves manually.
>> Right, so that leads to that special case, right? That's that one special case that kind of pokes a hole in easy deriving, which is definitely a bummer. And I think another thing too with this deriving is that it can be a little unclear what an instance does, because the code's not right there, right? You need to go to Stackage or Hackage or Hoogle and find what the definition of this type class and these functions are.
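The "lowercase with hyphens" naming scheme can be sketched as a small string function. `kebabCase` is a hypothetical helper, not something named in the episode; with Aeson, a function of this shape could be supplied as the `fieldLabelModifier` option so that derived instances apply the convention to every field.

```haskell
import Data.Char (isUpper, toLower)

-- Hypothetical helper: lowercase everything and put a hyphen in front of
-- each letter that was uppercase, so "firstName" becomes "first-name".
-- A name that starts with an uppercase letter would get a leading hyphen.
kebabCase :: String -> String
kebabCase = concatMap step
  where
    step c
      | isUpper c = ['-', toLower c]
      | otherwise = [c]
```

For example, `kebabCase "userId"` is `"user-id"`, which is the kind of mapping a reader can predict from the record declaration alone.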
>> Yeah and this is where you know in a language like Go something that proponents of that language go to a lot is that you can look at the code and see pretty clearly what it's doing there's not a whole lot there to surprise you but with deriving the whole point is that there's nothing there it's all abstracted away so if you don't already know what it does there's not really anything to guide you into guessing what it does
>> Right. But I mean, in my opinion, I don't find it that bad to say "oh, I'm not sure what this very specific type class is doing, let me go check out the functions and see what the type definitions are". I think the documentation in Haskell is good enough to check that out and figure it out, you know?
>> I agree and typically there aren't that many type classes that you're working with so as you get you know introduced to a code base you're probably going to become familiar with those type classes and then you'll get a feeling for how they're implemented by default
>> Right, yeah. So what is the performance impact of using this abstraction level?
>> Yeah, this is actually something I've spent a lot of time looking at, and I can leave a couple links in the show notes to blog posts I've written that look at the performance of various methods of dealing with this boilerplate. But the short version is that deriving specifically through generics, which is what this blog post spends a lot of time talking about, is one of the slowest methods to compile, which is unfortunate because it has a lot of other really nice benefits. And when I say one of the slowest, what I mean is in comparison to writing the instance by hand or deriving it via Template Haskell, which is essentially code generation at compile time. Using generic deriving is going to be slower than either of those methods. But clearly there are other upsides, and so it's up to the people writing the code to decide: is it worth taking an extra second per instance declaration? This is just a fake number. I don't know how long extra it would be; you would have to run your own benchmarks. But is it worth taking a little extra time in order to get these other pros that we talked about at the top of the show?
>> Right, yeah. I think that's a good thing to consider when you're working within a team and you're focused on "hey, we want to be as performant as possible, we want to keep our compile times down", and this is a side effect of deriving, and generic deriving. So that's something to consider. For us, we're like: yeah, let's do it, less boilerplate. I mean, as someone who's come from using both hand-written instances and this deriving feature, we've experienced a lot of the tensions of hand-written instances, and we've really been making a concerted effort to move towards just regular old deriving.
>> Yeah and there are definitely ways to lessen the impact of the performance penalty you take with generic deriving Matt Parsons has a good blog post talking about how to keep your builds fast and for the most part it comes down to keeping modules small and putting types in their own modules and we've been trying to do that as we move to more generic deriving in our code base
>> Right right you know so we've got one more con here you know what's so difficult you know about deriving you know a type class that allows deriving or writing a type class that allows deriving
>> Yeah, so using JSON as our go-to example: when you're using a library like Aeson, it provides this whole mechanism for you, so your choice is to opt into using deriving. But providing that option to other people can be a challenge, because you have to rely on the generic machinery that GHC provides you, and that can be a little different or a lot different than your day-to-day programming. And I feel like it's important to contrast this with the alternatives. With Template Haskell you could write essentially something that parses a Haskell data declaration and produces some instance declaration from that, which is not actually, but you could think of it as being, a textual search and replace: if you see this, do this. Or in your documentation you could provide "this is what an instance normally looks like, copy paste it and change some stuff around". To do the generic deriving you have to understand how generics are represented in Haskell-level values and how to connect all these type classes together. These things aren't insurmountable problems, but they are a barrier to clear. And I think the silver lining here is that it's pretty uncommon to define new type classes that need generic deriving, and if you do need to do that there are good resources for doing it. Either you can go crib from another library that does it already, or there are some recent talks about the generic type representation: what does it look like and how do you work with it?
>> Right, yeah. And I think, reading and understanding what was happening in this blog post, you hit this section about the generics and what it looks like under the covers, and you're like "ah", you just want to put those down and not look anymore, because there's a lot going on there. I think it kind of proves how powerful Haskell can be, but it obviously takes a little bit of thought and brain power to parse what that's actually doing. Well, yeah, so we've got different types of deriving, correct? Is that a thing? I think they're called strategies, right?
>> Yeah, I don't remember which version of GHC introduced them, but there's this new concept of deriving strategies. Or it's not actually a new concept, but it used to be implicit and now it's explicit. There are four strategies. One of them is stock, and what stock means is this is a type class that the Haskell report, like the spec for the language, has defined how it should behave. So it's only a handful of them, and they're the stuff that you're familiar with, like Show and Eq and Read and that kind of stuff.
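As a rough idea of what "connecting all these type classes together" involves, here is a small sketch of a class that supports generic deriving using `GHC.Generics`. The `FieldNames` class, its `GFieldNames` worker class, and the `Episode` type are all hypothetical; real libraries such as Aeson do considerably more, and this only handles the pieces needed to list record field names.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics

-- Hypothetical user-facing class. The default method goes through the
-- generic representation, which is what makes `deriving` possible.
class FieldNames a where
  fieldNames :: a -> [String]
  default fieldNames :: (Generic a, GFieldNames (Rep a)) => a -> [String]
  fieldNames = gFieldNames . from

-- Worker class with one instance per building block of the representation.
class GFieldNames f where
  gFieldNames :: f p -> [String]

-- Datatype and constructor metadata: just look inside.
instance GFieldNames f => GFieldNames (D1 c f) where
  gFieldNames (M1 x) = gFieldNames x

instance GFieldNames f => GFieldNames (C1 c f) where
  gFieldNames (M1 x) = gFieldNames x

-- A single field: read its name from the selector metadata.
instance Selector s => GFieldNames (S1 s f) where
  gFieldNames m = [selName m]

-- Several fields side by side: collect the names from both halves.
instance (GFieldNames f, GFieldNames g) => GFieldNames (f :*: g) where
  gFieldNames (l :*: r) = gFieldNames l ++ gFieldNames r

-- A constructor with no fields.
instance GFieldNames U1 where
  gFieldNames U1 = []

-- A choice between constructors: follow whichever one is present.
instance (GFieldNames f, GFieldNames g) => GFieldNames (f :+: g) where
  gFieldNames (L1 x) = gFieldNames x
  gFieldNames (R1 x) = gFieldNames x

-- For a user of the class, opting in is one line.
data Episode = Episode
  { number :: Int
  , title :: String
  } deriving (Generic, FieldNames)
```

With these definitions, `fieldNames (Episode 25 "Strategic Deriving")` evaluates to `["number","title"]`. The asymmetry is the point made above: the class author writes all of the instances over the representation types, while the user writes only the `deriving` clause.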
>> and Ix everybody knows Ix
>> Ix, can't forget Ix. And then there's also newtype, which you may be familiar with through the GeneralizedNewtypeDeriving language extension. Newtype deriving lets you kind of delegate your instance to the type that you're wrapping around. So for instance, if you have a user ID type that is a wrapper around an Int, newtype deriving would effectively just say "use the Int instance for whatever type class I'm deriving here". And then the last two strategies are anyclass and via. anyclass is the one that powers this generic deriving stuff, where there is typically going to be a default implementation of the type class methods, and those default implementations are going to be powered by a generic representation of the type. And then via kind of piggybacks on the other three, but what it does is it lets you define your type class instance through another type. So you conceptually wrap up the type you're dealing with in this other type that you're deriving via, and then the instance will be generated based on that one. I feel like I'm doing a poor job explaining exactly what it is, but it's a very powerful tool, very neat, you should check it out.
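The four strategies can be seen side by side in a short sketch. All of the types and the `Describe` class here are hypothetical. To keep the example small, the default method of `Describe` is written in terms of `Show` rather than a generic representation, which is a simplification of the `anyclass` use case described above.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Monoid (Sum (..))

-- Hypothetical class whose only method has a default implementation.
class Show a => Describe a where
  describe :: a -> String
  describe x = "value: " ++ show x

-- stock: GHC's built-in derivation for classes such as Show and Eq.
-- anyclass: an empty instance that picks up the default methods.
data Color = Red | Green
  deriving stock (Show, Eq)
  deriving anyclass (Describe)

-- newtype: reuse the instances of the wrapped type (Int here).
newtype UserId = UserId Int
  deriving stock (Show)
  deriving newtype (Eq, Ord, Num)

-- via: borrow instances from another type with the same representation.
-- Sum Int combines values by addition and has 0 as its identity.
newtype Score = Score Int
  deriving stock (Show, Eq)
  deriving (Semigroup, Monoid) via (Sum Int)
```

With these definitions, `describe Red` is `"value: Red"`, `UserId 1 + UserId 2` equals `UserId 3`, and `Score 2 <> Score 3` equals `Score 5`. Spelling out the strategy also removes any ambiguity when both `DeriveAnyClass` and `GeneralizedNewtypeDeriving` are enabled.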
>> Yeah yeah we we've started to use this a little bit in our day to day and it's really saved us a lot of time and effort from having to write instances with you know with Swagger so we you know use servant and swagger more recently and that's something that you know we Taylor did a lot of effort on creating this type that would allow us to not have to do so much boilerplate and so we're very thankful for that and it kind of gave us the opportunity to kind of learn more about via
>> Yeah and this is a good way to come back to the pros we were talking about earlier because we've been using JSON as an example over and over again and once you bring Swagger into the picture which is like automated API documentation you're going to want the shape of your JSON data as part of that API documentation and by using derived instances for this stuff you can make sure that your JSON instance and the documentation for that JSON instance actually match each other rather than writing them by hand and potentially having a mismatch there
>> Right, yeah, it saves us a ton of time and it's great. Question: why don't we really see, when we're deriving like this, the need to define exactly what strategy we're using?
>> So normally GHC will pick a strategy for you. Like I said earlier, in the past it has always been implicit, and then in some relatively recent version of GHC they gave you the ability to explicitly specify the strategy. So for instance, when you do deriving Show, that's always going to be stock. But if you turn on this DerivingStrategies extension, you could instead say deriving newtype Show, and the difference there would be that the stringified representation of that type would no longer include the newtype wrapper's name like it normally does. That would be stripped out, because you're going to be using the inner type's instance directly. But to answer your question: normally it is implicit, and then you can make it explicit, and for us the normal way that we make it explicit is by using deriving via, which is one of the strategies.
>> Right right right yeah I found that bit cool when they talked about how you can explicitly say yes we're going to use newtype deriving here and it's you know for the show instance you know it's going to take away that type wrapper within the stringified version I thought that was pretty neat because you know obviously you know Show is a great resource and you know or a great type class that allows you really to debug and understand what's happening in your code but sometimes the output can be a little bit daunting because it's so you know if you have large records or a list of records it's kind of hard to parse what's going on and so you know being able to like maybe remove some of that you know complexity it could be nice
>> Yeah, a good example from our code base is that we use UUIDs for some of our unique identifiers, but we have an internal type that we wrap around UUIDs that we call a GUID, and then we wrap domain specific types around that one. So we have, like, a user ID is a wrapper around a GUID, which is a wrapper around a UUID. So by default when you show that, you'll get the literal text user ID, parentheses, GUID, parentheses, UUID, parentheses, and then the thing you're actually interested in, right? And by using newtype deriving at each step of the way you could strip out those things and just get the ID.
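The nested wrapper example can be sketched with base alone. Here an `Int` stands in for a real UUID type, and all of the type names are hypothetical rather than taken from the hosts' code base.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Stock Show at every layer: each wrapper's name appears in the output.
newtype Uuid = Uuid Int
  deriving stock (Show)

newtype Guid = Guid Uuid
  deriving stock (Show)

newtype UserId = UserId Guid
  deriving stock (Show)

-- Newtype Show at every layer: each wrapper reuses the instance of the
-- type it wraps, so only the innermost value is printed.
newtype SlimUuid = SlimUuid Int
  deriving newtype (Show)

newtype SlimGuid = SlimGuid SlimUuid
  deriving newtype (Show)

newtype SlimUserId = SlimUserId SlimGuid
  deriving newtype (Show)
```

Here `show (UserId (Guid (Uuid 7)))` is `"UserId (Guid (Uuid 7))"`, while `show (SlimUserId (SlimGuid (SlimUuid 7)))` is just `"7"`. The trade-off is that the shorter output no longer says which kind of ID it is.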
>> Yeah I mean I would be curious to see how you know our team felt about doing something like that you know obviously we don't have to use Show too too much but when we do you know that would be a really nice thing for the instances where we have a very nested type
>> Yeah because you can lose kind of the overall shape of the data you're looking at when there are too many details included like that
>> Yeah well awesome thank you for you know kind of talking about the strategies a little bit you know for me it's this has all been a learning experience kind of you know obviously I've used deriving and I've you know derived specific instances but you know kind of the mechanics behind it were really kind of cool and neat to learn about
>> Yeah I agree
>> Yeah, so we've got a couple more minutes here and I just want to touch real quick on some of the best practices they mentioned, and then we'll wrap it up, trying to be cognizant of our viewers' time.
>> Listeners, maybe they're viewers too
>> Glad you're on board with us but we don't want to bore you to death
>> But yeah so some of the best practices so the one that they touch on first I think you could talk about a little about always deriving the Show and Eq type classes why do you want to do that
>> Yeah I mean it helps you know working in in the REPL you know if you're trying to you know see what an output is going to be you're going to need a Show instance so it can display you know and also testing you know you're generally going to have to kind of see what's happening see what the mechanics of your code are doing and see what the types are along the way so I think that's you know some of the real pros to that but one of the cons being you know you have to be cognizant of maybe a type that you don't want publicly shared like a password or you know a token you know and if you do want to still have Show instances as well as you just be cognizant to make a custom instance for that that you know redacts the sensitive information
>> Yeah and this can be a somewhat tricky decision of do I want to provide a show instance that hides the information or do I want to not provide a show instance at all and for us the way that we typically lean is like using password as an example let's go ahead and provide an instance for it but the only thing that it outputs is going to be like the word redacted and that way if we have a record that includes a password within it we can still show that record and see kind of the overall shape of it but we're not going to leak any of that sensitive data if you go the other way and don't include the instance at all then if your user type has a password on it you can no longer show a user which to us is super frustrating
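The "provide an instance, but redact it" approach might look like the sketch below. The `Password` and `User` types are hypothetical, and the exact placeholder text (`<redacted>`) is an arbitrary choice for illustration.

```haskell
newtype Password = Password String

-- Hand-written on purpose: never reveal the wrapped value.
instance Show Password where
  show _ = "<redacted>"

-- The enclosing record can still derive Show as usual.
data User = User
  { name :: String
  , password :: Password
  } deriving (Show)
```

With this in place, `show (User "cam" (Password "hunter2"))` gives `User {name = "cam", password = <redacted>}`, so the overall shape of the record stays visible without leaking the secret. Without any `Show Password` instance, the `deriving (Show)` on `User` would not compile.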
>> Yeah, and I mean, if you're dealing with maybe some smaller types, you can always create your own function that turns that type into a string that can easily be shown, or turn it into another type that has a Show instance. But I think what you said about creating your own, so you can then use it in a larger record and have a Show instance for the larger record, I think that's a big deal. So at that point they kind of talked about deriving generics. Do you have anything to add on that as a best practice?
>> Yeah, so as I mentioned, we are mostly focused on application development, so for us the choice of "do we add a Generic instance or not" really comes down to "are we using that Generic instance?" But for library authors the question is different, because whether or not you use it in your library, if any of your users are going to need that instance then you have to provide it. So more often than not, if you're defining custom data types in your library that other people are supposed to use, you probably should have Generic instances on them. And I just want to touch on this: earlier I mentioned how generic deriving can be slower than other ways to provide instances, and that's true, but providing a Generic instance by itself is very quick. It's when you start to use it that things slow down a little bit. So if you're afraid of providing Generic instances for performance reasons, I would say don't be afraid of that. Go ahead and provide it.
>> Persevere and go through strong you've got this
>> For sure
>> Well awesome well hey I really appreciate you kind of you know talking through this post a little bit from you know a high level obviously if you want more detail you know go read the post kind of dive in because you know you're going to really be able to you know kind of walk away with you know kind of a new understanding of what's happening behind the scenes at least at least I did so I mean I think that's about it
>> Cam thank you for bringing this post you know to my attention and really taking the time to dig into it it's always nice to get a deeper understanding of something that we use day in and out but haven't really had the reason to go look into the intricacies
>> Exactly, yeah, I mean, you're welcome. I'm honestly just trying to continue to learn. As an engineer, that's our job. Obviously, yes, your job is to code, but if you don't learn and adapt and figure out something new every day, you're gonna get passed real quick. So I'm definitely always looking for stuff. And I would like to actually invite some of our viewers, if they have a podcast if they want, but like a blog post that they really enjoy, feel free to send it to us. We would love to talk about it, look at it, review it, and spread the word. What would be a good way for them to send that if they were interested?
>> So for our listeners if they have something that they want us to do a deep dive on do a podcast episode on you can reach out to Haskell Weekly via email which is going to be info@HaskellWeekly.news or you can hit us up on Twitter our Twitter handle is @HaskellWeekly shouldn't be any surprises there or if you find me or Cameron on you know Reddit Twitter wherever we can take suggestions there as well
>> Yep
>> Yeah so I think that will do it for us today thank you for listening this has been a little longer than our normal podcasts but this has been you know a really solid resource that we've been able to chew through and we wanted to do it justice so thank you Cam for hanging out with me I appreciate it
>> Always man always
>> And thank you again for listening if you would like to find out more about Haskell Weekly please go to our website HaskellWeekly.news and once you're there you can subscribe to the newsletter which goes out every week or you can subscribe to this podcast which comes out either every week or about once a year depending
>> Yeah and you know and and also feel free to follow us on Twitter you know
>> Twitter Reddit everywhere
>> We are everywhere
>> Alright thanks so much y'all happy hacking
>> Happy hacking