Can you use types with Data Orientation?

Are types compatible with data orientation? The short answer is 'yes'. Types trade freedom of movement for clarity.

Transcript

Eric Normand: Can you use types with data orientation? By the end of this episode, I hope to clarify this, because data orientation is often associated with un-typed languages like Clojure or JavaScript. My name is Eric Normand, and I help people thrive with functional programming.

The examples I gave in the last episode, which you should listen to if you haven't, were all JSON, un-typed. I did mention that there was a change of types. We started with a byte stream, we moved up to a character string, and then we parsed that into JSON.

After that, there was no change of types. This is very common in something like JavaScript, where you get some JSON. Once it gets parsed, you leave it as JSON. You're just working in the JSON until you finally serialize it back out or something.

The question comes up: can you use types with data orientation? I think the answer is yes. Short answer, yes, indeed you can.

There is a tradeoff, and tradeoffs are rather nice. They are a decision you get to make. Let's talk about that tradeoff.

I like Clojure. I use that most of the time. In Clojure, which is un-typed, we use maps for a lot of things. Along with maps, we use all the basic data types, your numbers, your strings. We have things called vectors and sets. We use these to model our data.

We typically don't make new types. Sometimes, you do. Sometimes, you have to. Typically, we don't.

We've got maps with some keys and values in them to represent, say, an entity like a person. We're just using a map.

Sometimes, you're getting that map over the wire. It's coming in as JSON or some other format. Then you parse it into a map or data. Once you've got it as a map, that's it.

You're leaving it as a map. You're transforming it. You're adding keys. You're removing keys. You're passing it around. You're doing all sorts of stuff, but it stays as a map.

Why do we do this? Before we get into the tradeoff, here is why I think we do it in Clojure, and I've thought a lot about this. As a character string, before you parse it, the data is really hard to work with. Let's say you had some JSON in a string. Instead of parsing it, you tried to figure out the person's first name while it's still a string.

You basically would have to parse it somehow. You'd need to walk the characters and write a little custom parser to find the string inside of it that represents the first name. It's ridiculous. So you do parse it into a map. In theory, it's wasteful because you're parsing everything, not just the first name.

Once it's a map, it's so nice to work with. It's so easy to do a bunch of things with it that weren't possible with that string. Now, it almost feels like there's no need to go any further. That's my explanation for why we don't make new types in Clojure. We're satisfied with maps, vectors, strings, and stuff.
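To make the contrast concrete, here is a minimal TypeScript sketch. The payload and its field names (firstName, lastName, age, email) are assumptions made up for illustration, not something from the episode. It compares a hand-rolled lookup on the raw string with a single parse into a map.

```typescript
// Hypothetical payload; the field names are assumptions for illustration.
const raw: string = '{"firstName": "Ada", "lastName": "Lovelace", "age": 36}';

// Working on the string directly: a fragile, hand-rolled search that only
// knows how to find one field and breaks on escapes or nesting.
function firstNameFromString(text: string): string | undefined {
  const match = /"firstName"\s*:\s*"([^"]*)"/.exec(text);
  return match ? match[1] : undefined;
}

// Parsing once costs more up front, but the resulting map supports
// many operations with no extra parsing code.
const person: Record<string, unknown> = JSON.parse(raw);
const firstName = person["firstName"];                      // read a key
const withEmail = { ...person, email: "ada@example.com" };  // add a key
const fieldNames = Object.keys(person);                     // list the keys
```

The string version needs new code for every question you ask of the data. The map version answers all of them with ordinary map operations.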

It does let us do that thing that I talked about in the last episode, where you get to move up and down without even changing types. Once it's a map, it's an HTTP request, now it's a user settings change, now it's a set-password request. Through all those things, we don't even have to change the type.

We're looking at different pieces of data and making decisions about it. That's our interpretation. That's how we're adding meaning to it, but that's dangerous. I want to call that out. That is dangerous. That is one of the main messes that I see in Clojure code bases.

Maybe it's the number-one problem: you don't know what level of meaning you're operating at anymore. You can treat a set-user-password request as a map, meaning you can add keys to it. You can remove keys from it, random keys.

You can iterate through the keys. You interpret this map as a collection of keys and values, and you give me all the keys from it. That's another way of interpreting it.

If it was a user request, "give me all the keys" is not really a valid operation on the user request. No, that's a valid operation way down, like four levels down, on the hash map.
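A small TypeScript sketch of that looseness, assuming a made-up request shape (path, userId, newPassword are hypothetical names). Every operation below is legal at the hash-map level, and nothing says whether it makes sense for a set-password request.

```typescript
// A "set user password" request kept as a plain map.
// The field names are hypothetical, chosen only for illustration.
const request: Record<string, unknown> = {
  path: "/user/42/settings",
  userId: 42,
  newPassword: "hunter2",
};

// Hash-map level operations. The type checker accepts all of them,
// because as far as it knows this is just a bag of keys and values.
request["debugFlag"] = true;        // add a random key
delete request["userId"];           // remove a key later code may rely on
const keys = Object.keys(request);  // "give me all the keys"
```

After these three lines the value is still a perfectly good map, but it is no longer a well-formed password request, and no tool has noticed.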

I see this happening a lot where people get into messes, because they're playing so loose. They're abusing their freedom, I guess I should say, of moving freely between the levels. Too much freedom. They probably don't even realize they're doing it, because the levels are not so clearly articulated.

Types can provide a discipline for this. At the extreme, if you wanted to be very type-safe, you would always change types when you're changing levels. You go from bytes, to characters, to JSON, to maps, hash maps.

If you want to pass through that, and you probably have to if you're parsing JSON, you go through some kind of JSON representation. Then when you notice it's an HTTP request, you convert that into a type called HTTP request.

When you look at the path to route it, you convert that into a user update request. Then when you look at what parameters, what attributes it's trying to change — it's trying to change the password, boom — you turn that into a password update request. You would have a new type for every level there.
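The ladder of types described above could be sketched in TypeScript roughly like this. All type names, field names, the /user/<id>/settings path pattern, and the wire format are assumptions for illustration. The bytes-to-characters step is left out to keep the sketch short.

```typescript
// Level: raw text off the wire.
type RawBody = string;
// Level: parsed JSON, still a generic map.
type Json = Record<string, unknown>;
// Level: an HTTP request (hypothetical fields).
interface HttpRequest { kind: "HttpRequest"; method: string; path: string; port: number; body: Json; }
// Level: a request to update some user.
interface UserUpdateRequest { kind: "UserUpdateRequest"; userId: string; changes: Json; }
// Level: specifically a password change.
interface PasswordUpdateRequest { kind: "PasswordUpdateRequest"; userId: string; newPassword: string; }

function parseJson(raw: RawBody): Json {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("expected a JSON object");
  }
  return value as Json;
}

function toHttpRequest(json: Json): HttpRequest {
  const { method, path, port, body } = json;
  if (typeof method !== "string" || typeof path !== "string" ||
      typeof port !== "number" || typeof body !== "object" || body === null) {
    throw new Error("not an HTTP request");
  }
  return { kind: "HttpRequest", method, path, port, body: body as Json };
}

// Routing step: looking at the path moves us up one level.
function toUserUpdate(req: HttpRequest): UserUpdateRequest {
  const match = /^\/user\/([^/]+)\/settings$/.exec(req.path);
  const userId = match ? match[1] : undefined;
  if (req.method !== "POST" || userId === undefined) {
    throw new Error("not a user update");
  }
  return { kind: "UserUpdateRequest", userId, changes: req.body };
}

// Final interpretation: looking at which attribute is being changed.
function toPasswordUpdate(req: UserUpdateRequest): PasswordUpdateRequest {
  const newPassword = req.changes["password"];
  if (typeof newPassword !== "string") {
    throw new Error("not a password update");
  }
  return { kind: "PasswordUpdateRequest", userId: req.userId, newPassword };
}

// Each function call is one explicit change of level.
const wire: RawBody =
  '{"method":"POST","path":"/user/42/settings","port":8080,"body":{"password":"s3cret"}}';
const update = toPasswordUpdate(toUserUpdate(toHttpRequest(parseJson(wire))));
```

Each conversion function is the only way up to the next level, so the type of a value tells you which interpretations have already happened.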

Do Haskell programmers do this? Some do, some don't. Maybe they'll stop at HTTP request or maybe they stop one level more. They don't actually do that final interpretation step to where they really look at what specifically is being changed.

Maybe they don't do that, or maybe they're not aware that they're doing it, but it's not so dangerous to be able to move two levels. There is a tradeoff there. There's a spectrum that you get to choose where you're going to land on.

Types provide that discipline, and they make it a little bit harder to move between the levels. They make it much easier to know what level you're actually at right now. I have interpreted this HTTP request. I'm now at this other level.

It's not an HTTP request anymore. It is something else. It is this new type that has its own operations on it, that has its own interpretation.

That's what I have to work with. It might be hard to go back down. That might be OK. You might not need to go back down. That's fine.

You might not need to step back and say, "I need to be able to turn this user settings change back into an HTTP request." You might not need to do that.

It could be lossy like, "I don't care what port this request came in on anymore. I don't care at this level of meaning." I won't be able to go back down, but that's fine. I don't want to go back down.
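A lossy step up might look like the following TypeScript sketch. The types, fields, and path pattern are hypothetical. The point is only that the higher-level type has no place to keep the port, so there is no way back down from it.

```typescript
// Lower level: everything the server knows about the request (hypothetical fields).
interface HttpRequest {
  method: string;
  path: string;
  port: number;
  remoteIp: string;
  body: Record<string, unknown>;
}

// Higher level: only what matters for a settings change.
interface UserSettingsChange {
  userId: string;
  settings: Record<string, unknown>;
}

// Interpreting the request drops the port and IP on purpose.
function interpret(req: HttpRequest): UserSettingsChange {
  const match = /^\/user\/([^/]+)\/settings$/.exec(req.path);
  const userId = match ? match[1] : undefined;
  if (userId === undefined) {
    throw new Error("not a user settings request");
  }
  return { userId, settings: req.body };
}

const change = interpret({
  method: "POST",
  path: "/user/42/settings",
  port: 8080,
  remoteIp: "203.0.113.7",
  body: { theme: "dark" },
});
```

Nothing in a UserSettingsChange can tell you which port the request arrived on, which is exactly the intent at this level of meaning.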

Both of them have advantages. Both of them have disadvantages. Without types, you've got way more freedom, but sometimes that freedom is rope to hang yourself with. I've seen a lot of Clojure systems that get real messy because of this.

They're treating something that's already been interpreted. It's already at a new level of meaning. They jump back a couple levels and treat it like a hash map again. Now, they're wondering, "Why am I confused about what this map is, what it has?"

That is the issue that Clojure programmers have. I'm sure JavaScript programmers, again, do the same thing when they're using JSON, when they're doing this data orientation. Types can help you with that.

That comes at the expense of a little bit of freedom, a little bit of ease. You have to define these new types and all those things. A lot of times, that's exactly what you need. It gives you some safety there.

Like I said, the short answer is yes. Types are compatible with data orientation, depending on the type system. In Haskell, the data is not hidden. It's not buried somewhere. Usually, you define a new data type, and it has certain fields in it. Those fields can be read by anything.

You're not hiding the data. There's no encapsulation behind a protocol that you have to call, one that gives you a single predetermined interpreter. You can still take this HTTP request and interpret it in multiple ways. You can route it. What kind of request is this? What is the intent of this request?

You can also log it and say, "What IP address is contacting me? What time is it?" All those things that you might want to do in an HTTP server, you can still do with that one piece of data interpreted in multiple ways.
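The episode's example here is Haskell, but the same idea can be sketched in TypeScript, used here only for illustration. The HttpRequest fields and the two functions are hypothetical. One typed piece of data with open, readable fields is interpreted two different ways.

```typescript
// A typed request whose fields are plain, readable data (hypothetical fields).
interface HttpRequest {
  readonly method: string;
  readonly path: string;
  readonly remoteIp: string;
  readonly receivedAt: string;
}

type Route = "user-settings" | "not-found";

// Interpretation 1: what is the intent of this request?
function route(req: HttpRequest): Route {
  return req.path.indexOf("/user/") === 0 ? "user-settings" : "not-found";
}

// Interpretation 2: who is contacting me, and when?
function logLine(req: HttpRequest): string {
  return `${req.receivedAt} ${req.remoteIp} ${req.method} ${req.path}`;
}

const incoming: HttpRequest = {
  method: "POST",
  path: "/user/42/settings",
  remoteIp: "203.0.113.7",
  receivedAt: "2024-01-01T00:00:00Z",
};

const chosenRoute = route(incoming);
const entry = logLine(incoming);
```

Neither function is privileged. Because the type exposes its fields rather than hiding them behind one fixed interface, new interpretations can be added without touching the type.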

In Clojure, we use maps a lot. They really are flexible and let you move up and down, but you might shoot yourself in the foot with them. Types don't let you move up and down so easily, but they give you a lot of safety. They make sure you know what level of meaning you are operating at.

The more you use those types, the more freedom you give up but the more safety you get. If you know exactly what levels you need, why not have some help from the compiler? That's what I'd say.

This might be the last data orientation episode I do for a while. I've done two more. If you want to find those and all the other past episodes, you can go to lispcast.com/podcast. There, you'll find audio, video, and text transcripts of all the prior episodes.

You'll also find links to subscribe, however you want to subscribe, and to find me on social media. If you want to get in touch with me, if you have questions about this or disagreements, I love discussions. I love understanding how well I'm communicating these ideas.

That's all I have for this episode. This has been my thought on functional programming. I'm Eric Normand. Thanks for listening and rock on.