Strong Relationships


One of my favorite parts of working with data is looking at a new set of data, a new domain altogether, and piecing together the logical data model of how it all fits. Even in life, as a situation unfolds, I often find myself imagining the logical data model - how the various pieces and parts come together to construct a situation, or how a business might be running behind the scenes when I’m on the phone with their front-line Customer Service Representative.

As a result, when I’m working with clients, one of the things I really enjoy putting together with them is a logical view of all the data we’ve been made aware of. And almost without fail, as we look over what has been constructed, a question will arise along these lines: well, why isn’t there a relationship between this and that?

Those moments are fun. For me, at least, they’re fun. Two things are happening in moments like that. First, we’re testing the model, which means we’ve reached a point where we largely understand and agree on its overall shape, so we’ve achieved something. That’s good. Second, we’re finding out whether or not the relationships we’ve drawn into the model are all strong relationships.

I don’t know that the distinction between strong and weak relationships is a concept taught in textbooks about data engineering, but it’s one I keep in mind when I look at a data model or construct one: some relationships between entities are weak, and others are strong, and we should endeavor to replace weak relationships with strong relationships wherever we can. Somewhat counterintuitively, this results in relationships between some entities that are a bit more indirect - but much stronger.

I’ll explain.

A question that might come up, for example, is: why don’t we have a relationship between the customer and the state they live in? Well, sure. We can build that relationship into our logical data model. It isn’t that it doesn’t exist. I live in Ohio. Bill lives in Illinois. I can construct that relationship on the logical model. But in most cases, I’d consider that a weak relationship (unless, somehow, the state of residence is the only information I have about the customer). It’s weak because the connection between the customer and the state isn’t the primary relationship. The customer lives at an address. That address is located in a state. More precisely, that address is located in a city or a zip code within the state. The relationship between the customer and the postal address is much stronger than the relationship between the customer and the state.
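To make that concrete, here is a minimal Python sketch of the customer example. The class names, fields, and the street addresses are all placeholders I made up for illustration; the only thing it is meant to show is that the customer holds an address, the address holds a state, and the customer's state is derived rather than stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    code: str
    name: str


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postal_code: str
    state: State  # strong relationship: an address is located in a state


@dataclass
class Customer:
    name: str
    address: Address  # strong relationship: a customer lives at an address

    @property
    def state(self) -> State:
        # The customer-to-state link is derived by walking the strong
        # relationships, not stored as a separate (weak) relationship.
        return self.address.state


# Placeholder data: the addresses below are invented for this sketch.
illinois = State("IL", "Illinois")
ohio = State("OH", "Ohio")

bill = Customer("Bill", Address("1 Example St", "Springfield", "62701", illinois))
state_before_move = bill.state.name

# When Bill moves, only the address changes; the state follows automatically.
bill.address = Address("2 Sample Ave", "Columbus", "43004", ohio)
state_after_move = bill.state.name
```

If the state were stored directly on the customer as well, a move would have to update two places, and sooner or later they would disagree.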

Similarly, if I’m building a model for airline travel, I could build a relationship between a passenger and the manufacturer of the planes on which that passenger has traveled or is going to travel. But that would be a weak relationship - the stronger relationships are between passenger and flight, flight and plane, and plane and manufacturer. It’s more indirect, perhaps, but each individual relationship is much stronger and is much more likely to have reliable sourcing.
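The airline chain can be sketched the same way, this time with plain lookup tables standing in for the three strong relationships. Every identifier and value below (passenger IDs, flight numbers, tail numbers, "Maker A" and "Maker B") is hypothetical, and a real model would live in a database rather than in dictionaries.

```python
# Each table holds one strong relationship, sourced independently.
flights_by_passenger = {"P1": ["F100", "F200"], "P2": ["F200"]}
plane_by_flight = {"F100": "N101", "F200": "N202"}
manufacturer_by_plane = {"N101": "Maker A", "N202": "Maker B"}


def manufacturers_for(passenger_id: str) -> set[str]:
    """Derive passenger -> manufacturer by walking the strong relationships."""
    return {
        manufacturer_by_plane[plane_by_flight[flight]]
        for flight in flights_by_passenger.get(passenger_id, [])
    }


p1_manufacturers = manufacturers_for("P1")
p2_manufacturers = manufacturers_for("P2")
```

Nothing records passenger-to-manufacturer directly. If a different plane is assigned to a flight, only `plane_by_flight` changes, and the derived answer changes with it.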

As we get ready to start another year, I encourage us all to think about strong relationships and to implement them in our data models. That, and eating healthier, exercising more, and driving more cautiously. It is New Year’s, after all.
