August 3, 2021 / 5 Minutes

Modeler’s Corner #2: Entity types in Data Modeling

Development
Data Modeling
Juha Korpela
Chief Product Officer, Ellie.ai

Modeler’s Corner is our series of blog posts on best practices and practical tips & tricks for all you Ellie modelers out there. The series focuses on everyday issues you might face as a Data Modeler. We aim to help you build the most informative, understandable, and efficient business-driven models with Ellie. For more comprehensive training needs, don’t hesitate to ask us!

Color theory – using entity types in Ellie

In our previous Modeler’s Corner post, we talked about how identifying the correct entities for your models is the vital starting point of all modeling efforts. Now that you know your entities need to be singular nouns, countable, and existing in real life, and that you shouldn’t mix them up with attributes, you’re already a long way into a great Ellie model. Good job!

Having a bunch of well-defined entities on the canvas gets you far, but Ellie has a nice trick up its sleeve for making your model more informative. You might’ve already noticed that the entity “boxes” on Ellie’s canvas sometimes have different colors. What’s that for?

Fancy colors and where they come from

Looking at an Ellie model in the “View” state, you’ll find the following menu in the lower-left corner of the canvas:

The explanation. Probably should call it “Legend”, actually…

This explains what the colors mean: they are entity types. You can easily switch the type of an entity when in the “Edit” state on the canvas by right-clicking an entity. But what do the types mean then, and where do they come from?

One of our company’s founders, Ari Hovi, has worked in Data Warehousing and Data Modeling for decades. He defined a set of best practices and methodologies that he had witnessed working in data projects over the years: this collection of practical wisdom was to be called the “Hovi Data Framework”, or HDF for short. Implementing the practices of HDF in real life, however, required a tool with which business-driven data modeling would be fast and easy. Business-driven data modeling, the method of bringing business and IT people together to figure out the REAL structure of the information within the business processes, was at the core of HDF.

After a long search that turned up many almost-but-not-quite suitable candidates (most of them far too technical to suit the communicative purpose), the decision was made: let’s build the tool. Enter Ellie!

HDF practices originally included the categorization of identified business concepts into three classes: Master, Transaction, and Contract. This was later extended to include Reference entities and the possibility to split Transactions into Transaction headers and Transaction details. This categorization was added to Ellie.

One thing to note before we finally get to the details of what those types mean. It’s perhaps helpful to think of the categorization here as a “style” or a best practice on top of the basic Entity-Relationship modeling. We’re not claiming this is an entirely new modeling method – it’s merely an addition, something that has been seen to improve the readability and usability of the models over years and years of projects. A few words about these benefits will come later; now we need to tackle the elephant in the room.

The colors, what do they mean?

Right! Let’s have a look at each of those entity types in turn. This is a small example model containing entities of various types:

Our example model

Master entities (black)

The Master type is probably the simplest one. Everyone has pretty much the same core Master entities: Customer, Product, Factory, Raw material… A Master entity is something stable and central to your business, of which you can (at least theoretically) maintain a kind of a register – you know exactly how many Factories you have. These are often reused over and over again in the various models of your organization. Note that assigning the Master type to an entity doesn’t have to mean that you have a fancy Master Data Management process in place for that particular data; it’s just that type of data.

Transaction entities (blue)

The Transaction entity is also very straightforward. You have an event that happens at a certain time: that’s a Transaction. Transactions normally cannot exist by themselves; they need to have Master or Contract entities linked to them. After all, there’s always something that is acting in the event. Typical examples of Transactions might include Invoices, Website visits, Bank transactions, etc., but you can also consider things like IoT measurement events to be Transaction entities.
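The rule that a Transaction needs a Master or Contract entity linked to it can be sketched as a small check. This is only an illustration of the idea, not how Ellie works internally: the `EntityType` enum, the `unanchored_transactions` function, and the way the model is represented (a dict of names and a list of relationship pairs) are all assumptions made up for this example.

```python
from __future__ import annotations

from enum import Enum


class EntityType(Enum):
    # Only the three original HDF classes are needed for this sketch.
    MASTER = "Master"
    TRANSACTION = "Transaction"
    CONTRACT = "Contract"


def unanchored_transactions(
    entity_types: dict[str, EntityType],
    relationships: list[tuple[str, str]],
) -> list[str]:
    """Return Transactions that have no Master or Contract linked to them."""
    anchors = {EntityType.MASTER, EntityType.CONTRACT}
    anchored: set[str] = set()
    # Relationships are treated as undirected pairs of entity names.
    for left, right in relationships:
        if entity_types[right] in anchors:
            anchored.add(left)
        if entity_types[left] in anchors:
            anchored.add(right)
    return [
        name
        for name, entity_type in entity_types.items()
        if entity_type is EntityType.TRANSACTION and name not in anchored
    ]


# Hypothetical mini-model using entity names from the text.
model = {
    "Customer": EntityType.MASTER,
    "Invoice": EntityType.TRANSACTION,
    "Website visit": EntityType.TRANSACTION,
}
links = [("Customer", "Invoice")]
print(unanchored_transactions(model, links))  # ['Website visit']
```

A non-empty result is a hint to ask who or what is acting in that event, not necessarily an error in the model.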

Contract entities (pink-ish? or is it fuchsia?)

Now we are getting to the interesting part. Many of you might know star schema models; in those, you have the fact in the middle and dimensions around it, say Sales transactions as the fact and Stores and Products as the dimensions. The fact, of course, can be easily identified as a Transaction type and the dimensions as Master types.

Back in the day, Ari Hovi recognized a pattern: there were cases where an entity seemed to be a bit of both at the same time. It wasn’t a stable Master entity, as you weren’t able to know beforehand how many you had; nor did it happen at a single point in time, but it existed for a period of time and then disappeared. In a star schema, these were problematic.

This was labeled the Contract type. The Contract type has one defining characteristic: it’s something that has a start time and an end time. The time between might be very long, like for Bank accounts, or shorter, like for Advertising campaigns; but the point is that these should be separated from both Master entities (more or less permanent) and Transaction entities (single point in time).
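The time-based distinction between the three types can be written down as a tiny heuristic. Treat it as a rough sketch only: the function `suggest_entity_type` and its two yes/no questions are hypothetical names invented here, and a real modeling decision will involve more judgment than two flags.

```python
from enum import Enum


class EntityType(Enum):
    MASTER = "Master"
    TRANSACTION = "Transaction"
    CONTRACT = "Contract"


def suggest_entity_type(
    happens_at_single_point: bool,
    has_start_and_end: bool,
) -> EntityType:
    """Suggest a type from how the entity behaves over time."""
    # A defined period of validity is the defining characteristic of a Contract.
    if has_start_and_end:
        return EntityType.CONTRACT
    # An event at a certain time is a Transaction.
    if happens_at_single_point:
        return EntityType.TRANSACTION
    # Otherwise assume something more or less permanent: a Master.
    return EntityType.MASTER


# Examples from the text: an Advertising campaign, an Invoice, a Factory.
print(suggest_entity_type(False, True).value)   # Contract
print(suggest_entity_type(True, False).value)   # Transaction
print(suggest_entity_type(False, False).value)  # Master
```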

“Contract” is the name HDF gave to this type, but it doesn’t mean all of the entities in it need to be legal contracts. Actual contracts, however, are usually modeled as the Contract type, as they have a clearly defined period of validity!

Extended types: Reference (grey)

The Reference entity is something that we know of and maybe have a list of possible values for, but we’re not that interested in the entity itself. Think ZIP codes, countries, cities… All of these would be modeled as Reference entities. Usually, it doesn’t make sense to try to fit in all possible classifications and codes as separate entities. Rather, a good practice is to add only those Reference entities that are very important to the business and that might end up being dimensions of their own in a star schema model later in the pipeline. The rest can just be attributes.

Extended types: Transaction header & Transaction detail (shades of blue)

A picture is worth a thousand words, so see below:

An updated version of the example model with Transaction header & detail separated. See how it affects the relationship to Product!


Whereas the basic Transaction in our example model was just the whole Sales order, we have now divided it into two:

  • the Sales order header, containing basic information like who made the order, when, and what’s the shipping address
  • the Sales order detail, containing information on the products ordered, their prices, maybe things like customization requests for individual items

This can be practically thought of as, well, the header information and individual lines in a paper order form.

Separating the Transaction header from the Transaction detail usually solves some issues you might otherwise have with many-to-many relationships. It might also be good for making the relationship structure simpler. However, it also makes the model’s entity structure a bit more complicated. There is a tradeoff here between getting the overall idea correct while masking some details (=use normal Transaction) and getting the details right while adding a bit of “fluff” (=use header & detail). On higher levels of abstraction, header & detail are rarely necessary.
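To make the header and detail split a bit more concrete, here is a minimal sketch of the Sales order example as two record types. The class names, field names, and helper functions are assumptions chosen for illustration; they are not taken from Ellie or from the example model itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class SalesOrderHeader:
    order_id: int
    customer: str          # who made the order
    ordered_on: date       # when
    shipping_address: str


@dataclass(frozen=True)
class SalesOrderDetail:
    order_id: int          # each line belongs to exactly one header
    line_no: int
    product: str           # each line refers to exactly one Product
    unit_price: Decimal
    customization_request: str = ""


def products_on_order(order_id: int, details: list[SalesOrderDetail]) -> list[str]:
    """Products on one order, read from its detail lines."""
    return [d.product for d in details if d.order_id == order_id]


def orders_containing(product: str, details: list[SalesOrderDetail]) -> list[int]:
    """Orders that include a given product, read from the same detail lines."""
    return sorted({d.order_id for d in details if d.product == product})


header = SalesOrderHeader(1, "Customer A", date(2021, 8, 3), "Example street 1")
details = [
    SalesOrderDetail(1, 1, "Chair", Decimal("49.90")),
    SalesOrderDetail(1, 2, "Table", Decimal("199.00"), "oak finish"),
    SalesOrderDetail(2, 1, "Chair", Decimal("49.90")),
]
print(products_on_order(header.order_id, details))  # ['Chair', 'Table']
print(orders_containing("Chair", details))          # [1, 2]
```

The many-to-many relationship between Sales order and Product turns into two one-to-many relationships that meet in the detail lines, at the cost of one more entity on the canvas.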

Benefits of using the entity types

First of all, a model with the above-mentioned entity types set is easier to understand for everyone. You can instantly spot which ones are the Transactions and which ones are the possible “dimensions” or Master entities.

It’s also faster to create models this way, as you can e.g. start with a Transaction and ask “well, who is doing the action here?” to find what sort of Master entities need to be added. There are various kinds of interesting patterns here which we will discuss in later installments of Modeler’s Corner, all based on how different entity types tend to behave in different situations.

An interesting use case for the entity types is Data Vault modeling. As you know (and as we will tell you in our webinar with Cindi Meyersohn on March 11th!), Data Vault modeling always needs to be based on business entities. Those you naturally model with Ellie. But as you identify the entity types, you are in fact adding extra information into the model which is very useful in the Data Vault context. When you’re moving forward to the logical models, you can create rules of thumb that will guide you to correctly implement a Data Vault schema that fits the underlying conceptual model. For example, Master entities should usually become Hub tables in Data Vault! This is of course dependent on how well you have identified the Business Key attributes of your entities, but some general rules can be derived.
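A rule of thumb like this could be captured in a few lines of code. The sketch below encodes only the one rule mentioned above (a Master entity with an identified Business Key suggests a Hub) and deliberately leaves every other case undecided. The function name `suggest_data_vault_table` and the `has_business_key` flag are hypothetical and not part of Ellie or of any Data Vault tooling.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class EntityType(Enum):
    MASTER = "Master"
    TRANSACTION = "Transaction"
    CONTRACT = "Contract"
    REFERENCE = "Reference"


def suggest_data_vault_table(
    entity_type: EntityType,
    has_business_key: bool,
) -> Optional[str]:
    """Return a suggested Data Vault table kind, or None if no rule applies."""
    # The only rule stated in the text: Masters usually become Hubs,
    # provided their Business Key attributes have been identified.
    if entity_type is EntityType.MASTER and has_business_key:
        return "Hub"
    # Everything else is left for the modeler to decide.
    return None


print(suggest_data_vault_table(EntityType.MASTER, True))       # Hub
print(suggest_data_vault_table(EntityType.TRANSACTION, True))  # None
```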

So there we go, Ellie’s entity types explained! Now go ahead, release your inner Matisse, and make your models a bit more colorful.