page title icon Generally Accepted Data Modeling Principles

Click to learn more about author Thomas Frisendal.

What can data
modelers learn from accountants?

Accounting is a solidly established practice that the world
cannot live without. One of the established guidelines for accountants is
called GAAP (Generally Accepted Accounting Principles in the US), and there are
similar international setups. You might guess these standards are about rules,
but actually, accounting is much more than that. It contains some very well
thought out best practices and some bright concepts about keeping records. In
fact, there is a lot to learn (never thought that I should say that). 😊

Let’s start with one of the best-kept secrets of the trade.

Double-Entry Bookkeeping

Management of critical resources is business as usual for accountants and has been so for around 2000 years. Quoting Wikipedia: “In 70 AD, Pliny the Elder described the structure of the “Tabulae Rationum” as “on one page all the disbursements are entered, on the other page all the receipts; both pages constitute a whole for each operation of every man.” This is known today as double-entry bookkeeping, and it is still at the core of accounting. Here you see a posting of the purchase of a machine on January 31:

All transactions are booked on two accounts. No corrections
are allowed. If you made an error, you would have to make a new double-entry
transaction with the correction amount (difference) and book that. It is
basically an error detection and prevention best practice.

Supplemented with the journaling best practice, it is also quite powerful and can really explain what has happened, including error corrections, etc.

You might very well, as I did recently, get the idea that a
ledger with a journal is a specialized database designed to keep organizations
alive and kicking, at least from a financial point of view.

Data Model

What, then, does the data model involved look like? Here is a simplified concept map:

Image Source: Thomas Frisendal

This simple conceptual data model, coupled with simple
functionality, a.k.a. accounting programs, is the machine room of financial
governance. So, at this level, what you see is sort of a three-dimensional data
model with credit account, debit account, and date/time as the dimensions and
having two fact tables: postings and journal entries. The accounting
functionality is based on the semantics just described above.

But there is more.

The Chart of Accounts

Here is a textbook chart of accounts:

The Chart of Accounts (example above) seems to be just a boring hierarchical account structure. And that it is. But you can also look upon it as a semantic model of a domain covering financial governance. Accounts live in one of the five contexts:

1. Assets: values that the legal entity owns
2. Liabilities: debt that the legal entity owes to someone
3. Equity: keeping track of the owner(s)
4. Revenue: keeping track of the income
5. Expenses: keeping track of the operating expenses

This can be thought of as just another dimension, but in
reality, there is more going on: Each and every transaction is posted on two accounts,
right? Each account belongs to one of the five “subdomains” above. Just a few
examples:

  • A sales transaction involves a revenue account
    (e.g., an unknown online customer or Thomas Frisendal over in Denmark) and an
    asset account (PayPal account or asset account of receivables)

    • A paycheck involves an expense account (payroll
      of the employee) and an asset account from where the money is withdrawn (e.g.,
      a bank account), or the payment could be some days in the future, in which case
      the other account would be a liability (account of payables)

This is called maintaining the Accounting Equation (Assets=Liabilities
+ Owner’s Equity), and it is the foremost reason for the double-entry
accounting principles.

The fact that each transaction typically spans two “subdomains” means that the account’s domain really is a mesh of interconnected things:

Image Source: Thomas Frisendal

The connecting arrows represent well-defined semantic
relationships, defining distinct business processes. In reality, there is not
just a single Chart of Accounts but a set of them working together to solve
both the “mechanics” of posting transactions and keeping track in general. Similarly,
there are also accounts that support management’s needs for keeping an eye on
things in need of supervision. This really is a many-faceted and complex
network of postings, transactions, and accounts. Fortunately, the semantics of
the models and of the best practices are well understood and mostly built into
the software that people use on a daily basis.

There are some clear parallels between double-entry
accounting and semantic networks: <debit>-<transaction
type>-<credit>, having accounts filling in the roles of subjects and
objects.

The Business
Perspectives

The basic structure just described supports many aspects of the
financial management of organizations and businesses. Let us cast a brief view
of the benefits of organizing things these ways.

In order to be certain that I understand current trends in
the domain, I bought the book “The Finance Book” by Stuart Warner and Si
Hussain (Pearson, 2017). The following is not a review or a  writeup of the book. What I will highlight are
my observations as I read it. If you need to improve your understanding of
financial management and accounting, this is the book you could well begin
with.

The organizational structure of a finance department is
pretty standard across many organizations and involves accounting, treasury,
and management accounting. Their work is performed using basically the same
kinds of software that contain concepts like:

  • Ledgers (including a nominal ledger)
    • Balances
    • Journals
    • Ordinary as well as statutory accounts
    • Built-in controls including (among many) reconciliations, audit trails, and (internal and external) auditors
    • Governance and whistleblowing best practices

There are national and international standards for
accounting and reporting in place. Here are the two most important ones:

  • Generally Accepted Accounting Principles, GAAP
    (US)

    • International Accounting/Financial Reporting
      Standards, IFRS (related to IAS, International Accounting Standards)

The standardization could be improved towards truly
internationally comparable standards. There are regulations and reporting
requirements all over the globe, and, except for tax havens, countries are
pretty open about financial information of registered companies and entities.

Management Accounting “Packs” are used in all kinds of
businesses for reporting and typically contain things like:

  • Key performance indicators
    • Period and cumulative positions

In short, we are looking at mature practices for keeping
track of what many consider the most important thing: the financial situation
of shareholders’ investments. There are some “shipwrecks,” but the vast
majority of organizations and companies are very good at guarding the finances.
And the business people and owners do not question these practices; they
actually take them for granted and appreciate their benefits.

People should learn from this.

Which Other
Business-Critical Functions Are Doing Equally Well?

Looking across larger organizations and businesses: If you
ask me, the information science practitioners are doing fine. People are
keeping track of information in many different government contexts (including
law enforcement, etc.) as well as business contexts within pharma, healthcare, manufacturing,
aerospace, etc. They have plenty of regulated and/or best practices, follow
strict procedures/best practices, and are supported by many technologies and
defined semantics such as:

  • Ontologies and taxonomies (many international standards — think Financial Industry Business Ontology as an example), mostly built upon the W3C standards, RDF and OWL
    • Tight regulations — think the US Food and Drug Administration, for example
    • Record keeping ISO standards (ISO 15489 Records management et al.)
    • Knowledge graphs
    • Specialized software built around semantics in general and the semantics of information science specifically

The ISO committee that maintains these standards offers this clarification:

“Records” is an English word
that is difficult to translate to other languages. Even in English,
it is used in different contexts with different meanings. An easy explanation:

1. Record = information
2. Not all information is a record > information created, received, and maintained as evidence and as an asset by an organization or person
3. When does an organization create records? > In pursuit of legal obligations or in the transaction of business

I am certainly not an information scientist, but I
understand the semantics, and I have been working with quite a few. I have not
seen “double-entry records posting” (could well exist, though), but there are some
very convincing journals, and users have access to them, not least for auditing
and compliance purposes.

So far, so good. All is well that ends well. But…

What About
Data Management?

Yes, what indeed? Accounting manages the important issue of having
sufficient finances. And information science handles the important issue of
having correct and sufficient information, for instance, the results of phase 3
clinical vaccine trials, just to mention one thing. And data is an important
container of information and, consequently, also of records.

In DBMSs, journals exist mostly for roll-back and
roll-forward. Why did we not think about the business relevance of journal
entries? They could well be expanded and made accessible (and persisted).

Data is also an important and business-critical issue — on a
level with finances and records, right?

You might argue that some effects of Data Management shine through in both financial and regulatory reporting, for example. And that is true. However, such accomplishments are not based on anything specifically built into Data Management tools (multidimensional and graph-based reporting tools being the closest contenders). Nothing like debit/credit/journal semantics is found in SQL-based DBMSs or in NoSQL products either, in general.

Wait a minute; I hear your say: Isn’t finance a special
application domain that is not on the DBMS level? Maybe so, but finance and
accounting are indeed very general, and the way accounting packages bundle the
posting and journaling of transactions keeping the accounting equation in
balance can be seen as just a thin layer on top of the DBMS.

Could we imagine other kinds of — very general —
functionality that could be handled in similar ways? If we put accounting
aside, what other general functional requirements are in dire need of being
handled better than in today’s DBMSs? (Be they SQL or graph or other, for that
matter). Keeping in mind that such functionality should address real business
requirements having considerable impact.

I can think of four, off the top of my head:

  • Temporality
    • Records management in general
    • Query transformation semantics

Let me explain.

Temporality

This beast is still annoying to many IT users. The simple
need is to be able to recall the contents of a database, either as it is or as it
was at some point in time in the past. And it is completely agnostic of any
business domains.

This is handled to a certain extent in SQL 2015 versioning clauses (system or user-based versioning), but the problem is more general than “bitemporal.” See “SQL Technical Report — Part 2: SQL Support for Time-Related Information” linked here (SQL:2015 standard, [TR 19075-02_Time_2015]) for more information.

Here is an example of an ”as-of” query based on a Microsoft SQL Server Systems Versioned Temporal Table:

(Downloaded from the MSDN documentation on 10-14-2020)

Until 2015, consultants stood on each other’s shoulders
building data vaults and anchor models and slowly changing dimensions and what
have you. And many of them still do.

I have dealt with these issues in a number of posts here on DATAVERSITY’s
site:

The bottom line is we can come a long way by adding an “upper-DBMS
level” of functionality and associated syntax to deal with:

  • Setting up timelines
    • Expanding queries that access timelines to deal
      with as-is/as-of style of logic in transparent manners

Much like “accounting DBMSs” (Microsoft Dynamics and what have
you) deal with double-entry bookkeeping and journals — keeping track of
time-related changes is basically journaling. And my proposals basically extend
the current offerings of SQL.

Records Management
in General

I mentioned this earlier as something that comes out of information science. But it is easily generalized. The ISO 15489-1: 2001 standard defines records management as “[the] field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use, and disposition of records, including the processes for capturing and maintaining evidence of and information about business activities and transactions in the form of records.” (Seen on Wikipedia).

Clearly, there is a huge overlap with temporality, as
applied above. Actually, I believe that temporal DBMS thinking can influence
Records Management. Likewise, Records Management can and should influence DBMS
thinking. Using double-entry bookkeeping and journaling as analogies, the
possibilities are unfolding nicely.

I would be deeply interested in helping incubate such ideas
into actual software solutions.

Query
Transformation Semantics

In my Learning from Complex Data Modeling Practices
post linked above, I also mentioned the efforts of the data vault modelers to
enable “insert only” practices in SQL databases. This is first and foremost
driven by performance concerns, but it also rimes well with temporality and
records management. I think there is potential here for generalizing it all
into DBMS functionality that looks like SQL as-is and/or as-of, also in SQL,
but with the cloak and dagger insert-only thing going on in the background. It
needs some consideration but can probably be done in elegant ways.

Semantic
Cognizance

In the temporal blog posts mentioned above, I also claim
that going to an atomic level, basically the 6th normal form, is
necessary to understand the full semantics of the data model. And that is true
for all of these “extended functionality” proposals. But, interestingly enough,
going to that level basically aligns you with the RDF databases’ triples
(subject-predicate-object). The data models in both camps are pretty close, and
this level is where you find many RDF to property graph mappers operating. And
property graphs map quite intuitively to normalized databases.

The most promising opportunity is embedding inference into the DBMSs, similar in many respects to what has been available to RDF/OWL users for many years. Inferencing involves both backward and forward reasoning, and you can think about it like this classic example starting out with two simple triples:

All persons are mortal.
Socrates is a person.

Then we can infer that (ergo) Socrates is mortal.

And that Socrates is not mortal is incorrect.

Reasoning capabilities like this have been available in RDF
for almost 20 years. It was not the panacea that some expected it to be, but it
has some excellent use cases.

Another benefit of improved semantic functionality would be that SQL and SPARQL came closer to each other — easing interoperability across platforms.

Cultivating New “Equations” and Creating Opportunities

Actually, what I am talking about is just a little added
functionality and associated semantics. But it is amazing what it has done for
accounting for over 2000 years. To recap — architecturally, it boils down to:

  • Double-entry as a general principle
    • Semantics as part of the DBMS declarations
    • No deletes or modifies but a lot of temporality

It is not just about making, for example, a more intelligent SQL or GQL (Graph Query Language in the works), but it is certainly also an exercise for raising the bar for database capabilities and benefits. A quest for raising the levels of quality, consistency, accessibility, and, in general, trustworthiness. Seems to me that in the RDF-space companies like Fluree and TerminusDB are thinking along these lines, but I have not seen SQL or graph DBMS’s going in these directions. Hopefully, the paradigms can all be explained as elegantly as the first mover, The Accounting Equation. Here are the five contenders:

Image Source: Thomas Frisendal

It is about time! Send your comments and suggestions to me
at info@graphdatamodeling.com,
and I will curate them into a follow up blog post sometime in 2021.

Plaats een reactie