How to store order history / invoices

Audit trails require a change of mindset. There are two approaches that are commonly used, and both are successful within the bounds of the issues that they are intended to deal with, in reasonably large (terabyte plus) systems.

The approach that I tend to prefer is the polyphasic timeline or "insert-only" approach, in which rows in the tables are only ever inserted, never deleted or updated (except for the inactivated_at column). This is accomplished by keeping "created_at" and "inactivated_at" columns on every table. The appearance and business effect of an update is accomplished by inactivating the existing row and creating a new row. The appearance and business effect of deleting a row is accomplished by inactivating the row. Rows that are currently active are readily identified by "inactivated_at is null" in the sql.

create table Orders (     id integer not null,     created_at timestamp default curent_timestamp not null,     inactivated_at timestamp,     other columns ... )

create table OrderLineItems (     id integer not null,     created_at timestamp default curent_timestamp not null,     inactivated_at timestamp,     parent_id not null references Orders(id),     sequence integer not null,     other columns ... )

This mechanism is always implemented in the application code, and it requires a bit of a twist to the mindset during design and coding. Personally, I find that it simplifies matters but not all programmers agree with me on that. The only enforcement possible in the RDBMS is that no permissions are granted for delete on any table.

The other approach is to keep parallel transaction tables that snapshot the information being audited, plus the transaction that was being performed. So an Order table would have a parallel OrderTransaction table that contained all of the columns of the Order table that were important to audit, plus columns to say who, when, and what transaction was performed.

Note that the foreign key relationship between transaction and baseline tables are conceptual, but never made concrete in the database schema. This means that the base business tables may be acted upon in any way without interference by the RDBMS' referential integrity. You will probably choose to create indexes

create table Orders (     id integer not null,     other columns ... )

create table OrderLineItems (     id integer not null,     parent_id not null references Orders(id),     sequence integer not null,     other columns ... )

create table OrderTransactions (     id integer not null, -- transaction id     created_at timestamp default curent_timestamp not null,     trans_type varchar(20) not null constraint ('INSERT','UPDATE','DELETE'),     trans_by_user VARCHAR(20) not null,     order_id not null,     other columns cloned from Order ... )

create table OrderLineItemTransactions (     id integer not null, -- transaction id     created_at timestamp default curent_timestamp not null,     trans_type varchar(20) not null constraint ('INSERT','UPDATE','DELETE'),     trans_by_user VARCHAR(20) not null,     orderLineItem_id not null,     other columns cloned from OrderLineItem ... )

The parallel transaction table schema may be easily implemented in the trigger/stored procedure language of your RDBMS, and become transparent to your application code.

Does this help?

Ben Johnson wrote:

I forgot to define the primary keys ...

create table Orders (     id integer not null,     created_at timestamp default curent_timestamp not null,     inactivated_at timestamp,     other columns ...,     primary key (id, created_at) ); create index order_inactive_at on Orders (inactivated_at);

create table OrderLineItems (     id integer not null,     created_at timestamp default curent_timestamp not null,     inactivated_at timestamp,     parent_id not null references Orders(id),     sequence integer not null,     other columns ...     primary key (id, created_at) ); create index order_inactive_at on Orders (inactivated_at);

If you can reproduce the state of the order at any point in its history, no matter how many or what changes the user made, the model is sound.

In the exampl DDL, I have made foreign keys explicit in the business tables. "column type references table(col)" in the create table DDL is a shortcut for "create foreign key on table1(col) references table2(col)" in a separate DDL command.

However, in transaction tables you need to record events that would cause some RI mechanisms to delete the transaction rows if explicit foreign keys were used. The transaction tables should be indexed, but probably not explicitly tied to the foreign key. For reporting purposes, there is no difference between well-defined indexes and explicitly defined foreign key relationships.

I shy away from explicit foreign keys for transaction tables because, with explicit foreign keys, you have referential integrity and its side effects, which vary between RDBMS'. Creating the necessary indexes for efficient reporting is good and predictable, and in many cases more efficient than foreign keys.

I hope this helps. David Johnson