How to store order history / invoices

Audit trails require a change of mindset. There are two approaches
that are commonly used, and both are successful within the bounds of
the issues that they are intended to deal with, in reasonably large
(terabyte plus) systems.

The approach that I tend to prefer is the polyphasic timeline or
"insert-only" approach, in which rows in the tables are only ever
inserted, never deleted or updated (except for the inactivated_at
column). This is accomplished by keeping "created_at" and
"inactivated_at" columns on every table. The appearance and business
effect of an update is accomplished by inactivating the existing row
and creating a new row. The appearance and business effect of deleting
a row is accomplished by inactivating the row. Rows that are currently
active are readily identified by "inactivated_at is null" in the sql.

create table Orders
(
    id integer not null,
    created_at timestamp default curent_timestamp not null,
    inactivated_at timestamp,
    other columns ...
)

create table OrderLineItems
(
    id integer not null,
    created_at timestamp default curent_timestamp not null,
    inactivated_at timestamp,
    parent_id not null references Orders(id),
    sequence integer not null,
    other columns ...
)

This mechanism is always implemented in the application code, and it
requires a bit of a twist to the mindset during design and coding.
Personally, I find that it simplifies matters but not all programmers
agree with me on that. The only enforcement possible in the RDBMS is
that no permissions are granted for delete on any table.

The other approach is to keep parallel transaction tables that snapshot
the information being audited, plus the transaction that was being
performed. So an Order table would have a parallel OrderTransaction
table that contained all of the columns of the Order table that were
important to audit, plus columns to say who, when, and what transaction
was performed.

Note that the foreign key relationship between transaction and baseline
tables are conceptual, but never made concrete in the database schema.
This means that the base business tables may be acted upon in any way
without interference by the RDBMS' referential integrity. You will
probably choose to create indexes

create table Orders
(
    id integer not null,
    other columns ...
)

create table OrderLineItems
(
    id integer not null,
    parent_id not null references Orders(id),
    sequence integer not null,
    other columns ...
)

create table OrderTransactions
(
    id integer not null, -- transaction id
    created_at timestamp default curent_timestamp not null,
    trans_type varchar(20) not null constraint
('INSERT','UPDATE','DELETE'),
    trans_by_user VARCHAR(20) not null,
    order_id not null,
    other columns cloned from Order ...
)

create table OrderLineItemTransactions
(
    id integer not null, -- transaction id
    created_at timestamp default curent_timestamp not null,
    trans_type varchar(20) not null constraint
('INSERT','UPDATE','DELETE'),
    trans_by_user VARCHAR(20) not null,
    orderLineItem_id not null,
    other columns cloned from OrderLineItem ...
)

The parallel transaction table schema may be easily implemented in the
trigger/stored procedure language of your RDBMS, and become transparent
to your application code.

Does this help?

Ben Johnson wrote:

I forgot to define the primary keys ...

create table Orders
(
    id integer not null,
    created_at timestamp default curent_timestamp not null,
    inactivated_at timestamp,
    other columns ...,
    primary key (id, created_at)
);
create index order_inactive_at on Orders (inactivated_at);

create table OrderLineItems
(
    id integer not null,
    created_at timestamp default curent_timestamp not null,
    inactivated_at timestamp,
    parent_id not null references Orders(id),
    sequence integer not null,
    other columns ...
    primary key (id, created_at)
);
create index order_inactive_at on Orders (inactivated_at);

If you can reproduce the state of the order at any point in its history, no
matter how many or what changes the user made, the model is sound.

In the exampl DDL, I have made foreign keys explicit in the business
tables. "column type references table(col)" in the create table DDL is a
shortcut for "create foreign key on table1(col) references table2(col)" in a
separate DDL command.

However, in transaction tables you need to record events that would cause some
RI mechanisms to delete the transaction rows if explicit foreign keys were
used. The transaction tables should be indexed, but probably not explicitly
tied to the foreign key. For reporting purposes, there is no difference
between well-defined indexes and explicitly defined foreign key
relationships.

I shy away from explicit foreign keys for transaction tables because, with
explicit foreign keys, you have referential integrity and its side effects,
which vary between RDBMS'. Creating the necessary indexes for efficient
reporting is good and predictable, and in many cases more efficient than
foreign keys.

I hope this helps.
David Johnson