Friday, February 8, 2013

Building SOLID Databases: Liskov Substitution Weirdness

If the Open/Closed principle starts looking a little strange as applied to object-relational design, the Liskov Substitution Principle applies in almost the exactly opposite way in the database as in the applications.  The reason become only clear once exploring this principle, seeing the limits on it, and investigating the realities of database development.  The canonical examples of LSP violations in application programs do not violate the principle at all in the database.  As always the reason is that databases model different things than applications and so their constraints are different.

The Formal Definition of the Liskov Substitution Principle


Let  be a property provable about objects of type . Then should be provable for objects of type where is a subtype of .

The Liskov Substitution Principle in Application Programming


The Liskov Substitution Principle is important because it ensures provability of state and behavior for subtypes.  This means you have to look at the full constraints that operate on a class method (excluding constructors I think, which operate under a different set of constraints).

In general for the LSP to be satisfied, preconstraints cannot be weakened, post-constraints cannot be strengthened, and invariants must be maintained not only in inherited methods but also non-inherited ones (this is broad enough to include the history constraint).  The typical problem this addresses is the square-rectangle problem, which is a well-understood problem in object-oriented programming, and one which calls into question certain promises of OOP as a whole.

The square-rectangle problem establishes a problem of irreducible complexity in object-oriented programming.  The typical examples are either "A square is-a rectangle" or "A circle is-a elipse."  Mathematically both of these statements are true but implementing them in an object-oriented framework adds a great deal of complexity.  The reason is that the mathematical hierarchy has no concept of a change of state which preserves invariances, and so the object hierarchy ends up being complicated by the need to shim in what is notably lacking in the geometric hierarchy.  This leads to either an explosion of abstract classes, an explosion of object metadata (can this rectangle be stretched -- always 'no' for a square)?  Can it be scaled?), or a loss of the sort of functionality we typically want from inheritance.

On the database level, these problems mostly don't exist in relational designs but they do exist in object relational designs, only differently (more on that below).

The Liskov Substitution Principle in application programming primarily operates primarily to preserve assumptions regarding state changes in an object when an interface is called.  If a square (defined as a rectangle where X and Y measurements are the same) can be stretched, it is no longer a square.  On the other hand, if we can't stretch it, it may not fill the contract of the rectangle.

When A Square Is-A Rectangle Subtype


The square-rectangle problem can be turned on its head to some extent with the question of when this is-a relationship is valid.  In fact the is-a relationship is valid for immutable objects.  It is perfectly valid to have a class ImmutableSquare be a subclass of ImmutableRectangle.

Secondly the subtyping can be valid if sufficient information is attached to the class to specify the invariants.  For example we might have a class with attributes "can_stretch" and "can_scale" where setting can_stretch off (as it would always be in a square) ensures that length to width proportions are always preserved.  The problem here is that the subclass invariances require support in the superclass and this leads to a lot of complexity in implementing the superclass, as well as fundamental limits of what can then be subclassed.

A Look at Database Limitations


In a normal relational database, the LSP is always met.  Domain-level constraints only apply to storage and not to calculation outputs, for example.

In database systems where tables instantiate types and substitutability is not anticipated (Oracle, DB2), then the LSP applies in the same way it does in application programming.  In the cases of table inheritance (Informix, PostgreSQL), things start to get a little weird.

For the purpose of stored data, the LSP is generally satisfied because constraints are inherited and strengthened.  A subtype is simply a sub-domain of the parent type possibly with some additional attributes.

The Square-Rectangle Problem in Object-Relational Databases


The Square-Rectangle problem is only even a potential problem in object-relational databases.  Purely relational designs can never run into this issue.  In object-relational designs a few very specific issues can occur depending entirely on how object-relational functionality is implemented at the database level.  In databases like Oracle and DB2, the LSP applies pretty much as is.  In Informix and PostgreSQL, the problems faced are actually somewhat different and lead to new classes of anomilies that need to be considered.  These include most prominently update anomilies when parent tables are updated.  These do not necessarily have easy workarounds.

Consider the following table structure:

CREATE TABLE my_rectangle (
    id serial primary key,
    height numeric,
    width numeric
 );

 CREATE TABLE my_square ( 
    check (height = width)
 ) INHERITS (my_rectangle);

Now, we may insert a bunch of squares and rectangles, and we could have checks.  Moreover we could have custom triggers to verify referential integrity for rows referencing my_rectangle and my_square.

So suppose we use update on my_rectangle to double the height of every rectangle in the system:

UPDATE my_rectangle SET height = height * 2;

Oops, we can't do that.  While rows in my_rectangle will happily be adjusted, rows in the child table my_square will not, and you will get an error.  This error can be avoided through a few issues, each of which has problems:

  1. Disallow updates, only allow inserts and deletes.  This makes referential integrity harder to manage when the row must be deleted and re-inserted.
  2. Use a trigger to rewrite an update to be a delete plus insert into either the current or parent table depending on constraints.  This makes RI harder to manage in that the procedure MUST disable custom RI triggers before doing this.   This would require that the procedure be aware of all custom RI triggers in order to do this.

Conclusions


The Liskov Substitution Principle depends quite a bit on specifics of how an object-relational database system intersects tables and objects for how it applies.  In general in a purely relational design you will never need to worry about it, but in an object-relational design there are some nasty corner cases that can come up, particularly where a query may operate on heterogenous subtypes as a set.  This is, perhaps, the only one of the SOLID principles which is O-R specific and it hits the db in different ways than the application because the db  operates on different principles.

No comments:

Post a Comment