What Remains Of The Relational Model If We Drop The Relvars?

Abstract

This article discusses what will remain from the relational data model if we leave out individual relational variables.

The resulting hypothetical data model preserves some properties of the relational one and seems to feature simplified, more user-friendly, incremental data scheme design process and may probably be suitable for home applications and quick prototyping of classical relational databases.

The work is inspired by [C. J. Date].

The author has also been certainly influenced by Datalog query language.

Preface

All the statments made in this article about the proposed table-less data model are of a sketchy nature and implications are not clear yet. It is unknown if similar data model has already been discussed or prototyped. If not (or not enough) further research and work and building prototypes may be possible to find out it usefulness/uselessness. The author is more then willing to receive any possible feedback.

History

[C. J. Date] in Section 3.4 says approxmately

... a relation header represents a predicate

It used to be a grand problem for the author of this paper to understand this until finally he hadn't concluded that it would better be read this as

... a relation header defines a predicate type

(where predicate type is a relation header in application to a predicate)

But meditation over this puzzling abstract has also given burth to another idea.

The Idea

Intepretation of a relational database assigns meaning to relational variables.

What if we assign meaning to relation headers?

In practice this means dropping relational variables at all and making the whole database one large typeless variable. Data will be stored in the form of tuples each tuple as before being a set of <attribute-name, attribute-value> pairs. attribute-names become global to the database as well as the attribute-name to attribute-domain mapping.

Example 1.

Relational database table Loves

Subject	Object
'John'	'Jane'
'Silvia'	'Jack'

Table-less form of recording the same information

Tuples stored in the database
Loves='John', Loved='Jane'
Loves='Silvia', Loved='Jack'

Details

Mapping Data Models Onto Each Other

Given a relational database we

create attributes with names of the form V.A for all attributes of all relvars in the original database, where V is the name of the relvar and A is the name of the attribute
store data into the table-less database only in full tuples corresponding to tuples of the relvars of the orginal database

Given a table-less database we generally should be able to create a corresponding relational database but two important moments should be considered:

we may have to create a very large number of tables (depending on the number of types of tuples that are present in the original database)
dependencies between predicates storable in a table-less database (see the section called “Hide And Seek With Predicates And Relations” may be impossible to model in a relational database.

Relations And Predicates

Relational variable is a storage unit in a relational database. Consistency constraints and queries are expressed either in the form of relational algebra operations over the values of these relational variables or in the terms of predicate calculus. The latter operates on predicates implicitly backing up these relational variable. ^[1]

Similar facilities are available in the table-less data model.

loves='John', loved='Jane', how='passionately'

loves='Silvia', loved='Jack'

while the next one is

loves='Jahn', loved='Jane'

loves='Silvia', loved='Jack'

Empty tupset is uniform by definition.

Uniform tupset is just a classical relational value.

Projection

Let us define the projection operation: projection of a tupset over attribute set {a_1,a_2,...a_n} is the result of the following steps:

choose only those tuples from the tupset that have values for all of the {a_1,a_2,...a_n} attributes
in every tuple exclude all other attributes except {a_1,a_2,...a_n}

As we always operate on sets of tuples, exclusion of repeating tuples is implied on every step where it is necessary.

We can apply the projection operation to the whole database or to any intermidiate tupset. Even if the source tupset is not a uniform the result of projection is guaranteed to be uniform.

All the relational algebra operations are performable over uniform tupsets (aka relvals), projections are uniform tupsets, so it should be easy to prove that the DML we're sketching now for our table-less data model will be relationally full.

Hide And Seek With Predicates And Relations

Given some table-less database with N attributes defined we can build 2**N-1 ^[2] subsets of the whole attribute set and respectively build 2**N-1 projections of the contents of the database.

These are 2**N-1 relations that may be considered to be stored in the table-less database. They are a close analog of the relational variables from the relational database world.

These are also the 2**N-1 implicit predicates of our table-less database.

We shall call them stored relations and stored predicates further on.

These stored predicates/relations are naturally named and denoted by their attribute sets. The relation header (also called predicate type in this paper) coming from the relational data model has become the name of table-less relvars and their implicit predicates analog.

These relations/predicates are not independent however, the following rule holds:

{name_1=val_1, name_2=val_2,... name_n=val_n} ⇒ {name_i_1=val_i_1, name_i_2=val_i_2,... name_i_m=val_i_m}

where {i_1,i_2,...i_m} is any subset of {1,2,...n}.

While in relational data model consistency constraints are associated to relvars in the table-less data model they may be associated to the stored relations/predicates.

All kinds of constraints known for relational databases should be possible, while additional kinds are also foreseen. For example, we may prohibit to store a tuple of the form

loves='John'

without storing the tuple

loves='John', loved='Jane'

These 2**N-1 implicit predicates of the database are also the entity that informal predicates may be associated to.

Functional Dependencies, Unique Keys And Normalization

It should be possible to specify consistency constraints similar to unique key constraints in relational data model. Of course it may be done by a formal predicate of the following sort:

{'pc1'=x1, 'pc2'=x2, 'v'=y}, {'pc1'=x1, 'pc2'=x2, 'v'=z} ⇒ y==z

where x1, x2, y and z are variables (implicitly universally quantified) and pc1, pc2, v are constants denoting the names of attributes; this constraints says that in the {pc1,pc2,v} relation pc1,pc2 pair serves as a unique key.

But we may also go for games of another sort. Imagine we have system attributes named 'attr' and 'func_depends_upon' and 'mult_depends_upon'. Imagine we use these attributes to store functional and multiple dependecy meta-information directly into the database (this becomes part of the catalog). Let for example the following information about attributes name 'v1','v2' and 'v3' be stored:

{'attr'='v3','funct_depends_upon'='v2'}

{'attr'='v2','funct_depends_upon'='v1'}

That is we have a dependency chain: v1->v2->v3. Then, whenever user requests to insert into the database tuple of the form

{'v1'='a1', 'v2'='a2', 'v3'='a3'}

two tuples are inserted instead:

{'v1'='a1', 'v2'='a2'}

{'v2'='a2', 'v3'='a3'}

{'v1'=x, 'v2'=y}, {'v2'=y, 'v3'=z} ⇒ {'v1'=x, 'v2'=y, 'v3'=z}

(again, x, y, z are universally quantified variables.)

This is what may be called “automatic normalization”. As for any other part of this paper implications of the proposed ideas are note clear yet.

Derived Relations

The design of the table-less data model certainly implies having duductive-database style features to define derived relations (aka VIEW in SQL). A Datalog style mechanism [Ceri] will certainly fit well into the model.

DDL, DML

Any of these has not been designed yet for the model, but possibly the DML could have a Datalog style subset ^[3] [Ceri] for defining derived relations, consistency constraints, and probably database queries.

The author also feels that probably no special DDL will be needed if all the manipulations will be performable by updating the catalog of the database via regular DML operations.

As the original Datalog does not provide any update operations it has not yet been decided on what the whole DML for the model might look like. The question is wide open (as many others too).

Questions Open

Great many thins are unclear with the model. It is anticipated that it may have both underwater (unevident) and above-water (evedent) rocks to wreck any further advance.

What currently is mostly unclear is how to organize inserting and deleting tuples from the database. For example, if there is already a

{'n1'='v1', 'n2'='v2'}

tuple in the database, and a new

{'n1'='v1', 'n2'='v2', 'n3'='v3'}

tuple is being inserted, should the original one be elimiated? What if later the

{'n1'='v1', 'n2'='v2', 'n3'='v3'}

tuple is removed, should then the original tuple be restored?

Expected Benefits

It is possible that users will find it easier to design data schemes in the new data model because

there are less identifiers - designers required to invent only attribute identifiers, not table ones
there is less structure - no tables with table-less data model ^[4] - sometimes clay is better then marble
the design process is incremental - attributes can be added one-by-one
less normalization concerns - provided that dependency information is entered into the catalog it looks possible to provide automated nomalization support - see the section called “Functional Dependencies, Unique Keys And Normalization”

Possible Applications

It is possible that DBMS built according to this technology may be good for data management applications that

deal with weakly structured data, ^[5] or a rapidly evolving and mutating data scheme
have the data scheme built or maintained by inexperienced users not specializing in IT

Here is a list of some (possibly overlapping) application domains each meeting several these criteria:

organizing home or personal archieves
catalogues of all kinds, for example annotated file archieves with structural search capabilities
taxonomies and classifications - in biology, chemestry, genetics
storing scientific experimental data
prototyping - one possible scenario is to design classical realtional database by careful examination of table-less database created and tested by application domain experts

Implementation Considerations

Performance And Maintenance

While the relational model did imply at least a naive implementation technique (one relvar - one file or file fraction) implementations of table-less data model will need to carefully guess the physical data schema if one is not advised by the user (for example because the use is not an IT professional).

It is anticipated that either direct setup of phisical data schema by user or adaptive DBMS self-tune up will significantly influence performance with medium and large sized databases.

However at small and tiny data volumes even the most naive implementations are expected to deliver acceptable performance.

Choice Of Lock Objects For Concurrent Access

Concurrent data access implies acquiring implicit read and write locks for data.

The relational data model provides a good choice of objects that may be used for locking: relvars, tuple implementation-dependent groups (pages), individual tuples.

Realizations of table-less data model will need special efforts to develop an adequate locking model.

Future Work

This article will be published as https://tagunov.tripod.com/table-less/

Future actions may include:

finding out if similar ideas have aready been discussed
setting up a forum and/or mail list (where is it best to do so?)
public discussion
definition of insert/delete behavior
sketching a DML
setting up an open-source project if there is interest
developing a naive prototype
inventing a name better then “table-less” both for the data model and possible future prototype

the order of these actions is insignificant ;-)

Currently the author would like to thank everybody for the attention and ask to send him any feedback possible! :-)

Brief Glossary

Relational Database Terminlogy

After [C. J. Date]:

relation header

relation type

a set of <attribute-name, attribute-domain> pairs.

attribute-domain is INT, CHAR, user-defined-type, etc.

tuple

a set of <attribute-name, attribute-value> pairs.

relational value

relval

a combination of a relation header and a finit set of conforming tuples, called relation body

relational variable

relvar

a durable mutable database entity, such as SQL table, with an associated relation type that can be assigned a relational value of the same type

alternatively - a relational algebra expression (or an equivalent) over other relational variables, VIEW in SQL

three following entitites may be (or always are) associated with a relvar in a relational database:

implicit predicate
informal predicate
formal predicate

- similarly named but deeply different in nature.

relational database

a set of relational variables

Database Interpretation

database interpretation

a mapping from the space of database states to the space of sets of propositions about real-world objects

relational database state

set of values of all the relational variables in the relational database

relational database interpretation

a mapping for every element of every attribute domain used to a distinct object of the real world
an informal predicate associated with every relational variable

given a relational database interpretation any database state is interpreted in the following way: every tuple of every relational variable is mapped via the relvar's informal predicate to a proposition about real world objects

After [C. J. Date]

informal predicate

a statement (that is either true or false) about objects of the real world that the attributes of the relational variable map to

the meaning associated to a relvar

After [C. J. Date]

Consistency Constraints

formal predicate

for a relational variable - conjunction of all consistency constraints local to this relvar: unique keys, etc
for a relational database - conjunction of all consistency constraints defined for the database: individual relvar constraints, foreign keys, other inter-relvar constraints, etc.

formal predicate is the statment that all these constraints are satisfied

probably expressed as a logical formula (WFF) over the implicit predicates of the database relational variables [C. J. Date]

Additional Relational Database Terminology

This paper uses a couple of non-standard terms:

predicate type

same as relation header but in application to a predicate

implicit predicate

a logical function defines and is defined by the set of argument combinations for which it is true

predicate (as a logical function) and relation are dual views of the same entity

this paper calls the predicate implicitly associated with a relational variable its implicit predicate

this predicate changes as the value of the relvar changes

implicit predicate may be used to formulate formal predicats or to query database in a predicate caluculus-based DML [C. J. Date]

although not done in this paper, if needed one might also speak about an implicit predicate of relational value; then the implicit predicate of relational variable would be the implicit predicate of its current value

References

[C. J. Date] C. J. Date. An Intorduction to Database Systems, 7th Edition Addison Wesley

[Kleene] Stephen Cole Kleene Mathematical Logic John Wiley & Sons, Inc. 1967

[Ceri] Ceri S., Gottlob G., Tanca L. Logic Programming adn Databases New York, N.Y.: Springer-Verlag, 1990

^[1] SQL is known to have both relational algebra and predicate calculus features.

^[2]2**N-1 because we're not interested in the empty attribute set. The ** operation here denotes raising to power, notation probably coming from BASIC :-)

^[3] Except that the author fills more comfortable with having the target on the right of the expression :-)

^[4] Grouping attributes by tables is not only about the normalization: imagine that we have functional dependencies a->b, a->c, then we may group them either as

Table ab

a	b

Table ac

a	c

or as

Table abc

a	b	c

^[5] Dealing with “weakly structured data” here stands for storing diverse interrelated data entities each characterizable in a relational fashion but such that the set of attributes significantly changes from one to another and is generally unpredictable.