Table of Contents
This article discusses what will remain from the relational data model if we leave out individual relational variables.
The resulting hypothetical data model preserves some properties of the relational one and seems to feature simplified, more user-friendly, incremental data scheme design process and may probably be suitable for home applications and quick prototyping of classical relational databases.
The work is inspired by [C. J. Date].
The author has also been certainly influenced by Datalog query language.
All the statments made in this article about the proposed table-less data model are of a sketchy nature and implications are not clear yet. It is unknown if similar data model has already been discussed or prototyped. If not (or not enough) further research and work and building prototypes may be possible to find out it usefulness/uselessness. The author is more then willing to receive any possible feedback.
[C. J. Date] in Section 3.4 says approxmately
... a relation header represents a predicate
It used to be a grand problem for the author of this paper to understand this until finally he hadn't concluded that it would better be read this as
... a relation header defines a predicate type
(where predicate type is a relation header in application to a predicate)
But meditation over this puzzling abstract has also given burth to another idea.
Intepretation of a relational database assigns meaning to relational variables.
What if we assign meaning to relation headers?
In practice this means dropping relational variables at all and making the whole database one large typeless variable. Data will be stored in the form of tuples each tuple as before being a set of <attribute-name, attribute-value> pairs. attribute-names become global to the database as well as the attribute-name to attribute-domain mapping.
Given a relational database we
create attributes with names of the form V.A for all attributes of all relvars in the original database, where V is the name of the relvar and A is the name of the attribute
store data into the table-less database only in full tuples corresponding to tuples of the relvars of the orginal database
Given a table-less database we generally should be able to create a corresponding relational database but two important moments should be considered:
we may have to create a very large number of tables (depending on the number of types of tuples that are present in the original database)
dependencies between predicates storable in a table-less database (see the section called “Hide And Seek With Predicates And Relations” may be impossible to model in a relational database.
Relational variable is a storage unit in a relational database. Consistency constraints and queries are expressed either in the form of relational algebra operations over the values of these relational variables or in the terms of predicate calculus. The latter operates on predicates implicitly backing up these relational variable. [1]
Similar facilities are available in the table-less data model.
Set Of Tuples (possibly with different attribute sets) is the main data entity in the table-less data model (like relational value is the main data entity in the relational data model).
As this concept will be so common in the following discussion let us use tupset as the abbrevation for “set of tuples”.
Let us call the set of attributes present in a tuple the tuple type.
Mapping of attributes to domains is global to the database and therefore not included into the tuple type (in contrast to relation type in the relational data model).
The whole database is just a single variable that has an almost arbitrary tupset as its value.
Just like a relational DBMS our table-less DBMS needs a catalong as a way to query and possibly modify metadata such as
association of domains to attributes
consistency constraints (unique keys, foreign keys, etc.)
information on user-defined domains (?)
security information
Like a catalog of a relational database that is visible to users as a set of special “system” relvars a catalog in table-less database should probably be visible as a set of “system” attributes that are used to build “system tuples”.
Probably user shouldn't be allowed to store arbitrary tuples with “system” attributes to the database. Probably there should be a predefined fixed number of “system” typle types and “system” attributes shouldn't be allowed to be stored into the database as part of any other tuple types.
If a consistency constraints system able to enforce similar limitations over user-defined attributes will be built into the data model, then the limitaions imposed on “system” attributes would be visible (but not modifiable) via the catalog too.
On the other hand at the current time it is not clear to the author if such limitation should be imposed on system attributes at all. It may prove usefull to attach special meaning to certain tuple tupes but not limit the user in other usages of system attributes except the key uniquness consistency constraints.
Let us say that a tupset is uniform iff the sets of attributes in all the tuples of the tupset are the same (tuples contain exactly the same attributes). For example the following tupset is not uniform:
loves='John', loved='Jane', how='passionately' |
loves='Silvia', loved='Jack' |
while the next one is
loves='Jahn', loved='Jane' |
loves='Silvia', loved='Jack' |
Empty tupset is uniform by definition.
Uniform tupset is just a classical relational value.
Let us define the projection operation: projection of a tupset over attribute set {a_1,a_2,...a_n} is the result of the following steps:
choose only those tuples from the tupset that have values for all of the {a_1,a_2,...a_n} attributes
in every tuple exclude all other attributes except {a_1,a_2,...a_n}
As we always operate on sets of tuples, exclusion of repeating tuples is implied on every step where it is necessary.
We can apply the projection operation to the whole database or to any intermidiate tupset. Even if the source tupset is not a uniform the result of projection is guaranteed to be uniform.
All the relational algebra operations are performable over uniform tupsets (aka relvals), projections are uniform tupsets, so it should be easy to prove that the DML we're sketching now for our table-less data model will be relationally full.
Given some table-less database with N attributes defined we can build 2**N-1 [2] subsets of the whole attribute set and respectively build 2**N-1 projections of the contents of the database.
These are 2**N-1 relations that may be considered to be stored in the table-less database. They are a close analog of the relational variables from the relational database world.
These are also the 2**N-1 implicit predicates of our table-less database.
We shall call them stored relations and stored predicates further on.
These stored predicates/relations are naturally named and denoted by their attribute sets. The relation header (also called predicate type in this paper) coming from the relational data model has become the name of table-less relvars and their implicit predicates analog.
These relations/predicates are not independent however, the following rule holds:
{name_1=val_1, name_2=val_2,... name_n=val_n} ⇒ {name_i_1=val_i_1, name_i_2=val_i_2,... name_i_m=val_i_m}
where {i_1,i_2,...i_m} is any subset of {1,2,...n}.
While in relational data model consistency constraints are associated to relvars in the table-less data model they may be associated to the stored relations/predicates.
All kinds of constraints known for relational databases should be possible, while additional kinds are also foreseen. For example, we may prohibit to store a tuple of the form
loves='John'
without storing the tuple
loves='John', loved='Jane'
These 2**N-1 implicit predicates of the database are also the entity that informal predicates may be associated to.
It should be possible to specify consistency constraints similar to unique key constraints in relational data model. Of course it may be done by a formal predicate of the following sort:
{'pc1'=x1, 'pc2'=x2, 'v'=y}, {'pc1'=x1, 'pc2'=x2, 'v'=z} ⇒ y==z
where x1, x2, y and z are variables (implicitly universally quantified) and pc1, pc2, v are constants denoting the names of attributes; this constraints says that in the {pc1,pc2,v} relation pc1,pc2 pair serves as a unique key.
But we may also go for games of another sort. Imagine we have system attributes named 'attr' and 'func_depends_upon' and 'mult_depends_upon'. Imagine we use these attributes to store functional and multiple dependecy meta-information directly into the database (this becomes part of the catalog). Let for example the following information about attributes name 'v1','v2' and 'v3' be stored:
{'attr'='v3','funct_depends_upon'='v2'} |
{'attr'='v2','funct_depends_upon'='v1'} |
That is we have a dependency chain: v1->v2->v3. Then, whenever user requests to insert into the database tuple of the form
{'v1'='a1', 'v2'='a2', 'v3'='a3'} |
two tuples are inserted instead:
{'v1'='a1', 'v2'='a2'} |
{'v2'='a2', 'v3'='a3'} |
Similarly, the following “joining implication” is considered to be true (and used when evaluating any requests, etc):
{'v1'=x, 'v2'=y}, {'v2'=y, 'v3'=z} ⇒ {'v1'=x, 'v2'=y, 'v3'=z}
(again, x, y, z are universally quantified variables.)
This is what may be called “automatic normalization”. As for any other part of this paper implications of the proposed ideas are note clear yet.
The design of the table-less data model certainly implies having duductive-database style features to define derived relations (aka VIEW in SQL). A Datalog style mechanism [Ceri] will certainly fit well into the model.
Any of these has not been designed yet for the model, but possibly the DML could have a Datalog style subset [3] [Ceri] for defining derived relations, consistency constraints, and probably database queries.
The author also feels that probably no special DDL will be needed if all the manipulations will be performable by updating the catalog of the database via regular DML operations.
As the original Datalog does not provide any update operations it has not yet been decided on what the whole DML for the model might look like. The question is wide open (as many others too).
Great many thins are unclear with the model. It is anticipated that it may have both underwater (unevident) and above-water (evedent) rocks to wreck any further advance.
What currently is mostly unclear is how to organize inserting and deleting tuples from the database. For example, if there is already a
{'n1'='v1', 'n2'='v2'} |
tuple in the database, and a new
{'n1'='v1', 'n2'='v2', 'n3'='v3'} |
tuple is being inserted, should the original one be elimiated? What if later the
{'n1'='v1', 'n2'='v2', 'n3'='v3'} |
tuple is removed, should then the original tuple be restored?
It is possible that users will find it easier to design data schemes in the new data model because
there are less identifiers - designers required to invent only attribute identifiers, not table ones
there is less structure - no tables with table-less data model [4] - sometimes clay is better then marble
the design process is incremental - attributes can be added one-by-one
less normalization concerns - provided that dependency information is entered into the catalog it looks possible to provide automated nomalization support - see the section called “Functional Dependencies, Unique Keys And Normalization”
It is possible that DBMS built according to this technology may be good for data management applications that
deal with weakly structured data, [5] or a rapidly evolving and mutating data scheme
have the data scheme built or maintained by inexperienced users not specializing in IT
Here is a list of some (possibly overlapping) application domains each meeting several these criteria:
organizing home or personal archieves
catalogues of all kinds, for example annotated file archieves with structural search capabilities
taxonomies and classifications - in biology, chemestry, genetics
storing scientific experimental data
prototyping - one possible scenario is to design classical realtional database by careful examination of table-less database created and tested by application domain experts
While the relational model did imply at least a naive implementation technique (one relvar - one file or file fraction) implementations of table-less data model will need to carefully guess the physical data schema if one is not advised by the user (for example because the use is not an IT professional).
It is anticipated that either direct setup of phisical data schema by user or adaptive DBMS self-tune up will significantly influence performance with medium and large sized databases.
However at small and tiny data volumes even the most naive implementations are expected to deliver acceptable performance.
Concurrent data access implies acquiring implicit read and write locks for data.
The relational data model provides a good choice of objects that may be used for locking: relvars, tuple implementation-dependent groups (pages), individual tuples.
Realizations of table-less data model will need special efforts to develop an adequate locking model.
This article will be published as https://tagunov.tripod.com/table-less/
Future actions may include:
finding out if similar ideas have aready been discussed
setting up a forum and/or mail list (where is it best to do so?)
public discussion
definition of insert/delete behavior
sketching a DML
setting up an open-source project if there is interest
developing a naive prototype
inventing a name better then “table-less” both for the data model and possible future prototype
the order of these actions is insignificant ;-)
Currently the author would like to thank everybody for the attention and ask to send him any feedback possible! :-)
After [C. J. Date]:
a set of <attribute-name, attribute-domain> pairs.
attribute-domain is INT, CHAR, user-defined-type, etc.
a set of <attribute-name, attribute-value> pairs.
a combination of a relation header and a finit set of conforming tuples, called relation body
a durable mutable database entity, such as SQL table, with an associated relation type that can be assigned a relational value of the same type
alternatively - a relational algebra expression (or an equivalent) over other relational variables, VIEW in SQL
three following entitites may be (or always are) associated with a relvar in a relational database:
- similarly named but deeply different in nature.
a set of relational variables
a mapping from the space of database states to the space of sets of propositions about real-world objects
set of values of all the relational variables in the relational database
a mapping for every element of every attribute domain used to a distinct object of the real world
an informal predicate associated with every relational variable
given a relational database interpretation any database state is interpreted in the following way: every tuple of every relational variable is mapped via the relvar's informal predicate to a proposition about real world objects
After [C. J. Date]
a statement (that is either true or false) about objects of the real world that the attributes of the relational variable map to
the meaning associated to a relvar
After [C. J. Date]
for a relational variable - conjunction of all consistency constraints local to this relvar: unique keys, etc
for a relational database - conjunction of all consistency constraints defined for the database: individual relvar constraints, foreign keys, other inter-relvar constraints, etc.
formal predicate is the statment that all these constraints are satisfied
probably expressed as a logical formula (WFF) over the implicit predicates of the database relational variables [C. J. Date]
This paper uses a couple of non-standard terms:
same as relation header but in application to a predicate
a logical function defines and is defined by the set of argument combinations for which it is true
predicate (as a logical function) and relation are dual views of the same entity
this paper calls the predicate implicitly associated with a relational variable its implicit predicate
this predicate changes as the value of the relvar changes
implicit predicate may be used to formulate formal predicats or to query database in a predicate caluculus-based DML [C. J. Date]
although not done in this paper, if needed one might also speak about an implicit predicate of relational value; then the implicit predicate of relational variable would be the implicit predicate of its current value
[1] SQL is known to have both relational algebra and predicate calculus features.
[2] 2**N-1 because we're not interested in the empty attribute set. The ** operation here denotes raising to power, notation probably coming from BASIC :-)
[3] Except that the author fills more comfortable with having the target on the right of the expression :-)
[4] Grouping attributes by tables is not only about the normalization: imagine that we have functional dependencies a->b, a->c, then we may group them either as
Table ab |
| ||
---|---|---|---|
Table ac |
|
or as
Table abc |
|
---|
[5] Dealing with “weakly structured data” here stands for storing diverse interrelated data entities each characterizable in a relational fashion but such that the set of attributes significantly changes from one to another and is generally unpredictable.