Login | Register
My pages Projects Community openCollabNet

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Catacomb] best table schema to support BIND ?



Hello,

While digging around the archives, I found an interesting email [1]
from Jim Whitehead in which he refers to a "binding-like referential
containment model" for supporting the BIND specification.  I would
like to explore what would be a good table schema / containment model
for supporting the BIND specification.

In _The_Art_of_SQL_, Stephane Faroult mentions three ways to represent a tree:

1. Adjacency model. -- a row in the table for every edge
2. Materlialize path model -- a row for every path
3. Nested set model -- every node has a pair of numbers. all
descendants of that node have pairs of numbers which lie between the
ancestor's two

Catacomb currently uses a materialized path model.  I'm guessing this
is for performance reasons -- otherwise for every lookup, you have to
do a join for every edge in the path.

Now what if we were to no longer have just a tree, but a graph?  Well,
if cycles are allowed (I think they are by the BIND spec), then there
could be an infinite number of paths.  Even if we only allowed a dag,
there may still be multiple paths to a node which would be wasteful.
If we were to use a materialized path model (still allowing only a
dag), the first change would be to have a separate table for all the
file contents, live properties, etc.  But you would still have
multiple rows in the paths table, which can still get wasteful with
multiple binds per resource.

An adjacency model would be less wasteful (only 1 row per bind), but I
wonder how it would perform.  Faroult claims better performance with
the adjacency model compared to the materialized path model when using
Oracle's CONNECT BY operator.  I don't believe MySQL has anything like
that?

I assume the nested set model performs poorly when a fair number of
inserts is happening. It looks like there's O(n) operations per insert
(n is total number of resources already in the database).

Does anyone know of any other ways to represent a graph or dag?

In general, what are some ideas people have about attacking this problem?

thanks,
Tim

[1] http://mailman.webdav.org/pipermail/catacomb/2002-September/000071.html