[Catacomb] FIXED: Bug in collection locking, possibly corrupts database
OK, I fixed this. Quite tricky, spent a good part of last night tracking it
down. It's a two line fix, but requires a longer explanation.
You can easily see the problem if you recreate the steps outlined in the
previous mail, and list the contents of the dasl_lock table after each cadaver
Let's say you add a resource /repos/folder/file to a locked collection
/repos/folder (supplying appropriate locktokens of course). RFC 2518 says that
the new resource should become part of the lock held on the collection (i.e.
inherit the lock of the parent). Catacomb implements this by introducing two
types of locks, direct and indirect.
So, in this case, /repos/folder should have a direct lock, while adding
/repos/folder/file should cause lock inheritance and create an indirect lock
pointing to the direct lock for /repos/folder. But if you list the dasl_lock
table, you will notice that /repos/folder/file actually has TWO indirect locks
on it, and if you look even closer you will find that /repos/folder also has two
locks, one direct and one indirect.
If you now try removing the lock on /repos/folder, one stale indirect lock on
/repos/folder/file is left behind, and it points to a now non-existent lock for
/repos/folder, which causes a 500 ISE when accessing this resource later.
The real culprit is not in the lock inheritance code, but in the primary lock
code which mistakenly creates both a direct and an indirect lock for
/repos/folder. This tricks the lock inheritance code into creating double locks
when new child resources are added later.
The cause of all this is a dubious piece of code in repos.c:dav_repos_walk()
which adds a trailing slash to a resource if it is of collection type. This
later trips up the collection locking code, when the collection is locked and
then traversed to add indirect locks to any children. The function
util_lock.c:dav_lock_walker() is called for every item during this traversal,
and it uses hooks->is_same_resource() to make sure it does not add an indirect
lock to the target collection which already has a direct lock. However, the
passed in collection URI has a trailing slash, while the collection URI in the
database doesn't, so they are treated as different resources, causing a second
indirect lock to be blindly added to the collection.
As far as I can see, resource URIs are stored in the database without trailing
slashes, and none of the SQL queries require a trailing slash for collections.
In fact, dbms.c:dav_repos_no_trail() is called in several places to ensure the
URI does NOT have a trailing slash before constructing the SQL queries.
Therefore, adding a trailing slash to collections seems completely unnecessary
and only causes problems.
An alternative fix would be modifying repos.c:dav_repos_is_same_resource() to
ignore trailing slashes when comparing URIs.
P.S. There is one more bug (not caused by this patch) which creates a stale null
lock (in table dasl_locknull) when a collection is created inside a locked
collection. It doesn't seem to cause functionality problems for me, but it might
interfere if you actually use null locks. I'll look into this later, I need some
sleep now :)
Makefiles not war
----- Original Message -----
From: "Ivan Todoroski" <firstname.lastname@example.org>
Sent: Thursday, November 13, 2003 5:00 PM
Subject: [Catacomb] Bug in collection locking, possibly corrupts database
> Test done using latest CVS, on a fresh database:
> $ cadaver localhost/repos
> dav:/repos/> mkcol test
> Creating `test': succeeded.
> dav:/repos/> lock test
> Locking collection `test': succeeded.
> dav:/repos/> mkcol test/asd
> Creating `test/asd': succeeded.
> dav:/repos/> unlock test
> Unlocking `test': failed:
> 500 Internal Server Error [there is no message in error_log at this point]
> dav:/repos/> ls
> Listing collection `/repos/': succeeded.
> Coll: test 0 Nov 13 16:41
> dav:/repos/> cd test
> dav:/repos/test/> ls
> Listing collection `/repos/test/': failed:
> XML parse error at line 1: no element found
> The last operation causes the following messages in error_log:
> Provider encountered an error while streaming a multistatus PROPFIND response.
> [500, #0]
> The lock database was found to be corrupt. An indirect lock's direct lock
> not be found. [500, #402]
> Makefiles not war
> Catacomb mailing list
Description: Binary data