One of the nice features of Sicstus Prolog is the support for storing Terms externally in a Berkeley DB database. Since we can control how the terms are indexed in the database, it is possible to store and retrieve a large amount of Terms (the limit is 2^32-1) fairly efficiently. This can be useful when we work in a memory-constrained environment. The interface to Berkeley DB is provided by the library “bdb”.
As a matter of interest, the Prolog version of my “iLexicon” words database currently contains 2,453,333 facts (close to 2.5 million), and Sicstus Prolog loads all of this data in about 4 seconds. Of course, my machine has 32 GB of RAM, which helps explain the good performance. So far I have not felt the need to use Berkeley DB as the backend, but it might come in handy later on.
In today’s article, I want to explore the BDB library. For more details, you may want to consult the Sicstus Prolog manual.
My Prolog environment: Sicstus Prolog 4.5.1, 64 bit Windows edition.
It is important to note that the current version of Sicstus Prolog only supports Berkeley DB version 6.2.38. So I had to download this version and install it on my Windows machine in order to use the BDB library.
Creating a new database
To create a new database, we have to give it a “name” and also mention the various “functors” and their respective indexing specifications. Additionally, the DB must be opened in “update” mode. In this case, the BDB engine creates a directory with the same name in the current working directory. In my example, instead of storing one fact at a time, I am going to read facts from an existing Prolog file and store them in the database.
Here are the facts (stored in the file “remedies.pro”):
Here is the code to create a new Berkeley DB and populate it with the facts from the given file:
As the code above shows, after the database is created and populated with sample data, it is closed immediately.
The above predicate is executed in Sicstus Prolog IDE thus:
If you look at the third argument, you can see the indexing specification of the various functors. A “+” denotes that the corresponding argument is to be indexed. In the case of the “abbrev” functor, we have indexed each of the two arguments separately, and there is also an un-indexed version. The latter is required if we want to retrieve all matches of “abbrev” without binding either argument (i.e., with a non-ground query such as abbrev(X, Y)).
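As a rough sketch of what such a create-and-populate predicate can look like (not necessarily identical to my original code), the version below opens the database in update mode, reads terms from a file one at a time, stores each with db_store/3, and closes everything at the end. The names create_db/3 and store_terms/2 are hypothetical, and the spec list in the sample query contains only the specifications mentioned in this article.

```prolog
:- use_module(library(bdb)).

% create_db(+DBName, +FactsFile, +SpecList)
% Hypothetical helper: opens (creating if necessary) DBName in update
% mode and stores every term read from FactsFile.
create_db(DBName, FactsFile, SpecList) :-
    db_open(DBName, update, SpecList, DBRef),
    open(FactsFile, read, Stream),
    % Close the stream and the database even if storing fails or throws.
    call_cleanup(store_terms(Stream, DBRef),
                 ( close(Stream), db_close(DBRef) )).

% Read terms until end_of_file, storing each one in the database.
store_terms(Stream, DBRef) :-
    read(Stream, Term),
    (   Term == end_of_file
    ->  true
    ;   db_store(DBRef, Term, _TermRef),
        store_terms(Stream, DBRef)
    ).

% Sample query (spec list assumed from the specs discussed in the text):
% | ?- create_db(remedies, 'remedies.pro',
%                [abbrev(+,-), abbrev(-,+), abbrev(-,-), group(+,-)]).
```

Using call_cleanup/2 here is a design choice of the sketch: it ensures the database is closed even when one of the terms cannot be stored.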
Updating the database
When we update an existing database, for example by adding a new fact, the database must be opened in ‘update’ mode and, more importantly, we have to supply the same predicate specification that we gave when the database was created. This looks like an avoidable repetition, but it is needed.
Here is the code to update:
It is executed as follows:
This adds a new fact abbrev(‘acon.’, ‘aconitum napellus’) into the DB. As in the earlier case, this operation closes the DB after update. This is not optimal if we want to add many facts to the DB at the same time, but it is easy to extend the code to handle that case.
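One possible way to handle the many-facts case is to open the database once, store a whole list of facts, and then close it. The sketch below assumes hypothetical predicates add_facts/3 and store_all/2; the second fact in the sample query is made up purely for illustration.

```prolog
:- use_module(library(bdb)).

% add_facts(+DBName, +SpecList, +Facts)
% Hypothetical helper: stores all terms in the list Facts with a single
% open/close cycle. SpecList must be the spec used at creation time.
add_facts(DBName, SpecList, Facts) :-
    db_open(DBName, update, SpecList, DBRef),
    call_cleanup(store_all(Facts, DBRef), db_close(DBRef)).

% Store each fact in turn.
store_all([], _DBRef).
store_all([Fact|Facts], DBRef) :-
    db_store(DBRef, Fact, _TermRef),
    store_all(Facts, DBRef).

% Sample query (the 'bell.' entry is an illustrative, made-up fact):
% | ?- add_facts(remedies,
%                [abbrev(+,-), abbrev(-,+), abbrev(-,-), group(+,-)],
%                [abbrev('acon.', 'aconitum napellus'),
%                 abbrev('bell.', 'belladonna')]).
```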
Opening in ‘read’ mode
If we do not need to add or delete entries from the DB, we can safely open it in ‘read’ mode. In practice, after opening the DB, we might perform several operations before closing it at the end. In such a situation, it is a good idea to assert the DB handle in the current working memory so that it is available for subsequent operations. Here is the code to illustrate this:
The above predicate uses the functor ‘dbinfo’ to remember the current DB name and its handle. You can also see that I am not passing the original indexing specification here; it is not needed in ‘read’ mode.
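A minimal sketch of this idea follows. The predicate name open_db_read/1 is hypothetical; the spec list argument of db_open/4 is simply left unbound, on the assumption that it does not have to be supplied in ‘read’ mode.

```prolog
:- use_module(library(bdb)).

% dbinfo(DBName, DBRef) remembers the handle of each open database.
:- dynamic dbinfo/2.

% open_db_read(+DBName)
% Hypothetical helper: opens DBName read-only and records its handle.
open_db_read(DBName) :-
    db_open(DBName, read, _SpecList, DBRef),
    assertz(dbinfo(DBName, DBRef)).

% Sample query:
% | ?- open_db_read(remedies).
```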
Fetching one or more facts
Once the DB is open for reading, we can retrieve matching terms based on unification. Here is a simple predicate for doing that:
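A fetch predicate along these lines can be sketched as a thin wrapper around db_fetch/3. The name fetch/2 is hypothetical, and the sketch assumes that a dbinfo/2 fact holding the DB handle was asserted when the database was opened.

```prolog
:- use_module(library(bdb)).

% dbinfo(DBName, DBRef) is assumed to have been asserted on opening.
:- dynamic dbinfo/2.

% fetch(+DBName, ?Term)
% Hypothetical helper: unifies Term with a matching stored term.
% On backtracking, db_fetch/3 yields the remaining matches.
fetch(DBName, Term) :-
    dbinfo(DBName, DBRef),
    db_fetch(DBRef, Term, _TermRef).

% Sample query, enumerating all abbreviations on backtracking:
% | ?- fetch(remedies, abbrev(X, Y)).
```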
We can use this predicate in our remedies database to look for all abbreviations:
Notice how the predicate can backtrack to return multiple bindings for X and Y. This is made possible by the indexing specification “abbrev(-,-)”. Let us see what happens when we try this with the “group” functor:
If you recall, the index specification in this case is “group(+,-)”, which says that the first argument must be bound at call time because it is used for indexing. Since we supplied a variable as the first argument, the query does not match any indexing specification and the fetch operation cannot succeed.
The following, however, works:
Iterating over all facts
Instead of fetching specific terms, we can also iterate over all (or selected) items in the DB.
The ‘traverse_db’ predicate takes the DB name and another ‘Action’ predicate (which must take a single argument) and applies the ‘Action’ on each item in the DB. The traversal order is not guaranteed.
Here is a use case:
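One way to write such a traversal is with the iterator predicates of the BDB library (db_make_iterator/2, db_iterator_next/3 and db_iterator_done/1). The sketch below is an assumption about how ‘traverse_db’ could be implemented, not necessarily how my own version works; apply_to_all/2 is a hypothetical helper, and the dbinfo/2 fact is assumed to exist from opening the DB.

```prolog
:- use_module(library(bdb)).

% dbinfo(DBName, DBRef) is assumed to have been asserted on opening.
:- dynamic dbinfo/2.

% traverse_db(+DBName, +Action)
% Calls Action with each stored term as its single argument.
traverse_db(DBName, Action) :-
    dbinfo(DBName, DBRef),
    db_make_iterator(DBRef, Iterator),
    % Always release the iterator, even if Action fails or throws.
    call_cleanup(apply_to_all(Iterator, Action),
                 db_iterator_done(Iterator)).

% Advance the iterator until db_iterator_next/3 fails (no more terms).
apply_to_all(Iterator, Action) :-
    (   db_iterator_next(Iterator, Term, _TermRef)
    ->  call(Action, Term),
        apply_to_all(Iterator, Action)
    ;   true
    ).

% Sample query, printing every stored term as a clause:
% | ?- traverse_db(remedies, portray_clause).
```

Note that in this sketch the traversal as a whole fails as soon as Action fails for some term; wrap the call in an if-then-else if failures should be ignored.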
Closing the DB
Once all database operations are complete, it is necessary to close the DB. Here is a predicate that does that:
It uses the supplied DB name to look up the corresponding handle and then uses that handle to close the DB. In the process, it removes information about the current DB from the working memory.
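In outline, such a predicate can be as short as the following sketch. The name close_db/1 is hypothetical, and the dbinfo/2 fact is assumed to have been asserted when the DB was opened.

```prolog
:- use_module(library(bdb)).

% dbinfo(DBName, DBRef) is assumed to have been asserted on opening.
:- dynamic dbinfo/2.

% close_db(+DBName)
% Removes the remembered handle for DBName and closes the database.
% Fails if no database is recorded under that name.
close_db(DBName) :-
    retract(dbinfo(DBName, DBRef)),
    !,
    db_close(DBRef).

% Sample query:
% | ?- close_db(remedies).
```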
Exporting and Importing
The contents of a Berkeley DB can be exported to a text file. The predicate “db_export” does this. The exported file contains extra meta information about the database and is in Prolog format. This file can be “imported”, if needed, using the predicate “db_import”.
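As a small illustration, the two predicates can be wrapped as shown below. The wrapper names backup_db/2 and restore_db/2 and the file name in the sample queries are hypothetical; the sketch assumes the two-argument forms db_export/2 and db_import/2, which take the database name followed by the text file.

```prolog
:- use_module(library(bdb)).

% backup_db(+DBName, +ExportFile)
% Hypothetical wrapper: writes the contents of DBName to a text file.
backup_db(DBName, ExportFile) :-
    db_export(DBName, ExportFile).

% restore_db(+DBName, +ImportFile)
% Hypothetical wrapper: loads a previously exported file into DBName.
restore_db(DBName, ImportFile) :-
    db_import(DBName, ImportFile).

% Sample queries (the file name is illustrative):
% | ?- backup_db(remedies, 'remedies_export.pl').
% | ?- restore_db(remedies_copy, 'remedies_export.pl').
```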
One aspect of the database I have ignored is the “Environment”. This is optional and is only needed if multiple processes wish to share access to the same database. I urge you to go through the manual if you are interested in learning more about this.
As you will have observed, using the BDB library to access an external Berkeley DB is quite intuitive in Sicstus Prolog. As hinted earlier, this feature will come in handy when working with a large fact base in a memory-limited environment.
You can download my Prolog file here. The sample data is in this file.
Have a nice weekend!