Using Berkeley DB with Sicstus Prolog

Written on August 18, 2019 in Programming, Prolog

One of the nice features of Sicstus Prolog is its support for storing terms externally in a Berkeley DB database. Since we control how the terms are indexed in the database, it is possible to store and retrieve a large number of terms (the limit is 2^32-1) fairly efficiently. This can be useful in a memory-constrained environment. The interface to Berkeley DB is provided by the library “bdb”.

As a matter of interest, the Prolog version of my “iLexicon” words database currently contains 2,453,333 facts (close to 2.5 million), and Sicstus Prolog loads all of this data in 4 seconds flat! Of course, I have 32 GB of RAM on my machine, which may explain the good performance. So far I have not felt the need to use Berkeley DB as the backend, but it might come in handy later on.

In today’s article, I want to explore the BDB library. For more details, you may want to consult the Sicstus Prolog manual.

My Prolog environment: Sicstus Prolog 4.5.1, 64-bit Windows edition.

It is important to note that the current version of Sicstus Prolog only supports Berkeley DB version 6.2.38, so I had to download this version and install it on my Windows machine in order to use the BDB library.

Creating a new database

To create a new database, we have to give it a “name” and also mention the various “functors” and their respective indexing specifications. Additionally, the DB must be opened in “update” mode. In this case, the BDB engine creates a directory with the same name in the current working directory. In my example, instead of storing one fact at a time, I am going to read facts from an existing Prolog file and store them in the database.

Here are the facts (stored in the file “remedies.pro”):

[Figure: Sample Facts about Homeo Remedies]
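For readers following along without the original file, a small stand-in with the same two functors might look like the sketch below. The individual facts and the meaning given to the arguments of group/2 are invented for illustration; they are assumptions, not the original data.

```prolog
% remedies.pro: illustrative stand-in data only (not the original file).

% abbrev(Abbreviation, FullName)
abbrev('ars.',   'arsenicum album').
abbrev('bell.',  'belladonna').
abbrev('nux-v.', 'nux vomica').

% group(GroupName, Abbreviation): the argument meaning is an assumption.
group(plants,   'bell.').
group(plants,   'nux-v.').
group(minerals, 'ars.').
```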

Here is the code to create a new Berkeley DB and populate it with the facts from the given file:

[Figure: Code to Create the DB]
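A minimal sketch of how such a predicate could look, using db_open/4, db_store/3 and db_close/1 from library(bdb). The names create_db/3, load_facts/2 and store_terms/2 are hypothetical, and the argument order (name, file, spec) is an assumption chosen so that the specification is the third argument.

```prolog
:- use_module(library(bdb)).

% create_db(+DBName, +FactsFile, +SpecList)   (hypothetical name)
% Opens (and, if missing, creates) the database in update mode, stores
% every term read from FactsFile, and always closes the database.
create_db(DBName, FactsFile, SpecList) :-
    db_open(DBName, update, SpecList, DBRef),
    call_cleanup(load_facts(FactsFile, DBRef), db_close(DBRef)).

% load_facts(+File, +DBRef): read File term by term, closing the stream
% even if storing fails or raises an error.
load_facts(File, DBRef) :-
    open(File, read, Stream),
    call_cleanup(store_terms(Stream, DBRef), close(Stream)).

% store_terms(+Stream, +DBRef): store terms until end_of_file is read.
store_terms(Stream, DBRef) :-
    read(Stream, Term),
    (   Term == end_of_file
    ->  true
    ;   db_store(DBRef, Term, _TermRef),
        store_terms(Stream, DBRef)
    ).
```

The call_cleanup/2 wrappers are a design choice of this sketch: they make sure the stream and the database are closed even when a term in the file cannot be stored.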

As the code above shows, after the database is created and populated with sample data, it is closed immediately.

The above predicate is executed in Sicstus Prolog IDE thus:

[Figure: Creating the DB]
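To make the specification concrete, here is a hedged sketch that keeps it in one place and passes it to db_open/4, where it is also the third argument. The database name remedies_db and the helpers remedies_spec/1 and init_remedies_db/0 are hypothetical; the spec list itself is reconstructed from the description in this article.

```prolog
:- use_module(library(bdb)).

% The db-spec described in the text, held in a hypothetical helper fact.
% '+' marks an indexed argument, '-' an argument that is not indexed.
remedies_spec([abbrev(+,-), abbrev(-,+), abbrev(-,-), group(+,-)]).

% init_remedies_db: create the database directory if it does not exist
% yet, store nothing, and close it again. Only meant to show where the
% spec goes.
init_remedies_db :-
    remedies_spec(SpecList),
    db_open(remedies_db, update, SpecList, DBRef),
    db_close(DBRef).
```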

If you look at the third argument, you can see the indexing specification of the various functors. A “+” denotes that the corresponding argument is to be indexed, and a “-” that it is not. In the case of the “abbrev” functor, we have separately indexed both the arguments and there is also an un-indexed version, “abbrev(-,-)”. The latter is required if we decide to retrieve all matches of “abbrev” without specifying the arguments (i.e., non-ground, e.g. abbrev(X, Y) ).

Updating the database

When we update an existing database, for example, by adding a new fact, the database must be opened in ‘update’ mode and, more importantly, we have to again supply the same predicate specification we gave at the time of creating it. The latter looks like an avoidable repetition, but it is needed.

Here is the code to update:

[Figure: Code to Update]
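A minimal sketch of such an update predicate, again built on db_open/4, db_store/3 and db_close/1. The name add_fact/3 and the database name used in the comment are hypothetical.

```prolog
:- use_module(library(bdb)).

% add_fact(+DBName, +SpecList, +Fact)   (hypothetical name)
% Opens the existing database in update mode with the same spec that was
% used to create it, stores one fact, and closes the database again.
add_fact(DBName, SpecList, Fact) :-
    db_open(DBName, update, SpecList, DBRef),
    call_cleanup(db_store(DBRef, Fact, _TermRef), db_close(DBRef)).

% Possible use, assuming a database called remedies_db:
% | ?- add_fact(remedies_db,
%               [abbrev(+,-), abbrev(-,+), abbrev(-,-), group(+,-)],
%               abbrev('acon.', 'aconitum napellus')).
```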

It is executed as follows:

[Figure: Executing Update in the IDE]

This adds a new fact abbrev('acon.', 'aconitum napellus') into the DB. As in the earlier case, this operation closes the DB after update. This is not optimal if we want to add many facts to the DB at the same time, but it is easy to extend the code to handle that case.
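One way to extend it, sketched under the same assumptions, is to accept a list of facts so that the database is opened and closed only once. The names add_facts/3 and store_all/2 are hypothetical.

```prolog
:- use_module(library(bdb)).

% add_facts(+DBName, +SpecList, +Facts)   (hypothetical name)
% Stores every fact in the list Facts during a single open/close cycle.
add_facts(DBName, SpecList, Facts) :-
    db_open(DBName, update, SpecList, DBRef),
    call_cleanup(store_all(Facts, DBRef), db_close(DBRef)).

% store_all(+Facts, +DBRef): store the facts one after another.
store_all([], _DBRef).
store_all([Fact|Facts], DBRef) :-
    db_store(DBRef, Fact, _TermRef),
    store_all(Facts, DBRef).
```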

Opening in ‘read’ mode

If we do not need to add or delete entries from the DB, we can safely open it in ‘read’ mode. In practice, after opening the DB, we might perform several operations before closing it at the end. In such a situation, it is a good idea to assert the DB handle in the current working memory so that it is available for subsequent operations. Here is the code to illustrate this:

[Figure: Opening in ‘read’ Mode]
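A hedged sketch of the idea: open the database with db_open/4 in read mode, leave the specification unbound, and remember the handle in a dynamic fact. The name open_db_read/1 and the arity of dbinfo/2 (name plus handle) are assumptions.

```prolog
:- use_module(library(bdb)).
:- dynamic dbinfo/2.          % dbinfo(DBName, DBRef)

% open_db_read(+DBName)   (hypothetical name)
% Opens DBName read-only and records its handle. Fails if this sketch
% has already recorded an open handle for the same name.
open_db_read(DBName) :-
    \+ dbinfo(DBName, _),
    db_open(DBName, read, _SpecList, DBRef),
    assertz(dbinfo(DBName, DBRef)).
```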

The above predicate uses the functor ‘dbinfo’ to remember the current DB name and its handle. You can also see that I am not passing the original indexing specification here; it is not needed in ‘read’ mode.

Fetching one or more facts

Once the DB is open for reading, we can retrieve matching terms based on unification. Here is a simple predicate for doing that:

[Figure: Retrieving Facts]
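Such a predicate can be little more than a wrapper around db_fetch/3. In this sketch the name fetch/2 is hypothetical, and the handle is assumed to have been recorded earlier as a dbinfo/2 fact.

```prolog
:- use_module(library(bdb)).
:- dynamic dbinfo/2.          % dbinfo(DBName, DBRef), recorded on open

% fetch(+DBName, ?Term)   (hypothetical name)
% Unifies Term with a matching stored term; backtracking yields the
% remaining matches.
fetch(DBName, Term) :-
    dbinfo(DBName, DBRef),
    db_fetch(DBRef, Term, _TermRef).

% Possible use, assuming a database called remedies_db is open:
% | ?- fetch(remedies_db, abbrev(X, Y)).
```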

We can use this predicate in our remedies database to look for all abbreviations:

[Figure: Fetching Terms]

Notice how the predicate can backtrack to return multiple bindings for X and Y. This is made possible by the indexing specification “abbrev(-,-)”. Let us see what happens when we try this with the “group” functor:

[Figure: Index Specification Mismatch]

If you recall, the index specification in this case is “group(+,-)”, which says that the first argument must be bound when we fetch, because it is used for indexing. Since we supplied a variable as the first argument, the fetch operation could not succeed.

The following, however, works:

[Figure: Satisfying Index Specification]
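The matching rule can be modelled in a few lines of plain Prolog. This is only a model of the rule as described in this article, not code from library(bdb), and the names fetchable/2 and plus_args_bound/3 are made up for the illustration.

```prolog
:- use_module(library(lists)).   % member/2

% fetchable(+Term, +SpecList)
% Succeeds if some spec has the same functor as Term and every argument
% marked '+' in that spec is bound (nonvar) in Term.
fetchable(Term, SpecList) :-
    functor(Term, Name, Arity),
    functor(Spec, Name, Arity),
    member(Spec, SpecList),
    plus_args_bound(Arity, Spec, Term),
    !.

% plus_args_bound(+N, +Spec, +Term): check argument positions N down to 1.
plus_args_bound(0, _Spec, _Term) :- !.
plus_args_bound(N, Spec, Term) :-
    arg(N, Spec, Mode),
    arg(N, Term, Arg),
    (   Mode == (+) -> nonvar(Arg) ; true ),
    N1 is N - 1,
    plus_args_bound(N1, Spec, Term).

% With S = [abbrev(+,-), abbrev(-,+), abbrev(-,-), group(+,-)]:
%   fetchable(abbrev(_, _), S)        succeeds (via abbrev(-,-))
%   fetchable(group(_, 'bell.'), S)   fails    (first argument unbound)
%   fetchable(group(plants, _), S)    succeeds
```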

Iterating over all facts

Instead of fetching specific terms, we can also iterate over all (or selected) items in the DB.

[Figure: Iterating Over the DB]
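One possible shape for such a traversal uses the iterator predicates of library(bdb): db_make_iterator/2, db_next_iterator/3 and db_iterator_done/1. This is a sketch rather than the original code. It assumes the handle was recorded as a dbinfo/2 fact, that the code is loaded into the user module, and that iterators can be used on a database opened in read mode. The helpers apply_to_all/2 and print_item/1 are hypothetical.

```prolog
:- use_module(library(bdb)).
:- dynamic dbinfo/2.          % dbinfo(DBName, DBRef), recorded on open

% traverse_db(+DBName, +Action)
% Calls Action with each stored term as its single argument and releases
% the iterator afterwards.
traverse_db(DBName, Action) :-
    dbinfo(DBName, DBRef),
    db_make_iterator(DBRef, Iterator),
    call_cleanup(apply_to_all(Iterator, Action),
                 db_iterator_done(Iterator)).

% apply_to_all(+Iterator, +Action): stop when the iterator is exhausted.
% A failing Action is ignored so that the traversal continues.
apply_to_all(Iterator, Action) :-
    (   db_next_iterator(Iterator, Term, _TermRef)
    ->  (   call(Action, Term) -> true ; true ),
        apply_to_all(Iterator, Action)
    ;   true
    ).

% print_item(+Term): a sample action that writes one term per line.
print_item(Term) :-
    writeq(Term),
    nl.

% Possible use, assuming a database called remedies_db is open:
% | ?- traverse_db(remedies_db, print_item).
```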

The ‘traverse_db’ predicate takes the DB name and another ‘Action’ predicate (which must take a single argument) and applies the ‘Action’ on each item in the DB. The traversal order is not guaranteed.

Here is a use case:

[Figure: Example: Iterating Over the DB]

Closing the DB

Once all database operations are complete, it is necessary to close the DB. Here is a predicate that does that:

[Figure: Closing the DB]
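A minimal sketch of such a predicate, assuming the handle was recorded as a dbinfo/2 fact when the database was opened. The name close_db/1 is hypothetical.

```prolog
:- use_module(library(bdb)).
:- dynamic dbinfo/2.          % dbinfo(DBName, DBRef), recorded on open

% close_db(+DBName)   (hypothetical name)
% Removes the recorded handle for DBName and closes the database.
% Fails if no handle is recorded under that name.
close_db(DBName) :-
    retract(dbinfo(DBName, DBRef)),
    !,
    db_close(DBRef).
```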

It uses the supplied DB name to look up the corresponding handle and then uses that handle to close the DB. In the process, it removes information about the current DB from the working memory.

Exporting and Importing

The contents of a Berkeley DB can be exported to a text file. The predicate “db_export” does this. The exported file contains extra meta information about the database and is in Prolog format. This file can be “imported”, if needed, using the predicate “db_import”.
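As a rough sketch, both predicates take the database name (not a handle) and a file name. The wrapper names backup_db/2 and restore_db/2 and the file name in the comment are hypothetical.

```prolog
:- use_module(library(bdb)).

% backup_db(+DBName, +File): export the database DBName to a text file.
backup_db(DBName, File) :-
    db_export(DBName, File).

% restore_db(+DBName, +File): import a previously exported text file
% into the database DBName.
restore_db(DBName, File) :-
    db_import(DBName, File).

% Possible use, assuming a database called remedies_db:
% | ?- backup_db(remedies_db, 'remedies_export.txt').
```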

One aspect of the database I have ignored is the “Environment”. This is optional and is only needed if multiple processes wish to share access to the same database. I urge you to go through the manual if you are interested in learning more about this.

As you will have seen, using the BDB library to access an external Berkeley DB is quite intuitive in Sicstus Prolog. As hinted earlier, this feature will come in handy when working with a large fact base in a memory-limited environment.

You can download my Prolog file here. The sample data is in this file.

Have a nice weekend!

 
