What's That Noise?! [Ian Kallen's Weblog]

All | LAMP | Music | Java | Ruby | The Agilist | Musings | Commute | Ball
Main | Next day (Feb 9, 2006) »

20060208 Wednesday February 08, 2006

BerkeleyDB's "Tied Hash" for Java

One of the really wonderful and evil things about Perl is the tie interface. You get a persistent hash without writing a boat load of code. With Sleepycat's BerkeleyDB Java Edition you can do something very similar.

Here's a quick re-cap: I've mentioned fiddling with BerkeleyDB-JE before with a crude "hello world" app. You can use the native code version with Perl with obscene simplicity, too. In years past, I enjoyed excellent performance with older versions of BerkeleyDB that used a class called "DB_File" -- today, the thing to use is the "BerkeleyDB" library off of CPAN (note, you need db4.x+ something for this to work). Here's a sample that writes to a BDB:

#!/usr/bin/perl

use BerkeleyDB;
use Time::HiRes qw(gettimeofday);
use strict;

my $filename = '/var/tmp/bdbtest';
my %hash = ();
tie(%hash, 'BerkeleyDB::Hash', { -Filename => $filename, -Flags => DB_CREATE });
$hash{'539'} = "d\t" . join('',@{[gettimeofday]}) . "\tu\thttp://www.sifry.com/alerts";
$hash{'540'} = "d\t" . join('',@{[gettimeofday]}) . "\tu\thttp://epeus.blogspot.com";
$hash{'541'} = "d\t" . join('',@{[gettimeofday]}) . "\tu\thttp://http://joi.ito.com";
untie(%hash);
Yes, I'm intentionally using plain old strings, not Storable, FreezeThaw or any of that stuff.
To prove that our hash was really persistent, we might do this:
#!/usr/bin/perl

use BerkeleyDB;
use strict;

my $filename = '/var/tmp/bdbtest';
my %hash = ();
tie(%hash, 'BerkeleyDB::Hash', { -Filename => $filename, -Flags => DB_RDONLY });
for my $bid (keys %hash) {
    my %blog = split(/\t/,$hash{$bid});
    print "$bid:\n";
    while(my($k,$v) = each(%blog)) {
        print "\t$k => $v\n";
    }
    
}
untie(%hash); 
Which would render output like this:
541:
        u => http://http://joi.ito.com
        d => 1139388034903283
539:
        u => http://www.sifry.com/alerts
        d => 1139388034902888
540:
        u => http://epeus.blogspot.com
        d => 1139388034903227 

Java has no tie operator (that's probably a good thing). But Sleepycat has incorporated a Collections framework that's pretty cool and gets you pretty close to tied hash functionality. Note however that it's not entirely compatible with the interfaces in the Java Collections Framework but if you know those APIs, you'll immediately know the Sleepycat APIs.

com.sleepycat.collections.StoredMap implements java.util.Map with the folloing cavaets:

  1. It doesn't know how big it is, so don't call the size() method unless you want to see a UnsupportedOperationException
  2. You can't just abandon java.util.Iterators that have been working on a StoredMap, you have to use com.sleepycat.collections.StoredIterator's .close(Iterator) method to tidy up.
But that's no big deal.

So what does the code look like? Well, let's say you wanted to store a bunch of these vanilla beans in the database:

public final class ImmutableBlog implements Serializable {

    private static final long serialVersionUID = -7882532723565612191L;
    private long lastmodified;
    private String url;
    private int id;
    
    public ImmutableBlog(final int id, final long lastmodified, final String url) {
        this.id = id;
        this.lastmodified = lastmodified;
        this.url = url;
    }
    public int getId() {
        return id;
    }
    public long getLastmodified() {
        return lastmodified;
    }
    public String getUrl() {
        return url;
    }
    
    public boolean equals(Object o) {
        if (!(o instanceof ImmutableBlog))
            return false;
        if (o == this)
            return true;
        ImmutableBlog other = (ImmutableBlog)o;
        return other.getId() == this.getId() &&
            other.getLastmodified() == this.getLastmodified() &&
            other.getUrl().equals(this.getUrl());
    }
    
    public int hashCode() {
        return (int) (id * 51 + url.hashCode() * 17 + lastmodified * 29);
    }
    
    public String toString() {
        StringBuffer sb = new StringBuffer(this.getClass().getName());
        sb.append("[id=")
        .append(id)
        .append(",lastmodified=")
        .append(lastmodified)
        .append(",url=")
        .append(url)
        .append("]");
        return sb.toString();
    }
}
note that it implements java.io.Serializable
This is a class that knows how to persist ImmutableBlogs and provides a method to fetch the Map:
public class StoredBlogMap  {
    
    private StoredMap blogMap;
    public StoredBlogMap() throws Exception {
        init();
    }
    
    protected void init() throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir") +
                File.separator + "StoredBlogMap");
        dir.mkdirs();
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        Environment env = new Environment(dir, envConfig);
        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        Database blogsdb = env.openDatabase(null, "blogsdb", dbConfig);
        Database classdb = env.openDatabase(null, "classes", dbConfig);
        StoredClassCatalog catalog = new StoredClassCatalog(classdb);
        blogMap = new StoredMap(blogsdb,
                new IntegerBinding(), new SerialBinding(catalog, 
                        ImmutableBlog.class), true);
    }
    
    public Map getBlogMap() {
        return blogMap;
    }
}
The majority of the code is just plumbing for setting up the underlying database and typing the keys and values.
Here's a unit test:
public class StoredBlogMapTest extends TestCase {

    private static Map testMap;
    static {
        testMap = new HashMap();
        testMap.put(new Integer(539), 
                new ImmutableBlog(539, System.currentTimeMillis(), 
                        "http://www.sifry.com/alerts"));
        testMap.put(new Integer(540), 
                new ImmutableBlog(540, System.currentTimeMillis(), 
                        "http://epeus.blogspot.com"));
        testMap.put(new Integer(541), 
                new ImmutableBlog(541, System.currentTimeMillis(), 
                        "http://www.arachna.com/roller/page/spidaman"));        
    };
    private StoredBlogMap blogMap;
    
    protected void setUp() throws Exception {
        super.setUp();
        blogMap = new StoredBlogMap();
    }
    
    public void testWriteBlogs() throws Exception {
        Map blogs = blogMap.getBlogMap();
        for (Iterator iter = testMap.entrySet().iterator(); iter.hasNext();) {
            Map.Entry ent = (Map.Entry) iter.next();
            blogs.put((Integer)ent.getKey(), (ImmutableBlog)ent.getValue());
        }
        int i = 0;
        for (Iterator iter = blogMap.getBlogMap().keySet().iterator(); iter.hasNext();) {
            iter.next();
            i++;
        }
        assertEquals(testMap.size(), i);
    }
    
    public void testReadBlogs() throws Exception {
        Map blogs = blogMap.getBlogMap();
        Iterator iter = blogs.entrySet().iterator();
        while (iter.hasNext()) {
            Map.Entry ent = (Map.Entry) iter.next();
            ImmutableBlog test = (ImmutableBlog) testMap.get(ent.getKey());
            ImmutableBlog stored = (ImmutableBlog) ent.getValue();
            assertEquals(test, stored);
        }
        StoredIterator.close(iter);
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(StoredBlogMapTest.class);
    }
}
These assertions all succeed, so assigning to and fetching from a persistent Map works! One of the notable things about the BDB library, it will allocate generous portions of the heap if you let it. The upside is that you get very high performance from the BDB cache. The downside is... using up heap that other things want. This is tunable, in the StoredBlogMap ctor, add this:
// cache size is the number of bytes to allow Sleepycat to nail up
envConfig.setCacheSize(cacheSize);
// ... now setup the Environment

The basic stuff here functions very well, however I haven't run the any production code that uses Sleepycat's Collections yet. My last project with BDB needed to run an asynchronous database entry remover, so I wanted to remove as much "padding" as possible.

( Feb 08 2006, 12:22:21 AM PST ) Permalink