Notes on Python’s dbm.sqlite3 module
Python provides a simple key-value database API in its
dbm namespace, and since version 3.13 this
includes an SQLite backend. As SQLite is neither simple nor merely a key-value database, there are a few quirks to
this dbm.sqlite3 library that I’m documenting in this post.
Schema and data types
You can see the SQL schema of a dbm.sqlite3 database at the top of CPython’s
Lib/dbm/sqlite3.py. In summary:
-
Dict table
- key column (unique, non-null)
- value column (non-null)
The Dict table is created with default settings. In SQLite that means the table has
rowid and does not have
STRICT typing.
The key and value columns are declared with BLOB
affinity, and because the table is
non-STRICT, that means in theory they can contain any SQLite data type (null excluded due to the
NON NULL declaration). However, dbm.sqlite3 queries always coerce keys and values to
BLOBs first; as long as you always edit the database using dbm.sqlite3, all keys and
values will only contain BLOBs. This can lead to surprises, for example due to differences in
floating-point precision:
>>> db = dbm.sqlite3.open("test.sqlite", "c")
>>> num = 2.000000000000002
>>> num
2.000000000000002
>>> db[num] = num
>>> dict(db)
{b'2.0': b'2.0'}
In practice, you should probably serialise keys and values yourself into Python bytes objects (which
round-trip to SQLite BLOBs), because otherwise the database will do it for you in a way that you may
not expect. Handling the serialisation yourself also makes you less likely to forget to deserialise the values you
get back. If you use Python’s
shelve module as a higher-level
abstraction to the dbm database, value (de)serialisation is done automatically for you using
pickle.
Using the database from multiple threads
Python’s sqlite3 module (which
dbm.sqlite3 is built on top of) has a connection-level check_same_thread option that, by
default, prevents the connection from being used from multiple threads. This option is not exposed by
dbm.sqlite3. In other words, dbm.sqlite3 connections can only be used from the thread it
is created on.
If your use case needs check_same_thread disabled, the only solution available at the moment is to
copy the dbm.sqlite3 source code to your project and modify it there.
Depending on the value of the
sqlite3.threadsafety
constant, and especially when check_same_thread is disabled, you may need to regulate access to the
database using a mutex. Refer to the sqlite3.threadsafety documentation to see the level of locking
you need to perform. In practice, if you’re on a mainstream Linux distribution, SQLite is likely compiled with
Serialized threading mode enabled, making it safe to use from multiple threads.
The simplest way to handle the threading issue is to check that sqlite3.threadsafety is
3 (Serialized)—raising an exception otherwise—and hope that none of your users encounter this
limitation. But that’s up to you; feel free to handle it according to your needs.