Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. The optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed flush at close is where recent
data is lost. The handle is fully closed regardless of stat.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction:
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. Callers that need a stable
handle should use the natural-key operations (db_get_by_key and
friends). Requires a read-write open db; a read-only open is
rejected with SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
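The renumbering described above can be pictured with a small, self-contained sketch. It is not the library's compaction code: the `alive` flags and the `new_id` map are hypothetical stand-ins, used only to show why a row_id held across a compaction stops being valid.

```fortran
program compact_renumber_demo
  implicit none
  ! Hypothetical table of six records; .false. marks a tombstoned row.
  logical, parameter :: alive(6) = [.true., .false., .true., .true., .false., .true.]
  integer :: new_id(6)      ! new_id(old) = id after compaction, 0 if dropped
  integer :: old, live_count

  live_count = 0
  new_id = 0
  do old = 1, size(alive)
    if (.not. alive(old)) cycle          ! tombstoned rows are dropped
    live_count = live_count + 1
    new_id(old) = live_count             ! survivors become 1..live_count
  end do

  print '(a,6(1x,i0))', 'old -> new:', new_id
  print '(a,i0)', 'live_count = ', live_count
end program compact_renumber_demo
```

In this sketch old row 3 becomes row 2 and old row 6 becomes row 4, so a caller still holding 6 would point past the end of the compacted table.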
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
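A minimal sketch of the skip-dead-slots rule follows. `demo_index_t` is a hypothetical stand-in that carries only the two fields mentioned above (ncols and columns); the real index_t and the library's own liveness test may look different.

```fortran
program skip_dead_slots
  implicit none
  ! Hypothetical stand-in for index_t, reduced to the fields the text names.
  type :: demo_index_t
    integer :: ncols = 0
    character(len=16), allocatable :: columns(:)
  end type demo_index_t

  type(demo_index_t) :: indices(3)
  integer :: slot

  indices(1) = demo_index_t(1, [character(len=16) :: 'id'])
  ! Slot 2 keeps its defaults: ncols = 0, columns unallocated (a dropped index).
  indices(3) = demo_index_t(2, [character(len=16) :: 'last', 'first'])

  do slot = 1, size(indices)
    if (indices(slot)%ncols == 0) cycle   ! dead slot: never touch %columns
    print '(a,i0,a,a)', 'slot ', slot, ' leads with ', trim(indices(slot)%columns(1))
  end do
end program skip_dead_slots
```

The loop keeps the original slot numbers (1 and 3) rather than repacking them, which mirrors the stable-slot behaviour the text relies on.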
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated before anything is written: rows with a NULL
index member are skipped for that index, NaN values are rejected, and
uniqueness is checked both against the existing index and within the
batch. A SQR_DUP / SQR_INVALID violation therefore rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
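The validate-everything-then-write behaviour can be sketched on a plain integer array. This is an illustration under stated assumptions, not the bulk-load path itself: `keys` stands in for a unique index, `OK` and `DUP` are hypothetical status codes, and the ids handed out are simply positions in `keys`.

```fortran
program batch_validate_demo
  implicit none
  integer, parameter :: OK = 0, DUP = 1     ! hypothetical status codes
  integer, allocatable :: keys(:)           ! stands in for a unique index
  integer :: row_ids(3), stat

  keys = [10, 20]

  call insert_batch([30, 40, 30], row_ids, stat)   ! duplicate inside the batch
  print '(a,i0,a,3(1x,i0),a,i0)', 'stat=', stat, ' row_ids=', row_ids, ' stored=', size(keys)

  call insert_batch([30, 40, 50], row_ids, stat)   ! clean batch
  print '(a,i0,a,3(1x,i0),a,i0)', 'stat=', stat, ' row_ids=', row_ids, ' stored=', size(keys)

contains

  subroutine insert_batch(batch, ids, stat)
    integer, intent(in)  :: batch(:)
    integer, intent(out) :: ids(:), stat
    integer :: k

    ids = 0
    ! Pass 1: validate every row; any violation leaves keys untouched.
    do k = 1, size(batch)
      if (any(keys == batch(k)) .or. any(batch(:k-1) == batch(k))) then
        stat = DUP
        return
      end if
    end do
    ! Pass 2: only now write anything.
    do k = 1, size(batch)
      keys = [keys, batch(k)]
      ids(k) = size(keys)
    end do
    stat = OK
  end subroutine insert_batch

end program batch_validate_demo
```

The first call fails on its third row, yet nothing from rows one and two is stored, which is the all-or-nothing property the text describes.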
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
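The difference between bit-for-bit equality and tolerance matching can be checked in isolation. The sketch below uses only standard intrinsics and does not call the library; `eps` is an arbitrary tolerance chosen for the illustration.

```fortran
program real64_exact_match
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  real(real64) :: stored, recomputed, eps
  logical :: same_bits, in_band

  stored     = 0.3_real64
  recomputed = 0.1_real64 + 0.2_real64

  ! Bit-for-bit comparison: reinterpret both values as 64-bit integers.
  same_bits = transfer(stored, 0_int64) == transfer(recomputed, 0_int64)

  ! Tolerance matching written as a band [stored-eps, stored+eps],
  ! which has the shape of a range query rather than an equality lookup.
  eps = 1.0e-12_real64
  in_band = recomputed >= stored - eps .and. recomputed <= stored + eps

  print '(a,l1)', 'identical bit patterns: ', same_bits
  print '(a,l1)', 'inside tolerance band:  ', in_band
end program real64_exact_match
```

With IEEE 754 binary64 arithmetic the two bit patterns differ while the band test succeeds, so a caller who wants the second behaviour needs a band query with explicit bounds.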
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
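The NULL-bit helpers above amount to setting, clearing and testing one bit per column. A minimal sketch with standard bit intrinsics follows; the one-byte bitmap, the bit positions and the `members` list are assumptions made for the illustration, since the real row layout is not given here.

```fortran
program null_bitmap_demo
  use, intrinsic :: iso_fortran_env, only: int8
  implicit none
  integer, parameter :: null_bit = 2          ! hypothetical bit for one column
  integer, parameter :: members(2) = [0, 2]   ! hypothetical index member bits
  integer(int8) :: bitmap

  bitmap = 0_int8
  bitmap = ibset(bitmap, null_bit)            ! mark the column NULL
  print '(a,l1)', 'NULL after set:    ', btest(bitmap, null_bit)
  ! A row with any NULL index member is left out of that index.
  print '(a,l1)', 'indexable:         ', .not. any(btest(bitmap, members))

  bitmap = ibclr(bitmap, null_bit)            ! mark it as carrying a value
  print '(a,l1)', 'NULL after clear:  ', btest(bitmap, null_bit)
  print '(a,l1)', 'indexable:         ', .not. any(btest(bitmap, members))
end program null_bitmap_demo
```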
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
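The store and read rules for a DT_CHAR slot can be sketched as a pair of pure functions. `pack_char` and `unpack_char` are hypothetical names, not the library's helpers, and `csize = 8` is an arbitrary width; the sketch only shows NUL padding, truncation, and reading up to the first NUL.

```fortran
program char_slot_demo
  implicit none
  integer, parameter :: csize = 8          ! hypothetical column width
  character(len=csize) :: cell

  cell = pack_char('abc', csize)
  print '(a,i0)', 'length read back: ', len(unpack_char(cell))
  ! A lookup key padded the same way compares equal to the stored slot.
  print '(a,l1)', 'padded key matches: ', pack_char('abc', csize) == cell

  cell = pack_char('abcdefghij', csize)    ! longer than the column: truncated
  print '(a)', unpack_char(cell)

contains

  ! Store: copy up to width characters, fill the remainder with NUL.
  pure function pack_char(val, width) result(slot)
    character(len=*), intent(in) :: val
    integer, intent(in) :: width
    character(len=width) :: slot
    integer :: n
    n = min(len(val), width)
    slot = repeat(achar(0), width)
    slot(1:n) = val(1:n)
  end function pack_char

  ! Read: return everything before the first NUL, or the whole slot.
  pure function unpack_char(slot) result(val)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: val
    integer :: k
    k = index(slot, achar(0))
    if (k == 0) then
      val = slot
    else
      val = slot(1:k-1)
    end if
  end function unpack_char

end program char_slot_demo
```

Note that this sketch stores trailing blanks of `val` as blanks; whether the library trims them first is not stated here.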
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
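Why undo records are replayed in reverse can be seen on an in-memory buffer. The sketch below is a simplified model, not the journal format: `undo_rec_t`, `log_region` and the 12-byte `file_bytes` string are hypothetical, and there is no header, checksum or fsync.

```fortran
program undo_replay_demo
  implicit none
  ! Hypothetical undo record: where the bytes lived and what they were.
  type :: undo_rec_t
    integer :: offset
    character(len=:), allocatable :: pre_image
  end type undo_rec_t

  character(len=12) :: file_bytes          ! stands in for a base file
  type(undo_rec_t)  :: undo(2)
  integer :: n, k, last

  file_bytes = 'AAAABBBBCCCC'
  n = 0

  ! Capture each pre-image BEFORE overwriting the region.
  call log_region(5, 4)
  file_bytes(5:8) = 'xxxx'
  call log_region(7, 4)                    ! overlaps the first region
  file_bytes(7:10) = 'yyyy'
  print '(a)', file_bytes

  ! Replay newest-first so overlapping captures unwind correctly.
  do k = n, 1, -1
    last = undo(k)%offset + len(undo(k)%pre_image) - 1
    file_bytes(undo(k)%offset:last) = undo(k)%pre_image
  end do
  print '(a)', file_bytes

contains

  subroutine log_region(offset, length)
    integer, intent(in) :: offset, length
    n = n + 1
    undo(n)%offset = offset
    undo(n)%pre_image = file_bytes(offset:offset+length-1)
  end subroutine log_region

end program undo_replay_demo
```

The second capture records bytes already modified by the first write, so restoring it last would leave stale bytes behind; unwinding newest-first returns the buffer to `AAAABBBBCCCC`.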
.true. if a hot (valid, un-committed) journal is present on disk.
This is a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal yields .false. instead.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| type(index_t) | intent(in) | | | ix | Index slot to test |

Return value: live (not dropped)