A (possibly composite) secondary index. The key is the member
column bytes concatenated in declared order; a single-column index
is just arity 1. unique enforces that no two live rows share a
key.
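The key layout described above can be pictured with a small standalone sketch. It does not call the library: the helper names `int_bytes` and `char_bytes` and the two-member index (an int32 column followed by an 8-byte character column) are assumptions made for illustration only.

```fortran
program composite_key_demo
  use iso_fortran_env, only: int8, int32
  implicit none
  integer(int8), allocatable :: key(:)

  ! Hypothetical composite index: int32 member first, then an 8-byte CHAR member.
  ! The key is simply the member bytes laid end to end in declared order.
  key = [ int_bytes(42_int32), char_bytes('abc', 8) ]
  print '(a,i0)', 'key size in bytes: ', size(key)   ! 4 + 8 = 12

contains

  ! Raw bytes of an int32 value, in the machine's native byte order.
  function int_bytes(v) result(b)
    integer(int32), intent(in) :: v
    integer(int8) :: b(4)
    b = transfer(v, b)
  end function int_bytes

  ! An ASCII string NUL-padded (or truncated) to a fixed column width.
  function char_bytes(s, width) result(b)
    character(len=*), intent(in) :: s
    integer, intent(in) :: width
    integer(int8) :: b(width)
    integer :: i
    b = 0_int8
    do i = 1, min(len(s), width)
      b(i) = int(iachar(s(i:i)), int8)
    end do
  end function char_bytes

end program composite_key_demo
```

A single-column index is the same construction with one member, so its key is just that column's bytes.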
One open table: schema, derived layout, open units and index set.
One undo record captured before a transaction overwrites part of a
base file. A REGION record stores the original bytes of an in-place
overwrite (rollback writes them back); an EXTEND record stores only
the original file length (rollback truncates appended bytes away).
Module-private — exposed only as a component of journal_t.
Pre-transaction snapshot of one table's in-memory counters. The undo
journal restores file bytes only; these cached values (high-water row id,
live count, blob append position) are advanced in memory by row mutations
and are not written to disk on every write, so a rollback restores them
from this snapshot. It is the record-side analogue of bt_reload for the
index trees. Module-private; exposed only as a component of journal_t.
Per-database rollback journal. Opaque to callers, carried as a
component of db_t; driven by the txn_* / jrnl_* procedures. The
file <db>/_journal.dat is a reusable sidecar — a hot (valid) journal
exists iff a transaction is in flight or a crash interrupted one.
An open database handle. Obtain with db_open; release with
db_close. A handle is bound to one directory for its lifetime.
A forward (ascending) cursor over the live rows of a table in the key
order of one of its single-column indices. Obtain it from
db_open_cursor (the whole index) or db_find_range (an inclusive
[lo,hi] band), then pull rows with db_cursor_next until it reports
exhaustion — the pull complement to the db_scan callback.
CONTRACT: the cursor rides on the table's already-open index, so there
is nothing to close; but it is invalidated by any mutating call on the
handle (db_insert / db_update / db_delete / db_compact, and the
structural db_create_table / db_drop_table, which can shift table
slots) — re-open it after mutating. This is enforced: the cursor
snapshots db%generation at creation and db_cursor_next returns
SQR_INVALID (rather than reading a stale/own slot) if it has since
changed. The component layout is exposed for transfer-free storage
only; callers should treat it as opaque.
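The generation check in the contract above can be modelled in a few lines. This is a toy sketch, not the library's code: `toy_db_t`, `toy_cursor_t` and the `toy_*` procedures are hypothetical stand-ins, and only the status values `SQR_OK = 0` and `SQR_INVALID = 5` are taken from this page.

```fortran
module toy_cursor_mod
  implicit none
  integer, parameter :: SQR_OK = 0, SQR_INVALID = 5

  type :: toy_db_t
    integer :: generation = 0      ! bumped by every mutating call
  end type toy_db_t

  type :: toy_cursor_t
    integer :: generation = -1     ! snapshot taken when the cursor is opened
    integer :: pos = 0
  end type toy_cursor_t

contains

  subroutine toy_open_cursor(db, cur)
    type(toy_db_t), intent(in) :: db
    type(toy_cursor_t), intent(out) :: cur
    cur%generation = db%generation
    cur%pos = 0
  end subroutine toy_open_cursor

  ! Stand-in for any insert / update / delete / compact / create / drop.
  subroutine toy_mutate(db)
    type(toy_db_t), intent(inout) :: db
    db%generation = db%generation + 1
  end subroutine toy_mutate

  subroutine toy_cursor_next(db, cur, stat)
    type(toy_db_t), intent(in) :: db
    type(toy_cursor_t), intent(inout) :: cur
    integer, intent(out) :: stat
    if (cur%generation /= db%generation) then
      stat = SQR_INVALID           ! the handle was mutated after the cursor was opened
      return
    end if
    cur%pos = cur%pos + 1
    stat = SQR_OK
  end subroutine toy_cursor_next

end module toy_cursor_mod

program cursor_generation_demo
  use toy_cursor_mod
  implicit none
  type(toy_db_t) :: db
  type(toy_cursor_t) :: cur
  integer :: stat

  call toy_open_cursor(db, cur)
  call toy_cursor_next(db, cur, stat)
  print '(a,i0)', 'before mutation: stat = ', stat   ! 0
  call toy_mutate(db)
  call toy_cursor_next(db, cur, stat)
  print '(a,i0)', 'after mutation:  stat = ', stat   ! 5
  call toy_open_cursor(db, cur)                      ! re-open after mutating
  call toy_cursor_next(db, cur, stat)
  print '(a,i0)', 'after re-open:   stat = ', stat   ! 0
end program cursor_generation_demo
```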
Context passed to bt_journal_adapter — the bridge that turns a
b_tree's pre-write hook (see bt_set_journal_hook) into rollback-journal
captures. It names the database whose journal receives the undo records
and the tree's on-disk file relative to the database directory. The
db target must outlive every tree the adapter is installed on (this
holds by construction: the trees are components of db%tables).
Create a secondary index. Accepts either a single column name or a
rank-1 array of member column names (composite key), each with an
optional unique=.
Drop a secondary index. Accepts either a single column name or a rank-1
array of member column names (the same shape that created it).
Open an ascending cursor over the live rows whose indexed value lies
in the inclusive band [lo, hi]. Typed on the column: int32,
real64 (where a tolerance match belongs — lo = x-eps,
hi = x+eps — never fuzzy equality), or DT_CHAR (NUL-padded to the
column width). col_name may be an exact single-column index or the
leading member of a composite index (a leading-prefix range, so no
redundant single-column index is needed). Pull rows with
db_cursor_next. lo > hi yields an empty cursor; NULL-member rows are
excluded.
| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| integer | public | parameter | DT_INT | 1 | 32-bit integer column (4 B) |
| integer | public | parameter | DT_REAL | 2 | 64-bit real column (8 B) |
| integer | public | parameter | DT_CHAR | 3 | Fixed-width character column (1..65536 B). Stored NUL-padded and read back up to the first NUL (see …) |
| integer | public | parameter | DT_TEXT | 4 | Arbitrary-length text; bytes in … |
| integer | public | parameter | SQR_TEXT_DESC | 12 | In-row descriptor size for a … |
| integer | public | parameter | SQR_OK | 0 | Success |
| integer | public | parameter | SQR_NOT_FOUND | 1 | No such table / row / index / key |
| integer | public | parameter | SQR_DUP | 2 | Duplicate table or unique-key violation |
| integer | public | parameter | SQR_ERR | 3 | I/O or filesystem failure |
| integer | public | parameter | SQR_VERSION | 4 | Unsupported on-disk format version |
| integer | public | parameter | SQR_INVALID | 5 | Bad argument or corrupt on-disk metadata |
| integer | public | parameter | SQR_READONLY | 6 | Write attempted on a read-only open |
| integer | public | parameter | SQR_LOCKED | 7 | Database held by another connection |
| integer | public | parameter | SQR_SCHEMA_VERSION | 1 | Current on-disk format version. There is a single format: composite index records (ncols, member names, key_size, unique) with each index stored as a generic on-disk B+-tree. No migration path: a schema whose version differs is rejected with … |
| integer(kind=int8) | public | parameter | ROW_ALIVE | 1_int8 | Live row |
| integer(kind=int8) | public | parameter | ROW_TOMBSTONE | 2_int8 | Deleted row (space reclaimed by …) |
| integer | public | parameter | SQR_NAME_LEN | 32 | Max table/column name length (bytes) |
| character(len=4) | public | parameter | SQR_MAGIC | 'SQRT' | Schema-file magic |
| integer(kind=int32) | public | parameter | SQR_BOM | int(z'01020304', int32) | Byte-order mark written into the catalog and schema headers (just after the magic). An asymmetric native … |
| integer | public | parameter | SQR_MAX_RECORD | 1024*1024 | Sanity cap on a fixed record (status byte + all column bytes). Used both to reject over-large schemas at create time and as a corruption guard when reading a schema back from disk. |
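The SQR_BOM value z'01020304' is asymmetric: reversing its four bytes gives a different integer, which is the property a byte-order mark needs in order to expose a file written with the opposite byte order. The standalone sketch below only demonstrates that property; how the library reacts to a mismatch is not described here and is not modelled.

```fortran
program bom_demo
  use iso_fortran_env, only: int8, int32
  implicit none
  integer(int32), parameter :: SQR_BOM = int(z'01020304', int32)
  integer(int8) :: b(4)
  integer(int32) :: swapped

  ! View the mark as the four bytes a native unformatted write would emit.
  b = transfer(SQR_BOM, b)
  print '(a,4(1x,z2.2))', 'native bytes:', b

  ! Reassemble the bytes in reverse order, as a reader of the opposite
  ! endianness would see them.
  swapped = transfer(b(4:1:-1), swapped)
  print *, 'byte-swapped value equals SQR_BOM: ', swapped == SQR_BOM
end program bom_demo
```

On a little-endian machine the first line lists 04 03 02 01; on a big-endian one, 01 02 03 04. The comparison is false on both, because the swapped value is z'04030201'.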
One column definition. Width and offset are derived at
create-table time by |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed close is where recent data is
lost; the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
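Why row_ids cannot survive a compaction is easiest to see on a toy table. The sketch below is standalone and assumes survivors keep their relative order; the status values mirror ROW_ALIVE and ROW_TOMBSTONE but use default integers for brevity.

```fortran
program compact_demo
  implicit none
  integer, parameter :: ROW_ALIVE = 1, ROW_TOMBSTONE = 2
  ! Five stored rows; rows 2 and 4 have been deleted.
  integer, parameter :: status(5)  = [ROW_ALIVE, ROW_TOMBSTONE, ROW_ALIVE, &
                                      ROW_TOMBSTONE, ROW_ALIVE]
  integer, parameter :: old_ids(5) = [1, 2, 3, 4, 5]
  integer, allocatable :: survivors(:)
  integer :: new_id

  ! Keep only the live rows; their position in the result is the new row_id.
  survivors = pack(old_ids, status == ROW_ALIVE)

  do new_id = 1, size(survivors)
    print '(a,i0,a,i0)', 'old row_id ', survivors(new_id), ' -> new row_id ', new_id
  end do
end program compact_demo
```

Old rows 1, 3 and 5 become rows 1, 2 and 3, so a caller still holding row_id 3 would now be addressing what used to be row 5.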
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
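The validate-first, write-second shape of the batch insert can be sketched on plain integer keys. Everything here is a toy: `toy_insert_many` is a hypothetical stand-in, a unique index is modelled as an integer array, and the way ids are assigned (position in that array) is an assumption rather than the library's rule.

```fortran
module toy_bulk_mod
  implicit none
  integer, parameter :: SQR_OK = 0, SQR_DUP = 2
contains
  subroutine toy_insert_many(existing, batch, row_ids, stat)
    integer, allocatable, intent(inout) :: existing(:)  ! unique keys already stored
    integer, intent(in)  :: batch(:)
    integer, intent(out) :: row_ids(:)                  ! at least size(batch) long
    integer, intent(out) :: stat
    integer :: k, base

    row_ids = 0
    ! Phase 1: validate the whole batch before touching storage.
    do k = 1, size(batch)
      if (any(existing == batch(k)) .or. any(batch(1:k-1) == batch(k))) then
        stat = SQR_DUP          ! reject everything; nothing was written
        return
      end if
    end do

    ! Phase 2: write all rows, then hand back their ids.
    base = size(existing)
    existing = [existing, batch]
    do k = 1, size(batch)
      row_ids(k) = base + k
    end do
    stat = SQR_OK
  end subroutine toy_insert_many
end module toy_bulk_mod

program bulk_demo
  use toy_bulk_mod
  implicit none
  integer, allocatable :: keys(:)
  integer :: ids(3), stat

  keys = [10, 20]
  call toy_insert_many(keys, [30, 40, 30], ids, stat)   ! duplicate inside the batch
  print '(a,i0,a,i0)', 'stat = ', stat, ', stored keys = ', size(keys)   ! 2, 2
  call toy_insert_many(keys, [30, 40], ids, stat)
  print '(a,i0,a,i0)', 'stat = ', stat, ', stored keys = ', size(keys)   ! 0, 4
end program bulk_demo
```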
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
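The difference between exact equality and a tolerance band is shown by the standalone sketch below. It uses no library calls; `eps` is an arbitrary illustrative tolerance, and the band test stands in for what a range query with lo = x-eps, hi = x+eps would select.

```fortran
program real_key_demo
  use iso_fortran_env, only: real64, int64
  implicit none
  real(real64) :: stored, probe, eps, lo, hi
  integer(int64) :: bits

  stored = 0.3_real64
  probe  = 0.1_real64 + 0.2_real64      ! recomputed differently by the caller
  eps    = 1.0e-9_real64                ! illustrative tolerance

  ! A pure binary round trip preserves the value exactly.
  bits = transfer(stored, bits)
  print *, 'binary round trip matches: ', transfer(bits, stored) == stored

  ! Exact equality fails for the recomputed value...
  print *, 'exact equality:            ', probe == stored

  ! ...while an inclusive band around the probe contains the stored value.
  lo = probe - eps
  hi = probe + eps
  print *, 'inside [lo, hi]:           ', stored >= lo .and. stored <= hi
end program real_key_demo
```

With IEEE-754 binary64 arithmetic the three lines print T, F and T respectively.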
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
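The three NULL helpers above correspond to setting, clearing and testing one bit. A minimal sketch with Fortran's bit intrinsics follows; the bitmap width and the bit positions are hypothetical, since the actual row layout is not given here.

```fortran
program null_bitmap_demo
  use iso_fortran_env, only: int32
  implicit none
  integer, parameter :: null_bit_a = 0, null_bit_b = 1   ! hypothetical bit positions
  integer(int32) :: bitmap

  bitmap = 0                                  ! every column carries a value
  bitmap = ibset(bitmap, null_bit_b)          ! mark column b NULL
  print *, 'a NULL? ', btest(bitmap, null_bit_a), '  b NULL? ', btest(bitmap, null_bit_b)

  bitmap = ibclr(bitmap, null_bit_b)          ! un-NULL b without writing a value
  print *, 'b NULL after clear? ', btest(bitmap, null_bit_b)
end program null_bitmap_demo
```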
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
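The NUL-pad on write and stop-at-first-NUL on read behaviour of a character slot can be sketched as follows. `put_char` and `get_char` are hypothetical stand-ins operating on a plain character variable, not the library's helpers.

```fortran
program char_slot_demo
  implicit none
  character(len=8) :: slot                    ! an 8-byte column slot
  character(len=:), allocatable :: back

  call put_char(slot, 'abc')
  back = get_char(slot)
  print '(a,i0,a)', 'len = ', len(back), ', value = [' // back // ']'   ! 3, abc

  call put_char(slot, 'abcdefghij')           ! longer than the slot: truncated
  back = get_char(slot)
  print '(a,i0,a)', 'len = ', len(back), ', value = [' // back // ']'   ! 8, abcdefgh

contains

  ! Fill the slot with NULs, then copy as much of s as fits.
  subroutine put_char(slot, s)
    character(len=*), intent(out) :: slot
    character(len=*), intent(in)  :: s
    integer :: n
    slot = repeat(achar(0), len(slot))
    n = min(len(s), len(slot))
    slot(1:n) = s(1:n)
  end subroutine put_char

  ! Return the bytes before the first NUL (the whole slot if there is none).
  function get_char(slot) result(s)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: s
    integer :: n
    n = index(slot, achar(0))
    if (n == 0) then
      s = slot
    else
      s = slot(1:n-1)
    end if
  end function get_char

end program char_slot_demo
```

One consequence of this scheme is that a stored value cannot itself contain a NUL byte and read back intact.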
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
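The two undo record kinds and their reverse replay can be modelled in memory. This is a toy sketch under stated assumptions: the "file" is a character string, the `toy_journal_t` type and the `log_region` / `log_extend` / `rollback` names are hypothetical, and arming, checksums, fsync and idempotent capture are all left out.

```fortran
module toy_journal_mod
  implicit none
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2
  integer, parameter :: MAX_RECS = 16, MAX_PRE = 64

  type :: undo_rec_t
    integer :: kind = 0
    integer :: offset = 0               ! REGION: 1-based start of the span
    integer :: nbytes = 0               ! REGION: span length; EXTEND: original length
    character(len=MAX_PRE) :: pre = ''  ! REGION: original bytes
  end type undo_rec_t

  type :: toy_journal_t
    integer :: n = 0
    type(undo_rec_t) :: recs(MAX_RECS)
  end type toy_journal_t

contains

  ! Capture the bytes about to be overwritten in place.
  subroutine log_region(j, file, offset, length)
    type(toy_journal_t), intent(inout) :: j
    character(len=*), intent(in) :: file
    integer, intent(in) :: offset, length
    if (j%n >= MAX_RECS .or. length > MAX_PRE) error stop 'toy journal capacity exceeded'
    j%n = j%n + 1
    j%recs(j%n)%kind = REC_REGION
    j%recs(j%n)%offset = offset
    j%recs(j%n)%nbytes = length
    j%recs(j%n)%pre(1:length) = file(offset:offset+length-1)
  end subroutine log_region

  ! Capture only the length of a file that is about to grow.
  subroutine log_extend(j, file)
    type(toy_journal_t), intent(inout) :: j
    character(len=*), intent(in) :: file
    if (j%n >= MAX_RECS) error stop 'toy journal capacity exceeded'
    j%n = j%n + 1
    j%recs(j%n)%kind = REC_EXTEND
    j%recs(j%n)%nbytes = len(file)
  end subroutine log_extend

  ! Undo in reverse order: write pre-images back, truncate appended bytes.
  subroutine rollback(j, file)
    type(toy_journal_t), intent(inout) :: j
    character(len=:), allocatable, intent(inout) :: file
    integer :: k, lo, hi
    do k = j%n, 1, -1
      select case (j%recs(k)%kind)
      case (REC_REGION)
        lo = j%recs(k)%offset
        hi = lo + j%recs(k)%nbytes - 1
        file(lo:hi) = j%recs(k)%pre(1:j%recs(k)%nbytes)
      case (REC_EXTEND)
        file = file(1:j%recs(k)%nbytes)
      end select
    end do
    j%n = 0
  end subroutine rollback

end module toy_journal_mod

program journal_demo
  use toy_journal_mod
  implicit none
  type(toy_journal_t) :: j
  character(len=:), allocatable :: file

  file = 'AAAABBBB'
  call log_region(j, file, 3, 2)      ! capture before overwriting bytes 3..4
  file(3:4) = 'xy'
  call log_extend(j, file)            ! capture the length before appending
  file = file // 'tail'
  print '(a)', 'mid-transaction: ' // file   ! AAxyBBBBtail
  call rollback(j, file)
  print '(a)', 'after rollback:  ' // file   ! AAAABBBB
end program journal_demo
```

Replaying newest-first matters when records overlap: the append is undone before the earlier in-place overwrite, so each pre-image lands on the state it was captured from.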
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
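The hook contract described above (record first, write only if recording succeeded) can be modelled as follows. This is a hedged Python sketch: write_page, the hook signature and the 8-byte page size are hypothetical stand-ins, not the B+-tree module's interface.

```python
PAGE = 8   # tiny page size, for the demo only


def write_page(tree, page_no, new_bytes, hook):
    """Write one page, but only after the hook has recorded how to undo it."""
    offset = page_no * PAGE
    is_new = offset >= len(tree)
    old_bytes = None if is_new else bytes(tree[offset:offset + PAGE])
    stat = hook(offset, old_bytes, is_new)
    if stat != 0:
        return stat                 # abort: nothing reaches the file
    if is_new:
        tree.extend(new_bytes)
    else:
        tree[offset:offset + PAGE] = new_bytes
    return 0


log = []


def recording_hook(offset, old_bytes, is_new):
    # A new page is undone by truncation, an overwrite by its pre-image.
    log.append(("EXTEND",) if is_new else ("REGION", offset, old_bytes))
    return 0


def failing_hook(offset, old_bytes, is_new):
    return 1                        # simulates a journal error


tree = bytearray(b"page0000")
```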
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. The optional
stat reports the first flush failure. Schema counters are persisted
only here, so a failed flush at close can lose recent changes; the
handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
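Why row_ids do not survive a compaction can be seen in a small model. The Python sketch below is illustrative only; the list-of-tuples table and the compact function are hypothetical, and a row's id is taken to be its 1-based position.

```python
def compact(rows):
    """rows[i] = (alive, payload) for row_id i + 1.

    Returns the compacted rows and an old_id -> new_id map for the survivors.
    """
    new_rows, remap = [], {}
    for old_id, (alive, payload) in enumerate(rows, start=1):
        if alive:
            new_rows.append((True, payload))
            remap[old_id] = len(new_rows)   # survivors become 1..live_count
    return new_rows, remap


rows = [(True, "a"), (False, "b"), (True, "c"), (False, "d"), (True, "e")]
new_rows, remap = compact(rows)
```

Here the row formerly known as 5 is now row 3, so a caller that kept the old id would read a different row or none at all.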
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
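The table rewrite for an added column can be pictured with a toy record layout. This Python sketch assumes a hypothetical layout of one status byte, one NULL-bitmap byte, then the column bytes, with status 1 meaning alive and 0 meaning tombstoned; the real layout is derived by the library and is not shown in the source.

```python
import struct


def widen(record, old_ncols, new_col_width):
    """Rewrite one record into the wider layout with the new column NULL."""
    status, nullmap = record[0], record[1]
    nullmap |= 1 << old_ncols               # set the new column's NULL bit
    return bytes([status, nullmap]) + record[2:] + bytes(new_col_width)


old_file = [
    bytes([1, 0]) + struct.pack("<i", 42),  # row 1, live
    bytes([0, 0]) + struct.pack("<i", 7),   # row 2, tombstoned
]
# Every record is rewritten in position, so row_ids stay the same.
new_file = [widen(r, old_ncols=1, new_col_width=8) for r in old_file]
```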
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
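The repack and the index cascade described above might look like this in a simplified model. The Python below is a sketch with hypothetical names: layouts are lists of (name, width) pairs, indices are tuples of column names, and a dropped index slot is represented by None.

```python
def drop_column(layout, indices, col):
    """Return the new layout and index slots after dropping col."""
    if col not in [name for name, _ in layout]:
        raise KeyError(col)                         # stands in for SQR_NOT_FOUND
    if len(layout) == 1:
        raise ValueError("table must keep a column")  # stands in for SQR_INVALID
    new_layout = [(n, w) for n, w in layout if n != col]
    # CASCADE: any index that includes the column is tombstoned.
    new_indices = [None if idx is not None and col in idx else idx
                   for idx in indices]
    return new_layout, new_indices


def repack(record, layout, col):
    """Copy every column's bytes except the dropped one."""
    out, offset = bytearray(), 0
    for name, width in layout:
        if name != col:
            out += record[offset:offset + width]
        offset += width
    return bytes(out)
```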
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
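The append-only text storage described above can be modelled with a byte buffer and an (offset, length) descriptor. This is a hedged Python sketch: the descriptor shape and the function names are assumptions, since the source does not spell out the in-row descriptor format.

```python
def set_text(blob, text):
    """Append the text to the blob and return its descriptor (offset, length)."""
    data = text.encode("utf-8")
    offset = len(blob)
    blob.extend(data)
    return offset, len(data)


def get_text(blob, descriptor):
    """Read the bytes a descriptor points at; (0, 0) reads as an empty string."""
    offset, length = descriptor
    return bytes(blob[offset:offset + length]).decode("utf-8")


blob = bytearray()
d1 = set_text(blob, "hello")
d2 = set_text(blob, "goodbye!")    # replacing the text appends again
```

In this model the bytes of the replaced value stay in the blob until something rewrites it, which matches db_compact copying only the blob bytes still referenced by live rows.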
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
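Slot stability under drop and create can be sketched as a list whose entries are tombstoned rather than removed. The Python model below is illustrative: IndexSlots and the 1-based slot numbering are assumptions, and None stands in for a slot tombstoned with ncols = 0.

```python
class IndexSlots:
    def __init__(self):
        self.slots = []                     # slot k lives at self.slots[k - 1]

    def create(self, cols):
        self.slots.append(tuple(cols))      # always a fresh slot at the end
        return len(self.slots)

    def drop(self, cols):
        for k, member_cols in enumerate(self.slots, start=1):
            if member_cols == tuple(cols):  # exact member-column match only
                self.slots[k - 1] = None    # tombstone; later slots keep their numbers
                return k
        raise KeyError(cols)                # stands in for SQR_NOT_FOUND

    def live(self):
        # Walkers must skip dead slots.
        return [(k, c) for k, c in enumerate(self.slots, start=1) if c is not None]
```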
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
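The validate-everything-then-write rule for batches can be illustrated with a single unique index. This Python sketch is a model, not the library routine: insert_batch, the (key, payload) pairs and the string status codes are hypothetical, and a key of None stands for a row with a NULL index member.

```python
import math


def insert_batch(rows, unique_keys, batch):
    """rows: stored payloads; unique_keys: set of keys already indexed."""
    seen = set(unique_keys)
    for key, _ in batch:                    # pass 1: validate, write nothing
        if key is None:
            continue                        # NULL member: row is not indexed
        if isinstance(key, float) and math.isnan(key):
            return "SQR_INVALID", [0] * len(batch)
        if key in seen:                     # clashes with the index or the batch
            return "SQR_DUP", [0] * len(batch)
        seen.add(key)
    row_ids = []
    for _, payload in batch:                # pass 2: write all rows
        rows.append(payload)
        row_ids.append(len(rows))
    unique_keys.update(seen)                # one index rebuild, modelled as a set update
    return "SQR_OK", row_ids
```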
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
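A subset of these checks can be expressed over an in-memory model. The Python below is a sketch with hypothetical data shapes (rows as (alive, key) pairs, an index as (key, row_id) pairs); it omits the next_id check and returns the first problem found.

```python
def check_table(rows, live_count, index, unique=True):
    """rows[i] = (alive, key) for row_id i + 1; index = list of (key, row_id)."""
    if sum(1 for alive, _ in rows if alive) != live_count:
        return "SQR_INVALID", "live_count does not match recount"
    entries = set(index)
    for row_id, (alive, key) in enumerate(rows, start=1):
        # Rows with a NULL member (key None) are legitimately absent.
        if alive and key is not None and (key, row_id) not in entries:
            return "SQR_INVALID", f"row {row_id} missing from index"
    for key, row_id in index:
        if not 1 <= row_id <= len(rows):
            return "SQR_INVALID", f"index entry points past the table: {row_id}"
        alive, stored = rows[row_id - 1]
        if not alive or stored != key:
            return "SQR_INVALID", f"index entry for row {row_id} is stale"
    keys = [k for k, _ in index]
    if unique and len(keys) != len(set(keys)):
        return "SQR_INVALID", "duplicate key in unique index"
    return "SQR_OK", ""
```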
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
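The bit-for-bit behaviour is easy to reproduce outside the library. The Python sketch below compares the 8-byte IEEE 754 encodings directly; key_bytes and in_band are hypothetical helpers, and the little-endian byte order is an assumption made only so the example is deterministic.

```python
import struct


def key_bytes(x):
    """The 8 bytes of a float64, compared as-is with no tolerance."""
    return struct.pack("<d", x)


def in_band(x, lo, hi):
    """Tolerance matching expressed as an inclusive range test."""
    return lo <= x <= hi


stored = key_bytes(0.3)
same_value_matches = key_bytes(0.3) == stored          # True
recomputed_matches = key_bytes(0.1 + 0.2) == stored    # False
near_match = in_band(0.1 + 0.2, 0.3 - 1e-9, 0.3 + 1e-9)  # True
```

The last line is the shape a caller would give to a range lookup when "close enough" is what is wanted.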
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
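The pull protocol and the invalidation rule can be modelled with a sorted list and a version counter. This Python sketch is illustrative only: Table, Cursor and the use of an exception for a stale cursor are assumptions, since the source does not show how the library reports a stale cursor.

```python
import bisect


class Table:
    def __init__(self):
        self.index = []         # sorted list of (key, row_id)
        self.version = 0        # bumped on every mutation

    def insert(self, key, row_id):
        bisect.insort(self.index, (key, row_id))
        self.version += 1


class Cursor:
    def __init__(self, table, lo=None, hi=None):
        self.table, self.hi = table, hi
        self.version = table.version
        # row_ids start at 1, so (lo, 0) sorts before every entry with key lo.
        self.pos = 0 if lo is None else bisect.bisect_left(table.index, (lo, 0))

    def next(self):
        """Return (ok, row_id); ok is False once the cursor is exhausted."""
        if self.version != self.table.version:
            raise RuntimeError("cursor invalidated by a mutating call")
        if self.pos >= len(self.table.index):
            return False, None
        key, row_id = self.table.index[self.pos]
        if self.hi is not None and key > self.hi:
            return False, None  # passed the band's inclusive upper bound
        self.pos += 1
        return True, row_id
```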
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
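The three NULL-bitmap operations above amount to setting, clearing and testing one bit per column. The Python sketch below assumes a hypothetical layout (bitmap starting at bitmap_offset, one bit per 0-based column, least significant bit first); the library's actual null_bit assignment is derived from the schema and not shown here.

```python
def mark_null(buf, bitmap_offset, col):
    buf[bitmap_offset + col // 8] |= 1 << (col % 8)


def clear_null(buf, bitmap_offset, col):
    buf[bitmap_offset + col // 8] &= ~(1 << (col % 8)) & 0xFF


def is_null(buf, bitmap_offset, col):
    return bool(buf[bitmap_offset + col // 8] & (1 << (col % 8)))


def indexable(buf, bitmap_offset, member_cols):
    """A row with any NULL index member is left out of that index."""
    return not any(is_null(buf, bitmap_offset, c) for c in member_cols)
```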
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
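The pack and unpack helpers above can be mirrored with fixed offsets into a byte buffer. This Python sketch uses hypothetical function names and offsets, and fixes little-endian byte order as an assumption; the NUL padding and truncation follow the DT_CHAR description.

```python
import struct


def set_int(buf, offset, value):
    struct.pack_into("<i", buf, offset, value)      # int32


def get_int(buf, offset):
    return struct.unpack_from("<i", buf, offset)[0]


def set_real(buf, offset, value):
    struct.pack_into("<d", buf, offset, value)      # real64


def get_real(buf, offset):
    return struct.unpack_from("<d", buf, offset)[0]


def set_char(buf, offset, width, text):
    raw = text.encode("ascii")[:width]              # truncate to the column width
    buf[offset:offset + width] = raw.ljust(width, b"\0")  # NUL-pad


def get_char(buf, offset, width):
    raw = bytes(buf[offset:offset + width])
    return raw.split(b"\0", 1)[0].decode("ascii")   # up to the first NUL


buf = bytearray(4 + 8 + 6)   # one int32, one real64, one 6-byte char column
```

The same padding applied to a lookup key is what lets a short string compare equal to the stored, padded column bytes.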
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
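The begin/commit/rollback rules above form a small state machine. The Python model below is a sketch under stated assumptions: the Db class, the list-copy snapshot and the string status codes are hypothetical, and the auto-commit bracket is reduced to a snapshot that is taken and discarded around a single insert.

```python
class Db:
    def __init__(self):
        self.rows = []
        self._snapshot = None   # pre-transaction state while a txn is in flight
        self.explicit = False   # True when the txn was opened by begin()

    def begin(self):
        if self._snapshot is not None:
            return "SQR_INVALID"            # no nesting
        self._snapshot = list(self.rows)
        self.explicit = True
        return "SQR_OK"

    def commit(self):
        if not self.explicit:
            return "SQR_INVALID"
        self._snapshot, self.explicit = None, False
        return "SQR_OK"

    def rollback(self):
        if not self.explicit:
            return "SQR_INVALID"
        self.rows = self._snapshot          # restore the pre-begin state
        self._snapshot, self.explicit = None, False
        return "SQR_OK"

    def insert(self, row):
        auto = self._snapshot is None       # no txn in flight: bracket this call
        if auto:
            self._snapshot = list(self.rows)
        self.rows.append(row)
        if auto:
            self._snapshot = None           # auto-commit; a user txn is left open
```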
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is a target dummy
argument so that each hook context can hold a lasting pointer back to
the handle; the caller's db_t must therefore have the target attribute
for journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
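The slot discipline described above (tombstone in place, never renumber, append new indices at the end, skip dead slots when walking) can be modelled in a few lines. This is an illustrative Python sketch, not the library's Fortran API: IndexSlot, create_index, drop_index and live_slot_numbers are hypothetical names, and only the "ncols = 0 means dropped" convention is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IndexSlot:
    # ncols == 0 marks a dropped (tombstoned) slot; columns is then released.
    ncols: int
    columns: Optional[Tuple[str, ...]]


def is_live(slot: IndexSlot) -> bool:
    return slot.ncols > 0


def create_index(slots: List[IndexSlot], col_names) -> int:
    """Append a fresh slot and return its 1-based slot number."""
    slots.append(IndexSlot(len(col_names), tuple(col_names)))
    return len(slots)


def drop_index(slots: List[IndexSlot], col_names) -> bool:
    """Tombstone the live slot whose members match exactly; False if none does."""
    target = tuple(col_names)
    for slot in slots:
        if is_live(slot) and slot.columns == target:
            slot.ncols = 0
            slot.columns = None
            return True
    return False


def live_slot_numbers(slots: List[IndexSlot]) -> List[int]:
    """Walk the slot array, skipping dead slots, keeping original numbering."""
    return [n for n, s in enumerate(slots, start=1) if is_live(s)]
```

Because a dropped slot keeps its position, the slot number of every surviving index (and therefore any file name derived from it) is unchanged by the drop.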
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
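The validate-everything-then-write rule can be sketched as below. This is a simplified Python model under stated assumptions, not the library's implementation: the table is a list, the unique index is a set of key tuples, validate_batch and insert_batch are hypothetical names, and a Python exception stands in for the SQR_DUP / SQR_INVALID status codes.

```python
import math


def validate_batch(existing_keys, rows, key_of):
    """Check every row first; return the keys to index. Mutates nothing."""
    seen = set()
    new_keys = []
    for row in rows:
        key = key_of(row)
        if any(member is None for member in key):
            continue  # NULL member: the row is stored but not indexed
        if any(isinstance(m, float) and math.isnan(m) for m in key):
            raise ValueError("invalid: NaN in an index key")
        if key in existing_keys or key in seen:
            raise ValueError("duplicate key")
        seen.add(key)
        new_keys.append(key)
    return new_keys


def insert_batch(table, index, rows, key_of):
    """All-or-nothing insert: validation may raise before any write happens."""
    new_keys = validate_batch(index, rows, key_of)
    first_id = len(table) + 1
    table.extend(rows)
    index.update(new_keys)  # stands in for the single rebuild per index
    return list(range(first_id, first_id + len(rows)))
```

Checking uniqueness against both the existing index and the keys already seen in the batch is what lets a duplicate inside the batch reject it before the first row is written.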
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
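The bit-for-bit behaviour is easy to reproduce outside the library. The Python sketch below compares the 8-byte encodings directly; the little-endian byte order and the helper names real64_bits and in_tolerance are assumptions made for illustration only.

```python
import struct


def real64_bits(x: float) -> bytes:
    """Binary image of a real64 value (byte order chosen for illustration)."""
    return struct.pack("<d", x)


def in_tolerance(value: float, target: float, eps: float) -> bool:
    """Tolerance matching expressed as an inclusive band, i.e. a range query."""
    return target - eps <= value <= target + eps


stored = real64_bits(0.3)
# A value that went through storage and back has the same bits.
round_tripped = real64_bits(struct.unpack("<d", stored)[0])
# A value recomputed by a different route generally does not.
recomputed = real64_bits(0.1 + 0.2)
```

Here stored and round_tripped are identical byte strings, while recomputed differs in its low-order bits, so an exact lookup would miss it and a band around 0.3 would find it.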
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
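The index-selection rule (exact single-column index first, otherwise a composite index led by the column) can be sketched as follows. This is a hypothetical Python model: indices are tuples of member column names, None stands for a tombstoned slot, and pick_index returning 0 stands in for SQR_NOT_FOUND.

```python
def pick_index(indices, col_name):
    """Return the 1-based slot to iterate for col_name, or 0 if none fits."""
    # First preference: an index over exactly this one column.
    for slot, cols in enumerate(indices, start=1):
        if cols == (col_name,):
            return slot
    # Fallback: a composite index whose leading member is the column.
    for slot, cols in enumerate(indices, start=1):
        if cols and cols[0] == col_name:
            return slot
    return 0
```

The fallback works because a composite key is ordered by its first member before any later member, in the same way Python sorts tuples.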
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
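The pull protocol (call next until ok is false, with exhaustion reported as a normal outcome rather than an error) looks roughly like this. The Cursor class below is a hypothetical Python model over a sorted list of (key, row_id) entries; it ignores tombstones and is not the library's cursor type.

```python
import bisect


class Cursor:
    """Forward cursor over sorted (key, row_id) entries, optionally banded."""

    def __init__(self, entries, lo=None, hi=None):
        self._entries = entries
        self._hi = hi
        # (lo,) sorts before every (lo, row_id), so this lands on the
        # first entry whose key is >= lo (inclusive lower bound).
        self._pos = 0 if lo is None else bisect.bisect_left(entries, (lo,))

    def next(self):
        """Return (ok, row_id, key); ok is False once the cursor is exhausted."""
        if self._pos >= len(self._entries):
            return False, None, None
        key, row_id = self._entries[self._pos]
        if self._hi is not None and key > self._hi:
            self._pos = len(self._entries)  # past the band: stay exhausted
            return False, None, None
        self._pos += 1
        return True, row_id, key
```

A typical caller loops on next() and stops at the first result whose ok is false, treating that as the end of the data rather than as a failure.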
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
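A NULL bitmap of this kind, and its effect on index membership, can be sketched as below. The bit layout (bit n lives in byte n // 8, least significant bit first) and the names set_null, clear_null, is_null and index_key are assumptions for illustration; the source does not specify the layout.

```python
def set_null(bitmap: bytearray, null_bit: int) -> None:
    """Mark a column NULL by setting its bit."""
    bitmap[null_bit // 8] |= 1 << (null_bit % 8)


def clear_null(bitmap: bytearray, null_bit: int) -> None:
    """Mark a column as carrying a value by clearing its bit."""
    bitmap[null_bit // 8] &= ~(1 << (null_bit % 8)) & 0xFF


def is_null(bitmap: bytearray, null_bit: int) -> bool:
    return bool(bitmap[null_bit // 8] & (1 << (null_bit % 8)))


def index_key(bitmap, values, members):
    """Key for an index over `members`, or None if any member is NULL.

    A None result means the row is simply left out of that index.
    """
    if any(is_null(bitmap, col) for col in members):
        return None
    return tuple(values[col] for col in members)
```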
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
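The fixed-width packing rules can be illustrated with Python's struct module. This is a sketch under assumptions: the offsets, the little-endian byte order and the helper names are invented for the example, and ASCII text is assumed; only the NUL-padding, truncation and read-to-first-NUL behaviour come from the text.

```python
import struct


def set_int(buf: bytearray, offset: int, value: int) -> None:
    """Pack a 32-bit signed integer into its slot."""
    struct.pack_into("<i", buf, offset, value)


def get_int(buf: bytearray, offset: int) -> int:
    return struct.unpack_from("<i", buf, offset)[0]


def set_char(buf: bytearray, offset: int, width: int, text: str) -> None:
    """Store text truncated to `width` bytes and NUL-padded to exactly `width`."""
    raw = text.encode("ascii")[:width]
    buf[offset:offset + width] = raw.ljust(width, b"\x00")


def get_char(buf: bytearray, offset: int, width: int) -> str:
    """Read the slot back up to (not including) the first NUL."""
    raw = bytes(buf[offset:offset + width])
    return raw.split(b"\x00", 1)[0].decode("ascii")
```

Because every slot has a fixed width, writing a value never changes the length of the row buffer, which is what allows rows to be rewritten in place.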
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
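The interaction between explicit transactions and the auto-commit brackets can be modelled as a small state machine. The Python class below is a hypothetical sketch: Handle and mutate are invented names, the methods db_begin and db_commit only mirror the documented behaviour, and status codes are represented as strings because their real values are not given.

```python
SQR_OK, SQR_INVALID = "SQR_OK", "SQR_INVALID"


class Handle:
    def __init__(self):
        self.in_flight = False   # some transaction is open
        self.user_owned = False  # it was opened explicitly by the caller
        self.commits = 0

    def db_begin(self):
        if self.in_flight:       # no nesting: re-entry is refused
            return SQR_INVALID
        self.in_flight = self.user_owned = True
        return SQR_OK

    def db_commit(self):
        if not (self.in_flight and self.user_owned):
            return SQR_INVALID   # nothing explicit to commit
        self.in_flight = self.user_owned = False
        self.commits += 1
        return SQR_OK

    def mutate(self):
        """One mutation wrapped in an auto-commit bracket."""
        own = not self.in_flight  # open a private txn only if none is open
        if own:
            self.in_flight = True
        # ... the actual write would happen here ...
        if own:                   # a user-owned txn is left open
            self.in_flight = False
            self.commits += 1
        return SQR_OK
```

Outside an explicit transaction each mutation commits by itself; inside one, any number of mutations share a single commit.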
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
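The undo scheme described by these entries (capture before overwriting, replay in reverse to roll back) can be sketched in memory. This Python model is an illustration under assumptions, not the library's journal: files are bytearrays in a dict, there is no fsync or on-disk header, and UndoJournal and its method names are hypothetical.

```python
class UndoJournal:
    def __init__(self, files):
        self.files = files    # path -> bytearray, standing in for base files
        self.records = []     # undo records in capture order
        self._seen = set()    # keys already captured in this transaction

    def log_region(self, path, offset, length):
        """Capture the original bytes of a region before it is overwritten."""
        key = ("R", path, offset, length)
        if key in self._seen:
            return            # idempotent: keep the first (true) pre-image
        self._seen.add(key)
        original = bytes(self.files[path][offset:offset + length])
        self.records.append(("REGION", path, offset, original))

    def log_extend(self, path):
        """Capture a file's original length before it is appended to."""
        key = ("E", path)
        if key in self._seen:
            return
        self._seen.add(key)
        self.records.append(("EXTEND", path, len(self.files[path])))

    def commit(self):
        """Discard the undo set; the changes stay."""
        self.records.clear()
        self._seen.clear()

    def rollback(self):
        """Replay undo records newest-first, then discard them."""
        for rec in reversed(self.records):
            if rec[0] == "REGION":
                _, path, offset, original = rec
                self.files[path][offset:offset + len(original)] = original
            else:
                _, path, length = rec
                del self.files[path][length:]  # truncate appended bytes
        self.commit()
```

Idempotent capture matters for correctness: a second capture of the same region after it has been overwritten would record the new bytes, and rollback would then restore the wrong contents.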
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
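What makes a journal "hot" is a header that validates against its payload. The sketch below shows one way such a check could work in Python; the header layout (magic, record count, payload length, CRC-32), the magic value and the function names are all assumptions, since the source does not describe the actual on-disk format or checksum.

```python
import struct
import zlib

MAGIC = b"JRNL"                   # hypothetical magic value
HEADER = struct.Struct("<4sIII")  # magic, record count, payload length, CRC-32


def arm(payload: bytes, count: int) -> bytes:
    """Build a valid header in front of the serialised undo records."""
    return HEADER.pack(MAGIC, count, len(payload), zlib.crc32(payload)) + payload


def invalidate(journal: bytearray) -> None:
    """Zero the header in place so the (reusable) file is no longer hot."""
    journal[:HEADER.size] = bytes(HEADER.size)


def is_hot(journal: bytes) -> bool:
    """Read-only probe: True only for a complete, checksum-valid journal."""
    if len(journal) < HEADER.size:
        return False
    magic, count, length, crc = HEADER.unpack_from(journal, 0)
    if magic != MAGIC or count == 0:
        return False
    payload = journal[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        return False              # truncated: treat as not hot
    return zlib.crc32(payload) == crc
```

Storing the payload length in the header lets the file be pre-sized and reused: bytes beyond the recorded length are ignored by the check.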
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is the point at which recent
data can be lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
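Why row_ids do not survive a compaction is clearest from a small model. In the Python sketch below a table is a list of (alive, payload) pairs whose row_id is the 1-based position; the function name compact and the returned remap dictionary are hypothetical aids for illustration, not something the library is stated to provide.

```python
def compact(rows):
    """Drop tombstoned rows and renumber survivors 1..live_count.

    Returns (compacted_rows, remap), where remap maps each surviving
    row's old id to its new id.
    """
    compacted, remap = [], {}
    for old_id, (alive, payload) in enumerate(rows, start=1):
        if not alive:
            continue                    # tombstone: space is reclaimed
        compacted.append((True, payload))
        remap[old_id] = len(compacted)  # new id is the new position
    return compacted, remap
```

Any row that sits after a tombstone moves to a lower id, so an id held from before the call may afterwards name a different row or none at all.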
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
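The table rewrite performed when a column is added can be modelled as below. This Python sketch is an assumption-laden illustration: a record is a (status, null_columns, values) tuple, the row_id is the 1-based list position, and add_column is a hypothetical name for the rewrite step only (no files, schema or swap are involved).

```python
def add_column(records):
    """Rewrite every record into the wider layout with the new column NULL.

    Live and tombstoned records are both rewritten, and none is moved,
    so every row keeps its position (and therefore its row_id).
    """
    widened = []
    for status, nulls, values in records:
        new_col = len(values)  # the column is appended after the others
        widened.append((status, nulls | {new_col}, values + [None]))
    return widened
```

Since positions and existing values are untouched, keys and row_ids held by existing indices remain correct without any rebuild.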
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
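The cascade rule for dropping a column can be sketched as follows. This is a hypothetical Python model: indices are tuples of member column names with None for an already tombstoned slot, drop_column is an invented name, and exceptions stand in for the SQR_NOT_FOUND and SQR_INVALID results.

```python
def drop_column(columns, indices, name):
    """Return (remaining_columns, surviving_indices) after dropping `name`.

    An index that includes the column is tombstoned (None) in place, so
    slot numbers of the other indices do not shift.
    """
    if name not in columns:
        raise KeyError(name)  # no such column
    if len(columns) == 1:
        raise ValueError("a table must keep at least one column")
    remaining = [c for c in columns if c != name]
    surviving = [
        None if (idx is not None and name in idx) else idx
        for idx in indices
    ]
    return remaining, surviving
```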
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
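The two capture calls above can be pictured as an in-memory undo set keyed so that repeated captures are ignored. The sketch below is an illustrative Python model under that assumption; UndoSet and its method names are hypothetical, and paths are used as given rather than resolved against a database directory.

```python
import os


class UndoSet:
    """REGION records keep original bytes; EXTEND records keep original length."""

    def __init__(self):
        self.regions = {}    # (path, offset, length) -> original bytes
        self.extends = {}    # path -> original file length

    def log_region(self, path, offset, length, data=None):
        if data is not None:
            length = len(data)             # a supplied pre-image overrides length
        key = (path, offset, length)
        if key in self.regions:            # idempotent: the first capture wins
            return
        if data is None:
            with open(path, "rb") as f:    # otherwise read the pre-image back
                f.seek(offset)
                data = f.read(length)
        self.regions[key] = bytes(data)

    def log_extend(self, path):
        if path not in self.extends:       # idempotent per path
            self.extends[path] = os.path.getsize(path)

    def rollback(self):
        for (path, offset, _), data in self.regions.items():
            with open(path, "r+b") as f:   # write the original bytes back
                f.seek(offset)
                f.write(data)
        for path, size in self.extends.items():
            os.truncate(path, size)        # cut appended bytes away
        self.regions.clear()
        self.extends.clear()
```

Capturing before the write matters: once the file has been overwritten, a later capture of the same region is ignored, so the stored pre-image stays the original one.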
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
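The arm, commit and recover steps above hinge on one test: is the journal header valid? The sketch below models that test in Python. The header layout (a magic marker, a record count and a CRC-32) and every name in it are assumptions for illustration; the real file format is not described here, and the real journal is pre-sized and reused rather than rewritten from scratch.

```python
import os
import struct
import zlib

MAGIC = b"JRNL"                    # hypothetical marker, not the real format
HEADER = struct.Struct("<4sII")    # magic, record count, CRC-32


def _crc(count, payload):
    # The checksum covers the count as well as the serialised records.
    return zlib.crc32(payload, zlib.crc32(struct.pack("<I", count)))


def arm(path, records):
    """Serialise the undo records, write a valid header, then fsync."""
    payload = b"".join(struct.pack("<I", len(r)) + r for r in records)
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, len(records), _crc(len(records), payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def commit(path):
    """Zero the header in place so recovery sees nothing to do."""
    with open(path, "r+b") as f:
        f.write(b"\0" * HEADER.size)
        f.flush()
        os.fsync(f.fileno())


def read_hot(path):
    """Return the undo records if the journal is hot, else None."""
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except OSError:
        return None                          # missing or unreadable
    if len(raw) < HEADER.size:
        return None                          # empty or truncated
    magic, count, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    if magic != MAGIC or _crc(count, payload) != crc:
        return None                          # invalidated or corrupt
    records, pos = [], 0
    for _ in range(count):
        (n,) = struct.unpack_from("<I", payload, pos)
        records.append(payload[pos + 4:pos + 4 + n])
        pos += 4 + n
    return records
```

A recovery pass in this model would iterate `reversed(read_hot(path))`, apply each record, fsync the base files and then call `commit(path)` to invalidate the journal.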
.true. if a hot (valid, un-committed) journal is present on disk. This
is a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. Returns
.false. for an absent, voided or unreadable journal.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. The optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed close is the point at which recent
data can be lost; the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
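The renumbering contract is easiest to see on a toy table. The sketch below is a Python illustration only: rows are modelled as a list whose 1-based position is the row_id, the status values are made up, and the returned old-to-new map exists purely to show which ids move (the text above does not say the library exposes one).

```python
ROW_ALIVE, ROW_TOMBSTONE = 1, 0    # illustrative status values


def compact(rows):
    """rows: list of (status, payload); row_id is the 1-based position.

    Returns the compacted list and a map old_row_id -> new_row_id.
    """
    survivors, remap = [], {}
    for old_id, (status, payload) in enumerate(rows, start=1):
        if status == ROW_ALIVE:
            survivors.append((status, payload))
            remap[old_id] = len(survivors)    # renumbered 1..live_count
    return survivors, remap
```

With rows 1 and 3 alive and row 2 tombstoned, old row 3 becomes row 2 after compaction, which is why a row_id held across the call must be discarded.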
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
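The rewrite and the index cascade described above can be sketched on a simplified layout. This Python model is an assumption-laden illustration: records are plain byte strings made of fixed-width column slots (no status byte or NULL bitmap), indices are lists of member column names with an empty list standing for a tombstoned slot, and the exceptions stand in for the status codes.

```python
def drop_column(layout, records, indices, name):
    """layout: [(col, width)], records: [bytes], indices: [[col, ...]] per slot."""
    if name not in [col for col, _ in layout]:
        raise KeyError(name)                           # ~ SQR_NOT_FOUND
    if len(layout) == 1:
        raise ValueError("table must keep a column")   # ~ SQR_INVALID
    start = 0
    for col, width in layout:
        if col == name:
            break                                      # width is now the dropped column's
        start += width
    end = start + width
    new_layout = [(c, w) for c, w in layout if c != name]
    # Same record order, so row_ids are preserved.
    new_records = [r[:start] + r[end:] for r in records]
    # Cascade: any index that includes the column is tombstoned in place.
    new_indices = [[] if name in cols else cols for cols in indices]
    return new_layout, new_records, new_indices
```

Slots are tombstoned rather than removed so that the positions of the surviving indices do not shift.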
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
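The stable-slot rule can be modelled in a few lines. The Python sketch below is illustrative: IndexSlots is a hypothetical name, a tombstoned slot is an empty member list (mirroring ncols = 0), and "exactly match" is assumed to mean the same columns in the same order.

```python
class IndexSlots:
    """Toy index table: a dropped slot is tombstoned, never reused."""

    def __init__(self):
        self.slots = []                       # slot k (1-based) -> member columns

    def create(self, cols):
        self.slots.append(list(cols))         # always appends a fresh slot
        return len(self.slots)

    def drop(self, cols):
        for k, members in enumerate(self.slots, start=1):
            if members and members == list(cols):
                self.slots[k - 1] = []        # tombstone, slot number kept
                return k
        return 0                              # ~ SQR_NOT_FOUND

    def live(self):
        return [k for k, m in enumerate(self.slots, start=1) if m]
```

Because slot numbers never shift, a file name derived from the slot number stays valid for every surviving index.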
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
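The validate-everything-first rule is what makes the batch all-or-nothing. Below is a minimal Python sketch of that ordering for a single unique index; the function, its arguments and the string status codes are assumptions for illustration, with None standing for a NULL index member.

```python
import math


def bulk_insert(unique_keys, next_id, batch):
    """Validate a whole batch against a unique index, then write it.

    unique_keys: set of keys already indexed (updated in place on success).
    Returns (status, row_ids, next_id).
    """
    seen = set()
    for key in batch:
        if key is None:
            continue                                   # NULL member: not indexed
        if isinstance(key, float) and math.isnan(key):
            return "SQR_INVALID", [0] * len(batch), next_id
        if key in unique_keys or key in seen:
            return "SQR_DUP", [0] * len(batch), next_id
        seen.add(key)
    # Only now is anything written: ids assigned, index updated once.
    row_ids = list(range(next_id, next_id + len(batch)))
    unique_keys |= seen
    return "SQR_OK", row_ids, next_id + len(batch)
```

A rejected batch leaves both the key set and the id counter untouched, including for the rows that came before the offending one.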
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
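Most of the checks listed above can be expressed compactly against in-memory stand-ins. The Python sketch below is illustrative only: the data shapes and messages are assumptions, it handles a single index, and it omits the next_id check.

```python
def check_table(rows, live_count, index, unique):
    """rows: {row_id: (alive, key_or_None)}; index: list of (key, row_id).

    Returns None if consistent, else a message for the first problem found.
    """
    alive = {rid: key for rid, (ok, key) in rows.items() if ok}
    if len(alive) != live_count:
        return "live-row recount does not match live_count"
    indexed = {rid for _, rid in index}
    for rid, key in alive.items():
        if key is not None and rid not in indexed:
            return f"row {rid} missing from index"
    for key, rid in index:
        if rid not in alive or alive[rid] != key:
            return f"index entry ({key!r}, {rid}) is stale"
    keys = [k for k, _ in index]
    if unique and len(keys) != len(set(keys)):
        return "duplicate key in unique index"
    return None
```

The two loops cover both directions: every indexable live row must be in the index, and every index entry must point back at a matching live row.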
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
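The bit-for-bit rule can be checked directly. The Python sketch below compares the 8-byte encodings of two doubles; the little-endian byte order is chosen only for the illustration and says nothing about the library's storage, and the `within` helper is a hypothetical way to phrase tolerance matching as an inclusive band.

```python
import struct


def real64_bits(x):
    """The 8 bytes of an IEEE-754 double (byte order is illustrative)."""
    return struct.pack("<d", x)


stored = real64_bits(0.3)
same = real64_bits(0.3) == stored                 # identical value, identical bits
recomputed = real64_bits(0.1 + 0.2) == stored     # differs in the last bit


def within(x, target, eps):
    """Tolerance matching as an inclusive band [target - eps, target + eps]."""
    return target - eps <= x <= target + eps
```

Here `same` is true and `recomputed` is false, so a caller who needs 0.1+0.2 to find a stored 0.3 has to ask for a band around the target instead of an exact key.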
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
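The pull protocol above can be modelled on a sorted list standing in for the index. This Python sketch is an illustration under stated assumptions: Cursor and build_index are hypothetical names, entries are (key, row_id) pairs, and exhaustion is signalled by ok being False rather than by an error.

```python
import bisect


def build_index(rows):
    """rows: {row_id: key_or_None}. NULL keys are left out of the index."""
    return sorted((key, rid) for rid, key in rows.items() if key is not None)


class Cursor:
    """Forward cursor over sorted (key, row_id) entries, optionally banded."""

    def __init__(self, entries, lo=None, hi=None):
        self.entries = entries
        self.hi = hi
        # Start at the first entry whose key is >= lo (or at the beginning).
        self.pos = 0 if lo is None else bisect.bisect_left(entries, (lo,))

    def next(self):
        """Return (ok, row_id); ok is False once exhausted or past hi."""
        if self.pos >= len(self.entries):
            return False, None
        key, row_id = self.entries[self.pos]
        if self.hi is not None and key > self.hi:
            return False, None                # inclusive upper bound passed
        self.pos += 1
        return True, row_id
```

A typical pull loop calls `next()` until ok is False; because the cursor only holds a position into the index, any mutation of the underlying entries would make that position meaningless, which matches the invalidation contract.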
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
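The row helpers above all operate on one fixed-size byte buffer. The Python sketch below models such a buffer; the layout (one status byte, one NULL-bitmap byte, then fixed-width slots), the little-endian encoding and the method signatures are assumptions for illustration, not the library's actual layout.

```python
import struct


class Row:
    """Toy row buffer: status byte, NULL-bitmap byte, then column slots."""

    def __init__(self, size):
        self.buf = bytearray(size)                # zeroed on allocation

    def set_null(self, bit):
        self.buf[1] |= 1 << bit

    def clear_null(self, bit):
        self.buf[1] &= ~(1 << bit) & 0xFF

    def is_null(self, bit):
        return bool(self.buf[1] & (1 << bit))

    def set_int(self, offset, bit, value):
        struct.pack_into("<i", self.buf, offset, value)
        self.clear_null(bit)                      # writing a value un-NULLs it

    def get_int(self, offset):
        return struct.unpack_from("<i", self.buf, offset)[0]

    def set_char(self, offset, width, bit, text):
        raw = text.encode("ascii")[:width]        # truncate to the column width
        self.buf[offset:offset + width] = raw.ljust(width, b"\0")   # NUL-pad
        self.clear_null(bit)

    def get_char(self, offset, width):
        raw = bytes(self.buf[offset:offset + width])
        return raw.split(b"\0", 1)[0].decode("ascii")   # up to the first NUL
```

Padding on write and stopping at the first NUL on read are two halves of the same convention: a short string round-trips unchanged, and an over-long one comes back truncated to the column width.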
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(in) | | | db | Database handle |
| character(len=*) | intent(in) | | | name | Table name to look up |

Return value: slot in db%tables, 0 if absent.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| type(index_t) | intent(in) | | | ix | Index slot to test |

Return value: live (not dropped).
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
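The slot behaviour described above (tombstone on drop, append on create) can be sketched as follows. This is a hypothetical Python model, not the library's data structure: the `IndexSlots` class, the string status values, and the choice that member order counts in the exact match are all assumptions made for illustration.

```python
class IndexSlots:
    """Toy model of a table's index slots; None marks a tombstoned slot."""

    def __init__(self):
        self.slots = []

    def create(self, cols):
        self.slots.append(tuple(cols))
        return len(self.slots)          # 1-based slot number

    def drop(self, cols):
        for i, members in enumerate(self.slots):
            if members == tuple(cols):  # exact member match (order assumed to count)
                self.slots[i] = None    # tombstone; later slots keep their numbers
                return "SQR_OK"
        return "SQR_NOT_FOUND"

    def live(self):
        # Walkers must skip dead slots.
        return [(i + 1, m) for i, m in enumerate(self.slots) if m is not None]


idx = IndexSlots()
a = idx.create(["id"])
b = idx.create(["last", "first"])
dropped = idx.drop(["id"])
missing = idx.drop(["first", "last"])
c = idx.create(["email"])
```

Slot 2 keeps its number after slot 1 is dropped, which is what lets a slot-derived file name stay valid for the surviving index.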
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
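The validate-everything-then-write rule can be modelled in a few lines. The sketch below is Python and purely illustrative: `insert_batch`, the single-key-per-row shape, and the string status values are assumptions, and the real routine rebuilds index trees rather than updating a set.

```python
import math

SQR_OK, SQR_DUP, SQR_INVALID = "SQR_OK", "SQR_DUP", "SQR_INVALID"


def insert_batch(table, unique_keys, batch):
    """table: stored keys, row_id = position + 1; unique_keys: set of indexed keys."""
    seen = set()
    for key in batch:
        if key is None:                      # NULL member: stored but not indexed
            continue
        if isinstance(key, float) and math.isnan(key):
            return SQR_INVALID, [0] * len(batch)
        if key in unique_keys or key in seen:
            return SQR_DUP, [0] * len(batch)
        seen.add(key)
    # Validation passed for every row; only now is anything written.
    row_ids = []
    for key in batch:
        table.append(key)
        row_ids.append(len(table))
    unique_keys.update(seen)                 # one index update for the whole batch
    return SQR_OK, row_ids


table, keys = [10], {10}
s1, ids1 = insert_batch(table, keys, [11, 10, 12])
table_after_reject = list(table)
s2, ids2 = insert_batch(table, keys, [11, None, 12])
```

Because the first loop never mutates `table` or `unique_keys`, a rejected batch leaves both exactly as they were.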
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
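The difference between bit-for-bit equality and tolerance matching can be checked directly. This is a Python sketch, not the library's lookup: `key_bytes` is a hypothetical helper, and the little-endian `"<d"` encoding is an assumption standing in for whatever byte layout the storage uses.

```python
import struct


def key_bytes(x):
    """Raw 8-byte image of a real64 value (byte order assumed)."""
    return struct.pack("<d", x)


stored = 0.3
same_value_matches = key_bytes(0.3) == key_bytes(stored)
recomputed_matches = key_bytes(0.1 + 0.2) == key_bytes(stored)

# Tolerance matching expressed as an inclusive [lo, hi] band instead.
eps = 1e-9
in_band = stored - eps <= 0.1 + 0.2 <= stored + eps
```

The recomputed sum differs from 0.3 in its last bits, so the equality test misses it while the band query finds it.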
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
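The pull protocol above (keep calling until ok is false) can be sketched over a sorted list. This is a Python model under stated assumptions: `BandCursor` and its `(key, row_id)` entry list are hypothetical stand-ins for the index tree, and the returned pair plays the role of ok and row_id.

```python
import bisect


class BandCursor:
    """Ascending cursor over sorted (key, row_id) entries, optional inclusive band."""

    def __init__(self, entries, lo=None, hi=None):
        self.entries = entries
        self.hi = hi
        # (lo,) sorts before every (lo, row_id), so this lands on the first key >= lo.
        self.pos = 0 if lo is None else bisect.bisect_left(entries, (lo,))

    def next(self):
        if self.pos < len(self.entries):
            key, row_id = self.entries[self.pos]
            if self.hi is None or key <= self.hi:
                self.pos += 1
                return True, row_id
        return False, None               # exhausted: not an error


entries = [(1, 10), (3, 11), (3, 12), (7, 13)]
cur = BandCursor(entries, lo=3, hi=5)
pulled = []
while True:
    ok, row_id = cur.next()
    if not ok:
        break
    pulled.append(row_id)
```

Exhaustion is reported through the flag rather than an error status, so the loop needs only one exit test.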
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
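The three NULL-bitmap operations above reduce to setting, clearing and testing one bit. The sketch below is Python and illustrative only: the helper names are hypothetical, and the layout (0-based `null_bit`, least significant bit first within each byte) is an assumption, since the source does not specify it.

```python
def set_null(bitmap, null_bit):
    bitmap[null_bit // 8] |= 1 << (null_bit % 8)


def clear_null(bitmap, null_bit):
    bitmap[null_bit // 8] &= ~(1 << (null_bit % 8)) & 0xFF


def is_null(bitmap, null_bit):
    return bool(bitmap[null_bit // 8] & (1 << (null_bit % 8)))


def indexed(bitmap, member_bits):
    """A row is in an index only if none of the index's members is NULL."""
    return not any(is_null(bitmap, b) for b in member_bits)


bm = bytearray(2)          # room for 16 columns
set_null(bm, 9)
```

The `indexed` helper mirrors the rule that a row with any NULL index member is simply absent from that index.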
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
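The DT_CHAR store and read rules (NUL-pad, truncate to the column width, read up to the first NUL) can be written as two small functions. This is a Python sketch with hypothetical names `pack_char` and `unpack_char`, not the library's helpers.

```python
def pack_char(value: bytes, width: int) -> bytes:
    """Truncate to the column width, then pad with NUL bytes up to it."""
    return value[:width].ljust(width, b"\x00")


def unpack_char(slot: bytes) -> bytes:
    """Return the bytes before the first NUL (the whole slot if there is none)."""
    return slot.split(b"\x00", 1)[0]


short = pack_char(b"abc", 5)
long_ = pack_char(b"abcdefg", 5)
```

One consequence visible in the model: a value longer than the column width round-trips as its truncated prefix, so two long values sharing that prefix produce the same stored bytes.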
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
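The begin / commit / rollback rules above form a two-state machine. The following Python sketch is a model only: the `Session` class and string status values are hypothetical, and it tracks just the in-memory counters, not the base files that the real rollback also restores.

```python
class Session:
    """Toy model: one explicit transaction at a time, no nesting."""

    def __init__(self):
        self.counters = {"next_id": 1, "live_count": 0}
        self._snapshot = None            # not None while a txn is in flight

    def begin(self):
        if self._snapshot is not None:
            return "SQR_INVALID"         # already in flight
        self._snapshot = dict(self.counters)
        return "SQR_OK"

    def commit(self):
        if self._snapshot is None:
            return "SQR_INVALID"         # nothing to commit
        self._snapshot = None            # keep changes, discard undo state
        return "SQR_OK"

    def rollback(self):
        if self._snapshot is None:
            return "SQR_INVALID"         # nothing to roll back
        self.counters = self._snapshot   # restore pre-begin values
        self._snapshot = None
        return "SQR_OK"


s = Session()
first = s.begin()
nested = s.begin()
s.counters["live_count"] = 5
undone = s.rollback()
```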
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
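The region and extend records, their idempotent capture, and the reverse replay can be modelled in memory. This Python sketch is an assumption-laden illustration: `UndoJournal` is hypothetical, bytearrays stand in for base files, and nothing here touches disk or fsync.

```python
class UndoJournal:
    """In-memory model of an undo journal over path -> bytearray 'files'."""

    def __init__(self, files):
        self.files = files
        self.records = []
        self.seen = set()

    def log_region(self, path, offset, length):
        key = ("R", path, offset, length)
        if key in self.seen:             # idempotent per (path, offset, length)
            return
        self.seen.add(key)
        original = bytes(self.files[path][offset:offset + length])
        self.records.append(("REGION", path, offset, original))

    def log_extend(self, path):
        key = ("E", path)
        if key in self.seen:             # idempotent per path
            return
        self.seen.add(key)
        self.records.append(("EXTEND", path, len(self.files[path])))

    def rollback(self):
        for rec in reversed(self.records):   # undo in reverse capture order
            if rec[0] == "REGION":
                _, path, offset, original = rec
                self.files[path][offset:offset + len(original)] = original
            else:
                _, path, length = rec
                del self.files[path][length:]    # truncate appended bytes
        self.records.clear()
        self.seen.clear()


files = {"t.dat": bytearray(b"AAAABBBB")}
j = UndoJournal(files)
j.log_region("t.dat", 0, 4)
j.log_region("t.dat", 0, 4)              # second capture is ignored
j.log_extend("t.dat")
record_count = len(j.records)
files["t.dat"][0:4] = b"XXXX"            # in-place overwrite
files["t.dat"] += b"CCCC"                # append
j.rollback()
```

Ignoring the second capture matters: if it were recorded after the overwrite, it would save the new bytes and replay would restore the wrong image.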
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
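How a header with count and checksum distinguishes a hot journal from a committed, empty or corrupt one can be sketched as below. This is Python and not the on-disk format: the two-field header layout, the use of CRC-32 via `zlib.crc32`, and the function names are all assumptions, since the source names only "count + checksum".

```python
import struct
import zlib

HEADER = struct.Struct("<II")            # record count, checksum (assumed layout)


def arm(payload: bytes, count: int) -> bytes:
    """Serialise the undo set behind a valid header (the journal becomes hot)."""
    return HEADER.pack(count, zlib.crc32(payload)) + payload


def commit(journal: bytes) -> bytes:
    """Zero the header; the stale payload stays but is no longer valid."""
    return bytes(HEADER.size) + journal[HEADER.size:]


def is_hot(journal: bytes) -> bool:
    """Read-only probe: true only for a valid, un-committed journal."""
    if len(journal) < HEADER.size:
        return False                     # absent or empty
    count, checksum = HEADER.unpack_from(journal)
    if count == 0:
        return False                     # invalidated header
    return zlib.crc32(journal[HEADER.size:]) == checksum


hot = arm(b"undo-records", 2)
committed = commit(hot)
corrupt = hot[:-1] + b"?"
```

In this model commit is a single small header write, which is what makes it usable as the commit point: before it the journal replays, after it recovery finds nothing to do.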
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(in) | | | buf | Row buffer |

Status byte value
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(in) | | | buf | Row buffer |
| type(column_t) | intent(in) | | | col | Column to test |

.true. if the column's NULL bit is set
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
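
The NULL-bitmap rules above, including the "any NULL member keeps the row out of the index" rule, can be sketched with Fortran's bit intrinsics. The layout here is an assumption made for illustration: one `int32` bitmap per row with zero-based bit positions, and helper names (`set_null`, `clear_null`, `is_null`, `row_in_index`) that are hypothetical rather than the library's.

```fortran
module null_bitmap_sketch
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
contains

  ! Mark the column at zero-based position bit as NULL.
  pure subroutine set_null(bitmap, bit)
    integer(int32), intent(inout) :: bitmap
    integer, intent(in) :: bit
    bitmap = ibset(bitmap, bit)
  end subroutine set_null

  ! Mark the column as carrying a value again.
  pure subroutine clear_null(bitmap, bit)
    integer(int32), intent(inout) :: bitmap
    integer, intent(in) :: bit
    bitmap = ibclr(bitmap, bit)
  end subroutine clear_null

  pure logical function is_null(bitmap, bit)
    integer(int32), intent(in) :: bitmap
    integer, intent(in) :: bit
    is_null = btest(bitmap, bit)
  end function is_null

  ! A row belongs in an index only if none of the index's member
  ! columns (given by their bit positions) is NULL.
  pure logical function row_in_index(bitmap, member_bits)
    integer(int32), intent(in) :: bitmap
    integer, intent(in) :: member_bits(:)
    row_in_index = .not. any(btest(bitmap, member_bits))
  end function row_in_index

end module null_bitmap_sketch
```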
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
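
The NUL-padding and read-to-first-NUL behaviour described for DT_CHAR slots can be sketched as follows. `pad_nul` and `read_to_nul` are hypothetical helpers, and the sketch assumes the caller passes exactly the characters it wants stored (trailing blanks in `s` count as data here, which may differ from the library).

```fortran
module char_slot_sketch
  implicit none
contains

  ! Fit s into a slot of the given width: truncate if too long,
  ! fill the remainder with NUL characters if too short.
  pure function pad_nul(s, width) result(slot)
    character(len=*), intent(in) :: s
    integer, intent(in) :: width
    character(len=width) :: slot
    integer :: n
    n = min(len(s), width)
    slot = repeat(achar(0), width)
    slot(1:n) = s(1:n)
  end function pad_nul

  ! Return the slot contents up to, but not including, the first NUL.
  pure function read_to_nul(slot) result(s)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: s
    integer :: k
    k = index(slot, achar(0))
    if (k == 0) then
      s = slot            ! slot is completely full, no terminator
    else
      s = slot(1:k - 1)
    end if
  end function read_to_nul

end module char_slot_sketch
```

Padding a lookup key with the same rule as the stored slot is what makes a byte-wise comparison of key and column meaningful.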
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight transaction as user-owned, so that the
auto-commit brackets leave it open and re-entry is detected. No nesting
in v1: a db_begin while a transaction is already in flight fails with
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails with SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails with SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails with SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
that their B+-tree page writes capture undo records. db is declared
with the target attribute so that each hook context can hold a lasting
pointer back to the handle; the caller's db_t must therefore also have
the target attribute for journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
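
The interplay of the two undo record kinds can be sketched with an in-memory model. This is a simplified illustration under stated assumptions, not the journal's implementation: the "file" is a single allocatable string, paths, checksums and fsync are left out, records are replayed newest-first, and every name (`undo_set_t`, `log_region`, `log_extend`, `rollback`) is hypothetical.

```fortran
module undo_sketch
  implicit none

  integer, parameter :: MAX_RECS = 16, MAX_REGION = 64
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2

  type :: undo_rec_t
    integer :: kind   = 0
    integer :: offset = 0   ! REGION: 1-based start of the overwritten span
    integer :: length = 0   ! REGION: span length; EXTEND: original file length
    character(len=MAX_REGION) :: old_bytes = ''
  end type undo_rec_t

  type :: undo_set_t
    integer :: n = 0
    type(undo_rec_t) :: recs(MAX_RECS)
  end type undo_set_t

contains

  ! Record the bytes about to be overwritten. A repeat of the same
  ! (offset, length) is ignored, so the first pre-image wins.
  subroutine log_region(u, f, offset, length)
    type(undo_set_t), intent(inout) :: u
    character(len=*), intent(in)    :: f
    integer, intent(in)             :: offset, length
    integer :: i
    do i = 1, u%n
      if (u%recs(i)%kind == REC_REGION .and. u%recs(i)%offset == offset &
          .and. u%recs(i)%length == length) return
    end do
    if (u%n >= MAX_RECS .or. length > MAX_REGION) error stop 'undo set full'
    u%n = u%n + 1
    u%recs(u%n)%kind   = REC_REGION
    u%recs(u%n)%offset = offset
    u%recs(u%n)%length = length
    u%recs(u%n)%old_bytes(1:length) = f(offset:offset + length - 1)
  end subroutine log_region

  ! Record the file length before an append; only the first call counts.
  subroutine log_extend(u, f)
    type(undo_set_t), intent(inout) :: u
    character(len=*), intent(in)    :: f
    integer :: i
    do i = 1, u%n
      if (u%recs(i)%kind == REC_EXTEND) return
    end do
    if (u%n >= MAX_RECS) error stop 'undo set full'
    u%n = u%n + 1
    u%recs(u%n)%kind   = REC_EXTEND
    u%recs(u%n)%length = len(f)
  end subroutine log_extend

  ! Replay the undo set newest-first, then clear it: REGION records
  ! write the old bytes back, EXTEND records truncate the file.
  subroutine rollback(u, f)
    type(undo_set_t), intent(inout) :: u
    character(len=:), allocatable, intent(inout) :: f
    integer :: i, a, n
    do i = u%n, 1, -1
      n = u%recs(i)%length
      select case (u%recs(i)%kind)
      case (REC_REGION)
        a = u%recs(i)%offset
        f(a:a + n - 1) = u%recs(i)%old_bytes(1:n)
      case (REC_EXTEND)
        f = f(1:n)
      end select
    end do
    u%n = 0
  end subroutine rollback

end module undo_sketch
```

In this model the caller logs before it writes: `log_region` ahead of an in-place overwrite, `log_extend` ahead of an append. A later `rollback` then returns the string to its pre-transaction contents and length.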
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk.
This is a read-only probe that writes nothing, used by a read-only
db_open to refuse a database that needs recovery it cannot perform.
Returns .false. for an absent, voided or unreadable journal.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(in) | | | buf | Row buffer |
| type(column_t) | intent(in) | | | col | Source |

Return value: Decoded value
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. The optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed close is the point at which recent
data can be lost; the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
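
Why a held row_id goes stale across a compaction can be seen from a small renumbering sketch. It assumes survivors keep their relative order, which the text above does not state explicitly, and `renumber` is a hypothetical helper rather than part of the library.

```fortran
module compact_sketch
  implicit none
contains

  ! Map each old row id (the array position) to its id after
  ! compaction: survivors become 1..live_count, tombstones map to 0.
  pure function renumber(alive) result(new_id)
    logical, intent(in) :: alive(:)
    integer :: new_id(size(alive))
    integer :: i, next
    next = 0
    new_id = 0
    do i = 1, size(alive)
      if (alive(i)) then
        next = next + 1
        new_id(i) = next
      end if
    end do
  end function renumber

end module compact_sketch
```

With rows 2 and 5 tombstoned out of five, the old ids 1, 3 and 4 become 1, 2 and 3, so a caller still holding row_id 3 would now be addressing a different row.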
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
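
The stable-slot behaviour above (drop tombstones a slot, create appends a new one, dead slots are recognised by ncols = 0) can be sketched with a fixed-size slot table. The type and procedure names below are hypothetical, and the table stores only the member-column count per slot.

```fortran
module index_slot_sketch
  implicit none

  integer, parameter :: MAX_SLOTS = 8

  type :: slot_table_t
    integer :: nslots = 0
    integer :: ncols(MAX_SLOTS) = 0   ! 0 marks a dropped (tombstoned) slot
  end type slot_table_t

contains

  ! Append a fresh slot and return its number, or 0 if the table is
  ! full or ncols is not positive. Dropped slots are never reused.
  function create_index(t, ncols) result(slot)
    type(slot_table_t), intent(inout) :: t
    integer, intent(in) :: ncols
    integer :: slot
    slot = 0
    if (t%nslots >= MAX_SLOTS .or. ncols < 1) return
    t%nslots = t%nslots + 1
    t%ncols(t%nslots) = ncols
    slot = t%nslots
  end function create_index

  ! Tombstone a slot in place; later slots keep their numbers.
  subroutine drop_index(t, slot)
    type(slot_table_t), intent(inout) :: t
    integer, intent(in) :: slot
    if (slot >= 1 .and. slot <= t%nslots) t%ncols(slot) = 0
  end subroutine drop_index

  pure logical function slot_is_live(t, slot)
    type(slot_table_t), intent(in) :: t
    integer, intent(in) :: slot
    slot_is_live = .false.
    if (slot >= 1 .and. slot <= t%nslots) slot_is_live = t%ncols(slot) > 0
  end function slot_is_live

end module index_slot_sketch
```

Because slot numbers never shift, any file name derived from a surviving slot number stays valid after another index is dropped.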
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
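
The validate-everything-first rule can be sketched for a single unique integer key. This is a deliberately reduced model: rows are just their keys, the "table" is an in-memory array with spare capacity, NULL and NaN checks are omitted, and `batch_is_unique` and `insert_batch` are hypothetical names.

```fortran
module batch_sketch
  implicit none
contains

  ! .true. only if no batch key collides with an existing key or
  ! with an earlier key in the same batch.
  pure function batch_is_unique(existing, batch) result(ok)
    integer, intent(in) :: existing(:), batch(:)
    logical :: ok
    integer :: k
    ok = .true.
    do k = 1, size(batch)
      if (any(existing == batch(k)) .or. any(batch(1:k - 1) == batch(k))) then
        ok = .false.
        return
      end if
    end do
  end function batch_is_unique

  ! All-or-nothing insert: validate the whole batch, then write.
  ! keys(1:n) holds the rows already stored; row_ids(k) receives the
  ! id of batch row k, or stays 0 when the batch is rejected.
  subroutine insert_batch(keys, n, batch, row_ids, ok)
    integer, intent(inout) :: keys(:)
    integer, intent(inout) :: n
    integer, intent(in)    :: batch(:)
    integer, intent(out)   :: row_ids(:)
    logical, intent(out)   :: ok
    integer :: k
    row_ids = 0
    ok = size(row_ids) >= size(batch) .and. n + size(batch) <= size(keys)
    if (ok) ok = batch_is_unique(keys(1:n), batch)
    if (.not. ok) return
    do k = 1, size(batch)
      n = n + 1
      keys(n) = batch(k)
      row_ids(k) = n
    end do
  end subroutine insert_batch

end module batch_sketch
```

Separating the validation pass from the write pass is what lets a duplicate in the last row of a batch leave the first rows unwritten.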
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(in) | | | buf | Row buffer |
| type(column_t) | intent(in) | | | col | Source |

Return value: Decoded string
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
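To make the renumbering concrete, here is a minimal in-memory sketch. It assumes nothing about the on-disk format: the alive mask and the old_id array are hypothetical stand-ins for the status bytes and row ids of a five-row table.

```fortran
program compact_demo
  implicit none
  ! Hypothetical five-row table: rows 2 and 5 are tombstoned.
  logical :: alive(5) = [.true., .false., .true., .true., .false.]
  integer :: old_id(5) = [1, 2, 3, 4, 5]
  integer, allocatable :: survivors(:)
  integer :: new_id

  ! Keep only live rows, in their original order.
  survivors = pack(old_id, alive)

  ! Survivors are renumbered 1..live_count, so most old ids change.
  do new_id = 1, size(survivors)
    print '(a,i0,a,i0)', 'old row_id ', survivors(new_id), &
                         ' -> new row_id ', new_id
  end do
end program compact_demo
```

In this sketch old rows 3 and 4 become rows 2 and 3, so a caller that kept row_id 3 across the call would now be addressing a different record.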
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is
rewritten immediately afterwards; a hard crash strictly between
those two steps is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
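The validate-everything-then-write rule can be sketched on a plain integer key list. This is an illustration only: insert_batch, keys and batch are hypothetical names, the key list stands in for a unique index, and the NULL-member and NaN checks are left out.

```fortran
program batch_demo
  implicit none
  integer, allocatable :: keys(:)    ! stand-in for an existing unique index
  integer :: row_ids(3)
  logical :: ok

  keys = [10, 20]
  call insert_batch(keys, [30, 40, 30], row_ids, ok)  ! duplicate in the batch
  print *, ok, size(keys), row_ids
  call insert_batch(keys, [30, 40, 50], row_ids, ok)  ! clean batch
  print *, ok, size(keys), row_ids
contains
  subroutine insert_batch(keys, batch, row_ids, ok)
    integer, allocatable, intent(inout) :: keys(:)
    integer, intent(in) :: batch(:)
    integer, intent(out) :: row_ids(:)
    logical, intent(out) :: ok
    integer :: k, base

    row_ids = 0
    ok = .false.
    ! Phase 1: validate every row before touching the stored keys.
    do k = 1, size(batch)
      if (any(keys == batch(k))) return            ! clashes with stored key
      if (any(batch(1:k-1) == batch(k))) return    ! clashes inside the batch
    end do
    ! Phase 2: nothing can fail now, so write the whole batch.
    base = size(keys)
    keys = [keys, batch]
    do k = 1, size(batch)
      row_ids(k) = base + k
    end do
    ok = .true.
  end subroutine insert_batch
end program batch_demo
```

The first call leaves the key list at two entries with every row id zero; only the second call grows it.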
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
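The difference between bit-for-bit equality and a tolerance band can be checked with standard intrinsics alone. The sketch below is independent of the library; bits is a hypothetical helper that reinterprets a real64 as an int64, and the tolerance value is an arbitrary choice for illustration.

```fortran
program real64_exact_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  real(real64) :: stored, same, recomputed

  stored = 0.3_real64
  same = stored                          ! the value that was "inserted"
  recomputed = 0.1_real64 + 0.2_real64   ! a differently computed 0.3

  print *, 'same value matches:      ', bits(same) == bits(stored)
  print *, 'recomputed value matches:', bits(recomputed) == bits(stored)
  ! A tolerance match is a band around the stored value, i.e. a range query.
  print *, 'inside a tolerance band: ', &
           abs(recomputed - stored) <= 1.0e-12_real64
contains
  pure function bits(x) result(b)
    real(real64), intent(in) :: x
    integer(int64) :: b
    b = transfer(x, b)                   ! raw bit pattern, no rounding
  end function bits
end program real64_exact_demo
```

On IEEE 754 hardware the first and third comparisons hold and the second does not, which is the behaviour described for the equality lookup.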
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
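A NULL bitmap of this kind can be modelled with the bit intrinsics. The sketch assumes a layout that the text does not specify: one bit per column, numbered from zero, packed eight to a byte. The names mark_null, clear_null and is_null are hypothetical and are not the library's row helpers.

```fortran
program null_bitmap_demo
  use, intrinsic :: iso_fortran_env, only: int8
  implicit none
  integer(int8) :: bitmap(2)     ! assumed layout: 16 columns, 1 bit each

  bitmap = 0_int8
  call mark_null(bitmap, 9)
  print *, is_null(bitmap, 9), is_null(bitmap, 3)
  call clear_null(bitmap, 9)     ! un-NULL without writing a value
  print *, is_null(bitmap, 9)
contains
  subroutine mark_null(bm, null_bit)
    integer(int8), intent(inout) :: bm(:)
    integer, intent(in) :: null_bit     ! 0-based bit number (assumption)
    integer :: ib
    ib = null_bit / 8 + 1               ! which byte holds the bit
    bm(ib) = ibset(bm(ib), mod(null_bit, 8))
  end subroutine mark_null

  subroutine clear_null(bm, null_bit)
    integer(int8), intent(inout) :: bm(:)
    integer, intent(in) :: null_bit
    integer :: ib
    ib = null_bit / 8 + 1
    bm(ib) = ibclr(bm(ib), mod(null_bit, 8))
  end subroutine clear_null

  logical function is_null(bm, null_bit)
    integer(int8), intent(in) :: bm(:)
    integer, intent(in) :: null_bit
    is_null = btest(bm(null_bit / 8 + 1), mod(null_bit, 8))
  end function is_null
end program null_bitmap_demo
```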
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
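The NUL-padded slot convention is easy to model on a fixed-length string. In the sketch below put_char and get_char are hypothetical stand-ins for the DT_CHAR helpers, and the column width of 8 is an arbitrary assumption.

```fortran
program char_slot_demo
  implicit none
  integer, parameter :: width = 8        ! hypothetical column width
  character(len=width) :: slot

  call put_char(slot, 'abc')
  print '(a,i0)', 'decoded length: ', len(get_char(slot))
  call put_char(slot, 'abcdefghijkl')    ! longer than the slot: truncated
  print '(a)', get_char(slot)
contains
  subroutine put_char(slot, val)
    character(len=*), intent(out) :: slot
    character(len=*), intent(in) :: val
    integer :: n
    n = min(len(val), len(slot))         ! truncate to the column width
    slot = repeat(achar(0), len(slot))   ! NUL-pad the whole slot
    slot(1:n) = val(1:n)
  end subroutine put_char

  function get_char(slot) result(val)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: val
    integer :: n
    n = index(slot, achar(0))            ! first NUL, or 0 if the slot is full
    if (n == 0) then
      val = slot
    else
      val = slot(1:n-1)
    end if
  end function get_char
end program char_slot_demo
```

Padding with NUL rather than blanks means trailing spaces in a stored value survive the round trip, at the cost that a value cannot itself contain a NUL.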
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
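The two undo record kinds and the reverse replay can be sketched against an in-memory byte string standing in for a base file. Everything here is hypothetical (undo_rec_t, REC_REGION, REC_EXTEND, rollback); it omits the journal file, the checksum and fsync, and shows only why REGION restores bytes while EXTEND truncates.

```fortran
module undo_demo
  implicit none
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2
  type :: undo_rec_t
    integer :: kind = 0
    integer :: offset = 0                       ! 1-based start (REGION)
    character(len=:), allocatable :: old_bytes  ! pre-image (REGION)
    integer :: old_len = 0                      ! original length (EXTEND)
  end type undo_rec_t
contains
  subroutine rollback(file, undo)
    character(len=:), allocatable, intent(inout) :: file
    type(undo_rec_t), intent(in) :: undo(:)
    character(len=:), allocatable :: tmp
    integer :: i, last

    do i = size(undo), 1, -1                    ! newest record first
      select case (undo(i)%kind)
      case (REC_REGION)                         ! write the pre-image back
        last = undo(i)%offset + len(undo(i)%old_bytes) - 1
        file(undo(i)%offset:last) = undo(i)%old_bytes
      case (REC_EXTEND)                         ! truncate appended bytes
        tmp = file(1:undo(i)%old_len)
        call move_alloc(tmp, file)
      end select
    end do
  end subroutine rollback
end module undo_demo

program undo_demo_main
  use undo_demo
  implicit none
  character(len=:), allocatable :: file
  type(undo_rec_t) :: undo(2)

  file = 'AAAABBBB'
  ! Capture undo records before performing either write.
  undo(1)%kind = REC_REGION
  undo(1)%offset = 3
  undo(1)%old_bytes = file(3:4)
  undo(2)%kind = REC_EXTEND
  undo(2)%old_len = len(file)

  file(3:4) = 'xx'                              ! in-place overwrite
  file = file // 'CCCC'                         ! append
  call rollback(file, undo)
  print '(a)', file
end program undo_demo_main
```

After the rollback the string is back to its original eight bytes. An EXTEND record needs no payload because the appended bytes have no pre-image to restore.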
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(in) | | | db | Database handle |

A hot journal is present
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(out) | | | db | Database handle (overwritten) |
| character(len=*) | intent(in) | | | dir | Database directory name |
| integer | intent(out) | optional | | stat | |
| character(len=*) | intent(inout) | optional | | errmsg | Human-readable failure detail |
| logical | intent(in) | optional | | readonly | Open read-only (default |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
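The difference between bit-for-bit equality and tolerance matching can be shown in a few lines. This is a standalone sketch, not the library's lookup code; `same_bits` and `in_band` are hypothetical helper names.

```fortran
module real_key_demo
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
contains
  ! Bit-for-bit equality: reinterpret both values as 64-bit
  ! integers and compare the patterns. No epsilon is involved.
  pure logical function same_bits(a, b)
    real(real64), intent(in) :: a, b
    same_bits = transfer(a, 0_int64) == transfer(b, 0_int64)
  end function same_bits

  ! Tolerance matching expressed as an inclusive band [lo, hi],
  ! which is the shape a range query takes.
  pure logical function in_band(x, lo, hi)
    real(real64), intent(in) :: x, lo, hi
    in_band = (x >= lo) .and. (x <= hi)
  end function in_band
end module real_key_demo
```

With IEEE double precision, `same_bits(0.3_real64, 0.1_real64 + 0.2_real64)` is false, while `in_band` with a small band around 0.3 accepts the recomputed value. Note that a pattern comparison also treats +0.0 and -0.0 as different; whether the library does so is not stated in the text above.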
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
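The pull protocol (call next until `ok` is false) can be illustrated with a toy cursor over a sorted integer array. Nothing here is the library's API: `cursor_t`, `open_range` and `cursor_next` below are simplified stand-ins with assumed signatures.

```fortran
module cursor_demo
  implicit none
  type :: cursor_t
    integer :: pos = 1   ! next position to examine in the sorted keys
    integer :: hi = 0    ! inclusive upper bound of the band
  end type cursor_t
contains
  ! Position the cursor at the first key >= lo (keys ascending).
  pure function open_range(keys, lo, hi) result(cur)
    integer, intent(in) :: keys(:), lo, hi
    type(cursor_t) :: cur
    cur%hi = hi
    cur%pos = 1
    do while (cur%pos <= size(keys))
      if (keys(cur%pos) >= lo) exit
      cur%pos = cur%pos + 1
    end do
  end function open_range

  ! Yield the key at the cursor and advance past it. ok = .false.
  ! signals exhaustion (end of keys, or upper bound passed).
  subroutine cursor_next(cur, keys, key, ok)
    type(cursor_t), intent(inout) :: cur
    integer, intent(in) :: keys(:)
    integer, intent(out) :: key
    logical, intent(out) :: ok
    key = 0
    ok = .false.
    if (cur%pos > size(keys)) return
    if (keys(cur%pos) > cur%hi) return
    key = keys(cur%pos)
    cur%pos = cur%pos + 1
    ok = .true.
  end subroutine cursor_next
end module cursor_demo
```

A caller loops with `do`, calls `cursor_next`, and exits when `ok` is false; exhaustion is a normal outcome rather than an error, matching the `stat == SQR_OK` convention described above.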
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
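A NULL bitmap of this kind can be sketched with the standard bit intrinsics. The layout below (one bit per column, 0-based bit numbers packed into int8 bytes) is an assumption for illustration, as are the procedure names; the real row format may differ.

```fortran
module null_bitmap_demo
  use, intrinsic :: iso_fortran_env, only: int8
  implicit none
contains
  ! Bit numbers are 0-based; bit b lives in byte b/8 + 1.
  pure subroutine set_null(bitmap, bit)
    integer(int8), intent(inout) :: bitmap(:)
    integer, intent(in) :: bit
    bitmap(bit/8 + 1) = ibset(bitmap(bit/8 + 1), mod(bit, 8))
  end subroutine set_null

  pure subroutine clear_null(bitmap, bit)
    integer(int8), intent(inout) :: bitmap(:)
    integer, intent(in) :: bit
    bitmap(bit/8 + 1) = ibclr(bitmap(bit/8 + 1), mod(bit, 8))
  end subroutine clear_null

  pure logical function is_null(bitmap, bit)
    integer(int8), intent(in) :: bitmap(:)
    integer, intent(in) :: bit
    is_null = btest(bitmap(bit/8 + 1), mod(bit, 8))
  end function is_null

  ! A row enters an index only if none of the index's member
  ! columns is NULL.
  pure logical function indexable(bitmap, member_bits)
    integer(int8), intent(in) :: bitmap(:)
    integer, intent(in) :: member_bits(:)
    integer :: k
    indexable = .true.
    do k = 1, size(member_bits)
      if (is_null(bitmap, member_bits(k))) indexable = .false.
    end do
  end function indexable
end module null_bitmap_demo
```

The `indexable` check mirrors the rule that a row with any NULL index member is left out of that index entirely.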
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
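The NUL-padding convention for DT_CHAR slots (also applied to lookup keys and range bounds) can be sketched as a pair of helpers. `pack_char` and `unpack_char` are hypothetical names, not the library's row_set/row_get procedures.

```fortran
module char_slot_demo
  implicit none
contains
  ! Store: copy s into a slot of the given width, truncating if it
  ! is too long and padding the remainder with NUL (achar(0)).
  pure function pack_char(s, width) result(slot)
    character(len=*), intent(in) :: s
    integer, intent(in) :: width
    character(len=width) :: slot
    integer :: n
    n = min(len(s), width)
    slot = repeat(achar(0), width)
    slot(1:n) = s(1:n)
  end function pack_char

  ! Read: return the bytes up to (not including) the first NUL,
  ! or the whole slot if it contains none.
  pure function unpack_char(slot) result(s)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: s
    integer :: k
    k = index(slot, achar(0))
    if (k == 0) then
      s = slot
    else
      s = slot(1:k - 1)
    end if
  end function unpack_char
end module char_slot_demo
```

Padding a search key the same way before comparison means a short key such as 'abc' matches a stored 'abc' byte for byte in a wider column.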
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
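The REGION/EXTEND undo scheme can be sketched in memory, with a character string standing in for a base file. This is a toy under stated assumptions: the type and procedure names, the 64-byte region cap and the single-file model are all invented for illustration, and fsync, checksums and idempotency are left out.

```fortran
module undo_demo
  implicit none
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2
  integer, parameter :: MAX_REGION = 64       ! toy cap on region size
  type :: undo_rec_t
    integer :: kind = 0
    integer :: offset = 0                     ! REGION: 1-based start
    integer :: nbytes = 0                     ! REGION: bytes captured
    character(len=MAX_REGION) :: old_bytes = ''
    integer :: old_len = 0                    ! EXTEND: original length
  end type undo_rec_t
  type :: journal_t
    type(undo_rec_t), allocatable :: recs(:)
  end type journal_t
contains
  subroutine begin_txn(j)
    type(journal_t), intent(out) :: j
    allocate(j%recs(0))                       ! empty undo set
  end subroutine begin_txn

  ! Capture the bytes about to be overwritten (length <= MAX_REGION).
  subroutine log_region(j, file, offset, length)
    type(journal_t), intent(inout) :: j
    character(len=*), intent(in) :: file
    integer, intent(in) :: offset, length
    type(undo_rec_t) :: r
    r%kind = REC_REGION
    r%offset = offset
    r%nbytes = length
    r%old_bytes(1:length) = file(offset:offset + length - 1)
    j%recs = [j%recs, r]
  end subroutine log_region

  ! Capture only the length; rollback truncates back to it.
  subroutine log_extend(j, file)
    type(journal_t), intent(inout) :: j
    character(len=*), intent(in) :: file
    type(undo_rec_t) :: r
    r%kind = REC_EXTEND
    r%old_len = len(file)
    j%recs = [j%recs, r]
  end subroutine log_extend

  ! Apply the undo records newest-first, then empty the undo set.
  subroutine rollback(j, file)
    type(journal_t), intent(inout) :: j
    character(len=:), allocatable, intent(inout) :: file
    character(len=:), allocatable :: tmp
    type(undo_rec_t) :: r
    integer :: k
    do k = size(j%recs), 1, -1
      r = j%recs(k)
      if (r%kind == REC_REGION) then
        file(r%offset:r%offset + r%nbytes - 1) = r%old_bytes(1:r%nbytes)
      else if (r%kind == REC_EXTEND) then
        tmp = file(1:r%old_len)
        file = tmp
      end if
    end do
    deallocate(j%recs)
    allocate(j%recs(0))
  end subroutine rollback
end module undo_demo
```

Replaying in reverse matters when records interact: a region captured inside bytes that were appended earlier in the same transaction is restored first and then truncated away, so the final state is the pre-transaction one in either order of logging.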
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. Reports
.false. for an absent, voided or unreadable journal.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| integer | intent(out) | optional | | stat | First flush failure, else |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
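Why a held row_id goes stale across compaction can be seen from a toy renumbering. The sketch assumes survivors keep their relative order, which the text above does not state; `renumber` is a hypothetical helper, not part of the library.

```fortran
module compact_demo
  implicit none
contains
  ! alive(i) is .true. when old row i is live. new_id(i) is the id
  ! that row carries after compaction, or 0 if it was tombstoned
  ! and therefore dropped.
  pure function renumber(alive) result(new_id)
    logical, intent(in) :: alive(:)
    integer :: new_id(size(alive))
    integer :: i, next
    next = 0
    new_id = 0
    do i = 1, size(alive)
      if (alive(i)) then
        next = next + 1
        new_id(i) = next
      end if
    end do
  end function renumber
end module compact_demo
```

With rows 1, 3 and 4 live and row 2 tombstoned, old ids 3 and 4 become 2 and 3, so a caller still holding id 3 would now reach a different row. Looking rows up by natural key avoids the problem.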
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| integer | intent(out) | optional | | stat | |

| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | name | New table name |
| type(column_t) | intent(in) | | | cols(:) | Column definitions (name/dtype/csize) |
| integer | intent(out) | optional | | stat | |
| character(len=*) | intent(inout) | optional | | errmsg | Human-readable failure detail |

Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. The optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed flush at close is where recent
data is lost. The handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
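The renumbering step is why a row_id held across a compaction goes stale. A minimal self-contained sketch, assuming a hypothetical in-memory table in which a logical array stands in for the per-row status bytes; none of these names belong to the library, and blob copying and index rebuilds are left out:

```fortran
program compact_renumber_demo
  implicit none
  ! Hypothetical status flags for row ids 1..6: .false. marks a tombstone.
  logical, parameter :: alive(6) = [.true., .false., .true., .true., .false., .true.]
  integer :: new_id(size(alive))
  integer :: old_id, live_count

  ! Survivors are renumbered 1..live_count in their original order;
  ! tombstoned rows receive no new id (0 in this sketch).
  live_count = 0
  new_id = 0
  do old_id = 1, size(alive)
    if (alive(old_id)) then
      live_count = live_count + 1
      new_id(old_id) = live_count
    end if
  end do

  do old_id = 1, size(alive)
    print '(a,i0,a,i0)', 'old row_id ', old_id, ' -> new row_id ', new_id(old_id)
  end do
  print '(a,i0)', 'live_count = ', live_count
end program compact_renumber_demo
```

Any surviving row that sat after a tombstone gets a smaller id, so a caller that needs a stable handle should re-resolve the row through a natural key after compacting.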
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is
rewritten immediately afterwards. A hard crash strictly between
those two steps is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
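The validate-everything-then-write rule can be illustrated with a small self-contained sketch. It assumes a hypothetical table reduced to one unique integer key per row, held in an allocatable array; insert_batch, store and the id assignment are illustrative names, not the library's procedures, and only the uniqueness checks are modelled (no NULL-member skip or NaN reject):

```fortran
program batch_insert_demo
  implicit none
  integer, allocatable :: store(:)     ! hypothetical unique keys already in the table
  integer :: ids(3)
  logical :: ok

  store = [10, 20, 30]

  call insert_batch([40, 50, 40], ids, ok)     ! duplicate inside the batch
  print '(a,l1,a,3(1x,i0),a,i0)', 'ok=', ok, ' ids=', ids, ' rows=', size(store)

  call insert_batch([40, 50, 60], ids, ok)     ! clean batch
  print '(a,l1,a,3(1x,i0),a,i0)', 'ok=', ok, ' ids=', ids, ' rows=', size(store)

contains

  subroutine insert_batch(keys, row_ids, ok)
    integer, intent(in)  :: keys(:)
    integer, intent(out) :: row_ids(:)         ! must hold at least size(keys) entries
    logical, intent(out) :: ok
    integer :: k, base

    row_ids = 0
    ok = .false.
    ! Pass 1: validate every row before touching the store.
    do k = 1, size(keys)
      if (any(store == keys(k))) return        ! clashes with an existing key
      if (any(keys(1:k-1) == keys(k))) return  ! clashes within the batch
    end do
    ! Pass 2: nothing can fail now, so write all rows and assign ids.
    base = size(store)
    store = [store, keys]
    row_ids(1:size(keys)) = [(base + k, k = 1, size(keys))]
    ok = .true.
  end subroutine insert_batch

end program batch_insert_demo
```

Because the first pass returns before any write, a rejected batch leaves the store untouched and every returned id at 0.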
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
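To see what bit-for-bit equality means in practice, here is a minimal self-contained sketch. bits_equal is a hypothetical helper written for this illustration, not a library procedure, and the tolerance shown is an arbitrary example value:

```fortran
program real64_bit_equality
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  real(real64) :: stored, same, recomputed

  stored     = 0.3_real64
  same       = stored                      ! the very value that was inserted
  recomputed = 0.1_real64 + 0.2_real64     ! a differently computed "0.3"

  print '(a,l1)', 'same value, bits match:       ', bits_equal(stored, same)
  print '(a,l1)', 'recomputed value, bits match: ', bits_equal(stored, recomputed)
  ! Tolerance matching is a band around the target, i.e. a range query.
  print '(a,l1)', 'recomputed within 1e-12:      ', &
        abs(recomputed - stored) <= 1.0e-12_real64

contains

  ! Compare two real64 values by their raw 64-bit patterns (no epsilon).
  pure logical function bits_equal(a, b)
    real(real64), intent(in) :: a, b
    bits_equal = transfer(a, 0_int64) == transfer(b, 0_int64)
  end function bits_equal

end program real64_bit_equality
```

On IEEE 754 binary64 hardware the recomputed sum differs from the stored 0.3 in its last bit, so only the tolerance test accepts it.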
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
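The pull protocol (call next until ok is .false.) can be sketched without the library. The example below assumes a hypothetical cursor over a sorted integer array standing in for an index; cursor_t, open_range and cursor_next are illustrative names only:

```fortran
program cursor_pull_demo
  implicit none

  type :: cursor_t
    integer :: pos = 1          ! next position to yield
    integer :: hi  = huge(0)    ! inclusive upper bound of the band
  end type cursor_t

  integer, parameter :: keys(6) = [3, 7, 7, 12, 20, 31]   ! sorted index keys
  type(cursor_t) :: cur
  integer :: key
  logical :: ok

  cur = open_range(7, 20)       ! inclusive [lo,hi] band
  do
    call cursor_next(cur, key, ok)
    if (.not. ok) exit          ! exhausted: not an error, just the end
    print '(i0)', key
  end do

contains

  ! Position the cursor on the first key >= lo.
  function open_range(lo, hi) result(c)
    integer, intent(in) :: lo, hi
    type(cursor_t) :: c
    c%hi  = hi
    c%pos = 1
    do while (c%pos <= size(keys))
      if (keys(c%pos) >= lo) exit
      c%pos = c%pos + 1
    end do
  end function open_range

  ! Yield the key under the cursor and advance past it.
  subroutine cursor_next(c, key, ok)
    type(cursor_t), intent(inout) :: c
    integer, intent(out) :: key
    logical, intent(out) :: ok
    ok  = .false.
    key = 0
    if (c%pos > size(keys)) return      ! ran off the end of the index
    if (keys(c%pos) > c%hi) return      ! passed the band's upper bound
    key   = keys(c%pos)
    c%pos = c%pos + 1
    ok    = .true.
  end subroutine cursor_next

end program cursor_pull_demo
```

Exhaustion is signalled through ok rather than through an error status, which keeps the caller's loop to a single exit test.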
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
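A NULL bitmap of this kind can be sketched with the standard bit intrinsics. The layout below (one bit per column, eight columns per byte, 1-based column numbers) is an assumption for illustration, as are all procedure names; the library's actual row layout is not shown in this document:

```fortran
program null_bitmap_demo
  use, intrinsic :: iso_fortran_env, only: int8
  implicit none
  integer(int8) :: bitmap(2)            ! 16 columns, one bit each (assumed layout)

  bitmap = 0_int8
  call set_null(bitmap, 3)
  print '(a,l1)', 'col 3 NULL:                ', is_null(bitmap, 3)
  print '(a,l1)', 'row is in index on (1,3):  ', in_index(bitmap, [1, 3])
  call clear_null(bitmap, 3)
  print '(a,l1)', 'row is in index on (1,3):  ', in_index(bitmap, [1, 3])

contains

  subroutine set_null(bm, col)
    integer(int8), intent(inout) :: bm(:)
    integer, intent(in) :: col          ! 1-based column number
    integer :: b
    b = (col - 1) / 8 + 1
    bm(b) = ibset(bm(b), mod(col - 1, 8))
  end subroutine set_null

  subroutine clear_null(bm, col)
    integer(int8), intent(inout) :: bm(:)
    integer, intent(in) :: col
    integer :: b
    b = (col - 1) / 8 + 1
    bm(b) = ibclr(bm(b), mod(col - 1, 8))
  end subroutine clear_null

  pure logical function is_null(bm, col)
    integer(int8), intent(in) :: bm(:)
    integer, intent(in) :: col
    is_null = btest(bm((col - 1) / 8 + 1), mod(col - 1, 8))
  end function is_null

  ! A row belongs to an index only if none of the member columns is NULL.
  pure logical function in_index(bm, members)
    integer(int8), intent(in) :: bm(:)
    integer, intent(in) :: members(:)
    integer :: k
    in_index = .true.
    do k = 1, size(members)
      if (is_null(bm, members(k))) in_index = .false.
    end do
  end function in_index

end program null_bitmap_demo
```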
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
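Packing a typed value into a byte-shaped row buffer is a pure binary copy, which the transfer intrinsic expresses directly. A minimal sketch under assumed offsets (an int32 at byte 1, a real64 at byte 5); put_int, get_int, put_real and get_real are hypothetical stand-ins, not the library's row_set_* / row_get_* helpers:

```fortran
program row_pack_demo
  use, intrinsic :: iso_fortran_env, only: int8, int32, real64
  implicit none
  integer(int8) :: buf(12)      ! hypothetical row: int32 at byte 1, real64 at byte 5

  buf = 0_int8                  ! start from a zeroed row buffer
  call put_int(buf, 1, 42_int32)
  call put_real(buf, 5, 2.5_real64)
  print '(i0,1x,f0.2)', get_int(buf, 1), get_real(buf, 5)

contains

  ! Copy the 4 raw bytes of v into the buffer at 1-based offset off.
  subroutine put_int(b, off, v)
    integer(int8), intent(inout) :: b(:)
    integer, intent(in) :: off
    integer(int32), intent(in) :: v
    b(off:off+3) = transfer(v, b, 4)
  end subroutine put_int

  function get_int(b, off) result(v)
    integer(int8), intent(in) :: b(:)
    integer, intent(in) :: off
    integer(int32) :: v
    v = transfer(b(off:off+3), 0_int32)
  end function get_int

  ! Copy the 8 raw bytes of v into the buffer; no decimal round-trip occurs.
  subroutine put_real(b, off, v)
    integer(int8), intent(inout) :: b(:)
    integer, intent(in) :: off
    real(real64), intent(in) :: v
    b(off:off+7) = transfer(v, b, 8)
  end subroutine put_real

  function get_real(b, off) result(v)
    integer(int8), intent(in) :: b(:)
    integer, intent(in) :: off
    real(real64) :: v
    v = transfer(b(off:off+7), 0.0_real64)
  end function get_real

end program row_pack_demo
```

Because the bytes are copied unchanged in both directions, a value read back is bit-identical to the one written.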
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
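The store and read rules for a fixed-width character slot can be sketched as follows. The width of 8 and the names slot_store and slot_read are assumptions made for this illustration, not the library's row_set_char / row_get_char:

```fortran
program char_slot_demo
  implicit none
  integer, parameter :: width = 8          ! hypothetical column width (csize)
  character(len=width) :: slot

  call slot_store(slot, 'abc')
  print '(a,i0)',  'length read back: ', len(slot_read(slot))
  print '(a,a,a)', 'value: [', slot_read(slot), ']'

  call slot_store(slot, 'much-too-long')   ! longer than the slot: truncated
  print '(a,a,a)', 'value: [', slot_read(slot), ']'

contains

  ! Copy s into the slot, truncating to the width and padding with NUL bytes.
  subroutine slot_store(slot, s)
    character(len=*), intent(out) :: slot
    character(len=*), intent(in)  :: s
    integer :: n
    n = min(len(s), len(slot))
    slot = repeat(achar(0), len(slot))
    slot(1:n) = s(1:n)
  end subroutine slot_store

  ! Return the bytes up to, but not including, the first NUL.
  function slot_read(slot) result(s)
    character(len=*), intent(in)  :: slot
    character(len=:), allocatable :: s
    integer :: k
    k = index(slot, achar(0))
    if (k == 0) then
      s = slot                 ! no NUL: the value fills the whole width
    else
      s = slot(1:k-1)
    end if
  end function slot_read

end program char_slot_demo
```

NUL padding, unlike blank padding, lets trailing spaces in a stored value survive the round trip, and it gives equality and range keys a fixed width to compare.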
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
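The two undo record kinds and the rollback step can be modelled in memory. The sketch below assumes a character string standing in for one base file and a small fixed-size undo set; undo_rec_t, log_region, log_extend and rollback are hypothetical names, the reverse replay order is an assumption borrowed from the recovery description, and arming, checksums and fsync are left out entirely:

```fortran
program undo_journal_demo
  implicit none
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2

  type :: undo_rec_t
    integer :: kind     = 0
    integer :: offset   = 0                  ! 1-based start of a REGION
    integer :: orig_len = 0                  ! file length before an EXTEND
    character(len=:), allocatable :: bytes   ! pre-image of a REGION
  end type undo_rec_t

  type(undo_rec_t) :: undo(8)
  integer :: n_undo
  character(len=:), allocatable :: file      ! stands in for one base file

  n_undo = 0
  file = 'AAAABBBBCCCC'

  ! Capture every pre-image first, then perform the writes.
  call log_region(5, 4)
  call log_region(5, 4)                      ! repeated capture is a no-op
  call log_extend()
  file(5:8) = 'xxxx'                         ! in-place overwrite
  file = file // 'DDDD'                      ! append
  print '(a)', 'after writes:   ' // file

  call rollback()
  print '(a)', 'after rollback: ' // file
  if (file /= 'AAAABBBBCCCC') error stop 'rollback did not restore the file'

contains

  subroutine log_region(offset, length)
    integer, intent(in) :: offset, length
    integer :: i
    do i = 1, n_undo                         ! idempotent per (offset, length)
      if (undo(i)%kind == REC_REGION) then
        if (undo(i)%offset == offset .and. len(undo(i)%bytes) == length) return
      end if
    end do
    if (n_undo == size(undo)) error stop 'undo set full'
    n_undo = n_undo + 1
    undo(n_undo)%kind   = REC_REGION
    undo(n_undo)%offset = offset
    undo(n_undo)%bytes  = file(offset:offset + length - 1)
  end subroutine log_region

  subroutine log_extend()
    integer :: i
    do i = 1, n_undo                         ! idempotent per file
      if (undo(i)%kind == REC_EXTEND) return
    end do
    if (n_undo == size(undo)) error stop 'undo set full'
    n_undo = n_undo + 1
    undo(n_undo)%kind     = REC_EXTEND
    undo(n_undo)%orig_len = len(file)
  end subroutine log_extend

  ! Undo in reverse: truncate appended bytes, write pre-images back.
  subroutine rollback()
    integer :: i
    do i = n_undo, 1, -1
      select case (undo(i)%kind)
      case (REC_REGION)
        file(undo(i)%offset:undo(i)%offset + len(undo(i)%bytes) - 1) = undo(i)%bytes
      case (REC_EXTEND)
        file = file(1:undo(i)%orig_len)
      end select
    end do
    n_undo = 0
  end subroutine rollback

end program undo_journal_demo
```

An EXTEND record needs no payload because the appended bytes have no pre-image; restoring the original length is enough to discard them.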
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.

| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | name | Table to drop |
| integer | intent(out) | optional | | stat | |


| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Table to compact |
| integer | intent(out) | optional | | stat | |

Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
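The slot-stability rule can be modelled in a few lines. This is only a sketch under stated assumptions: the `ncols` array and the `slot_live` helper are hypothetical stand-ins for the index slot table, and only the "tombstone in place, append on create" behaviour is taken from the description above.

```fortran
program index_slot_demo
  implicit none
  ! Hypothetical slot table: ncols(s) is the member count of index slot s,
  ! and ncols(s) == 0 marks a tombstoned (dropped) slot.
  integer, allocatable :: ncols(:)
  integer :: s

  ncols = [1, 2, 1]          ! three live indices in slots 1..3
  ncols(2) = 0               ! drop the index in slot 2: tombstone, do not shift
  ncols = [ncols, 3]         ! a later create appends slot 4; slot 2 is not reused

  do s = 1, size(ncols)
    if (.not. slot_live(ncols, s)) cycle   ! walkers must skip dead slots
    print '(a, i0, a, i0)', 'live slot ', s, ' with ncols = ', ncols(s)
  end do

contains

  ! A slot is live while it still has member columns.
  pure logical function slot_live(n, s)
    integer, intent(in) :: n(:), s
    slot_live = n(s) > 0
  end function slot_live

end program index_slot_demo
```

Because slots 1 and 3 keep their numbers after the drop, any file name derived from the slot number stays valid for the surviving indices.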
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
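The all-or-nothing behaviour of the batch path follows a validate-then-write pattern, sketched below. The sketch is illustrative only: integer keys stand in for row buffers, `insert_batch` is a hypothetical helper rather than the library routine, and `ST_OK` / `ST_DUP` are placeholder values, not the real SQR_* constants.

```fortran
program batch_insert_demo
  implicit none
  ! Stand-ins for the library's status codes; the real values are not given here.
  integer, parameter :: ST_OK = 0, ST_DUP = 1
  integer, allocatable :: keys(:)   ! keys already held by a hypothetical unique index
  integer :: stat

  keys = [10, 20]
  call insert_batch(keys, [30, 40], stat)   ! clean batch: applied in full
  print '(a, i0, a, i0)', 'stat = ', stat, ', keys held = ', size(keys)
  call insert_batch(keys, [50, 20], stat)   ! 20 collides with an existing key
  print '(a, i0, a, i0)', 'stat = ', stat, ', keys held = ', size(keys)
  call insert_batch(keys, [60, 60], stat)   ! duplicate inside the batch itself
  print '(a, i0, a, i0)', 'stat = ', stat, ', keys held = ', size(keys)

contains

  subroutine insert_batch(index_keys, batch, stat)
    integer, allocatable, intent(inout) :: index_keys(:)
    integer, intent(in) :: batch(:)
    integer, intent(out) :: stat
    integer :: k

    ! Phase 1: validate every row before anything is written.
    do k = 1, size(batch)
      if (any(index_keys == batch(k)) .or. any(batch(1:k-1) == batch(k))) then
        stat = ST_DUP
        return                      ! whole batch rejected, index untouched
      end if
    end do

    ! Phase 2: all rows passed, so apply the batch in one step.
    index_keys = [index_keys, batch]
    stat = ST_OK
  end subroutine insert_batch

end program batch_insert_demo
```

Checking each row against both the existing keys and the earlier rows of the same batch is what lets a single bad row reject the batch without leaving a partial insert behind.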
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
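A short standalone program makes the bit-for-bit rule concrete. It uses only standard intrinsics; `same_bits` and `in_band` are hypothetical helpers written for this illustration, not library procedures, and the behaviour shown assumes IEEE 754 binary64 arithmetic.

```fortran
program real64_bit_equality
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  real(real64) :: stored, recomputed

  stored = 0.3_real64
  recomputed = 0.1_real64 + 0.2_real64

  ! The value that was stored matches itself bit for bit.
  print *, 'stored vs 0.3       :', same_bits(stored, 0.3_real64)
  ! A differently computed "0.3" carries a different bit pattern.
  print *, 'stored vs 0.1 + 0.2 :', same_bits(stored, recomputed)
  ! Tolerance matching is a band [lo, hi], that is, a range query.
  print *, 'within 1e-9 band    :', &
           in_band(recomputed, stored - 1.0e-9_real64, stored + 1.0e-9_real64)

contains

  ! Compare the raw 64-bit patterns, with no epsilon.
  pure logical function same_bits(a, b)
    real(real64), intent(in) :: a, b
    same_bits = transfer(a, 0_int64) == transfer(b, 0_int64)
  end function same_bits

  ! Inclusive band test, the shape of a range lookup.
  pure logical function in_band(x, lo, hi)
    real(real64), intent(in) :: x, lo, hi
    in_band = x >= lo .and. x <= hi
  end function in_band

end program real64_bit_equality
```

A caller that needs "approximately 0.3" would therefore express it as a band and use the range lookup instead of the equality lookup.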
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
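The pull protocol (call until ok is .false., treating exhaustion as a normal outcome and not an error) can be sketched as follows. Everything here is hypothetical: `cursor_t`, `find_range` and `cursor_next` are simplified local stand-ins over a sorted integer array, and the linear positioning replaces what a B+-tree would do by descent.

```fortran
module cursor_sketch
  implicit none

  ! Hypothetical cursor over a sorted key array standing in for an index.
  type :: cursor_t
    integer :: pos = 1          ! next entry to examine
    integer :: hi = huge(0)     ! inclusive upper bound of the band
  end type cursor_t

contains

  ! Position a cursor at the first key >= lo, bounded above by hi.
  pure function find_range(keys, lo, hi) result(cur)
    integer, intent(in) :: keys(:), lo, hi
    type(cursor_t) :: cur
    cur%hi = hi
    cur%pos = 1
    do while (cur%pos <= size(keys))
      if (keys(cur%pos) >= lo) exit
      cur%pos = cur%pos + 1
    end do
  end function find_range

  ! Yield the next key and advance; ok = .false. signals exhaustion.
  subroutine cursor_next(cur, keys, key, ok)
    type(cursor_t), intent(inout) :: cur
    integer, intent(in) :: keys(:)
    integer, intent(out) :: key
    logical, intent(out) :: ok
    key = 0
    ok = .false.
    if (cur%pos > size(keys)) return       ! index exhausted
    if (keys(cur%pos) > cur%hi) return     ! band's upper bound passed
    key = keys(cur%pos)
    cur%pos = cur%pos + 1
    ok = .true.
  end subroutine cursor_next

end module cursor_sketch

program cursor_demo
  use cursor_sketch
  implicit none
  integer, parameter :: keys(5) = [10, 20, 30, 40, 50]
  type(cursor_t) :: cur
  integer :: key
  logical :: ok

  cur = find_range(keys, 20, 40)
  do
    call cursor_next(cur, keys, key, ok)
    if (.not. ok) exit
    print '(a, i0)', 'key = ', key
  end do
end program cursor_demo
```

With the inclusive band [20, 40] the loop visits 20, 30 and 40 and then stops cleanly on the first key above the upper bound.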
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
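The NULL bitmap and its effect on index membership can be illustrated with the standard bit intrinsics. The layout below is an assumption made for the sketch (one bit per column in a single 32-bit word, 0-based positions), and `indexable` is a hypothetical helper, not a library procedure.

```fortran
program null_bitmap_demo
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  ! Hypothetical layout: one bit per column, bit positions are 0-based.
  integer, parameter :: bit_name = 0, bit_age = 1, bit_city = 2
  integer(int32) :: bitmap

  bitmap = 0_int32
  bitmap = ibset(bitmap, bit_age)                  ! mark "age" NULL
  print *, 'age is NULL     :', btest(bitmap, bit_age)
  print *, 'in (name, age)  :', indexable(bitmap, [bit_name, bit_age])
  print *, 'in (name, city) :', indexable(bitmap, [bit_name, bit_city])

  bitmap = ibclr(bitmap, bit_age)                  ! un-NULL without writing a value
  print *, 'age is NULL     :', btest(bitmap, bit_age)
  print *, 'in (name, age)  :', indexable(bitmap, [bit_name, bit_age])

contains

  ! A row belongs in an index only if none of the member columns is NULL.
  pure logical function indexable(bits, members)
    integer(int32), intent(in) :: bits
    integer, intent(in) :: members(:)
    indexable = .not. any(btest(bits, members))
  end function indexable

end program null_bitmap_demo
```

The row drops out of the composite (name, age) index while "age" is NULL but stays in (name, city), matching the rule that any NULL member excludes the row from that index only.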
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
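The DT_CHAR slot convention (NUL-pad or truncate on write, stop at the first NUL on read) is easy to show in isolation. `pack_char` and `unpack_char` are hypothetical helpers written for this sketch, and the width of 8 is an arbitrary example value.

```fortran
program char_slot_demo
  implicit none
  integer, parameter :: width = 8      ! hypothetical DT_CHAR column width
  character(len=width) :: field

  field = pack_char('abc', width)
  print '(a, i0)', 'stored length : ', len(field)
  print '(3a)', 'read back     : [', unpack_char(field), ']'

  field = pack_char('much-too-long', width)
  print '(3a)', 'truncated     : [', unpack_char(field), ']'

contains

  ! NUL-pad (or truncate) s to exactly w bytes.
  pure function pack_char(s, w) result(slot)
    character(len=*), intent(in) :: s
    integer, intent(in) :: w
    character(len=w) :: slot
    integer :: n
    n = min(len(s), w)
    slot = repeat(achar(0), w)
    slot(1:n) = s(1:n)
  end function pack_char

  ! Return the bytes before the first NUL (the whole slot if there is none).
  pure function unpack_char(slot) result(s)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: s
    integer :: k
    k = index(slot, achar(0))
    if (k == 0) then
      s = slot
    else
      s = slot(1:k-1)
    end if
  end function unpack_char

end program char_slot_demo
```

Padding a lookup key the same way before comparing it is what makes a short key such as 'abc' compare equal to the stored full-width slot.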
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
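The two kinds of undo record and the reverse replay can be sketched against an in-memory stand-in for a base file. This is a minimal model under stated assumptions, not the library's journal: `undo_rec`, `journal`, `log_region`, `log_extend` and `rollback` are hypothetical names, the fixed capacities are arbitrary, and the on-disk header, checksum, fsync calls and per-region idempotence are all left out.

```fortran
module undo_sketch
  implicit none
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2
  integer, parameter :: MAX_RECS = 8, MAX_BYTES = 16   ! arbitrary sketch limits

  type :: undo_rec
    integer :: kind = 0
    integer :: offset = 0                    ! REGION: 1-based start of the overwrite
    integer :: nbytes = 0                    ! REGION: length of the pre-image
    character(len=MAX_BYTES) :: bytes = ''   ! REGION: the original bytes
    integer :: old_len = 0                   ! EXTEND: original file length
  end type undo_rec

  type :: journal
    integer :: n = 0
    type(undo_rec) :: recs(MAX_RECS)
  end type journal

contains

  ! Capture the pre-image of file(offset:offset+length-1) before it is overwritten.
  subroutine log_region(j, file, offset, length)
    type(journal), intent(inout) :: j
    character(len=*), intent(in) :: file
    integer, intent(in) :: offset, length
    j%n = j%n + 1
    j%recs(j%n)%kind = REC_REGION
    j%recs(j%n)%offset = offset
    j%recs(j%n)%nbytes = length
    j%recs(j%n)%bytes = file(offset:offset + length - 1)
  end subroutine log_region

  ! Capture only the current length before the file is appended to.
  subroutine log_extend(j, file)
    type(journal), intent(inout) :: j
    character(len=*), intent(in) :: file
    j%n = j%n + 1
    j%recs(j%n)%kind = REC_EXTEND
    j%recs(j%n)%old_len = len(file)
  end subroutine log_extend

  ! Replay the undo records newest-first, then empty the journal.
  subroutine rollback(j, file)
    type(journal), intent(inout) :: j
    character(len=:), allocatable, intent(inout) :: file
    type(undo_rec) :: r
    integer :: k
    do k = j%n, 1, -1
      r = j%recs(k)
      select case (r%kind)
      case (REC_REGION)
        file(r%offset:r%offset + r%nbytes - 1) = r%bytes(1:r%nbytes)
      case (REC_EXTEND)
        file = file(1:r%old_len)             ! truncate appended bytes away
      end select
    end do
    j%n = 0
  end subroutine rollback

end module undo_sketch

program undo_demo
  use undo_sketch
  implicit none
  type(journal) :: j
  character(len=:), allocatable :: file   ! in-memory stand-in for a base file

  file = 'AAAABBBB'
  call log_region(j, file, 3, 4)   ! about to overwrite bytes 3..6
  call log_extend(j, file)         ! about to append
  file(3:6) = 'xxxx'
  file = file // 'CCCC'
  print '(2a)', 'after writes  : ', file
  call rollback(j, file)
  print '(2a)', 'after rollback: ', file
end program undo_demo
```

Both records are captured before either write happens, mirroring the rule that the journal is armed before any base-file write. The same reverse replay serves a same-process rollback (from memory) and crash recovery (from the journal file).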
.true. if a hot (valid, un-committed) journal is present on disk.
This is a read-only probe that writes nothing, used by a read-only
db_open to refuse a database that needs recovery it cannot perform.
For an absent, voided or unreadable journal the result is .false.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| type(column_t) | intent(in) | | | col | New column (name/dtype/csize) |
| integer | intent(out) | optional | | stat | |
| character(len=*) | intent(inout) | optional | | errmsg | Human-readable failure detail |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed close is the point at which recent
data is lost; the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | col_name | Column to drop |
| integer | intent(out) | optional | | stat | |
| character(len=*) | intent(inout) | optional | | errmsg | Human-readable failure detail |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
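The NULL-bit rules above can be modelled with the standard bit intrinsics. This is a minimal sketch under stated assumptions: the bitmap is taken to be a single int32 word with one bit per column, a set bit meaning NULL, and the bit positions COL_A and COL_B are hypothetical. The library's real row layout is not shown in this section, and in_index is an illustrative helper, not a library procedure.

```fortran
program null_bitmap_sketch
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  integer, parameter :: COL_A = 0, COL_B = 2   ! hypothetical null_bit positions
  integer(int32) :: bitmap

  bitmap = 0_int32                  ! no bit set: every column carries a value
  bitmap = ibset(bitmap, COL_B)     ! mark column B NULL
  print *, 'B is NULL:         ', btest(bitmap, COL_B)
  print *, 'in index on (A,B): ', in_index(bitmap, [COL_A, COL_B])
  print *, 'in index on (A):   ', in_index(bitmap, [COL_A])

  bitmap = ibclr(bitmap, COL_B)     ! un-NULL column B without writing a value
  print *, 'in index on (A,B): ', in_index(bitmap, [COL_A, COL_B])

contains

  ! A row belongs to an index only if none of the index's member columns is NULL.
  pure logical function in_index(bitmap, member_bits)
    integer(int32), intent(in) :: bitmap
    integer, intent(in)        :: member_bits(:)
    integer :: k
    in_index = .true.
    do k = 1, size(member_bits)
      if (btest(bitmap, member_bits(k))) in_index = .false.
    end do
  end function in_index

end program null_bitmap_sketch
```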
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
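The store and read rules for a DT_CHAR slot can be sketched as two small routines. The names store_char and read_char and the width of 8 are assumptions made for illustration; they are not the library's row_set_char and row_get_char, whose signatures are not given in this section.

```fortran
program char_slot_sketch
  implicit none
  integer, parameter :: width = 8            ! hypothetical column width
  character(len=width) :: slot

  call store_char(slot, 'abc')
  print *, '[', read_char(slot), ']', len(read_char(slot))

  call store_char(slot, 'longer than eight')  ! truncated to the slot width
  print *, '[', read_char(slot), ']', len(read_char(slot))

contains

  ! Store val into slot: truncate to the slot width, pad the rest with NULs.
  subroutine store_char(slot, val)
    character(len=*), intent(out) :: slot
    character(len=*), intent(in)  :: val
    integer :: n
    n = min(len(val), len(slot))
    slot = repeat(achar(0), len(slot))
    slot(1:n) = val(1:n)
  end subroutine store_char

  ! Read the slot up to (not including) the first NUL; the whole slot if none.
  function read_char(slot) result(val)
    character(len=*), intent(in)  :: slot
    character(len=:), allocatable :: val
    integer :: n
    n = index(slot, achar(0))
    if (n == 0) then
      val = slot
    else
      val = slot(1:n-1)
    end if
  end function read_char

end program char_slot_sketch
```

The same padding is what makes an equality key comparable with stored bytes: a shorter key only matches once it has been NUL-padded to the full column width.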
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
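The undo mechanics described above (log, arm, write, then commit or roll back) can be modelled in memory. The sketch below is a toy under stated assumptions: the "file" is a character string, toy_journal_t, undo_rec_t, log_region, log_extend and rollback are names invented here, and there is no on-disk header, checksum or fsync. It only shows why REGION records restore bytes, EXTEND records truncate, and replay runs in reverse.

```fortran
module undo_sketch
  implicit none
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2

  type :: undo_rec_t
    integer :: kind    = 0
    integer :: offset  = 0                    ! 1-based start (REGION)
    integer :: old_len = 0                    ! original length (EXTEND)
    character(len=:), allocatable :: bytes    ! pre-image (REGION)
  end type undo_rec_t

  type :: toy_journal_t
    type(undo_rec_t) :: recs(16)
    integer :: n = 0
  end type toy_journal_t

contains

  ! Capture the original bytes of a region before it is overwritten.
  subroutine log_region(j, file, offset, length)
    type(toy_journal_t), intent(inout) :: j
    character(len=*), intent(in)       :: file
    integer, intent(in)                :: offset, length
    j%n = j%n + 1
    j%recs(j%n)%kind   = REC_REGION
    j%recs(j%n)%offset = offset
    j%recs(j%n)%bytes  = file(offset:offset+length-1)
  end subroutine log_region

  ! Capture only the original length before the file is grown.
  subroutine log_extend(j, file)
    type(toy_journal_t), intent(inout) :: j
    character(len=*), intent(in)       :: file
    j%n = j%n + 1
    j%recs(j%n)%kind    = REC_EXTEND
    j%recs(j%n)%old_len = len(file)
  end subroutine log_extend

  ! Replay the undo records in reverse, then empty the journal.
  subroutine rollback(j, file)
    type(toy_journal_t), intent(inout)           :: j
    character(len=:), allocatable, intent(inout) :: file
    integer :: k
    do k = j%n, 1, -1
      associate (r => j%recs(k))
        select case (r%kind)
        case (REC_REGION)
          file(r%offset : r%offset + len(r%bytes) - 1) = r%bytes
        case (REC_EXTEND)
          file = file(1:r%old_len)
        end select
      end associate
    end do
    j%n = 0
  end subroutine rollback

end module undo_sketch

program undo_demo
  use undo_sketch
  implicit none
  type(toy_journal_t) :: j
  character(len=:), allocatable :: file

  file = 'AAAABBBB'
  call log_region(j, file, 3, 2)   ! bytes 3..4 will be overwritten
  call log_extend(j, file)         ! the file will be appended to
  ! (the real journal would be armed here, before any base-file write)
  file(3:4) = 'xx'
  file = file // 'CCCC'
  call rollback(j, file)           ! truncate first, then restore bytes 3..4
  print *, file
end program undo_demo
```

After the rollback the string is back to its original eight bytes. A commit in this model would simply set j%n to zero without touching the file, which mirrors the description of the commit point as invalidating the journal.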
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(in) | | | db | Database handle |
| character(len=SQR_NAME_LEN) | intent(out) | | allocatable | names(:) | Table names |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
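Why a held row_id becomes invalid is easiest to see in a small model of the renumbering. The sketch below assumes that survivors keep their relative order, which the text above does not state; alive and new_id are illustrative arrays, not library structures.

```fortran
program compact_renumber_sketch
  implicit none
  logical :: alive(6) = [.true., .false., .true., .true., .false., .true.]
  integer :: new_id(6), old, live_count

  live_count = 0
  do old = 1, size(alive)
    if (alive(old)) then
      live_count  = live_count + 1
      new_id(old) = live_count     ! survivors are renumbered 1..live_count
    else
      new_id(old) = 0              ! tombstoned rows are dropped
    end if
  end do

  print '(a,6i3)', 'old ids:', [(old, old = 1, size(alive))]
  print '(a,6i3)', 'new ids:', new_id
end program compact_renumber_sketch
```

In this model the row that was id 6 before the compaction is id 4 afterwards, so a caller that kept the old id would address the wrong row or none at all. Resolving rows through a unique natural key avoids the problem.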
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is
rewritten immediately afterwards; a hard crash strictly between those
two steps is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
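The validate-before-write rule for a unique index can be sketched with plain integer keys. This is a simplified model, assuming a single unique int32 key per row; batch_ok is a name invented here, and the NULL-member and NaN checks the text mentions are left out.

```fortran
program batch_validate_sketch
  implicit none
  integer :: existing(3) = [10, 20, 30]

  print *, 'fresh keys accepted:       ', batch_ok(existing, [40, 50])
  print *, 'clash with existing index: ', batch_ok(existing, [40, 20])
  print *, 'clash within the batch:    ', batch_ok(existing, [40, 40])

contains

  ! .true. only if no batch key duplicates an existing key or an earlier batch key.
  pure logical function batch_ok(existing, batch)
    integer, intent(in) :: existing(:), batch(:)
    integer :: k
    batch_ok = .true.
    do k = 1, size(batch)
      if (any(existing == batch(k)))     batch_ok = .false.
      if (any(batch(1:k-1) == batch(k))) batch_ok = .false.
    end do
  end function batch_ok

end program batch_validate_sketch
```

Because the whole batch is checked before any row is written, a single failing key leaves the table unchanged, which matches the all-or-nothing behaviour described above.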
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | buf | Row buffer to insert |
| integer(kind=int32) | intent(out) | | | row_id | Assigned row id (0 on failure) |
| integer | intent(out) | optional | | stat | |
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| integer(kind=int32) | intent(in) | | | row_id | Row id to fetch |
| character(len=*) | intent(out) | | | buf | Receives the record buffer |
| integer | intent(out) | optional | | stat | |
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
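The renumbering contract can be illustrated with a small model. This is a sketch in Python, not the library's Fortran code: the status values and the `compact` helper are hypothetical, and a row's id is assumed to be its 1-based position in the list.

```python
ROW_ALIVE, ROW_TOMBSTONE = 1, 0  # illustrative values, not the library's constants


def compact(rows):
    """Drop tombstoned rows and renumber the survivors 1..live_count.

    rows is a list of (status, payload); a row's id is its 1-based position.
    Returns (survivors, remap), where remap maps old row_id -> new row_id.
    """
    survivors, remap = [], {}
    for old_id, (status, payload) in enumerate(rows, start=1):
        if status == ROW_ALIVE:
            survivors.append((status, payload))
            remap[old_id] = len(survivors)  # new id is the position after packing
    return survivors, remap


rows = [(ROW_ALIVE, "a"), (ROW_TOMBSTONE, "b"), (ROW_ALIVE, "c")]
survivors, remap = compact(rows)
```

In this model the row that used to be id 3 becomes id 2, which is why a row_id held across a compaction must be discarded and re-resolved by natural key.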
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
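The CASCADE rule can be sketched as a pure function over the index slots. This is an illustrative Python model only: `cascade_drop` and the tuple-per-slot representation are assumptions, with `None` standing in for a tombstoned slot.

```python
def cascade_drop(indices, dropped_col):
    """Tombstone every index that includes dropped_col; keep the rest.

    indices holds one tuple of member column names per slot, or None for a
    slot that is already dead. Slot positions are preserved.
    """
    return [
        None if cols is not None and dropped_col in cols else cols
        for cols in indices
    ]


slots = [("id",), ("last", "first"), ("first",), None]
after = cascade_drop(slots, "first")
```

The slot list keeps its length, so surviving indices keep their slot numbers.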
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated before anything is written: a row with a NULL
index member is skipped for that index, a NaN key is rejected, and
uniqueness is checked both against the existing index and within the
batch. A SQR_DUP / SQR_INVALID violation therefore rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
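The validate-everything-first behaviour can be modelled as follows. This is a hedged Python sketch for a single unique index: `validate_batch` and `insert_batch` are hypothetical names, a key of `None` models a NULL index member, and the status strings merely echo the codes named above.

```python
import math


def validate_batch(existing_keys, batch_keys):
    """Return None if the batch may be written, else the status to report."""
    seen = set(existing_keys)
    for key in batch_keys:
        if key is None:          # NULL member: row is not indexed, cannot collide
            continue
        if isinstance(key, float) and math.isnan(key):
            return "SQR_INVALID"
        if key in seen:          # clashes with the index or an earlier batch row
            return "SQR_DUP"
        seen.add(key)
    return None


def insert_batch(table, batch_keys):
    """All-or-nothing insert; table is a list of keys, row_id = 1-based position."""
    err = validate_batch([k for k in table if k is not None], batch_keys)
    if err is not None:
        return err, [0] * len(batch_keys)   # nothing written, row_ids zeroed
    row_ids = []
    for key in batch_keys:
        table.append(key)
        row_ids.append(len(table))
    return "SQR_OK", row_ids


table = [1, 2]
rejected = insert_batch(table, [3, None, 3])   # duplicate inside the batch
size_after_reject = len(table)
accepted = insert_batch(table, [3, None, 4])
```

Because validation finishes before the first append, the rejected call leaves the table exactly as it was.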
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
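The bit-for-bit rule is easy to reproduce outside the library. The Python sketch below compares raw 8-byte images of doubles; `key_bytes` and `in_band` are illustrative helpers, and the little-endian byte order is an arbitrary choice for the example.

```python
import struct


def key_bytes(x: float) -> bytes:
    """Raw IEEE-754 image of a double, as a byte-exact key would see it."""
    return struct.pack("<d", x)


def in_band(x: float, lo: float, hi: float) -> bool:
    """Tolerance matching expressed as an inclusive range test."""
    return lo <= x <= hi


stored = key_bytes(0.3)
same_value_matches = key_bytes(0.3) == stored
recomputed_matches = key_bytes(0.1 + 0.2) == stored
band_matches = in_band(0.1 + 0.2, 0.3 - 1e-9, 0.3 + 1e-9)
```

The same value matches, the recomputed sum does not, and the band query recovers it, which mirrors the advice to use a range query for tolerance matching.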
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
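The index-selection rule described above can be sketched as a two-pass search. This is a Python model under stated assumptions: `pick_index` is hypothetical, slots are tuples of column names with `None` for a dropped slot, and taking the first matching composite is a guess, since the text does not say which one wins when several qualify.

```python
def pick_index(indices, col_name):
    """Return the 0-based slot of the index to iterate, or None if there is none."""
    # First choice: an exact single-column index on col_name.
    for slot, cols in enumerate(indices):
        if cols == (col_name,):
            return slot
    # Fallback: a composite index whose leading member is col_name.
    for slot, cols in enumerate(indices):
        if cols is not None and len(cols) > 1 and cols[0] == col_name:
            return slot
    return None  # corresponds to the SQR_NOT_FOUND case


exact = pick_index([("a", "b"), ("a",), None], "a")
leading = pick_index([("a", "b"), None], "a")
missing = pick_index([("b", "a")], "a")
```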
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
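The pull-style band iteration can be modelled with a sorted list standing in for the B+-tree. This is a sketch, not the library's cursor: `find_range` is a hypothetical generator, and the index is assumed to be a list of (key, row_id) pairs in ascending order.

```python
from bisect import bisect_left


def find_range(index, lo, hi):
    """Yield (key, row_id) for every entry with lo <= key <= hi, ascending."""
    # (lo,) sorts before any (lo, row_id), so this lands on the first key >= lo.
    pos = bisect_left(index, (lo,))
    while pos < len(index) and index[pos][0] <= hi:
        yield index[pos]
        pos += 1


index = [(10, 3), (20, 1), (20, 7), (30, 2), (40, 5)]
band = list(find_range(index, 20, 30))

cursor = find_range(index, 41, 50)
first = next(cursor, None)   # None plays the role of ok = .false.
```

Exhaustion is an ordinary outcome rather than an error, which matches the ok = .false. with stat == SQR_OK convention.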
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
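The three NULL helpers amount to set, clear and test operations on a per-row bitmap. The Python sketch below assumes a layout the text does not specify: bit `null_bit % 8` of byte `null_bit // 8`. All function names are illustrative.

```python
def set_null(bitmap: bytearray, null_bit: int) -> None:
    """Mark the column NULL."""
    bitmap[null_bit // 8] |= 1 << (null_bit % 8)


def clear_null(bitmap: bytearray, null_bit: int) -> None:
    """Mark the column as carrying a value."""
    bitmap[null_bit // 8] &= ~(1 << (null_bit % 8)) & 0xFF


def is_null(bitmap: bytearray, null_bit: int) -> bool:
    return bool(bitmap[null_bit // 8] & (1 << (null_bit % 8)))


def is_indexable(bitmap: bytearray, member_bits) -> bool:
    """A row enters an index only if none of the index members is NULL."""
    return not any(is_null(bitmap, b) for b in member_bits)


bitmap = bytearray(2)        # room for 16 columns, all non-NULL
set_null(bitmap, 9)
null_after_set = is_null(bitmap, 9)
indexable_with_null = is_indexable(bitmap, [0, 9])
clear_null(bitmap, 9)
null_after_clear = is_null(bitmap, 9)
```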
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
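The DT_CHAR store and read rules (NUL-pad, truncate to width, read up to the first NUL) can be sketched as a round trip. The Python helpers `set_char` and `get_char` are illustrative stand-ins, and ASCII input is assumed.

```python
def set_char(value: str, width: int) -> bytes:
    """Truncate to the column width, then pad with NUL bytes."""
    raw = value.encode("ascii")[:width]
    return raw.ljust(width, b"\x00")


def get_char(slot: bytes) -> str:
    """Return the bytes up to (not including) the first NUL."""
    return slot.split(b"\x00", 1)[0].decode("ascii")


padded = set_char("abc", 5)
truncated = set_char("abcdefg", 5)
round_trip = get_char(padded)
full_width = get_char(truncated)
```

Padding a lookup key the same way is what lets an equality lookup compare it byte for byte with the stored slot.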
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned, so that the auto-commit
brackets leave it open and re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails with
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails with SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails with SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails with SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is declared target
so that each hook context can hold a lasting pointer back to the
handle; the caller's db_t must therefore have the target attribute
for journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
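The undo mechanics (capture before writing, then replay in reverse) can be modelled in a few lines. This Python sketch is a simplified illustration, not the journal's on-disk format: `log_region`, `log_extend` and `rollback` are hypothetical, the undo set lives only in memory, and fsync, the header and the checksum are left out.

```python
import os
import tempfile


def log_region(undo, path, offset, length):
    """Capture the original bytes of an in-place overwrite (idempotent)."""
    key = ("REGION", path, offset, length)
    if any(rec[:4] == key for rec in undo):
        return
    with open(path, "rb") as f:
        f.seek(offset)
        undo.append(key + (f.read(length),))


def log_extend(undo, path):
    """Capture the original file length before an append (idempotent per path)."""
    if any(rec[0] == "EXTEND" and rec[1] == path for rec in undo):
        return
    undo.append(("EXTEND", path, os.path.getsize(path)))


def rollback(undo):
    """Replay the undo records newest-first, then discard them."""
    for rec in reversed(undo):
        if rec[0] == "REGION":
            _, path, offset, _, old = rec
            with open(path, "r+b") as f:
                f.seek(offset)
                f.write(old)
        else:
            _, path, size = rec
            os.truncate(path, size)
    undo.clear()


path = os.path.join(tempfile.mkdtemp(), "t.dat")
with open(path, "wb") as f:
    f.write(b"AAAABBBB")

undo = []
log_region(undo, path, 0, 4)     # log first ...
log_extend(undo, path)
with open(path, "r+b") as f:     # ... then touch the base file
    f.write(b"XXXX")
    f.seek(0, 2)
    f.write(b"CCCC")

rollback(undo)
with open(path, "rb") as f:
    restored = f.read()
```

Logging before writing is the ordering that makes the overwrite and the append both reversible in this model.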
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| integer(kind=int32) | intent(in) | | | row_id | Row id to rewrite |
| character(len=*) | intent(in) | | | buf | New record buffer |
| integer | intent(out) | optional | | stat | |
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| integer(kind=int32) | intent(in) | | | row_id | Row id to delete |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
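The validate-everything-then-write rule can be sketched in isolation. The following is a minimal model, not the library's code: the "index" is a plain integer array, and insert_batch and batch_demo are hypothetical names invented for the illustration.

```fortran
module batch_demo
  implicit none
contains
  ! Appends the whole batch only if no key collides with an existing
  ! key or with an earlier member of the same batch.
  function insert_batch(existing, batch) result(ok)
    integer, allocatable, intent(inout) :: existing(:)
    integer, intent(in) :: batch(:)
    logical :: ok
    integer :: i, j
    ok = .false.
    ! Pass 1: validate every row; nothing has been written yet.
    do i = 1, size(batch)
      if (any(existing == batch(i))) return
      do j = 1, i - 1
        if (batch(j) == batch(i)) return
      end do
    end do
    ! Pass 2: the batch is clean, so append it in one step.
    existing = [existing, batch]
    ok = .true.
  end function insert_batch
end module batch_demo

program batch_main
  use batch_demo
  implicit none
  integer, allocatable :: keys(:)
  keys = [10, 20]
  ! 20 already exists, so the whole batch is rejected, including 30.
  if (insert_batch(keys, [30, 20])) error stop "expected rejection"
  if (size(keys) /= 2) error stop "rejected batch must write nothing"
  if (.not. insert_batch(keys, [30, 40])) error stop "expected success"
  if (size(keys) /= 4) error stop "clean batch must be appended"
  print *, keys
end program batch_main
```

Because the duplicate check runs to completion before the append, a failure never leaves a partially inserted batch behind.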
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
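The bit-for-bit rule is easy to check outside the library. This sketch uses only standard intrinsics and assumes IEEE 754 binary64 for real64; the variable names are illustrative.

```fortran
program real_bits
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  real(real64) :: stored, recomputed, copy
  integer(int64) :: raw

  stored = 0.3_real64
  recomputed = 0.1_real64 + 0.2_real64

  ! A pure binary transfer round-trips the exact bit pattern.
  raw = transfer(stored, raw)
  copy = transfer(raw, copy)
  print *, "round trip matches: ", transfer(copy, raw) == raw

  ! The recomputed sum is compared bit pattern against bit pattern.
  print *, "0.1+0.2 matches 0.3:", transfer(recomputed, raw) == raw

  ! Tolerance matching is a band [x - eps, x + eps], i.e. a range test.
  print *, "inside 1e-12 band:  ", abs(recomputed - stored) <= 1.0e-12_real64
end program real_bits
```

A caller that needs tolerance matching would express the band as lo and hi bounds for a range lookup rather than as an equality key.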
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
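The pull protocol (call until ok is .false.) can be modelled over a sorted array. This is a sketch under stated assumptions: cursor_t, open_band and cursor_next here are hypothetical stand-ins, not the library's types or signatures.

```fortran
module cursor_demo
  implicit none
  type :: cursor_t
    integer :: pos = 1        ! next candidate position in the sorted keys
    integer :: hi = huge(0)   ! inclusive upper bound of the band
  end type cursor_t
contains
  ! Position the cursor on the first key >= lo.
  function open_band(keys, lo, hi) result(cur)
    integer, intent(in) :: keys(:), lo, hi
    type(cursor_t) :: cur
    cur%hi = hi
    do while (cur%pos <= size(keys))
      if (keys(cur%pos) >= lo) exit
      cur%pos = cur%pos + 1
    end do
  end function open_band

  ! Yield the next key and advance; ok is .false. at exhaustion.
  subroutine cursor_next(cur, keys, key, ok)
    type(cursor_t), intent(inout) :: cur
    integer, intent(in) :: keys(:)
    integer, intent(out) :: key
    logical, intent(out) :: ok
    key = 0
    ok = .false.
    if (cur%pos > size(keys)) return      ! ran off the end of the index
    if (keys(cur%pos) > cur%hi) return    ! passed the band's upper bound
    key = keys(cur%pos)
    cur%pos = cur%pos + 1
    ok = .true.
  end subroutine cursor_next
end module cursor_demo

program cursor_main
  use cursor_demo
  implicit none
  integer, parameter :: keys(5) = [1, 3, 5, 7, 9]   ! ascending key order
  type(cursor_t) :: cur
  integer :: key, n
  logical :: ok
  cur = open_band(keys, 3, 7)
  n = 0
  do
    call cursor_next(cur, keys, key, ok)
    if (.not. ok) exit
    n = n + 1
    print *, key
  end do
  if (n /= 3) error stop "band [3,7] should yield 3, 5 and 7"
end program cursor_main
```

Exhaustion is reported through ok rather than through an error status, so the loop needs only one exit test.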
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
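The set, clear and test operations map naturally onto Fortran's bit intrinsics. The layout below (bit k set means column k is NULL, held in a default integer) is an assumption made for the sketch; the real bitmap layout is not specified here.

```fortran
program null_bitmap_demo
  implicit none
  ! Hypothetical layout: bit k of the bitmap set means column k is NULL.
  integer, parameter :: col_a = 0, col_b = 1
  integer :: bitmap, index_members

  bitmap = 0                        ! every column carries a value
  bitmap = ibset(bitmap, col_b)     ! mark column b NULL
  if (.not. btest(bitmap, col_b)) error stop "col_b should read as NULL"

  ! An index over (a, b): the row is skipped if any member bit is set.
  index_members = ior(ishft(1, col_a), ishft(1, col_b))
  print *, "row indexed:", iand(bitmap, index_members) == 0

  bitmap = ibclr(bitmap, col_b)     ! un-NULL without writing a value
  if (btest(bitmap, col_b)) error stop "col_b should carry a value"
  print *, "row indexed:", iand(bitmap, index_members) == 0
end program null_bitmap_demo
```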
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
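Packing and unpacking a fixed-width slot can be done with the transfer intrinsic. This is a minimal sketch: the buffer size and the byte offsets are invented for the example, and it assumes one byte per default character, which holds on common compilers.

```fortran
program pack_demo
  use, intrinsic :: iso_fortran_env, only: int32, real64
  implicit none
  character(len=16) :: buf
  integer(int32) :: i
  real(real64) :: x

  buf = repeat(achar(0), len(buf))      ! zeroed row buffer

  ! Hypothetical layout: int32 at bytes 1..4, real64 at bytes 5..12.
  buf(1:4)  = transfer(42_int32, buf(1:4))
  buf(5:12) = transfer(2.5_real64, buf(5:12))

  ! Unpacking reverses the transfer; no decimal conversion is involved.
  i = transfer(buf(1:4), i)
  x = transfer(buf(5:12), x)

  if (i /= 42_int32) error stop "int32 should round-trip"
  if (x /= 2.5_real64) error stop "real64 should round-trip"
  print *, i, x
end program pack_demo
```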
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
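The store and read rules for a character slot (truncate, NUL-pad, read up to the first NUL) look like this in a standalone sketch. store_char and read_char are hypothetical helpers written for the illustration.

```fortran
module char_slot_demo
  implicit none
contains
  ! Store s into a slot of the given width: truncate, then NUL-pad.
  function store_char(s, width) result(slot)
    character(len=*), intent(in) :: s
    integer, intent(in) :: width
    character(len=width) :: slot
    integer :: n
    n = min(len(s), width)
    slot = repeat(achar(0), width)
    slot(1:n) = s(1:n)
  end function store_char

  ! Read back everything before the first NUL (or the whole slot).
  function read_char(slot) result(s)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: s
    integer :: n
    n = index(slot, achar(0))
    if (n == 0) then
      s = slot
    else
      s = slot(1:n-1)
    end if
  end function read_char
end module char_slot_demo

program char_slot_main
  use char_slot_demo
  implicit none
  character(len=5) :: slot
  slot = store_char("abc", 5)
  if (read_char(slot) /= "abc") error stop "short value should round-trip"
  if (len(read_char(slot)) /= 3) error stop "padding must not be returned"
  slot = store_char("abcdefgh", 5)
  if (read_char(slot) /= "abcde") error stop "long value should be truncated"
  print *, read_char(slot)
end program char_slot_main
```

The same padding applied to a lookup key makes it compare equal, byte for byte, to the stored slot.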
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
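The no-nesting rule amounts to a two-state machine. The sketch below models only that rule; conn_t, the demo_* procedures and the ST_* status values are hypothetical and do not reflect the library's actual constants.

```fortran
module txn_demo
  implicit none
  integer, parameter :: ST_OK = 0, ST_INVALID = 1   ! hypothetical codes
  type :: conn_t
    logical :: in_txn = .false.
  end type conn_t
contains
  subroutine demo_begin(c, stat)
    type(conn_t), intent(inout) :: c
    integer, intent(out) :: stat
    if (c%in_txn) then
      stat = ST_INVALID          ! no nesting: one explicit txn at a time
    else
      c%in_txn = .true.
      stat = ST_OK
    end if
  end subroutine demo_begin

  ! Commit and rollback share the same precondition in this model.
  subroutine demo_commit(c, stat)
    type(conn_t), intent(inout) :: c
    integer, intent(out) :: stat
    if (.not. c%in_txn) then
      stat = ST_INVALID          ! nothing in flight to end
    else
      c%in_txn = .false.
      stat = ST_OK
    end if
  end subroutine demo_commit
end module txn_demo

program txn_main
  use txn_demo
  implicit none
  type(conn_t) :: c
  integer :: stat
  call demo_commit(c, stat)
  if (stat /= ST_INVALID) error stop "commit without begin must fail"
  call demo_begin(c, stat)
  if (stat /= ST_OK) error stop "first begin must succeed"
  call demo_begin(c, stat)
  if (stat /= ST_INVALID) error stop "nested begin must fail"
  call demo_commit(c, stat)
  if (stat /= ST_OK) error stop "commit must succeed"
  print *, "in_txn after commit:", c%in_txn
end program txn_main
```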
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
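Why the target attribute matters can be shown with a small standalone example. handle_t, hook_ctx_t and install_hook are hypothetical names; the sketch only illustrates the Fortran rule that a pointer set to a target dummy stays associated with the caller's variable when that variable is itself a target.

```fortran
module hook_demo
  implicit none
  type :: handle_t
    integer :: writes = 0
  end type handle_t
  type :: hook_ctx_t
    type(handle_t), pointer :: owner => null()
  end type hook_ctx_t
contains
  subroutine install_hook(h, ctx)
    type(handle_t), intent(inout), target :: h
    type(hook_ctx_t), intent(out) :: ctx
    ctx%owner => h     ! lasting pointer back to the handle
  end subroutine install_hook
end module hook_demo

program hook_main
  use hook_demo
  implicit none
  type(handle_t), target :: db     ! target on the caller's variable
  type(hook_ctx_t) :: ctx
  call install_hook(db, ctx)
  ! A later "page write" reaches the handle through the stored pointer.
  ctx%owner%writes = ctx%owner%writes + 1
  if (db%writes /= 1) error stop "pointer should alias the caller's handle"
  print *, db%writes
end program hook_main
```

If db were declared without target, the association status of ctx%owner would become undefined once install_hook returned, which is the failure mode the requirement guards against.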
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
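The interplay of region and extend records during rollback can be modelled in memory. This is a simplified sketch, not the journal's implementation: the "file" is a character string, the undo set has a fixed capacity, and every name (undo_rec_t, journal_t, log_region, log_extend, rollback) is local to the example.

```fortran
module undo_demo
  implicit none
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2
  type :: undo_rec_t
    integer :: kind = 0
    integer :: offset = 0                        ! REGION: 1-based start
    character(len=:), allocatable :: old_bytes   ! REGION: pre-image
    integer :: old_len = 0                       ! EXTEND: original length
  end type undo_rec_t
  type :: journal_t
    integer :: n = 0
    type(undo_rec_t) :: recs(8)                  ! fixed capacity for the demo
  end type journal_t
contains
  ! Capture the original bytes before an in-place overwrite.
  subroutine log_region(j, file, offset, length)
    type(journal_t), intent(inout) :: j
    character(len=*), intent(in) :: file
    integer, intent(in) :: offset, length
    j%n = j%n + 1
    j%recs(j%n)%kind = REC_REGION
    j%recs(j%n)%offset = offset
    j%recs(j%n)%old_bytes = file(offset:offset + length - 1)
  end subroutine log_region

  ! Capture the original length before an append.
  subroutine log_extend(j, file)
    type(journal_t), intent(inout) :: j
    character(len=*), intent(in) :: file
    j%n = j%n + 1
    j%recs(j%n)%kind = REC_EXTEND
    j%recs(j%n)%old_len = len(file)
  end subroutine log_extend

  ! Undo in reverse order: truncate appends, write pre-images back.
  subroutine rollback(j, file)
    type(journal_t), intent(inout) :: j
    character(len=:), allocatable, intent(inout) :: file
    integer :: i, first, last
    do i = j%n, 1, -1
      select case (j%recs(i)%kind)
      case (REC_REGION)
        first = j%recs(i)%offset
        last = first + len(j%recs(i)%old_bytes) - 1
        file(first:last) = j%recs(i)%old_bytes
      case (REC_EXTEND)
        file = file(1:j%recs(i)%old_len)
      end select
    end do
    j%n = 0                                      ! undo set is now empty
  end subroutine rollback
end module undo_demo

program undo_main
  use undo_demo
  implicit none
  type(journal_t) :: j
  character(len=:), allocatable :: file
  file = "AAAABBBB"
  call log_region(j, file, 3, 4)   ! capture bytes 3..6 before overwriting
  file(3:6) = "xxxx"
  call log_extend(j, file)         ! capture the length before appending
  file = file // "CCCC"
  call rollback(j, file)
  if (file /= "AAAABBBB") error stop "rollback should restore the original"
  print *, file
end program undo_main
```

Each record is logged before the write it protects, mirroring the capture-before-overwrite ordering described above.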
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. Reports
.false. for an absent, voided or unreadable journal.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| procedure(scan_cb) | | | | cb | Per-row callback |
| class(*) | intent(inout) | | | ctx | Opaque context threaded to |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
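The reason the procedure cannot see the previous state follows from how Fortran treats an intent(out) dummy whose type has default initialization. A minimal demonstration, with handle_t and open_handle as hypothetical stand-ins for the real handle and db_open:

```fortran
module open_demo
  implicit none
  type :: handle_t
    integer :: unit = -1       ! default initialization: no file open
  end type handle_t
contains
  subroutine open_handle(h, unit_on_entry)
    type(handle_t), intent(out) :: h
    integer, intent(out) :: unit_on_entry
    ! Default initialization has already been applied to h here,
    ! so the value the caller left in h%unit is no longer visible.
    unit_on_entry = h%unit
    h%unit = 11
  end subroutine open_handle
end module open_demo

program open_main
  use open_demo
  implicit none
  type(handle_t) :: h
  integer :: seen
  h%unit = 42                  ! pretend a previous open left unit 42 in use
  call open_handle(h, seen)
  if (seen /= -1) error stop "intent(out) should have reset the handle"
  print *, seen, h%unit
end program open_main
```

In this model unit 42 is simply forgotten, which is the leak the contract warns about; closing the handle first is the only point at which the old unit numbers are still known.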
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
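The renumbering that makes held row_ids invalid can be seen in a tiny model. The alive array and the new_id mapping are illustrative only; the actual compaction works on files, not on an in-memory flag array.

```fortran
program compact_demo
  implicit none
  ! Rows 2 and 5 are tombstoned; the rest are live.
  logical, parameter :: alive(6) = [.true., .false., .true., .true., .false., .true.]
  integer :: new_id(6), i, next

  next = 0
  new_id = 0                 ! 0 marks a row that does not survive
  do i = 1, size(alive)
    if (alive(i)) then
      next = next + 1        ! survivors are numbered 1..live_count
      new_id(i) = next
    end if
  end do

  if (next /= count(alive)) error stop "survivors must number 1..live_count"
  if (new_id(6) /= 4) error stop "old row 6 should become row 4"
  print *, new_id
end program compact_demo
```

Any row after the first tombstone changes id, so a caller that must keep referring to a row across compaction would look it up again by natural key.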
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten
back-to-back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| integer(kind=int32) | intent(in) | | | row_id | Row id |
| character(len=*) | intent(in) | | | col_name | |
| character(len=*) | intent(in) | | | text | New text value |
| integer | intent(out) | optional | | stat | |
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
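The log-before-write and reverse-replay discipline described above can be sketched in memory. This is an illustration under stated assumptions, not the journal's actual format: a character string stands in for one base file, and undo_rec_t, log_region and log_extend are hypothetical names. A REGION record keeps the original bytes of an overwrite; an EXTEND record keeps only the original length.

```fortran
program undo_journal_demo
  implicit none
  integer, parameter :: TAG_REGION = 1, TAG_EXTEND = 2

  type :: undo_rec_t                          ! hypothetical record shape
    integer :: tag = 0
    integer :: offset = 0                     ! REGION: 1-based start of overwrite
    character(len=:), allocatable :: bytes    ! REGION: original bytes
    integer :: old_len = 0                    ! EXTEND: original file length
  end type undo_rec_t

  type(undo_rec_t) :: undo(8)
  integer :: n_undo, i, lo, hi
  character(len=:), allocatable :: file, tmp  ! file stands in for one base file

  file = "AAAABBBB"
  n_undo = 0

  call log_region(3, 4)                       ! capture the pre-image first ...
  file(3:6) = "xxxx"                          ! ... then overwrite in place

  call log_extend()                           ! capture the original length first ...
  file = file // "CCCC"                       ! ... then append

  if (file /= "AAxxxxBBCCCC") error stop "unexpected post-write state"

  ! Rollback: replay the undo records in reverse order.
  do i = n_undo, 1, -1
    select case (undo(i)%tag)
    case (TAG_REGION)                         ! write the original bytes back
      lo = undo(i)%offset
      hi = lo + len(undo(i)%bytes) - 1
      file(lo:hi) = undo(i)%bytes
    case (TAG_EXTEND)                         ! truncate the appended bytes away
      tmp = file(1:undo(i)%old_len)
      call move_alloc(tmp, file)
    end select
  end do

  if (len(file) /= 8 .or. file /= "AAAABBBB") error stop "rollback failed"
  print '(a)', "rollback ok"

contains

  subroutine log_region(offset, length)
    integer, intent(in) :: offset, length
    n_undo = n_undo + 1
    undo(n_undo)%tag    = TAG_REGION
    undo(n_undo)%offset = offset
    undo(n_undo)%bytes  = file(offset:offset + length - 1)
  end subroutine log_region

  subroutine log_extend()
    n_undo = n_undo + 1
    undo(n_undo)%tag     = TAG_EXTEND
    undo(n_undo)%old_len = len(file)
  end subroutine log_extend

end program undo_journal_demo
```

Replaying in reverse matters here: the EXTEND record is undone first, so the file is back at its original length before the REGION bytes are restored.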
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| integer(kind=int32) | intent(in) | | | row_id | Row id |
| character(len=*) | intent(in) | | | col_name | |
| character(len=:) | intent(out) | | allocatable | text | Receives the text value |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
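Why a row_id held across a compaction goes stale can be seen in a minimal sketch of the renumbering step. The arrays below are illustrative assumptions (a logical flag per slot standing in for the status byte, an integer standing in for the record); the real procedure also rewrites the blob and rebuilds the indices, which is omitted here.

```fortran
program compact_demo
  implicit none
  ! Six record slots; .false. marks a tombstoned row.
  logical :: alive(6)   = [.true., .false., .true., .true., .false., .true.]
  integer :: payload(6) = [10, 20, 30, 40, 50, 60]
  integer :: new_id(6)        ! old row_id -> new row_id, 0 = row was dropped
  integer :: packed(6)        ! compacted records, valid in 1..live_count
  integer :: old, live_count

  live_count = 0
  new_id = 0
  do old = 1, size(alive)
    if (alive(old)) then
      live_count = live_count + 1             ! survivors are numbered 1..live_count
      new_id(old) = live_count
      packed(live_count) = payload(old)
    end if
  end do

  if (live_count /= 4) error stop "expected four survivors"
  if (any(new_id /= [1, 0, 2, 3, 0, 4])) error stop "unexpected renumbering"
  if (any(packed(1:live_count) /= [10, 30, 40, 60])) error stop "unexpected data"
  print '(a)', "compaction renumbering ok"
end program compact_demo
```

In this sketch the row that was id 3 before the call is id 2 afterwards, which is the reason the natural-key routines are the stable way to refer to a row.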
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
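The validate-everything-then-write rule can be sketched as a two-pass routine. This is an illustration only: integer keys stand in for row buffers, existing stands in for a unique index, and insert_batch is a hypothetical name rather than the library's db_insert_many.

```fortran
program batch_validate_demo
  implicit none
  integer :: existing(3) = [1, 2, 3]   ! keys already in a hypothetical unique index
  integer :: next_id = 4               ! next row id the table would assign
  integer :: ids(3)
  logical :: ok

  call insert_batch([7, 8, 7], ids, ok)        ! duplicate inside the batch
  if (ok .or. any(ids /= 0)) error stop "in-batch duplicate should reject all"

  call insert_batch([7, 2, 9], ids, ok)        ! collides with an existing key
  if (ok .or. any(ids /= 0)) error stop "existing-key duplicate should reject all"

  call insert_batch([7, 8, 9], ids, ok)        ! clean batch
  if (.not. ok .or. any(ids /= [4, 5, 6])) error stop "clean batch should insert"
  print '(a)', "batch validation ok"

contains

  subroutine insert_batch(keys, row_ids, ok)
    integer, intent(in)  :: keys(:)
    integer, intent(out) :: row_ids(:)
    logical, intent(out) :: ok
    integer :: k

    row_ids = 0
    ok = .false.
    ! Pass 1: validate every row; nothing has been written yet.
    do k = 1, size(keys)
      if (any(existing == keys(k))) return        ! clashes with the existing index
      if (any(keys(1:k-1) == keys(k))) return     ! clashes earlier in the batch
    end do
    ! Pass 2: only now assign ids (the actual write is elided in this sketch).
    do k = 1, size(keys)
      row_ids(k) = next_id
      next_id = next_id + 1
    end do
    ok = .true.
  end subroutine insert_batch

end program batch_validate_demo
```

Because both rejected batches return before pass 2, next_id is untouched and the clean batch still receives ids 4, 5 and 6.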
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
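A short sketch of the distinction drawn above, assuming IEEE 754 binary64 for real64. The bits helper is hypothetical and simply exposes the stored bit pattern via transfer; the tolerance value is arbitrary.

```fortran
program real_key_demo
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  real(real64) :: stored, same, recomputed, tol

  stored     = 0.3_real64
  same       = 0.3_real64
  recomputed = 0.1_real64 + 0.2_real64
  tol        = 1.0e-9_real64

  ! Equality lookup: compare raw bit patterns, no epsilon.
  print *, "same literal matches:   ", bits(same) == bits(stored)
  print *, "recomputed sum matches: ", bits(recomputed) == bits(stored)

  ! Tolerance matching expressed as an inclusive band [lo, hi] instead.
  print *, "stored value in band:   ", &
       stored >= recomputed - tol .and. stored <= recomputed + tol

contains

  ! Reinterpret a real64 as its 64-bit pattern (a pure binary transfer).
  pure function bits(x) result(b)
    real(real64), intent(in) :: x
    integer(int64) :: b
    b = transfer(x, 0_int64)
  end function bits

end program real_key_demo
```

On IEEE 754 hardware the recomputed sum is expected to differ from the stored 0.3 in its last bit, so only the band test finds it.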
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | bufs(:) | Row buffers to insert |
| integer(kind=int32) | intent(out) | | | row_ids(:) | Assigned ids (0 on failure) |
| integer | intent(out) | optional | | stat | |
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Table to check |
| integer | intent(out) | optional | | stat | |
| character(len=*) | intent(inout) | optional | | errmsg | First inconsistency detail |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
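
To make the renumbering contract concrete, here is a minimal, self-contained sketch of how survivors could be renumbered 1..live_count. It does not call the library; the `alive` array and the `remap` table are hypothetical stand-ins for the on-disk status bytes and for whatever mapping a caller would need to translate row_ids held before the call.

```fortran
program compact_renumber_sketch
  implicit none
  ! .true. = live row, .false. = tombstone; the old row_id is the array position.
  logical, parameter :: alive(6) = [.true., .false., .true., .true., .false., .true.]
  integer :: remap(size(alive))   ! remap(old_id) = new_id, 0 if the row was dropped
  integer :: old_id, live_count

  remap = 0
  live_count = 0
  do old_id = 1, size(alive)
    if (alive(old_id)) then
      live_count = live_count + 1
      remap(old_id) = live_count   ! survivors are renumbered 1..live_count
    end if
  end do

  ! Self-check: rows 1, 3, 4, 6 survive and become 1, 2, 3, 4.
  if (any(remap /= [1, 0, 2, 3, 0, 4])) error stop 'unexpected renumbering'
  print '(a, 6i3)', 'old -> new:', remap
end program compact_renumber_sketch
```

Old row_id 3 becomes 2 here, which is why an id held across a compaction cannot be reused and a natural key is the stable handle.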
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is
rewritten immediately afterwards; a hard crash strictly between
those two steps is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
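
The following self-contained sketch illustrates the point without calling the library. It compares bit patterns the way an exact lookup would, then expresses a tolerance as an inclusive [lo, hi] band, which is the shape a range query takes. The variable names and the tolerance `eps` are illustrative assumptions.

```fortran
program real64_exact_match_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  real(real64) :: a, b, stored, probe, eps

  a = 0.1_real64
  b = 0.2_real64
  stored = 0.3_real64      ! the value that was inserted
  probe  = a + b           ! the value the caller recomputed

  ! Bit-for-bit comparison, as an exact equality lookup would do.
  ! On IEEE 754 binary64 this is expected to print F.
  print *, 'same bits:', transfer(probe, 0_int64) == transfer(stored, 0_int64)

  ! Tolerance matching written as an inclusive band around the probe.
  eps = 1.0e-9_real64
  print *, 'in band:  ', stored >= probe - eps .and. stored <= probe + eps
end program real64_exact_match_sketch
```

In terms of this API, the second test corresponds to passing `probe - eps` and `probe + eps` as the bounds of the real64 band overload of db_find_range rather than using the equality lookup.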
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
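
The pull protocol can be sketched in isolation. The program below is not the library's cursor: `cursor_t`, `open_band` and `cursor_next` are hypothetical, and a sorted integer array stands in for the index. It only shows the loop shape, where exhaustion is signalled through `ok` rather than through an error status.

```fortran
program band_cursor_sketch
  implicit none
  integer, parameter :: keys(6) = [2, 3, 5, 8, 13, 21]   ! index keys, ascending

  type :: cursor_t
    integer :: pos = 1      ! next position to yield
    integer :: hi  = 0      ! inclusive upper bound of the band
  end type cursor_t

  type(cursor_t) :: cur
  integer :: key, n
  logical :: ok

  call open_band(cur, 3, 13)
  n = 0
  do
    call cursor_next(cur, key, ok)
    if (.not. ok) exit        ! exhausted: upper bound passed or end of index
    n = n + 1
    print *, key
  end do
  if (n /= 4) error stop 'expected keys 3, 5, 8, 13'

contains

  subroutine open_band(c, lo, hi)
    type(cursor_t), intent(out) :: c
    integer, intent(in) :: lo, hi
    c%hi = hi
    c%pos = 1
    ! Position on the first key >= lo (the inclusive lower bound).
    do while (c%pos <= size(keys))
      if (keys(c%pos) >= lo) exit
      c%pos = c%pos + 1
    end do
  end subroutine open_band

  subroutine cursor_next(c, key, ok)
    type(cursor_t), intent(inout) :: c
    integer, intent(out) :: key
    logical, intent(out) :: ok
    key = 0
    ok = .false.
    if (c%pos > size(keys)) return     ! end of index
    if (keys(c%pos) > c%hi) return     ! past the band's upper bound
    key = keys(c%pos)
    c%pos = c%pos + 1                  ! advance past the yielded key
    ok = .true.
  end subroutine cursor_next

end program band_cursor_sketch
```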
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
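
As a rough illustration of the NULL bitmap and its effect on index membership, the sketch below keeps the bitmap in a single int32 word and uses 0-based bit positions. Both choices are assumptions made for the example, as are the `members` array and the `indexable` helper; the library's actual bitmap layout is not described here.

```fortran
program null_bitmap_sketch
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  ! Hypothetical null_bit positions of the member columns of one index.
  integer, parameter :: members(2) = [0, 2]
  integer(int32) :: bitmap

  bitmap = 0_int32                        ! every column carries a value

  bitmap = ibset(bitmap, 2)               ! mark the column with null_bit 2 as NULL
  if (.not. btest(bitmap, 2)) error stop 'bit 2 should be set'
  if (indexable(bitmap, members)) error stop 'row with a NULL member must not be indexed'

  bitmap = ibclr(bitmap, 2)               ! un-NULL it without writing a value
  if (.not. indexable(bitmap, members)) error stop 'row should be indexable again'

  print *, 'bitmap checks passed'

contains

  ! A row belongs in an index only if none of the member columns is NULL.
  pure logical function indexable(bm, bits)
    integer(int32), intent(in) :: bm
    integer, intent(in) :: bits(:)
    integer :: k
    indexable = .true.
    do k = 1, size(bits)
      if (btest(bm, bits(k))) indexable = .false.
    end do
  end function indexable

end program null_bitmap_sketch
```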
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
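
One plausible way to pack and unpack fixed-width values in a character row buffer is the `transfer` intrinsic, which copies the bit pattern without any decimal conversion. The sketch below is an assumption about technique, not the library's code: the 16-byte buffer and the byte offsets `int_off` and `real_off` are hypothetical, and native byte order is assumed.

```fortran
program pack_slot_sketch
  use, intrinsic :: iso_fortran_env, only: int32, real64
  implicit none
  integer, parameter :: int_off  = 1     ! hypothetical 1-based byte offsets
  integer, parameter :: real_off = 9
  character(len=16) :: buf               ! hypothetical 16-byte row buffer
  integer(int32) :: i_out
  real(real64)   :: r_out

  buf = repeat(achar(0), len(buf))       ! start from a zeroed buffer

  ! Pack: reinterpret the value's bytes as a character string of the same size.
  buf(int_off:int_off+3)   = transfer(42_int32,   repeat(' ', 4))
  buf(real_off:real_off+7) = transfer(0.1_real64, repeat(' ', 8))

  ! Unpack: reinterpret the slot bytes as the numeric type again.
  i_out = transfer(buf(int_off:int_off+3),   i_out)
  r_out = transfer(buf(real_off:real_off+7), r_out)

  ! The round trip is bit-exact, so plain equality holds.
  if (i_out /= 42_int32)   error stop 'int32 round trip failed'
  if (r_out /= 0.1_real64) error stop 'real64 round trip failed'
  print *, i_out, r_out
end program pack_slot_sketch
```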
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
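
The store and read rules for a DT_CHAR slot (NUL-padding, truncation to the column width, reading up to the first NUL) can be sketched as follows. `store_char`, `read_char` and the width of 8 are hypothetical names and values chosen for the example, not the library's row_set_char / row_get_char.

```fortran
program char_slot_sketch
  implicit none
  integer, parameter :: width = 8        ! hypothetical column width
  character(len=width) :: slot

  call store_char(slot, 'abc')
  if (len(read_char(slot)) /= 3) error stop 'expected 3 characters back'
  if (read_char(slot) /= 'abc')  error stop 'expected abc back'

  call store_char(slot, 'abcdefghij')    ! longer than the slot: truncated
  if (len(read_char(slot)) /= width)  error stop 'expected a full-width value'
  if (read_char(slot) /= 'abcdefgh')  error stop 'expected truncation to 8 bytes'

  print *, 'char slot checks passed'

contains

  subroutine store_char(s, val)
    character(len=*), intent(out) :: s
    character(len=*), intent(in)  :: val
    integer :: n
    n = min(len(val), len(s))            ! truncate to the column width
    s = repeat(achar(0), len(s))         ! NUL-pad the whole slot
    s(1:n) = val(1:n)
  end subroutine store_char

  function read_char(s) result(val)
    character(len=*), intent(in) :: s
    character(len=:), allocatable :: val
    integer :: n
    n = index(s, achar(0))               ! position of the first NUL, 0 if none
    if (n == 0) then
      val = s                            ! value fills the whole slot
    else
      val = s(1:n-1)
    end if
  end function read_char

end program char_slot_sketch
```

Padding with NUL rather than blanks is also what lets an equality or range key be compared byte for byte against stored slots once it has been padded to the same width.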
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
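
The two undo record kinds and the reverse replay can be modelled in memory. In the sketch below a deferred-length string stands in for one base file; `undo_rec`, its fields and the `REGION` / `EXTEND` constants are hypothetical, and there is no header, checksum or fsync, so it shows only the restore logic and not the durability protocol.

```fortran
program undo_journal_sketch
  implicit none
  integer, parameter :: REGION = 1, EXTEND = 2     ! hypothetical record kinds

  type :: undo_rec
    integer :: kind    = 0
    integer :: offset  = 0                         ! 1-based byte offset (REGION)
    integer :: old_len = 0                         ! original file length (EXTEND)
    character(len=:), allocatable :: old_bytes     ! pre-image bytes (REGION)
  end type undo_rec

  character(len=:), allocatable :: file            ! stands in for one base file
  type(undo_rec) :: undo(2)
  integer :: k, first, last

  file = 'AAAABBBB'

  ! Capture the undo records before touching the "file".
  undo(1)%kind = REGION
  undo(1)%offset = 3
  undo(1)%old_bytes = file(3:5)
  undo(2)%kind = EXTEND
  undo(2)%old_len = len(file)

  ! The transaction's writes: an in-place overwrite, then an append.
  file(3:5) = 'xyz'
  file = file // 'CCCC'

  ! Rollback: replay the undo records in reverse order.
  do k = size(undo), 1, -1
    select case (undo(k)%kind)
    case (REGION)
      first = undo(k)%offset
      last  = first + len(undo(k)%old_bytes) - 1
      file(first:last) = undo(k)%old_bytes         ! write the original bytes back
    case (EXTEND)
      file = file(1:undo(k)%old_len)               ! truncate appended bytes away
    end select
  end do

  if (len(file) /= 8)     error stop 'rollback did not restore the length'
  if (file /= 'AAAABBBB') error stop 'rollback did not restore the bytes'
  print *, file
end program undo_journal_sketch
```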
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.

| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | col_names(:) | Unique index member columns |
| character(len=*) | intent(in) | | | keyrow | Row-shaped buffer holding the key columns |
| character(len=*) | intent(out) | | | buf | Receives the matched record |
| integer | intent(out) | optional | | stat | |
| integer(kind=int32) | intent(out) | optional | | row_id | Resolved row id (0 if unresolved) |


| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | col_names(:) | Unique index member columns |
| character(len=*) | intent(in) | | | keyrow | Row-shaped buffer holding the key columns |
| character(len=*) | intent(in) | | | newrow | New record buffer |
| integer | intent(out) | optional | | stat | |

Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
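The validate-then-write ordering described above can be sketched in isolation. This is a minimal model, not the library's code: the table is reduced to an array of integer keys under one unique index, and bulk_insert, keys and batch are hypothetical names chosen for the illustration.

```fortran
module bulk_sketch
  implicit none
contains

  ! keys(k) is the unique key of row id k. All names here are hypothetical.
  subroutine bulk_insert(keys, batch, row_ids, ok)
    integer, allocatable, intent(inout) :: keys(:)
    integer, intent(in)  :: batch(:)
    integer, intent(out) :: row_ids(:)
    logical, intent(out) :: ok
    integer :: k, base

    row_ids = 0
    ok = .false.
    if (size(row_ids) < size(batch)) return

    ! Phase 1: validate every row before touching the table.
    do k = 1, size(batch)
      if (any(keys == batch(k))) return           ! clashes with a stored key
      if (any(batch(1:k-1) == batch(k))) return   ! clashes inside the batch
    end do

    ! Phase 2: validation passed, so append the rows and hand out ids.
    base = size(keys)
    keys = [keys, batch]
    do k = 1, size(batch)
      row_ids(k) = base + k
    end do
    ok = .true.
  end subroutine bulk_insert

end module bulk_sketch

program bulk_demo
  use bulk_sketch
  implicit none
  integer, allocatable :: keys(:)
  integer :: ids(3)
  logical :: ok

  keys = [1, 2, 3]
  call bulk_insert(keys, [4, 5, 4], ids, ok)   ! duplicate inside the batch
  print *, ok, size(keys), ids
  call bulk_insert(keys, [4, 5, 6], ids, ok)   ! clean batch
  print *, ok, size(keys), ids
end program bulk_demo
```

In this model the first call leaves keys at three entries with every id zero, and the second call appends all three rows. Because validation finishes before the first write, a rejected batch never needs undoing.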
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
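The difference between exact and tolerance matching can be shown without the library. The sketch below is an illustration only: same_bits and in_band are hypothetical helpers, and the comparison through transfer stands in for whatever byte comparison the index actually performs.

```fortran
module float_key_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
contains

  ! Bit-for-bit equality: compare the 64-bit patterns, not the values.
  pure logical function same_bits(a, b)
    real(real64), intent(in) :: a, b
    same_bits = transfer(a, 0_int64) == transfer(b, 0_int64)
  end function same_bits

  ! Tolerance matching expressed as an inclusive band [lo, hi].
  pure logical function in_band(x, lo, hi)
    real(real64), intent(in) :: x, lo, hi
    in_band = (x >= lo) .and. (x <= hi)
  end function in_band

end module float_key_sketch

program float_key_demo
  use float_key_sketch
  implicit none
  real(real64) :: stored, recomputed, tol

  stored     = 0.3_real64
  recomputed = 0.1_real64 + 0.2_real64
  tol        = 1.0e-9_real64

  print *, 'same literal, same bits :', same_bits(stored, 0.3_real64)
  print *, 'recomputed, same bits   :', same_bits(stored, recomputed)
  print *, 'stored inside the band  :', &
           in_band(stored, recomputed - tol, recomputed + tol)
end program float_key_demo
```

On IEEE 754 binary64 hardware the second comparison is expected to be false while the band test is true. That is why a tolerance match belongs in a range query, not an equality lookup.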
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
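The pull protocol can be modelled over a plain sorted array. This is a sketch under stated assumptions: a sorted integer array stands in for the index, and cursor_t, open_range and cursor_next are hypothetical names, not the library's db_find_range or db_cursor_next.

```fortran
module cursor_sketch
  implicit none

  type :: cursor_t
    integer :: pos = 1         ! next index entry to examine
    integer :: hi  = huge(0)   ! inclusive upper bound of the band
  end type cursor_t

contains

  ! Position the cursor on the first key >= lo (linear scan for brevity).
  function open_range(keys, lo, hi) result(c)
    integer, intent(in) :: keys(:), lo, hi
    type(cursor_t) :: c
    c%pos = 1
    do while (c%pos <= size(keys))
      if (keys(c%pos) >= lo) exit
      c%pos = c%pos + 1
    end do
    c%hi = hi
  end function open_range

  ! Yield the key under the cursor and advance; ok = .false. on exhaustion.
  subroutine cursor_next(c, keys, key, ok)
    type(cursor_t), intent(inout) :: c
    integer, intent(in)  :: keys(:)
    integer, intent(out) :: key
    logical, intent(out) :: ok
    key = 0
    ok  = .false.
    if (c%pos > size(keys)) return      ! ran off the end of the index
    if (keys(c%pos) > c%hi) return      ! passed the band's upper bound
    key   = keys(c%pos)
    c%pos = c%pos + 1
    ok    = .true.
  end subroutine cursor_next

end module cursor_sketch

program cursor_demo
  use cursor_sketch
  implicit none
  integer, parameter :: keys(5) = [10, 20, 20, 35, 50]
  type(cursor_t) :: c
  integer :: key
  logical :: ok

  c = open_range(keys, 20, 35)
  do
    call cursor_next(c, keys, key, ok)
    if (.not. ok) exit
    print *, key
  end do
end program cursor_demo
```

The loop yields 20, 20 and 35 and then stops. Exhaustion is reported through ok, not as an error, which matches the ok = .false. with stat == SQR_OK convention described above.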
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
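A NULL bitmap of this kind can be sketched with the standard bit intrinsics. The layout is an assumption made for the example (one int32 word, column k on bit k-1), and set_null, clear_null, is_null and in_index are hypothetical stand-ins; the real row format may differ.

```fortran
module null_bitmap_sketch
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
contains

  pure subroutine set_null(bitmap, col)
    integer(int32), intent(inout) :: bitmap
    integer, intent(in) :: col
    bitmap = ibset(bitmap, col - 1)
  end subroutine set_null

  pure subroutine clear_null(bitmap, col)
    integer(int32), intent(inout) :: bitmap
    integer, intent(in) :: col
    bitmap = ibclr(bitmap, col - 1)
  end subroutine clear_null

  pure logical function is_null(bitmap, col)
    integer(int32), intent(in) :: bitmap
    integer, intent(in) :: col
    is_null = btest(bitmap, col - 1)
  end function is_null

  ! A row belongs to an index only if none of the member columns is NULL.
  pure logical function in_index(bitmap, members)
    integer(int32), intent(in) :: bitmap
    integer, intent(in) :: members(:)
    in_index = .not. any(btest(bitmap, members - 1))
  end function in_index

end module null_bitmap_sketch

program null_bitmap_demo
  use null_bitmap_sketch
  implicit none
  integer(int32) :: bitmap

  bitmap = 0_int32                      ! every column carries a value
  call set_null(bitmap, 3)
  print *, is_null(bitmap, 3), in_index(bitmap, [1, 3]), in_index(bitmap, [1, 2])
  call clear_null(bitmap, 3)
  print *, is_null(bitmap, 3), in_index(bitmap, [1, 3])
end program null_bitmap_demo
```

While column 3 is NULL, the row is left out of the index over columns (1, 3) but stays in the index over (1, 2). Clearing the bit makes it eligible for both again.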
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
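Packing a numeric value into a character row buffer is a bit-pattern copy, which the transfer intrinsic expresses directly. The sketch below assumes 1-based byte offsets and native byte order; put_int, get_int, put_real and get_real are hypothetical helpers, not the row_set_* and row_get_* procedures themselves.

```fortran
module pack_sketch
  use, intrinsic :: iso_fortran_env, only: int32, real64
  implicit none
contains

  subroutine put_int(buf, offset, val)
    character(len=*), intent(inout) :: buf
    integer, intent(in) :: offset
    integer(int32), intent(in) :: val
    ! Reinterpret the 4 bytes of val as 4 characters.
    buf(offset:offset+3) = transfer(val, repeat(' ', 4))
  end subroutine put_int

  function get_int(buf, offset) result(val)
    character(len=*), intent(in) :: buf
    integer, intent(in) :: offset
    integer(int32) :: val
    val = transfer(buf(offset:offset+3), 0_int32)
  end function get_int

  subroutine put_real(buf, offset, val)
    character(len=*), intent(inout) :: buf
    integer, intent(in) :: offset
    real(real64), intent(in) :: val
    buf(offset:offset+7) = transfer(val, repeat(' ', 8))
  end subroutine put_real

  function get_real(buf, offset) result(val)
    character(len=*), intent(in) :: buf
    integer, intent(in) :: offset
    real(real64) :: val
    val = transfer(buf(offset:offset+7), 0.0_real64)
  end function get_real

end module pack_sketch

program pack_demo
  use pack_sketch
  implicit none
  character(len=16) :: buf

  buf = repeat(achar(0), len(buf))      ! zeroed row buffer
  call put_int(buf, 1, 42_int32)        ! hypothetical column at bytes 1..4
  call put_real(buf, 5, 2.5_real64)     ! hypothetical column at bytes 5..12
  print *, get_int(buf, 1), get_real(buf, 5)
end program pack_demo
```

Because no decimal conversion is involved, the value read back has the same bits as the value written.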
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
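The NUL-padding convention for DT_CHAR slots can be sketched as two small helpers. This is an illustration only: nul_pad and read_slot are hypothetical names, and whether the library trims trailing blanks before storing is not stated here, so the sketch copies the string as given.

```fortran
module char_slot_sketch
  implicit none
contains

  ! Copy s into a fixed-width slot: truncate if too long, NUL-pad if short.
  pure function nul_pad(s, width) result(slot)
    character(len=*), intent(in) :: s
    integer, intent(in) :: width
    character(len=width) :: slot
    integer :: n
    n = min(len(s), width)
    slot = repeat(achar(0), width)
    slot(1:n) = s(1:n)
  end function nul_pad

  ! Read the slot back up to (not including) the first NUL.
  pure function read_slot(slot) result(s)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: s
    integer :: n
    n = index(slot, achar(0))
    if (n == 0) then
      s = slot              ! value fills the whole slot
    else
      s = slot(1:n-1)
    end if
  end function read_slot

end module char_slot_sketch

program char_slot_demo
  use char_slot_sketch
  implicit none
  character(len=8) :: stored

  stored = nul_pad('abc', 8)
  print *, '[' // read_slot(stored) // ']'
  print *, stored == nul_pad('abc', 8)      ! lookup key padded the same way
  print *, '[' // read_slot(nul_pad('alphabet soup', 8)) // ']'
end program char_slot_demo
```

Padding the lookup key to the column width before comparing is what lets a short key match the stored slot byte for byte.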
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is declared target
so that each hook context can hold a lasting pointer back to the handle;
the caller's db_t must therefore also have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
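Why the undo records are replayed in reverse can be seen in a small in-memory model. The sketch makes several assumptions: a character string stands in for a base file, offsets are 1-based, and undo_rec, REC_REGION, REC_EXTEND and replay are hypothetical names, not the journal's on-disk format.

```fortran
module undo_sketch
  implicit none
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2

  type :: undo_rec
    integer :: kind = 0
    integer :: offset = 0                       ! REGION: first byte overwritten
    character(len=:), allocatable :: old_bytes  ! REGION: pre-image
    integer :: old_len = 0                      ! EXTEND: original file length
  end type undo_rec

contains

  subroutine replay(file, recs)
    character(len=:), allocatable, intent(inout) :: file
    type(undo_rec), intent(in) :: recs(:)
    character(len=:), allocatable :: kept
    integer :: k, first, last

    do k = size(recs), 1, -1                    ! newest record first
      select case (recs(k)%kind)
      case (REC_REGION)                         ! write the pre-image back
        first = recs(k)%offset
        last  = first + len(recs(k)%old_bytes) - 1
        file(first:last) = recs(k)%old_bytes
      case (REC_EXTEND)                         ! truncate appended bytes away
        kept = file(1:recs(k)%old_len)
        call move_alloc(kept, file)
      end select
    end do
  end subroutine replay

end module undo_sketch

program undo_demo
  use undo_sketch
  implicit none
  character(len=:), allocatable :: file
  type(undo_rec) :: recs(3)

  file = 'AAAABBBB'
  ! Each record is captured before the write it protects.
  recs(1)%kind = REC_REGION; recs(1)%offset = 1; recs(1)%old_bytes = file(1:4)
  file(1:4) = 'xxxx'
  recs(2)%kind = REC_REGION; recs(2)%offset = 3; recs(2)%old_bytes = file(3:6)
  file(3:6) = 'yyyy'
  recs(3)%kind = REC_EXTEND; recs(3)%old_len = len(file)
  file = file // 'CCCC'

  call replay(file, recs)
  print '(a)', file
end program undo_demo
```

The second record's pre-image already contains bytes from the first write, so replaying oldest-first would leave stale data behind. Newest-first restores the original AAAABBBB.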
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. Reports
.false. for an absent, voided or unreadable journal.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | col_names(:) | Unique index member columns |
| character(len=*) | intent(in) | | | keyrow | Row-shaped buffer holding the key columns |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
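The reason db_open cannot detect a still-open handle follows from how Fortran treats an intent(out) argument whose type has default initialisation: the components are reset on entry. The sketch below shows that with a hypothetical handle_t and open_handle standing in for db_t and db_open.

```fortran
module handle_sketch
  implicit none

  type :: handle_t
    logical :: is_open = .false.
    integer :: unit    = -1
  end type handle_t

contains

  subroutine open_handle(h, saw_open)
    type(handle_t), intent(out) :: h
    logical, intent(out) :: saw_open
    ! h has already been reset to its defaults here, so any unit number
    ! left over from a previous open is no longer visible.
    saw_open  = h%is_open
    h%is_open = .true.
    h%unit    = 10          ! stand-in for a freshly opened unit
  end subroutine open_handle

end module handle_sketch

program handle_demo
  use handle_sketch
  implicit none
  type(handle_t) :: h
  logical :: saw_open

  call open_handle(h, saw_open)   ! first open
  call open_handle(h, saw_open)   ! reopen without closing in between
  print *, 'second open saw an open handle: ', saw_open
end program handle_demo
```

The second call reports .false. even though the handle was open, which is why closing first is the caller's responsibility.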
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
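The renumbering can be illustrated with a mapping from old to new row ids. This is a sketch that assumes survivors keep their relative order; renumber is a hypothetical helper, and a 0 in the result marks a row dropped by the compaction.

```fortran
module compact_sketch
  implicit none
contains

  ! new_id(k) is the id row k holds after compaction, or 0 if it was dropped.
  pure function renumber(alive) result(new_id)
    logical, intent(in) :: alive(:)
    integer :: new_id(size(alive))
    integer :: k, next
    next   = 0
    new_id = 0
    do k = 1, size(alive)
      if (alive(k)) then
        next = next + 1
        new_id(k) = next
      end if
    end do
  end function renumber

end module compact_sketch

program compact_demo
  use compact_sketch
  implicit none
  logical :: alive(6)

  alive = [.true., .false., .true., .true., .false., .true.]
  print '(a,6i3)', 'old row_id:', 1, 2, 3, 4, 5, 6
  print '(a,6i3)', 'new row_id:', renumber(alive)
end program compact_demo
```

Here old rows 3, 4 and 6 become 2, 3 and 4, so a row_id cached before the call would now address a different row or none at all.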
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema is rewritten
back-to-back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | col_name | Indexed column |
| integer(kind=int32) | intent(in) | | | key | Value to match |
| integer(kind=int32) | intent(out) | | | row_id | Matched row id (0 if none) |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
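The undo mechanics described above (log the pre-image, mutate, replay in reverse) can be sketched in memory. This is only a model under stated assumptions: a character string stands in for one base file, and the undo_t type with its field names is hypothetical rather than the module's actual record layout. It shows a REGION record restoring overwritten bytes and an EXTEND record truncating appended ones.

```fortran
program undo_journal_demo
  implicit none
  integer, parameter :: K_REGION = 1, K_EXTEND = 2

  ! Hypothetical undo record; the real record layout is module-private.
  type :: undo_t
    integer :: rtype   = 0
    integer :: offset  = 0                    ! 1-based start (REGION)
    integer :: old_len = 0                    ! original length (EXTEND)
    character(len=:), allocatable :: bytes    ! pre-image (REGION)
  end type undo_t

  character(len=:), allocatable :: image      ! stands in for one base file
  type(undo_t) :: undo(8)
  integer :: n, i, lo, hi

  image = 'AAAABBBB'
  n = 0

  ! Log the pre-image first, then overwrite in place.
  n = n + 1
  undo(n)%rtype  = K_REGION
  undo(n)%offset = 3
  undo(n)%bytes  = image(3:5)
  image(3:5) = 'xyz'

  ! Log the original length first, then append.
  n = n + 1
  undo(n)%rtype   = K_EXTEND
  undo(n)%old_len = len(image)
  image = image // 'CCCC'

  ! Roll back: replay the undo records newest-first.
  do i = n, 1, -1
    select case (undo(i)%rtype)
    case (K_REGION)
      lo = undo(i)%offset
      hi = lo + len(undo(i)%bytes) - 1
      image(lo:hi) = undo(i)%bytes
    case (K_EXTEND)
      image = image(1:undo(i)%old_len)
    end select
  end do

  print '(a)', image     ! the pre-transaction content, AAAABBBB
end program undo_journal_demo
```

Reverse order matters once records interact: an overwrite that lands inside appended bytes has to be undone before the truncation that removes them.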
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | col_name | Indexed column |
| real(kind=real64) | intent(in) | | | key | Value to match (exact) |
| integer(kind=int32) | intent(out) | | | row_id | Matched row id (0 if none) |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed close is the point at which
recent data can be lost; the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
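To see why a held row_id cannot survive a compaction, here is a small standalone model of the renumbering. It assumes survivors keep their relative order, which the text above does not state explicitly; the alive array and all names are illustrative only.

```fortran
program compact_demo
  implicit none
  ! Hypothetical table of six slots; .false. marks a tombstoned row.
  logical, parameter :: alive(6) = &
      [.true., .false., .true., .true., .false., .true.]
  integer :: new_id(6), old, next

  ! Survivors are renumbered 1..live_count; tombstones get no new id.
  next = 0
  new_id = 0
  do old = 1, size(alive)
    if (alive(old)) then
      next = next + 1
      new_id(old) = next
    end if
  end do

  do old = 1, size(alive)
    if (new_id(old) /= 0) then
      print '(a,i0,a,i0)', 'old row_id ', old, ' -> new row_id ', new_id(old)
    else
      print '(a,i0,a)', 'old row_id ', old, ' dropped'
    end if
  end do
end program compact_demo
```

In this model old row 3 becomes row 2 and old row 6 becomes row 4, so a caller still holding row_id 3 would now address a different record.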
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is
rewritten immediately afterwards; a hard crash strictly between
those two steps is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
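The validate-everything-then-write rule can be illustrated with a toy model. This sketch assumes a single unique integer key and uses an allocatable array in place of the index; insert_many and the id assignment are hypothetical and cover only the uniqueness checks, not the NULL or NaN handling.

```fortran
program bulk_demo
  implicit none
  integer, allocatable :: index_keys(:)    ! stands in for a unique index
  integer :: row_ids(3)
  logical :: ok

  index_keys = [1, 2, 3]

  call insert_many([4, 5, 4], row_ids, ok) ! duplicate inside the batch
  print *, 'accepted:', ok, ' keys stored:', size(index_keys)

  call insert_many([4, 5, 6], row_ids, ok) ! clean batch
  print *, 'accepted:', ok, ' keys stored:', size(index_keys)

contains

  subroutine insert_many(batch, ids, ok)
    integer, intent(in)  :: batch(:)
    integer, intent(out) :: ids(:)
    logical, intent(out) :: ok
    integer :: k, base

    ids = 0
    ok = .false.

    ! Pass 1: validate every row; nothing has been written yet.
    do k = 1, size(batch)
      if (any(index_keys == batch(k))) return     ! clashes with a stored key
      if (any(batch(1:k-1) == batch(k))) return   ! clashes inside the batch
    end do

    ! Pass 2: only now write the whole batch and hand out ids.
    base = size(index_keys)
    index_keys = [index_keys, batch]
    do k = 1, size(batch)
      ids(k) = base + k
    end do
    ok = .true.
  end subroutine insert_many

end program bulk_demo
```

The first call is rejected with the index still holding three keys and every id left at 0; the second is accepted and the index grows to six.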
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
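The difference between bit-for-bit equality and a tolerance band can be checked with a standalone program. It makes no library calls; same_bits is a hypothetical helper that compares the raw 64-bit patterns, and the 1e-12 band is an arbitrary illustrative tolerance. The results noted in the comments assume IEEE 754 double precision.

```fortran
program real_key_demo
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  real(real64) :: stored, same, recomputed, a, b

  stored = 0.3_real64
  same   = stored                  ! the value that was inserted
  a = 0.1_real64
  b = 0.2_real64
  recomputed = a + b               ! differs from 0.3 in the last bit

  print *, 'same value matches:     ', same_bits(stored, same)        ! T
  print *, 'recomputed sum matches: ', same_bits(stored, recomputed)  ! F
  print *, 'inside a 1e-12 band:    ', &
      abs(recomputed - stored) <= 1.0e-12_real64                      ! T

contains

  ! Compare the raw bit patterns, with no epsilon involved.
  logical function same_bits(x, y)
    real(real64), intent(in) :: x, y
    same_bits = transfer(x, 0_int64) == transfer(y, 0_int64)
  end function same_bits

end program real_key_demo
```

The last line is the shape of the range query the text recommends: bracket the target with a lower and an upper bound instead of asking for equality.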
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
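The pull protocol (position once, then call next until ok is .false.) can be sketched against a sorted array standing in for the index. Everything here is a model: cursor_t, find_range and cursor_next are local stand-ins written for this example, not the module's db_cursor_t or its procedures.

```fortran
program cursor_demo
  implicit none

  type :: cursor_t
    integer :: pos = 1     ! next index entry to examine
    integer :: hi  = 0     ! inclusive upper bound of the band
  end type cursor_t

  ! Hypothetical index content: keys ascending, each with its row id.
  integer, parameter :: keys(6)    = [10, 20, 20, 35, 50, 70]
  integer, parameter :: row_ids(6) = [4, 1, 6, 2, 5, 3]

  type(cursor_t) :: cur
  integer :: rid
  logical :: ok

  call find_range(20, 50, cur)      ! inclusive band [20, 50]
  do
    call cursor_next(cur, rid, ok)
    if (.not. ok) exit              ! exhausted: upper bound passed
    print '(a,i0)', 'row_id ', rid
  end do

contains

  ! Position the cursor on the first key >= lo.
  subroutine find_range(lo, hi, cur)
    integer, intent(in) :: lo, hi
    type(cursor_t), intent(out) :: cur
    cur%hi  = hi
    cur%pos = 1
    do while (cur%pos <= size(keys))
      if (keys(cur%pos) >= lo) exit
      cur%pos = cur%pos + 1
    end do
  end subroutine find_range

  ! Yield the entry under the cursor and advance, or report exhaustion.
  subroutine cursor_next(cur, rid, ok)
    type(cursor_t), intent(inout) :: cur
    integer, intent(out) :: rid
    logical, intent(out) :: ok
    rid = 0
    ok  = .false.
    if (cur%pos > size(keys)) return
    if (keys(cur%pos) > cur%hi) return
    rid = row_ids(cur%pos)
    ok  = .true.
    cur%pos = cur%pos + 1
  end subroutine cursor_next

end program cursor_demo
```

With this data the loop yields row ids 1, 6, 2 and 5, then stops on the key 70. Exhaustion is signalled through ok alone, which mirrors the documented convention that stat stays SQR_OK at the end of the band.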
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
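The NULL bitmap and its effect on index membership can be modelled with the standard bit intrinsics. This is a sketch under assumptions: a single 32-bit word as the bitmap and the bit positions col_a / col_b are invented for illustration, and the real row layout may differ.

```fortran
program null_bitmap_demo
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  ! Hypothetical layout: one 32-bit word, bit k set means column k is NULL.
  integer(int32) :: bitmap
  integer, parameter :: col_a = 0, col_b = 3   ! hypothetical null_bit values

  bitmap = 0_int32

  bitmap = ibset(bitmap, col_b)                ! mark col_b NULL
  print *, 'col_b NULL after set:   ', btest(bitmap, col_b)
  print *, 'indexable on (a, b):    ', indexable(bitmap, [col_a, col_b])

  bitmap = ibclr(bitmap, col_b)                ! clear the NULL bit
  print *, 'col_b NULL after clear: ', btest(bitmap, col_b)
  print *, 'indexable on (a, b):    ', indexable(bitmap, [col_a, col_b])

contains

  ! A row enters an index only if none of the member columns is NULL.
  logical function indexable(bits, members)
    integer(int32), intent(in) :: bits
    integer, intent(in) :: members(:)
    integer :: k
    indexable = .true.
    do k = 1, size(members)
      if (btest(bits, members(k))) indexable = .false.
    end do
  end function indexable

end program null_bitmap_demo
```

While col_b is NULL the row is left out of the composite index on (a, b); clearing the bit makes it eligible again.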
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | col_name | Indexed column |
| character(len=*) | intent(in) | | | key | Value to match |
| integer(kind=int32) | intent(out) | | | row_id | Matched row id (0 if none) |
| integer | intent(out) | optional | | stat | |
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| character(len=*) | intent(in) | | | table_name | Target table |
| character(len=*) | intent(in) | | | col_name | Indexed column to order by |
| type(db_cursor_t) | intent(out) | | | cur | Positioned cursor |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards. A hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
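
The validate-everything-then-write rule can be sketched on a plain array of unique integer keys. The names batch_demo and insert_batch are hypothetical, and the sketch covers only the uniqueness check, not the NULL-member or NaN rules.

```fortran
module batch_demo
  implicit none
contains
  !> Append batch to keys only if no batch key collides with an existing
  !> key or with an earlier key in the same batch. On a collision ok is
  !> .false. and keys is left exactly as it was.
  subroutine insert_batch(keys, batch, ok)
    integer, allocatable, intent(inout) :: keys(:)
    integer, intent(in) :: batch(:)
    logical, intent(out) :: ok
    integer :: k
    ok = .false.
    if (.not. allocated(keys)) allocate(keys(0))
    ! Phase 1: validate every row before anything is written.
    do k = 1, size(batch)
      if (any(keys == batch(k))) return          ! clashes with existing data
      if (any(batch(1:k-1) == batch(k))) return  ! clashes inside the batch
    end do
    ! Phase 2: every row passed, so the write is all-or-nothing.
    keys = [keys, batch]
    ok = .true.
  end subroutine insert_batch
end module batch_demo
```

Because the write happens only after the whole loop has passed, a rejected batch needs no cleanup.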
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
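
The difference between bit-for-bit equality and tolerance matching can be shown with standard intrinsics alone. The helper names below (real_key_demo, same_bits, in_band) are illustrative and not part of the library.

```fortran
module real_key_demo
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
contains
  !> .true. only when a and b have identical 64-bit patterns (no epsilon).
  pure logical function same_bits(a, b)
    real(real64), intent(in) :: a, b
    same_bits = transfer(a, 0_int64) == transfer(b, 0_int64)
  end function same_bits

  !> Tolerance match written as an inclusive band [centre-tol, centre+tol].
  pure logical function in_band(x, centre, tol)
    real(real64), intent(in) :: x, centre, tol
    in_band = (x >= centre - tol) .and. (x <= centre + tol)
  end function in_band
end module real_key_demo
```

Here same_bits(0.3_real64, 0.1_real64 + 0.2_real64) is false in IEEE double precision, while in_band with a small tol accepts the recomputed value. That is the case a range query is meant for.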
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
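
The pull protocol (keep calling until ok is .false.) can be modelled on a sorted key array standing in for the index. This is a sketch under that assumption: cursor_demo, cursor_t, open_range and cursor_next are hypothetical names, not the library's db_cursor_t or db_cursor_next.

```fortran
module cursor_demo
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  type :: cursor_t
    integer :: pos = 1          ! next position to examine in the key array
    integer(int32) :: hi = 0    ! inclusive upper bound of the band
  end type cursor_t
contains
  !> Position a cursor at the first key >= lo; it stops after the last key <= hi.
  pure function open_range(keys, lo, hi) result(cur)
    integer(int32), intent(in) :: keys(:)   ! ascending, like index order
    integer(int32), intent(in) :: lo, hi
    type(cursor_t) :: cur
    cur%hi = hi
    cur%pos = 1
    do while (cur%pos <= size(keys))
      if (keys(cur%pos) >= lo) exit
      cur%pos = cur%pos + 1
    end do
  end function open_range

  !> Yield the key at the cursor and advance past it.
  !> ok = .false. once the band is exhausted; key is then 0.
  pure subroutine cursor_next(keys, cur, key, ok)
    integer(int32), intent(in) :: keys(:)
    type(cursor_t), intent(inout) :: cur
    integer(int32), intent(out) :: key
    logical, intent(out) :: ok
    key = 0
    ok = .false.
    if (cur%pos > size(keys)) return
    if (keys(cur%pos) > cur%hi) return
    key = keys(cur%pos)
    cur%pos = cur%pos + 1
    ok = .true.
  end subroutine cursor_next
end module cursor_demo
```

A caller loops on cursor_next and leaves the loop when ok comes back .false.. Exhaustion is a normal outcome rather than an error.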
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
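
A NULL bitmap of this kind can be sketched with the standard bit intrinsics. The layout is an assumption made for illustration (one bit per column, column k at bit k-1 of a single int32), and null_bitmap_demo, set_null, clear_null, is_null and indexable are hypothetical names. The library's actual row layout is not shown here.

```fortran
module null_bitmap_demo
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
contains
  !> Mark column col (1-based) as NULL.
  pure subroutine set_null(bitmap, col)
    integer(int32), intent(inout) :: bitmap
    integer, intent(in) :: col
    bitmap = ibset(bitmap, col - 1)
  end subroutine set_null

  !> Mark column col as carrying a value.
  pure subroutine clear_null(bitmap, col)
    integer(int32), intent(inout) :: bitmap
    integer, intent(in) :: col
    bitmap = ibclr(bitmap, col - 1)
  end subroutine clear_null

  !> .true. if column col is NULL.
  pure logical function is_null(bitmap, col)
    integer(int32), intent(in) :: bitmap
    integer, intent(in) :: col
    is_null = btest(bitmap, col - 1)
  end function is_null

  !> A row enters an index only if none of the member columns is NULL.
  pure logical function indexable(bitmap, members)
    integer(int32), intent(in) :: bitmap
    integer, intent(in) :: members(:)
    integer :: k
    indexable = .true.
    do k = 1, size(members)
      if (btest(bitmap, members(k) - 1)) indexable = .false.
    end do
  end function indexable
end module null_bitmap_demo
```

With column 2 marked NULL, indexable(bitmap, [1, 3]) stays true while indexable(bitmap, [1, 2]) is false. This mirrors the rule that a row with any NULL index member is simply not in that index.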
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
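
The NUL-pad, truncate and read-to-first-NUL behaviour described for DT_CHAR slots can be reproduced with ordinary character intrinsics. The names char_slot_demo, pack_char and unpack_char are illustrative stand-ins, not the library's row_set_char and row_get_char.

```fortran
module char_slot_demo
  implicit none
contains
  !> Copy s into a slot of the given width: truncate if longer,
  !> pad with NUL (achar(0)) if shorter.
  pure function pack_char(s, width) result(slot)
    character(len=*), intent(in) :: s
    integer, intent(in) :: width
    character(len=width) :: slot
    integer :: n
    n = min(len(s), width)
    slot = repeat(achar(0), width)
    slot(1:n) = s(1:n)
  end function pack_char

  !> Return the slot contents up to, but not including, the first NUL.
  pure function unpack_char(slot) result(s)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: s
    integer :: k
    k = index(slot, achar(0))
    if (k == 0) then
      s = slot            ! slot is completely full
    else
      s = slot(1:k-1)
    end if
  end function unpack_char
end module char_slot_demo
```

Padding with NUL rather than blanks means a value that is full width round-trips without a terminator, and trailing blanks in a shorter value are preserved as data.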
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
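
The two undo record kinds and the reverse-order restore can be modelled in memory, with a character string standing in for a base file. This is a simplified sketch, not the library's journal. All names (undo_demo, undo_t, journal_t, log_region, log_extend, rollback) and the fixed capacities are assumptions. It also omits arming, checksums and fsync.

```fortran
module undo_demo
  implicit none
  integer, parameter :: KIND_REGION = 1, KIND_EXTEND = 2
  integer, parameter :: MAX_SPAN = 64, MAX_RECS = 32  ! fixed capacities keep the sketch short

  type :: undo_t
    integer :: kind = 0
    integer :: offset = 0              ! REGION: 1-based start of the overwritten span
    integer :: nbytes = 0              ! REGION: number of original bytes kept
    character(len=MAX_SPAN) :: bytes   ! REGION: the original bytes themselves
    integer :: old_len = 0             ! EXTEND: file length before the append
  end type undo_t

  type :: journal_t
    integer :: n = 0
    type(undo_t) :: recs(MAX_RECS)
  end type journal_t
contains
  !> Record the bytes about to be overwritten (call before the overwrite).
  subroutine log_region(j, file, offset, length)
    type(journal_t), intent(inout) :: j
    character(len=*), intent(in) :: file
    integer, intent(in) :: offset, length
    j%n = j%n + 1
    j%recs(j%n)%kind = KIND_REGION
    j%recs(j%n)%offset = offset
    j%recs(j%n)%nbytes = length
    j%recs(j%n)%bytes = file(offset:offset + length - 1)
  end subroutine log_region

  !> Record the current length (call before appending).
  subroutine log_extend(j, file)
    type(journal_t), intent(inout) :: j
    character(len=*), intent(in) :: file
    j%n = j%n + 1
    j%recs(j%n)%kind = KIND_EXTEND
    j%recs(j%n)%old_len = len(file)
  end subroutine log_extend

  !> Undo every record, newest first, then empty the undo set.
  subroutine rollback(j, file)
    type(journal_t), intent(inout) :: j
    character(len=:), allocatable, intent(inout) :: file
    integer :: k, first, last
    do k = j%n, 1, -1
      select case (j%recs(k)%kind)
      case (KIND_REGION)
        first = j%recs(k)%offset
        last = first + j%recs(k)%nbytes - 1
        file(first:last) = j%recs(k)%bytes(1:j%recs(k)%nbytes)
      case (KIND_EXTEND)
        file = file(1:j%recs(k)%old_len)   ! truncate the appended bytes away
      end select
    end do
    j%n = 0
  end subroutine rollback
end module undo_demo
```

Logging a region, overwriting it, logging an extend, appending, and then calling rollback returns the string to its original bytes and length. An EXTEND record needs no payload because truncation alone undoes an append.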
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform.
Returns .false. for an absent, voided or unreadable journal.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| type(db_cursor_t) | intent(inout) | | | cur | Cursor (advanced) |
| integer(kind=int32) | intent(out) | | | row_id | Yielded row id (0 if none) |
| character(len=*) | intent(out) | | | buf | Receives the record buffer |
| logical | intent(out) | | | ok | |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
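
Why the callee cannot see the old handle follows from standard Fortran semantics: a dummy argument with intent(out) whose type has default initialisation is re-initialised on entry. The sketch below shows this with a one-component stand-in. handle_demo, handle_t and open_handle are hypothetical, and the real db_t is considerably larger.

```fortran
module handle_demo
  implicit none
  type :: handle_t
    integer :: unit = -1     ! default: no file attached
  end type handle_t
contains
  !> intent(out) resets default-initialised components before the body
  !> runs, so seen_on_entry is -1 whatever the caller's handle held.
  subroutine open_handle(h, seen_on_entry)
    type(handle_t), intent(out) :: h
    integer, intent(out) :: seen_on_entry
    seen_on_entry = h%unit
    h%unit = 7               ! pretend a fresh unit was opened
  end subroutine open_handle
end module handle_demo
```

A caller that sets h%unit = 42 and then calls open_handle gets seen_on_entry = -1: the old unit number is gone before the routine can close it. This is the leak the contract warns about.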
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=:) | intent(out) | | allocatable | buf | Allocated, zero-filled buffer |
| integer | intent(in) | | | n | Buffer size in bytes |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
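The difference between bit-for-bit equality and tolerance matching is easy to demonstrate. The following Python sketch compares IEEE 754 double bit patterns directly; the helper names `bits` and `within` are illustrative and not part of the library.

```python
import struct

def bits(x: float) -> bytes:
    """Raw 8-byte little-endian image of a double, as a binary store would keep it."""
    return struct.pack("<d", x)

stored = 0.3
recomputed = 0.1 + 0.2                     # a different double than 0.3

exact_match = bits(0.3) == bits(stored)            # same value in, same bits out
recomputed_match = bits(recomputed) == bits(stored)

def within(lo: float, hi: float, values):
    """Tolerance matching expressed as an inclusive range query."""
    return [v for v in values if lo <= v <= hi]

tol = 1e-9
hits = within(recomputed - tol, recomputed + tol, [stored])
```

Here `exact_match` is true and `recomputed_match` is false, while the range query around the recomputed value still finds the stored 0.3.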
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
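The pull protocol can be sketched over a sorted list of index entries. This is a toy Python model under stated assumptions: the `Cursor` class, its `(key, row_id)` entry layout and the `(ok, row_id)` return pair are illustrative stand-ins, not the library's types.

```python
import bisect

class Cursor:
    def __init__(self, entries, lo=None, hi=None):
        # entries: list of (key, row_id) sorted ascending, standing in for the index
        self.entries = entries
        self.hi = hi
        # (lo,) sorts before every (lo, row_id), so this lands on the first key >= lo
        self.pos = 0 if lo is None else bisect.bisect_left(entries, (lo,))

    def next(self):
        """Return (ok, row_id); ok is False once the index or the band is exhausted."""
        if self.pos >= len(self.entries):
            return False, None
        key, row_id = self.entries[self.pos]
        if self.hi is not None and key > self.hi:   # passed the inclusive upper bound
            return False, None
        self.pos += 1
        return True, row_id
```

A caller loops on `next()` until `ok` is false; exhaustion is a normal outcome, not an error, which matches the `stat == SQR_OK` behaviour described above.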
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
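The three NULL-bitmap operations and their effect on index membership can be sketched as follows. This is a Python illustration only: the bit layout (least significant bit first within each byte) and the helper names are assumptions, since the text does not specify them.

```python
def set_null(bitmap: bytearray, null_bit: int) -> None:
    bitmap[null_bit // 8] |= 1 << (null_bit % 8)

def clear_null(bitmap: bytearray, null_bit: int) -> None:
    bitmap[null_bit // 8] &= ~(1 << (null_bit % 8)) & 0xFF

def is_null(bitmap: bytearray, null_bit: int) -> bool:
    return bool(bitmap[null_bit // 8] >> (null_bit % 8) & 1)

def index_key(bitmap, member_bits, member_bytes):
    """Concatenate member column bytes in declared order, or None if any member is NULL."""
    if any(is_null(bitmap, b) for b in member_bits):
        return None                      # the row is simply not in this index
    return b"".join(member_bytes)
```

A row whose `index_key` is `None` would be skipped by index maintenance, which is why NULL-member rows never appear in cursor output.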
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
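The store and read rules for a fixed-width character slot can be modelled directly. In this Python sketch the function names are hypothetical and ASCII encoding is an assumption; only the NUL-padding, truncation and read-to-first-NUL behaviour comes from the descriptions above.

```python
def set_char(value: str, width: int) -> bytes:
    """Truncate to the column width, then pad with NUL bytes up to that width."""
    raw = value.encode("ascii")[:width]
    return raw.ljust(width, b"\x00")

def get_char(slot: bytes) -> str:
    """Read back everything before the first NUL (the whole slot if there is none)."""
    return slot.split(b"\x00", 1)[0].decode("ascii")
```

Padding a lookup key the same way before comparison is what lets an equality search on a DT_CHAR column match the stored bytes exactly.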
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
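The no-nesting rule and the interaction with auto-commit can be pictured as a two-flag state machine. This Python sketch tracks state only; the `Handle` class and `auto_commit_write` are hypothetical names, and the real procedures also drive the journal.

```python
class Handle:
    def __init__(self):
        self.in_flight = False    # some transaction is open
        self.user_owned = False   # it was opened explicitly by the caller

    def db_begin(self):
        if self.in_flight:                    # no nesting
            return "SQR_INVALID"
        self.in_flight = self.user_owned = True
        return "SQR_OK"

    def _end_explicit(self):
        if not (self.in_flight and self.user_owned):
            return "SQR_INVALID"              # nothing explicit to end
        self.in_flight = self.user_owned = False
        return "SQR_OK"

    def db_commit(self):
        return self._end_explicit()

    def db_rollback(self):
        return self._end_explicit()

    def auto_commit_write(self):
        """One mutating call: brackets itself unless the caller owns a transaction."""
        own = not self.in_flight
        if own:
            self.in_flight = True
        # ... the mutation itself would run here ...
        if own:
            self.in_flight = False
        return "SQR_OK"
```

Inside an explicit transaction, `auto_commit_write` leaves `in_flight` set, so several writes share one commit or rollback.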
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
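The begin, log, arm, commit and rollback steps described above fit together as an undo log. The following Python sketch is a toy in-memory model, with bytearrays standing in for base files; the `Journal` class and its method names are illustrative, and the real journal additionally serialises records with a checksum and calls fsync.

```python
class Journal:
    def __init__(self, files):
        self.files = files      # path -> bytearray, stands in for base files
        self.undo = []          # in-memory undo set
        self.hot = False

    def begin(self):
        self.undo.clear()
        self.hot = False

    def log_region(self, path, offset, length):
        rec = ("REGION", path, offset, length)
        if any(r[:4] == rec for r in self.undo):
            return              # idempotent per (path, offset, length)
        old = bytes(self.files[path][offset:offset + length])
        self.undo.append(rec + (old,))

    def log_extend(self, path):
        if any(r[0] == "EXTEND" and r[1] == path for r in self.undo):
            return              # idempotent per path
        self.undo.append(("EXTEND", path, len(self.files[path])))

    def arm(self):
        self.hot = True         # from here on, base-file writes are allowed

    def commit(self):
        self.hot = False        # invalidating the journal is the commit point
        self.undo.clear()

    def rollback(self):
        for rec in reversed(self.undo):
            if rec[0] == "REGION":
                _, path, off, length, old = rec
                self.files[path][off:off + length] = old   # write pre-image back
            else:
                _, path, old_len = rec
                del self.files[path][old_len:]             # truncate appended bytes
        self.hot = False
        self.undo.clear()
```

The ordering is the point of the design: every pre-image is captured and the journal armed before any base file changes, so whatever state a failure leaves behind can be undone.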
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. Reports
.false. for an absent, voided or unreadable journal.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(inout) | | | buf | Buffer to clear |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. The optional
stat reports the first flush failure. Schema counters are
persisted only here, so a flush failure at close is where recent
changes are lost; the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction: every
surviving row is renumbered, so any row_id a caller holds across
this call is invalid afterward. For a handle that survives
compaction, use the natural-key calls (db_get_by_key and friends).
Requires a read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
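Why row ids do not survive compaction is easiest to see on a small example. This Python sketch models only the renumbering and index rebuild; the `compact` function and its data shapes are assumptions, and blob copying and the build-then-swap file handling are left out.

```python
def compact(rows):
    """rows: list of (alive, key) in row-id order, so row_id = position + 1."""
    survivors = [key for alive, key in rows if alive]    # drop tombstoned rows
    new_rows = [(True, key) for key in survivors]        # renumbered 1..live_count
    new_index = {key: rid for rid, key in enumerate(survivors, start=1)}
    return new_rows, new_index

before = [(True, "a"), (False, "b"), (True, "c")]        # "c" has row_id 3
after, index = compact(before)                           # "c" now has row_id 2
```

A caller that held row id 3 for "c" would now be pointing past the end of the table, whereas a lookup by the key "c" still resolves.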
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(inout) | | | buf | Row buffer |
| integer(kind=int8) | intent(in) | | | s | New status byte |
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(inout) | | | buf | Row buffer |
| type(column_t) | intent(in) | | | col | Column to mark NULL |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
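The renumbering rule can be pictured with a small standalone sketch. It is not the library's code: `renumber_live` is a hypothetical helper, and it assumes survivors keep their relative order, which this reference does not state. It only shows why a row_id held across a compaction stops being valid.

```fortran
module compact_sketch
  implicit none
  private
  public :: renumber_live
contains
  ! Map each old row_id to its id after compaction.
  ! alive(i) is .true. when old row i is live; tombstoned rows map to 0.
  ! Hypothetical helper: models the renumbering only, not the file rewrite,
  ! the blob copy or the index rebuild.
  subroutine renumber_live(alive, new_id, live_count)
    logical, intent(in)  :: alive(:)
    integer, intent(out) :: new_id(size(alive))
    integer, intent(out) :: live_count
    integer :: i

    live_count = 0
    do i = 1, size(alive)
      if (alive(i)) then
        live_count = live_count + 1
        new_id(i) = live_count     ! survivors are numbered 1..live_count
      else
        new_id(i) = 0              ! dropped: no id after compaction
      end if
    end do
  end subroutine renumber_live
end module compact_sketch
```

With rows 1, 3 and 4 live out of five, old row 3 becomes row 2 and old row 4 becomes row 3, so any cached id other than 1 now names a different row or none.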
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
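A rough sketch of the validate-before-write idea for a single int32 unique key follows. It is an illustration under assumptions, not the library's routine: `batch_is_unique` is a hypothetical name, the existing keys are passed as a plain array rather than probed in the B+-tree, and the NaN check (which applies to real keys) is left out.

```fortran
module batch_check_sketch
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  private
  public :: batch_is_unique
contains
  ! .true. when no non-NULL key in the batch collides with an existing key
  ! or with an earlier key in the same batch. Rows flagged is_null are
  ! skipped, mirroring the rule that NULL-member rows are not indexed.
  ! Quadratic scan for clarity; a real index would probe the tree instead.
  pure function batch_is_unique(existing, batch, is_null) result(ok)
    integer(int32), intent(in) :: existing(:)
    integer(int32), intent(in) :: batch(:)
    logical,        intent(in) :: is_null(size(batch))
    logical :: ok
    integer :: i, j

    ok = .false.
    do i = 1, size(batch)
      if (is_null(i)) cycle
      if (any(existing == batch(i))) return          ! clashes with stored key
      do j = 1, i - 1
        if (.not. is_null(j) .and. batch(j) == batch(i)) return  ! clash in batch
      end do
    end do
    ok = .true.
  end function batch_is_unique
end module batch_check_sketch
```

Running the whole check first and writing only when it returns .true. is what lets a violation reject the batch with nothing inserted.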
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
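The difference between bit-for-bit equality and tolerance matching can be seen in a standalone sketch. The helper names `same_bits` and `in_band` are hypothetical and do not call the library; they only model the comparison semantics described above.

```fortran
module real_key_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  private
  public :: same_bits, in_band
contains
  ! Bit-for-bit comparison: reinterpret both values as 64-bit integers.
  ! Unlike ==, this also tells +0.0 from -0.0.
  pure logical function same_bits(a, b)
    real(real64), intent(in) :: a, b
    same_bits = transfer(a, 0_int64) == transfer(b, 0_int64)
  end function same_bits

  ! Tolerance matching expressed as an inclusive band [x - eps, x + eps],
  ! i.e. the kind of [lo, hi] bounds a range query would take.
  pure logical function in_band(stored, x, eps)
    real(real64), intent(in) :: stored, x, eps
    in_band = stored >= x - eps .and. stored <= x + eps
  end function in_band
end module real_key_sketch
```

Here `same_bits(0.1_real64 + 0.2_real64, 0.3_real64)` is .false. on IEEE hardware, while `in_band(0.3_real64, 0.1_real64 + 0.2_real64, 1.0e-12_real64)` is .true., which is why a recomputed key belongs in a band lookup.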
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
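The three NULL-bitmap helpers above amount to setting, clearing and testing one bit in the row buffer. The sketch below is a guess at the mechanics, not the library's layout: the bitmap position (`BITMAP_START`), the 0-based bit numbering and the procedure names are all assumptions made for illustration; the real offsets come from the derived table layout.

```fortran
module null_bitmap_sketch
  implicit none
  private
  public :: set_null, clear_null, is_null
  ! Assumed layout: byte 1 stands in for the status byte and the bitmap
  ! starts at byte 2. Not taken from the library.
  integer, parameter :: BITMAP_START = 2
contains
  ! Mark the column with the given (0-based) null bit as NULL.
  subroutine set_null(buf, null_bit)
    character(len=*), intent(inout) :: buf
    integer, intent(in) :: null_bit
    integer :: pos
    pos = BITMAP_START + null_bit / 8
    buf(pos:pos) = char(ibset(ichar(buf(pos:pos)), mod(null_bit, 8)))
  end subroutine set_null

  ! Clear the bit: the column now counts as carrying a value.
  subroutine clear_null(buf, null_bit)
    character(len=*), intent(inout) :: buf
    integer, intent(in) :: null_bit
    integer :: pos
    pos = BITMAP_START + null_bit / 8
    buf(pos:pos) = char(ibclr(ichar(buf(pos:pos)), mod(null_bit, 8)))
  end subroutine clear_null

  ! .true. when the bit is set.
  pure logical function is_null(buf, null_bit)
    character(len=*), intent(in) :: buf
    integer, intent(in) :: null_bit
    integer :: pos
    pos = BITMAP_START + null_bit / 8
    is_null = btest(ichar(buf(pos:pos)), mod(null_bit, 8))
  end function is_null
end module null_bitmap_sketch
```

Eight columns share one bitmap byte, so bit 9 lands in the second bitmap byte. An index-maintenance routine would test each member column this way and skip the row if any member is NULL.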
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
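The DT_CHAR slot convention (NUL-padded on write, read up to the first NUL) can be sketched without the library. `pad_nul` and `read_nul` are hypothetical names, and the sketch assumes the input is stored as given, trailing blanks included; whether the library trims blanks first is not stated here.

```fortran
module char_slot_sketch
  implicit none
  private
  public :: pad_nul, read_nul
contains
  ! Build a slot of exactly width bytes: s truncated to width,
  ! the remainder filled with NUL (character code 0).
  pure function pad_nul(s, width) result(slot)
    integer,          intent(in) :: width
    character(len=*), intent(in) :: s
    character(len=width) :: slot
    integer :: n
    n = min(len(s), width)
    slot = repeat(achar(0), width)
    slot(1:n) = s(1:n)
  end function pad_nul

  ! Return the bytes before the first NUL, or the whole slot when the
  ! value fills the column and no NUL is present.
  pure function read_nul(slot) result(s)
    character(len=*), intent(in) :: slot
    character(len=:), allocatable :: s
    integer :: k
    k = index(slot, achar(0))
    if (k == 0) then
      s = slot
    else
      s = slot(1:k-1)
    end if
  end function read_nul
end module char_slot_sketch
```

Padding the lookup key the same way before comparison is what lets an equality or band lookup on a DT_CHAR column match the stored bytes exactly.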
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
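The undo mechanics described above (a region record holding original bytes, an extend record holding only the original length, replayed in reverse) can be modelled in memory. This is a simplified sketch, not the journal's real format: the "file" is a character string, and the type and procedure names are hypothetical. It omits the header, checksum, fsync and the idempotence per path.

```fortran
module undo_sketch
  implicit none
  private
  public :: journal, log_region, log_extend, rollback
  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2

  type :: journal
    integer, allocatable :: tag(:), offset(:), length(:)
    character(len=:), allocatable :: pool   ! concatenated pre-images
  end type journal
contains
  subroutine push(j, tag, offset, length)
    type(journal), intent(inout) :: j
    integer, intent(in) :: tag, offset, length
    if (.not. allocated(j%tag)) then
      allocate(j%tag(0), j%offset(0), j%length(0))
      j%pool = ''
    end if
    j%tag    = [j%tag, tag]
    j%offset = [j%offset, offset]
    j%length = [j%length, length]
  end subroutine push

  ! Capture the bytes about to be overwritten (call before the write).
  subroutine log_region(j, file, offset, length)
    type(journal), intent(inout) :: j
    character(len=*), intent(in) :: file
    integer, intent(in) :: offset, length
    call push(j, REC_REGION, offset, length)
    j%pool = j%pool // file(offset:offset+length-1)
  end subroutine log_region

  ! Capture only the current length (call before appending).
  subroutine log_extend(j, file)
    type(journal), intent(inout) :: j
    character(len=*), intent(in) :: file
    call push(j, REC_EXTEND, 0, len(file))
  end subroutine log_extend

  ! Replay newest-first: write pre-images back, truncate appended bytes.
  subroutine rollback(j, file)
    type(journal), intent(inout) :: j
    character(len=:), allocatable, intent(inout) :: file
    integer :: i, n, pos
    if (.not. allocated(j%tag)) return
    pos = len(j%pool)
    do i = size(j%tag), 1, -1
      n = j%length(i)
      if (j%tag(i) == REC_REGION) then
        file(j%offset(i):j%offset(i)+n-1) = j%pool(pos-n+1:pos)
        pos = pos - n
      else
        file = file(1:n)
      end if
    end do
    deallocate(j%tag, j%offset, j%length, j%pool)
  end subroutine rollback
end module undo_sketch
```

Replaying in reverse matters when records overlap: a region logged inside bytes that a later record truncates, or two writes to the same span, only restore correctly if the newest record is undone first.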
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(inout) | | | buf | Row buffer |
| type(column_t) | intent(in) | | | col | Column to mark not-NULL |
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(inout) | | | buf | Row buffer |
| type(column_t) | intent(in) | | | col | Target |
| integer(kind=int32) | intent(in) | | | val | Value to store |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
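The slot bookkeeping can be modelled in a few lines. This is a Python sketch of the idea only, not the library's implementation: `IndexSlot`, `create_index`, `drop_index` and `live_slots` are hypothetical names, and file deletion is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexSlot:
    columns: List[str] = field(default_factory=list)

    @property
    def live(self) -> bool:
        # A dropped slot is marked by having no member columns (ncols = 0).
        return len(self.columns) > 0


def create_index(slots, col_names):
    """Append a fresh slot and return its 1-based slot number."""
    slots.append(IndexSlot(list(col_names)))
    return len(slots)


def drop_index(slots, col_names):
    """Tombstone the live slot whose columns match exactly; False if none."""
    for slot in slots:
        if slot.live and slot.columns == list(col_names):
            slot.columns = []  # tombstone in place; later slots keep their numbers
            return True
    return False


def live_slots(slots):
    """1-based numbers of the live slots, skipping tombstones."""
    return [i for i, s in enumerate(slots, start=1) if s.live]
```

Because a dropped slot is never removed from the list, a file named after slot 2 still belongs to slot 2 after slot 1 is dropped, and the next index created takes slot 3.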
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
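The validate-everything-first rule can be sketched as follows. This is a Python model under stated assumptions, not the library's code: `insert_batch` is a hypothetical name, the status values are made up, each row is reduced to a single key (None standing for a NULL member), and ids are assumed to be handed out sequentially from the first free id.

```python
import math

SQR_OK, SQR_DUP, SQR_INVALID = 0, 1, 2  # hypothetical numeric values


def insert_batch(index_keys, first_free_id, keys):
    """Validate the whole batch, then insert. Returns (stat, row_ids, first_free_id).

    index_keys: set of keys already in a unique index (mutated only on success).
    keys: one key per row; None means the row has a NULL index member.
    """
    rejected = [0] * len(keys)
    seen = set(index_keys)
    for k in keys:
        if k is None:
            continue  # NULL member: the row is stored but not indexed
        if isinstance(k, float) and math.isnan(k):
            return SQR_INVALID, rejected, first_free_id
        if k in seen:  # clashes with the existing index or with an earlier batch row
            return SQR_DUP, rejected, first_free_id
        seen.add(k)
    # Nothing has been written so far; only now does the batch take effect.
    row_ids = list(range(first_free_id, first_free_id + len(keys)))
    index_keys.update(k for k in keys if k is not None)
    return SQR_OK, row_ids, first_free_id + len(keys)
```

A rejected batch leaves both the index and the id counter exactly as they were, which is what lets the caller retry with a corrected batch.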
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
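The listed checks can be sketched over an in-memory model. This is a Python illustration, not the library's checker: `check_table` is a hypothetical name, a table is reduced to a dict of rows with one indexed key each, and next_id is assumed to be the first unassigned row id.

```python
def check_table(rows, live_count, next_id, index, unique):
    """Return None if consistent, else a message describing the first problem.

    rows:  {row_id: (alive, key)} with key None for a NULL index member.
    index: list of (key, row_id) entries.
    """
    live = {rid: key for rid, (alive, key) in rows.items() if alive}
    if len(live) != live_count:
        return "live-row recount does not match live_count"
    if rows and max(rows) >= next_id:
        return "next_id does not cover every written record"
    indexed = {rid for _, rid in index}
    for rid, key in live.items():
        if key is not None and rid not in indexed:
            return f"live row {rid} is missing from the index"
    for key, rid in index:
        if rid not in live or live[rid] != key:
            return f"index entry for row {rid} is stale"
    if unique:
        keys = [k for k, _ in index]
        if len(keys) != len(set(keys)):
            return "duplicate live key in a unique index"
    return None
```

The function only reads its arguments, mirroring the read-only nature of the check.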
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
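The difference between bit-for-bit equality and tolerance matching can be shown with a short Python sketch. The helper names `same_key` and `tolerance_band` are hypothetical, and comparing the packed 8-byte representation is used here only as a stand-in for the exact match described above.

```python
import struct


def same_key(a: float, b: float) -> bool:
    """Exact match on the 8-byte binary representation, with no epsilon."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def tolerance_band(x: float, tol: float):
    """Inclusive [lo, hi] bounds for a range query around x."""
    return x - tol, x + tol
```

`same_key(0.3, 0.3)` holds, while `same_key(0.3, 0.1 + 0.2)` does not, because the sum rounds to a different real64 value. A caller who wants the recomputed value to match would pass the bounds from `tolerance_band(0.3, 1e-9)` to a range query instead.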
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
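The pull protocol can be modelled over a sorted list of index entries. This is a Python sketch, not the library's cursor: `RangeCursor` is a hypothetical class, the index is a plain sorted list of `(key, row_id)` pairs, and the row buffer is left out.

```python
import bisect


class RangeCursor:
    """Ascending cursor over sorted (key, row_id) entries, optionally banded."""

    def __init__(self, entries, lo=None, hi=None):
        self._entries = entries
        self._hi = hi
        # (lo,) sorts before every (lo, row_id), so this finds the first key >= lo.
        self._pos = 0 if lo is None else bisect.bisect_left(entries, (lo,))

    def next(self):
        """Return (ok, row_id); ok is False once the cursor is exhausted."""
        if self._pos < len(self._entries):
            key, row_id = self._entries[self._pos]
            if self._hi is None or key <= self._hi:
                self._pos += 1  # advance past the row just yielded
                return True, row_id
        return False, None
```

Exhaustion is reported through `ok` rather than as an error, so a caller loops until `ok` is false and treats that as normal termination.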
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
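The NULL bitmap operations can be sketched as plain bit manipulation. This Python model assumes one bit per column, least significant bit first within each byte; that layout, and the names `set_null`, `clear_null`, `is_null` and `indexable`, are assumptions made for illustration.

```python
def set_null(bitmap: bytearray, bit: int) -> None:
    """Mark the column with this null bit as NULL."""
    bitmap[bit // 8] |= 1 << (bit % 8)


def clear_null(bitmap: bytearray, bit: int) -> None:
    """Mark the column as carrying a value."""
    bitmap[bit // 8] &= ~(1 << (bit % 8)) & 0xFF


def is_null(bitmap: bytearray, bit: int) -> bool:
    return bool(bitmap[bit // 8] & (1 << (bit % 8)))


def indexable(bitmap: bytearray, member_bits) -> bool:
    """A row enters an index only if none of the index members is NULL."""
    return not any(is_null(bitmap, b) for b in member_bits)
```

`indexable` captures the rule that a single NULL member keeps the whole row out of that index, while other indices over non-NULL columns are unaffected.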
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
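The DT_CHAR packing rule (NUL-pad, truncate to the column width, read back up to the first NUL) can be sketched in Python. `pack_char` and `unpack_char` are hypothetical names, and ASCII text is assumed.

```python
def pack_char(val: str, width: int) -> bytes:
    """Truncate to the column width, then pad with NUL bytes."""
    raw = val.encode("ascii")[:width]
    return raw.ljust(width, b"\x00")


def unpack_char(slot: bytes) -> str:
    """Read up to the first NUL, or the whole slot if it is full."""
    return slot.split(b"\x00", 1)[0].decode("ascii")
```

Padding a lookup key the same way before comparing makes it byte-identical to the stored slot, which is why a key shorter than the column width still matches.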
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
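The begin/commit/rollback rules form a two-state machine, sketched below in Python. `Handle` is a hypothetical toy class with one counter standing in for all table state, and the status values are made up; the real procedures also restore base files.

```python
SQR_OK, SQR_INVALID = 0, 2  # hypothetical numeric values


class Handle:
    """Toy handle: one in-memory counter stands in for the table state."""

    def __init__(self):
        self.live_count = 0
        self._snapshot = None  # set while an explicit transaction is in flight

    def begin(self):
        if self._snapshot is not None:
            return SQR_INVALID  # no nesting
        self._snapshot = self.live_count
        return SQR_OK

    def commit(self):
        if self._snapshot is None:
            return SQR_INVALID  # nothing to commit
        self._snapshot = None  # keep the changes, discard the undo state
        return SQR_OK

    def rollback(self):
        if self._snapshot is None:
            return SQR_INVALID  # nothing to roll back
        self.live_count = self._snapshot  # restore the pre-begin value
        self._snapshot = None
        return SQR_OK
```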
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
that their B+-tree page writes capture undo records. db is declared
with the target attribute so that each hook context can hold a lasting
pointer back to the handle; the caller's db_t must therefore also have
the target attribute for journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
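The two undo record kinds and the reverse replay can be modelled in memory. This is a Python sketch, not the library's journal: `UndoJournal` is a hypothetical class, files are bytearrays in a dict, and serialisation, checksums and fsync are omitted.

```python
class UndoJournal:
    """In-memory model of REGION and EXTEND undo records."""

    def __init__(self, files):
        self.files = files  # path -> bytearray standing in for a base file
        self.records = []
        self._seen = set()

    def log_region(self, path, offset, length):
        """Capture the original bytes before an in-place overwrite."""
        key = ("REGION", path, offset, length)
        if key in self._seen:
            return  # idempotent per (path, offset, length)
        self._seen.add(key)
        old = bytes(self.files[path][offset:offset + length])
        self.records.append(("REGION", path, offset, old))

    def log_extend(self, path):
        """Capture the original length before the file is appended to."""
        key = ("EXTEND", path)
        if key in self._seen:
            return  # idempotent per path
        self._seen.add(key)
        self.records.append(("EXTEND", path, len(self.files[path])))

    def rollback(self):
        """Replay the undo records in reverse, then discard them."""
        for rec in reversed(self.records):
            if rec[0] == "REGION":
                _, path, offset, old = rec
                self.files[path][offset:offset + len(old)] = old
            else:
                _, path, old_len = rec
                del self.files[path][old_len:]  # truncate appended bytes away
        self.records.clear()
        self._seen.clear()
```

Logging must come before the write it protects: the pre-image is read at log time, so a record taken after the overwrite would capture the new bytes instead of the original ones.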
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
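The hot-journal test can be sketched with a toy header. The real header layout and checksum are not given in this text, so everything below is an assumption made for illustration: the `JRNL` magic, the 16-byte header of magic, record count, payload length and CRC32, and the names `arm`, `commit` and `is_hot`.

```python
import struct
import zlib

MAGIC = b"JRNL"                    # hypothetical magic value
HEADER = struct.Struct("<4sIII")   # magic, record count, payload length, CRC32


def arm(payload: bytes, count: int) -> bytes:
    """Build a hot journal: a valid header followed by the serialised undo set."""
    return HEADER.pack(MAGIC, count, len(payload), zlib.crc32(payload)) + payload


def commit(journal: bytes) -> bytes:
    """Zero the header; the stale payload stays but is no longer valid."""
    return bytes(HEADER.size) + journal[HEADER.size:]


def is_hot(journal: bytes) -> bool:
    """Read-only probe: True only for a valid, un-committed journal."""
    if len(journal) < HEADER.size:
        return False  # absent or empty
    magic, _count, length, crc = HEADER.unpack_from(journal)
    if magic != MAGIC:
        return False  # voided header
    payload = journal[HEADER.size:HEADER.size + length]
    return len(payload) == length and zlib.crc32(payload) == crc
```

Zeroing only the header is what makes the file reusable: commit is a small write, and the next transaction overwrites the old payload.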
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(inout) | | | buf | Row buffer |
| type(column_t) | intent(in) | | | col | Target |
| real(kind=real64) | intent(in) | | | val | Value to store |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
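Why a held row_id goes stale can be seen in a small model of the renumbering. This is a Python sketch, not the library's compaction: `compact` is a hypothetical function, rows are `(alive, payload)` pairs stored at row_id = position, and survivors are assumed to keep their relative order.

```python
def compact(rows):
    """Drop tombstoned rows and renumber survivors 1..live_count.

    rows: list of (alive, payload); the row at list position i has row_id i + 1.
    Returns (new_rows, remap) where remap maps old row_id -> new row_id.
    """
    new_rows, remap = [], {}
    for old_id, (alive, payload) in enumerate(rows, start=1):
        if alive:
            new_rows.append((True, payload))
            remap[old_id] = len(new_rows)
    return new_rows, remap
```

With rows 1 and 3 live and row 2 tombstoned, old row 3 becomes row 2, so an id of 3 held across the call no longer refers to any row. The library exposes no such remap; the sketch returns one only to make the shift visible.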
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap, as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards. A hard crash strictly between those two steps
is the documented pre-journal residual window.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| character(len=*) | intent(inout) | | | buf | Row buffer |
| type(column_t) | intent(in) | | | col | Target |
| character(len=*) | intent(in) | | | val | Value to store |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
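The two capture calls can be modelled with an in-memory undo set. The sketch below is illustrative Python, not the library's code: files are stood in for by a dict of `bytearray` objects, the class `UndoSet` and its method names are hypothetical, and applying the records newest-first on rollback is an assumption carried over from the recovery description.

```python
class UndoSet:
    """In-memory undo records: REGION pre-images and EXTEND lengths."""

    def __init__(self):
        self.records = []   # kept in capture order
        self._seen = set()  # makes each capture idempotent

    def log_region(self, files, path, offset, length, pre_image=None):
        # A supplied pre-image wins and its length replaces `length`.
        if pre_image is not None:
            length = len(pre_image)
        key = ("REGION", path, offset, length)
        if key in self._seen:
            return
        if pre_image is None:
            pre_image = bytes(files[path][offset:offset + length])
        self._seen.add(key)
        self.records.append(("REGION", path, offset, bytes(pre_image)))

    def log_extend(self, files, path):
        key = ("EXTEND", path)
        if key in self._seen:
            return
        self._seen.add(key)
        self.records.append(("EXTEND", path, len(files[path])))

    def rollback(self, files):
        # Undo newest-first so earlier captures are restored last.
        for rec in reversed(self.records):
            if rec[0] == "REGION":
                _, path, offset, data = rec
                files[path][offset:offset + len(data)] = data
            else:
                _, path, original_length = rec
                del files[path][original_length:]
        self.records.clear()
        self._seen.clear()
```

Idempotence matters because a second capture of the same region would otherwise record already-modified bytes as the pre-image.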
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
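The arm, commit and recover steps all hinge on whether the journal header is valid. A minimal Python sketch of that check follows. The header layout (a valid flag, a record count and a CRC32 of the payload), the length-prefixed record framing and the function names are all assumptions; the text only says the header carries a count and a checksum.

```python
import struct
import zlib

# Assumed header: valid flag, record count, CRC32 of the record payload.
HEADER = struct.Struct("<III")


def arm(records):
    """Serialise undo records and prepend a valid header (journal is hot)."""
    payload = b"".join(struct.pack("<I", len(r)) + r for r in records)
    return HEADER.pack(1, len(records), zlib.crc32(payload)) + payload


def invalidate(journal):
    """Zero the header only; the body is left in place for reuse."""
    return HEADER.pack(0, 0, 0) + journal[HEADER.size:]


def read_hot(journal):
    """Return the records of a hot journal, or None if there is nothing to do."""
    if len(journal) < HEADER.size:
        return None                      # missing or empty
    valid, count, crc = HEADER.unpack_from(journal, 0)
    if valid != 1:
        return None                      # invalidated (committed)
    pos = HEADER.size
    records = []
    for _ in range(count):
        if pos + 4 > len(journal):
            return None                  # truncated: treat as corrupt
        (size,) = struct.unpack_from("<I", journal, pos)
        pos += 4
        if pos + size > len(journal):
            return None
        records.append(journal[pos:pos + size])
        pos += size
    if zlib.crc32(journal[HEADER.size:pos]) != crc:
        return None                      # checksum mismatch: corrupt
    return records
```

Recovery would then apply `reversed(read_hot(journal))` and finish with `invalidate`, while commit is just `invalidate` after the base files are synced. Bytes beyond the last record are ignored, which is what allows a pre-sized, reusable file.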
.true. if a hot (valid, un-committed) journal is present on disk.
This is a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. Returns
.false. for an absent, voided or unreadable journal.
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
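The hook contract (record first, write only if recording succeeded) can be sketched as follows. This is a Python model, not the B+-tree module's interface: `write_page`, the hook signature `(page_no, is_new, old_bytes)` and the list-of-pages tree file are assumptions made for illustration.

```python
def write_page(pages, page_no, new_bytes, hook):
    """Write one page, letting the journal hook veto the write.

    pages is a list standing in for the tree file. page_no must be an
    existing page or exactly len(pages) for a fresh allocation.
    """
    is_new = page_no == len(pages)
    old_bytes = None if is_new else pages[page_no]
    stat = hook(page_no, is_new, old_bytes)
    if stat != 0:
        # The undo record was not captured, so the write must not happen.
        return stat
    if is_new:
        pages.append(new_bytes)
    else:
        pages[page_no] = new_bytes
    return 0
```

A recording hook would turn `is_new == False` into a region capture with `old_bytes` as the pre-image, and `is_new == True` into an extend capture of the tree file.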
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | target | db | Database handle |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed flush at close is where recent
changes can be lost. The handle is fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
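To make the row_id instability concrete, here is a small Python model of compaction. It is only a sketch: rows are `(alive, payload)` tuples whose row_id is their 1-based position, and the returned `remap` dict is a teaching aid, not something the text says db_compact provides.

```python
def compact(rows):
    """Drop tombstoned rows and renumber survivors 1..live_count.

    rows is a list of (alive, payload); a row's id is its 1-based position.
    Returns the compacted list and a map of old row_id to new row_id.
    """
    compacted = []
    remap = {}
    for old_id, (alive, payload) in enumerate(rows, start=1):
        if alive:
            compacted.append((True, payload))
            remap[old_id] = len(compacted)
    return compacted, remap
```

With rows `a` (live), `b` (tombstoned), `c` (live), the row that was id 3 becomes id 2, so a caller still holding 3 would now be out of range.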
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
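The slot-stability rule can be illustrated with a list of index slots. This Python sketch makes several assumptions: each slot is just its list of member column names, an empty list stands for a tombstone (ncols = 0), and slot numbers are taken to be 1-based positions.

```python
def is_live(slot):
    """A slot with no member columns has been dropped."""
    return len(slot) > 0


def drop_index(slots, col_names):
    """Tombstone the slot whose members exactly match col_names."""
    for i, cols in enumerate(slots):
        if is_live(cols) and cols == list(col_names):
            slots[i] = []  # tombstone in place; later slots keep their numbers
            return "SQR_OK"
    return "SQR_NOT_FOUND"


def create_index(slots, col_names):
    """Append a fresh slot and return its (assumed 1-based) number."""
    slots.append(list(col_names))
    return len(slots)
```

Because the dropped slot is never reused, a file named after slot 2 keeps that name no matter what happens to slot 1.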
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
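The validate-everything-first rule is what makes the batch all-or-nothing. A minimal Python sketch for a single unique index follows; keys are plain Python values, `None` stands for a row with a NULL index member, and mapping a NaN key to SQR_INVALID is an assumption (the text names both codes but does not pair them with causes).

```python
import math


def validate_batch(existing_keys, batch_keys):
    """Check a whole batch before any write happens."""
    seen = set()
    for key in batch_keys:
        if key is None:
            continue  # NULL member: row is valid but stays out of the index
        if isinstance(key, float) and math.isnan(key):
            return "SQR_INVALID"
        if key in existing_keys or key in seen:
            return "SQR_DUP"
        seen.add(key)
    return "SQR_OK"


def insert_batch(existing_keys, next_id, batch_keys):
    """Return (stat, row_ids, new_next_id); nothing changes on failure."""
    stat = validate_batch(existing_keys, batch_keys)
    if stat != "SQR_OK":
        return stat, [0] * len(batch_keys), next_id
    row_ids = list(range(next_id, next_id + len(batch_keys)))
    existing_keys.update(k for k in batch_keys if k is not None)
    return stat, row_ids, next_id + len(batch_keys)
```

A duplicate in the last row of the batch therefore leaves the index and the row-id counter exactly as they were.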
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
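The bit-for-bit behaviour is easy to check outside the library. The Python sketch below compares the 8-byte encodings directly and then shows the range-style alternative; `bits` and `in_band` are hypothetical helper names, and the little-endian packing is only a convenient choice for the comparison.

```python
import struct


def bits(x: float) -> bytes:
    """The 8 raw bytes of a real64 value."""
    return struct.pack("<d", x)


def in_band(x: float, centre: float, tol: float) -> bool:
    """Tolerance matching expressed as an inclusive [lo, hi] band."""
    return centre - tol <= x <= centre + tol


stored = 0.3
recomputed = 0.1 + 0.2

# The same value round-trips exactly; a differently computed one does not.
same_value_matches = bits(stored) == bits(0.3)
recomputed_matches = bits(recomputed) == bits(stored)
within_tolerance = in_band(recomputed, stored, 1e-9)
```

Here `same_value_matches` is true, `recomputed_matches` is false, and `within_tolerance` is true, which is why a tolerance search belongs on the range path with bounds `centre - tol` and `centre + tol`.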
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
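The pull protocol shared by the whole-index cursor and the band overloads can be sketched over a sorted list of index entries. This is a Python model under stated assumptions: `BandCursor` and `drain` are hypothetical names, the index is a pre-sorted list of `(key, row_id)` pairs rather than a B+-tree, and only live rows are assumed to be present.

```python
import bisect


class BandCursor:
    """Forward cursor over sorted (key, row_id) entries, optionally banded."""

    def __init__(self, entries, lo=None, hi=None):
        self._entries = entries
        keys = [key for key, _ in entries]
        # Start at the first entry with key >= lo, or at the very beginning.
        self._pos = 0 if lo is None else bisect.bisect_left(keys, lo)
        self._hi = hi

    def next(self):
        """Return (ok, row_id, key); ok is False once the cursor is exhausted."""
        if self._pos >= len(self._entries):
            return False, None, None
        key, row_id = self._entries[self._pos]
        if self._hi is not None and key > self._hi:
            return False, None, None  # passed the inclusive upper bound
        self._pos += 1
        return True, row_id, key


def drain(cursor):
    """Pull until exhaustion, collecting (key, row_id) pairs."""
    rows = []
    while True:
        ok, row_id, key = cursor.next()
        if not ok:
            break
        rows.append((key, row_id))
    return rows
```

Exhaustion is reported through the `ok` flag rather than as an error, mirroring the ok = .false. with stat == SQR_OK convention, and both bounds of the band are inclusive.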
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
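The cascade rule can be modelled in a few lines of Python. This is a sketch under assumptions: columns are plain names, each index slot is a tuple of member names, an empty tuple stands for a tombstoned slot, and the status strings mirror the codes named above. None of this is the library's actual interface.

```python
def drop_column(columns, indices, name):
    """Return (stat, columns, indices) after dropping one column.

    indices holds one tuple of member column names per slot; an empty
    tuple models a tombstoned slot (ncols = 0).
    """
    if name not in columns:
        return "SQR_NOT_FOUND", columns, indices
    if len(columns) == 1:
        return "SQR_INVALID", columns, indices  # a table keeps >= 1 column
    kept = [c for c in columns if c != name]
    # Any index that includes the dropped column is tombstoned in place,
    # so the slot numbers of the surviving indices do not move.
    cascaded = [() if name in members else members for members in indices]
    return "SQR_OK", kept, cascaded


stat, cols, idx = drop_column(
    ["id", "name", "city"],
    [("id",), ("name", "city"), ("city",)],
    "city",
)
```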
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
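A short Python model of the slot bookkeeping, as a sketch only. The slot list, the helper names and the treatment of member order as significant are assumptions; only the `__i<slot>` suffix and the tombstone-in-place behaviour come from the description above.

```python
def is_live(slot):
    """A slot is live while it still has member columns (ncols > 0)."""
    return len(slot) > 0


def drop_index(slots, col_names):
    """Tombstone the slot whose members match exactly.

    Returns the 1-based slot number, or 0 when no index matches
    (the SQR_NOT_FOUND case).
    """
    for i, members in enumerate(slots):
        if is_live(members) and members == tuple(col_names):
            slots[i] = ()  # tombstone in place; later slots keep their numbers
            return i + 1
    return 0


def create_index(slots, col_names):
    """Append a fresh slot and return its 1-based number."""
    slots.append(tuple(col_names))
    return len(slots)


def index_suffix(slot_no):
    """File-name suffix for a slot; the rest of the name is not modelled."""
    return f"__i{slot_no}"


slots = [("a",), ("a", "b"), ("c",)]
dropped = drop_index(slots, ["a", "b"])
created = create_index(slots, ["b"])
```

Slot 3 keeps its `__i3` suffix after slot 2 is dropped, and the new index takes slot 4 rather than reusing the dead slot.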
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
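The validate-everything-first rule is sketched below in Python. The model is hypothetical: one unique index over a single real-valued key, `None` standing for a NULL member, and a dict for the table state. It shows only the all-or-nothing behaviour, not the library's on-disk bulk-load path.

```python
import math


def insert_batch(table, keys):
    """Validate the whole batch, then assign ids; returns (stat, row_ids).

    table is {"next_id": int, "unique_keys": set}; keys holds one key per
    row, with None for a row whose index member is NULL.
    """
    rejected = [0] * len(keys)
    seen = set(table["unique_keys"])  # work on a copy until validation passes
    for key in keys:
        if key is None:
            continue  # NULL member: the row is stored but not indexed
        if math.isnan(key):
            return "SQR_INVALID", rejected
        if key in seen:  # clashes with the existing index or the batch itself
            return "SQR_DUP", rejected
        seen.add(key)
    first = table["next_id"]
    table["next_id"] = first + len(keys)
    table["unique_keys"] = seen  # one index update for the whole batch
    return "SQR_OK", list(range(first, first + len(keys)))


table = {"next_id": 5, "unique_keys": {1.0}}
ok_result = insert_batch(table, [2.0, None, 3.0])
dup_result = insert_batch(table, [4.0, 2.0])
```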
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
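The checks listed above can be read as the following Python sketch. The data shapes are assumptions made for illustration (rows as `(alive, key)` pairs, one index as `(key, row_id)` pairs, and `next_id` taken to mean "greater than the number of written records"); the real routine walks on-disk structures.

```python
def check_table(rows, live_count, next_id, index, unique=False):
    """Return (stat, errmsg) for the first inconsistency found.

    rows: list of (alive, key), key None for a NULL member; row_id is the
    1-based position. index: list of (key, row_id) entries.
    """
    live = {i + 1: key for i, (alive, key) in enumerate(rows) if alive}
    if len(live) != live_count:
        return "SQR_INVALID", "live_count mismatch"
    if next_id <= len(rows):
        return "SQR_INVALID", "next_id does not cover written records"
    entries = set(index)
    for row_id, key in live.items():
        if key is not None and (key, row_id) not in entries:
            return "SQR_INVALID", f"row {row_id} missing from index"
    for key, row_id in index:
        if row_id not in live or live[row_id] != key:
            return "SQR_INVALID", f"index entry for row {row_id} is stale"
    keys = [key for key, _ in index]
    if unique and len(keys) != len(set(keys)):
        return "SQR_INVALID", "duplicate key in unique index"
    return "SQR_OK", ""


rows = [(True, 10), (False, 20), (True, None), (True, 30)]
good = check_table(rows, 3, 5, [(10, 1), (30, 4)])
```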
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
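The bit-for-bit rule is easy to see in a short Python sketch. The 8-byte little-endian image produced by `struct.pack` is only a stand-in for the stored key bytes (the byte order is an assumption), and `within` is a hypothetical helper showing tolerance matching as an inclusive band.

```python
import struct


def key_bytes(x):
    """Raw 8-byte image of a float64, standing in for the stored key."""
    return struct.pack("<d", x)


stored = key_bytes(0.3)
same_value = key_bytes(0.3) == stored        # identical bits match
recomputed = key_bytes(0.1 + 0.2) == stored  # 0.1 + 0.2 is not the same bits


def within(x, target, tol):
    """Tolerance matching expressed as an inclusive [lo, hi] band."""
    return target - tol <= x <= target + tol


near = within(0.1 + 0.2, 0.3, 1e-12)
```

Here `same_value` is true, `recomputed` is false and `near` is true, which is the case for using a range query whenever the key was recomputed.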
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
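The pull protocol can be sketched in Python with a sorted list standing in for the index. The `Cursor` class and its fields are hypothetical, and the B+-tree is replaced by `bisect` on `(key, row_id)` pairs; only the inclusive band and the "ok is false on exhaustion" behaviour are taken from the text.

```python
import bisect


class Cursor:
    """Forward cursor over sorted (key, row_id) entries."""

    def __init__(self, entries, lo=None, hi=None):
        self.entries = entries
        self.hi = hi
        # (lo,) sorts before every (lo, row_id), so this lands on the
        # first entry whose key is >= lo.
        self.pos = 0 if lo is None else bisect.bisect_left(entries, (lo,))

    def next(self):
        """Return (ok, row_id); ok is False once the cursor is exhausted."""
        if self.pos >= len(self.entries):
            return False, None
        key, row_id = self.entries[self.pos]
        if self.hi is not None and key > self.hi:
            return False, None  # passed the band's upper bound
        self.pos += 1
        return True, row_id


def drain(cursor):
    """Pull row ids until the cursor reports exhaustion."""
    out = []
    while True:
        ok, row_id = cursor.next()
        if not ok:
            return out
        out.append(row_id)


entries = [(10, 3), (20, 1), (20, 4), (35, 2)]
band = drain(Cursor(entries, lo=20, hi=30))
whole = drain(Cursor(entries))
```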
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
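A minimal Python model of the NULL bitmap, assuming one bit per column held in an integer. The function names and the bit numbering are hypothetical; the sketch only shows set, clear, test, and the rule that a row with any NULL member stays out of an index.

```python
def set_null(bitmap, bit):
    """Mark the column at this bit position NULL."""
    return bitmap | (1 << bit)


def clear_null(bitmap, bit):
    """Mark the column as carrying a value."""
    return bitmap & ~(1 << bit)


def is_null(bitmap, bit):
    """True if the column at this bit position is NULL."""
    return bool(bitmap & (1 << bit))


def indexable(bitmap, member_bits):
    """A row enters an index only if none of its member columns is NULL."""
    return not any(is_null(bitmap, b) for b in member_bits)


bitmap = set_null(0, 2)
```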
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
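The fixed-width string convention (truncate, NUL-pad, read up to the first NUL) looks like this in a Python sketch. `set_char` and `get_char` are hypothetical helpers operating on a bare byte string rather than a row buffer, and ASCII is assumed for simplicity.

```python
def set_char(value, width):
    """Encode value into a fixed-width slot: truncate, then NUL-pad."""
    raw = value.encode("ascii")[:width]
    return raw.ljust(width, b"\x00")


def get_char(slot):
    """Decode a slot up to, but not including, the first NUL."""
    return slot.split(b"\x00", 1)[0].decode("ascii")


short = set_char("ab", 4)
long_ = set_char("abcdef", 4)
```

Because a lookup key is padded the same way before comparison, `"ab"` and its stored 4-byte image compare equal byte for byte.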
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails with
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails with SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails with SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
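The begin/commit/rollback rules amount to a small state machine, sketched here in Python. The `Txn` class, the numeric status values and the counter names are assumptions; file restoration is left out and only the no-nesting rule and the in-memory counter snapshot are modelled.

```python
SQR_OK, SQR_INVALID = 0, 1  # hypothetical numeric values


class Txn:
    """Explicit-transaction state: at most one in flight, no nesting."""

    def __init__(self):
        self.counters = {"next_id": 1, "live_count": 0}
        self.snapshot = None  # not None while a transaction is in flight

    def begin(self):
        if self.snapshot is not None:
            return SQR_INVALID  # re-entry detected
        self.snapshot = dict(self.counters)
        return SQR_OK

    def commit(self):
        if self.snapshot is None:
            return SQR_INVALID
        self.snapshot = None  # keep changes, discard the undo state
        return SQR_OK

    def rollback(self):
        if self.snapshot is None:
            return SQR_INVALID
        self.counters = self.snapshot  # restore pre-begin counters
        self.snapshot = None
        return SQR_OK


t = Txn()
first = t.begin()
nested = t.begin()
t.counters["next_id"] = 7
rolled = t.rollback()
```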
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails with SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
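The two undo record kinds can be modelled in Python with byte arrays standing in for base files. This is a sketch: the `Journal` class and its method names are hypothetical, nothing is written to disk or fsynced, and undoing records in reverse capture order is a choice made for the model.

```python
class Journal:
    """In-memory undo set over files modelled as path -> bytearray."""

    def __init__(self, files):
        self.files = files
        self.records = []  # undo records in capture order
        self.seen = set()  # keys already captured in this transaction

    def log_region(self, path, offset, length):
        """Capture original bytes before an in-place overwrite (1-based offset)."""
        key = ("R", path, offset, length)
        if key in self.seen:
            return  # idempotent per (path, offset, length)
        self.seen.add(key)
        start = offset - 1
        old = bytes(self.files[path][start:start + length])
        self.records.append(("R", path, start, old))

    def log_extend(self, path):
        """Capture the original length before the file grows."""
        key = ("E", path)
        if key in self.seen:
            return  # idempotent per path
        self.seen.add(key)
        self.records.append(("E", path, len(self.files[path])))

    def rollback(self):
        """Undo records newest first: write bytes back or truncate."""
        for kind, path, a, *rest in reversed(self.records):
            if kind == "R":
                old = rest[0]
                self.files[path][a:a + len(old)] = old
            else:
                del self.files[path][a:]
        self.records.clear()
        self.seen.clear()


files = {"t.dat": bytearray(b"AAAABBBB")}
j = Journal(files)
j.log_region("t.dat", 5, 4)
files["t.dat"][4:8] = b"XXXX"  # the in-place overwrite
j.log_region("t.dat", 5, 4)    # second capture of the same region is ignored
j.log_extend("t.dat")
files["t.dat"] += b"CCCC"      # the append
captured = len(j.records)
j.rollback()
```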
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
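The arm, commit and recover steps can be sketched in Python on a byte string standing in for the journal file. The header layout (a 4-byte tag, a record count and a CRC-32 of the body) is an assumption chosen for illustration; the text above says only that the header carries a count and a checksum.

```python
import struct
import zlib

MAGIC = b"JRNL"  # hypothetical header tag
HEADER_LEN = 12  # tag + count + checksum in this model


def arm(records):
    """Serialise undo records and prepend a valid header (count + checksum)."""
    body = b"".join(struct.pack("<I", len(r)) + r for r in records)
    header = MAGIC + struct.pack("<II", len(records), zlib.crc32(body))
    return header + body


def commit(journal):
    """Zero the header so recovery sees nothing to do."""
    return bytes(HEADER_LEN) + journal[HEADER_LEN:]


def hot_records(journal):
    """Return undo records in replay (reverse) order, or [] if not hot."""
    if len(journal) < HEADER_LEN or journal[:4] != MAGIC:
        return []  # missing, empty or invalidated
    count, crc = struct.unpack("<II", journal[4:HEADER_LEN])
    body = journal[HEADER_LEN:]
    if zlib.crc32(body) != crc:
        return []  # corrupt: treated as a no-op
    records, pos = [], 0
    for _ in range(count):
        (n,) = struct.unpack_from("<I", body, pos)
        records.append(body[pos + 4:pos + 4 + n])
        pos += 4 + n
    return records[::-1]


armed = arm([b"one", b"two"])
replay = hot_records(armed)
after_commit = hot_records(commit(armed))
corrupt = hot_records(armed[:-1] + b"X")
```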
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
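The hook contract (record first, abort the write on a non-zero stat) is sketched below in Python. The page list, `write_page` and `journal_hook` are hypothetical stand-ins for the tree and its hook; only the region-versus-extend split and the abort rule are taken from the description.

```python
def journal_hook(log, fail=False):
    """Build a hook that records a page write in log, or fails if asked to."""
    def hook(page_no, is_new, old_bytes):
        if fail:
            return 1  # non-zero stat: the journal could not record the write
        if is_new:
            log.append(("extend", page_no))
        else:
            log.append(("region", page_no, old_bytes))
        return 0
    return hook


def write_page(pages, page_no, new_bytes, hook):
    """Write a page only after the hook has accepted the undo record."""
    is_new = page_no == len(pages)
    old_bytes = None if is_new else pages[page_no]
    stat = hook(page_no, is_new, old_bytes)
    if stat != 0:
        return stat  # abort: the unrecorded overwrite never happens
    if is_new:
        pages.append(new_bytes)
    else:
        pages[page_no] = new_bytes
    return 0


pages, log = [b"p0"], []
ok_hook = journal_hook(log)
s1 = write_page(pages, 0, b"P0", ok_hook)
s2 = write_page(pages, 1, b"p1", ok_hook)
s3 = write_page(pages, 0, b"zz", journal_hook(log, fail=True))
```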
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | target | db | Database handle |
| integer | intent(out) | optional | | stat | |


| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle (transaction active) |
| character(len=*) | intent(in) | | | path | Base file, relative to the db directory |
| integer(kind=int64) | intent(in) | | | offset | 1-based byte offset of the region |
| integer(kind=int64) | intent(in) | | | length | Region length in bytes |
| character(len=*) | intent(in) | optional | | bytes | Caller-supplied pre-image (overrides re-read) |
| integer | intent(out) | optional | | stat | |

Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
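A short illustration of why the lookup is bit-exact. The sketch assumes a little-endian 8-byte image for the key (the byte order is an assumption, not something the text states), and `real64_key` and `within` are hypothetical helper names.

```python
import struct

def real64_key(x):
    """Raw 8-byte image of a double: the unit an exact-match lookup compares."""
    return struct.pack("<d", x)

stored = real64_key(0.3)
same_value = real64_key(0.3) == stored        # identical bits, so it matches
recomputed = real64_key(0.1 + 0.2) == stored  # differs in the last bit

def within(x, target, eps):
    """Tolerance matching written as an inclusive band [target - eps, target + eps]."""
    return target - eps <= x <= target + eps
```

`within` is the shape a tolerance query takes: a range over the index rather than an equality probe.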
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
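The pull protocol can be sketched over a sorted list standing in for the index. `RangeCursor`, `drain` and the `(key, row_id)` tuple layout are assumptions for illustration only; the real cursor rides on the B+-tree.

```python
import bisect

class RangeCursor:
    """Forward cursor over sorted (key, row_id) entries, inclusive band [lo, hi]."""

    def __init__(self, entries, lo=None, hi=None):
        self.entries = entries
        self.hi = hi
        # (lo,) sorts before every (lo, row_id), so this lands on the first key >= lo.
        self.pos = 0 if lo is None else bisect.bisect_left(entries, (lo,))

    def next(self):
        """Return (ok, row_id); ok is False once the band or the index is exhausted."""
        if self.pos >= len(self.entries):
            return False, None
        key, row_id = self.entries[self.pos]
        if self.hi is not None and key > self.hi:
            return False, None
        self.pos += 1
        return True, row_id

def drain(cursor):
    """Pull until exhaustion, the loop a caller writes around the cursor."""
    out = []
    while True:
        ok, row_id = cursor.next()
        if not ok:
            return out
        out.append(row_id)
```

Exhaustion is reported through `ok`, not through an error status, which matches the description above.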
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
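A minimal model of the NULL bitmap and its effect on index membership. The bit layout (bit `col % 8` of byte `col // 8`, zero-based columns) and the function names are assumptions; the source does not specify them.

```python
def set_null(bitmap, col):
    """Mark column `col` NULL."""
    bitmap[col // 8] |= 1 << (col % 8)

def clear_null(bitmap, col):
    """Mark column `col` as carrying a value."""
    bitmap[col // 8] &= ~(1 << (col % 8)) & 0xFF

def is_null(bitmap, col):
    return bool(bitmap[col // 8] & (1 << (col % 8)))

def indexable(bitmap, member_cols):
    """A row enters an index only if none of the index's member columns is NULL."""
    return not any(is_null(bitmap, c) for c in member_cols)
```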
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
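The DT_CHAR store and read rules, sketched with hypothetical helpers `set_char` and `get_char` (ASCII content is assumed for simplicity). Note that a value filling the whole slot has no terminating NUL, so the reader must stop at the slot boundary as well.

```python
def set_char(value, width):
    """Truncate to the column width, then NUL-pad to exactly `width` bytes."""
    raw = value.encode("ascii")[:width]
    return raw.ljust(width, b"\x00")

def get_char(slot):
    """Read up to the first NUL, or the whole slot if there is none."""
    return slot.split(b"\x00", 1)[0].decode("ascii")
```

Padding a lookup key the same way is what lets an equality lookup compare fixed-width byte strings directly.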
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
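The three entries above amount to a small state machine. The sketch below models only the ownership flags; `TxnState` and its method names are hypothetical, and no journalling is performed.

```python
class TxnState:
    """Tracks whether a transaction is in flight and whether the user owns it."""

    def __init__(self):
        self.in_flight = False
        self.user_owned = False

    def begin(self):
        if self.in_flight:
            return "SQR_INVALID"  # no nesting
        self.in_flight = self.user_owned = True
        return "SQR_OK"

    def _end(self):
        if not (self.in_flight and self.user_owned):
            return "SQR_INVALID"  # no explicit transaction to finish
        self.in_flight = self.user_owned = False
        return "SQR_OK"

    def commit(self):
        return self._end()

    def rollback(self):
        return self._end()

    def autocommit(self, op):
        """Run one mutation; bracket it only when no explicit transaction is open."""
        if self.user_owned:
            op()  # the explicit transaction stays open
            return
        self.in_flight = True
        op()
        self.in_flight = False  # implicit commit
```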
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
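The capture and rollback steps can be modelled with in-memory byte arrays standing in for base files. This is a sketch under stated assumptions: the undo set is a plain dictionary keyed by record identity, and the function names are illustrative, not the module's procedures.

```python
def log_region(undo, files, path, offset, length):
    """Capture the pre-image of an in-place overwrite, once per (path, offset, length)."""
    key = ("REGION", path, offset, length)
    if key not in undo:
        undo[key] = bytes(files[path][offset:offset + length])

def log_extend(undo, files, path):
    """Capture the original file length, once per path."""
    key = ("EXTEND", path)
    if key not in undo:
        undo[key] = len(files[path])

def rollback(undo, files):
    """Undo newest record first: restore regions, truncate extended files."""
    for key in reversed(list(undo)):
        if key[0] == "REGION":
            _, path, offset, length = key
            files[path][offset:offset + length] = undo[key]
        else:
            del files[key[1]][undo[key]:]
    undo.clear()
```

The idempotency check matters: a second capture of the same region after it has been overwritten would otherwise record the new bytes as the pre-image.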
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. The
result is .false. for an absent, voided or unreadable journal.
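One way to picture a "hot" journal is a header that is only valid when its checksum matches the payload. The layout here (a 4-byte magic, two little-endian 32-bit fields, CRC-32) is entirely an assumption; the text says only that the header carries a count and a checksum.

```python
import struct
import zlib

MAGIC = b"JRNL"   # hypothetical marker
HEADER_LEN = 12   # magic + count + checksum

def arm(records):
    """Serialise the undo records and prepend a valid header."""
    payload = b"".join(records)
    header = MAGIC + struct.pack("<II", len(records), zlib.crc32(payload))
    return header + payload

def commit(journal):
    """Zero the header; the stale payload is left in place for reuse."""
    return bytes(HEADER_LEN) + journal[HEADER_LEN:]

def is_hot(journal):
    """Read-only probe: True only for a valid, un-committed journal."""
    if len(journal) < HEADER_LEN or journal[:4] != MAGIC:
        return False
    count, crc = struct.unpack("<II", journal[4:HEADER_LEN])
    return count > 0 and zlib.crc32(journal[HEADER_LEN:]) == crc
```

Zeroing only the header is what makes commit a single small write: recovery never looks at the payload once the header is invalid.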
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.
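The abort-on-failure behaviour of the hook can be sketched as follows. `write_page`, the list-of-pages tree and the hook signature are assumptions for illustration; the point is only that the hook runs before the write and a non-zero status prevents it.

```python
def write_page(tree, page_no, new_bytes, hook=None):
    """Write one page; a non-zero hook status aborts before anything is stored."""
    is_new = page_no >= len(tree)
    old_bytes = None if is_new else tree[page_no]
    if hook is not None:
        stat = hook(page_no, old_bytes, is_new)
        if stat != 0:
            return stat  # un-recorded overwrite never reaches the tree
    if is_new:
        tree.append(new_bytes)  # freshly allocated page: the file grows
    else:
        tree[page_no] = new_bytes  # in-place overwrite
    return 0
```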
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle (transaction active) |
| character(len=*) | intent(in) | | | path | Base file, relative to the db directory |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. The optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed flush at close is the point at
which recent data can be lost; the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
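The renumbering in the contract above is easy to see in a small model. Here a table is a list whose position (1-based) is the row id; `compact` and the returned `remap` dictionary are illustrative only, and the library does not necessarily expose such a mapping.

```python
def compact(rows):
    """Drop tombstones and renumber survivors 1..live_count.

    rows: list of (status, payload); row id = position in the list, 1-based.
    Returns the compacted list and an old_id -> new_id map for the survivors.
    """
    survivors = []
    remap = {}
    for old_id, (status, payload) in enumerate(rows, start=1):
        if status == "ALIVE":
            survivors.append((status, payload))
            remap[old_id] = len(survivors)
    return survivors, remap
```

Any id absent from `remap` belonged to a tombstoned row, and any id present may now name a different position, which is why ids held across a compaction cannot be trusted.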
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is rewritten
immediately afterwards; a hard crash strictly between those two steps
is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
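Slot tombstoning, as used by both the column-drop cascade and the liveness test above, can be sketched with a list of slots. The dictionary shape and function names are assumptions; an empty `cols` list stands in for ncols = 0.

```python
def is_live(slot):
    """A slot is dead once its column list has been emptied (ncols = 0)."""
    return len(slot["cols"]) > 0

def drop_column(indices, col):
    """Cascade: tombstone every index that has `col` as a member."""
    for slot in indices:
        if col in slot["cols"]:
            slot["cols"] = []

def create_index(indices, cols):
    """A new index always takes a fresh slot at the end; returns its 1-based number."""
    indices.append({"cols": list(cols)})
    return len(indices)

def live_slots(indices):
    """Slot numbers a caller should visit, skipping tombstones."""
    return [n for n, slot in enumerate(indices, start=1) if is_live(slot)]
```

Since dead slots are kept rather than removed, the slot number of every surviving index stays the same, and so does any file name derived from it.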
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
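The append-and-repoint behaviour of the text column can be modelled with a byte array for the blob file. The `(offset, length)` descriptor shape is an assumption, as are the helper names and the UTF-8 encoding.

```python
def set_text(blob, text):
    """Append the bytes to the blob and return the new in-row descriptor."""
    data = text.encode("utf-8")
    offset = len(blob)
    blob.extend(data)
    return (offset, len(data))

def get_text(blob, descriptor):
    """Read the bytes the descriptor points at; a zero length yields ''."""
    offset, length = descriptor
    return bytes(blob[offset:offset + length]).decode("utf-8")
```

Replacing a value appends again and leaves the old bytes behind, which is the unreferenced space a later compaction can skip when it copies only live blob bytes.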
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle (transaction active) |
| integer | intent(out) | optional | | stat | |
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle (transaction active) |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure. Schema counters are
persisted only here, so a failed flush at close can lose recent
changes; the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
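Why a row_id cannot survive a compaction is easy to see in a toy model. The sketch below assumes one status flag per record slot; the numeric values given to ROW_ALIVE and ROW_TOMBSTONE are placeholders chosen for the illustration, not the library's actual constants.

```fortran
program compact_renumber_sketch
  implicit none
  ! Placeholder values; the real status-byte encoding is not shown here.
  integer, parameter :: ROW_ALIVE = 1, ROW_TOMBSTONE = 0
  integer :: status(6) = [ROW_ALIVE, ROW_TOMBSTONE, ROW_ALIVE, &
                          ROW_ALIVE, ROW_TOMBSTONE, ROW_ALIVE]
  integer :: new_id(6), old_id, live_count

  live_count = 0
  new_id = 0            ! 0 marks a row that does not survive
  do old_id = 1, size(status)
    if (status(old_id) == ROW_ALIVE) then
      live_count = live_count + 1
      new_id(old_id) = live_count   ! survivors become 1..live_count
    end if
  end do

  do old_id = 1, size(status)
    print '(a,i0,a,i0)', 'old row_id ', old_id, ' -> new row_id ', new_id(old_id)
  end do
end program compact_renumber_sketch
```

In this model old row 3 becomes row 2 and old row 6 becomes row 4, so a caller that cached either id before the compaction would afterwards address a different row.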
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed into place and the schema is then
rewritten immediately afterwards; a hard crash strictly between
those two steps is the documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
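The validate-everything-then-write rule can be sketched with plain integer keys. This is a simplified model, not the library's code: `index_keys` stands in for one unique index, and the NULL-member and NaN checks are left out.

```fortran
program batch_validate_sketch
  implicit none
  integer, allocatable :: index_keys(:)   ! stands in for one unique index
  integer :: batch(3), row_ids(3), i, next_id
  logical :: ok

  index_keys = [1, 2, 3]
  next_id = 4
  batch = [7, 8, 7]     ! third key duplicates the first within the batch

  ! Phase 1: validate every row before anything is written.
  ok = .true.
  do i = 1, size(batch)
    if (any(index_keys == batch(i))) ok = .false.    ! clashes with existing index
    if (any(batch(1:i-1) == batch(i))) ok = .false.  ! clashes within the batch
  end do

  ! Phase 2: write only if the whole batch passed.
  row_ids = 0
  if (ok) then
    do i = 1, size(batch)
      row_ids(i) = next_id
      next_id = next_id + 1
    end do
    index_keys = [index_keys, batch]   ! one index update for the whole batch
  end if

  print '(a,l1,a,3(1x,i0))', 'accepted=', ok, ' row_ids=', row_ids
end program batch_validate_sketch
```

Because the duplicate is found in phase 1, the batch is rejected as a whole and every entry of `row_ids` stays 0.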
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
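The difference between bit-for-bit equality and tolerance matching can be checked in isolation. The sketch below uses only standard intrinsics and does not call the library; the tolerance `tol` is an arbitrary value chosen for the illustration.

```fortran
program real64_exact_match_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  real(real64) :: stored, recomputed, tol
  logical :: same_bits, in_band

  stored = 0.3_real64
  recomputed = 0.1_real64 + 0.2_real64

  ! Equality lookup semantics: compare the raw 64-bit patterns.
  same_bits = transfer(stored, 0_int64) == transfer(recomputed, 0_int64)

  ! Tolerance matching expressed as an inclusive [lo,hi] band.
  tol = 1.0e-12_real64
  in_band = recomputed >= stored - tol .and. recomputed <= stored + tol

  print '(a,l1)', 'bit-identical:        ', same_bits
  print '(a,l1)', 'inside [lo,hi] band:  ', in_band
end program real64_exact_match_sketch
```

With IEEE double precision the recomputed sum differs from the stored 0.3 in its last bits, so the first test is false while the band test is true. That is why a tolerance match belongs to a range query.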
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
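The pull pattern and its exhaustion signal can be modelled over a sorted key array. Everything in this sketch is hypothetical (`cursor_t`, `open_band`, `cursor_next` and their arguments are invented for the illustration); it mirrors only the documented behaviour that exhaustion is reported through ok rather than as an error.

```fortran
program band_cursor_sketch
  implicit none
  type :: cursor_t
    integer :: pos = 1   ! next position in the ordered key list
    integer :: hi = 0    ! inclusive upper bound of the band
  end type cursor_t

  integer :: keys(5) = [10, 20, 30, 40, 50]   ! stands in for index key order
  type(cursor_t) :: cur
  integer :: key
  logical :: ok

  call open_band(cur, keys, 20, 40)
  do
    call cursor_next(cur, keys, key, ok)
    if (.not. ok) exit          ! exhausted: a normal end, not an error
    print '(i0)', key
  end do

contains

  ! Position the cursor at the first key >= lo.
  subroutine open_band(c, k, lo, hi)
    type(cursor_t), intent(out) :: c
    integer, intent(in) :: k(:), lo, hi
    c%hi = hi
    c%pos = 1
    do while (c%pos <= size(k))
      if (k(c%pos) >= lo) exit
      c%pos = c%pos + 1
    end do
  end subroutine open_band

  ! Yield the key at the cursor and advance past it, or report exhaustion.
  subroutine cursor_next(c, k, key, ok)
    type(cursor_t), intent(inout) :: c
    integer, intent(in) :: k(:)
    integer, intent(out) :: key
    logical, intent(out) :: ok
    key = 0
    ok = c%pos <= size(k)
    if (ok) ok = k(c%pos) <= c%hi   ! past the band's upper bound
    if (ok) then
      key = k(c%pos)
      c%pos = c%pos + 1
    end if
  end subroutine cursor_next
end program band_cursor_sketch
```

For the band [20,40] the loop yields 20, 30 and 40 and then stops. The bounds are tested in two separate steps because Fortran does not guarantee short-circuit evaluation of `.and.`.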
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
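The set, clear and test operations on a NULL bitmap map directly onto Fortran's bit intrinsics. The sketch assumes one bitmap byte and a 0-based bit position per column; both are assumptions made for the illustration, since the actual row layout is not shown here.

```fortran
program null_bitmap_sketch
  use, intrinsic :: iso_fortran_env, only: int8
  implicit none
  integer(int8) :: bitmap
  integer, parameter :: null_bit = 2   ! hypothetical bit position of one column

  bitmap = 0_int8                      ! zeroed row: no column is NULL

  bitmap = ibset(bitmap, null_bit)     ! mark the column NULL
  print '(a,l1)', 'after set:   is NULL = ', btest(bitmap, null_bit)

  bitmap = ibclr(bitmap, null_bit)     ! mark it as carrying a value again
  print '(a,l1)', 'after clear: is NULL = ', btest(bitmap, null_bit)
end program null_bitmap_sketch
```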
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
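The NUL-padding and truncation rules for a fixed-width character slot can be sketched as a pair of helper functions. `pad_nul` and `unpad_nul` are hypothetical names used only here; they model the documented behaviour and are not the library's row_set_char / row_get_char.

```fortran
program char_slot_sketch
  implicit none
  integer, parameter :: width = 8      ! assumed column width for the demo
  character(len=width) :: slot

  slot = pad_nul('abc', width)
  print '(3a,i0)', 'read back "', unpad_nul(slot), '", length ', len(unpad_nul(slot))

  slot = pad_nul('abcdefghijkl', width)   ! longer than the column: truncated
  print '(3a)', 'read back "', unpad_nul(slot), '"'

contains

  ! Store s into a w-byte slot: truncate to w, pad the rest with NUL bytes.
  function pad_nul(s, w) result(out)
    character(len=*), intent(in) :: s
    integer, intent(in) :: w
    character(len=w) :: out
    integer :: n
    n = min(len(s), w)
    out = repeat(achar(0), w)
    out(1:n) = s(1:n)
  end function pad_nul

  ! Read a slot back: everything up to (not including) the first NUL.
  function unpad_nul(slot_in) result(s)
    character(len=*), intent(in) :: slot_in
    character(len=:), allocatable :: s
    integer :: k
    k = index(slot_in, achar(0))
    if (k == 0) then
      s = slot_in            ! value fills the whole slot
    else
      s = slot_in(1:k-1)
    end if
  end function unpad_nul
end program char_slot_sketch
```

Padding with NUL rather than blanks means a stored value keeps any trailing spaces it had, and two keys compare equal only if their padded byte strings are identical.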
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned so the auto-commit
brackets leave it open and so re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. db is target so
each hook context can hold a lasting pointer back to the handle — the
caller's db_t must therefore have the target attribute for
journalling to work.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle (transaction active) |
| integer | intent(out) | optional | | stat | |
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | Database handle |
| integer | intent(out) | optional | | stat | |
Open (or create) a database directory.
A read-write open creates the directory if needed; a read-only open requires an already-initialised database.
CONTRACT: db is intent(out), so any state from a prior open
is discarded before db_open can act on it. The caller MUST
db_close an open handle before reopening it (or opening a
different db into it): the old data/index/blob unit numbers
would otherwise be leaked with the files left open. db_open
cannot defend against this internally — the handle is already
wiped on entry.
Close a database handle: flush schema/catalog (read-write
opens), close all units, and mark the handle closed. Optional
stat reports the first flush failure (schema counters are
persisted only here, so a failed close is where recent data is
lost); the handle is still fully closed regardless.
Demote an open read-write handle to read-only: subsequent writes
return SQR_READONLY, and the exclusive lock is downgraded to a
shared one so other read-only connections may attach. Refused
(SQR_INVALID) on a closed handle or while a transaction is live;
a no-op on a handle already read-only. A failure to downgrade the
lock leaves the handle safely read-only but reports SQR_ERR.
Create a new table from a column-definition array. Fails with
SQR_DUP if the table already exists, SQR_INVALID for a bad
name or column set.
Drop a table and delete all of its files (data, schema,
indices, blob).
Reclaim space for one table: drop tombstoned rows, copy only
the blob bytes still referenced by live rows, renumber the
survivors 1..live_count, and rebuild every index off the
compacted data.
CONTRACT: row_ids are not stable across a compaction —
every surviving row is renumbered, so any row_id a caller holds
across this call is invalid afterward. (Stable handles are the
natural-key feature: db_get_by_key and friends.) Requires a
read-write open db; a read-only open is rejected with
SQR_READONLY.
On-disk consistency is preserved on any failure
(build-then-swap). But if the post-swap reopen of the
compacted data/blob fails, that table's in-memory handle is
left wedged (units = -1) for the rest of the session even
though the on-disk state is the correct compacted file: stat
reports the error, and the caller should db_close and
db_open afresh rather than keep using the handle.
Add a column to an existing table (schema evolution by table
rewrite). col carries the new column's name, dtype and (for
DT_CHAR) csize, exactly as for db_create_table; offset and
null_bit are derived. The column is appended after the existing
ones and every live and tombstoned record is rewritten into the
wider layout with the new column NULL — so existing values read
back unchanged and the new column reads as absent until written.
CONTRACT: row_ids are preserved (unlike db_compact, which
renumbers) — a row_id held across this call stays valid. Existing
secondary indices are untouched: their keys and row_ids do not
change, so no index is rebuilt or dropped. Adding a DT_TEXT
column to a table that had none creates its blob file. Fails with
SQR_NOT_FOUND (no such table), SQR_INVALID (bad column
definition, or a name already in the table), or SQR_READONLY.
On-disk consistency is build-then-swap as in db_compact: the
rewritten data file is renamed in and the schema rewritten back to
back; a hard crash strictly between those two steps is the
documented pre-journal residual window.
Drop a column from an existing table (schema evolution by table
rewrite). Every record is rewritten without the column's bytes and
the surviving columns repacked. CASCADE: any secondary index
that includes the dropped column is dropped too (its slot
tombstoned, its file deleted); indices that do not reference the
column are kept, their keys and row_ids unchanged.
CONTRACT: row_ids are preserved. Dropping the last DT_TEXT
column deletes the table's blob file. Fails with SQR_NOT_FOUND
(no such table or column), SQR_INVALID (the column is the table's
only one — a table must keep at least one column), or SQR_READONLY.
Same build-then-swap durability as db_add_column.
Return the names of all tables in the database.
1-based index of name in db%tables, or 0 if not found.
.true. if an index slot is live; .false. if it has been dropped
(tombstoned with ncols = 0). Callers walking table_t%indices
must skip dead slots — their columns array is deallocated.
Insert a row. buf is a row-shaped buffer filled via the
row_set_* helpers; DT_TEXT columns are zeroed here and
populated afterwards with db_set_text. A unique-index
violation fails with SQR_DUP and writes no row.
Fetch a live row by id into buf. A tombstoned or
out-of-range row returns SQR_NOT_FOUND.
Rewrite an existing live row in place. Records are fixed-size
so the on-disk slot never changes; index entries are maintained
for any indexed column whose key bytes change. DT_TEXT
descriptors are preserved from the stored row (text is changed
via db_set_text, as for insert).
Tombstone a live row. Space is not reclaimed until
db_compact.
Iterate every live row, invoking cb for each until it sets
stop or the table is exhausted.
Set (or replace) the text of a DT_TEXT column on a live row.
Bytes are appended to <table>.blob and the in-row descriptor
updated.
Read the text of a DT_TEXT column from a live row. Returns
an empty string for an empty value.
Single-column overload of db_create_index.
Composite overload of db_create_index. Member columns form
the key in the given order.
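
As a rough illustration of how a composite key is laid out when member bytes are concatenated in declared order, the sketch below derives the 1-based member offsets and the total key size from a list of member sizes. The sizes are made-up values chosen for the example; the names `key_off` and `key_size` follow the index_t components documented later in this reference.

```fortran
program composite_key_layout
  implicit none
  ! Hypothetical on-disk sizes (bytes) of three member columns, in declared order.
  integer, parameter :: csize(3) = [4, 8, 16]
  integer :: key_off(3), key_size, k

  key_size = 0
  do k = 1, size(csize)
    key_off(k) = key_size + 1        ! 1-based offset of member k within the key
    key_size   = key_size + csize(k) ! running total becomes the key size
  end do

  print '(a,3(1x,i0))', 'key_off  =', key_off
  print '(a,1x,i0)',    'key_size =', key_size
end program composite_key_layout
```

With these sizes the offsets come out as 1, 5 and 13 and the key size as 28; swapping the declared order of the members changes the offsets and therefore the sort order of the index.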
Single-column overload of db_drop_index.
Drop the secondary index whose member columns exactly match
col_names. The index file is deleted and the slot tombstoned —
slot numbers stay stable so the __i<slot> file naming of surviving
indices is undisturbed, and a later db_create_index simply appends a
fresh slot. SQR_NOT_FOUND if no index covers exactly those columns.
Insert a batch of rows in one call, deferring index maintenance to a
single rebuild per index (the bulk-load path) rather than a
per-row tree insert. bufs(k) is the row buffer for row k (filled
like db_insert's buf); row_ids(k) receives its assigned id.
All rows are validated (NULL-member skip, NaN reject, uniqueness
against the existing index and within the batch) before anything is
written, so a SQR_DUP / SQR_INVALID violation rejects the whole
batch with nothing inserted (row_ids = 0). row_ids must be at
least size(bufs) long.
Walk a table's on-disk structures and check they agree: the live-row
recount matches live_count, next_id covers every written record,
every live non-NULL-member row is present in each index, every index
entry points at a live row whose key matches, and a unique index has
no duplicate live keys. Read-only. SQR_OK if consistent,
SQR_INVALID (with errmsg describing the first problem) otherwise.
Fetch a row by natural key. Resolves the unique index over
col_names, finds the live row whose key columns in keyrow
match, and copies it into buf. keyrow is a row-shaped
buffer the caller filled with just the key columns via the
row_set_* helpers. row_id optionally returns the resolved
live row's id (0 if not resolved) so the caller can follow up
with row-id-keyed operations such as db_get_text.
Update a row by natural key (resolve via the unique index,
then delegate to db_update).
Delete a row by natural key (resolve via the unique index,
then delegate to db_delete).
Equality lookup of the first live row whose indexed int32
column equals key.
Equality lookup on an indexed real64 column.
Exact, bit-for-bit equality — deliberately no epsilon. Storage
is a pure binary transfer with no decimal round-trip, so the
same real64 value that was inserted matches; a value the
caller recomputes differently (0.1+0.2 vs a stored 0.3)
will not — that is inherent to floating point. Tolerance
matching is a range query, not an equality lookup.
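
The difference between bit-for-bit equality and tolerance matching can be seen in a small stand-alone program. It does not call the library; `same_bits` is a hypothetical helper that compares raw 64-bit patterns, and the tolerance band is written as an inclusive [lo, hi] test of the kind a range query would express.

```fortran
program real_key_equality
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  real(real64) :: stored, same, recomputed, x, y, tol

  stored = 0.3_real64
  same   = stored                   ! the very value that was inserted
  x = 0.1_real64
  y = 0.2_real64
  recomputed = x + y                ! rounds differently from the literal 0.3

  print *, 'same bits, inserted value:   ', same_bits(stored, same)
  print *, 'same bits, recomputed value: ', same_bits(stored, recomputed)

  ! Tolerance matching expressed as an inclusive [lo, hi] band.
  tol = 1.0e-9_real64
  print *, 'inside tolerance band:       ', &
       (recomputed >= stored - tol .and. recomputed <= stored + tol)

contains

  ! Compare the raw 64-bit patterns, with no epsilon.
  pure logical function same_bits(a, b)
    real(real64), intent(in) :: a, b
    same_bits = transfer(a, 0_int64) == transfer(b, 0_int64)
  end function same_bits

end program real_key_equality
```

On an IEEE 754 platform the first comparison is true, the second is false, and the band test is true, which is why a caller that needs tolerance should issue a range query rather than an equality lookup.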
Equality lookup on an indexed DT_CHAR column. The key is
NUL-padded to the column width before comparison.
Open an ascending cursor over every live row, in the key order of an
index on col_name: an exact single-column index if one exists,
otherwise a composite index whose leading member is col_name
(its B+-tree order is primarily by that member). The whole-index
complement to db_find_range; pull rows with db_cursor_next. Fails
with SQR_NOT_FOUND if the table has no such index. NULL-member rows
are not in the index and so are never yielded.
int32 band overload of db_find_range.
real64 band overload of db_find_range.
DT_CHAR band overload of db_find_range (bounds NUL-padded to
the column width).
Yield the next live row at or after the cursor, in ascending key
order, advancing past it. ok is .false. (with stat == SQR_OK)
when the cursor is exhausted — for db_find_range, when the band's
upper bound is passed — and row_id/buf are then unset.
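
The pull protocol described above (keep asking for the next row until ok comes back false, with exhaustion not treated as an error) can be sketched without the library. Everything here is a hypothetical stand-in: a sorted integer array plays the role of the index, and `next_key` mimics only the shape of a cursor step over an inclusive [lo, hi] band.

```fortran
program cursor_pull_sketch
  implicit none
  integer, parameter :: keys(6) = [2, 3, 5, 8, 13, 21]   ! index keys, ascending
  integer :: pos, key
  logical :: ok

  ! Position at the first key >= lo, then pull until the band is exhausted.
  pos = first_at_or_after(5)
  do
    call next_key(pos, 13, key, ok)
    if (.not. ok) exit              ! exhaustion is a normal end, not an error
    print '(a,i0)', 'key ', key
  end do

contains

  integer function first_at_or_after(lo) result(p)
    integer, intent(in) :: lo
    p = 1
    do while (p <= size(keys))
      if (keys(p) >= lo) exit
      p = p + 1
    end do
  end function first_at_or_after

  ! Yield the key at the cursor and advance past it; ok is .false. once the
  ! array ends or the inclusive upper bound has been passed.
  subroutine next_key(p, upper, k, more)
    integer, intent(inout) :: p
    integer, intent(in)    :: upper
    integer, intent(out)   :: k
    logical, intent(out)   :: more
    k = 0
    more = .false.
    if (p > size(keys)) return
    if (keys(p) > upper) return
    k = keys(p)
    p = p + 1
    more = .true.
  end subroutine next_key

end program cursor_pull_sketch
```

The loop yields 5, 8 and 13 and then stops, because 21 lies beyond the inclusive upper bound.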
Allocate a zeroed row buffer of n bytes.
Zero an existing row buffer in place.
Read the status byte (ROW_ALIVE / ROW_TOMBSTONE).
Write the status byte.
Mark col NULL in the row's bitmap. A NULL column reads back as
absent and is omitted from any index it is a member of (a row with
any NULL index member is simply not in that index).
Clear col's NULL bit (mark it as carrying a value). The
row_set_int / row_set_real / row_set_char helpers do this
implicitly, so this is only needed to un-NULL without writing a value.
.true. if col is NULL in this row.
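
A per-row NULL bitmap addressed by a 0-based bit ordinal can be manipulated with the standard bit intrinsics. The sketch below is an assumption about one plausible layout (eight columns per byte, lowest bit first); the library's actual in-row layout is not stated in this reference, and the helper names are hypothetical.

```fortran
program null_bitmap_sketch
  use, intrinsic :: iso_fortran_env, only: int8
  implicit none
  integer(int8) :: bitmap(2)        ! assumed layout: 8 columns per bitmap byte

  bitmap = 0_int8
  call mark_null(bitmap, 3)
  call mark_null(bitmap, 9)
  print *, is_null(bitmap, 3), is_null(bitmap, 4), is_null(bitmap, 9)

  call clear_null(bitmap, 3)        ! un-NULL without writing a value
  print *, is_null(bitmap, 3)

contains

  subroutine mark_null(bm, bit)
    integer(int8), intent(inout) :: bm(:)
    integer, intent(in) :: bit      ! 0-based bit ordinal
    bm(bit/8 + 1) = ibset(bm(bit/8 + 1), mod(bit, 8))
  end subroutine mark_null

  subroutine clear_null(bm, bit)
    integer(int8), intent(inout) :: bm(:)
    integer, intent(in) :: bit
    bm(bit/8 + 1) = ibclr(bm(bit/8 + 1), mod(bit, 8))
  end subroutine clear_null

  pure logical function is_null(bm, bit)
    integer(int8), intent(in) :: bm(:)
    integer, intent(in) :: bit
    is_null = btest(bm(bit/8 + 1), mod(bit, 8))
  end function is_null

end program null_bitmap_sketch
```

The first print shows bits 3 and 9 set and bit 4 clear; after `clear_null` bit 3 reads as not NULL again.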
Pack an int32 value into a DT_INT column slot.
Unpack an int32 value from a DT_INT column slot.
Pack a real64 value into a DT_REAL column slot.
Unpack a real64 value from a DT_REAL column slot.
Store a string into a DT_CHAR column slot (NUL-padded,
truncated to the column width).
Read a string from a DT_CHAR column slot (up to the first
NUL).
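
The store and read rules for a fixed-width character slot (NUL-pad and truncate on write, stop at the first NUL on read) can be sketched as follows. `put_char` and `get_char` are hypothetical helpers written for this example; they are not the library's row_set_char / row_get_char and take a bare slot rather than a row buffer.

```fortran
program char_slot_sketch
  implicit none
  character(len=8) :: slot
  character(len=:), allocatable :: s

  call put_char(slot, 'abc')
  s = get_char(slot)
  print '(a,i0)', 'length after round trip: ', len(s)

  call put_char(slot, 'abcdefghijkl')   ! longer than the slot: truncated
  s = get_char(slot)
  print '(a)', s

contains

  ! Fill the slot with NULs, then copy as many bytes as fit.
  subroutine put_char(dst, val)
    character(len=*), intent(out) :: dst
    character(len=*), intent(in)  :: val
    integer :: n
    n = min(len(val), len(dst))
    dst = repeat(achar(0), len(dst))
    dst(1:n) = val(1:n)
  end subroutine put_char

  ! Return the bytes before the first NUL, or the whole slot if none.
  function get_char(src) result(val)
    character(len=*), intent(in)  :: src
    character(len=:), allocatable :: val
    integer :: n
    n = index(src, achar(0))
    if (n == 0) then
      val = src
    else
      val = src(1:n-1)
    end if
  end function get_char

end program char_slot_sketch
```

The short value reads back with length 3, and the over-long value reads back as its first eight characters.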
Open an explicit transaction. Thin façade over txn_begin that
also marks the in-flight txn as user-owned, so that the auto-commit
brackets leave it open and re-entry is detected. No nesting in
v1: a db_begin while a transaction is already in flight fails with
SQR_INVALID. Maps onto SQL BEGIN.
Commit the explicit transaction opened by db_begin, keeping every
change and discarding the undo set. Fails with SQR_INVALID if no
explicit transaction is in flight. Maps onto SQL COMMIT.
Roll back the explicit transaction opened by db_begin, restoring
every base file and in-memory counter to its pre-db_begin state.
Fails with SQR_INVALID if no explicit transaction is in flight. Maps
onto SQL ROLLBACK.
Begin a transaction. Clears the in-memory undo set and marks the
journal header invalid (reusing the file). Lazily creates and
pre-sizes <db>/_journal.dat on the first transaction of a
session. Fails with SQR_READONLY on a read-only handle.
Also installs the rollback journal hook on every live index tree, so
their B+-tree page writes capture undo records. The db argument is
declared target so that each hook context can hold a lasting pointer
back to the handle; the caller's db_t must therefore have the target
attribute for journalling to work.
Capture the original bytes of an in-place overwrite before the
caller performs it. Idempotent per (path, offset, length) within
a transaction. path is relative to the database directory.
When bytes is supplied it is taken as the pre-image directly (the
caller already holds a consistent view of the region, e.g. read via
the same unit it is about to write); otherwise the region is read
back from the file. When bytes is present length is ignored and
len(bytes) is used.
Capture a file's original length before the caller appends to or
grows it; rollback truncates the appended bytes away. Idempotent
per path within a transaction.
Arm the journal (make it hot): serialise the undo set to the file,
write a valid header with count + checksum, and fsync. Must be
called after all jrnl_log_* and before any base-file write, so a
crash between here and commit is recoverable.
Commit: the durable commit point. Zeroes the journal header and
fsyncs it, so recovery sees nothing to do. The caller must have
already fsynced its base-file writes.
Roll back the active transaction from the in-memory undo set:
restore captured regions, truncate extended files, fsync, then
invalidate the journal. Used on a same-process failure path.
Recover at open: if a hot (valid) journal exists, replay its undo
records in reverse to restore the pre-transaction state, fsync,
then invalidate it. A missing, empty, invalidated or corrupt
journal is a no-op success.
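
The two undo record kinds and the reverse replay can be pictured with an in-memory model. This is a sketch under stated assumptions, not the library's implementation: a character string stands in for one base file, `undo_t` is a hypothetical record type (the real undo_rec_t is module-private), and no file I/O, checksum or fsync is modelled.

```fortran
program undo_journal_sketch
  implicit none

  integer, parameter :: REC_REGION = 1, REC_EXTEND = 2

  ! Hypothetical undo record holding what each kind needs for rollback.
  type :: undo_t
    integer :: kind = 0
    integer :: offset = 0                       ! 1-based, REGION only
    character(len=:), allocatable :: bytes      ! pre-image, REGION only
    integer :: old_len = 0                      ! original length, EXTEND only
  end type undo_t

  character(len=:), allocatable :: fbytes       ! stands in for one base file
  type(undo_t) :: recs(2)
  integer :: k, first, last

  fbytes = 'AAAABBBB'

  ! Log first, then write: an in-place overwrite of bytes 3..6 ...
  recs(1)%kind   = REC_REGION
  recs(1)%offset = 3
  recs(1)%bytes  = fbytes(3:6)
  fbytes(3:6)    = 'xxxx'

  ! ... followed by an append that grows the file.
  recs(2)%kind    = REC_EXTEND
  recs(2)%old_len = len(fbytes)
  fbytes          = fbytes // 'CCCC'

  print '(a)', 'before rollback: ' // fbytes

  ! Replay the undo records in reverse order.
  do k = size(recs), 1, -1
    select case (recs(k)%kind)
    case (REC_REGION)                            ! write the original bytes back
      first = recs(k)%offset
      last  = first + len(recs(k)%bytes) - 1
      fbytes(first:last) = recs(k)%bytes
    case (REC_EXTEND)                            ! truncate appended bytes away
      fbytes = fbytes(1:recs(k)%old_len)
    end select
  end do

  print '(a)', 'after rollback:  ' // fbytes
end program undo_journal_sketch
```

The buffer reads `AAxxxxBBCCCC` before the replay and `AAAABBBB` afterwards. Undoing in reverse means the truncation happens before the region restore, so each record is applied against the file shape it was captured from.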
.true. if a hot (valid, un-committed) journal is present on disk —
a read-only probe that writes nothing, used by a read-only db_open
to refuse a database that needs recovery it cannot perform. An
absent, voided or unreadable journal reports .false..
bt_journal_hook implementation that records a B+-tree page write in
the rollback journal. Install it on a tree with bt_set_journal_hook,
passing a bt_jhook_ctx_t as the context. An in-place overwrite
(is_new = .false.) is captured as a region with the tree's own
pre-image old_bytes (a consistent view — see jrnl_log_region's
bytes); a freshly allocated page (is_new = .true.) is captured as
an extend of the tree file. A non-SQR_OK journal result (or a
foreign context) returns a non-zero stat, which aborts the page
write so an un-recorded overwrite never reaches disk.

| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(*) | intent(in) | | | ctx | A |
| integer(kind=int64) | intent(in) | | | offset | 1-based byte position of the page |
| character(len=*) | intent(in) | | | old_bytes | Page pre-image (empty if |
| logical | intent(in) | | | is_new | Page newly allocated this txn |
| integer | intent(out) | | | stat | |

Signature of a db_scan callback. Invoked once per live row;
set stop to .true. to end the scan early. The scanning db
is passed through so the callback can resolve DT_TEXT columns
for the current row via db_get_text(db, table, row_id, ...) —
the in-row buf holds only the blob descriptor, not the text.
The callback must not make structural changes to db (create or
drop a table) during the scan, as that would invalidate the scan
in progress; reading rows / text and mutating row data are fine.

| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(db_t) | intent(inout) | | | db | The database being scanned (for TEXT resolution) |
| integer(kind=int32) | intent(in) | | | row_id | Row id of the current row |
| character(len=*) | intent(in) | | | buf | The row's record buffer (read-only) |
| class(*) | intent(inout) | | | ctx | Opaque caller context, threaded through unchanged |
| logical | intent(out) | | | stop | Set |


| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| character(len=SQR_NAME_LEN) | public | | name | '' | Column name |
| integer | public | | dtype | 0 | One of |
| integer | public | | csize | 0 | Bytes on disk |
| integer | public | | offset | 0 | 1-based byte offset within the record |
| integer | public | | null_bit | 0 | 0-based bit ordinal in the per-row NULL bitmap |


| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| integer | public | | ncols | 0 | Number of member columns (1 = single-column) |
| character(len=SQR_NAME_LEN) | public | allocatable | columns(:) | | Ordered member names |
| integer | public | allocatable | col_idx(:) | | Index of each member into the owning |
| integer | public | allocatable | key_off(:) | | 1-based offset of each member within the key |
| integer | public | | key_size | 0 | Sum of member |
| integer | public | | nentries | 0 | Cached live-entry count, mirrored from |
| type(btree_t) | public | | bt | | On-disk B+-tree mapping the key to the |
| logical | public | | unique | .false. | Enforce no duplicate live keys |
| class(*) | public | pointer | jctx | => null() | Heap-owned |


| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| character(len=SQR_NAME_LEN) | public | | name | '' | Table name |
| integer | public | | ncols | 0 | Number of columns |
| type(column_t) | public | allocatable | cols(:) | | Column definitions |
| integer | public | | record_size | 0 | Fixed record size in bytes |
| integer | public | | next_id | 1 | Next row_id to assign |
| integer | public | | live_count | 0 | Number of non-tombstoned rows |
| integer | public | | schema_version | 0 | On-disk format version of this table |
| integer | public | | unit | -1 | Open unit for |
| integer | public | | nindices | 0 | Number of secondary indices |
| type(index_t) | public | allocatable | indices(:) | | Secondary indices |
| integer | public | | blob_unit | -1 | Open unit for |
| integer(kind=int64) | public | | blob_next | 1_int64 | Next blob append position (1-based) |


| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| character(len=:) | public | allocatable | path | | |
| integer | public | | unit | -1 | Open stream unit, -1 if not open |
| logical | public | | active | .false. | A transaction is in flight |
| logical | public | | explicit | .false. | The in-flight txn was opened by |
| logical | public | | armed | .false. | Undo records are durable (journal is hot) |
| logical | public | | sized | .false. | File created + pre-sized this session |
| integer(kind=int64) | public | | capacity | 0 | Pre-allocated size in bytes |
| integer | public | | nrec | 0 | Live undo-record count for the current txn |
| type(undo_rec_t) | public | allocatable | recs(:) | | In-memory undo set for the current txn |
| type(tbl_snap_t) | public | allocatable | snaps(:) | | Per-table counter snapshot, by table position, for the current txn |

Object-oriented spelling of the db_* operations: call db%insert(...)
is exactly call db_insert(db, ...). The free db_* procedures remain
public and callable unchanged; these bindings are a thin alternative
face on the same module procedures (which is why the passed-object
db argument is class(db_t) throughout).
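
The equivalence between a type-bound call and the free procedure it is bound to can be shown with a toy type. `counter_t` and `counter_add` are hypothetical names used only for this illustration; the point is the binding pattern, in which the passed-object dummy is declared `class(...)` so the same module procedure serves both spellings.

```fortran
module counter_mod
  implicit none
  private
  public :: counter_t, counter_add

  type :: counter_t
    integer :: n = 0
  contains
    procedure, public :: add => counter_add    ! binding name => module procedure
  end type counter_t

contains

  ! The passed-object argument must be polymorphic for the binding to be legal.
  subroutine counter_add(c, k)
    class(counter_t), intent(inout) :: c
    integer, intent(in) :: k
    c%n = c%n + k
  end subroutine counter_add

end module counter_mod

program binding_demo
  use counter_mod
  implicit none
  type(counter_t) :: c

  call c%add(2)             ! object-oriented spelling
  call counter_add(c, 3)    ! free-procedure spelling, same code path
  print '(i0)', c%n
end program binding_demo
```

Both calls reach the same procedure, so the program prints 5.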

| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| character(len=:) | public | allocatable | dir | | Database directory path |
| type(table_t) | public | allocatable | tables(:) | | Open tables |
| integer | public | | ntables | 0 | Number of open tables |
| logical | public | | opened | .false. | |
| logical | public | | readonly | .false. | |
| integer | public | | generation | 0 | Bumped by every mutating call; cursors snapshot it |
| integer(kind=c_int64_t) | public | | lock_tok | -1 | Advisory-lock token held while open (-1 = none) |
| type(journal_t) | public | | jrnl | | Rollback journal state |


| Type-bound procedure | Description |
|---|---|
| procedure, public :: open => db_open | |
| procedure, public :: close => db_close | |
| procedure, public :: set_readonly => db_set_readonly | |
| procedure, public :: create_table => db_create_table | |
| procedure, public :: drop_table => db_drop_table | |
| procedure, public :: add_column => db_add_column | |
| procedure, public :: drop_column => db_drop_column | |
| procedure, public :: compact => db_compact | |
| procedure, public :: list_tables => db_list_tables | |
| procedure, public :: table_index => db_table_index | |
| procedure, public :: insert => db_insert | |
| procedure, public :: insert_many => db_insert_many | |
| procedure, public :: get => db_get | |
| procedure, public :: update => db_update | |
| procedure, public :: delete => db_delete | |
| procedure, public :: scan => db_scan | |
| procedure, public :: verify => db_verify | |
| procedure, public :: set_text => db_set_text | |
| procedure, public :: get_text => db_get_text | |
| procedure, public :: find_by_int => db_find_by_int | |
| procedure, public :: find_by_real => db_find_by_real | |
| procedure, public :: find_by_char => db_find_by_char | |
| procedure, public :: get_by_key => db_get_by_key | |
| procedure, public :: update_by_key => db_update_by_key | |
| procedure, public :: delete_by_key => db_delete_by_key | |
| procedure, public :: open_cursor => db_open_cursor | |
| procedure, public :: cursor_next => db_cursor_next | |
| procedure, public :: begin => db_begin | |
| procedure, public :: commit => db_commit | |
| procedure, public :: rollback => db_rollback | |
| generic, public :: create_index => create_index_1, create_index_m | |
| generic, public :: drop_index => drop_index_1, drop_index_m | |
| generic, public :: find_range => find_range_int, find_range_real, find_range_char | |


| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| integer | public | | ti | 0 | Owning table slot in |
| integer | public | | j | 0 | Index slot in the owning table's |
| type(bt_cursor_t) | public | | bt | | Underlying B+-tree cursor position |
| logical | public | | bounded | .false. | |
| character(len=:) | public | allocatable | hikey | | Inclusive upper-bound key bytes |
| logical | public | | active | .false. | |
| integer | public | | gen | -1 | |


| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| type(db_t) | public | pointer | db | => null() | Database whose journal logs the undo |
| character(len=:) | public | allocatable | rel | | Tree file, relative to |