Skalex v4: Architecture

Internal reference for contributors. Describes the current alpha.2 design of every module, data structure, and subsystem.


Table of Contents

  1. Directory Layout
  2. Design Principles
  3. Module Responsibilities
  4. Typed Error Hierarchy
  5. Document Shape
  6. Return Shapes
  7. Storage Adapter Interface
  8. Collection Store Object
  9. CollectionRegistry
  10. MutationPipeline
  11. PersistenceManager
  12. IndexEngine
  13. Query Engine
  14. Schema Validator
  15. TTL Engine
  16. TransactionManager
  17. AI Adapter Factory
  18. MigrationEngine
  19. Auto-Connect
  20. Namespace
  21. Build Pipeline
  22. Test Strategy
  23. Embedding Adapter Interface
  24. Vector Storage & Stripping
  25. Vector Search Engine

1. Directory Layout

src/
  index.js                  - Skalex class (database entry point)
  index.d.ts                - TypeScript declarations (source of truth)
  engine/
    collection.js           - Collection class (per-collection CRUD)
    query.js                - matchesFilter + presortFilter
    indexes.js              - IndexEngine (secondary field + compound indexes)
    validator.js            - parseSchema, validateDoc, inferSchema, stripInvalidFields
    ttl.js                  - computeExpiry, sweep
    migrations.js           - MigrationEngine
    vector.js               - cosineSimilarity, stripVector
    utils.js                - generateUniqueId, logger, resolveDotPath
    errors.js               - Typed error hierarchy (SkalexError and subclasses)
    persistence.js          - PersistenceManager (load, save, dirty tracking, write coalescing)
    transaction.js          - TransactionManager (lazy snapshots, timeout, rollback)
    registry.js             - CollectionRegistry (store creation, instance caching, inspection)
    pipeline.js             - MutationPipeline (shared pre/post mutation lifecycle)
    adapters.js             - AI adapter factory functions (embedding + LLM)
  features/
    aggregation.js          - count, sum, avg, groupBy
    changelog.js            - ChangeLog class (append-only mutation log)
    events.js               - EventBus (cross-runtime pub/sub)
    memory.js               - Memory class (agent episodic store)
    ask.js                  - natural language -> filter via LLM
    plugins.js              - PluginEngine (pre/post hooks)
    session-stats.js        - SessionStats (per-session read/write tracking)
    query-log.js            - SlowQueryLog (threshold-based ring buffer)
  connectors/
    storage/
      base.js               - StorageAdapter abstract class
      fs.js                 - FsAdapter (Node.js file system, atomic writes, gz/json)
      local.js              - LocalStorageAdapter (browser localStorage)
      encrypted.js          - EncryptedAdapter (AES-256-GCM wrapper)
      bun-sqlite.js         - BunSQLiteAdapter (bun:sqlite)
      d1.js                 - D1Adapter (Cloudflare D1 / Workers)
      libsql.js             - LibSQLAdapter (LibSQL / Turso)
    embedding/
      base.js               - EmbeddingAdapter abstract class
      openai.js             - OpenAIEmbeddingAdapter (text-embedding-3-small default)
      ollama.js             - OllamaEmbeddingAdapter (nomic-embed-text default)
    llm/
      base.js               - LLMAdapter abstract class
      openai.js             - OpenAILLMAdapter (gpt-4o-mini default)
      anthropic.js          - AnthropicLLMAdapter (claude-haiku-4-5 default)
      ollama.js             - OllamaLLMAdapter (llama3.2 default)
    mcp/
      index.js              - SkalexMCPServer (stdio + HTTP/SSE)
      protocol.js           - JSON-RPC 2.0 helpers
      tools.js              - MCP tool definitions
      transports/
        stdio.js            - StdioTransport
        http.js             - HttpTransport (HTTP + SSE)

dist/                       - generated by `npm run build`, not committed to git
  skalex.esm.js             - ESM build (readable)
  skalex.esm.min.js         - ESM build (minified)
  skalex.cjs                - CJS build (readable)
  skalex.min.cjs            - CJS build (minified)
  skalex.browser.js         - Browser ESM build (node:* built-ins stubbed)
  skalex.d.ts               - TypeScript declarations (copied from src/index.d.ts)

tests/
  helpers/
    MemoryAdapter.js        - In-memory StorageAdapter for CI (no I/O)
    MockEmbeddingAdapter.js - Deterministic 4-dim vectors for unit tests
    MockLLMAdapter.js       - Configurable nlQuery -> filter map
    MockTransport.js        - In-memory MCP transport for unit tests
  unit/
    aggregation.test.js
    ask.test.js
    changelog.test.js
    encryption.test.js
    events.test.js
    indexes.test.js
    mcp.test.js
    memory.test.js
    migrations.test.js
    plugins.test.js
    query.test.js
    session-stats.test.js
    ttl.test.js
    utils.test.js
    validator.test.js
    vector.test.js
  integration/
    collection-features.test.js
    correctness-hardening.test.js
    data-integrity.test.js
    engine-overhaul.test.js
    persistence-coherence.test.js
    skalex-core.test.js
    skalex.test.js
  smoke/
    node.test.cjs           - CJS dist smoke test (Node.js >=18)
    bun.test.js             - ESM dist smoke test (Bun)
    bun-sqlite.test.js      - BunSQLiteAdapter smoke test
    deno.test.js            - ESM dist smoke test (Deno 2.x)
    browser.test.js         - Headless Chromium runner (Playwright)
    browser.html            - Browser smoke test page
    browser-umd.html        - UMD browser smoke test page

scripts/
  mcp-server.js             - Runnable MCP server for Claude Desktop / Cursor
  run-deno.js               - Cross-platform deno binary resolver

2. Design Principles

  1. Zero dependencies in core: the engine modules (src/engine/) install nothing. All imports are internal or node:* built-ins. Only devDependencies in package.json.
  2. Adapter-isolated I/O: no module in src/ imports fs or localStorage directly. All I/O is routed through the injected StorageAdapter.
  3. ESM source, dual dist: source files use import/export (native ESM, "type":"module"). Rollup produces ESM, CJS, and browser dist artifacts for consumers.
  4. In-memory first: all data lives in plain JS arrays and Maps. The storage adapter is called on connect(), disconnect(), explicit saveData(), or when { save: true } is passed to a mutation (or autoSave is enabled).
  5. Auto-connect: the first operation on a Skalex instance automatically calls connect() if it hasn't been called yet. connect() is idempotent via a shared _connectPromise.
  6. Per-collection write queue with durability guarantee: each collection store tracks isSaving, _pendingSave, and _dirty. Concurrent saves to the same collection are serialised - the second writer creates a waiter promise (_saveWaiters) and the first writer re-runs _saveOne after its own flush completes, so no write is ever dropped.
  7. Typed errors: every engine throw uses a subclass of SkalexError with a stable code property (ERR_SKALEX_<SUBSYSTEM>_<SPECIFIC>). Consumers can handle errors programmatically without parsing message strings.
  8. Consistent return shapes: insertOne/updateOne/deleteOne/upsert return doc or null. insertMany/updateMany/deleteMany return doc[]. find returns { docs: [] } with optional { page, totalDocs, totalPages } when paginated.

3. Module Responsibilities

| Module | File | Responsibility |
| --- | --- | --- |
| Skalex | src/index.js | Lifecycle (connect/disconnect), collection facade, migrations, transactions, seeding, namespaces, import/export, debug logging, serialization (BigInt/Date-safe) |
| Collection | src/engine/collection.js | All CRUD operations, upsert, find with sort/pagination/populate/select, export, search, similar, index maintenance around mutations |
| CollectionRegistry | src/engine/registry.js | Store creation, lazy instance caching, renames, inspection, schema access, index building |
| MutationPipeline | src/engine/pipeline.js | Shared mutation lifecycle: ensureConnected, txSnapshot, beforePlugin, mutate, markDirty, save, changelog, sessionStats, event, afterPlugin |
| PersistenceManager | src/engine/persistence.js | Load orchestration, save/saveDirty/saveAtomic, dirty tracking, write coalescing, flush sentinel, orphan temp-file cleanup, save mutex |
| TransactionManager | src/engine/transaction.js | Transaction scope, lazy copy-on-first-write snapshots, timeout/abort, stale proxy detection, deferred side effects, rollback |
| QueryEngine | src/engine/query.js | Filter evaluation (matchesFilter), filter key ordering (presortFilter), deep equality for plain objects |
| IndexEngine | src/engine/indexes.js | Secondary field indexes, compound indexes, unique constraint enforcement, batch validation |
| Validator | src/engine/validator.js | Schema parsing, document validation, schema inference, field stripping |
| TTL | src/engine/ttl.js | TTL parsing, expiry computation, expired-document sweep |
| Vector | src/engine/vector.js | cosineSimilarity(a, b), stripVector(doc) |
| Errors | src/engine/errors.js | Typed error hierarchy: SkalexError, ValidationError, UniqueConstraintError, TransactionError, PersistenceError, AdapterError, QueryError |
| Adapters | src/engine/adapters.js | Factory functions createEmbeddingAdapter() and createLLMAdapter() - pure config-to-instance mappers |
| Migrations | src/engine/migrations.js | Migration registration, version ordering, pending-migration execution, status reporting |
| Utils | src/engine/utils.js | generateUniqueId() (27-char timestamp+random), logger(), resolveDotPath() (with prototype-pollution guard) |
| StorageAdapter | src/connectors/storage/base.js | Abstract class: read, write, delete, list, writeAll |
| FsAdapter | src/connectors/storage/fs.js | Node.js adapter: gz-compressed or raw JSON files, atomic temp-then-rename writes |
| LocalStorageAdapter | src/connectors/storage/local.js | Browser adapter: localStorage with namespaced keys |
| EmbeddingAdapter | src/connectors/embedding/base.js | Abstract class: embed(text) -> number[] |
| LLMAdapter | src/connectors/llm/base.js | Abstract class: generate(schema, nlQuery) -> filter |

4. Typed Error Hierarchy

File: src/engine/errors.js

All engine errors extend SkalexError, which extends Error and adds code (string) and details (object) properties.

SkalexError
  +-- ValidationError       - Schema parsing or document validation failure
  +-- UniqueConstraintError - Insert or update violates a unique field constraint
  +-- TransactionError      - Transaction timeout, abort, or rollback failure
  +-- PersistenceError      - Load, save, serialization, or flush failure
  +-- AdapterError          - Storage or AI adapter misconfiguration
  +-- QueryError            - Query filter, operator, or execution failure

Code convention: ERR_SKALEX_<SUBSYSTEM>_<SPECIFIC> (e.g. ERR_SKALEX_VALIDATION_REQUIRED, ERR_SKALEX_TX_TIMEOUT).

Error types are exposed as static properties on the Skalex class (Skalex.ValidationError, etc.) for CJS/UMD consumers, and as named exports for ESM consumers.
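The pattern can be sketched as follows (a minimal illustrative sketch, not the actual errors.js source; the real subclasses and code strings are the ones listed above):

```javascript
// Sketch of the typed-error pattern: a base class carrying a stable
// `code` plus a structured `details` object, with subclasses per subsystem.
class SkalexError extends Error {
  constructor(message, code, details = {}) {
    super(message);
    this.name = this.constructor.name;
    this.code = code;       // stable, e.g. ERR_SKALEX_VALIDATION_REQUIRED
    this.details = details; // structured context for programmatic handling
  }
}

class ValidationError extends SkalexError {
  constructor(message, details) {
    super(message, "ERR_SKALEX_VALIDATION_REQUIRED", details);
  }
}

// Consumers branch on `code`, never on message strings:
function describe(err) {
  if (err instanceof SkalexError) return err.code;
  throw err; // not one of ours - rethrow
}
```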


5. Document Shape

Every document stored by Skalex has the following reserved fields:

| Field | Type | Set by |
| --- | --- | --- |
| _id | string (27 chars: hex timestamp + random bytes) | insertOne / insertMany; user-supplied _id is preserved |
| createdAt | Date | insertOne / insertMany; always overwritten, user values are replaced |
| updatedAt | Date | insertOne / insertMany / applyUpdate; always overwritten, user values are replaced |
| _expiresAt | Date \| undefined | insertOne / insertMany when { ttl } or defaultTtl is set |
| _version | number \| undefined | insertOne (starts at 1), incremented by applyUpdate; only when versioning: true |
| _deletedAt | Date \| undefined | deleteOne / deleteMany; only when softDelete: true |
| _vector | number[] \| undefined | insertOne / insertMany when { embed } or defaultEmbed is set; never returned to callers |

User-supplied fields are spread after _id, createdAt, and updatedAt. The _id field respects caller-supplied values (allowing custom IDs), but createdAt and updatedAt are always engine-controlled.
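The stamping behaviour can be sketched like this (a hypothetical helper for illustration; the real logic lives in collection.js and utils.js, and the real id generator may differ in detail):

```javascript
// Illustrative sketch of how insertOne stamps the reserved fields.
// The id is a hex timestamp plus random hex padding, 27 chars total.
function generateUniqueId() {
  const ts = Date.now().toString(16);
  let rand = "";
  while ((ts + rand).length < 27) {
    rand += Math.floor(Math.random() * 16).toString(16);
  }
  return ts + rand;
}

function stampDoc(userDoc, now = new Date()) {
  // caller-supplied _id is preserved; otherwise one is generated
  const doc = { _id: userDoc._id ?? generateUniqueId(), ...userDoc };
  doc.createdAt = now; // always overwritten, even if the caller set one
  doc.updatedAt = now;
  return doc;
}
```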


6. Return Shapes

All mutation and query methods return plain objects; raw internal documents are never handed back directly.

| Method | Success return | Not-found return |
| --- | --- | --- |
| insertOne | doc | - |
| insertMany | doc[] | - |
| updateOne | doc | null |
| updateMany | doc[] | [] |
| deleteOne | doc | null |
| deleteMany | doc[] | [] |
| findOne | doc (projected copy) | null |
| find (no limit) | { docs: doc[] } | { docs: [] } |
| find (with limit) | { docs, page, totalDocs, totalPages } | same shape, empty docs |
| upsert | doc | - (always inserts or updates) |
| upsertMany | doc[] | - (always inserts or updates) |
| restore | doc | null |

7. Storage Adapter Interface

File: src/connectors/storage/base.js

All storage backends implement the StorageAdapter abstract class:

class StorageAdapter {
  async read(name)              // -> string | null
  async write(name, data)       // -> void
  async delete(name)            // -> void
  async list()                  // -> string[]
  async writeAll(entries)       // -> void  (batch write)
}

name is a plain collection identifier (no path separators). The adapter maps it to whatever its storage scheme requires (file path, localStorage key prefix, etc.).

writeAll(entries)

Batch write method. The base class provides a sequential fallback (for...of write()). Adapters that support atomic batches (SQL-backed) override this with a single transaction. Used by PersistenceManager.saveAtomic() during transaction commit.

FsAdapter extras

FsAdapter additionally exposes helpers used by Collection.export and Skalex.import:

join(dir, file)             // path.join equivalent
ensureDir(dir)              // mkdir -p equivalent (sync)
async writeRaw(path, data)  // write raw string to an arbitrary path
async readRaw(path)         // read raw string from an arbitrary path

These are not part of the StorageAdapter contract; they are FsAdapter-specific and guarded by duck-typing at the call site.

Writing a custom adapter

Extend StorageAdapter and pass the instance to the constructor:

import Skalex, { StorageAdapter } from "skalex"; // StorageAdapter export assumed here; it is defined in src/connectors/storage/base.js

class MyAdapter extends StorageAdapter {
  constructor() {
    super();
    this.files = new Map(); // illustrative in-memory backing store
  }
  async read(name) { return this.files.get(name) ?? null; }
  async write(name, data) { this.files.set(name, data); }
  async delete(name) { this.files.delete(name); }
  async list() { return [...this.files.keys()]; }
  async writeAll(entries) {  // optional: override for an atomic batch
    for (const [name, data] of entries) await this.write(name, data);
  }
}

const db = new Skalex({ adapter: new MyAdapter() });

8. Collection Store Object

Skalex.collections[name] is a plain object - the single source of truth for a collection's in-memory state:

{
  collectionName: string,
  data:           object[],          // ordered array of all documents
  index:          Map<_id, doc>,     // O(1) _id lookup
  isSaving:       boolean,           // per-collection write lock
  _pendingSave:   boolean,           // a second save was requested while isSaving
  _dirty:         boolean,           // needs persistence (set by markDirty, cleared after save)
  _saveWaiters:   Promise[],         // callers waiting for the current write to finish
  schema:         { fields: Map, uniqueFields: string[] } | null,
  rawSchema:      object | null,     // original schema definition before parsing
  fieldIndex:     IndexEngine | null,

  // optional collection-level behaviours (from createCollection opts):
  softDelete:     boolean,
  versioning:     boolean,
  strict:         boolean,
  onSchemaError:  "throw"|"warn"|"strip",
  defaultTtl:     string | number | null,
  defaultEmbed:   string | null,
  maxDocs:        number | null,
  changelog:      boolean,
}

Collection instances hold a reference to this object (this._store) and expose it via getters. Multiple Collection instances for the same name share the same store object; mutations are immediately visible across references.

The rawSchema field preserves the original schema definition so it can be round-tripped through persistence. The _dirty flag is set by PersistenceManager.markDirty() after every mutation and cleared after a successful save.


9. CollectionRegistry

File: src/engine/registry.js

Owns collection definitions, store creation, instance caching, renames, inspection, and metadata access.

  • get(name, db): returns a cached Collection instance or lazily creates one (and its backing store).
  • create(name, options, db): defines a collection with schema, indexes, and behaviour options. Calls createStore() then wraps in a Collection instance.
  • createStore(name, options): allocates the raw store object with all fields initialised (including _dirty: false, rawSchema, and _pendingSave: false). Parses the schema, creates an IndexEngine if indexes or unique fields are declared.
  • rename(from, to): renames a collection in-memory. Updates the store's collectionName, re-keys the stores map, and migrates the cached instance.
  • buildIndex(data, keyField): builds a Map<keyField, doc> from a data array.
  • schema(name): returns the parsed schema or infers one from the first document.
  • inspect(name), stats(name), dump(): metadata and diagnostic methods.

10. MutationPipeline

File: src/engine/pipeline.js

Extracts the shared pre/post mutation lifecycle so each CRUD method only defines its operation-specific logic. Every write operation (insertOne, updateOne, deleteOne, etc.) delegates to pipeline.execute().

Lifecycle

ensureConnected
  -> txSnapshot (copy-on-first-write if inside a transaction)
  -> assertTxAlive (reject stale continuations from aborted transactions)
  -> beforePlugin hook
  -> [mutation] (caller-defined: the actual data change)
  -> markDirty
  -> save (if { save: true } or autoSave)
  -> changelog entry
  -> sessionStats.recordWrite
  -> event emission (via EventBus)
  -> afterPlugin hook
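The lifecycle above can be sketched as a single execute() over a context object (names are illustrative, not the real pipeline.js API; the caller supplies only the operation-specific mutate step):

```javascript
// Sketch of the shared mutation lifecycle. Each stage is a hook on a
// context object; optional hooks are skipped when absent.
async function execute(ctx, mutate) {
  await ctx.ensureConnected();
  ctx.txSnapshot?.();              // copy-on-first-write inside a tx
  ctx.assertTxAlive?.();           // reject stale continuations
  await ctx.beforePlugin?.();
  const result = await mutate();   // the operation-specific data change
  ctx.markDirty();
  if (ctx.save) await ctx.flush(); // { save: true } or autoSave
  ctx.changelog?.(result);
  ctx.sessionStats?.(result);
  ctx.emit?.(result);              // event emission
  await ctx.afterPlugin?.(result);
  return result;
}
```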

Transaction awareness

The pipeline checks _activeTxId on the Collection instance to determine whether a write is part of the active transaction. Two guards prevent stale mutations:

  1. entryTxId: the transaction ID active when execute() was called.
  2. _createdInTxId: the transaction ID active when the Collection instance was created via the proxy.

If either ID appears in TransactionManager._abortedIds, the mutation throws ERR_SKALEX_TX_ABORTED.

Side effects (events, after-hooks, changelog) during a transaction are deferred via txManager.defer() and flushed only after successful commit.


11. PersistenceManager

File: src/engine/persistence.js

Owns all load/save orchestration, dirty tracking, write-queue coalescing, flush sentinel management, and orphan temp-file cleanup.

Loading (loadAll)

  1. Lists all stored collection names via adapter.list().
  2. Reads and deserialises each collection in parallel.
  3. Merges persisted config with pre-existing createCollection config (in-memory config takes precedence).
  4. Builds the _id index and field indexes from loaded data.
  5. Detects incomplete flushes via the flush sentinel.
  6. Cleans orphan temp files left by interrupted atomic writes.

The lenientLoad option allows skipping corrupt collections instead of throwing.

Saving

Three save strategies:

| Method | Semantics | Used by |
| --- | --- | --- |
| save(collections, name?) | Best-effort: each collection written independently; one failure does not block others | db.saveData(), { save: true } mutations |
| saveDirty(collections) | Same as save but only writes collections with _dirty === true | Implicit flush paths |
| saveAtomic(collections, names) | Batch write via adapter.writeAll(); includes _meta with flush sentinel; serialised via _saveLock | Transaction commit |

Dirty tracking

markDirty(collections, name) sets _dirty = true on the store. After a successful write, the flag is cleared. This ensures no mutation is silently lost.

Write coalescing

Concurrent saves for the same collection are serialised by the per-collection isSaving / _pendingSave mechanism. The second caller sets _pendingSave = true; when the first writer finishes, it re-runs _saveOne with the latest data. Callers can await the _saveWaiters promise to know their data has been flushed.
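The mechanism can be sketched as follows (an illustrative sketch, not the persistence.js source; the real code also resolves _saveWaiters):

```javascript
// Coalescing sketch: a save arriving while another is in flight only
// sets _pendingSave; the first writer loops and flushes again, so the
// latest data always reaches storage and no write is dropped.
async function saveOne(store, writeFn) {
  if (store.isSaving) {
    store._pendingSave = true; // coalesced into the active writer's loop
    return;
  }
  store.isSaving = true;
  try {
    do {
      store._pendingSave = false;
      await writeFn(store.data);   // physical write of the current state
      store._dirty = false;
    } while (store._pendingSave);  // a save arrived mid-flight: go again
  } finally {
    store.isSaving = false;
  }
}
```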

Save mutex (_saveLock)

saveAtomic() acquires the instance-level _saveLock (a promise chain) to serialize concurrent atomic saves. Regular save()/saveDirty() do not acquire this lock - they rely on per-collection coalescing.

Flush sentinel

Before an atomic batch write, a sentinel is written into the _meta collection. After a successful batch, the sentinel is cleared. On next load, if the sentinel is still present, the engine knows a previous flush was interrupted.


12. IndexEngine

File: src/engine/indexes.js

Maintains Map-based indexes for declared fields. Three index types:

_fieldIndexes:    Map<field, Map<value, Set<doc>>>  - non-unique + unique
_uniqueIndexes:   Map<field, Map<value, doc>>       - unique fields only
_compoundIndexes: Map<key, { fields, map }>         - multi-field compound indexes

Compound indexes

Declared as arrays in the indexes option: indexes: [["field1", "field2"]]. Values are encoded into a stable tuple key via encodeTuple() (type-tagged to prevent cross-type collisions). Compound index fields must be scalar values (string, number, or boolean) - non-scalar values (objects, arrays) are rejected with ERR_SKALEX_VALIDATION_COMPOUND_INDEX.
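A type-tagged encoding of this kind can be sketched as follows (illustrative only; the real encodeTuple in indexes.js may use a different wire format):

```javascript
// Sketch of a type-tagged tuple key. Tagging each value with its type
// prevents 1 and "1" from colliding, and a length prefix prevents
// boundary ambiguity between adjacent values containing the separator.
function encodeTuple(values) {
  return values
    .map((v) => {
      const t = typeof v;
      if (t !== "string" && t !== "number" && t !== "boolean") {
        throw new Error("compound index fields must be scalar");
      }
      const s = String(v);
      return `${t[0]}${s.length}:${s}`; // e.g. s3:foo, n2:42, b4:true
    })
    .join("|");
}
```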

Dot-notation rejection

Index field names cannot contain dots. The index engine uses direct property access (doc[field]), not resolveDotPath(), so dot-path fields would produce false negatives. _validateFieldName() throws ERR_SKALEX_VALIDATION_INDEX_DOT_PATH if a field name contains a dot.

Unique enforcement

  • Single-document: _checkUnique(doc, excludeDoc) verifies no other document holds the same value for any unique field. Throws ERR_SKALEX_UNIQUE_VIOLATION.
  • Batch insert: assertUniqueBatch(newDocs) validates an entire batch against the existing index and within the batch itself (intra-batch duplicate detection).
  • Batch update: assertUniqueCandidates(oldDocs, newDocs) validates staged updates before any live index mutation. Excludes the documents being updated from the conflict set.

Index rollback

update(oldDoc, newDoc) wraps the index mutation in a try/catch. If indexing newDoc fails (e.g. unique violation), the old doc is restored in the index.

Lookup methods

  • lookup(field, value) - returns object[] (array materialised for public API).
  • _lookupIterable(field, value) - returns a read-only iterable wrapping the backing Set (avoids allocation in internal scan paths).
  • lookupCompound(fieldValues) - returns matching docs for a multi-field equality match.

Lifecycle

  • add(doc) - checks unique then indexes.
  • remove(doc) - removes from all three index types.
  • update(oldDoc, newDoc) - checks unique, removes old, indexes new (with rollback).
  • buildFromData(data) - clears and rebuilds all indexes from scratch (used after load and transaction rollback).

13. Query Engine

File: src/engine/query.js

matchesFilter(item, filter)

Evaluation order (short-circuits on first false):

  1. Function filter: typeof filter === "function" - checked before the empty-filter guard because functions are objects with zero enumerable keys.
  2. Empty filter: {} or null/undefined matches everything.
  3. Logical operators (evaluated before field-level checks):
    • $or - array of sub-filters, at least one must match.
    • $and - array of sub-filters, all must match.
    • $not - single sub-filter, must not match.
  4. AND over all remaining keys: every field key must pass.

Per-key logic:

| Filter value type | Evaluation |
| --- | --- |
| RegExp | filterValue.test(String(itemValue)) |
| Object with $-keys | Operator dispatch: $eq $ne $gt $gte $lt $lte $in $nin $regex $fn |
| Plain object (no $-keys) | Deep structural equality via deepEqual() |
| Anything else | Strict equality: itemValue === filterValue |
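The per-key dispatch can be sketched for a subset of operators (a simplified sketch; the real matchesFilter also handles $and/$or/$not, $fn, dot-paths, and deep equality for plain objects):

```javascript
// Simplified per-key dispatch covering a few operators.
const OPS = {
  $eq: (a, b) => a === b,
  $ne: (a, b) => a !== b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $in: (a, b) => b.includes(a),
  $nin: (a, b) => !b.includes(a),
  $regex: (a, b) => new RegExp(b).test(String(a)),
};

function matchesKey(itemValue, filterValue) {
  if (filterValue instanceof RegExp) return filterValue.test(String(itemValue));
  if (filterValue && typeof filterValue === "object" && !Array.isArray(filterValue)) {
    // operator object: every $-key must pass
    return Object.entries(filterValue).every(([op, arg]) => OPS[op](itemValue, arg));
  }
  return itemValue === filterValue; // strict equality fallback
}

function matches(item, filter) {
  return Object.entries(filter).every(([k, v]) => matchesKey(item[k], v));
}
```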

deepEqual(a, b)

Structural deep equality for plain values. Handles: primitives, null, undefined, plain objects, arrays, Date, RegExp. Circular references are out of scope (engine data is JSON-serializable).

Dot-notation

Keys like "address.city" are resolved by resolveDotPath() from src/engine/utils.js, which splits on . and walks the object. Prototype-pollution keys (__proto__, constructor, prototype) return undefined.

presortFilter(filter, indexedFields)

Reorders filter keys for optimal short-circuit evaluation:

  1. Indexed fields: O(1) lookup, often handled by _getCandidates before matchesFilter runs
  2. Plain equality: fast strict comparison
  3. Range operators: $gt $gte $lt $lte $ne $in $nin
  4. Expensive: $regex $fn and RegExp values
  5. Logical operators: $or $and $not (preserved at the end)

Returns a new object with keys in this order. Called in find() before the main scan loop.


14. Schema Validator

File: src/engine/validator.js

parseSchema(schema)

Normalises a user-supplied schema definition into an internal form:

// Input
{ name: "string", email: { type: "string", unique: true, required: true } }

// Output
{
  fields: Map<string, { type, required, unique, enum? }>,
  uniqueFields: string[]
}

Supported types: "string", "number", "boolean", "object", "array", "date", "any".

validateDoc(doc, fields, strict = false)

Checks a document against the parsed fields Map. Returns an array of error strings (empty = valid). Checks:

  • Required: field is undefined or null when required: true
  • Type: uses typeof, with special handling for Array -> "array" and Date -> "date"
  • Enum: value must be in the allowed list
  • Unknown fields (when strict = true): any non-_-prefixed key not declared in the schema is flagged

stripInvalidFields(doc, fields)

Used by the "strip" onSchemaError strategy. Returns a shallow copy retaining only _-prefixed system fields and declared fields that pass type/enum checks. Never mutates the original document.

inferSchema(doc)

Derives a simple { field: type } schema from a sample document. Skips fields starting with _. Used by db.inspect() and db.schema() when no explicit schema was declared.
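The core of validateDoc can be sketched like this (an illustrative sketch; the real validator also covers the "any"/"date" edge cases more fully and strict-mode unknown-field detection):

```javascript
// Sketch of the validateDoc checks: required, type, and enum, returning
// an array of error strings (empty array = valid).
function typeOf(v) {
  if (Array.isArray(v)) return "array";
  if (v instanceof Date) return "date";
  return typeof v;
}

function validateDoc(doc, fields) {
  const errors = [];
  for (const [name, rule] of fields) {
    const value = doc[name];
    if (value === undefined || value === null) {
      if (rule.required) errors.push(`${name}: required`);
      continue;
    }
    if (rule.type !== "any" && typeOf(value) !== rule.type) {
      errors.push(`${name}: expected ${rule.type}, got ${typeOf(value)}`);
    }
    if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${name}: not in enum`);
    }
  }
  return errors;
}
```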


15. TTL Engine

File: src/engine/ttl.js

TTL formats

| Input | Meaning |
| --- | --- |
| 300 (number) | 300 seconds |
| "300ms" | 300 milliseconds |
| "30s" | 30 seconds |
| "30m" | 30 minutes |
| "24h" | 24 hours |
| "7d" | 7 days |

All values are converted to milliseconds internally. Non-positive values throw ERR_SKALEX_VALIDATION_TTL.

computeExpiry(ttl)

Returns new Date(Date.now() + parseTtl(ttl)). Stored as _expiresAt on the document.
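The parsing and expiry computation can be sketched as follows (illustrative; the real ttl.js may differ in exact error handling):

```javascript
// Sketch of TTL parsing per the table above: bare numbers are seconds,
// strings use ms/s/m/h/d suffixes, everything becomes milliseconds.
const UNIT_MS = { ms: 1, s: 1000, m: 60_000, h: 3_600_000, d: 86_400_000 };

function parseTtl(ttl) {
  let ms;
  if (typeof ttl === "number") {
    ms = ttl * 1000; // bare numbers are seconds
  } else {
    const m = /^(\d+(?:\.\d+)?)(ms|s|m|h|d)$/.exec(String(ttl));
    if (!m) throw new Error("ERR_SKALEX_VALIDATION_TTL: bad format");
    ms = Number(m[1]) * UNIT_MS[m[2]];
  }
  if (!(ms > 0)) throw new Error("ERR_SKALEX_VALIDATION_TTL: non-positive");
  return ms;
}

function computeExpiry(ttl, now = Date.now()) {
  return new Date(now + parseTtl(ttl)); // stored as _expiresAt
}
```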

sweep(data, idIndex, removeFromIndexes?)

Iterates the data array backwards (safe splice), removes any document where _expiresAt <= Date.now(), deletes from the _id Map index, and optionally calls IndexEngine.remove. Returns the count of removed documents.

Called in two places:

  1. On connect() - one-shot sweep of every loaded collection.
  2. Periodic sweep via ttlSweepInterval - when set in the constructor, connect() starts a setInterval that calls _sweepTtl() on every collection. The timer handle calls .unref() so it does not keep the Node.js process alive. Cleared in disconnect().

16. TransactionManager

File: src/engine/transaction.js

Overview

db.transaction(fn, { timeout }) provides lazy copy-on-first-write rollback with deferred side effects.

Lifecycle

  1. Lock: transactions are serialised via _txLock (promise chain). No concurrent transactions.
  2. Context creation: allocates a TransactionContext with a unique id, snapshots Map, touchedCollections Set, deferredEffects array, and aborted flag. Records preExisting collection names.
  3. Proxy: fn receives a Proxy around the db instance, not the raw db. The proxy:
    • Stamps _activeTxId on every Collection obtained via useCollection().
    • Blocks direct db.collections access (throws ERR_SKALEX_TX_DIRECT_ACCESS).
    • Detects stale proxy usage after the transaction ends (throws ERR_SKALEX_TX_STALE_PROXY).
  4. Lazy snapshots: only collections that receive a write are snapshotted, on first mutation (via _txSnapshotIfNeeded() in the pipeline). Uses structuredClone for deep copy of the data array. Snapshot includes the _dirty flag state.
  5. Timeout: if timeout > 0, a setTimeout races against fn. On timeout, ctx.aborted = true and the transaction is rolled back.
  6. Commit: persists only touched collections via PersistenceManager.saveAtomic(). After successful persistence, flushes all deferred side effects (events, after-hooks, changelog entries).
  7. Rollback: on error, restores snapshotted collections via _applySnapshot() (which rebuilds _id index and field indexes from the cloned data). Collections created inside the transaction are deleted. The _dirty flag is restored to its pre-transaction state.
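The snapshot and rollback steps can be sketched as follows (an illustrative sketch; the real manager also handles the proxy, timeout, deferred effects, and index rebuilding; requires structuredClone, i.e. Node 17+ or a modern browser):

```javascript
// Lazy copy-on-first-write snapshot + rollback sketch.
function makeTx() {
  return { snapshots: new Map(), aborted: false };
}

function snapshotIfNeeded(tx, name, store) {
  if (!tx.snapshots.has(name)) {
    // deep-copy only on the first write to this collection
    tx.snapshots.set(name, {
      data: structuredClone(store.data),
      dirty: store._dirty, // snapshot includes the _dirty flag state
    });
  }
}

function rollback(tx, stores) {
  for (const [name, snap] of tx.snapshots) {
    const store = stores.get(name);
    store.data = snap.data;    // restore documents
    store._dirty = snap.dirty; // restore pre-transaction dirty state
    // the real engine also rebuilds the _id Map and field indexes here
  }
  tx.aborted = true;
}
```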

Deferred side effects

During fn(), calls to _emitEvent(), _runAfterHook(), and _logChange() check txManager.defer(). If a transaction is active, the effect is pushed to ctx.deferredEffects instead of executing immediately. Effects are flushed in order after commit, or discarded on rollback.

Stale continuation detection

When a transaction is aborted (timeout or error), its id is added to _abortedIds. Any subsequent mutation from a Collection stamped with that id (via _activeTxId or _createdInTxId) throws ERR_SKALEX_TX_ABORTED.


17. AI Adapter Factory

File: src/engine/adapters.js

Pure config-to-instance mappers extracted from the Skalex constructor.

  • createEmbeddingAdapter(ai): switches on ai.provider ("openai" | "ollama") and returns the appropriate EmbeddingAdapter subclass.
  • createLLMAdapter(ai): switches on ai.provider ("openai" | "anthropic" | "ollama") and returns the appropriate LLMAdapter subclass.

Both throw AdapterError for unknown providers. The Skalex constructor also accepts pre-built adapter instances via embeddingAdapter and llmAdapter options, bypassing the factory.


18. MigrationEngine

File: src/engine/migrations.js

Migrations are registered via db.addMigration({ version, description?, up }) and stored sorted by version. On connect():

  1. _getMeta() reads applied versions from the _meta collection.
  2. MigrationEngine.run() filters to pending versions and calls up(collection) for each in order.
  3. _saveMeta() writes the updated applied-versions list back to _meta.

The _meta collection is a regular Skalex collection with a single document keyed "migrations":

{ _id: "migrations", appliedVersions: [1, 2, 3] }

Duplicate version registration throws immediately. Version numbers must be positive integers.
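The pending-version filtering and ordered execution can be sketched like this (illustrative; the real engine reads and writes appliedVersions through the _meta collection and passes a collection handle to up()):

```javascript
// Sketch of pending-migration execution: skip applied versions, run the
// rest in ascending version order, return the updated applied list.
function runMigrations(migrations, appliedVersions, ctx) {
  const applied = new Set(appliedVersions);
  const pending = migrations
    .filter((m) => !applied.has(m.version))
    .sort((a, b) => a.version - b.version); // strict version order
  for (const m of pending) {
    m.up(ctx);
    applied.add(m.version);
  }
  return [...applied].sort((a, b) => a - b); // written back to _meta
}
```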


19. Auto-Connect

_ensureConnected() is called at the top of every public operation (via the MutationPipeline or directly).

async connect() {
  if (this._connectPromise) return this._connectPromise;
  this._connectPromise = this._doConnect();
  return this._connectPromise;
}

The _connectPromise field makes connect() idempotent. Multiple concurrent callers before connect() resolves all await the same promise - no double-connect race. After connect() resolves, isConnected = true and _ensureConnected() short-circuits immediately.


20. Namespace

db.namespace(id) returns a new Skalex instance with path set to {parent.dataDirectory}/{safeId}. The id is sanitised to allow only alphanumeric, dash, and underscore characters.

Config inherited from the parent: format, debug, ai, encrypt, slowQueryLog, queryCache, plugins, memory, logger, autoSave, ttlSweepInterval, regexMaxLength, idGenerator, serializer, deserializer, and pre-built adapter instances (when no ai config is present).

The namespaced instance is fully independent: separate collections map, separate adapter instance pointing at the subdirectory. Requires the default FsAdapter - throws ERR_SKALEX_ADAPTER_NAMESPACE_REQUIRES_FS if a custom adapter was configured.
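The id sanitisation can be sketched as follows (an illustrative sketch; the exact character policy and path joining live in index.js):

```javascript
// Sketch of namespace path derivation: only alphanumerics, dashes, and
// underscores survive sanitisation, so path traversal input is inert.
function namespacePath(parentDir, id) {
  const safeId = String(id).replace(/[^A-Za-z0-9_-]/g, "");
  if (!safeId) throw new Error("namespace id empty after sanitisation");
  return `${parentDir}/${safeId}`;
}
```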


21. Build Pipeline

Tool: Rollup with @rollup/plugin-node-resolve, @rollup/plugin-commonjs, @rollup/plugin-terser.

Config file: rollup.config.js (native ESM via "type":"module" in package.json).

Five outputs:

| File | Format | Minified |
| --- | --- | --- |
| dist/skalex.esm.js | ESM | No |
| dist/skalex.esm.min.js | ESM | Yes |
| dist/skalex.cjs | CJS | No |
| dist/skalex.min.cjs | CJS | Yes |
| dist/skalex.browser.js | ESM (browser) | No |

All outputs include source maps. Node built-ins (node:fs, node:path, node:zlib, node:crypto, node:os) are marked external. The browser build stubs them with empty objects via the nodeBrowserStubs() Rollup plugin.

TypeScript declarations are hand-written in src/index.d.ts and copied to dist/skalex.d.ts as part of the build script.


22. Test Strategy

Runner: Vitest (kept over Bun test because LocalStorageAdapter tests require a jsdom/browser environment, which Bun test does not support as of v1.x).

715 tests across 24 test files.

MemoryAdapter (tests/helpers/MemoryAdapter.js)

In-memory StorageAdapter for CI; no disk I/O, no temp files. Implements read/write/delete/list backed by Map<name, string>, plus stubs for Collection.export and Skalex.import. All integration tests inject a MemoryAdapter instance. No test touches the real file system.
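A minimal sketch of such an adapter, assuming the read/write/delete/list method names from the StorageAdapter interface; the exact signatures of the real test helper may differ.

```javascript
// Sketch of an in-memory StorageAdapter backed by Map<name, string>;
// method signatures are assumed from the adapter interface.
class MemoryAdapter {
  constructor() {
    this.files = new Map(); // name -> serialized string
  }

  async read(name)        { return this.files.get(name) ?? null; }
  async write(name, data) { this.files.set(name, data); }
  async delete(name)      { this.files.delete(name); }
  async list()            { return [...this.files.keys()]; }
}
```

Injecting an instance of this into the database constructor keeps every test run fully in memory.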

Test files

| File | Coverage |
| --- | --- |
| tests/unit/query.test.js | matchesFilter, presortFilter: all operators, $or/$and/$not, deep equality, edge cases |
| tests/unit/indexes.test.js | IndexEngine: add/remove/update/lookup/unique constraint, compound indexes, batch validation, dot-notation rejection |
| tests/unit/validator.test.js | parseSchema, validateDoc (incl. strict mode), inferSchema, stripInvalidFields |
| tests/unit/ttl.test.js | parseTtl, computeExpiry, sweep |
| tests/unit/utils.test.js | generateUniqueId, resolveDotPath, logger |
| tests/unit/vector.test.js | cosineSimilarity, stripVector |
| tests/unit/aggregation.test.js | count, sum, avg, groupBy |
| tests/unit/changelog.test.js | ChangeLog: append, query, restore |
| tests/unit/events.test.js | EventBus: on/emit/off, wildcard "*" channel, listener isolation, error swallowing |
| tests/unit/memory.test.js | Memory: episodic store operations |
| tests/unit/ask.test.js | QueryCache, processLLMFilter, validateLLMFilter |
| tests/unit/mcp.test.js | SkalexMCPServer: tool definitions, protocol handling |
| tests/unit/encryption.test.js | EncryptedAdapter: encrypt/decrypt round-trip |
| tests/unit/plugins.test.js | PluginEngine: hook registration and execution |
| tests/unit/session-stats.test.js | SessionStats: read/write recording |
| tests/unit/migrations.test.js | MigrationEngine: registration, ordering, run, status |
| tests/integration/skalex.test.js | Full CRUD, schema, TTL, migrations, transactions, upsert, seed, dump, inspect, import/export, namespace |
| tests/integration/skalex-core.test.js | Core Skalex class integration |
| tests/integration/collection-features.test.js | autoSave, upsertMany, defaultTtl, defaultEmbed, soft deletes, capped collections, versioning, renameCollection, onSchemaError, strict mode, ttlSweepInterval, db.watch(), write queue |
| tests/integration/engine-overhaul.test.js | Engine overhaul: errors, persistence, transactions, pipeline, registry, query operators |
| tests/integration/correctness-hardening.test.js | Correctness hardening: deep equality, compound indexes, batch uniqueness, non-scalar rejection |
| tests/integration/data-integrity.test.js | Data integrity: crash recovery, dirty tracking, flush sentinel |
| tests/integration/persistence-coherence.test.js | Persistence coherence: write coalescing, save mutex, concurrent saves |
| tests/smoke/node.test.cjs | CJS dist smoke test (Node.js >=18) |
| tests/smoke/bun.test.js | ESM dist smoke test (Bun) |
| tests/smoke/bun-sqlite.test.js | BunSQLiteAdapter smoke test |
| tests/smoke/deno.test.js | ESM dist smoke test (Deno 2.x) |
| tests/smoke/browser.test.js | Headless Chromium runner (Playwright) |

23. Embedding Adapter Interface

File: src/connectors/embedding/base.js

Single-method interface:

    class EmbeddingAdapter {
      async embed(text) // -> number[]
    }

embed() receives a plain string and returns a numeric array. Dimensionality is model-dependent (OpenAI text-embedding-3-small = 1536, Ollama nomic-embed-text = 768). Skalex itself is dimension-agnostic; cosineSimilarity works on any length, but all documents in a collection must use the same model to produce comparable vectors.

Configuration

The adapter is wired via the ai constructor option or via a pre-built embeddingAdapter instance:

    // Via ai config (factory creates the adapter)
    new Skalex({ ai: { provider: "openai", apiKey, model } })

    // Via pre-built instance
    new Skalex({ embeddingAdapter: new MyAdapter() })

Both built-in adapters use native fetch (Node >=18, Bun, Deno, browser; no extra dependency).
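For reference, a custom adapter only has to satisfy the single-method interface. The sketch below is hypothetical; the endpoint URL and the `{ embedding: number[] }` response shape are assumptions, not a real Skalex connector.

```javascript
// Hypothetical custom EmbeddingAdapter using native fetch; the
// endpoint and response shape are assumptions for illustration.
class HttpEmbeddingAdapter {
  constructor(url) {
    this.url = url;
  }

  async embed(text) {
    const res = await fetch(this.url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ input: text }),
    });
    if (!res.ok) throw new Error(`embedding request failed: ${res.status}`);
    const { embedding } = await res.json(); // assumed shape: { embedding: number[] }
    return embedding;
  }
}
```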


24. Vector Storage & Stripping

Vectors are stored inline on documents as _vector: number[]. This means:

  • No separate vector store or side-collection; one document, one file.
  • Vectors serialise to JSON as regular arrays.
  • On load, vectors remain as plain number[]; no reconstruction step needed.
  • _vector is treated as a system field, parallel to _id, createdAt, _expiresAt.

stripVector(doc) - src/engine/vector.js

Returns a shallow copy of the document with _vector removed. Short-circuits when no _vector key is present on the document - returns { ...doc } without destructuring overhead.
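The behaviour above can be sketched as follows (illustrative, matching the described semantics rather than the exact source):

```javascript
// Sketch matching the described semantics: shallow copy without
// _vector, with a fast path when the key is absent.
function stripVector(doc) {
  if (!("_vector" in doc)) return { ...doc }; // short-circuit: plain shallow copy
  const { _vector, ...rest } = doc;           // never mutates the stored document
  return rest;
}
```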

Every code path that returns a document to the caller passes through stripVector:

| Method | Where stripped |
| --- | --- |
| insertOne | Return value |
| insertMany | Return value (mapped) |
| findOne | After projection |
| find | Inside the result loop |
| search | Result mapping |
| similar | Result mapping |

The raw document inside _data always retains _vector for future similarity computations. stripVector never mutates the stored document.


25. Vector Search Engine

Files: src/engine/vector.js, src/engine/collection.js

cosineSimilarity(a, b)

    dot(a, b) / (|a| x |b|)

Computed in a single loop, O(d) where d = vector dimensions. Returns 0 for zero-magnitude vectors to avoid NaN. Throws ERR_SKALEX_QUERY_VECTOR_MISMATCH on dimension mismatch.
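A single-loop sketch of the function as described; the error code string mirrors the one named above, but throwing a plain `Error` here stands in for the typed error class.

```javascript
// Single-loop cosine similarity as described; a plain Error stands in
// for the typed ERR_SKALEX_QUERY_VECTOR_MISMATCH error class.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("ERR_SKALEX_QUERY_VECTOR_MISMATCH");
  }
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom; // 0 for zero-magnitude vectors, avoids NaN
}
```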

collection.search(query, { filter, limit, minScore })

  1. await this.database.embed(query) - produce a query vector via the configured adapter.
  2. Get candidates: filter present -> _findAllRaw(filter) (structured pre-filter, leverages IndexEngine); no filter -> this._data.
  3. For each candidate with a _vector, compute cosineSimilarity(queryVector, doc._vector).
  4. Drop candidates below minScore.
  5. Sort descending by score, slice to limit.
  6. Return { docs: top.map(stripVector), scores: top.map(score) }.

This is hybrid search when filter is provided; the structured filter narrows candidates before the cosine ranking step.
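The ranking tail of the pipeline (steps 3-6) can be sketched as below; `rankBySimilarity` is a hypothetical helper over an already-fetched candidate list, not the actual `collection.search` internals, and the local helpers are compact stand-ins for the engine functions.

```javascript
// Compact local stand-ins for the engine functions described above.
const cosineSimilarity = (a, b) => {
  let dot = 0, ma = 0, mb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; ma += a[i] * a[i]; mb += b[i] * b[i]; }
  const d = Math.sqrt(ma) * Math.sqrt(mb);
  return d === 0 ? 0 : dot / d;
};
const stripVector = ({ _vector, ...rest }) => rest;

// Hypothetical helper illustrating steps 3-6 of the search pipeline.
function rankBySimilarity(candidates, queryVector, { limit = 10, minScore = 0 } = {}) {
  const scored = [];
  for (const doc of candidates) {
    if (!doc._vector) continue;                 // step 3: skip vectorless docs
    const score = cosineSimilarity(queryVector, doc._vector);
    if (score < minScore) continue;             // step 4: threshold
    scored.push({ doc, score });
  }
  scored.sort((x, y) => y.score - x.score);     // step 5: sort desc, slice
  const top = scored.slice(0, limit);
  return {                                      // step 6: strip vectors
    docs: top.map((s) => stripVector(s.doc)),
    scores: top.map((s) => s.score),
  };
}
```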

collection.similar(id, { limit, minScore })

  1. Resolve source document via this._index.get(id).
  2. Early-return { docs: [], scores: [] } if not found or has no _vector.
  3. Iterate this._data, skipping the source document and any doc without _vector.
  4. Compute cosine similarity, apply minScore threshold.
  5. Sort, slice, strip, return.

Complexity

| Operation | Time | Notes |
| --- | --- | --- |
| search (no filter) | O(n x d) | n = collection size, d = dimensions |
| search (with filter) | O(k x d) | k = filtered candidate count |
| similar | O(n x d) | Full scan minus one doc |