Internal reference for contributors. Describes the current alpha.2 design of every module, data structure, and subsystem.
- Directory Layout
- Design Principles
- Module Responsibilities
- Typed Error Hierarchy
- Document Shape
- Return Shapes
- Storage Adapter Interface
- Collection Store Object
- CollectionRegistry
- MutationPipeline
- PersistenceManager
- IndexEngine
- Query Engine
- Schema Validator
- TTL Engine
- TransactionManager
- AI Adapter Factory
- MigrationEngine
- Auto-Connect
- Namespace
- Build Pipeline
- Test Strategy
- Embedding Adapter Interface
- Vector Storage & Stripping
- Vector Search Engine
src/
index.js - Skalex class (database entry point)
index.d.ts - TypeScript declarations (source of truth)
engine/
collection.js - Collection class (per-collection CRUD)
query.js - matchesFilter + presortFilter
indexes.js - IndexEngine (secondary field + compound indexes)
validator.js - parseSchema, validateDoc, inferSchema, stripInvalidFields
ttl.js - computeExpiry, sweep
migrations.js - MigrationEngine
vector.js - cosineSimilarity, stripVector
utils.js - generateUniqueId, logger, resolveDotPath
errors.js - Typed error hierarchy (SkalexError and subclasses)
persistence.js - PersistenceManager (load, save, dirty tracking, write coalescing)
transaction.js - TransactionManager (lazy snapshots, timeout, rollback)
registry.js - CollectionRegistry (store creation, instance caching, inspection)
pipeline.js - MutationPipeline (shared pre/post mutation lifecycle)
adapters.js - AI adapter factory functions (embedding + LLM)
features/
aggregation.js - count, sum, avg, groupBy
changelog.js - ChangeLog class (append-only mutation log)
events.js - EventBus (cross-runtime pub/sub)
memory.js - Memory class (agent episodic store)
ask.js - natural language -> filter via LLM
plugins.js - PluginEngine (pre/post hooks)
session-stats.js - SessionStats (per-session read/write tracking)
query-log.js - SlowQueryLog (threshold-based ring buffer)
connectors/
storage/
base.js - StorageAdapter abstract class
fs.js - FsAdapter (Node.js file system, atomic writes, gz/json)
local.js - LocalStorageAdapter (browser localStorage)
encrypted.js - EncryptedAdapter (AES-256-GCM wrapper)
bun-sqlite.js - BunSQLiteAdapter (bun:sqlite)
d1.js - D1Adapter (Cloudflare D1 / Workers)
libsql.js - LibSQLAdapter (LibSQL / Turso)
embedding/
base.js - EmbeddingAdapter abstract class
openai.js - OpenAIEmbeddingAdapter (text-embedding-3-small default)
ollama.js - OllamaEmbeddingAdapter (nomic-embed-text default)
llm/
base.js - LLMAdapter abstract class
openai.js - OpenAILLMAdapter (gpt-4o-mini default)
anthropic.js - AnthropicLLMAdapter (claude-haiku-4-5 default)
ollama.js - OllamaLLMAdapter (llama3.2 default)
mcp/
index.js - SkalexMCPServer (stdio + HTTP/SSE)
protocol.js - JSON-RPC 2.0 helpers
tools.js - MCP tool definitions
transports/
stdio.js - StdioTransport
http.js - HttpTransport (HTTP + SSE)
dist/ - generated by `npm run build`, not committed to git
skalex.esm.js - ESM build (readable)
skalex.esm.min.js - ESM build (minified)
skalex.cjs - CJS build (readable)
skalex.min.cjs - CJS build (minified)
skalex.browser.js - Browser ESM build (node:* built-ins stubbed)
skalex.d.ts - TypeScript declarations (copied from src/index.d.ts)
tests/
helpers/
MemoryAdapter.js - In-memory StorageAdapter for CI (no I/O)
MockEmbeddingAdapter.js - Deterministic 4-dim vectors for unit tests
MockLLMAdapter.js - Configurable nlQuery -> filter map
MockTransport.js - In-memory MCP transport for unit tests
unit/
aggregation.test.js
ask.test.js
changelog.test.js
encryption.test.js
events.test.js
indexes.test.js
mcp.test.js
memory.test.js
migrations.test.js
plugins.test.js
query.test.js
session-stats.test.js
ttl.test.js
utils.test.js
validator.test.js
vector.test.js
integration/
collection-features.test.js
correctness-hardening.test.js
data-integrity.test.js
engine-overhaul.test.js
persistence-coherence.test.js
skalex-core.test.js
skalex.test.js
smoke/
node.test.cjs - CJS dist smoke test (Node.js >=18)
bun.test.js - ESM dist smoke test (Bun)
bun-sqlite.test.js - BunSQLiteAdapter smoke test
deno.test.js - ESM dist smoke test (Deno 2.x)
browser.test.js - Headless Chromium runner (Playwright)
browser.html - Browser smoke test page
browser-umd.html - UMD browser smoke test page
scripts/
mcp-server.js - Runnable MCP server for Claude Desktop / Cursor
run-deno.js - Cross-platform deno binary resolver
- Zero dependencies in core: the engine modules (`src/engine/`) install nothing. All imports are internal or `node:*` built-ins. Only `devDependencies` in `package.json`.
- Adapter-isolated I/O: no module in `src/` imports `fs` or `localStorage` directly. All I/O is routed through the injected `StorageAdapter`.
- ESM source, dual dist: source files use `import`/`export` (native ESM, `"type": "module"`). Rollup produces ESM, CJS, and browser dist artifacts for consumers.
- In-memory first: all data lives in plain JS arrays and Maps. The storage adapter is called on `connect()`, `disconnect()`, explicit `saveData()`, or when `{ save: true }` is passed to a mutation (or `autoSave` is enabled).
- Auto-connect: the first operation on a `Skalex` instance automatically calls `connect()` if it hasn't been called yet. `connect()` is idempotent via a shared `_connectPromise`.
- Per-collection write queue with durability guarantee: each collection store tracks `isSaving`, `_pendingSave`, and `_dirty`. Concurrent saves to the same collection are serialised: the second writer creates a waiter promise (`_saveWaiters`) and the first writer re-runs `_saveOne` after its own flush completes, so no write is ever dropped.
- Typed errors: every engine throw uses a subclass of `SkalexError` with a stable `code` property (`ERR_SKALEX_<SUBSYSTEM>_<SPECIFIC>`). Consumers can handle errors programmatically without parsing message strings.
- Consistent return shapes: `insertOne`/`updateOne`/`deleteOne`/`upsert` return `doc` or `null`. `insertMany`/`updateMany`/`deleteMany` return `doc[]`. `find` returns `{ docs: [] }` with optional `{ page, totalDocs, totalPages }` when paginated.
| Module | File | Responsibility |
|---|---|---|
| `Skalex` | `src/index.js` | Lifecycle (connect/disconnect), collection facade, migrations, transactions, seeding, namespaces, import/export, debug logging, serialization (BigInt/Date-safe) |
| `Collection` | `src/engine/collection.js` | All CRUD operations, upsert, find with sort/pagination/populate/select, export, search, similar, index maintenance around mutations |
| `CollectionRegistry` | `src/engine/registry.js` | Store creation, lazy instance caching, renames, inspection, schema access, index building |
| `MutationPipeline` | `src/engine/pipeline.js` | Shared mutation lifecycle: ensureConnected, txSnapshot, beforePlugin, mutate, markDirty, save, changelog, sessionStats, event, afterPlugin |
| `PersistenceManager` | `src/engine/persistence.js` | Load orchestration, save/saveDirty/saveAtomic, dirty tracking, write coalescing, flush sentinel, orphan temp-file cleanup, save mutex |
| `TransactionManager` | `src/engine/transaction.js` | Transaction scope, lazy copy-on-first-write snapshots, timeout/abort, stale proxy detection, deferred side effects, rollback |
| `QueryEngine` | `src/engine/query.js` | Filter evaluation (`matchesFilter`), filter key ordering (`presortFilter`), deep equality for plain objects |
| `IndexEngine` | `src/engine/indexes.js` | Secondary field indexes, compound indexes, unique constraint enforcement, batch validation |
| `Validator` | `src/engine/validator.js` | Schema parsing, document validation, schema inference, field stripping |
| `TTL` | `src/engine/ttl.js` | TTL parsing, expiry computation, expired-document sweep |
| `Vector` | `src/engine/vector.js` | `cosineSimilarity(a, b)`, `stripVector(doc)` |
| `Errors` | `src/engine/errors.js` | Typed error hierarchy: `SkalexError`, `ValidationError`, `UniqueConstraintError`, `TransactionError`, `PersistenceError`, `AdapterError`, `QueryError` |
| `Adapters` | `src/engine/adapters.js` | Factory functions `createEmbeddingAdapter()` and `createLLMAdapter()` - pure config-to-instance mappers |
| `Migrations` | `src/engine/migrations.js` | Migration registration, version ordering, pending-migration execution, status reporting |
| `Utils` | `src/engine/utils.js` | `generateUniqueId()` (27-char timestamp+random), `logger()`, `resolveDotPath()` (with prototype-pollution guard) |
| `StorageAdapter` | `src/connectors/storage/base.js` | Abstract class: `read`, `write`, `delete`, `list`, `writeAll` |
| `FsAdapter` | `src/connectors/storage/fs.js` | Node.js adapter: gz-compressed or raw JSON files, atomic temp-then-rename writes |
| `LocalStorageAdapter` | `src/connectors/storage/local.js` | Browser adapter: localStorage with namespaced keys |
| `EmbeddingAdapter` | `src/connectors/embedding/base.js` | Abstract class: `embed(text) -> number[]` |
| `LLMAdapter` | `src/connectors/llm/base.js` | Abstract class: `generate(schema, nlQuery) -> filter` |
File: src/engine/errors.js
All engine errors extend SkalexError, which extends Error and adds code (string) and details (object) properties.
SkalexError
+-- ValidationError - Schema parsing or document validation failure
+-- UniqueConstraintError - Insert or update violates a unique field constraint
+-- TransactionError - Transaction timeout, abort, or rollback failure
+-- PersistenceError - Load, save, serialization, or flush failure
+-- AdapterError - Storage or AI adapter misconfiguration
+-- QueryError - Query filter, operator, or execution failure
Code convention: ERR_SKALEX_<SUBSYSTEM>_<SPECIFIC> (e.g. ERR_SKALEX_VALIDATION_REQUIRED, ERR_SKALEX_TX_TIMEOUT).
Error types are exposed as static properties on the Skalex class (Skalex.ValidationError, etc.) for CJS/UMD consumers, and as named exports for ESM consumers.
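The code-based dispatch described above can be sketched from the consumer's side. The `code`/`details` shape is from this document; the local class definitions are stand-ins so the example is self-contained:

```javascript
// Stand-in for the real SkalexError (src/engine/errors.js): a base Error
// plus a stable `code` string and a `details` object.
class SkalexError extends Error {
  constructor(message, code, details = {}) {
    super(message);
    this.name = this.constructor.name;
    this.code = code;
    this.details = details;
  }
}
class UniqueConstraintError extends SkalexError {}

// Dispatch on the stable code property instead of parsing message strings.
function isUniqueViolation(err) {
  return err instanceof SkalexError && err.code === "ERR_SKALEX_UNIQUE_VIOLATION";
}

const err = new UniqueConstraintError(
  "email already taken",
  "ERR_SKALEX_UNIQUE_VIOLATION",
  { field: "email" }
);
```

Because `code` is part of the public contract, a handler like `isUniqueViolation` keeps working even if the human-readable message changes between releases.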
Every document stored by Skalex has the following reserved fields:
| Field | Type | Set by |
|---|---|---|
| `_id` | string (27 chars: hex timestamp + random bytes) | `insertOne` / `insertMany`. User-supplied `_id` is preserved. |
| `createdAt` | Date | `insertOne` / `insertMany`. Always overwritten - user values are replaced. |
| `updatedAt` | Date | `insertOne` / `insertMany` / `applyUpdate`. Always overwritten - user values are replaced. |
| `_expiresAt` | Date \| undefined | `insertOne` / `insertMany` when `{ ttl }` or `defaultTtl` is set |
| `_version` | number \| undefined | `insertOne` (starts at 1), incremented by `applyUpdate` - only when `versioning: true` |
| `_deletedAt` | Date \| undefined | `deleteOne` / `deleteMany` - only when `softDelete: true` |
| `_vector` | number[] \| undefined | `insertOne` / `insertMany` when `{ embed }` or `defaultEmbed` is set - never returned to callers |

User-supplied fields spread after `_id`, `createdAt`, `updatedAt`. The `_id` field respects caller-supplied values (allowing custom IDs), but `createdAt` and `updatedAt` are always engine-controlled.
All mutation and query methods return plain objects. No raw documents are returned directly.
| Method | Success return | Not-found return |
|---|---|---|
| `insertOne` | `doc` | - |
| `insertMany` | `doc[]` | - |
| `updateOne` | `doc` | `null` |
| `updateMany` | `doc[]` | `[]` |
| `deleteOne` | `doc` | `null` |
| `deleteMany` | `doc[]` | `[]` |
| `findOne` | `doc` (projected copy) | `null` |
| `find` (no limit) | `{ docs: doc[] }` | `{ docs: [] }` |
| `find` (with limit) | `{ docs, page, totalDocs, totalPages }` | same, empty `docs` |
| `upsert` | `doc` | - (always inserts or updates) |
| `upsertMany` | `doc[]` | - (always inserts or updates) |
| `restore` | `doc` | `null` |
File: src/connectors/storage/base.js
All storage backends implement the StorageAdapter abstract class:
```js
class StorageAdapter {
  async read(name)         // -> string | null
  async write(name, data)  // -> void
  async delete(name)       // -> void
  async list()             // -> string[]
  async writeAll(entries)  // -> void (batch write)
}
```

`name` is a plain collection identifier (no path separators). The adapter maps it to whatever its storage scheme requires (file path, localStorage key prefix, etc.).
Batch write method. The base class provides a sequential fallback (for...of write()). Adapters that support atomic batches (SQL-backed) override this with a single transaction. Used by PersistenceManager.saveAtomic() during transaction commit.
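The contract above can be exercised with a minimal in-memory adapter, similar in spirit to `tests/helpers/MemoryAdapter.js`. The `entries` shape (an iterable of `[name, data]` pairs) and the abstract-method style are assumptions for this sketch:

```javascript
// Minimal base class mirroring the documented contract. The sequential
// writeAll fallback matches the described base-class behaviour.
class StorageAdapter {
  async read(name) { throw new Error("not implemented"); }
  async write(name, data) { throw new Error("not implemented"); }
  async delete(name) { throw new Error("not implemented"); }
  async list() { throw new Error("not implemented"); }
  async writeAll(entries) {
    // Sequential fallback: one write() per entry, in order.
    for (const [name, data] of entries) await this.write(name, data);
  }
}

// Map-backed adapter: no disk I/O, suitable for CI-style tests.
class MemoryAdapter extends StorageAdapter {
  constructor() {
    super();
    this._store = new Map(); // name -> serialized string
  }
  async read(name) {
    return this._store.has(name) ? this._store.get(name) : null;
  }
  async write(name, data) { this._store.set(name, data); }
  async delete(name) { this._store.delete(name); }
  async list() { return [...this._store.keys()]; }
}
```

An SQL-backed adapter would override `writeAll` with a single transaction instead of inheriting the sequential loop.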
FsAdapter additionally exposes helpers used by Collection.export and Skalex.import:
```js
join(dir, file)             // path.join equivalent
ensureDir(dir)              // mkdir -p equivalent (sync)
async writeRaw(path, data)  // write raw string to an arbitrary path
async readRaw(path)         // read raw string from an arbitrary path
```

These are not part of the `StorageAdapter` contract; they are `FsAdapter`-specific and guarded by duck-typing at the call site.
Extend StorageAdapter and pass the instance to the constructor:
```js
import Skalex, { StorageAdapter } from "skalex"; // StorageAdapter assumed to be a named export

class MyAdapter extends StorageAdapter {
  async read(name) { ... }
  async write(name, data) { ... }
  async delete(name) { ... }
  async list() { ... }
  async writeAll(entries) { ... } // optional: override for atomic batch
}

const db = new Skalex({ adapter: new MyAdapter() });
```

`Skalex.collections[name]` is a plain object - the single source of truth for a collection's in-memory state:
```js
{
  collectionName: string,
  data: object[],           // ordered array of all documents
  index: Map<_id, doc>,     // O(1) _id lookup
  isSaving: boolean,        // per-collection write lock
  _pendingSave: boolean,    // a second save was requested while isSaving
  _dirty: boolean,          // needs persistence (set by markDirty, cleared after save)
  _saveWaiters: Promise[],  // callers waiting for the current write to finish
  schema: { fields: Map, uniqueFields: string[] } | null,
  rawSchema: object | null, // original schema definition before parsing
  fieldIndex: IndexEngine | null,
  // optional collection-level behaviours (from createCollection opts):
  softDelete: boolean,
  versioning: boolean,
  strict: boolean,
  onSchemaError: "throw" | "warn" | "strip",
  defaultTtl: string | number | null,
  defaultEmbed: string | null,
  maxDocs: number | null,
  changelog: boolean,
}
```

Collection instances hold a reference to this object (`this._store`) and expose it via getters. Multiple Collection instances for the same name share the same store object; mutations are immediately visible across references.
The rawSchema field preserves the original schema definition so it can be round-tripped through persistence. The _dirty flag is set by PersistenceManager.markDirty() after every mutation and cleared after a successful save.
File: src/engine/registry.js
Owns collection definitions, store creation, instance caching, renames, inspection, and metadata access.
- `get(name, db)`: returns a cached `Collection` instance or lazily creates one (and its backing store).
- `create(name, options, db)`: defines a collection with schema, indexes, and behaviour options. Calls `createStore()` then wraps in a `Collection` instance.
- `createStore(name, options)`: allocates the raw store object with all fields initialised (including `_dirty: false`, `rawSchema`, and `_pendingSave: false`). Parses the schema, creates an `IndexEngine` if indexes or unique fields are declared.
- `rename(from, to)`: renames a collection in-memory. Updates the store's `collectionName`, re-keys the stores map, and migrates the cached instance.
- `buildIndex(data, keyField)`: builds a `Map<keyField, doc>` from a data array.
- `schema(name)`: returns the parsed schema or infers one from the first document.
- `inspect(name)`, `stats(name)`, `dump()`: metadata and diagnostic methods.
File: src/engine/pipeline.js
Extracts the shared pre/post mutation lifecycle so each CRUD method only defines its operation-specific logic. Every write operation (insertOne, updateOne, deleteOne, etc.) delegates to pipeline.execute().
```
ensureConnected
  -> txSnapshot         (copy-on-first-write if inside a transaction)
  -> assertTxAlive      (reject stale continuations from aborted transactions)
  -> beforePlugin hook
  -> [mutation]         (caller-defined: the actual data change)
  -> markDirty
  -> save               (if { save: true } or autoSave)
  -> changelog entry
  -> sessionStats.recordWrite
  -> event emission     (via EventBus)
  -> afterPlugin hook
```
The pipeline checks _activeTxId on the Collection instance to determine whether a write is part of the active transaction. Two guards prevent stale mutations:
- `entryTxId`: the transaction ID active when `execute()` was called.
- `_createdInTxId`: the transaction ID active when the Collection instance was created via the proxy.
If either ID appears in TransactionManager._abortedIds, the mutation throws ERR_SKALEX_TX_ABORTED.
Side effects (events, after-hooks, changelog) during a transaction are deferred via txManager.defer() and flushed only after successful commit.
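The defer/flush rule can be modelled in isolation. The names and shapes below (`begin`, `deferredEffects`, the boolean return from `defer`) are hypothetical; only the behaviour - deferred during a transaction, flushed in order on commit, discarded on rollback - is from this document:

```javascript
// Toy model of transactional side-effect deferral.
class TxManagerSketch {
  constructor() { this.active = null; }
  begin() { this.active = { deferredEffects: [] }; }
  // Returns true when the effect was deferred, false when it should run now.
  defer(effect) {
    if (!this.active) return false;
    this.active.deferredEffects.push(effect);
    return true;
  }
  commit() {
    const effects = this.active.deferredEffects;
    this.active = null;
    for (const run of effects) run(); // flushed in order, after commit
  }
  rollback() { this.active = null; }  // deferred effects are discarded
}

const log = [];
const tx = new TxManagerSketch();
const emit = (name) => {
  if (!tx.defer(() => log.push(name))) log.push(name);
};

emit("outside"); // no active tx: runs immediately
tx.begin();
emit("a");
emit("b");       // both deferred
tx.commit();     // flushes "a", then "b"
```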
File: src/engine/persistence.js
Owns all load/save orchestration, dirty tracking, write-queue coalescing, flush sentinel management, and orphan temp-file cleanup.
- Lists all stored collection names via `adapter.list()`.
- Reads and deserialises each collection in parallel.
- Merges persisted config with pre-existing `createCollection` config (in-memory config takes precedence).
- Builds the `_id` index and field indexes from loaded data.
- Detects incomplete flushes via the flush sentinel.
- Cleans orphan temp files left by interrupted atomic writes.
The lenientLoad option allows skipping corrupt collections instead of throwing.
Three save strategies:
| Method | Semantics | Used by |
|---|---|---|
| `save(collections, name?)` | Best-effort: each collection written independently. One failure does not block others. | `db.saveData()`, `{ save: true }` mutations |
| `saveDirty(collections)` | Same as `save` but only writes collections with `_dirty === true`. | Implicit flush paths |
| `saveAtomic(collections, names)` | Batch write via `adapter.writeAll()`. Includes `_meta` with flush sentinel. Serialised via `_saveLock`. | Transaction commit |
markDirty(collections, name) sets _dirty = true on the store. After a successful write, the flag is cleared. This ensures no mutation is silently lost.
Concurrent saves for the same collection are serialised by the per-collection isSaving / _pendingSave mechanism. The second caller sets _pendingSave = true; when the first writer finishes, it re-runs _saveOne with the latest data. Callers can await the _saveWaiters promise to know their data has been flushed.
saveAtomic() acquires the instance-level _saveLock (a promise chain) to serialize concurrent atomic saves. Regular save()/saveDirty() do not acquire this lock - they rely on per-collection coalescing.
Before an atomic batch write, a sentinel is written into the _meta collection. After a successful batch, the sentinel is cleared. On next load, if the sentinel is still present, the engine knows a previous flush was interrupted.
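The sentinel protocol reduces to three operations. This is a toy model - the sentinel key name and the `Map`-backed `_meta` stand-in are hypothetical; the invariant (marker written before the batch, cleared only on success, a surviving marker means an interrupted flush) is from this document:

```javascript
const SENTINEL = "flushInProgress"; // hypothetical key name

async function saveAtomicSketch(meta, writeAll, entries) {
  meta.set(SENTINEL, true); // sentinel written before the batch
  await writeAll(entries);  // the atomic batch write
  meta.delete(SENTINEL);    // only reached when the batch succeeded
}

// Checked on the next load: a surviving sentinel reveals an interrupted flush.
function wasFlushInterrupted(meta) {
  return meta.get(SENTINEL) === true;
}
```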
File: src/engine/indexes.js
Maintains Map-based indexes for declared fields. Three index types:
```
_fieldIndexes:    Map<field, Map<value, Set<doc>>>  // non-unique + unique
_uniqueIndexes:   Map<field, Map<value, doc>>       // unique fields only
_compoundIndexes: Map<key, { fields, map }>         // multi-field compound indexes
```
Declared as arrays in the indexes option: indexes: [["field1", "field2"]]. Values are encoded into a stable tuple key via encodeTuple() (type-tagged to prevent cross-type collisions). Compound index fields must be scalar values (string, number, or boolean) - non-scalar values (objects, arrays) are rejected with ERR_SKALEX_VALIDATION_COMPOUND_INDEX.
Index field names cannot contain dots. The index engine uses direct property access (`doc[field]`), not `resolveDotPath()`, so dot-path fields would produce false negatives. `_validateFieldName()` throws `ERR_SKALEX_VALIDATION_INDEX_DOT_PATH` if a field name contains a dot.
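The type-tagged tuple encoding can be sketched as follows. The exact encoding used by `encodeTuple()` is not specified here; JSON-encoding a `[type, value]` pair per field is one way to get a stable, collision-free key under the same constraints (scalars only, no cross-type collisions):

```javascript
// Hypothetical encodeTuple sketch: each value is tagged with its type so
// "1" (string) and 1 (number) produce different compound keys.
function encodeTuple(values) {
  return JSON.stringify(values.map((v) => {
    const t = typeof v;
    if (t !== "string" && t !== "number" && t !== "boolean") {
      throw new Error("ERR_SKALEX_VALIDATION_COMPOUND_INDEX: non-scalar value");
    }
    return [t, v];
  }));
}
```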
- Single-document: `_checkUnique(doc, excludeDoc)` verifies no other document holds the same value for any unique field. Throws `ERR_SKALEX_UNIQUE_VIOLATION`.
- Batch insert: `assertUniqueBatch(newDocs)` validates an entire batch against the existing index and within the batch itself (intra-batch duplicate detection).
- Batch update: `assertUniqueCandidates(oldDocs, newDocs)` validates staged updates before any live index mutation. Excludes the documents being updated from the conflict set.
update(oldDoc, newDoc) wraps the index mutation in a try/catch. If indexing newDoc fails (e.g. unique violation), the old doc is restored in the index.
- `lookup(field, value)` - returns `object[]` (array materialised for public API).
- `_lookupIterable(field, value)` - returns a read-only iterable wrapping the backing `Set` (avoids allocation in internal scan paths).
- `lookupCompound(fieldValues)` - returns matching docs for a multi-field equality match.

- `add(doc)` - checks unique then indexes.
- `remove(doc)` - removes from all three index types.
- `update(oldDoc, newDoc)` - checks unique, removes old, indexes new (with rollback).
- `buildFromData(data)` - clears and rebuilds all indexes from scratch (used after load and transaction rollback).
File: src/engine/query.js
Evaluation order (short-circuits on first false):
- Function filter: `typeof filter === "function"` - checked before the empty-filter guard because functions are objects with zero enumerable keys.
- Empty filter: `{}` or `null`/`undefined` matches everything.
- Logical operators (evaluated before field-level checks):
  - `$or` - array of sub-filters, at least one must match.
  - `$and` - array of sub-filters, all must match.
  - `$not` - single sub-filter, must not match.
- AND over all remaining keys: every field key must pass.
Per-key logic:
| Filter value type | Evaluation |
|---|---|
| `RegExp` | `filterValue.test(String(itemValue))` |
| Object with `$`-keys | Operator dispatch: `$eq` `$ne` `$gt` `$gte` `$lt` `$lte` `$in` `$nin` `$regex` `$fn` |
| Plain object (no `$`-keys) | Deep structural equality via `deepEqual()` |
| Anything else | Strict equality `itemValue === filterValue` |
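The per-key dispatch can be illustrated with a simplified stand-in (a subset of the operators; deep equality is approximated here with a JSON round-trip, whereas the real engine uses `deepEqual()` in `src/engine/query.js`):

```javascript
// Simplified per-key evaluation: RegExp, $-operator dispatch, plain-object
// deep equality, then strict equality - in that order.
function matchKey(itemValue, filterValue) {
  if (filterValue instanceof RegExp) return filterValue.test(String(itemValue));
  if (filterValue !== null && typeof filterValue === "object") {
    const ops = Object.keys(filterValue).filter((k) => k.startsWith("$"));
    if (ops.length) {
      return ops.every((op) => {
        switch (op) {
          case "$eq":  return itemValue === filterValue.$eq;
          case "$ne":  return itemValue !== filterValue.$ne;
          case "$gt":  return itemValue > filterValue.$gt;
          case "$gte": return itemValue >= filterValue.$gte;
          case "$lt":  return itemValue < filterValue.$lt;
          case "$lte": return itemValue <= filterValue.$lte;
          case "$in":  return filterValue.$in.includes(itemValue);
          case "$nin": return !filterValue.$nin.includes(itemValue);
          default:     return false; // $regex/$fn omitted in this sketch
        }
      });
    }
    // Plain object, no $-keys: deep equality (JSON round-trip approximation).
    return JSON.stringify(itemValue) === JSON.stringify(filterValue);
  }
  return itemValue === filterValue;
}
```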
Structural deep equality for plain values. Handles: primitives, null, undefined, plain objects, arrays, Date, RegExp. Circular references are out of scope (engine data is JSON-serializable).
Keys like "address.city" are resolved by resolveDotPath() from src/engine/utils.js, which splits on . and walks the object. Prototype-pollution keys (__proto__, constructor, prototype) return undefined.
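A minimal sketch of that resolution logic, including the pollution guard (the real implementation lives in `src/engine/utils.js`; this version only reproduces the documented behaviour):

```javascript
// Keys that would let a crafted filter walk into the prototype chain.
const BLOCKED = new Set(["__proto__", "constructor", "prototype"]);

function resolveDotPath(obj, path) {
  let cur = obj;
  for (const key of path.split(".")) {
    if (BLOCKED.has(key)) return undefined; // prototype-pollution guard
    if (cur == null || typeof cur !== "object") return undefined;
    cur = cur[key];
  }
  return cur;
}
```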
Reorders filter keys for optimal short-circuit evaluation:
1. Indexed fields: O(1) lookup, often handled by `_getCandidates` before `matchesFilter` runs
2. Plain equality: fast strict comparison
3. Range operators: `$gt` `$gte` `$lt` `$lte` `$ne` `$in` `$nin`
4. Expensive: `$regex` `$fn` and `RegExp` values
5. Logical operators: `$or` `$and` `$not` (preserved at the end)
Returns a new object with keys in this order. Called in find() before the main scan loop.
File: src/engine/validator.js
Normalises a user-supplied schema definition into an internal form:
```js
// Input
{ name: "string", email: { type: "string", unique: true, required: true } }

// Output
{
  fields: Map<string, { type, required, unique, enum? }>,
  uniqueFields: string[]
}
```

Supported types: `"string"`, `"number"`, `"boolean"`, `"object"`, `"array"`, `"date"`, `"any"`.
Checks a document against the parsed fields Map. Returns an array of error strings (empty = valid). Checks:
- Required: field is `undefined` or `null` when `required: true`
- Type: uses `typeof`, with special handling for `Array` -> `"array"` and `Date` -> `"date"`
- Enum: value must be in the allowed list
- Unknown fields (when `strict = true`): any non-`_`-prefixed key not declared in the schema is flagged
Used by the "strip" onSchemaError strategy. Returns a shallow copy retaining only _-prefixed system fields and declared fields that pass type/enum checks. Never mutates the original document.
Derives a simple { field: type } schema from a sample document. Skips fields starting with _. Used by db.inspect() and db.schema() when no explicit schema was declared.
File: src/engine/ttl.js
| Input | Meaning |
|---|---|
| `300` (number) | 300 seconds |
| `"300ms"` | 300 milliseconds |
| `"30s"` | 30 seconds |
| `"30m"` | 30 minutes |
| `"24h"` | 24 hours |
| `"7d"` | 7 days |
All values are converted to milliseconds internally. Non-positive values throw ERR_SKALEX_VALIDATION_TTL.
Returns new Date(Date.now() + parseTtl(ttl)). Stored as _expiresAt on the document.
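A sketch of the parsing rule behind the table above. The regex-based format check is an assumption; the unit table, the seconds default for bare numbers, the millisecond result, and the non-positive rejection are from this document:

```javascript
// Milliseconds per unit suffix.
const TTL_UNITS = { ms: 1, s: 1000, m: 60_000, h: 3_600_000, d: 86_400_000 };

function parseTtl(ttl) {
  let ms;
  if (typeof ttl === "number") {
    ms = ttl * 1000; // bare numbers are seconds
  } else {
    const m = /^(\d+)(ms|s|m|h|d)$/.exec(String(ttl));
    if (!m) throw new Error("ERR_SKALEX_VALIDATION_TTL: bad format");
    ms = Number(m[1]) * TTL_UNITS[m[2]];
  }
  if (!(ms > 0)) throw new Error("ERR_SKALEX_VALIDATION_TTL: non-positive");
  return ms;
}

// computeExpiry as described: now + parsed TTL, stored as _expiresAt.
function computeExpiry(ttl) {
  return new Date(Date.now() + parseTtl(ttl));
}
```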
Iterates the data array backwards (safe splice), removes any document where _expiresAt <= Date.now(), deletes from the _id Map index, and optionally calls IndexEngine.remove. Returns the count of removed documents.
Called in two places:
- On `connect()` - one-shot sweep of every loaded collection.
- Periodic sweep via `ttlSweepInterval` - when set in the constructor, `connect()` starts a `setInterval` that calls `_sweepTtl()` on every collection. The timer handle calls `.unref()` so it does not keep the Node.js process alive. Cleared in `disconnect()`.
File: src/engine/transaction.js
db.transaction(fn, { timeout }) provides lazy copy-on-first-write rollback with deferred side effects.
- Lock: transactions are serialised via `_txLock` (promise chain). No concurrent transactions.
- Context creation: allocates a `TransactionContext` with a unique `id`, `snapshots` Map, `touchedCollections` Set, `deferredEffects` array, and `aborted` flag. Records `preExisting` collection names.
- Proxy: `fn` receives a `Proxy` around the `db` instance, not the raw `db`. The proxy:
  - Stamps `_activeTxId` on every Collection obtained via `useCollection()`.
  - Blocks direct `db.collections` access (throws `ERR_SKALEX_TX_DIRECT_ACCESS`).
  - Detects stale proxy usage after the transaction ends (throws `ERR_SKALEX_TX_STALE_PROXY`).
- Lazy snapshots: only collections that receive a write are snapshotted, on first mutation (via `_txSnapshotIfNeeded()` in the pipeline). Uses `structuredClone` for a deep copy of the data array. The snapshot includes the `_dirty` flag state.
- Timeout: if `timeout > 0`, a `setTimeout` races against `fn`. On timeout, `ctx.aborted = true` and the transaction is rolled back.
- Commit: persists only touched collections via `PersistenceManager.saveAtomic()`. After successful persistence, flushes all deferred side effects (events, after-hooks, changelog entries).
- Rollback: on error, restores snapshotted collections via `_applySnapshot()` (which rebuilds the `_id` index and field indexes from the cloned data). Collections created inside the transaction are deleted. The `_dirty` flag is restored to its pre-transaction state.
During fn(), calls to _emitEvent(), _runAfterHook(), and _logChange() check txManager.defer(). If a transaction is active, the effect is pushed to ctx.deferredEffects instead of executing immediately. Effects are flushed in order after commit, or discarded on rollback.
When a transaction is aborted (timeout or error), its id is added to _abortedIds. Any subsequent mutation from a Collection stamped with that id (via _activeTxId or _createdInTxId) throws ERR_SKALEX_TX_ABORTED.
File: src/engine/adapters.js
Pure config-to-instance mappers extracted from the Skalex constructor.
- `createEmbeddingAdapter(ai)`: switches on `ai.provider` (`"openai"` | `"ollama"`) and returns the appropriate `EmbeddingAdapter` subclass.
- `createLLMAdapter(ai)`: switches on `ai.provider` (`"openai"` | `"anthropic"` | `"ollama"`) and returns the appropriate `LLMAdapter` subclass.
Both throw AdapterError for unknown providers. The Skalex constructor also accepts pre-built adapter instances via embeddingAdapter and llmAdapter options, bypassing the factory.
File: src/engine/migrations.js
Migrations are registered via db.addMigration({ version, description?, up }) and stored sorted by version. On connect():
1. `_getMeta()` reads applied versions from the `_meta` collection.
2. `MigrationEngine.run()` filters to pending versions and calls `up(collection)` for each in order.
3. `_saveMeta()` writes the updated applied-versions list back to `_meta`.
The _meta collection is a regular Skalex collection with a single document keyed "migrations":
```js
{ _id: "migrations", appliedVersions: [1, 2, 3] }
```

Duplicate version registration throws immediately. Version numbers must be positive integers.
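The pending-version selection in step 2 amounts to a filter-and-sort over the registered list. A self-contained sketch (the function name and migration-object shape are assumptions):

```javascript
// Select migrations not yet applied, ordered by ascending version.
function pendingMigrations(registered, appliedVersions) {
  const applied = new Set(appliedVersions);
  return registered
    .filter((m) => !applied.has(m.version))
    .sort((a, b) => a.version - b.version);
}
```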
_ensureConnected() is called at the top of every public operation (via the MutationPipeline or directly).
```js
async connect() {
  if (this._connectPromise) return this._connectPromise;
  this._connectPromise = this._doConnect();
  return this._connectPromise;
}
```

The `_connectPromise` field makes `connect()` idempotent. Multiple concurrent callers before `connect()` resolves all await the same promise - no double-connect race. After `connect()` resolves, `isConnected = true` and `_ensureConnected()` short-circuits immediately.
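The no-double-connect property can be demonstrated with a self-contained harness (the `ConnectDemo` class and its counter are illustrative, not part of Skalex):

```javascript
// Minimal reproduction of the shared-promise pattern: three concurrent
// callers, one underlying _doConnect() invocation.
class ConnectDemo {
  constructor() {
    this._connectPromise = null;
    this.connectCount = 0;
  }
  async _doConnect() {
    this.connectCount++; // counts real connection attempts
  }
  async connect() {
    if (this._connectPromise) return this._connectPromise;
    this._connectPromise = this._doConnect();
    return this._connectPromise;
  }
}
```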
db.namespace(id) returns a new Skalex instance with path set to {parent.dataDirectory}/{safeId}. The id is sanitised to allow only alphanumeric, dash, and underscore characters.
Config inherited from the parent: format, debug, ai, encrypt, slowQueryLog, queryCache, plugins, memory, logger, autoSave, ttlSweepInterval, regexMaxLength, idGenerator, serializer, deserializer, and pre-built adapter instances (when no ai config is present).
The namespaced instance is fully independent: separate collections map, separate adapter instance pointing at the subdirectory. Requires the default FsAdapter - throws ERR_SKALEX_ADAPTER_NAMESPACE_REQUIRES_FS if a custom adapter was configured.
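The id sanitisation rule reduces to a one-line filter. The function name is hypothetical; the allowed character set (alphanumeric, dash, underscore) is from this document:

```javascript
// Strip every character outside [A-Za-z0-9_-], preventing path traversal
// when the id is appended to the parent's dataDirectory.
function sanitizeNamespaceId(id) {
  return String(id).replace(/[^A-Za-z0-9_-]/g, "");
}
```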
Tool: Rollup with @rollup/plugin-node-resolve, @rollup/plugin-commonjs, @rollup/plugin-terser.
Config file: rollup.config.js (native ESM via "type":"module" in package.json).
Five outputs:
| File | Format | Minified |
|---|---|---|
| `dist/skalex.esm.js` | ESM | No |
| `dist/skalex.esm.min.js` | ESM | Yes |
| `dist/skalex.cjs` | CJS | No |
| `dist/skalex.min.cjs` | CJS | Yes |
| `dist/skalex.browser.js` | ESM (browser) | No |
All outputs include source maps. Node built-ins (node:fs, node:path, node:zlib, node:crypto, node:os) are marked external. The browser build stubs them with empty objects via the nodeBrowserStubs() Rollup plugin.
TypeScript declarations are hand-written in src/index.d.ts and copied to dist/skalex.d.ts as part of the build script.
Runner: Vitest (kept over Bun test because LocalStorageAdapter tests require a jsdom/browser environment, which Bun test does not support as of v1.x).
715 tests across 24 test files.
In-memory StorageAdapter for CI; no disk I/O, no temp files. Implements read/write/delete/list backed by Map<name, string>, plus stubs for Collection.export and Skalex.import. All integration tests inject a MemoryAdapter instance. No test touches the real file system.
| File | Coverage |
|---|---|
| `tests/unit/query.test.js` | `matchesFilter`, `presortFilter`: all operators, `$or`/`$and`/`$not`, deep equality, edge cases |
| `tests/unit/indexes.test.js` | `IndexEngine`: add/remove/update/lookup/unique constraint, compound indexes, batch validation, dot-notation rejection |
| `tests/unit/validator.test.js` | `parseSchema`, `validateDoc` (incl. strict mode), `inferSchema`, `stripInvalidFields` |
| `tests/unit/ttl.test.js` | `parseTtl`, `computeExpiry`, `sweep` |
| `tests/unit/utils.test.js` | `generateUniqueId`, `resolveDotPath`, `logger` |
| `tests/unit/vector.test.js` | `cosineSimilarity`, `stripVector` |
| `tests/unit/aggregation.test.js` | `count`, `sum`, `avg`, `groupBy` |
| `tests/unit/changelog.test.js` | `ChangeLog`: append, query, restore |
| `tests/unit/events.test.js` | `EventBus`: on/emit/off, wildcard `"*"` channel, listener isolation, error swallowing |
| `tests/unit/memory.test.js` | `Memory`: episodic store operations |
| `tests/unit/ask.test.js` | `QueryCache`, `processLLMFilter`, `validateLLMFilter` |
| `tests/unit/mcp.test.js` | `SkalexMCPServer`: tool definitions, protocol handling |
| `tests/unit/encryption.test.js` | `EncryptedAdapter`: encrypt/decrypt round-trip |
| `tests/unit/plugins.test.js` | `PluginEngine`: hook registration and execution |
| `tests/unit/session-stats.test.js` | `SessionStats`: read/write recording |
| `tests/unit/migrations.test.js` | `MigrationEngine`: registration, ordering, run, status |
| `tests/integration/skalex.test.js` | Full CRUD, schema, TTL, migrations, transactions, upsert, seed, dump, inspect, import/export, namespace |
| `tests/integration/skalex-core.test.js` | Core `Skalex` class integration |
| `tests/integration/collection-features.test.js` | autoSave, upsertMany, defaultTtl, defaultEmbed, soft deletes, capped collections, versioning, renameCollection, onSchemaError, strict mode, ttlSweepInterval, `db.watch()`, write queue |
| `tests/integration/engine-overhaul.test.js` | Engine overhaul: errors, persistence, transactions, pipeline, registry, query operators |
| `tests/integration/correctness-hardening.test.js` | Correctness hardening: deep equality, compound indexes, batch uniqueness, non-scalar rejection |
| `tests/integration/data-integrity.test.js` | Data integrity: crash recovery, dirty tracking, flush sentinel |
| `tests/integration/persistence-coherence.test.js` | Persistence coherence: write coalescing, save mutex, concurrent saves |
| `tests/smoke/node.test.cjs` | CJS dist smoke test (Node.js >=18) |
| `tests/smoke/bun.test.js` | ESM dist smoke test (Bun) |
| `tests/smoke/bun-sqlite.test.js` | `BunSQLiteAdapter` smoke test |
| `tests/smoke/deno.test.js` | ESM dist smoke test (Deno 2.x) |
| `tests/smoke/browser.test.js` | Headless Chromium runner (Playwright) |
File: src/connectors/embedding/base.js
Single-method interface:
```js
class EmbeddingAdapter {
  async embed(text) // -> number[]
}
```

`embed()` receives a plain string and returns a numeric array. Dimensionality is model-dependent (OpenAI `text-embedding-3-small` = 1536, Ollama `nomic-embed-text` = 768). Skalex itself is dimension-agnostic; `cosineSimilarity` works on any length, but all documents in a collection must use the same model to produce comparable vectors.
The adapter is wired via the ai constructor option or via a pre-built embeddingAdapter instance:
```js
// Via ai config (factory creates the adapter)
new Skalex({ ai: { provider: "openai", apiKey, model } })

// Via pre-built instance
new Skalex({ embeddingAdapter: new MyAdapter() })
```

Both built-in adapters use native `fetch` (Node >=18, Bun, Deno, browser; no extra dependency).
Vectors are stored inline on documents as _vector: number[]. This means:
- No separate vector store or side-collection; one document, one file.
- Vectors serialise to JSON as regular arrays.
- On load, vectors remain as plain `number[]`; no reconstruction step needed.
- `_vector` is treated as a system field, parallel to `_id`, `createdAt`, `_expiresAt`.
Returns a shallow copy of the document with _vector removed. Short-circuits when no _vector key is present on the document - returns { ...doc } without destructuring overhead.
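A sketch of that behaviour (the real implementation is in `src/engine/vector.js`; this version only reproduces the documented semantics):

```javascript
// Shallow copy without _vector; the stored document is never mutated.
function stripVector(doc) {
  if (!("_vector" in doc)) return { ...doc }; // short-circuit: nothing to strip
  const { _vector, ...rest } = doc;
  return rest;
}
```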
Every code path that returns a document to the caller passes through stripVector:
| Method | Where stripped |
|---|---|
| `insertOne` | Return value |
| `insertMany` | Return value (mapped) |
| `findOne` | After projection |
| `find` | Inside the result loop |
| `search` | Result mapping |
| `similar` | Result mapping |
The raw document inside _data always retains _vector for future similarity computations. stripVector never mutates the stored document.
Files: src/engine/vector.js, src/engine/collection.js
`dot(a, b) / (|a| * |b|)`
Computed in a single loop, O(d) where d = vector dimensions. Returns 0 for zero-magnitude vectors to avoid NaN. Throws ERR_SKALEX_QUERY_VECTOR_MISMATCH on dimension mismatch.
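A direct sketch of that computation (single O(d) loop, zero-magnitude guard, dimension-mismatch error, all as described above):

```javascript
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("ERR_SKALEX_QUERY_VECTOR_MISMATCH");
  }
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom; // 0 for zero-magnitude vectors, never NaN
}
```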
1. `await this.database.embed(query)` - produce a query vector via the configured adapter.
2. Get candidates: `filter` present -> `_findAllRaw(filter)` (structured pre-filter, leverages `IndexEngine`); no filter -> `this._data`.
3. For each candidate with a `_vector`, compute `cosineSimilarity(queryVector, doc._vector)`.
4. Drop candidates below `minScore`.
5. Sort descending by score, slice to `limit`.
6. Return `{ docs: top.map(stripVector), scores: top.map(score) }`.
This is hybrid search when filter is provided; the structured filter narrows candidates before the cosine ranking step.
1. Resolve the source document via `this._index.get(id)`.
2. Early-return `{ docs: [], scores: [] }` if not found or it has no `_vector`.
3. Iterate `this._data`, skipping the source document and any doc without `_vector`.
4. Compute cosine similarity, apply the `minScore` threshold.
5. Sort, slice, strip, return.
| Operation | Time | Notes |
|---|---|---|
| `search` (no filter) | O(n x d) | n = collection size, d = dimensions |
| `search` (with filter) | O(k x d) | k = filtered candidate count |
| `similar` | O(n x d) | Full scan minus one doc |