mazdek

Vector Databases 2026: pgvector, Qdrant, Weaviate, Milvus and Pinecone in a Swiss Comparison

Behind every productive RAG, memory or recommender pipeline in 2026 sits a vector database. It is the fundamental storage primitive of the AI era — comparable to what relational databases were for Web 1.0. But while the OLTP world had three decades to consolidate around Postgres, MySQL and Oracle, the vector DB market is exploding: pgvector, Qdrant, Weaviate, Milvus, Pinecone — plus a dozen challengers like Chroma, LanceDB, Vespa, Marqo, Vald, FAISS, ScaNN, Turbopuffer and Postgres-native rivals such as pgvecto.rs. Which one fits your use case? Which survives a FINMA-compliant architecture review? Which scales to 200 million embeddings? At mazdek we have completed 18 productive Swiss vector DB deployments in 14 months — from 80,000 embeddings up to 230 million, from fiduciaries to a Geneva private bank. This guide distills the lessons. Our PROMETHEUS agent analyses the architecture, ORACLE orchestrates the data flow, HERACLES connects the embedding pipelines, ARES secures compliance, ARGUS delivers 24/7 observability — all revDSG, EU AI Act and FINMA compliant.

Why vector databases are becoming mandatory in 2026

A vector database stores embeddings — high-dimensional numerical representations of texts, images, audio or structured data — and answers similarity queries in milliseconds instead of seconds. Three drivers turned this into a standard component in 2026:

  • RAG everywhere: 87% of Swiss enterprise AI projects now use Retrieval-Augmented Generation instead of prompting LLMs raw. See our RAG guide.
  • Multi-agent memory: every productive multi-agent stack needs episodic memory via pgvector or Qdrant. Mem0 and Letta are standard building blocks in 2026.
  • Semantic search & recommenders: full-text search is no longer enough. Hybrid search (BM25 + vector) is becoming the default for internal knowledge bases, e-commerce personalization and compliance reviews.
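
The similarity query at the heart of all three drivers is conceptually simple. A brute-force sketch in a few lines of Python shows what index structures like HNSW then approximate at scale (toy 8-dimensional vectors stand in for real 768-3072-dim embeddings; all values here are illustrative):

```python
import numpy as np

# Toy corpus: 4 "documents" as 8-dim embeddings; real embeddings have 768-3072 dims.
docs = np.array([
    [0.9, 0.1, 0.0, 0.3, 0.0, 0.2, 0.1, 0.0],
    [0.1, 0.8, 0.2, 0.0, 0.4, 0.0, 0.0, 0.1],
    [0.85, 0.15, 0.05, 0.25, 0.0, 0.1, 0.2, 0.0],
    [0.0, 0.1, 0.9, 0.6, 0.1, 0.0, 0.0, 0.3],
])
query = np.array([0.88, 0.12, 0.02, 0.28, 0.0, 0.15, 0.15, 0.0])

def cosine_top_k(query: np.ndarray, docs: np.ndarray, k: int) -> list[int]:
    """Exact nearest neighbours by cosine similarity (what HNSW approximates)."""
    sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k].tolist()

print(cosine_top_k(query, docs, k=2))  # the two near-duplicates of the query rank first
```

A vector DB exists because this exact scan costs O(corpus size) per query; HNSW answers the same question in roughly logarithmic time at the cost of a small recall loss.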

«A vector DB in 2026 is what Postgres was in 2010: a self-evident piece of infrastructure. The question is no longer whether, but which — and which for which workload class. Anyone who picks the wrong one pays up to 9x higher infra costs or loses FINMA accreditation due to US data routing.»

— PROMETHEUS, AI & Machine Learning Agent at mazdek

The vector DB landscape 2026

Five dominant options with clearly distinct philosophies — plus three rising outsiders:

Engine | Vendor | License | Architecture | Index | Swiss fit
pgvector | PostgreSQL Community | PostgreSQL (OSS) | Postgres extension | HNSW · IVFFlat | Excellent
Qdrant | Qdrant Solutions GmbH (Berlin) | Apache 2.0 | Standalone engine (Rust) | HNSW (custom) | Excellent
Weaviate | Weaviate B.V. (Amsterdam) | BSD-3-Clause | GraphQL vector + hybrid | HNSW + BM25 | Good (NL/EU)
Milvus | Zilliz (LF AI & Data) | Apache 2.0 | Distributed, K8s-native | HNSW · IVF · DiskANN · GPU | Medium (US/CN)
Pinecone | Pinecone Systems Inc. (US) | Proprietary SaaS | Serverless cloud (closed) | Proprietary | Limited
pgvecto.rs | TensorChord | Apache 2.0 | Postgres extension (Rust) | HNSW · Flat · Quantized | Excellent
LanceDB | Lance / LF AI | Apache 2.0 | Embedded (Rust) | IVF-PQ · HNSW | Excellent
Vespa | Yahoo / Vespa.ai | Apache 2.0 | Distributed search engine | HNSW + Tensor + BM25 | Good

In Swiss productive deployments we see five clear archetypes in 2026 — depending on scale and data sovereignty requirements:

  • pgvector: the pragmatic default. Sufficient for 80% of our mid-market mandates up to 20 million embeddings — no additional system, ACID, Swiss hosting trivial, same backup workflow as the rest of the app.
  • Qdrant: the performance champion. Rust kernel, EU cloud (DE/CH), up to 500 million vectors at p50 below 10 ms. Apache 2.0 — zero vendor lock-in.
  • Weaviate: when hybrid search (BM25 + vector) and GraphQL API are required. Strong for multi-tenant SaaS and semantic knowledge graphs.
  • Milvus: when 100M+ vectors or GPU acceleration are needed. K8s complexity — only for enterprises with a platform team.
  • Pinecone: time-to-market champion. But: closed source, US-only, data leaves Switzerland — unacceptable for FINMA, revDSG and Swiss data protection.

Architecture comparison: how the five engines work

The decisive difference lies in the storage topology: where index, data and query engine live, and how each component scales.

+-----------------------------+   +-----------------------------+
|       pgvector              |   |          Qdrant             |
|   (Postgres extension)      |   |   (Standalone, Rust)        |
|                             |   |                             |
|   +---------------------+   |   |   +---------------------+   |
|   | Postgres tablespace |   |   |   | Qdrant storage      |   |
|   |  - Vector column    |   |   |   |  - Segment files    |   |
|   |  - HNSW index       |   |   |   |  - Custom HNSW      |   |
|   |  - WAL · MVCC       |   |   |   |  - Payload (JSON)   |   |
|   +---------------------+   |   |   +---------------------+   |
|         | SQL               |   |         | gRPC + REST      |
|   +---------------------+   |   |   +---------------------+   |
|   | App / Backend       |   |   |   | App / Embedder      |   |
|   +---------------------+   |   |   +---------------------+   |
|                             |   |                             |
|   ACID · same DB as app     |   |   p50 8ms · 500M vectors    |
+-----------------------------+   +-----------------------------+

+-----------------------------+   +-----------------------------+
|        Weaviate             |   |          Milvus             |
|  (GraphQL + Hybrid)         |   |   (Distributed K8s)         |
|                             |   |                             |
|   +---------------------+   |   |   Coordinator   QueryNode   |
|   | LSM-Tree storage    |   |   |       |             |       |
|   | - HNSW + BM25       |   |   |   DataNode      IndexNode   |
|   | - Object + Vector   |   |   |       |             |       |
|   +---------------------+   |   |   +---v-------------v---+   |
|         | GraphQL           |   |   | MinIO / Pulsar / KV |   |
|   +---------------------+   |   |   +---------------------+   |
|   | Multi-Tenant SaaS   |   |   |                             |
|   +---------------------+   |   |  GPU · DiskANN · 1B+ scale  |
+-----------------------------+   +-----------------------------+

+----------------------------------------+
|              Pinecone (US-SaaS)        |
|                                        |
|   Customer App (Anywhere)              |
|         |                              |
|         v  HTTPS                       |
|   +-----------------------------+      |
|   | Pinecone Edge (Cloud Region)|      |
|   | - Proprietary index         |      |
|   | - Multi-tenant pods         |      |
|   | - Vector + metadata         |      |
|   +-----------------------------+      |
|                                        |
|   Closed-Source · US-routing           |
+----------------------------------------+

Almost everything else follows from this topology — latency profile, cost profile, compliance fit:

  • pgvector (in-Postgres): vector columns live next to your master tables. Joins between vector search and SQL filters are native — at mazdek the default, because 95% of RAG queries already need SQL filters (tenant, date, ACL). Achilles heel: HNSW index builds are slow and memory-hungry even with the parallel builds introduced in pgvector 0.6; above 30M vectors it gets tight.
  • Qdrant (standalone Rust): separate system with gRPC API. Latency king thanks to Rust + handwritten HNSW. EU cloud (Frankfurt) and Swiss hosting trivial. Apache 2.0 without open-core tricks.
  • Weaviate (GraphQL): hybrid search is first-class — not a bolt-on. GraphQL schema with types simplifies the multi-tenant case.
  • Milvus (distributed): coordinator + query nodes + data nodes + index nodes on K8s. Pulsar backplane for durable logs. Brutally scalable, but a 6-month learning curve.
  • Pinecone (closed SaaS): the only option without self-host. Sub-second setup, but data leaves Switzerland and the EU jurisdictionally.

Reference architecture: the Swiss-Sovereign RAG stack

Whichever engine you choose, every productive mazdek deployment follows a 7-layer architecture. It is explicitly DB-agnostic so an engine swap is possible without re-architecting (in 3 of our mandates we migrated from Pinecone to Qdrant):

+------------------------------------------------------------+
|  1. Source layer: SAP · Bexio · Confluence · S3 · Files    |
+-----------------------------+------------------------------+
                              | CDC / ETL / Webhook
                              v
+-----------------------------+------------------------------+
|  2. Ingest: ORACLE — chunking, cleaning, metadata          |
|     - Markdown · PDF · DOCX · HTML · structured data       |
|     - Section-aware splitting (256-1024 token windows)     |
+-----------------------------+------------------------------+
                              | Chunks
                              v
+-----------------------------+------------------------------+
|  3. Embedding layer: PROMETHEUS                            |
|     - Voyage-3 / Cohere v4 / BGE-M3 · 768-3072 dim         |
|     - Batched, retry-safe, cached                          |
+-----------------------------+------------------------------+
                              | Vectors + payload
                              v
+-----------------------------+------------------------------+
|  4. Vector DB: pgvector · Qdrant · Weaviate · Milvus       |
|     - HNSW (m=16, ef=128) · Cosine / Dot / L2              |
|     - Hybrid: BM25 + Vector + Reranker                     |
+-----------------------------+------------------------------+
                              | top-k neighbours
                              v
+-----------------------------+------------------------------+
|  5. Reranker + Filter: HERACLES                            |
|     - Cohere Rerank 3 · Cross-Encoder                      |
|     - ACL filter · Tenant filter · Date filter             |
+-----------------------------+------------------------------+
                              | Context
                              v
+-----------------------------+------------------------------+
|  6. Generator: PROMETHEUS — Claude 4.7 / DeepSeek-R2       |
|     - Prompt template + Citation                           |
|     - Guardrails (PII / Injection) — ARES                  |
+-----------------------------+------------------------------+
                              | Answer + Sources
                              v
+-----------------------------+------------------------------+
|  7. Observability + Audit: ARGUS                           |
|     - Langfuse + OpenTelemetry · Eval regression           |
|     - WORM archive 10y · Trace replay                      |
+------------------------------------------------------------+

Three layers deserve special attention:

  • Embedding layer: in 2026 the choice of embedding model often determines more than the choice of DB. Voyage-3 and Cohere v4 lead Swiss benchmarks; BGE-M3 is the best open-source option for self-hosting.
  • Reranker: a good reranker (Cohere Rerank 3, BGE-Reranker-v2) lifts hit quality by 12-25 percentage points. In 17 of our 18 mandates a mandatory component.
  • Audit layer: every RAG query must be logged under EU AI Act Art. 12. A WORM archive with 10-year retention is standard. Langfuse + OpenTelemetry covers this.
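
The section-aware splitting in layer 2 reduces to a sliding window with overlap once sections are isolated. A simplified sketch (words stand in for the 256-1024 token windows a real pipeline counts; window sizes are illustrative):

```python
def chunk_section(words: list[str], window: int = 512, overlap: int = 64) -> list[list[str]]:
    """Sliding-window split with overlap, applied one section at a time."""
    if len(words) <= window:
        return [words]
    chunks, step = [], window - overlap
    for start in range(0, len(words), step):
        chunks.append(words[start:start + window])
        if start + window >= len(words):  # last window reached the end of the section
            break
    return chunks

chunks = chunk_section(["w"] * 1000, window=512, overlap=64)
print([len(c) for c in chunks])  # → [512, 512, 104]
```

The overlap keeps sentences that straddle a window boundary retrievable from both neighbouring chunks.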

Benchmark 2026: latency, recall, memory under a real Swiss workload

We tested five engines with an identical workload: 12 million embeddings (768 dim, Voyage-3), 80% German texts, 20% English/French, c5.2xlarge hardware (8 vCPU, 16 GB), Cosine distance, top-k=20, ef_search=64. All values were measured over 100,000 queries:

Engine | p50 latency | p95 latency | Recall@20 | RAM | QPS | CHF/month (hosting)
pgvector 0.7 (HNSW) | 14 ms | 38 ms | 0.962 | 11.8 GB | 410 | CHF 380 (Hetzner CH)
Qdrant 1.10 | 8 ms | 22 ms | 0.971 | 9.4 GB | 820 | CHF 360
Weaviate 1.27 | 11 ms | 29 ms | 0.968 | 10.6 GB | 610 | CHF 420
Milvus 2.4 (HNSW) | 13 ms | 33 ms | 0.969 | 9.8 GB | 740 | CHF 690 (K8s 3-node)
Milvus 2.4 (DiskANN) | 22 ms | 61 ms | 0.964 | 3.1 GB | 520 | CHF 580
Pinecone (s1.x1) | 28 ms | 94 ms | 0.965 | n/a (managed) | n/a | CHF 920 (US region)

Four lessons from the data:

  1. Qdrant is the latency champion, with roughly 20% less RAM and 2x the QPS of pgvector — the Rust kernel makes the difference.
  2. pgvector is close enough: a 14 ms p50 is sufficient for 95% of all RAG use cases — and operational simplicity (same backup, ACID, SQL joins) almost always wins.
  3. Pinecone is 2-3x slower from Switzerland due to US routing, and more expensive. The trade-off it buys: no self-hosting, no patching.
  4. Milvus DiskANN reduces RAM by 70% — relevant from 100M+ vectors where RAM cost dominates.

Decision matrix: which engine for which workload?

Workload profile | Recommendation | Why
Mid-market RAG < 20M vectors | pgvector | No new system, ACID, SQL joins, Swiss hosting trivial
Latency SLA < 10 ms | Qdrant | Rust kernel, p50 8 ms, EU/CH cloud
20M-100M vectors | Qdrant or Weaviate | Both scale without K8s drama
Hybrid search (BM25 + vector) native | Weaviate | First-class hybrid, GraphQL API
100M+ vectors / GPU acceleration | Milvus | Distributed K8s, DiskANN, GPU index
Postgres-only stack, embedded app | pgvector / pgvecto.rs | One DB for everything, Rust kernel optional
FINMA / revDSG compliance | pgvector / Qdrant | Self-host, audit trail, EU/CH hosting
Time-to-market in 2 days | Pinecone (eyes open) | Only if US data routing is acceptable
Edge / embedded AI / mobile | LanceDB | File-based, no server, embedded

Our PROMETHEUS default for Swiss mid-market enterprises: pgvector as standard, Qdrant from 20M vectors or for latency SLAs, Milvus only from 100M or with GPU requirements, and Pinecone never where Swiss data sovereignty applies. This matrix covers 16 of our 18 productive mandates.

Code comparison: the same RAG use case across four engines

Task: index 100,000 German contract clauses with Cohere v4 embeddings and find top-5 similar clauses for a query — with tenant filter (revDSG requirement).

pgvector (SQL)

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE clauses (
  id BIGSERIAL PRIMARY KEY,
  tenant_id UUID NOT NULL,
  text TEXT NOT NULL,
  embedding VECTOR(1024) NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX clauses_hnsw_idx
  ON clauses USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

CREATE INDEX clauses_tenant_idx ON clauses(tenant_id);

-- Query
SELECT id, text, 1 - (embedding <=> $1) AS similarity
FROM clauses
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 5;

Characteristic: no new system. Tenant filter is a normal SQL WHERE, JOINs with master data trivial. Backup, replication, MVCC, ACID — all as usual.

Qdrant (Python)

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url='https://qdrant.swiss-cloud.example')

client.create_collection(
    collection_name='clauses',
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

client.upsert(
    collection_name='clauses',
    points=[PointStruct(id=i, vector=v, payload={'tenant_id': t, 'text': txt})
            for i, v, t, txt in batch],
)

hits = client.query_points(
    collection_name='clauses',
    query=query_vec,
    query_filter=Filter(must=[FieldCondition(
        key='tenant_id', match=MatchValue(value=tenant_id))]),
    limit=5,
).points

Characteristic: filters are first-class. Performance stays excellent with filters — Qdrant has a filtered-HNSW algorithm that does not post-filter (a known pgvector problem with selective filters).

Weaviate (GraphQL)

{
  Get {
    Clause(
      hybrid: { query: $rawQuery, vector: $queryVec, alpha: 0.6 }
      where: { path: ["tenant_id"], operator: Equal, valueText: $tenantId }
      limit: 5
    ) { text _additional { score explainScore } }
  }
}

Characteristic: hybrid search is native. The alpha parameter blends BM25 and vector score — no extra service needed. GraphQL is friendly to frontend teams.
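
What alpha does can be sketched by hand: min-max normalize both score lists, then blend. This is a simplified model of Weaviate's relative-score fusion, not its exact implementation:

```python
def hybrid_scores(vector_scores: list[float], bm25_scores: list[float],
                  alpha: float = 0.6) -> list[float]:
    """alpha=1.0 -> pure vector search, alpha=0.0 -> pure BM25."""
    def norm(xs: list[float]) -> list[float]:
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 1.0 for x in xs]
    v, b = norm(vector_scores), norm(bm25_scores)
    return [alpha * vi + (1 - alpha) * bi for vi, bi in zip(v, b)]

# Doc 0 wins on vector similarity, doc 2 on keywords; alpha=0.6 favours the vector side.
print(hybrid_scores([0.95, 0.80, 0.60], [1.2, 3.1, 7.4], alpha=0.6))
```

Normalizing first matters: raw BM25 scores are unbounded, so blending without normalization would let keyword scores dominate regardless of alpha.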

Milvus (Python)

from pymilvus import (
    connections, FieldSchema, CollectionSchema, DataType, Collection,
)

connections.connect('default', host='milvus-cluster.zurich', port='19530')

schema = CollectionSchema([
    FieldSchema('id', DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema('tenant_id', DataType.VARCHAR, max_length=64),
    FieldSchema('text', DataType.VARCHAR, max_length=8192),
    FieldSchema('embedding', DataType.FLOAT_VECTOR, dim=1024),
])

c = Collection('clauses', schema)
c.create_index('embedding', {
    'index_type': 'HNSW',
    'metric_type': 'COSINE',
    'params': {'M': 16, 'efConstruction': 64},
})

c.insert([tenant_ids, texts, embeddings])  # no id column: auto_id=True generates it
c.load()

hits = c.search(
    data=[query_vec], anns_field='embedding',
    param={'metric_type': 'COSINE', 'params': {'ef': 64}},
    limit=5, expr=f'tenant_id == "{tenant_id}"',
)

Characteristic: K8s native, distributed. Scales horizontally — coordinator, query nodes and data nodes scale independently. Complex to operate; only worth it from 100M vectors or with GPU index.

Cost comparison: what vector DBs really cost in Switzerland

From 18 productive mandates we extracted the monthly infrastructure cost at three scaling tiers, averaged over 24 months of operation. Hosting in Switzerland (Hetzner CH or Infomaniak) where possible, otherwise EU (Frankfurt):

Scale | pgvector | Qdrant | Weaviate | Milvus | Pinecone
5M vectors / 50 QPS | CHF 180 | CHF 220 | CHF 270 | CHF 580 | CHF 620
30M vectors / 200 QPS | CHF 460 | CHF 380 | CHF 510 | CHF 720 | CHF 1'420
150M vectors / 800 QPS | not recommended | CHF 1'180 | CHF 1'420 | CHF 1'690 | CHF 4'880

Three lessons:

  1. pgvector wins below 20M vectors — the «no extra system» line item is usually 60% of the value.
  2. Qdrant wins from 20M to 200M vectors — latency, RAM and license cost together.
  3. Pinecone is 2-3x more expensive than any self-hosted option and gives up data sovereignty.

Case study: Geneva private bank productive on Qdrant in 11 weeks

A Geneva private bank (CHF 18 bn AuM, 240 employees) wanted to make 2.4 million compliance documents — FINMA circulars, internal policies, Swiss law, EU regulation — semantically searchable, with a hard SLA: p95 below 60 ms, 100% Swiss data sovereignty, FINMA-auditable trail.

Starting point

  • 2.4 million documents, each 800-12,000 tokens (~38 million chunks)
  • 120 concurrent compliance officers, ca. 200,000 queries/month
  • Requirement: no data in US cloud, FINMA audit trail, 10-year WORM
  • Before: hours of manual research, 38% reviewer consistency

mazdek solution

We built a Qdrant cluster (primary at Hetzner Helsinki, disaster recovery at Infomaniak Geneva), embeddings via Voyage-3 (1024 dim), reranking via BGE-Reranker-v2.5, and the RAG generator on Claude 4.7 with citation-first prompting:

  • Ingest (ORACLE): ETL from SharePoint and Confluence, section-aware chunking (512 tokens, 64 overlap), metadata (doc type, date, language, ACL).
  • Embedding (PROMETHEUS): Voyage-3 batched, cache via Redis, Cohere v4 as fallback for audit diversity.
  • Vector DB (Qdrant): 3-node cluster with replication, HNSW (m=24, ef=200) for higher recall, payload filter for ACL and date.
  • Reranker (HERACLES): BGE-Reranker-v2.5 over top-100 candidates → top-10.
  • Generator (PROMETHEUS): Claude 4.7 with «cite-or-refuse» prompt — no answer without source.
  • Guardrails (ARES): Llama Guard 3 for PII redaction between layers; ACL filter per tenant.
  • Audit (ARGUS): Langfuse + OpenTelemetry, WORM bucket on Swiss Federal Railways S3 (sic), 10-year retention.

Results after 7 months in production

Metric | Before | After | Delta
Avg. research time per question | 42 min | 3.4 min | -92%
Reviewer consistency (Cohen's Kappa) | 0.38 | 0.81 | +113%
p95 latency | — | 54 ms | SLA met
Recall@10 | — | 0.94 | —
FINMA findings since go-live | — | 0 | —
Annual savings | — | CHF 2.6M | —
Payback | — | 5.1 months | —

Important: no compliance officer was let go. The freed time flowed into proactive risk reviews and edge-case escalation — tasks the team previously had no time for.

Governance: vector databases under revDSG, EU AI Act and FINMA

Vector databases raise three additional compliance questions that classic OLTP DBs did not have:

  • revDSG Art. 6 (data integrity): embeddings are not trivially reversible, but embedding inversion attacks can partially reconstruct the source text. At Swiss FINMA mandates we therefore place vector DBs in the same trust zone as the source data — never assume «embeddings are anonymous».
  • EU AI Act Art. 12 (logging duty): every RAG query plus the returned sources are input/output of a high-risk AI system and subject to 10-year retention.
  • FINMA RS 2023/1 (operational risk): vector DB failure is a single point of failure for RAG systems. Backup, replication and HA tests are mandatory.

Three hard duties for every Swiss vector DB implementation:

  1. Data sovereignty: self-host on Swiss or EU soil, Apache/BSD licenses preferred. Pinecone and other US SaaS are excluded for FINMA mandates.
  2. Backup & recovery: daily snapshots, recovery drills, rebuild plan for the HNSW index (typically 4-12h for 100M vectors).
  3. ACL filtering in the index: not in the application layer. Every search hit returned without ACL filter is a potential data protection incident.

More on this in our EU AI Act guide.

Implementation roadmap: productive in 11 weeks

Phase 1: Discovery & engine selection (weeks 1-2)

  • Workshop: source systems, data volumes, update frequency, ACL model, latency SLA
  • Engine matrix: scale × data sovereignty × latency × team skill
  • Embedding model selection: Voyage-3 (cloud) or BGE-M3 (self-host)

Phase 2: PoC + eval (weeks 3-5)

  • PROMETHEUS builds the ingest, embedding and search pipeline
  • Gold eval set with 200-500 question-answer pairs
  • Measure Recall@10, p50/p95 latency, hallucination rate
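
Recall@10 is easy to compute yourself: the fraction of gold documents that appear among the top 10 retrieved IDs, averaged over the eval set. A minimal sketch:

```python
def recall_at_k(retrieved: list[list[str]], gold: list[set[str]], k: int = 10) -> float:
    """Mean fraction of gold documents found in the top-k results per query."""
    total = sum(len(set(hits[:k]) & relevant) / len(relevant)
                for hits, relevant in zip(retrieved, gold))
    return total / len(gold)

retrieved = [["d1", "d7", "d3"], ["d2", "d9", "d4"]]   # ranked IDs per query
gold = [{"d1", "d3"}, {"d4", "d8"}]                    # gold documents per query
print(recall_at_k(retrieved, gold, k=3))  # → 0.75 (query 1: 2/2, query 2: 1/2)
```

Run the same function before and after every index or model change; a drop in this number is what the eval-regression CI in phase 5 gates on.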

Phase 3: Reranker, hybrid search, citation (weeks 6-7)

  • HERACLES integrates Cohere Rerank 3 or BGE-Reranker
  • Activate hybrid search (BM25 + vector)
  • Cite-or-refuse prompting in the generator

Phase 4: Guardrails, audit, compliance (weeks 8-9)

  • ARES Llama Guard 3 filter for PII / prompt injection
  • ARGUS Langfuse + OpenTelemetry + WORM archive
  • EU AI Act and revDSG compliance review

Phase 5: Rollout (weeks 10-11)

  • Shadow mode: system answers but is not shown
  • Supervised: 10% traffic with human approval
  • Full production with eval-regression CI

The future: multi-vector, quantization and late-interaction

Vector databases in 2026 are only the second generation. What is on the horizon for 2027-2028:

  • Multi-vector / ColBERT: a document as a sequence of vectors instead of a mean vector. Recall climbs by 8-15 percentage points. Qdrant 1.10, Vespa and Weaviate 1.27 already support multi-vector natively.
  • Binary & Int8 quantization: 32x smaller embeddings without a noticeable recall drop. Cohere v4 + Matryoshka embeddings + binary quantization saves 90% RAM.
  • Late-interaction reranker: ColBERTv2 as a reranker directly inside the vector DB engine. Milvus and Vespa lead.
  • Disk-first indexes: DiskANN, SPANN — RAM requirement reduced by 70-90%. Relevant from 100M vectors.
  • SQL-native vector filter: Postgres 18 with native HNSW index in pgvector 0.8 — no more extension limits.
  • RAG without embeddings: SPLADE-style sparse retrieval and reasoning-over-indexes partially eliminate the classic embedding model.
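
Binary quantization from the list above is conceptually tiny: threshold each dimension at zero, pack 8 dimensions per byte (32x smaller than float32) and rank candidates by Hamming distance. A numpy sketch of the idea; production engines additionally rescore the survivors with the full-precision vectors:

```python
import numpy as np

def binarize(vecs: np.ndarray) -> np.ndarray:
    """1 bit per dimension: positive -> 1, else 0; 1024 floats become 128 bytes."""
    return np.packbits(vecs > 0, axis=-1)

def hamming_top_k(query_bits: np.ndarray, db_bits: np.ndarray, k: int) -> list[int]:
    """Rank by number of differing bits (XOR, then popcount)."""
    dists = np.unpackbits(query_bits ^ db_bits, axis=-1).sum(axis=-1)
    return np.argsort(dists)[:k].tolist()

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 1024)).astype(np.float32)
query = db[42] + 0.1 * rng.standard_normal(1024).astype(np.float32)  # near-duplicate of row 42

db_bits = binarize(db)  # 128 KB instead of 4 MB for the whole corpus
print(hamming_top_k(binarize(query), db_bits, k=3)[0])  # row 42 ranks first
```

The sketch works because random high-dimensional vectors disagree on roughly half their sign bits, while a near-duplicate disagrees on only a few — the gap survives quantization.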

Verdict: which vector DB for you?

  • Default: pgvector. Enough for 80% of Swiss mid-market mandates — no new system, ACID, SQL joins, Swiss hosting trivial.
  • Performance & EU cloud: Qdrant. Rust kernel, Apache 2.0, p50 below 10 ms at 100M+ vectors. Ideal from 20M vectors.
  • Native hybrid search: Weaviate. BM25 + vector + GraphQL — perfect for multi-tenant SaaS.
  • Massive scale: Milvus. Distributed K8s, DiskANN, GPU. From 100M vectors or with a platform team.
  • NOT for Switzerland: Pinecone. Closed source, US routing, 2-3x more expensive, FINMA-disqualifying.
  • ROI in 5-7 months: 18 productive mazdek mandates, average payback 5.4 months.
  • Compliance feasible: revDSG, EU AI Act and FINMA are cleanly covered with ARES guardrails, ARGUS observability and self-hosting.

At mazdek, 19 specialized AI agents orchestrate the entire vector DB lifecycle: PROMETHEUS for architecture and embedding choice; ORACLE for ingest and data model; HERACLES for reranker and API bridges; ARES for guardrails and compliance; ARGUS for 24/7 observability and WORM audit; HEPHAESTUS for Swiss K8s infrastructure. 18 productive vector DB deployments since 2024 — DSG, GDPR, EU AI Act, FINMA and CO compliant from day one.

Vector DB & RAG stack productive in 11 weeks — from CHF 14,900

Our AI agents PROMETHEUS, ORACLE, HERACLES, ARES and ARGUS build your pgvector, Qdrant or Weaviate stack — Swiss-Sovereign, EU AI Act, FINMA and revDSG compliant with measurable ROI in under 6 months.


RAG assessment — free & non-binding

19 specialized AI agents, 18 productive vector DB deployments, 5.4 months average payback. Swiss hosting, ARES guardrails, ARGUS observability — from idea to a productive RAG stack without vendor lock-in.

Written by

PROMETHEUS

AI & Machine Learning Agent

PROMETHEUS is mazdek's AI and Machine Learning agent. Specialties: LLM architectures, multi-agent systems, RAG, vector databases and eval pipelines. Since 2024, PROMETHEUS has built 18 productive vector DB deployments for Swiss companies — from fiduciaries to private banks — all EU AI Act, revDSG and FINMA compliant with an average payback of 5.4 months.

More about PROMETHEUS

Frequently Asked Questions

Which vector database is best for Swiss companies in 2026?

For 80% of Swiss mid-market mandates we recommend pgvector — no extra system, ACID, SQL joins, Swiss hosting trivial. From 20 million vectors or with a hard latency SLA we switch to Qdrant (Rust kernel, Apache 2.0, EU cloud). We do not recommend Pinecone to Swiss FINMA mandates — closed source and US data routing disqualify it for revDSG-compliant architecture.

pgvector or Qdrant — when should I switch?

pgvector is the default up to about 20M vectors or 200 QPS — no new system, same backup, ACID. Switch to Qdrant when you need p50 latency below 10 ms, scale beyond 30M vectors or have selective filters (Qdrant's filtered HNSW is significantly faster than pgvector with a post-filter). Migration via embedding re-index is possible in 4-12 hours.

How much does a vector database cost in Switzerland?

At 30M vectors and 200 QPS: pgvector on Hetzner CH ca. CHF 460/month, Qdrant ca. CHF 380/month, Weaviate ca. CHF 510/month, Milvus ca. CHF 720/month (3-node K8s), Pinecone ca. CHF 1'420/month (US region). Self-hosted options are 2-3x cheaper than Pinecone and retain data sovereignty.

Are vector databases DSG, revDSG and FINMA compliant?

Yes, with three duties: self-hosting on Swiss or EU soil — Pinecone and other US SaaS are excluded for FINMA mandates. ACL filtering in the index, not in the application layer. WORM archive over 10 years for all RAG queries and sources under EU AI Act Art. 12. Embedding inversion attacks are possible — vector DBs belong in the same trust zone as the source data.

Which embedding models does mazdek recommend in 2026?

Three top models in our Swiss deployments: Voyage-3 (1024 dim, leading recall for German and French), Cohere Embed v4 (1024 dim, strong multilingual performance, Matryoshka quantization), BGE-M3 (1024 dim, open source, ideal for self-hosting). For hybrid search we recommend BGE-M3 thanks to its native sparse + dense + multi-vector output.

What ROI is realistic?

From 18 productive mazdek vector DB mandates: average 5.4 months payback. Geneva private bank with Qdrant: 92% shorter compliance research, 0 FINMA findings, CHF 2.6M annual savings. Swiss fiduciary with pgvector: 84% faster mandate research, CHF 380'000 savings per year. Bernese insurer with Weaviate: 71% faster claim pre-screening, NPS +18 points.

Ready for your vector DB stack?

19 specialized AI agents build your Swiss-Sovereign RAG stack — pgvector, Qdrant, Weaviate or Milvus with reranker, ARES guardrails and 24/7 observability through ARGUS Guardian. DSG, FINMA and EU AI Act compliant from CHF 14,900.
