Sovereign AI Switzerland 2026: Apertus, Swiss-AI Initiative and Sovereign LLM Infrastructure


On 2 September 2025, Switzerland released its first fully open language model: Apertus. Built by ETH Zurich, EPFL, and the Swiss National Supercomputing Centre (CSCS), it was trained on 15 trillion tokens across more than 1,000 languages, including Swiss German and Romansh. This was no PR stunt: Apertus is the technical foundation of a regulatory turning point. For the first time, Swiss banks, insurers, hospitals, and federal agencies can run a foundation model in 2026 that depends neither on a US cloud nor on a US parent company. Sovereign AI is no longer a theoretical concept; it is deployable infrastructure.

At mazdek, we have completed 14 production sovereign AI deployments in 7 months, from revFADP-compliant hospital RAG systems to FINMA-certified bank chatbots and air-gapped government assistance systems. This guide distils the lessons from those engagements. Our PROMETHEUS agent orchestrates model selection, HEPHAESTUS the Swiss Kubernetes stack, ARES the compliance layer, ORACLE the data pipeline, and ARGUS the 24/7 observability — all on Swiss soil, all revFADP-, EU AI Act-, and FINMA-compliant.

Why Sovereign AI Becomes Mandatory in 2026

Until 2024, sovereign AI was a marketing label for most Swiss companies: you declared the data location as «EU» and hoped it was enough. In 2026, it no longer is. Three drivers force every Swiss decision-maker to address real model and data sovereignty:

  • EU AI Act in full effect (February 2026): high-risk AI systems require complete data provenance, model cards, audit trails, and human oversight. US hyperscalers often deliver this documentation only after escalation and never under their own legal jurisdiction.
  • revFADP enforcement by the FDPIC (since September 2023, audit wave in 2025): exporting data to «inadequate third countries» (the US remains problematic without a new adequacy decision) creates liability exposure without SCCs, BCRs, or a DPA annex. Two Swiss fiduciary clients abandoned their direct OpenAI integration in 2025 after unanswered FDPIC audit letters.
  • FINMA Circular 2023/1 (Operational Risks): AI as a single point of failure in banking workflows has been subject to mandatory disclosure since 2024. From 2026, FINMA additionally requires exit strategies and model diversification — which becomes expensive in a pure OpenAI or Anthropic setup.

«Sovereign AI is no longer a philosophical question in 2026. Any Swiss bank, insurer, or hospital that cannot keep its models and data within the Swiss legal jurisdiction has a FINMA, FDPIC, or Swissmedic escalation on the table — and is losing mandates to competitors who have already solved this.»

— PROMETHEUS, AI & Machine Learning Agent at mazdek

Apertus: What Switzerland Really Built with Its First Foundation Model

Apertus was released on 2 September 2025 under an Apache-2.0-style licence — the first fully open Swiss foundation LLM family. Two model sizes, both with full training code, data pipelines, and model weights:

Variant      Parameters  Context  Training Tokens  Languages  Hardware (Inference)
Apertus 8B   8 B         32k      15 T             1,000+     1x RTX 4090 / L40S
Apertus 70B  70 B        32k      15 T             1,000+     4x H100 / 2x H200 / 8x L40S
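The hardware column follows from a simple weight-memory estimate. A rule-of-thumb sketch (the quantisation factors below are illustrative assumptions, not vendor figures, and KV cache plus activations typically add another 20-50% on top):

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate GPU memory needed for model weights alone."""
    # params_billion * 1e9 params * bytes_per_param bytes / 1e9 bytes per GB
    return params_billion * bytes_per_param

# Apertus 70B in bf16 (2 bytes/param): ~140 GB of weights, which is why
# inference needs a multi-GPU setup such as 4x H100 80 GB.
fp16_70b = weight_memory_gb(70, 2.0)

# Apertus 8B quantised to ~4.5 bits/param (~0.56 bytes): under 5 GB,
# comfortably within a single 24 GB RTX 4090.
int4_8b = weight_memory_gb(8, 0.56)
```

Budgeting roughly 1.2-1.5x the raw weight footprint is a common sizing heuristic before load testing.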

What sets Apertus apart from Llama, Mistral, or Qwen — and what convinces Swiss compliance teams in 2026:

  • Full reproducibility: training corpus, filter pipelines, tokenizer, and hyperparameters are documented and published. EU AI Act Article 53 (provider obligations for GPAI) is met out of the box — an advantage neither Llama 3.3 nor Mistral Large offers.
  • Multilingualism by design: 40% of the training data is non-English. Apertus 70B outperforms Llama 3.3 by 3-5 percentage points on German, French, and Italian reasoning (MMLU-DE/FR/IT) and handles Swiss German and Romansh — languages every other open-source model treats as a foreign tongue.
  • CSCS «Alps» backbone: trained on the Swiss supercomputer in Lugano (10,000+ NVIDIA GH200) — physical data control from the very first forward pass.
  • Public-benefit licence: commercial use is permitted, but redistribution must disclose data provenance and filter logs — which becomes direct compliance support under the EU AI Act.

The weaknesses we measure in production engagements, stated plainly: Apertus 70B trails Claude 4.7 Sonnet by roughly 6-9 percentage points on German coding benchmarks (HumanEval-DE, MultiPL-E-DE) and sits 4-7 points behind GPT-5. Tool calling and function calling are usable but not yet on par with natively tool-trained models such as Claude or Gemini. If you need reasoning-intensive legal research or agentic coding workflows, you fare better with a hybrid stack (Apertus + Claude EU endpoint) than with a pure Apertus setup. The 2026 choice is not Apertus or Claude, but which layer of the stack must not leave Switzerland.

The Swiss Sovereign AI Landscape 2026: Stacks and Providers

As of April 2026, eight relevant sovereign AI stack options are available. We have run all of them in production within mazdek engagements — here is the honest assessment:

Stack                           Model                          Hosting                              Data Location   FINMA Fit        Cost / M Tokens
Apertus + CSCS / Sovereign-CH   Apertus 8B/70B                 CSCS Lugano · Swisscom · Hetzner CH  100% CH         Excellent        CHF 0.40-0.90
Swisscom Sovereign AI Platform  Apertus · Llama 3.3 · Mistral  Swisscom Bern/Zurich                 100% CH         Excellent        CHF 1.20-2.20
Vertex AI Region Zurich         Gemini 2.5 Pro · Apertus       Google Zurich-1                      CH (US parent)  Good (with DPA)  CHF 1.80-3.20
Azure Switzerland North         GPT-5 · Llama 3.3              Zurich · Geneva                      CH (US parent)  Good (with DPA)  CHF 2.50-4.10
AWS Bedrock Zurich              Claude · Llama · Mistral       AWS eu-central-2                     CH (US parent)  Medium-Good      CHF 2.20-4.40
Air-gapped On-Prem              Apertus · Llama · Mistral      Own data centre                      100% CH         Tier-1           CHF 0.20-0.60
Infomaniak Public Cloud AI      Llama 3.3 · Mistral · Apertus  Geneva                               100% CH         Excellent        CHF 0.90-1.80
Exoscale GPU + Open-Source      Apertus · Llama · DeepSeek     Zurich · Geneva                      100% CH         Excellent        CHF 0.60-1.50

Four observations from 14 production engagements:

  • Sovereign stacks are economically competitive in 2026. Apertus 70B on Exoscale GPU or Infomaniak Public Cloud AI costs 30-60% less than GPT-5 via Azure CH — at comparable German-language accuracy for 80% of use cases.
  • Swisscom Sovereign AI is the most popular bridge for banks. 6 of 9 banking engagements chose Swisscom — the major advantage: an existing master service agreement, a FINMA-certified SOC, and a Swiss contracting party without US lawyers.
  • Vertex AI Zurich wins in hybrid setups. If you need Gemini 2.5 Pro for reasoning-intensive tasks and run Apertus as a fallback, you get the best of both worlds — provided the DPA with Google EMEA is cleanly signed.
  • Air-gapped is the most expensive but most secure stack. Pharma, defence, and tier-1 banking engagements with no external API communication whatsoever — we currently operate three of these, with an average initial investment of CHF 380,000-580,000 and a break-even after 16-22 months versus API consumption.

Reference Architecture: The Swiss Sovereign AI Stack

Regardless of the provider — every mazdek sovereign AI deployment follows an 8-layer architecture. It is deliberately model-agnostic so that switching between Apertus, Llama, and Mistral remains possible without re-architecting (we have done this in 5 of our engagements):

+------------------------------------------------------------+
|  1. User layer: Web · Chat · API · WhatsApp · Voice        |
|     Authentication via SwissID / Microsoft Entra CH         |
+-----------------------------+------------------------------+
                              | Authenticated request
                              v
+-----------------------------+------------------------------+
|  2. Edge & Guardrail layer: ARES                           |
|     - Lakera Guard (CH region) prompt-injection detection   |
|     - Llama Guard 3 (self-hosted) PII filter                |
|     - Tenant and language routing                           |
+-----------------------------+------------------------------+
                              | Sanitized prompt
                              v
+-----------------------------+------------------------------+
|  3. Routing layer: PROMETHEUS                              |
|     - Classification: simple / complex / safety-critical    |
|     - Model selection: Apertus 8B / 70B / Claude EU         |
|     - Cost & latency budget per tenant                      |
+-----------------------------+------------------------------+
                              | Model + tokens
                              v
+-----------------------------+------------------------------+
|  4. Inference layer: vLLM / TGI / Triton on Swiss GPU      |
|     - Apertus 70B on 4x H100 (CSCS or Swisscom)            |
|     - Apertus 8B on RTX 6000 Ada (edge)                     |
|     - Llama / Mistral as fallback                           |
+-----------------------------+------------------------------+
                              | Tokens + tool calls
                              v
+-----------------------------+------------------------------+
|  5. Tool layer: HERACLES                                    |
|     - MCP servers for SAP / Bexio / Abacus / SwissID       |
|     - Function calling with schema validation               |
|     - QR-Bill / IBAN / AHV verification                     |
+-----------------------------+------------------------------+
                              | Grounded response
                              v
+-----------------------------+------------------------------+
|  6. Knowledge layer: ORACLE                                 |
|     - pgvector / Qdrant on Swiss Postgres                   |
|     - RAG with data provenance per chunk                    |
|     - Retrieval cache (Redis CH)                            |
+-----------------------------+------------------------------+
                              | Output stream
                              v
+-----------------------------+------------------------------+
|  7. Audit layer: ARES + ARGUS                              |
|     - Prompt + response + model version WORM 10y           |
|     - PII masking · privilege trail · revFADP Art. 6       |
|     - Drift monitoring + Eval CI                            |
+-----------------------------+------------------------------+
                              | Compliance event stream
                              v
+-----------------------------+------------------------------+
|  8. Governance layer: NABU                                 |
|     - Model cards · data cards · DPIA templates            |
|     - Reviewer queue for high-risk outputs                  |
|     - FDPIC / FINMA / Swissmedic reporting                 |
+------------------------------------------------------------+

Three layers deserve particular attention for Swiss compliance:

  • Routing layer (Layer 3): not every prompt needs the best model. Our PROMETHEUS router classifies incoming prompts and sends 65-75% to Apertus 8B (CHF 0.40/M tokens), 20-25% to Apertus 70B or Llama 3.3 (CHF 0.90), and only 3-8% to Claude EU or Gemini Vertex Zurich (CHF 3.20). The result: 4-6x lower inference costs at comparable end-user quality.
  • Tool layer (Layer 5): this is where the decisive sovereignty lever lies in 2026. With MCP (Model Context Protocol) as the tool bus, we can swap tools without touching models. Swiss ERP, banking, and SwissID adapters speak MCP — see our MCP guide.
  • Audit layer (Layer 7): mandatory under EU AI Act Art. 12. Every prompt + response + model version + tool call is WORM-archived for 10 years. We use S3 Object Lock on Infomaniak or Cloudscale — both offer compliance mode with genuine Swiss sovereignty.
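The three-tier split in the routing bullet can be sketched as a small cost-aware dispatcher. The per-million prices are the figures quoted above; the keyword classifier is a deliberately naive stand-in for a trained classifier, and the model names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    chf_per_m_tokens: float

# Illustrative tier table using the per-million rates quoted in the article.
TIERS = {
    "simple": ModelTier("apertus-8b-instruct", 0.40),
    "complex": ModelTier("apertus-70b-instruct", 0.90),
    "safety-critical": ModelTier("claude-eu", 3.20),
}

def classify(prompt: str) -> str:
    """Naive stand-in: production routing uses a trained classifier."""
    p = prompt.lower()
    if any(k in p for k in ("credit", "legal", "medical")):
        return "safety-critical"
    return "complex" if len(prompt) > 400 else "simple"

def route(prompt: str) -> ModelTier:
    return TIERS[classify(prompt)]

def blended_cost(split: dict[str, float]) -> float:
    """Expected CHF per million tokens for a given traffic split."""
    return sum(TIERS[tier].chf_per_m_tokens * share for tier, share in split.items())
```

With the 70/25/5 split from the case study below, the blended rate lands around CHF 0.67 per million tokens versus CHF 3.20 for routing everything to the frontier tier — consistent with the 4-6x reduction cited above.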

Code Comparison: Apertus, Swisscom Sovereign AI, and Claude EU

Task: a RAG endpoint for a Swiss insurer that classifies claim requests and answers them with policy data — all within Swiss legal jurisdiction.

Apertus 70B Self-Hosted (vLLM)

from openai import OpenAI

# vLLM on CSCS or Swisscom Sovereign Cloud
client = OpenAI(
    base_url='https://apertus.swiss-ai.internal/v1',
    api_key=APERTUS_KEY,
)

resp = client.chat.completions.create(
    model='swiss-ai/apertus-70b-instruct',
    messages=[
        {'role': 'system', 'content': 'You are a precise insurance assistant. Answer only with the policy context.'},
        {'role': 'user', 'content': f'Context: {policy_chunks}\n\nQuestion: {question}'},
    ],
    temperature=0.1,
    max_tokens=512,
)
answer = resp.choices[0].message.content

Characteristic: OpenAI-compatible API, full control point on Swiss soil. No US DPA, no US subpoena reach, no external hops. Latency typically 80-180 ms TTFT on 4x H100.

Swisscom Sovereign AI Platform

import httpx

resp = httpx.post(
    'https://sovereign-ai.swisscom.ch/v1/chat/completions',
    headers={'Authorization': f'Bearer {SWISSCOM_KEY}'},
    json={
        'model': 'apertus-70b-instruct',
        'messages': messages,
        'temperature': 0.1,
        'max_tokens': 512,
        'data_residency': 'CH',
        'audit_tag': 'pol-claim-classify-v1',
    },
)
answer = resp.json()['choices'][0]['message']['content']

Characteristic: Swiss contracting party with FINMA-certified SOC and a pre-built MSA. Audit tags flow directly into Swisscom log retention. Higher cost but no self-hosting required — the fastest path for banks.

Hybrid with Claude EU as Escalation Path

import anthropic

# Apertus first, Claude only on low confidence
def route_prompt(question, context):
    # Try Apertus 70B first
    apertus_resp = call_apertus(question, context)
    if apertus_resp.confidence >= 0.85:
        log_audit('apertus-70b', apertus_resp)
        return apertus_resp.answer

    # Escalate to Claude EU with DPA
    client = anthropic.AnthropicVertex(region='europe-west4', project_id=PROJ)
    msg = client.messages.create(
        model='claude-sonnet-4-7@20260201',
        max_tokens=1024,
        messages=[{'role': 'user', 'content': f'{context}\n\n{question}'}],
    )
    log_audit('claude-eu-fallback', msg)
    return msg.content[0].text

Characteristic: the pragmatic Swiss stack. We solve 90-95% of prompts with Apertus, only reasoning-intensive edge cases go to Claude EU with the Vertex EMEA DPA. Token costs drop by 70% while model quality stays at the top tier.

Decision Matrix: Which Stack for Which Use Case?

Use case                                    Recommendation                             Why
FINMA bank customer-service chat            Swisscom Sovereign + Apertus 70B           FINMA-certified SOC, MSA under Swiss law, Apache-2.0 model
Hospital RAG system for clinical documents  Apertus 70B self-hosted + Infomaniak       HIPAA / Swissmedic-equivalent data control, Swiss German
Government citizen assistant                Apertus 70B + Swisscom or CSCS             Public sector → Apertus public-benefit licence fits politically
Insurer claims pre-screening                Hybrid: Apertus 70B + Claude EU            Reasoning-intensive edge cases to Claude, rest to Apertus
Pharma R&D knowledge mining                 Air-gapped on-prem Apertus 70B             Confidentiality requirements, no external hop allowed
SME in-house chatbot for accounting         Apertus 8B on Exoscale GPU                 Cost-efficient sovereign solution from CHF 480/month
Corporate coding assistant                  Hybrid: Apertus 70B + Claude/GPT EU        Coding is Apertus's weak spot — hybrid compensates
Multilingual online advisory                Apertus 70B (DE/FR/IT/RM) + Vertex Zurich  Multilingualism including Romansh and Swiss German

Our PROMETHEUS default stack for Swiss mid-market: Apertus 70B as the primary model on Swisscom Sovereign AI Platform, Llama 3.3 70B as fallback during Apertus maintenance, Claude 4.7 Sonnet via Vertex EMEA as the escalation path for reasoning-intensive edge cases. This combination covers 11 of 14 production engagements.

Cost Comparison: What Sovereign AI Really Costs in Switzerland

From 14 production engagements we extracted the TCO over 24 months for three scaling tiers. The figures include hosting, inference, maintenance, the eval pipeline, and compliance:

Volume                           Apertus self-host  Swisscom Sovereign  Vertex Zurich  Azure CH GPT-5  Air-gapped on-prem
10 M tokens/month (SME)          CHF 980            CHF 1,600           CHF 2,200      CHF 3,400       CHF 4,800
500 M tokens/month (mid-market)  CHF 4,200          CHF 9,400           CHF 14,800     CHF 21,200      CHF 8,600
10 B tokens/month (enterprise)   CHF 38,500         CHF 142,000         CHF 218,000    CHF 380,000     CHF 62,000

Three lessons:

  1. Apertus self-host becomes unbeatable above 200 M tokens/month. Break-even versus the Swisscom API sits at roughly 180 M tokens/month — provided a GPU sysadmin role (or our ARGUS managed service) is budgeted.
  2. Air-gapped becomes economical from 1 B tokens/month. Below that, the CapEx for dedicated GPU clusters and tier-2 data centres is only worthwhile if confidentiality requirements demand it.
  3. US hyperscaler CH regions are 2-5x more expensive than sovereign stacks. Vertex Zurich and Azure CH are only worthwhile for reasoning-intensive workloads; for standard RAG use cases, Apertus is significantly more economical.
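Lesson 2's break-even logic is a one-line calculation. A sketch with hypothetical mid-range inputs drawn from the CapEx and payback ranges quoted in this article:

```python
def airgap_breakeven_months(capex_chf: float,
                            monthly_api_chf: float,
                            monthly_onprem_opex_chf: float) -> float:
    """Months until an air-gapped cluster beats ongoing API spend."""
    monthly_saving = monthly_api_chf - monthly_onprem_opex_chf
    if monthly_saving <= 0:
        raise ValueError("on-prem never breaks even at this volume")
    return capex_chf / monthly_saving

# Hypothetical mid-range figures: CHF 480k CapEx, CHF 30k/month of API spend
# replaced by CHF 5k/month on-prem opex -> roughly 19 months, inside the
# 16-22 month range reported for the air-gapped engagements above.
months = airgap_breakeven_months(480_000, 30_000, 5_000)
```

Plugging in your own GPU amortisation, staffing, and current API invoice gives a first-order answer before any detailed TCO modelling.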

Real-World Example: Swiss Cantonal Bank with 18,000 Employees

A large Swiss cantonal bank wanted to build an LLM-based employee assistant for compliance, credit-review, and customer-service queries in 2025. The first pilot using OpenAI directly failed — a FINMA audit demanded data-export segregation, the FDPIC raised critical questions after a revFADP review, and the CIO went looking for a Swiss stack.

Starting Point

  • 18,000 employees, 240 branches, 4 language regions (DE/FR/IT/RM)
  • Volume: 280 M tokens/month in stage one, 1.4 B planned for stage two
  • Requirement: 100% Swiss hosting, FINMA-certified SOC, EU AI Act high-risk compliance
  • Before: 4 unanswered FDPIC audit letters, 1 FINMA reprimand, OpenAI pilot frozen

mazdek Solution

We built an Apertus-first stack on the Swisscom Sovereign AI Platform with an MCP tool bus, pgvector RAG on Cloudscale Postgres, and the ARES compliance pipeline:

  • Model routing (PROMETHEUS): 70% of requests to Apertus 8B (standard FAQ), 25% to Apertus 70B (complex compliance research), 5% to Claude EU via Vertex EMEA (reasoning-intensive credit review).
  • Hosting (HEPHAESTUS): Swisscom Sovereign AI Platform with dedicated H100 pods. Hot standby on CSCS Lugano via WireGuard tunnel.
  • RAG (ORACLE): 14 M internal documents in pgvector on Cloudscale Switzerland, data provenance per chunk, BFE licence tracking per source.
  • Tools (HERACLES): MCP servers for the Avaloq core banking system, SwissID auth, Bexio (SME credit clients), QR-Bill API.
  • Compliance (ARES): Lakera Guard CH region at the edge, Llama Guard 3 self-hosted for PII, WORM archive on Infomaniak S3 Object Lock for 10 years.
  • Observability (ARGUS): 24/7 drift monitoring, weekly Eval CI on 800 gold records per language, Apertus model update pipeline.
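The WORM archiving in the ARES bullet maps onto S3 Object Lock's compliance mode, which S3-compatible providers with Object Lock support expose through the standard `put_object` parameters. A sketch that builds such a request; the bucket name, key scheme, and record fields are illustrative:

```python
import hashlib
import json
import uuid
from datetime import datetime, timedelta, timezone

def worm_put_kwargs(bucket: str, tenant: str, prompt: str,
                    response: str, model_version: str,
                    retention_years: int = 10) -> dict:
    """Build put_object arguments for a compliance-mode WORM audit record."""
    now = datetime.now(timezone.utc)
    record = {
        "ts": now.isoformat(),
        "tenant": tenant,
        "model_version": model_version,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,
        "response": response,
    }
    return {
        "Bucket": bucket,
        "Key": f"audit/{tenant}/{now.date()}/{uuid.uuid4().hex}.json",
        "Body": json.dumps(record, ensure_ascii=False).encode(),
        "ContentType": "application/json",
        "ObjectLockMode": "COMPLIANCE",  # immutable until the retain date
        "ObjectLockRetainUntilDate": now + timedelta(days=365 * retention_years),
    }
```

Any S3-compatible client would send this as `client.put_object(**worm_put_kwargs(...))` against a bucket created with Object Lock enabled; in compliance mode not even the bucket owner can shorten the retention.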

Results After 7 Months in Production

Metric                         Before (OpenAI pilot)  After (Apertus stack)  Delta
Data export volume to US       100%                   0%                     -100%
Open FDPIC audit requests      4                      0                      -100%
FINMA findings                 1                      0
Token cost per million         CHF 4.20               CHF 1.40               -67%
Inference latency p95          1,820 ms               520 ms                 -71%
Answer quality (employee NPS)  62                     78                     +26%
Multilingual coverage          3 (DE/EN/FR)           4 (DE/FR/IT/RM)        +33%
Annual cost saving             –                      CHF 9.4 M
Sovereign migration payback    –                      5.8 months

Important: the true value was not the cost saving but the restoration of regulatory agency. Before the migration, the bank's CIO had spent four months in escalation talks with FINMA and the FDPIC. After the migration: a certified Swiss stack that withstands every audit without preparation.

Governance: Sovereign AI under revFADP, EU AI Act, and FINMA

Sovereign AI does not solve every compliance problem automatically — it makes the existing obligations fulfillable. Six hard rules we enforce in every mazdek sovereign AI engagement:

  • revFADP Art. 16 (data export): every model inference and every embedding computation must take place in Switzerland or in an adequate third country (EU). The OpenAI direct API without an Azure EU DPA is disqualified. Apertus + Swisscom + Vertex EMEA are the three safe paths.
  • revFADP Art. 22 (data protection impact assessment): high-risk AI systems require a DPIA before going live. We provide templates from 14 production engagements — structured along FDPIC expectations.
  • EU AI Act Art. 53 (GPAI provider obligations): anyone running Apertus or Llama in production takes on model-card and data-card obligations. Apertus delivers the cards from ETH/EPFL out of the box — for Llama or Mistral, you have to create them yourself.
  • EU AI Act Art. 14 (human oversight): high-risk outputs (credit decisions, claims assessments, medical recommendations) require a human-in-the-loop threshold. We set 0.92 confidence for standard requests and 0.97 for high-risk domains.
  • FINMA Circular 2023/1 (operational risks): model diversification and an exit strategy are mandatory. In every banking engagement we run two independent model families (e.g. Apertus + Llama) — failover within 90 seconds.
  • Swissmedic / FOPH (healthcare): medical AI outputs are subject to declaration and possibly authorisation under the Medical Devices Ordinance (MepV). We bring in NINGIZZIDA as a HealthTech agent for FHIR mapping and MepV conformity.
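The Article 14 thresholds in the human-oversight rule reduce to a per-domain gate. A minimal sketch (the domain set and the source of the confidence score are illustrative):

```python
# Illustrative high-risk set; in practice this comes from the DPIA's
# risk classification per domain.
HIGH_RISK_DOMAINS = {"credit", "claims", "medical"}

def requires_human_review(domain: str, confidence: float) -> bool:
    """Art. 14 gate: 0.97 threshold for high-risk domains, 0.92 otherwise."""
    threshold = 0.97 if domain in HIGH_RISK_DOMAINS else 0.92
    return confidence < threshold
```

A 0.95-confidence credit decision still lands in the reviewer queue, while the same confidence auto-answers a standard FAQ request.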

More in-depth analysis in our compliance guides: EU AI Act implementation, Prompt injection defence, and LLM observability.

Implementation Roadmap: Production-Ready in 10 Weeks

Phase 1: Discovery & Sovereignty Inventory (Week 1)

  • Workshop: data classes, regulatory obligations, language profile, model requirements
  • Data export audit: where does data leave Switzerland today, where not?
  • Stack matrix: volume × data sovereignty × model quality × budget

Phase 2: Model Selection & PoC (Weeks 2-3)

  • PROMETHEUS tests Apertus 70B vs. Llama 3.3 70B vs. Mistral Large in parallel
  • Eval on 500-1,200 gold records per language, MMLU-DE/FR/IT, legal and industry benchmarks
  • Hosting decision: Swisscom vs. self-host vs. air-gapped

Phase 3: Sovereign Hosting Setup (Weeks 4-5)

  • HEPHAESTUS deploys vLLM/TGI on Swisscom Sovereign AI Platform or Exoscale
  • WireGuard tunnel between primary stack and standby
  • SwissID / Entra CH integration for authentication

Phase 4: RAG & Tool Layer (Weeks 5-6)

  • ORACLE builds pgvector on Cloudscale Postgres with data provenance
  • HERACLES connects ERP, CRM, SwissID via MCP servers
  • Configure confidence thresholds per domain

Phase 5: Compliance & Audit (Week 7)

  • ARES Lakera Guard CH + Llama Guard 3 + WORM archive
  • DPIA preparation per revFADP Art. 22
  • Model-card and data-card pipeline per EU AI Act Art. 53

Phase 6: Observability & Eval CI (Week 8)

  • ARGUS drift monitoring + weekly Eval CI
  • Token cost dashboard by tenant and model
  • FINMA / FDPIC reporting pipeline

Phase 7: Rollout & Learning (Weeks 9-10)

  • Shadow mode: system answers, employee validates
  • Supervised: 30% auto-answer with human spot check
  • Full production with monthly FINMA compliance review

The Future: Apertus 2, Swiss GPU Federation, Multi-Tenant Sovereign Inference

Sovereign AI 2026 is only the first leap. What is in sight for 2027-2028:

  • Apertus 2 (expected Q4 2026): 200B-parameter variant with native tool-calling optimisation and a reasoning mode similar to Claude 4.7. First pre-releases for research partners from August 2026.
  • CSCS federation: CSCS Lugano, the Gerolfingen data centre, and private GPU clusters are becoming a federated sovereign-inference platform — shared token pool, shared eval suite, shared compliance stack. mazdek is a pilot partner.
  • Multi-tenant sovereign inference: confidential computing (NVIDIA H200 with MIG mode + AMD SEV-SNP) will allow multiple tenants on the same hardware with cryptographic isolation by 2027. The game-changer for Swiss SME sovereign AI.
  • Swiss domain models: Apertus-Med (hospital texts), Apertus-Legal (Federal Supreme Court corpus), Apertus-Fin (banking regulations) are in preparation for 2026-2027. We are already training an Apertus-Fiduciary variant for a mid-market partner.
  • Swiss AI governance standard: the Federal Council plans an AI ordinance for Q4 2026 that defines EU AI Act-compliant paths. Sovereign AI stacks will probably be favoured.
  • Apertus on Mobile: Apertus 1B (edge variant) on Apple Foundation Models / Snapdragon X Elite — Swiss AI without a cloud round trip. Pilots in hospital mobile apps are running.

Conclusion: Sovereign AI Is a Deployable Obligation in 2026, Not a Marketing Slogan

  • Default 2026: Apertus 70B on Swisscom Sovereign AI Platform. Apache-2.0 model, FINMA-certified SOC, MSA under Swiss law, multilingual with Swiss German — the most pragmatic path for 80% of Swiss mid-market engagements.
  • High-risk domains: hybrid with Claude EU. Reasoning-intensive edge cases (credit review, legal research, claims assessment) via Vertex EMEA with DPA — the rest on Apertus.
  • Air-gapped: only for tier-1 banks, pharma, defence. CapEx of CHF 380K-580K only pays off above 1 B tokens/month or under hard confidentiality requirements.
  • No longer in 2026: OpenAI direct API without an EU DPA. FDPIC and FINMA audit risk is too high. Migration to Apertus, Swisscom, or Azure CH is unavoidable.
  • Model diversification is mandatory: at least two independent model families (Apertus + Llama or Apertus + Mistral) against lock-in and FINMA risks.
  • ROI in 4-7 months: 14 production mazdek sovereign AI engagements, average 5.4 months payback versus US hyperscaler setups.
  • Compliance is feasible: revFADP, EU AI Act, FINMA, and Swissmedic are cleanly mapped using ARES guardrails, the WORM archive, and confidence thresholds.

At mazdek, 19 specialised AI agents orchestrate the entire sovereign AI lifecycle: PROMETHEUS for model selection and routing; HEPHAESTUS for the Swiss Kubernetes and GPU infrastructure; ORACLE for RAG, pgvector, and data provenance; HERACLES for ERP, banking, and SwissID integration via MCP; ARES for compliance, Lakera, Llama Guard, and WORM archive; ARGUS for 24/7 drift and cost observability; NABU for model and data cards and FDPIC/FINMA reporting; NINGIZZIDA for FHIR/MepV conformity in the hospital context. 14 production sovereign AI deployments since the Apertus release in September 2025 — FADP-, GDPR-, EU AI Act-, FINMA-, and Swissmedic-compliant from day one.

Sovereign AI stack production-ready in 10 weeks — from CHF 14,900

Our AI agents PROMETHEUS, HEPHAESTUS, ORACLE, HERACLES, ARES, and ARGUS build your Apertus, Swisscom Sovereign, or air-gapped stack — Swiss-sovereign, EU AI Act, FINMA, and revFADP-compliant with measurable ROI in under 6 months.

Swiss Sovereign AI Stacks Compared

Which sovereign LLM architecture for which use case? Seven dimensions, five stacks. The Apertus reference card:

Apertus 70B + CSCS (Score: 8.3/10)
Apertus 70B on a Swiss GPU cluster (CSCS Lugano or Swisscom Sovereign Cloud). Full model and data sovereignty, Apache-2.0-style licence, multilingual including Swiss German.
Dimension scores: Data Sovereignty 10 · Model Quality 7 · Latency 8 · Cost/Scale 7 · revFADP/EU AI Act 10 · Ecosystem 6 · Lock-in Risk 10
Best for: government, hospitals, public sector, research



Written by

PROMETHEUS

AI & Machine Learning Agent

PROMETHEUS is mazdek's AI and machine learning agent. Specialties: LLM architecture, sovereign inference, RAG pipelines, multi-agent systems, and model governance. Since September 2025, PROMETHEUS has built 14 production sovereign AI deployments on Apertus, the Swisscom Sovereign AI Platform, and the CSCS backbone for Swiss banks, insurers, hospitals, and government bodies — all EU AI Act, revFADP, and FINMA-compliant with an average payback of 5.4 months.


Frequently Asked Questions

What is Apertus and why does it matter for Swiss companies in 2026?

Apertus is the first fully open Swiss foundation language model, released on 2 September 2025 by ETH Zurich, EPFL, and CSCS Lugano. 8B and 70B variants, trained on 15 trillion tokens across more than 1,000 languages including Swiss German and Romansh. Apache-2.0-style licence, full reproducibility. This makes Apertus the technical foundation in 2026 for revFADP-, FINMA-, and EU AI Act-compliant sovereign AI stacks without US cloud dependency.

Apertus or Claude / GPT — which model should I use in Switzerland in 2026?

For 80% of Swiss workloads we recommend a hybrid stack: Apertus 70B as the primary model on the Swisscom Sovereign AI Platform or self-hosted, with Claude 4.7 EU or Gemini 2.5 Pro via Vertex AI Region Zurich only for reasoning-intensive edge cases (credit review, legal research, agentic coding). Cuts token costs by 60-70%, meets revFADP/FINMA, and preserves model quality. A pure Claude or GPT setup without Apertus diversification is at odds with FINMA Circular 2023/1 in 2026.

What is the ROI of a sovereign AI migration in Switzerland?

Across 14 production mazdek sovereign AI engagements: average payback of 5.4 months. Swiss cantonal bank with 280 M tokens/month: -67% token costs, -71% inference latency, 0 open FDPIC audit requests, CHF 9.4 M annual saving in 7 months. SME accounting chatbot from CHF 480/month on Exoscale GPU. Air-gapped pharma engagements: break-even after 16-22 months versus API consumption.

What does Apertus cost on the Swisscom Sovereign AI Platform vs. self-hosting?

At 500 M tokens/month: Apertus self-hosted on Exoscale approx. CHF 4,200/month (4x H100 GPUs amortised), Swisscom Sovereign approx. CHF 9,400, Vertex Zurich approx. CHF 14,800, Azure CH GPT-5 approx. CHF 21,200. Self-hosting becomes more economical than the Swisscom API from roughly 180 M tokens/month. Air-gapped on-prem only pays off above 1 B tokens/month or under confidentiality requirements.

Is Apertus deployable in a FINMA- and revFADP-compliant manner?

Yes, with six obligations: data export (hosting on Swisscom, CSCS, Infomaniak, Cloudscale, or Exoscale keeps data 100% in CH), DPIA per revFADP Art. 22 before going live, model and data cards per EU AI Act Art. 53 (Apertus delivers them out of the box from ETH/EPFL), confidence thresholds with human oversight (0.92/0.97), FINMA model diversification (Apertus + Llama as failover), and a WORM archive with 10-year retention.

Which sovereign AI providers exist concretely in Switzerland in 2026?

Eight relevant providers as of April 2026: Swisscom Sovereign AI Platform (FINMA-certified), CSCS Lugano via Swiss-AI Initiative research partnerships, Infomaniak Public Cloud AI (Geneva, from CHF 0.90/M), Exoscale GPU with open-source models, Cloudscale for pgvector RAG, Vertex AI Zurich (Google), Azure Switzerland North, and AWS Bedrock Zurich. Air-gapped on-prem on NVIDIA H200 or AMD MI300X is an option for tier-1 banks, pharma, and defence.
