Private preview: fremforge is available to invited customers only. Content is still subject to change.
Sigstore PKI operator runbook

Operator-side reference for the fremforge keyless commit signing infrastructure (P3-SIGSTORE-COMMIT-SIGNING). Customer-facing setup is at /get-started/keyless-commit-signing/.

Architecture summary

  • Fulcio (CA) — issues 10-minute X.509 certs from OIDC tokens. Multi-issuer config; per-tenant policy enforced downstream by the pre-receive hook (not by Fulcio). Helm: sigstore/fulcio v2.9.0 (app v1.8.5). Root key in DEW KMS.
  • TSA (RFC 3161 timestamp authority) — provides the trusted-time anchor that makes 10-min certs verifiable post-expiry. Plain Deployment (no upstream Helm chart). Image sigstore/timestamp-server:v2.0.6. Root key in DEW KMS.
  • Pre-receive hook — Go binary sigstore-verify-hook mounted into the Forgejo pod via the forgejo-custom-templates ConfigMap. Verifies cert chain + TSA + SAN + per-tenant allowed-issuers. Sets Forgejo commit-status via fremforge api (POST /jobs/sigstore-commit-status).
  • Audit trail — each cert issuance triggers a KMS sign operation, which the T Cloud Cloud Trace Service (CTS) logs automatically. The fremforge app-level audit (audit_events) records the verification result per commit.
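
To illustrate why the TSA timestamp lets a 10-minute cert be verified after it has expired, here is a minimal Go sketch. It assumes the verifier compares the TSA-attested signing time against the cert's validity window instead of the wall clock. The certWindow type and the function names are hypothetical and are not taken from the hook source.

```go
package main

import (
	"fmt"
	"time"
)

// certWindow is a hypothetical stand-in for the NotBefore/NotAfter fields
// of a short-lived Fulcio leaf certificate.
type certWindow struct {
	NotBefore time.Time
	NotAfter  time.Time
}

// signedWhileValid reports whether the TSA-attested signing time falls inside
// the certificate's validity window (inclusive). The wall-clock time at
// verification is deliberately not consulted.
func signedWhileValid(c certWindow, tsaTime time.Time) bool {
	return !tsaTime.Before(c.NotBefore) && !tsaTime.After(c.NotAfter)
}

// exampleWindowChecks evaluates one timestamp inside the 10-minute window
// and one after the cert has expired.
func exampleWindowChecks() (inside, afterExpiry bool) {
	issued := time.Date(2026, 5, 23, 12, 0, 0, 0, time.UTC)
	c := certWindow{NotBefore: issued, NotAfter: issued.Add(10 * time.Minute)}
	return signedWhileValid(c, issued.Add(3*time.Minute)),
		signedWhileValid(c, issued.Add(11*time.Minute))
}

func main() {
	inside, afterExpiry := exampleWindowChecks()
	fmt.Println(inside, afterExpiry) // prints: true false
}
```

A real verifier would also validate the TSA token's own signature against the TSA root before trusting the timestamp. That step is omitted here.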

Source-of-truth files (operator workspace — paths relative to the GitRoot, not the published docs):

  • Plan: fremforge/plan/next-up.md §P3-SIGSTORE-COMMIT-SIGNING
  • Spike: fremforge/plan/sigstore-spike-2026-05-23.md
  • Infrastructure: fremforge/forgejo/.infrastructure/sigstore/
  • Hook source: fremforge/forgejo/.infrastructure/sigstore/verify-hook/main.go

First apply (operator-only, requires bootstrap creds)

Pre-reqs:

  • fremverk/cloudplatform/.env.creds.local populated (root T Cloud admin AK/SK)
  • SWR mirrors exist:
    • swr.eu-de.otc.t-systems.com/fremforge-prd/cache-fulcio-server:v1.8.5
    • swr.eu-de.otc.t-systems.com/fremforge-prd/cache-timestamp-server:v2.0.6

Steps:

cd fremforge/forgejo/.infrastructure

# 1. Create the KMS root keys (needs ROOT admin creds).
set -a; source ../../../fremverk/cloudplatform/.env.creds.local; set +a
export TF_VAR_tcloud_ak="$TCLOUD_AK" TF_VAR_tcloud_sk="$TCLOUD_SK"
tofu apply \
  -target=opentelekomcloud_kms_key_v1.fulcio_root \
  -target=opentelekomcloud_kms_key_v1.tsa_root \
  -target=opentelekomcloud_obs_bucket.sigstore_bootstrap \
  -var-file=envs/prd.tfvars

# 2. Mint the Fulcio CA cert (offline, one-time) — uses the KMS key
#    via the OTC CLI to sign a self-signed root cert. The bootstrap
#    procedure is an operator-side step today (not yet a single script):
#    follow the smallstep / step-ca offline-root recipe with the KMS
#    key serving as the signer. Detailed procedure in the operator
#    console at /_app/_admin/runbooks → sigstore-bootstrap.
bash sigstore/bootstrap-fulcio-root.sh   # operator-supplied wrapper

# 3. Apply the remaining Tofu (no root creds needed — project deployer).
set -a; source .env.deploy.local; set +a
tofu apply -var-file=envs/prd.tfvars

# 4. Helm install Fulcio + TSA.
helm repo add sigstore https://sigstore.github.io/helm-charts
helm repo update sigstore
helm upgrade --install fulcio sigstore/fulcio \
  --version 2.9.0 \
  -f sigstore/helm/fulcio-values.yaml \
  -n sigstore-system --create-namespace
kubectl apply -f sigstore/helm/tsa-deployment.yaml

# 5. Extract roots and mount them into Forgejo's hook ConfigMap.
kubectl -n sigstore-system get secret fulcio-server -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/fulcio-root.pem
kubectl -n sigstore-system get secret tsa-server-secret -o jsonpath='{.data.chain\.pem}' | base64 -d > /tmp/tsa-root.pem
kubectl -n fremforge-prd create configmap sigstore-trust-roots \
  --from-file=fulcio-root.pem=/tmp/fulcio-root.pem \
  --from-file=tsa-root.pem=/tmp/tsa-root.pem \
  -o yaml --dry-run=client | kubectl apply -f -

# 6. Build + push the verify-hook binary to SWR.
cd sigstore/verify-hook
docker build -t swr.eu-de.otc.t-systems.com/fremforge-prd/sigstore-verify-hook:r1 .
docker push   swr.eu-de.otc.t-systems.com/fremforge-prd/sigstore-verify-hook:r1
cd ../..

# 7. Sync custom templates (mounts the hook script + binary into Forgejo).
bash scripts/sync-custom.sh

# 8. Smoke test: sign a test commit against a smoke tenant.
git config gitsign.fulcio https://sign.frem.sh
git config gitsign.timestampServerURL https://tsa.frem.sh
GIT_TRACE=1 git commit -S --allow-empty -m "sigstore smoke test"
git push  # → pre-receive hook fires; commit-status appears

Per-tenant onboarding

Fully self-service since r109. The tenant admin enters the issuer URL in their own admin UI; fremforge reconciles Fulcio’s config inline and restarts the Fulcio deployment. No operator action required.

Customer flow

  1. Tenant admin opens /<slug>/_admin/auth-policy → Allowed OIDC issuer URLs field.
  2. Pastes the OIDC issuer URL (e.g. https://login.microsoftonline.com/<entra-tenant-id>/v2.0).
  3. The api’s reconcileSigstoreFulcioConfig runs inline on the POST: reads every active tenant’s allowlist from tenant_security_policies.allowed_oidc_issuers_json, builds the Fulcio multi-issuer config, diffs against the live fulcio-server-config ConfigMap in sigstore-system, and patches + triggers a rollout-restart if the issuer set actually changed. Convergence is ~30 seconds (rollout-restart + new pod ready).
  4. Customer’s first gitsign mint against the new issuer succeeds.
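
The build-and-diff part of step 3 can be sketched roughly as follows. This is an illustrative Go sketch, not the actual reconcileSigstoreFulcioConfig code: the function names and the in-memory map standing in for the tenant_security_policies rows are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// desiredIssuers flattens every active tenant's allowlist into one sorted,
// de-duplicated issuer set. The map argument is a hypothetical stand-in for
// the rows read from tenant_security_policies.allowed_oidc_issuers_json.
func desiredIssuers(perTenant map[string][]string) []string {
	seen := map[string]bool{}
	for _, issuers := range perTenant {
		for _, iss := range issuers {
			seen[iss] = true
		}
	}
	out := make([]string, 0, len(seen))
	for iss := range seen {
		out = append(out, iss)
	}
	sort.Strings(out)
	return out
}

// diffIssuers returns the issuers to add to and remove from the live config.
// A rollout-restart is only warranted when either slice is non-empty.
func diffIssuers(live, desired []string) (add, remove []string) {
	liveSet := map[string]bool{}
	for _, iss := range live {
		liveSet[iss] = true
	}
	desiredSet := map[string]bool{}
	for _, iss := range desired {
		desiredSet[iss] = true
		if !liveSet[iss] {
			add = append(add, iss)
		}
	}
	for _, iss := range live {
		if !desiredSet[iss] {
			remove = append(remove, iss)
		}
	}
	return add, remove
}

func main() {
	perTenant := map[string][]string{
		"acme-corp":  {"https://idp.example.com/a"},
		"globex-inc": {"https://idp.example.com/a", "https://idp.example.com/b"},
	}
	live := []string{"https://idp.example.com/a", "https://idp.example.com/old"}

	add, remove := diffIssuers(live, desiredIssuers(perTenant))
	fmt.Println("add:", add, "remove:", remove, "restart:", len(add)+len(remove) > 0)
}
```

Comparing sets rather than raw ConfigMap text means a reordering of issuers does not trigger a restart, which matches the "only if the issuer set actually changed" behaviour described in step 3.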

Safety net

A 5-min CronJob (sigstore-issuer-reconcile) hits the same reconciler endpoint, so any drift (a manual ConfigMap edit, a DB row added via direct SQL) is corrected within the next sweep. Every tenant-admin change also emits the audit event tenant.auth_policy.allowed_oidc_issuers.updated.

Manual override (incident recovery only)

If the reconciler is itself down (api outage) and a customer urgently needs Fulcio to accept a new issuer, fall back to the direct kubectl edit:

kubectl -n sigstore-system edit configmap fulcio-server-config
# Add under oidc-issuers: (YAML config; a JSON-format config uses OIDCIssuers with IssuerURL/ClientID/Type keys)
#   "https://login.microsoftonline.com/<id>/v2.0":
#     issuer-url: "https://login.microsoftonline.com/<id>/v2.0"
#     client-id: sigstore
#     type: email
kubectl -n sigstore-system rollout restart deployment/fulcio-server

The next reconciler run will reconcile back to whatever the DB says — so update tenant_security_policies.allowed_oidc_issuers_json for the affected tenant before walking away.

Two-layer model

Per-tenant signing policy is independent of Fulcio’s allowlist: Fulcio MUST trust the issuer for cert minting; the pre-receive verify-hook MUST allow the issuer for verification. The reconciler keeps both layers aligned by reading the same allowed_oidc_issuers_json column, so the only way the layers can diverge is an out-of-band edit, which the next reconciler sweep corrects.
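
Because Fulcio's trusted-issuer set is the union across tenants, the hook has to re-check the issuer against the pushing tenant's own allowlist. Here is a minimal Go sketch of that second layer. It assumes allowed_oidc_issuers_json holds a JSON array of issuer URL strings, which this runbook does not show, and issuerAllowed is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// issuerAllowed reports whether issuer appears in a tenant's allowlist.
// ASSUMPTION: allowed_oidc_issuers_json holds a JSON array of issuer URL
// strings; the real column layout is not shown in this runbook.
func issuerAllowed(allowedJSON, issuer string) (bool, error) {
	var allowed []string
	if err := json.Unmarshal([]byte(allowedJSON), &allowed); err != nil {
		return false, fmt.Errorf("parse allowlist: %w", err)
	}
	for _, a := range allowed {
		if a == issuer { // exact match only; no prefix or wildcard matching
			return true, nil
		}
	}
	return false, nil
}

func main() {
	col := `["https://login.microsoftonline.com/<entra-tenant-id>/v2.0"]`

	ok, err := issuerAllowed(col, "https://login.microsoftonline.com/<entra-tenant-id>/v2.0")
	fmt.Println(ok, err)

	// An issuer Fulcio may trust for another tenant is still rejected here.
	ok, err = issuerAllowed(col, "https://accounts.google.com")
	fmt.Println(ok, err)
}
```

A malformed column value surfaces as an error instead of being treated as an empty allowlist, so a bad row is visible instead of silently rejecting every push.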

Bypass procedure (Fulcio/TSA outage)

If Fulcio or TSA is down for longer than gitsign’s cert cache window (~10 min):

Canonical path — operator console UI

  1. Open /_app/_admin/sigstore-bypass (operator-authed, under Security — posture). The page lists every tenant where the signing gate is on OR bypass is currently active.
  2. For each affected tenant: type the incident ticket / SEV reference in the reason field and click Set bypass. Rows with bypass on float to the top with an orange tint so the active set is always visible.
  3. The pre-receive verify-hook starts skipping the chain check for that tenant on the next push. Audit row tenant.auth_policy.sigstore_bypass.set is emitted with the operator’s email + reason; LTS warn-level log line sigstore_bypass_set surfaces in default queries.
  4. After recovery: click Clear bypass on each row. Emits tenant.auth_policy.sigstore_bypass.cleared + warn-level sigstore_bypass_cleared.

The “Signing verification BYPASSED” warning banner is visible to tenant admins on /<slug>/_admin/auth-policy for the duration.

Fallback — direct SQL (only if the operator console itself is down)

If the admin console is unreachable (api outage, not just Fulcio), set via SQL:

UPDATE tenant_security_policies
SET signing_check_bypass = true,
    signing_check_bypass_reason = 'Fulcio outage <ticket-id>',
    signing_check_bypass_set_at = now(),
    signing_check_bypass_set_by = 'operator:<your-email>',
    updated_at = now(),
    updated_by = 'operator:<your-email>'
WHERE tenant_id IN (
  SELECT id FROM tenants WHERE org_slug IN ('acme-corp', 'globex-inc')
);

Reset post-recovery:

UPDATE tenant_security_policies
SET signing_check_bypass = false,
    signing_check_bypass_reason = NULL,
    signing_check_bypass_set_at = NULL,
    signing_check_bypass_set_by = NULL,
    updated_at = now(),
    updated_by = 'operator:<your-email>'
WHERE tenant_id IN (...);

The SQL fallback does NOT emit an audit row (the operator-console route is the audit-emit path). If you take this path, manually backfill an audit row or note the action in the incident postmortem.

KMS root key rotation

Yearly rotation, scheduled per the fremverk key-rotation calendar. Procedure:

  1. Run tofu apply with -replace=opentelekomcloud_kms_key_v1.fulcio_root (and tsa_root) to mint new key versions.
  2. Re-bootstrap the Fulcio cert with the new key version (First apply, step 2).
  3. Re-extract the roots into the sigstore-trust-roots ConfigMap (First apply, step 5), then restart the Forgejo pod so the hook picks up the new root.
  4. Existing certs issued under the old root remain valid until they expire (10-minute window) — no customer-facing impact.

CES alarms to set up

  • fremforge-prd-fulcio-issuance-error-rate — Fulcio 5xx rate > 1/min for 5 min
  • fremforge-prd-tsa-latency-p99 — TSA p99 > 2s for 5 min
  • fremforge-prd-fulcio-kms-quota — KMS API call rate > 80% of quota
  • fremforge-prd-signing-bypass-set — audit event tenant.auth_policy.sigstore_bypass.set fires

(These alarms aren’t in Tofu yet — Phase 5 follow-up; add when first regulated customer onboards.)
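
For the "threshold for 5 min" conditions above, the intended semantics can be sketched as below. This is an assumption about how the alarms would evaluate (every one of the last five 1-minute samples must exceed the threshold). It is not CES configuration, and sustainedAbove is a hypothetical helper.

```go
package main

import "fmt"

// sustainedAbove reports whether the last `window` samples all exceed
// threshold, i.e. "metric > threshold for N consecutive periods".
func sustainedAbove(samples []float64, threshold float64, window int) bool {
	if window <= 0 || len(samples) < window {
		return false
	}
	for _, v := range samples[len(samples)-window:] {
		if v <= threshold {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical Fulcio 5xx counts per minute, oldest first.
	errorsPerMin := []float64{0, 2, 3, 2, 4, 2}
	fmt.Println(sustainedAbove(errorsPerMin, 1, 5))

	// One quiet minute inside the window breaks the streak.
	fmt.Println(sustainedAbove([]float64{2, 3, 0, 4, 2}, 1, 5))
}
```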

Verification path resilience

The pre-receive hook needs ONLY the Fulcio root cert + TSA root cert (both mounted via the sigstore-trust-roots ConfigMap). It does not call Fulcio or TSA at verification time. Result:

  • Fulcio outage → existing signed commits still verify; new signing attempts fail
  • TSA outage → existing signed commits still verify (their TSA tokens are baked in); new signing fails
  • Both outages → no push blocking unless require_signed_commits is set, in which case use the bypass procedure above
  • Pre-receive hook crash → push lands (Forgejo fails open by default), but no commit-status is set. Re-trigger manually by amending and re-pushing after recovery.
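
The outcomes above, combined with the bypass flag, can be summarised as a small decision function. This Go sketch is an interpretation of the bullets and not the hook's actual control flow: the pushCheck type, the status strings, and the choice to accept unverified commits when require_signed_commits is off are all assumptions.

```go
package main

import "fmt"

// pushCheck is a hypothetical summary of what the hook knows about one commit.
type pushCheck struct {
	RequireSigned bool // tenant has require_signed_commits set
	Bypass        bool // signing_check_bypass is true for the tenant
	Verified      bool // offline chain + TSA + SAN + issuer checks passed
}

// gate returns whether the push is accepted and an illustrative status label.
func gate(c pushCheck) (accept bool, status string) {
	switch {
	case c.Bypass:
		return true, "bypassed" // chain check skipped during an outage
	case c.Verified:
		return true, "verified"
	case c.RequireSigned:
		return false, "rejected" // unverified commit on a gated tenant
	default:
		return true, "unverified" // ASSUMPTION: ungated tenants are not blocked
	}
}

func main() {
	cases := []pushCheck{
		{RequireSigned: true, Verified: true},
		{RequireSigned: true, Verified: false},
		{RequireSigned: true, Bypass: true},
		{RequireSigned: false, Verified: false},
	}
	for _, c := range cases {
		accept, status := gate(c)
		fmt.Printf("%+v -> accept=%v status=%s\n", c, accept, status)
	}
}
```

Checking the bypass flag first mirrors the bypass procedure, where the hook skips the chain check for a tenant while the flag is set.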