QuantumAtlas ↔ RustFS integration
How the Go server (cmd/qatlasd) wires to RustFS (an S3-compatible object store) for paper assets. Covers env vars, the IAM policy spec, bucket layout, version lifecycle, the qatlasd storage prune operator command, and known RustFS-vs-MinIO quirks.
Application-level upload semantics (sha256 dedup, 409 conflict behaviour, the ?expected_sha256= guard) live in upload-api.md. The wider storage architecture (why we have separate Raw / Metadata / Graph layers) lives in storage-architecture.md.
Backend selection
internal/objstore exposes a single Store interface with two
implementations:
- LocalStore: directory under cfg.RawDir (XDG default ~/.local/share/qatlasd/raw/). Dev / first-boot / CI. No version concept, no presigned URLs.
- S3Store: RustFS / MinIO / Amazon S3, via minio-go/v7. Production.
Selection is all-or-nothing: setting any of the four
QATLAS_S3_* env vars without setting all four is a startup
error. With all four set, the server logs
raw store: S3 backend http://<rustfs-internal-host>:9000/qatlas-raw
on every boot. Without them it logs the corresponding LocalStore line instead.
The split is in cmd/qatlasd/main.go::initRawStore and the
all-or-nothing rule is enforced by
internal/config/config.go::(*Config).ValidateForServe (called from
cmd/qatlasd/main.go::validateServeCfgAfterFlags inside the
serve RunE). config.Load itself only emits a slog.Warn on a
half-set quartet so non-serve subcommands like qatlasd --help /
qatlasd pat list keep working on a partially-configured .env.
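The all-or-nothing rule can be sketched as below. This is a minimal illustration, not the real ValidateForServe: the helper name checkS3Quartet is hypothetical, and it assumes the four variables are the ones listed under "Required env vars" (the optional public endpoint is not part of the quartet).

```go
package main

import (
	"fmt"
	"strings"
)

// s3Quartet lists the four variables assumed to be required together.
var s3Quartet = []string{
	"QATLAS_S3_ENDPOINT",
	"QATLAS_S3_BUCKET",
	"QATLAS_S3_ACCESS_KEY_ID",
	"QATLAS_S3_SECRET_ACCESS_KEY",
}

// checkS3Quartet is a hypothetical helper. It returns (useS3, err):
// all four set selects S3, none set selects the local store, and anything
// in between is an error that names the missing variables.
func checkS3Quartet(getenv func(string) string) (bool, error) {
	var missing []string
	for _, name := range s3Quartet {
		if strings.TrimSpace(getenv(name)) == "" {
			missing = append(missing, name)
		}
	}
	switch len(missing) {
	case 0:
		return true, nil
	case len(s3Quartet):
		return false, nil
	default:
		return false, fmt.Errorf("partial S3 config, missing: %s", strings.Join(missing, ", "))
	}
}
```

Passing os.Getenv as the getenv argument checks the process environment. A serve path would treat the error as fatal, while a non-serve subcommand could downgrade it to a warning, matching the split described above.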
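Object storage need not be co-located with qatlasd: which backends are supported?
qatlasd speaks the AWS SigV4 protocol via minio-go/v7, so any S3-compatible backend works. Verified / strongly recommended:
| Backend | Deployment | Notes |
|---|---|---|
| RustFS | Self-hosted (recommended) | Default choice; works on NAS / VPS / k8s |
| MinIO | Self-hosted | RustFS's predecessor in spirit, similar behaviour; the CE / community edition is free |
| AWS S3 | Public cloud | The real S3; billed per request, use with care in dev |
| Cloudflare R2 | Public cloud | Zero egress fees; endpoint = https://<account>.r2.cloudflarestorage.com |
| Backblaze B2 (S3-compat) | Public cloud | Cheap; well suited to cold storage |
| Wasabi | Public cloud | Same as above |
| DigitalOcean Spaces | Public cloud | Same as above |
| Ceph RadosGW | Private cloud | Fully SigV4-compatible |
| SeaweedFS (S3 gateway) | Self-hosted | For cross-region erasure-coding setups |
Partially supported (needs path-style URLs plus a check that the SigV4 implementation is complete):
| Backend | Caveats |
|---|---|
| Alibaba Cloud OSS (S3-compatible mode) | endpoint is https://oss-<region>.aliyuncs.com; some bucket policy fields are unsupported |
| Tencent Cloud COS (S3-compatible mode) | Same as above, endpoint https://cos.<region>.myqcloud.com |
| Huawei Cloud OBS (S3-compatible) | Same as above |
Not supported (not the SigV4 protocol; would need a different SDK):
- Azure Blob Storage (native API)
- Google Cloud Storage (native API)
- WebDAV / NFS / SMB / FTP (not S3)
- Synology Drive, Nextcloud (not S3)
To use Azure / GCS you must put s3proxy in front to translate the protocol; community deployments exist, but QuantumAtlas has not officially verified this.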
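Separate deployment: object storage on another machine
The RustFS / S3 backend does not need to share a machine with qatlasd. Common topologies:
- NAS mode: a home NAS (Synology / QNAP / self-built) runs RustFS in Docker; qatlasd runs on a VPS / workstation; traffic goes through a reverse proxy on the home broadband line or an EasyTier mesh
- Public-cloud object storage: qatlasd is self-hosted and objects go to R2 / AWS S3 / Wasabi; this removes the ops burden of self-managed storage
- Cross-region multi-edge: qatlasd runs on several edges (VPS / different data centres) sharing one RustFS (active-active writes; the audit trail is backed by RustFS notify)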
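Hardware / capacity guidance (object-storage side)
The bottlenecks for object storage are disk capacity and network bandwidth; CPU / memory needs are low:
| Usage tier | Capacity estimate | Recommended hardware |
|---|---|---|
| Individual / lab | 10,000 papers ≈ 10 GB PDF + 50 GB images = 60 GB | Any NAS / SBC / cheap VPS |
| Team | 100,000 papers ≈ 600 GB | Mid-range NAS / mid-tier VPS |
| Multi-edge shared | Depends on the growth curve; starting with EC (erasure coding) is recommended | High-IO NAS / dedicated object-storage VPS |
2 CPU cores + 2 GB RAM is more than enough (for HDDs, see the existing RUSTFS_DRIVE_TIMEOUT_PROFILE=high_latency tuning).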
Required env vars
| Var | Example | Notes |
|---|---|---|
| QATLAS_S3_ENDPOINT | http://<rustfs-internal-host>:9000 | Must include scheme. Production prefers mesh-direct (avoids edge-Caddy self-loop). |
| QATLAS_S3_BUCKET | qatlas-raw | Must exist; bootstrap script creates it idempotently. |
| QATLAS_S3_ACCESS_KEY_ID | CNEDAZ2HQDU9TX8A2BUO | Service-account key (qatlasd IAM user). Never use root keys here. |
| QATLAS_S3_SECRET_ACCESS_KEY | … | Secret printed once by bootstrap; copy directly into .env (mode 600). |
| QATLAS_S3_PUBLIC_ENDPOINT (optional) | https://<rustfs-public-host> | Public entry point used for client presigned URLs; empty = single-endpoint mode (suitable for dev only). |
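Dual-endpoint mode
In production, server↔RustFS traffic runs over the mesh / internal network (saving a reverse-proxy hop and TLS termination), but any presigned URL the server signs for a client to fetch objects must be publicly reachable. A single endpoint cannot serve both purposes, so the qatlas server supports dual-endpoint mode:
| Purpose | Endpoint used |
|---|---|
| Server-internal Put/Get/Stat/List | QATLAS_S3_ENDPOINT (internal) |
| Presigned URLs (dev / debugging scenarios) | QATLAS_S3_PUBLIC_ENDPOINT (public) |
To enable it, set both in .env:
QATLAS_S3_ENDPOINT=http://<rustfs-internal-host>:9000 # mesh / internal network
QATLAS_S3_PUBLIC_ENDPOINT=https://<rustfs-public-host> # public (dedicated subdomain)
The public entry point must reverse-proxy to the internal RustFS port and preserve the Host header: SigV4 includes Host in the canonical request, so a proxy that rewrites Host makes RustFS return SignatureDoesNotMatch. For a minimal Caddy template, see Reverse proxy.
The boot log distinguishes the two modes:
raw store: S3 backend http://<rustfs-internal-host>:9000/qatlas-raw (presign via https://<rustfs-public-host>)
If the (presign via ...) part is missing, the server is in single-endpoint mode.
Each edge configures its own public endpoint; they are not shared:
- Edge A: https://<rustfs-public-host> (example: real Let's Encrypt certificate + dedicated subdomain)
- Edge B: https://<ip-or-host>:<port> (example: tls internal self-signed; clients must pass -k)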
IAM policy: qatlas-raw-rw
The qatlasd IAM user is bound to this policy (created by
scripts/rustfs_bootstrap.sh):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObjectVersion",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::qatlas-raw/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketVersions",
        "s3:GetBucketLocation",
        "s3:GetBucketVersioning",
        "s3:PutBucketVersioning"
      ],
      "Resource": "arn:aws:s3:::qatlas-raw"
    }
  ]
}
What each permission is for:
| Action | Why qatlas needs it |
|---|---|
| s3:GetObject / s3:PutObject | Routine PDF / markdown / JSON I/O via the upload handlers. |
| s3:DeleteObject | Soft-delete via the (currently unimplemented) DELETE /api/papers/* route + admin cleanup. |
| s3:GetObjectVersion | Reading a specific past version (for future rollback CLI; not yet exposed in HTTP). |
| s3:DeleteObjectVersion | Required by qatlasd storage prune --yes (versioned deletes are a separate AWS perm from s3:DeleteObject). |
| s3:ListBucket / s3:GetBucketLocation | minio-go probes the endpoint and walks prefixes (e.g. enumerate-needs-mineru). |
| s3:ListBucketVersions | Powers ObjectVersion-aware listing; backs qatlasd storage prune enumeration. |
| s3:GetBucketVersioning / s3:PutBucketVersioning | Lets qatlas self-manage versioning at boot (see "Versioning" below). |
Deliberately not granted (re-test before adding):
- s3:DeleteBucket, s3:PutBucketPolicy, s3:PutBucketAcl: bucket destruction / ACL change should stay root-only ops; qatlas has no use case.
- s3:GetLifecycleConfiguration, s3:PutLifecycleConfiguration: RustFS 1.0.0-beta.5 rejects these action names with invalid action. Re-test when bumping RustFS; until then qatlas doesn't use lifecycle rules anyway (see "Why no auto-expiration" below).
Bucket layout
Object keys are constructed by internal/paperassets.AssetKey as
<kind>/<arxiv-id-prefix>/<id>v<n>.<ext>
with <arxiv-id-prefix> being the first 4 chars of the arXiv ID, i.e. its YYMM segment
(e.g. 2501 → pdf/2501/2501.00010v1.pdf) so a flat list of papers
shards naturally into year-month folders, keeping individual prefix
listings manageable.
| Kind | Path | Content-Type |
|---|---|---|
| pdf | pdf/<prefix>/<id>v<n>.pdf | application/pdf |
| json | json/<prefix>/<id>v<n>.json | application/json |
| markdown | markdown/<prefix>/<id>v<n>.md | text/markdown; charset=utf-8 |
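The key scheme in the table above can be sketched as follows. assetKey is a hypothetical stand-in written for illustration; the real paperassets.AssetKey signature and validation are not shown in this document and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// assetExt maps each asset kind from the layout table to its file extension.
var assetExt = map[string]string{"pdf": "pdf", "json": "json", "markdown": "md"}

// assetKey is a hypothetical helper. versionedID is an arXiv ID with its
// version suffix, e.g. "2501.00010v1"; the YYMM part becomes the shard prefix.
func assetKey(kind, versionedID string) (string, error) {
	ext, ok := assetExt[kind]
	if !ok {
		return "", fmt.Errorf("unknown asset kind %q", kind)
	}
	yymm, _, found := strings.Cut(versionedID, ".")
	if !found || len(yymm) != 4 {
		return "", fmt.Errorf("expected a YYMM.NNNNN arXiv id, got %q", versionedID)
	}
	return fmt.Sprintf("%s/%s/%s.%s", kind, yymm, versionedID, ext), nil
}
```

For example, assetKey("pdf", "2501.00010v1") yields pdf/2501/2501.00010v1.pdf, which matches the example in the paragraph above.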
User metadata always includes x-amz-meta-sha256 (lowercase) with
the hex digest of the bytes — see upload-api.md.
This is the field qatlasd storage prune and the upload handler
both rely on for idempotency / dedup.
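A minimal sketch of how the digest metadata might be produced and compared, using only the standard library. The function names are hypothetical, and the map key omits the x-amz-meta- prefix on the assumption that the S3 client adds it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// sha256Meta returns the user-metadata entry for a payload: the lowercase
// hex SHA-256 digest of its bytes. The "sha256" key is assumed to be
// prefixed with x-amz-meta- by the S3 client on upload.
func sha256Meta(body []byte) map[string]string {
	sum := sha256.Sum256(body)
	return map[string]string{"sha256": hex.EncodeToString(sum[:])}
}

// sameContent reports whether an incoming payload matches the digest already
// stored on an object, which is the check that lets an identical re-upload
// be skipped instead of creating a new version.
func sameContent(storedDigest string, body []byte) bool {
	return storedDigest == sha256Meta(body)["sha256"]
}
```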
Versioning: qatlas self-manages
internal/objstore/s3.go::EnsureVersioning is called once at server
boot, right after initRawStore. Pattern:
GetBucketVersioning(bucket)
if Status == "Enabled" → log "already enabled", no-op
else → EnableVersioning(bucket), log "enabled (was: <prior>)"
This is idempotent and monotonic: qatlas only ever
transitions to Enabled, never to Suspended. Even if an operator
manually suspends versioning via mc, the next qatlas restart
re-enables it. Rationale: losing the ability to recover an
over-written PDF is a much bigger correctness hazard than the
(small) extra storage cost.
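The reconcile pattern above can be sketched against a small interface. versioningAPI and fakeBucket are hypothetical names introduced here so the example runs without a live bucket; the real EnsureVersioning talks to minio-go and its exact signature is not shown in this document.

```go
package main

import "fmt"

// versioningAPI is a hypothetical two-method view of the bucket calls the
// reconcile step needs.
type versioningAPI interface {
	Status(bucket string) (string, error)
	Enable(bucket string) error
}

// ensureVersioning returns the log message for the transition it performed.
// It only ever moves a bucket towards "Enabled", never away from it.
func ensureVersioning(api versioningAPI, bucket string) (string, error) {
	prior, err := api.Status(bucket)
	if err != nil {
		return "", fmt.Errorf("bucket versioning: reconcile failed: %w", err)
	}
	if prior == "Enabled" {
		return "bucket versioning: already enabled", nil
	}
	if err := api.Enable(bucket); err != nil {
		return "", fmt.Errorf("bucket versioning: reconcile failed: %w", err)
	}
	return fmt.Sprintf("bucket versioning: enabled (was: %q)", prior), nil
}

// fakeBucket is an in-memory implementation used to exercise the sketch.
type fakeBucket struct{ status string }

func (f *fakeBucket) Status(string) (string, error) { return f.status, nil }
func (f *fakeBucket) Enable(string) error           { f.status = "Enabled"; return nil }
```

Calling ensureVersioning twice on a fresh fakeBucket produces the "enabled" message once and "already enabled" afterwards. A caller following the warn-and-continue choice would log a returned error and keep serving.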
Boot log lines you should always see (in this order):
raw store: S3 backend http://<rustfs-internal-host>:9000/qatlas-raw
bucket versioning: enabled (was: "") ← first boot ever
bucket versioning: already enabled ← every subsequent boot
Server started at http://127.0.0.1:4200
Failure mode: if the IAM user lacks s3:Put/GetBucketVersioning,
EnsureVersioning logs WARN bucket versioning: reconcile failed; …
and the server continues to serve. Uploads still work; you only
lose overwrite-rollback safety until perms are fixed. This is a
deliberate warn-and-continue choice — bouncing the whole server
because of a non-critical config drift is worse than degrading.
Why no auto-expiration (lifecycle)
We deliberately do not install an S3 lifecycle rule to auto-expire noncurrent versions. The model is "Synology Snapshot / Time Machine": keep everything by default, prune on demand.
Reasoning:
- sha256 dedup already short-circuits identical re-uploads (no wasted version), so the noncurrent versions we accumulate are real content changes — worth holding onto for rollback.
- Auto-expiration windows are operationally fraught: pick 30d and you regret it the day someone needs to restore a 6-week-old draft; pick 365d and the cost picture matters again.
- The ops side has full visibility + control via qatlasd storage prune (see next section), so manual policy is just as good in our scale regime.
When (if ever) the bucket grows past a few hundred GB of noncurrent
versions, revisit. RustFS may by then support the standard
s3:*LifecycleConfiguration actions and we can add a rule.
qatlasd storage prune
The on-server CLI for manual cleanup. Lives in
cmd/qatlasd/storage_cmd.go; runs against whatever the server's
own env vars say (QATLAS_S3_* from the same .env qatlas reads at
boot).
qatlasd storage prune [--prefix P]
[--older-than DUR]
[--keep-last N]
[--yes]
[--json]
[--dry-run] # default true
Flags:
| Flag | Effect |
|---|---|
| --prefix | Scope to keys under this prefix (e.g. pdf/2511/2511.00010v1.pdf for a single paper, pdf/2511/ for a cohort). Default empty = whole bucket. |
| --older-than | Only versions older than this duration are eligible. Accepts Go duration syntax (24h, 720h) plus operator-friendly d / w / y units (30d, 4w, 1y). Default empty = no age cap. |
| --keep-last N | Per object key, keep the N most-recent noncurrent versions and only delete those beyond that count. Current version is ALWAYS kept regardless. Default 0 = no per-key cap. |
| --yes | Actually delete. Without it, the run is forced dry (regardless of --dry-run). |
| --json | Emit one JSON object per row on stdout (audit-log friendly). |
| --dry-run | Preview only. Defaults to true; --yes is the only way to actually delete. |
Hard safety invariants (enforced by planPruneCandidates + unit
tested in cmd/qatlasd/storage_cmd_test.go):
- Current (latest) versions are NEVER deleted. No flag combination can override this.
- Latest delete markers are NEVER deleted. Deleting one would resurrect the prior version, which is almost never what the operator wants.
- Filters compose. A version must satisfy BOTH --older-than and fall outside --keep-last to be pruned. So --older-than 90d --keep-last 5 means "keep the 5 most-recent noncurrent versions per key, and also keep anything younger than 90d even if it is beyond the keep-last cap".
Recipes
# Audit pass: list every noncurrent in the bucket (no deletes)
sudo -u <USER> $TARGET storage prune
# Single-paper cleanup: drop all noncurrent for one paper, keep current
sudo -u <USER> $TARGET storage prune \
  --prefix pdf/2501/2501.00010v1.pdf --yes
# Tightening retention: per paper, keep at most 5 noncurrent
sudo -u <USER> $TARGET storage prune --keep-last 5 --yes
# Age-based: drop anything noncurrent for > 1 year
sudo -u <USER> $TARGET storage prune --older-than 1y --yes
# Machine-readable for an audit log
sudo -u <USER> $TARGET storage prune --json | tee prune-$(date +%F).log
$TARGET = the qatlas binary (/home/<USER>/.local/bin/qatlasd
on the production deploy). Run as the <USER> service user (the systemd
unit's User=) so the env / file paths resolve identically to the
running server.
Output format
Plain dry-run / preview:
KEY VERSION_ID SIZE AGE ACTION
pdf/2511/2511.88888v1.pdf 5f14251f-8b00-4be4-a0d1-e5ff592a8f89 92826 20m7s DELETE_PLANNED
pdf/2511/2511.88888v1.pdf 69537cbf-2035-4aa2-8ec3-4fc8dca357a6 92812 20m15s DELETE_PLANNED
---
candidates: 2 versions, 0.18 MiB total
dry-run only — pass --yes to delete the listed versions
--yes adds per-row deletion lines:
pdf/2511/2511.88888v1.pdf @5f14251f-8b00-4be4-a0d1-e5ff592a8f89 DELETED
pdf/2511/2511.88888v1.pdf @69537cbf-2035-4aa2-8ec3-4fc8dca357a6 DELETED
---
deleted: 2, failed: 0, freed: 0.18 MiB
Bootstrap (initial RustFS setup)
scripts/rustfs_bootstrap.sh is idempotent and creates everything
the server expects: bucket qatlas-raw, IAM user qatlasd,
policy qatlas-raw-rw, and one fresh service-account key pair.
export RUSTFS_ENDPOINT=https://<rustfs-public-host> # public, root-creds path
export RUSTFS_ROOT_ACCESS_KEY=<root_ak>
export RUSTFS_ROOT_SECRET_KEY=<root_sk>
bash scripts/rustfs_bootstrap.sh
Last few lines of stdout print the new access key + secret. Copy
into the server's .env immediately — they are NEVER persisted
to disk by the script. Bootstrapping a second time creates an
additional service-account key (existing keys are not rotated /
deleted) — useful for key rotation, see the script's own comments.
Local variable naming: the script uses IAM_USER (not USER)
internally. $USER is auto-set in every interactive shell to the
login user, so ${USER:-qatlasd} would never fall through to
the default. Set IAM_USER=… in the environment if you want
to bootstrap a non-default IAM user.
Troubleshooting
"Access Denied" on upload but versioning works at boot
Probably the IAM user record got deleted (RustFS quirk) while the service-account key remained. Symptoms:
- mc admin user info qatlas qatlasd → "user does not exist"
- mc admin user svcacct ls qatlas qatlasd → still shows your key
- Server boots fine (versioning Get/Put succeed somehow)
- Upload returns 500 {"detail": "stat …: objstore: stat …: Access Denied."}
Recovery:
RAND_PW=$(openssl rand -base64 24)
mc admin user add qatlas qatlasd "$RAND_PW"
mc admin policy attach qatlas qatlas-raw-rw --user qatlasd
# verify
mc admin user info qatlas qatlasd # should now show PolicyName
Existing service-account keys re-associate with the recreated user record. You do NOT need to regenerate credentials or restart qatlas.
policy create succeeded but svcacct still 403
Cache. RustFS 1.0.0-beta.5 has a short policy-eval cache. Wait
~30s and retry. If still 403, double-check policy JSON via
mc admin policy info qatlas qatlas-raw-rw — sometimes mc reports
"created" but the JSON didn't apply (we hit this with s3:*Lifecycle*
action names, see "Deliberately not granted" above).
storage prune --yes fails with "Access Denied" on delete
The policy is missing s3:DeleteObjectVersion (versioned delete is
a different AWS perm from s3:DeleteObject). Update the policy via
mc + re-run prune. Bootstrap script already grants it correctly
since 2026-05-28.
Boot log says bucket versioning: reconcile failed
The IAM user lacks s3:GetBucketVersioning and/or s3:PutBucketVersioning.
Fix the policy (see the "IAM policy" section), then restart the server:
EnsureVersioning only runs at boot, so the server keeps running without
rollback safety until that restart.
s3:GetLifecycleConfiguration errors with "invalid action"
Known RustFS 1.0.0-beta.5 limitation. Don't grant lifecycle perms to the IAM user. We don't use lifecycle anyway (see "Why no auto-expiration"). Revisit when bumping RustFS.
Recovery walk-through: rolling back an overwritten PDF
# Find versions of the paper
mc ls --versions qatlas/qatlas-raw/pdf/2501/2501.00010v1.pdf
# Output:
# [2026-05-28 14:27:33 +08] 90KiB STANDARD <new-vid> v2 PUT 2501.00010v1.pdf
# [2026-05-28 14:27:14 +08] 689KiB STANDARD <old-vid> v1 PUT 2501.00010v1.pdf
# Restore v1 by copying it as the new current
mc cp --version-id <old-vid> \
qatlas/qatlas-raw/pdf/2501/2501.00010v1.pdf \
qatlas/qatlas-raw/pdf/2501/2501.00010v1.pdf
The server's next GET for that key serves the restored bytes. No
restart needed. The over-written v2 becomes noncurrent (but is
still recoverable until storage prune decides otherwise).
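Write-trail audit sink (T10)
Problem: once an S3 svcacct key leaks, its holder can bypass the qatlasd API and
write / delete objects in the bucket directly. We need to see such direct access in
the logs, tell it apart from legitimate server writes, and do so consistently
across edges (the two edges share one RustFS, so the audit trail must land in one place).
Approach: RustFS notify webhook (per-bucket subscription; PUT/DELETE events are pushed to the sink)
→ Fluent Bit sidecar inside the NAS Docker stack (HTTP input + S3 output) → batched writes to the
qatlas-s3-events bucket. Each event carries userIdentity.principalId (the SigV4 accessKey),
requestParameters.sourceIPAddress, userAgent, eventName (s3:ObjectCreated:* /
s3:ObjectRemoved:*), s3.bucket.name, s3.object.key, and so on. The sink is deliberately a
generic log forwarder with no backend-specific conventions (Fluent Bit, a CNCF Graduated
project) running as a sidecar, and it does not touch our binary: the dumb storage layer
should not be tied to conventions that are still evolving in the backend (event JSON
parsing, bucket layout, filter logic), and a cz bump on our side should not force the NAS
to swap its sink image. The Go server's only involvement is the UA tag set via
QATLAS_EDGE_NAME (see below; purely an auxiliary label).
Why not RustFS's native audit: on 1.0.0-beta.5 the audit subsystem has a gating bug in
has_any_audit_targets. The env vars place the target in the config view, but
start_audit_system reads the persisted store (empty by default), decides there is no
target, and skips activation, so the target stays at status=offline /
not_loaded_in_runtime forever. Editing the target from the console is then blocked by
the env-source lock (audit target '<x>' is managed by environment variables and cannot
be modified from the console). This is a dead loop with no clean way out on beta.5.
The notify webhook path (per-bucket subscription) proved reliable in testing, so we use it instead.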
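Forensic triage (primary key = accessKey, not UA)
- accessKey = root (the root alias chosen at deploy time, e.g. admin) → directly identifies root misuse (bound by SigV4, cannot be forged; a strong signal).
- accessKey ≠ any expected svcacct (neither an edge write key nor the sink's own) → someone is connecting directly with a different key.
- sourceIPAddress outside the expected network range → corroborating evidence.
- UA is only an auxiliary hint and never the primary key for a verdict: UA can be forged, so if verdicts relied on it, an attacker who spoofs the UA as qatlasd/* would be invisible. The UA tag set via QATLAS_EDGE_NAME (qatlasd/<ver>/<edge>) only makes legitimate writes readable at a glance in the event stream; it is not a security boundary. Note: the two edges share the same svcacct key, so accessKey alone cannot tell which edge wrote. That is the UA edge tag's only use (for it to work, set QATLAS_EDGE_NAME in each edge's .env).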
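The triage rule can be sketched as a small pure function. The verdict names, the classify function, and its inputs are hypothetical; they illustrate keying the decision on the access key alone and are not part of the qatlasd codebase.

```go
package main

// verdict is the outcome of triaging one notify event.
type verdict string

const (
	verdictExpected verdict = "expected"
	verdictRootUse  verdict = "root-misuse"
	verdictUnknown  verdict = "unknown-key"
)

// classify triages an event by its SigV4 access key only. rootKey and
// expected are deployment-specific inputs. The user agent is deliberately
// not a parameter because it can be forged.
func classify(accessKey, rootKey string, expected map[string]bool) verdict {
	switch {
	case accessKey == rootKey:
		return verdictRootUse
	case expected[accessKey]:
		return verdictExpected
	default:
		return verdictUnknown
	}
}
```

The expected set would hold the edge write key and the sink's own key. The source IP and UA can then be attached to the result as supporting context without influencing the verdict itself.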
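Self-loop trap (not subscribing at the source > Fluent Bit filter drop)
If the qatlas-s3-events bucket also gets a notify subscription, the sink's own
event-object writes trigger PUT events, the sink writes again, and the loop never
ends. The fix: subscribe only the 5 asset buckets
(qatlas-raw / qatlas-pdf / qatlas-md / qatlas-images /
qatlas-openalex) and leave qatlas-s3-events unsubscribed. No events are produced at
the source, which is cleaner than dropping them with a Fluent Bit grep filter (a filter
drop still costs one RustFS→sink HTTP delivery; not subscribing costs nothing and
removes one more piece of logic that could be misconfigured).
⚠️ The sink still uses its own svcacct (qatlas-s3-events-writer) and does not reuse the edge's QATLAS_S3_ACCESS_KEY_ID. The reasons are least privilege and audit immutability: the sink gets only Get/Put/List on the qatlas-s3-events bucket, with no Delete (audit records cannot be removed once written), and it cannot reach the 5 asset buckets. Reusing the edge key would both over-grant and muddy the analysis ("was this written by the sink or by someone connecting directly?" could no longer be answered).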
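Two configuration pitfalls to avoid (root causes found in testing)
- QUEUE_DIR must be writable: the RustFS notify webhook has its own on-disk queue (it buffers and replays on delivery failure). The default /opt/rustfs/events is not writable inside the container, so target creation fails outright with status not_loaded_in_runtime. Changing it to /data/.notify-events (inside the rustfs_data named volume) fixes this.
- The ARN must be lowercase: the notify webhook env suffix _QATLAS is lowercased inside RustFS to account_id="qatlas", so the ARN passed to mc event add must be arn:rustfs:sqs::qatlas:webhook. An uppercase ARN (e.g. ::QATLAS:webhook) silently drops every event (bucket→target resolution fails and the event is discarded with no error and no log).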
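A failed probe ≠ giving up for good
At startup the RustFS notify webhook probes the sink endpoint once. A failed probe
does not disable the target permanently; events are spooled to
<QUEUE_DIR>/*.event.snappy instead. Once the sink is up, the queue is replayed
automatically, so docker compose does not need depends_on.condition:
service_healthy start ordering. The RustFS container can start first and Fluent Bit
can come up a few seconds later without losing events; a plain depends_on: is enough.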
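Provisioning (run once by the user, who holds root)
# The RustFS root key lives in the NAS compose env; the agent does not hold it. The user runs:
export RUSTFS_ENDPOINT=http://<rustfs-internal-host>:9000
read -rs RUSTFS_ROOT_ACCESS_KEY; export RUSTFS_ROOT_ACCESS_KEY # = compose RUSTFS_ACCESS_KEY
read -rs RUSTFS_ROOT_SECRET_KEY; export RUSTFS_ROOT_SECRET_KEY # = compose RUSTFS_SECRET_KEY
bash scripts/rustfs_notify_bootstrap.sh
The script is idempotent. It:
- creates the qatlas-s3-events bucket (no versioning; event objects are write-once);
- creates the qatlas-s3-events-rw policy (Get/Put/ListBucket, deliberately no Delete = audit immutability) plus the qatlas-s3-events-writer user/svcacct;
- creates the read-only qatlas-s3-events-ro policy and attaches it to the existing qatlasd parent user (edge svcaccts inherit read access automatically; reserved for future Go-side reconciliation / scanning);
- binds the lowercase ARN arn:rustfs:sqs::qatlas:webhook to each of the 5 asset buckets (qatlas-raw, qatlas-pdf, qatlas-md, qatlas-images, qatlas-openalex) and leaves qatlas-s3-events unbound (this breaks the self-loop). Bindings persist in the RustFS data volume and survive restarts / recreates; only wiping rustfs_data requires re-running the script's [6/7] step.
Only the sink's access / secret keys are printed; root is never written to disk. This is the
same pattern rustfs_bootstrap.sh uses to provision the edge svcacct (the agent only ever sees scoped keys, never root).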
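NAS compose (RustFS notify + Fluent Bit sidecar)
The full compose template is in
deploy/nas-rustfs-compose.example.yaml
(with placeholders and comments on the pitfalls). Structure:
- rustfs: PUT/DELETE events from the 5 asset buckets are pushed to the sidecar via the notify webhook. The target is named QATLAS (env suffix; RustFS lowercases it internally to account_id=qatlas, ARN arn:rustfs:sqs::qatlas:webhook), endpoint http://fluent-bit:9880/ (resolved by service name inside the Docker network, with no public exposure), queue dir /data/.notify-events, queue limit 100k (≈ 8 months of daily upload volume).
- fluent-bit: the HTTP input on :9880 receives events → the S3 output writes them in batches to qatlas-s3-events (endpoint http://rustfs:9000, use_put_object=On, s3_key_format=/%Y/%m/%d/%H-%M-%S-$UUID.json). Sink credentials come from the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars (the S3 SDK standard; values = the qatlas-s3-events-writer svcacct printed by bootstrap). No grep filter is needed to break the self-loop, because qatlas-s3-events is not in the subscription list in the first place.
- bucket→target binding: the notify env only declares the target; bucket→target subscriptions must be created with mc event add or S3 PutBucketNotificationConfiguration. See the [6/7] step of the bootstrap script above: each of the 5 asset buckets is bound once (lowercase ARN). Bindings persist in the RustFS data volume and survive restarts / recreates; only wiping rustfs_data requires a re-run.
Durability has two layers of fallback: the RustFS notify webhook's QUEUE_DIR on-disk queue
(buffers and replays while the sink is down, limit 100k) and the Fluent Bit S3 output's
filesystem buffer (store_dir, buffers while writes to RustFS fail).
⚠️ The NAS runs Synology DSM, so compose edits and container down/up can only be done in the DSM GUI (the ssh user is not in the docker group and sudo needs an interactive password). The agent writes the compose fragment and hands it to the user to paste into DSM and run down/up.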
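Object layout
The Fluent Bit S3 output batches multiple events into time-partitioned JSON objects
(/%Y/%m/%d/%H-%M-%S-<uuid>.json). Every upload is a brand-new immutable object.
S3 has no append, but none is needed here (there is no read-modify-write on a shared
file, so no lines are lost to concurrency; Fluent Bit's disk buffer handles batching
and crash replay). To read:
mc cat qatlas-s3-events/<YYYY>/<MM>/<DD>/*.json | jq.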
Related docs
- upload-api.md — request/response shape, sha256 semantics, in-transit guard from the client's perspective.
- storage-architecture.md — wider architecture (why Raw / Metadata / Graph are separate layers).
- deployment.md — systemd unit, .env layout.