OpenSearch Serverless 磁盘优化向量实战：低成本向量搜索方案¶

Lab 信息

难度: ⭐⭐ 中级
预估时间: 45 分钟
预估费用: $2-5（含清理）
Region: us-east-1
最后验证: 2026-03-27

背景¶

向量搜索是 RAG（检索增强生成）、语义搜索、推荐系统等 AI 应用的核心基础设施。然而，大规模向量数据的内存消耗一直是成本痛点 —— 100M 条 768 维向量仅存储就需要约 286 GB RAM。

2025 年 9 月，AWS 宣布 Amazon OpenSearch Serverless 支持 Disk-Optimized Vectors（磁盘优化向量）。这一功能通过二进制量化（Binary Quantization）将向量压缩存储在磁盘，以略高的搜索延迟换取显著的内存节省和成本降低。

本文将动手验证：在 OpenSearch Serverless 中创建磁盘优化向量索引，与标准内存优化索引进行对比测试，测量延迟、Recall 和参数调优效果。

前置条件¶

AWS 账号（需要 aoss:* 相关权限）
AWS CLI v2 已配置
Python 3 + opensearch-py 库（用于 API 调用和测试）
awscurl 工具（可选，用于简单 API 调试）

核心概念¶

Memory-Optimized vs Disk-Optimized¶

特性	Memory-Optimized（默认）	Disk-Optimized（新）
索引引擎	faiss / nmslib	faiss
向量压缩	无（1x）	默认 32x 二进制量化
搜索方式	单阶段 HNSW	两阶段：压缩搜索 + 全精度 Rescore
内存占用	4 bytes/维度	~0.125 bytes/维度（32x）
延迟	最低	略高（Rescore 开销）
适用场景	实时搜索、低延迟要求	RAG、语义搜索、推荐系统

工作原理¶

Disk-Optimized Vectors 使用两阶段搜索策略：

Phase 1（粗筛）：在 32x 压缩后的 HNSW 图上执行近似最近邻搜索，快速获取候选集
Phase 2（精排）：从磁盘加载候选向量的全精度版本，重新计算距离并排序

关键参数 oversample_factor（默认 3.0）控制 Phase 1 检索的候选数量。例如查询 top-10，Phase 1 会检索 10 × 3 = 30 个候选，Phase 2 从中选出最终 top-10。

Compression Level 选项¶

级别	引擎	内存节省	适用场景
32x（默认）	faiss	~97%	最大成本优化
16x	faiss	~94%	平衡成本与精度
8x	faiss	~87%	更高精度要求
4x	lucene	~75%	最高精度

动手实践¶

Step 1: 创建 Serverless 集合¶

首先创建加密策略、网络策略和集合：

# 创建加密策略
cat > /tmp/enc-policy.json << 'EOF'
{
  "Rules": [{"ResourceType": "collection", "Resource": ["collection/disk-vector-test"]}],
  "AWSOwnedKey": true
}
EOF

aws opensearchserverless create-security-policy \
  --name disk-vector-test-enc \
  --type encryption \
  --policy file:///tmp/enc-policy.json \
  --region us-east-1

# 创建网络策略（允许公网访问，生产环境建议限制 VPC）
cat > /tmp/net-policy.json << 'EOF'
[{
  "Rules": [
    {"ResourceType": "collection", "Resource": ["collection/disk-vector-test"]},
    {"ResourceType": "dashboard", "Resource": ["collection/disk-vector-test"]}
  ],
  "AllowFromPublic": true
}]
EOF

aws opensearchserverless create-security-policy \
  --name disk-vector-test-net \
  --type network \
  --policy file:///tmp/net-policy.json \
  --region us-east-1

# 创建集合（关闭冗余用于测试）
aws opensearchserverless create-collection \
  --name disk-vector-test \
  --type VECTORSEARCH \
  --standby-replicas DISABLED \
  --region us-east-1

等待集合变为 ACTIVE 状态（约 3-5 分钟）：

aws opensearchserverless batch-get-collection \
  --ids <collection-id> \
  --region us-east-1

创建数据访问策略：

cat > /tmp/access-policy.json << 'EOF'
[{
  "Rules": [
    {
      "ResourceType": "index",
      "Resource": ["index/disk-vector-test/*"],
      "Permission": ["aoss:CreateIndex", "aoss:DeleteIndex", "aoss:UpdateIndex",
                      "aoss:DescribeIndex", "aoss:ReadDocument", "aoss:WriteDocument"]
    },
    {
      "ResourceType": "collection",
      "Resource": ["collection/disk-vector-test"],
      "Permission": ["aoss:CreateCollectionItems", "aoss:DescribeCollectionItems",
                      "aoss:UpdateCollectionItems"]
    }
  ],
  "Principal": ["arn:aws:iam::<ACCOUNT_ID>:user/<USERNAME>"]
}]
EOF

aws opensearchserverless create-access-policy \
  --name disk-vector-test-access \
  --type data \
  --policy file:///tmp/access-policy.json \
  --region us-east-1

Step 2: 创建对比索引¶

使用 awscurl 或 opensearch-py 创建三个索引：

索引 1：内存优化（基准）

PUT mem-index
{
  "settings": {"index": {"knn": true}},
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "space_type": "cosinesimil",
        "data_type": "float"
      },
      "title": {"type": "text"},
      "category": {"type": "keyword"}
    }
  }
}

索引 2：磁盘优化 32x（默认压缩）

关键差异：添加 "mode": "on_disk"

PUT disk-32x-index
{
  "settings": {"index": {"knn": true}},
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "space_type": "cosinesimil",
        "data_type": "float",
        "mode": "on_disk"
      },
      "title": {"type": "text"},
      "category": {"type": "keyword"}
    }
  }
}

索引 3：磁盘优化 16x（自定义压缩级别）

PUT disk-16x-index
{
  "settings": {"index": {"knn": true}},
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "space_type": "cosinesimil",
        "data_type": "float",
        "mode": "on_disk",
        "compression_level": "16x"
      },
      "title": {"type": "text"},
      "category": {"type": "keyword"}
    }
  }
}

Step 3: 写入测试数据¶

使用 Python 生成 1000 条 768 维标准化随机向量并批量写入：

import json, random, math, boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# 连接设置
session = boto3.Session(profile_name="your-profile")
creds = session.get_credentials().get_frozen_credentials()
auth = AWSV4SignerAuth(creds, "us-east-1", "aoss")

client = OpenSearch(
    hosts=[{"host": "<endpoint>", "port": 443}],
    http_auth=auth, use_ssl=True, verify_certs=True,
    connection_class=RequestsHttpConnection, timeout=120
)

# 生成测试数据
random.seed(42)
categories = ["technology", "science", "business", "health", "entertainment"]

for index_name in ["mem-index", "disk-32x-index", "disk-16x-index"]:
    bulk_body = []
    for i in range(1000):
        vec = [random.gauss(0, 1) for _ in range(768)]
        norm = math.sqrt(sum(x*x for x in vec))
        vec = [x/norm for x in vec]

        bulk_body.append({"index": {"_index": index_name}})
        bulk_body.append({
            "embedding": vec,
            "title": f"Document {i} about {categories[i % 5]}",
            "category": categories[i % 5],
            "price": round(random.uniform(1, 100), 2)
        })

    body_str = "\n".join(json.dumps(item) for item in bulk_body) + "\n"
    resp = client.bulk(body=body_str)
    print(f"{index_name}: errors={resp.get('errors')}")

Step 4: 执行搜索对比¶

K = 10
query_vector = [random.gauss(0, 1) for _ in range(768)]  # 随机查询向量

for index_name in ["mem-index", "disk-32x-index", "disk-16x-index"]:
    latencies = []
    for _ in range(10):
        resp = client.search(index=index_name, body={
            "size": K,
            "query": {"knn": {"embedding": {"vector": query_vector, "k": K}}}
        })
        latencies.append(resp["took"])

    avg = sum(latencies) / len(latencies)
    latencies.sort()
    print(f"{index_name}: avg={avg:.1f}ms p50={latencies[5]}ms p90={latencies[9]}ms")

Step 5: 调优 oversample_factor¶

for osf in [1.0, 3.0, 10.0, 20.0]:
    resp = client.search(index="disk-32x-index", body={
        "size": K,
        "query": {"knn": {"embedding": {
            "vector": query_vector, "k": K,
            "rescore": {"oversample_factor": osf}
        }}}
    })
    print(f"osf={osf}: took={resp['took']}ms, hits={resp['hits']['total']['value']}")

测试结果¶

搜索延迟对比¶

使用 1000 条 768 维向量，k=10，每个查询执行 10 次取统计值：

索引类型	平均延迟	P50	P90	P99
Memory-Optimized	29.9ms	25ms	36ms	151ms
Disk-32x	24.9ms	21ms	32ms	153ms
Disk-16x	19.9ms	18ms	25ms	38ms

小规模数据延迟说明

在 1000 条文档的小规模测试中，磁盘优化索引延迟反而低于内存优化索引。这是因为压缩后的索引体积更小，更容易被缓存命中。在大规模生产数据集（百万级+）中，磁盘优化索引的延迟预期会高于内存优化索引，因为 Phase 2 的磁盘 I/O 开销会更明显。

Recall@10 对比¶

以 Memory-Optimized 索引作为 ground truth：

索引类型	Q0	Q1	Q2	Q3	Q4	平均
Disk-32x	90%	70%	80%	80%	90%	82%
Disk-16x	40%	100%	90%	90%	100%	84%

Top-5 结果高度一致，差异主要出现在排名靠后的候选中。

oversample_factor 调优效果¶

oversample_factor	平均延迟	P50	P90	Recall@10
1.0	19.0ms	18ms	23ms	82%
3.0（默认）	17.3ms	16ms	23ms	82%
5.0	16.4ms	16ms	21ms	82%
10.0	16.8ms	16ms	23ms	82%
20.0	14.7ms	14ms	18ms	96%

关键发现：oversample_factor=20.0 将 Recall 从 82% 提升到 96%，同时延迟并未显著增加。在生产环境中，建议根据 Recall 要求适当提高此参数。

Score 对比（第一组查询 Top-5）¶

排名	Memory-Optimized	Disk-32x	Disk-16x
#1	1.000000	1.000000	1.000000
#2	0.557751	0.557751	0.545876
#3	0.545943	0.545943	0.542794
#4	0.545876	0.545876	0.540795
#5	0.545753	0.545753	0.538810

32x 压缩的 Top-5 结果与内存优化完全一致，16x 压缩在 Top-2 之后出现差异。

踩坑记录¶

Radial Search 不可用

在启用量化（quantization）的索引上执行 Radial Search 会返回错误：

unsupported_operation_exception: Radial search is not supported for
indices which have quantization enabled

已查文档确认：这是 OpenSearch 的已知限制，不仅限于 disk-optimized 模式。

仅支持 float 数据类型

on_disk 模式仅支持 float 类型向量。如果您使用 byte 或 binary 向量，需要继续使用内存优化模式。已查文档确认。

4x 压缩使用 lucene 引擎

compression_level: "4x" 会强制使用 lucene 引擎（而非默认的 faiss）。如果您的工作流依赖 faiss 特有功能，请选择 8x 或更高压缩级别。已查文档确认。

实测发现

小规模数据集（1000 条）下磁盘优化索引延迟低于内存优化索引，这是因为压缩后的索引更易被 OCU 本地缓存覆盖。实测发现，官方未记录此行为。大规模数据下预期行为相反。

费用明细¶

资源	单价	用量	费用
AOSS OCU（无冗余）	~$0.24/OCU-hr	2 OCU × ~1h	~$0.48
S3 存储	$0.024/GB-mo	<1 GB	<$0.01
合计			~$0.50

最低消费

OpenSearch Serverless 首个集合最低消耗 2 OCU（关闭冗余时 1 OCU = 0.5 indexing + 0.5 search），即使空闲也会计费。创建后应尽快测试并清理。

清理资源¶

# 1. 删除索引
awscurl --service aoss --region us-east-1 \
  -X DELETE "https://<endpoint>/disk-32x-index"
awscurl --service aoss --region us-east-1 \
  -X DELETE "https://<endpoint>/mem-index"
awscurl --service aoss --region us-east-1 \
  -X DELETE "https://<endpoint>/disk-16x-index"

# 2. 删除集合
aws opensearchserverless delete-collection \
  --id <collection-id> --region us-east-1

# 3. 删除策略
aws opensearchserverless delete-access-policy \
  --name disk-vector-test-access --type data --region us-east-1
aws opensearchserverless delete-security-policy \
  --name disk-vector-test-net --type network --region us-east-1
aws opensearchserverless delete-security-policy \
  --name disk-vector-test-enc --type encryption --region us-east-1

务必清理

OpenSearch Serverless 即使空闲也会产生 OCU 费用（~$0.24/OCU-hr）。Lab 完成后请立即执行清理步骤。

结论与建议¶

适用场景¶

✅ RAG 应用：Recall 要求 90%+ 即可，延迟容忍 50-100ms，大规模向量数据
✅ 语义搜索：电商商品搜索、文档检索，不需要 sub-ms 响应
✅ 推荐系统：离线或近实时推荐，成本敏感
❌ 实时交互搜索：需要 <10ms 延迟的场景仍建议内存优化

生产建议¶

从 32x 开始：默认压缩级别提供最大成本节省，Recall 仍在 80%+
调优 oversample_factor：生产环境建议测试 10.0-20.0，可显著提升 Recall
监控 OCU 消耗：通过 CloudWatch 的 SearchOCU 和 IndexingOCU 指标观察实际节省
大规模前先基准测试：本文小规模测试的延迟特征与大规模不同，务必用真实数据量做基准

与已有方案对比¶

方案	成本	延迟	Recall	运维
Serverless Memory-Optimized	高	最低	基线	无
Serverless Disk-Optimized	低	略高	80-96%	无
Provisioned + UltraWarm	中	中	基线	手动管理
S3 Vectors（新）	最低	最高	取决于实现	无