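Health Checks and Verification
After deployment completes, verify that every service is running correctly before declaring the environment ready.
Service Health Checks
Run the following commands to check each service. A healthy service responds with {"status":"UP"}.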
```bash
# Core service
curl -s http://localhost:18020/prometheus/actuator/health
# Support services
curl -s http://localhost:18030/email-service/actuator/health
curl -s http://localhost:18040/file-service/actuator/health
curl -s http://localhost:18050/schedule/actuator/health
# Partner integrations (if deployed)
curl -s http://localhost:18101/partner/channel/slash/actuator/health
curl -s http://localhost:18102/partner/channel/stripe/actuator/health
# Wallet services (if deployed)
curl -s http://localhost:18060/actuator/health
curl -s http://localhost:18070/actuator/health
```
Readiness Probe
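The readiness endpoint checks all downstream dependencies:
```bash
curl -s http://localhost:18020/prometheus/actuator/health/readiness | python3 -m json.tool
```
The expected output includes a result for each component:
```json
{
  "status": "UP",
  "components": {
    "db": { "status": "UP" },
    "redis": { "status": "UP" },
    "rabbit": { "status": "UP" }
  }
}
```
If any component reports DOWN, check the corresponding middleware container.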
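To surface only the failing components, the readiness JSON can be filtered in the shell. The following is a minimal sketch, not part of the original tooling: the function name down_components is hypothetical, and it assumes each component object begins with its status field, as in the sample output above. It prints one component name per line and prints nothing when every component is UP.

```bash
#!/bin/bash
# Hypothetical helper: read a health/readiness JSON document on stdin and
# print the name of every component whose status is DOWN.
# Assumption: each component object starts with its "status" field.
down_components() {
  tr -d ' \t\r\n' |                                 # collapse pretty-printed JSON
    grep -o '"[A-Za-z0-9_-]*":{"status":"DOWN"' |   # e.g. "redis":{"status":"DOWN"
    cut -d'"' -f2                                   # keep only the component name
}

# Usage against the readiness endpoint from this section:
#   curl -s http://localhost:18020/prometheus/actuator/health/readiness | down_components
```

The top-level status field is not reported, because it is not preceded by a component name.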
Quick Status Script
Check all services in one pass:
```bash
#!/bin/bash
# Each entry: port|service name|context path
SERVICES=(
  "18020|app-prometheus|/prometheus"
  "18030|support-email|/email-service"
  "18040|support-file|/file-service"
  "18050|support-schedule|/schedule"
)
for svc in "${SERVICES[@]}"; do
  IFS='|' read -r port name path <<< "$svc"
  # Extract the first "status" field. -f makes curl fail on HTTP error
  # responses, so those are also reported as UNREACHABLE.
  status=$(curl -sf "http://localhost:$port$path/actuator/health" 2>/dev/null | grep -o '"status":"[A-Z]*"' | head -1)
  if [ -n "$status" ]; then
    echo "✓ $name ($port): $status"
  else
    echo "✗ $name ($port): UNREACHABLE"
  fi
done
```
Smoke Test: End-to-End Login Flow
Once every health check passes, verify that the full API flow works end to end.
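Because services can take a while to start, it can help to poll the health endpoint before running the smoke test. A minimal sketch follows; the helper name wait_until, the attempt count, and the delay are illustrative choices rather than values from this deployment.

```bash
#!/bin/bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0                                   # command succeeded
    fi
    [ "$i" -lt "$attempts" ] && sleep "$delay"   # pause before the next attempt
    i=$((i + 1))
  done
  return 1                                       # never succeeded
}

# Example: poll the core service up to 30 times, 2 seconds apart.
#   wait_until 30 2 curl -sf http://localhost:18020/prometheus/actuator/health \
#     && echo "core service is up"
```

The helper takes the probe as an ordinary command, so the same loop can wrap any of the health checks in this section.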
Step 1: Fetch the public key (secure channel)
```bash
curl -s http://localhost:18020/prometheus/actuator/health
# Confirm UP first, then:
curl -s -X GET http://localhost:18020/web/v1/system/secure-channel/public-key \
  -H "X-PORTAL-ACCESS-CODE: <your-portal-access-code>"
```
Expected: the response contains the RSA public key.
Step 2: Create a secure-channel session
```bash
curl -s -X POST http://localhost:18020/web/v1/system/secure-channel/session \
  -H "Content-Type: application/json" \
  -H "X-PORTAL-ACCESS-CODE: <your-portal-access-code>" \
  -d '{"clientPublicKey": "<base64-encoded-client-public-key>"}'
```
Expected: the response returns a session ID and the server public key.
Step 3: Initiate login
```bash
curl -s -X POST http://localhost:18020/web/v1/system/auth/login/initiate \
  -H "Content-Type: application/json" \
  -H "X-PORTAL-ACCESS-CODE: <your-portal-access-code>" \
  -d '{"rawRequestBody": "<encrypted-payload>"}'
```
Expected: the response returns a login session that either contains an MFA challenge or returns a JWT directly.
TIP
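The smoke test requires a valid portal access code and an encrypted request body. For a quick check, the health and readiness endpoints are enough to confirm that the system is running.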
Log Inspection
View Recent Logs
```bash
# Last 100 lines
docker logs --tail 100 slaunchx-app-prometheus-test
# Follow logs in real time
docker logs -f slaunchx-app-prometheus-test
# Filter for errors
docker logs slaunchx-app-prometheus-test 2>&1 | grep -i "error\|exception" | tail -20
```
Common Startup Errors
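| Error pattern | Cause | Fix |
|---|---|---|
| Connection refused: MySQL | MySQL is not running or the host address is wrong | Check that DB_HOST matches the MySQL container name |
| NOAUTH Authentication required | Redis password is missing | Set REDIS_PASSWORD in the env file |
| ACCESS_REFUSED | RabbitMQ credentials are wrong | Check RABBITMQ_USERNAME / RABBITMQ_PASSWORD |
| Flyway migration failed | Schema conflict | Inspect the migration files; a repair may be needed |
| EncryptionKeyNotFoundException | Master key is missing | Set SLAUNCHX_SECURITY_ENCRYPTION_MASTER_KEYS_K1 |
| Port already in use | Another container is using the same port | Use docker ps to find the conflicting container |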
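The table can also be applied mechanically. As a rough sketch, the hypothetical helper below (diagnose_logs is not part of SlaunchX) reads log text on stdin and prints the suggested fix for each pattern it finds. The match strings are shortened forms of the table's first column, chosen as an assumption; real log lines may word these errors differently, so treat the output as a starting point.

```bash
#!/bin/bash
# Hypothetical helper: scan log text on stdin for the startup error patterns
# from the table and print "<pattern>: <hint>" for each one found.
diagnose_logs() {
  local logs pattern hint
  logs=$(cat)                                    # read all of stdin once
  while IFS='|' read -r pattern hint; do
    # -F: fixed-string match, -i: ignore case, -q: only set the exit status
    if printf '%s\n' "$logs" | grep -qiF -- "$pattern"; then
      echo "$pattern: $hint"
    fi
  done <<'RULES'
Connection refused|if this is MySQL, check that DB_HOST matches the MySQL container name
NOAUTH Authentication required|set REDIS_PASSWORD in the env file
ACCESS_REFUSED|check RABBITMQ_USERNAME / RABBITMQ_PASSWORD
Flyway|inspect the migration files; a repair may be needed
EncryptionKeyNotFoundException|set SLAUNCHX_SECURITY_ENCRYPTION_MASTER_KEYS_K1
already in use|run docker ps to find the conflicting container
RULES
}

# Usage:
#   docker logs slaunchx-app-prometheus-test 2>&1 | diagnose_logs
```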
Container Status Check
```bash
# List all SlaunchX containers
docker ps -a --filter "name=slaunchx-" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Check for crashed containers
docker ps -a --filter "name=slaunchx-" --filter "status=exited" --format "{{.Names}}: {{.Status}}"
```
Monitoring Integration
Uptime Kuma
If Uptime Kuma is deployed (port 3001), add an HTTP monitor for each service and a TCP monitor for MySQL and Redis:
| Monitor | URL | Interval |
|---|---|---|
| app-prometheus | http://localhost:18020/prometheus/actuator/health | 60s |
| support-email | http://localhost:18030/email-service/actuator/health | 60s |
| support-file | http://localhost:18040/file-service/actuator/health | 60s |
| support-schedule | http://localhost:18050/schedule/actuator/health | 60s |
| MySQL | TCP check localhost:18306 | 60s |
| Redis | TCP check localhost:18379 | 60s |
| RabbitMQ | HTTP http://localhost:18672/api/healthchecks/node | 60s |
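For a quick manual equivalent of the TCP monitors, without Uptime Kuma, bash can open a TCP connection itself. This is a minimal sketch that assumes bash and the coreutils timeout command are available on the host; tcp_check is a hypothetical name, and the ports in the usage lines are the ones listed in the table.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp redirection; timeout bounds a hanging connect.
tcp_check() {
  local host=$1 port=$2
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage, with the ports from the table:
#   tcp_check localhost 18306 && echo "MySQL port is open"
#   tcp_check localhost 18379 && echo "Redis port is open"
```

An open port only shows that something is listening; it does not confirm that MySQL or Redis accepts the configured credentials.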
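Verification Checklist
- [ ] All service health endpoints return {"status":"UP"}
- [ ] The readiness probe shows every component UP (db, redis, rabbit)
- [ ] No error/exception entries in the startup logs
- [ ] All containers are in the running state (none exited)
- [ ] (Optional) The smoke-test login flow succeeds
- [ ] (Optional) Uptime Kuma monitors are configured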