
Health Checks and Verification

After a deployment completes, verify that every service is running correctly before declaring the environment ready.

Service Health Checks

Run the following commands to check each service. A healthy service responds with {"status":"UP"}.

bash
# Core service
curl -s http://localhost:18020/prometheus/actuator/health

# Support services
curl -s http://localhost:18030/email-service/actuator/health
curl -s http://localhost:18040/file-service/actuator/health
curl -s http://localhost:18050/schedule/actuator/health

# Partner integrations (if deployed)
curl -s http://localhost:18101/partner/channel/slash/actuator/health
curl -s http://localhost:18102/partner/channel/stripe/actuator/health

# Wallet services (if deployed)
curl -s http://localhost:18060/actuator/health
curl -s http://localhost:18070/actuator/health
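Services can need some time to start right after a deployment, so a single curl may report a failure that is only temporary. The following is a minimal polling sketch that assumes a POSIX shell with curl. wait_for_healthy is a hypothetical helper name, and the 120 s timeout and 5 s interval defaults are illustrative values, not project settings:

```shell
#!/bin/sh
# Hypothetical helper (not part of the deployment tooling): poll an actuator
# health URL until its top-level status is UP or the time limit expires.
# Usage: wait_for_healthy URL [LIMIT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 as soon as the endpoint reports UP, 1 on timeout.
wait_for_healthy() {
  local url="$1"
  local limit="${2:-120}"     # illustrative default, not a project setting
  local interval="${3:-5}"    # illustrative default, not a project setting
  local deadline status
  deadline=$(( $(date +%s) + limit ))

  while [ "$(date +%s)" -lt "$deadline" ]; do
    # --max-time keeps one hung connection from using up the whole budget.
    # No -f: a DOWN service still returns a JSON body (with HTTP 503).
    status=$(curl -s --max-time 5 "$url" 2>/dev/null \
      | grep -o '"status":"[A-Z_]*"' | head -n 1)
    if [ "$status" = '"status":"UP"' ]; then
      echo "UP: $url"
      return 0
    fi
    sleep "$interval"
  done
  echo "TIMEOUT after ${limit}s: $url" >&2
  return 1
}
```

For example, wait_for_healthy http://localhost:18020/prometheus/actuator/health 180 5 && echo ready blocks until the core service is up or 180 seconds have passed.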

Readiness Probe

The readiness endpoint checks all downstream dependencies:

bash
curl -s http://localhost:18020/prometheus/actuator/health/readiness | python3 -m json.tool

The expected output includes the result of each component check:

json
{
  "status": "UP",
  "components": {
    "db": { "status": "UP" },
    "redis": { "status": "UP" },
    "rabbit": { "status": "UP" }
  }
}

If any component reports DOWN, check the corresponding middleware container.
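To find the failing components without reading the JSON by eye, you can filter the readiness response. The following minimal sketch assumes python3 is available (it is already used above for json.tool) and that the response has the "components" shape shown. list_down_components is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read an actuator health/readiness JSON document on stdin
# and print "name: STATUS" for every top-level component that is not UP.
# Prints nothing when all components are UP; fails if stdin is not valid JSON.
list_down_components() {
  python3 -c '
import json, sys

doc = json.load(sys.stdin)
for name, comp in doc.get("components", {}).items():
    status = comp.get("status", "UNKNOWN")
    if status != "UP":
        print(name + ": " + status)
'
}
```

Usage: curl -s http://localhost:18020/prometheus/actuator/health/readiness | list_down_components prints only the components that need attention, for example "redis: DOWN".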

Quick Status Script

Check all core and support services in one pass:

bash
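#!/bin/bash
# Check the core and support services in one pass.
# Each entry: port|service name|context path
SERVICES=(
  "18020|app-prometheus|/prometheus"
  "18030|support-email|/email-service"
  "18040|support-file|/file-service"
  "18050|support-schedule|/schedule"
)

failed=0
for svc in "${SERVICES[@]}"; do
  IFS='|' read -r port name path <<< "$svc"
  # No -f: actuator answers DOWN/OUT_OF_SERVICE with HTTP 503 plus a JSON body,
  # which should be reported as such rather than as UNREACHABLE.
  status=$(curl -s --max-time 5 "http://localhost:${port}${path}/actuator/health" 2>/dev/null \
    | grep -o '"status":"[A-Z_]*"' | head -1 | cut -d'"' -f4)
  if [ "$status" = "UP" ]; then
    echo "✓ $name ($port): UP"
  elif [ -n "$status" ]; then
    echo "✗ $name ($port): $status"
    failed=1
  else
    echo "✗ $name ($port): UNREACHABLE"
    failed=1
  fi
done

# A non-zero exit status lets CI or a deploy script stop on failure.
exit "$failed"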

Smoke Test: End-to-End Login Flow

Once all health checks pass, verify that the full API flow works:

Step 1: Fetch the public key (secure channel)

bash
curl -s http://localhost:18020/prometheus/actuator/health
# Confirm UP first, then:

curl -s -X GET http://localhost:18020/web/v1/system/secure-channel/public-key \
  -H "X-PORTAL-ACCESS-CODE: <your-portal-access-code>"

Expected: the response contains an RSA public key.

Step 2: Create a secure-channel session

bash
curl -s -X POST http://localhost:18020/web/v1/system/secure-channel/session \
  -H "Content-Type: application/json" \
  -H "X-PORTAL-ACCESS-CODE: <your-portal-access-code>" \
  -d '{"clientPublicKey": "<base64-encoded-client-public-key>"}'

Expected: the response returns a session ID and the server's public key.
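Step 2 needs a Base64-encoded client public key. This manual does not document the required key algorithm or encoding, so the following is only a sketch. It assumes an RSA-2048 key pair and a public key sent as Base64-encoded DER (X.509 SubjectPublicKeyInfo). Confirm both with the secure-channel API contract before relying on it. make_client_key and the client-private.pem file name are hypothetical:

```shell
#!/bin/sh
# Sketch only: ASSUMES the secure channel accepts an RSA-2048 public key
# encoded as Base64 DER (SubjectPublicKeyInfo). Verify against the API contract.
# Usage: make_client_key DIR
#   writes DIR/client-private.pem and prints the Base64 public key on one line.
make_client_key() {
  local dir="${1:-.}"
  # Generate the key pair; the private key stays on the client.
  openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
    -out "$dir/client-private.pem" 2>/dev/null || return 1
  # Export the public half as DER and Base64-encode it without line breaks.
  openssl pkey -in "$dir/client-private.pem" -pubout -outform DER \
    | base64 | tr -d '\n'
}
```

Under the same assumption, the step 2 body could be built with printf '{"clientPublicKey": "%s"}' "$(make_client_key /tmp)".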

Step 3: Initiate login

bash
curl -s -X POST http://localhost:18020/web/v1/system/auth/login/initiate \
  -H "Content-Type: application/json" \
  -H "X-PORTAL-ACCESS-CODE: <your-portal-access-code>" \
  -d '{"rawRequestBody": "<encrypted-payload>"}'

Expected: the response returns a login session that contains either an MFA challenge or a JWT directly.

TIP

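The smoke test requires a valid portal access code and an encrypted request body. For a quick check, the health and readiness endpoints are enough to confirm that the system is running.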

Log Inspection

View Recent Logs

bash
# Last 100 lines
docker logs --tail 100 slaunchx-app-prometheus-test

# Follow logs in real-time
docker logs -f slaunchx-app-prometheus-test

# Filter for errors (case-insensitive; -E keeps the alternation portable across grep implementations)
docker logs slaunchx-app-prometheus-test 2>&1 | grep -iE "error|exception" | tail -20

Common Startup Errors

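| Error pattern | Cause | Fix |
|---|---|---|
| Connection refused: MySQL | MySQL is not running, or the host address is wrong | Check that DB_HOST matches the MySQL container name |
| NOAUTH Authentication required | Redis password is missing | Set REDIS_PASSWORD in the env file |
| ACCESS_REFUSED | Wrong RabbitMQ credentials | Check RABBITMQ_USERNAME / RABBITMQ_PASSWORD |
| Flyway migration failed | Schema conflict | Inspect the migration files; a Flyway repair may be needed |
| EncryptionKeyNotFoundException | Master key is missing | Set SLAUNCHX_SECURITY_ENCRYPTION_MASTER_KEYS_K1 |
| Port already in use | Another container is using the same port | Use docker ps to find the conflicting container |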
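You can also apply the table above mechanically to a log dump. The following minimal sketch uses a hypothetical helper, diagnose_startup_log. It only knows the six patterns listed in the table and matches them as plain, case-sensitive substrings. The MySQL row is matched on "Connection refused" alone, so it can also fire for other unreachable services:

```shell
#!/bin/sh
# Hypothetical helper: scan log text on stdin for the known startup error
# patterns from the table and print the suggested fix for each pattern found.
# Returns 0 if at least one known pattern matched, 1 otherwise.
diagnose_startup_log() {
  local log found pattern fix
  log=$(cat)
  found=1
  # Each line: fixed-string pattern|suggested fix
  while IFS='|' read -r pattern fix; do
    if printf '%s\n' "$log" | grep -qF "$pattern"; then
      echo "$pattern -> $fix"
      found=0
    fi
  done <<'EOF'
Connection refused|MySQL unreachable? Check that DB_HOST matches the MySQL container name
NOAUTH Authentication required|Set REDIS_PASSWORD in the env file
ACCESS_REFUSED|Check RABBITMQ_USERNAME / RABBITMQ_PASSWORD
Flyway migration failed|Inspect the migration files; a Flyway repair may be needed
EncryptionKeyNotFoundException|Set SLAUNCHX_SECURITY_ENCRYPTION_MASTER_KEYS_K1
Port already in use|Use docker ps to find the conflicting container
EOF
  return "$found"
}
```

Usage: docker logs slaunchx-app-prometheus-test 2>&1 | diagnose_startup_log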

Container Status Check

bash
# List all SlaunchX containers
docker ps -a --filter "name=slaunchx-" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Check for crashed containers
docker ps -a --filter "name=slaunchx-" --filter "status=exited" --format "{{.Names}}: {{.Status}}"

Monitoring Integration

Uptime Kuma

If Uptime Kuma is deployed (port 3001), add a monitor for each service and middleware component:

| Monitor | URL / target | Interval |
|---|---|---|
| app-prometheus | http://localhost:18020/prometheus/actuator/health | 60s |
| support-email | http://localhost:18030/email-service/actuator/health | 60s |
| support-file | http://localhost:18040/file-service/actuator/health | 60s |
| support-schedule | http://localhost:18050/schedule/actuator/health | 60s |
| MySQL | TCP check localhost:18306 | 60s |
| Redis | TCP check localhost:18379 | 60s |
| RabbitMQ | HTTP http://localhost:18672/api/healthchecks/node | 60s |
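To reproduce the MySQL and Redis TCP checks from the table by hand (for example, before Uptime Kuma is set up), here is a minimal sketch. It assumes bash (for its /dev/tcp redirection) and the GNU coreutils timeout command are installed. tcp_check is a hypothetical helper name, and the 3-second limit is an illustrative choice:

```shell
#!/bin/sh
# Hypothetical helper: a manual equivalent of an Uptime Kuma TCP port monitor.
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
tcp_check() {
  local host="$1"
  local port="$2"
  if timeout 3 bash -c ": > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "OPEN   $host:$port"
  else
    echo "CLOSED $host:$port"
    return 1
  fi
}
```

Usage: tcp_check localhost 18306 && tcp_check localhost 18379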

Verification Checklist

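  • [ ] All service health endpoints return {"status":"UP"}
  • [ ] The readiness probe shows every component UP (db, redis, rabbit)
  • [ ] No errors or exceptions in the startup logs
  • [ ] All containers are in the running state (none exited)
  • [ ] (Optional) The smoke-test login flow succeeds
  • [ ] (Optional) Uptime Kuma monitors are configured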

Internal Manual