
Backup and Maintenance

Routine operations: backups, log management, service upgrades, and troubleshooting.

Database Backup

Manual Backup

bash
# Dump all SlaunchX databases
docker exec slaunchx-mysql-test mysqldump -uroot -p<root-password> \
  --databases slaunchx slaunchx_tron_wallet slaunchx_solana_wallet \
  --single-transaction --quick --routines --triggers \
  > /home/backups/mysql-$(date +%Y%m%d-%H%M%S).sql

# Verify backup integrity
tail -1 /home/backups/mysql-*.sql
# Expected: "-- Dump completed on ..."
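
The integrity check above can be scripted so that it fails loudly on a truncated dump. The following is a minimal sketch; `verify_dump` is a hypothetical helper name, and it only assumes that mysqldump was run with comments enabled (the default), so the file ends with a "-- Dump completed" line.

```shell
#!/usr/bin/env bash
# Sketch: check that a mysqldump file ends with the completion marker.
# verify_dump is a hypothetical helper, not part of the SlaunchX tooling.
verify_dump() {
  local file="$1"
  # The file must exist and be non-empty.
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # mysqldump writes "-- Dump completed on <date>" as its final line
  # (just "-- Dump completed" when run with --skip-dump-date).
  if tail -n 1 "$file" | grep -q '^-- Dump completed'; then
    echo "OK: $file"
  else
    echo "incomplete dump: $file" >&2
    return 1
  fi
}

# Usage (path pattern taken from the manual backup command):
# for f in /home/backups/mysql-*.sql; do verify_dump "$f"; done
```

Checking each file separately avoids the per-file headers that `tail` prints when given several files at once.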

Automated Daily Backup (Cron)

bash
cat > /opt/slaunchx/scripts/backup-mysql.sh << 'EOF'
#!/bin/bash
BACKUP_DIR=/home/backups/mysql
RETENTION_DAYS=14
TIMESTAMP=$(date +%Y%m%d-%H%M%S)

mkdir -p $BACKUP_DIR

docker exec slaunchx-mysql-test mysqldump -uroot -p<root-password> \
  --databases slaunchx slaunchx_tron_wallet slaunchx_solana_wallet \
  --single-transaction --quick --routines --triggers \
  > "$BACKUP_DIR/backup-$TIMESTAMP.sql"

# Verify
if [ $? -eq 0 ] && [ -s "$BACKUP_DIR/backup-$TIMESTAMP.sql" ]; then
  sha256sum "$BACKUP_DIR/backup-$TIMESTAMP.sql" > "$BACKUP_DIR/backup-$TIMESTAMP.sha256"
  echo "Backup successful: backup-$TIMESTAMP.sql"
else
  echo "ERROR: Backup failed!" >&2
  exit 1
fi

# Cleanup old backups
find $BACKUP_DIR -name "backup-*.sql" -mtime +$RETENTION_DAYS -delete
find $BACKUP_DIR -name "backup-*.sha256" -mtime +$RETENTION_DAYS -delete
EOF

chmod +x /opt/slaunchx/scripts/backup-mysql.sh

# Schedule: daily at 03:40 UTC
crontab -e
# Add: 40 3 * * * /opt/slaunchx/scripts/backup-mysql.sh >> /var/log/slaunchx-backup.log 2>&1

Restoring from a Backup

bash
docker exec -i slaunchx-mysql-test mysql -uroot -p<root-password> < /home/backups/mysql/backup-YYYYMMDD-HHMMSS.sql
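
Before restoring, it may be worth confirming that the dump still matches the checksum written by the daily backup script. The sketch below assumes the `.sha256` file sits next to the `.sql` file, as that script produces; `check_backup` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify a backup against its .sha256 file before restoring it.
# check_backup is a hypothetical helper, not part of the SlaunchX tooling.
check_backup() {
  local sql="$1"
  local sum="${sql%.sql}.sha256"
  [ -f "$sql" ] && [ -f "$sum" ] || { echo "backup or checksum file missing" >&2; return 1; }
  # The checksum file records the path used at backup time, so the dump
  # must still be at that path for the check to find it.
  sha256sum -c "$sum" >/dev/null 2>&1 || { echo "checksum mismatch: $sql" >&2; return 1; }
  echo "checksum OK: $sql"
}

# Usage: run the restore only if the check passes.
# check_backup /home/backups/mysql/backup-YYYYMMDD-HHMMSS.sql && \
#   docker exec -i slaunchx-mysql-test mysql -uroot -p<root-password> \
#     < /home/backups/mysql/backup-YYYYMMDD-HHMMSS.sql
```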

Redis Backup

With AOF persistence enabled (--appendonly yes), Redis restores its data automatically on restart. To take an explicit backup:

bash
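# Trigger RDB snapshot (BGSAVE runs in the background and returns immediately)
docker exec slaunchx-redis-test redis-cli -a <password> BGSAVE

# Confirm the snapshot has finished (rdb_bgsave_in_progress:0) before copying
docker exec slaunchx-redis-test redis-cli -a <password> INFO persistence | grep rdb_bgsave_in_progress

# Copy the dump file
docker cp slaunchx-redis-test:/data/dump.rdb /home/backups/redis-$(date +%Y%m%d).rdb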
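
Because BGSAVE returns before the snapshot is written, a scripted backup should wait for it to finish before copying dump.rdb. A minimal polling sketch follows; `redis_cmd`, `wait_for_bgsave`, and the `REDIS_PASSWORD` environment variable are assumptions introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: wait until a background save has finished and succeeded.
# redis_cmd, wait_for_bgsave and REDIS_PASSWORD are hypothetical names.

redis_cmd() {
  # stderr is discarded to hide redis-cli's "password on command line" warning.
  docker exec slaunchx-redis-test redis-cli -a "$REDIS_PASSWORD" "$@" 2>/dev/null
}

wait_for_bgsave() {
  local timeout="${1:-60}" waited=0 info
  while [ "$waited" -lt "$timeout" ]; do
    # INFO lines end in CRLF; strip the carriage returns before matching.
    info="$(redis_cmd INFO persistence | tr -d '\r')"
    if printf '%s\n' "$info" | grep -q '^rdb_bgsave_in_progress:0$'; then
      # Finished: succeed only if the last save was reported as ok.
      printf '%s\n' "$info" | grep -q '^rdb_last_bgsave_status:ok$'
      return $?
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "timed out waiting for BGSAVE" >&2
  return 1
}

# Usage:
# redis_cmd BGSAVE && wait_for_bgsave 120 && \
#   docker cp slaunchx-redis-test:/data/dump.rdb /home/backups/redis-$(date +%Y%m%d).rdb
```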

MinIO Backup

bash
# Mirror all buckets
docker exec slaunchx-minio-test mc mirror local/slaunchx /backup/minio/

Log Management

Docker Log Rotation

All containers are started with --log-opt max-size=50m --log-opt max-file=3, which caps each container at 150 MB of logs (3 files of 50 MB each, rotated automatically).

Viewing Logs

bash
# Recent logs
docker logs --tail 200 slaunchx-app-prometheus-test

# Logs since a timestamp
docker logs --since "2026-03-23T10:00:00" slaunchx-app-prometheus-test

# Follow real-time
docker logs -f slaunchx-app-prometheus-test

# Filter errors
docker logs slaunchx-app-prometheus-test 2>&1 | grep -iE "error|exception|fatal" | tail -30

Log Location

Docker stores logs at:

{docker-data-root}/containers/{container-id}/{container-id}-json.log

Run docker info | grep "Docker Root Dir" to find the data root directory.
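
Instead of assembling the path by hand, Docker can report a container's log file and its logging options directly. The sketch below wraps two `docker inspect` format queries; the function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: ask Docker for a container's log file and logging options.
# Both function names are hypothetical helpers.

container_log_path() {
  # .LogPath is the absolute path of this container's json-file log.
  docker inspect --format '{{.LogPath}}' "$1"
}

container_log_config() {
  # Prints the log driver and its options (e.g. max-size, max-file) as JSON.
  docker inspect --format '{{json .HostConfig.LogConfig}}' "$1"
}

# Usage:
# container_log_path slaunchx-app-prometheus-test
# container_log_config slaunchx-app-prometheus-test
```

The second query is a quick way to confirm that the rotation options were actually applied when the container was created.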

Service Upgrades

Standard Upgrade Procedure

bash
MODULE=app-prometheus
ENV=test
PORT=18020
IMAGE=localhost:5000/slaunchx/$MODULE:dev-latest

# 1. Pull new image
docker pull $IMAGE

# 2. Stop and remove old container
docker stop slaunchx-$MODULE-$ENV
docker rm slaunchx-$MODULE-$ENV

# 3. Start new container (same command as initial deploy)
docker run -d \
  --name slaunchx-$MODULE-$ENV \
  --network slaunchx-intra \
  -p 0.0.0.0:$PORT:$PORT \
  --env-file /opt/slaunchx/config/$MODULE/$ENV.env \
  -e SPRING_PROFILES_ACTIVE=$ENV \
  --log-opt max-size=50m --log-opt max-file=3 \
  --restart unless-stopped \
  $IMAGE

# 4. Verify health
sleep 15  # wait for Spring Boot startup
curl -s http://localhost:$PORT/prometheus/actuator/health
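
A fixed `sleep 15` can be too short on a slow host and wastes time on a fast one. One alternative is to poll the health endpoint until it reports UP. This is a sketch only; `wait_for_health` is a hypothetical helper, and it assumes the Actuator response begins with `{"status":"UP"` once the service is healthy.

```shell
#!/usr/bin/env bash
# Sketch: poll a Spring Boot Actuator health endpoint until it reports UP.
# wait_for_health is a hypothetical helper name.
wait_for_health() {
  local url="$1" timeout="${2:-120}" interval="${3:-3}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -f makes curl fail on HTTP errors (Actuator answers 503 when DOWN).
    if curl -fsS --max-time 5 "$url" 2>/dev/null | grep -q '^{"status":"UP"'; then
      echo "healthy after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "not healthy after ${timeout}s: $url" >&2
  return 1
}

# Usage (PORT as set in the upgrade steps):
# wait_for_health "http://localhost:$PORT/prometheus/actuator/health" 120 3
```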

Using the Deployment Script

bash
# Upgrade a single module (handles stop/rm/pull/run)
ci/local/deploy.sh test app-prometheus

# Upgrade all modules
ci/local/deploy.sh test

Database Migrations During Upgrades

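Flyway runs automatically at startup. If the new image contains new migration files:

  1. Flyway detects the new files and applies them in order.
  2. If a migration fails, the container exits; check the logs.
  3. Do not patch the database by hand to work around a failed migration.

bash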
# Check Flyway history
docker exec -i slaunchx-mysql-test mysql -uslaunchx -p<password> slaunchx \
  -e "SELECT version, description, success FROM flyway_schema_history ORDER BY installed_rank DESC LIMIT 10;"
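
To spot a failed migration without reading the history by eye, the same table can be queried for rows with success = 0. The sketch below is an illustration under assumptions: `MYSQL_PASSWORD` is an assumed environment variable, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether Flyway has recorded any failed migrations.
# MYSQL_PASSWORD and both function names are assumptions for illustration.

count_failed_migrations() {
  # -N drops the column header, -B prints plain tab-separated output.
  docker exec -i slaunchx-mysql-test \
    mysql -uslaunchx -p"$MYSQL_PASSWORD" -N -B slaunchx \
    -e "SELECT COUNT(*) FROM flyway_schema_history WHERE success = 0;"
}

check_migrations() {
  local failed
  failed="$(count_failed_migrations)" || {
    echo "could not query flyway_schema_history" >&2
    return 2
  }
  if [ "$failed" -gt 0 ]; then
    echo "$failed failed migration(s) recorded" >&2
    return 1
  fi
  echo "no failed migrations recorded"
}

# Usage:
# MYSQL_PASSWORD=... check_migrations
```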

Registry Cleanup

Old Docker images accumulate in the private registry over time. Clean it up periodically:

bash
# Run garbage collection
docker exec slaunchx-registry bin/registry garbage-collect \
  /etc/docker/registry/config.yml --delete-untagged

# Check registry disk usage
du -sh /home/registry/data/

Troubleshooting

Container Fails to Start

bash
# Check exit code and logs
docker inspect slaunchx-app-prometheus-test --format '{{.State.ExitCode}}'
docker logs --tail 50 slaunchx-app-prometheus-test

Exit code   Meaning
0           Normal shutdown
1           Application error (check the logs)
137         SIGKILL, typically OOM killed (increase -Xmx or host memory)
143         SIGTERM (normal docker stop)
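
The exit-code table can be folded into a small lookup for use in scripts. The sketch below uses the shell convention that codes above 128 mean "terminated by signal (code - 128)"; `explain_exit_code` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: translate a container exit code into a short explanation.
# explain_exit_code is a hypothetical helper name.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "normal shutdown" ;;
    1)   echo "application error (check logs)" ;;
    137) echo "SIGKILL (often the OOM killer)" ;;
    143) echo "SIGTERM (normal docker stop)" ;;
    *)
      # Codes above 128 conventionally encode 128 + signal number.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "unknown exit code $code"
      fi
      ;;
  esac
}

# Usage:
# explain_exit_code "$(docker inspect slaunchx-app-prometheus-test --format '{{.State.ExitCode}}')"
```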

MySQL Connection Issues

bash
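# Test TCP reachability from inside the app container
# (uses bash's /dev/tcp; curl cannot speak the MySQL protocol and may report a false failure)
docker exec slaunchx-app-prometheus-test bash -c \
  '(exec 3<>/dev/tcp/slaunchx-mysql-test/3306) 2>/dev/null && echo "MySQL reachable" || echo "Cannot reach MySQL"'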

# Test credentials
docker exec -it slaunchx-mysql-test mysql -uslaunchx -p<password> -e "SELECT 1"

Redis Connection Issues

bash
docker exec -it slaunchx-redis-test redis-cli -a <password> INFO server | head -5

RabbitMQ Queue Backlog

bash
# Check queue depths
curl -s -u slaunchx:<password> http://localhost:18672/api/queues | \
  python3 -c "import sys,json; [print(f'{q[\"name\"]}: {q[\"messages\"]}') for q in json.load(sys.stdin)]"

Disk Space

bash
# Docker disk usage
docker system df

# Reclaim unused resources (careful in production)
docker system prune --volumes  # removes stopped containers, unused networks, dangling images, build cache, and unused volumes
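
To catch a filling disk before pruning becomes urgent, a threshold check can be run from cron next to the backup job. This is a minimal sketch; the function names and the 80% default threshold are assumptions, not values from the deployment.

```shell
#!/usr/bin/env bash
# Sketch: warn when a filesystem crosses a usage threshold.
# disk_usage_pct, check_disk and the 80% default are hypothetical choices.

disk_usage_pct() {
  # df -P prints POSIX-format output; column 5 of line 2 is the use percentage.
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

check_disk() {
  local path="${1:-/}" threshold="${2:-80}" used
  used="$(disk_usage_pct "$path")"
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $path is ${used}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}

# Usage: check the filesystem that holds the Docker data root.
# check_disk "$(docker info --format '{{.DockerRootDir}}')" 85
```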

Internal Manual