Recovery Patterns

Building resilient systems is not only about preventing failures; it is about recovering gracefully when something goes wrong. With AI assistance you can implement advanced recovery patterns that minimize downtime, prevent cascading failures, and maintain service quality even during disruptions. This guide covers proven recovery patterns for production systems.

The Philosophy of Resilient Recovery

Modern systems must accept failures as inevitable and be designed for recovery:

Fail fast, recover fast

Detect failures quickly and start recovery immediately

Graceful degradation

Keep core functionality working even when some features fail

Automatic healing

Systems should recover without human intervention whenever possible

Learn from failures

Every failure is an opportunity to improve resilience

Core Recovery Patterns

Pattern 1: Circuit Breaker

Stop cascading failures by blocking calls to services that are failing.

The examples below are in TypeScript and in Java (Spring Boot with Resilience4j).

// AI-generated circuit breaker implementation
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Dedicated error type so callers can detect an open circuit without string matching
class CircuitOpenError extends Error {
  constructor() {
    super('Circuit breaker is OPEN');
    this.name = 'CircuitOpenError';
  }
}

class CircuitBreaker {
  private state: CircuitState = 'CLOSED';
  private failures = 0;
  private lastFailureTime?: Date;
  private successfulProbes = 0;

  constructor(
    private readonly options: {
      failureThreshold: number; // consecutive failures before opening
      resetTimeout: number; // ms to stay OPEN before probing again
      probeThreshold: number; // successful probes needed to close again
      onStateChange?: (oldState: CircuitState, newState: CircuitState) => void;
    }
  ) {}

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      if (this.shouldAttemptReset()) {
        this.transitionTo('HALF_OPEN');
      } else {
        // Fail fast without calling the unhealthy dependency
        throw new CircuitOpenError();
      }
    }

    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;

    if (this.state === 'HALF_OPEN') {
      this.successfulProbes++;
      if (this.successfulProbes >= this.options.probeThreshold) {
        this.transitionTo('CLOSED');
      }
    }
  }

  private onFailure() {
    this.failures++;
    this.lastFailureTime = new Date();

    // A failed probe reopens immediately; in CLOSED, open once the threshold is reached.
    // One combined condition avoids a duplicate OPEN -> OPEN transition.
    if (
      this.state === 'HALF_OPEN' ||
      (this.state === 'CLOSED' && this.failures >= this.options.failureThreshold)
    ) {
      this.transitionTo('OPEN');
    }
  }

  private shouldAttemptReset(): boolean {
    return (
      this.lastFailureTime !== undefined &&
      Date.now() - this.lastFailureTime.getTime() > this.options.resetTimeout
    );
  }

  private transitionTo(newState: CircuitState) {
    const oldState = this.state;
    if (oldState === newState) return;
    this.state = newState;

    if (newState === 'HALF_OPEN') {
      this.successfulProbes = 0;
    }

    this.options.onStateChange?.(oldState, newState);
  }
}

// Usage with a fallback
const paymentBreaker = new CircuitBreaker({
  failureThreshold: 5,
  resetTimeout: 60000, // 1 minute
  probeThreshold: 3,
  onStateChange: (oldState, newState) => {
    logger.warn('Circuit breaker state changed', { oldState, newState });
  }
});

async function processPayment(order: Order) {
  try {
    return await paymentBreaker.execute(
      () => paymentService.process(order)
    );
  } catch (error) {
    if (error instanceof CircuitOpenError) {
      // Fall back to queueing the payment for later processing
      return await queuePaymentForLater(order);
    }
    throw error;
  }
}

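// Using Resilience4j with Spring Boot
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    private final CircuitBreaker circuitBreaker;
    private final PaymentClient paymentClient;

    // Constructor injection, so the final paymentClient field is initialized
    public PaymentService(PaymentClient paymentClient) {
        this.paymentClient = paymentClient;

        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50) // open when >= 50% of recorded calls fail
            .waitDurationInOpenState(Duration.ofSeconds(60))
            .slidingWindowSize(100)
            .permittedNumberOfCallsInHalfOpenState(3)
            .recordExceptions(IOException.class, TimeoutException.class)
            .ignoreExceptions(BusinessException.class)
            .build();

        this.circuitBreaker = CircuitBreaker.of("payment", config);

        // Register event listeners
        circuitBreaker.getEventPublisher()
            .onStateTransition(event ->
                log.warn("Circuit breaker state transition: {}", event));
    }

    public PaymentResult processPayment(Order order) {
        try {
            // Throws CallNotPermittedException while the circuit is OPEN
            return circuitBreaker.executeSupplier(() -> paymentClient.process(order));
        } catch (Exception e) {
            log.error("Payment failed, using fallback", e);
            return queuePaymentForLater(order);
        }
    }
}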
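Because the TypeScript breaker above reads `Date.now()` directly, its OPEN to HALF_OPEN timing is awkward to unit test. A minimal sketch, assuming you are free to restructure the code: keep the state machine free of I/O and pass the current time in explicitly. `BreakerStateMachine`, `allowRequest`, `recordSuccess` and `recordFailure` are hypothetical names, not part of any library.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Hypothetical I/O-free circuit breaker: the caller supplies timestamps (ms)
class BreakerStateMachine {
  private state: BreakerState = 'CLOSED';
  private failures = 0; // consecutive failures while CLOSED
  private openedAt = 0; // timestamp of the last transition to OPEN
  private probes = 0; // successful calls while HALF_OPEN

  constructor(
    private readonly failureThreshold: number,
    private readonly resetTimeoutMs: number,
    private readonly probeThreshold: number
  ) {}

  getState(): BreakerState {
    return this.state;
  }

  // Decide whether a call may go through at time `now`
  allowRequest(now: number): boolean {
    if (this.state === 'OPEN' && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN'; // cool-down elapsed: let probe calls through
      this.probes = 0;
    }
    return this.state !== 'OPEN';
  }

  recordSuccess(): void {
    this.failures = 0;
    if (this.state === 'HALF_OPEN') {
      this.probes++;
      if (this.probes >= this.probeThreshold) {
        this.state = 'CLOSED'; // enough healthy probes: resume normal traffic
      }
    }
  }

  recordFailure(now: number): void {
    this.failures++;
    // Any failed probe reopens; in CLOSED, open once the threshold is reached
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = now;
      this.failures = 0;
    }
  }
}

// Walk through a full cycle with a fake clock
const breaker = new BreakerStateMachine(3, 1000, 2);
breaker.recordFailure(0);
breaker.recordFailure(10);
breaker.recordFailure(20); // third failure: OPEN at t=20
console.log(breaker.getState(), breaker.allowRequest(500)); // OPEN false
console.log(breaker.allowRequest(1020), breaker.getState()); // true HALF_OPEN
breaker.recordSuccess();
breaker.recordSuccess();
console.log(breaker.getState()); // CLOSED
```

Keeping time as a parameter lets the whole CLOSED, OPEN, HALF_OPEN cycle be tested without real waits. An async wrapper in the style of `execute()` would call `allowRequest(Date.now())` before the operation and record the outcome afterwards.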

Pattern 2: Retries with Exponential Backoff

Automatically retry failed operations with increasing delays between attempts.

// Smart retry mechanism with jitter
class RetryStrategy {
  async executeWithRetry<T>(
    operation: () => Promise<T>,
    options: {
      maxRetries: number; // retries after the first attempt
      initialDelay: number; // ms
      maxDelay: number; // ms
      factor: number;
      jitter?: boolean;
      retryableErrors?: (error: any) => boolean;
    }
  ): Promise<T> {
    let lastError: any;

    // One initial attempt plus up to maxRetries retries
    for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
      try {
        return await operation();
      } catch (error: any) {
        lastError = error;

        // Rethrow immediately if the error is not retryable
        if (options.retryableErrors && !options.retryableErrors(error)) {
          throw error;
        }

        if (attempt < options.maxRetries) {
          const delay = this.calculateDelay(
            attempt,
            options.initialDelay,
            options.maxDelay,
            options.factor,
            options.jitter
          );

          logger.debug('Retrying operation', {
            attempt: attempt + 1,
            maxRetries: options.maxRetries,
            delay,
            error: error?.message
          });

          await this.sleep(delay);
        }
      }
    }

    // maxRetries retries means maxRetries + 1 attempts in total
    throw new Error(
      `Operation failed after ${options.maxRetries + 1} attempts: ${lastError?.message ?? lastError}`
    );
  }

  private calculateDelay(
    attempt: number,
    initialDelay: number,
    maxDelay: number,
    factor: number,
    jitter?: boolean
  ): number {
    // Exponential backoff, capped at maxDelay
    let delay = Math.min(initialDelay * Math.pow(factor, attempt), maxDelay);

    // Jitter: scale to 50-100% of the delay so clients don't retry in lockstep (thundering herd)
    if (jitter) {
      delay = delay * (0.5 + Math.random() * 0.5);
    }

    return Math.floor(delay);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Usage with custom retry logic
const retry = new RetryStrategy();

async function fetchUserData(userId: string) {
  return retry.executeWithRetry(
    () => apiClient.getUser(userId),
    {
      maxRetries: 3,
      initialDelay: 1000,
      maxDelay: 10000,
      factor: 2,
      jitter: true,
      retryableErrors: (error) => {
        // Retry network errors and 5xx status codes
        return error.code === 'ECONNRESET' ||
               error.code === 'ETIMEDOUT' ||
               (error.status >= 500 && error.status < 600);
      }
    }
  );
}
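To see what the retry settings above produce, the capped exponential schedule can be computed on its own. With `initialDelay: 1000`, `factor: 2` and `maxDelay: 10000`, the base delays are 1000, 2000, 4000, 8000 and then 10000 ms (capped). With `maxRetries: 3` only the first three are used, and the jitter in `calculateDelay` scales each one to between 50% and 100% of its base value. The sketch below also shows "full jitter" (a uniformly random delay between 0 and the capped value), a common alternative. `backoffDelay` and `fullJitterDelay` are hypothetical helpers, and the random source is injectable only to make the example deterministic.

```typescript
// Capped exponential backoff: min(initialMs * factor^attempt, maxMs)
function backoffDelay(attempt: number, initialMs: number, maxMs: number, factor: number): number {
  return Math.min(initialMs * Math.pow(factor, attempt), maxMs);
}

// "Full jitter": pick a uniformly random delay in [0, capped delay).
// `random` defaults to Math.random and is injectable for deterministic tests.
function fullJitterDelay(
  attempt: number,
  initialMs: number,
  maxMs: number,
  factor: number,
  random: () => number = Math.random
): number {
  return Math.floor(random() * backoffDelay(attempt, initialMs, maxMs, factor));
}

const baseSchedule = [0, 1, 2, 3, 4].map(a => backoffDelay(a, 1000, 10000, 2));
console.log(baseSchedule.join(', ')); // 1000, 2000, 4000, 8000, 10000

// With a fixed "random" value of 0.5, full jitter halves each capped delay
console.log([0, 1, 2, 3, 4].map(a => fullJitterDelay(a, 1000, 10000, 2, () => 0.5)).join(', '));
// 500, 1000, 2000, 4000, 5000
```

Full jitter spreads retries more widely than the 50-100% variant, which matters most when many clients fail at the same moment; the trade-off is that some retries may fire almost immediately.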

Pattern 3: Bulkhead Isolation

Isolate failures so they cannot spread to the whole system.

Resource isolation pattern

// Concurrency isolation: separate limits for different kinds of operations
interface BulkheadConfig {
  maxConcurrent: number; // operations allowed to run at the same time
  maxQueueSize: number; // callers allowed to wait for a free slot
}

class BulkheadManager {
  private bulkheads = new Map<string, Bulkhead>();

  createBulkhead(name: string, config: BulkheadConfig) {
    const bulkhead = new Bulkhead(config);
    this.bulkheads.set(name, bulkhead);
    return bulkhead;
  }

  getBulkhead(name: string): Bulkhead {
    const bulkhead = this.bulkheads.get(name);
    if (!bulkhead) {
      throw new Error(`Bulkhead ${name} not found`);
    }
    return bulkhead;
  }
}

class Bulkhead {
  private waiting: Array<() => void> = []; // resolvers of callers waiting for a slot
  private activeRequests = 0;

  constructor(private config: BulkheadConfig) {}

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.activeRequests < this.config.maxConcurrent) {
      // A slot is free: take it immediately
      this.activeRequests++;
    } else if (this.waiting.length < this.config.maxQueueSize) {
      // Wait in the queue; a finishing operation hands its slot over directly
      await new Promise<void>(resolve => this.waiting.push(resolve));
    } else {
      // Slots and queue are both full: reject instead of piling up work
      throw new Error('Bulkhead queue is full');
    }

    try {
      return await operation();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // transfer the slot; activeRequests stays the same
      } else {
        this.activeRequests--;
      }
    }
  }

  getMetrics() {
    return {
      activeRequests: this.activeRequests,
      queueSize: this.waiting.length,
      maxConcurrent: this.config.maxConcurrent,
      maxQueueSize: this.config.maxQueueSize
    };
  }
}

// Usage example
const bulkheadManager = new BulkheadManager();

// Create separate bulkheads for different operations
bulkheadManager.createBulkhead('payment', {
  maxConcurrent: 10,
  maxQueueSize: 50
});

bulkheadManager.createBulkhead('search', {
  maxConcurrent: 20,
  maxQueueSize: 100
});

// Use the bulkheads to isolate operations from each other
async function processPayment(order: Order) {
  const bulkhead = bulkheadManager.getBulkhead('payment');
  return bulkhead.execute(() => paymentService.process(order));
}

async function searchProducts(query: string) {
  const bulkhead = bulkheadManager.getBulkhead('search');
  return bulkhead.execute(() => searchService.search(query));
}
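The queueing behaviour can be checked in isolation. A self-contained sketch, assuming a hypothetical `MiniBulkhead` with the same contract as `Bulkhead` (at most `maxConcurrent` running, at most `maxQueueSize` waiting, everything else rejected): with two slots and room for one waiter, five simultaneous 20 ms tasks should give three completions, two immediate rejections, and never more than two tasks running at once.

```typescript
// Hypothetical minimal bulkhead: at most `maxConcurrent` operations run,
// at most `maxQueueSize` callers wait, and any further caller is rejected.
class MiniBulkhead {
  private active = 0;
  private waiting: Array<() => void> = [];
  peak = 0; // highest number of operations observed running at once

  constructor(private readonly maxConcurrent: number, private readonly maxQueueSize: number) {}

  async run<T>(operation: () => Promise<T>): Promise<T> {
    if (this.active < this.maxConcurrent) {
      this.active++; // free slot: take it
    } else if (this.waiting.length < this.maxQueueSize) {
      // Queue up; a finishing operation hands its slot to us directly
      await new Promise<void>(resolve => this.waiting.push(resolve));
    } else {
      throw new Error('Bulkhead full: request rejected');
    }
    this.peak = Math.max(this.peak, this.active);
    try {
      return await operation();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // transfer the slot without freeing it
      else this.active--;
    }
  }
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function demo() {
  const bulkhead = new MiniBulkhead(2, 1);
  // Five tasks arrive at the same time; each takes about 20 ms
  const outcomes = await Promise.allSettled(
    [1, 2, 3, 4, 5].map(id => bulkhead.run(async () => { await sleep(20); return id; }))
  );
  return {
    completed: outcomes.filter(o => o.status === 'fulfilled').length,
    rejected: outcomes.filter(o => o.status === 'rejected').length,
    peak: bulkhead.peak
  };
}

demo().then(result => console.log(result)); // { completed: 3, rejected: 2, peak: 2 }
```

Handing the slot directly to the next waiter, instead of decrementing the counter and letting the waiter re-check it, prevents a newly arriving caller from slipping in between and pushing concurrency above the limit.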

Pattern 4: Timeout Management

Prevent slow operations from exhausting resources.

// Comprehensive timeout handling
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TimeoutError';
  }
}

class TimeoutManager {
  async executeWithTimeout<T>(
    operation: () => Promise<T>,
    timeoutMs: number,
    options?: {
      onTimeout?: () => void;
      cancelOnTimeout?: boolean;
    }
  ): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeoutPromise = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        options?.onTimeout?.();
        reject(new TimeoutError(`Operation timed out after ${timeoutMs}ms`));
      }, timeoutMs);
    });

    try {
      // Start the operation exactly once and race it against the timer
      return await Promise.race([operation(), timeoutPromise]);
    } catch (error) {
      if (error instanceof TimeoutError && options?.cancelOnTimeout) {
        // Try to cancel the operation if possible
        this.cancelOperation(operation);
      }
      throw error;
    } finally {
      // Clear the timer whichever side won the race
      clearTimeout(timer);
    }
  }

  private cancelOperation(operation: unknown) {
    // The implementation depends on the operation type:
    // for HTTP requests, abort the request;
    // for database queries, kill the query;
    // for other async operations, set a cancellation flag.
  }
}

// Cascading timeouts for distributed calls: one total budget split across steps
class CascadingTimeout {
  constructor(private totalTimeout: number) {}

  async executeWithCascadingTimeout<T>(
    operations: Array<{
      name: string;
      operation: () => Promise<T>;
      weight: number; // relative importance / expected duration
    }>
  ): Promise<T[]> {
    const results: T[] = [];
    let remainingTime = this.totalTimeout;
    const startTime = Date.now();

    const totalWeight = operations.reduce((sum, op) => sum + op.weight, 0);

    for (const { name, operation, weight } of operations) {
      // Multiply before dividing so integer inputs give exact shares
      const allocatedTime = Math.floor((weight * this.totalTimeout) / totalWeight);
      // Never give a step more than what is left of the overall budget
      const actualTimeout = Math.min(allocatedTime, remainingTime);

      if (actualTimeout <= 0) {
        throw new TimeoutError(`Timeout budget exhausted before "${name}"`);
      }

      logger.debug('Executing operation with timeout', {
        name,
        allocatedTime,
        actualTimeout,
        remainingTime
      });

      try {
        const result = await new TimeoutManager().executeWithTimeout(
          operation,
          actualTimeout
        );
        results.push(result);

        // Update the remaining time
        remainingTime = this.totalTimeout - (Date.now() - startTime);
      } catch (error) {
        logger.error('Operation failed', { name, error });
        throw error;
      }
    }

    return results;
  }
}
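`Promise.race` only stops *waiting*: the losing operation keeps running in the background, which is why `cancelOperation` is left as a stub above. When the operation can accept an `AbortSignal` (as `fetch` does in browsers and Node.js 18+), cancellation can be made cooperative. A minimal sketch, assuming the operation honours the signal; `withAbortableTimeout` and the `slowTask` stand-in are hypothetical.

```typescript
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TimeoutError';
  }
}

// Run `operation` with an AbortSignal that fires after `timeoutMs`.
// The returned promise rejects with TimeoutError at the deadline even if the
// operation ignores the signal; cooperative operations also stop their work.
async function withAbortableTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  const timedOut = new Promise<never>((_, reject) => {
    controller.signal.addEventListener(
      'abort',
      () => reject(new TimeoutError(`Operation timed out after ${timeoutMs}ms`)),
      { once: true }
    );
  });
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await Promise.race([operation(controller.signal), timedOut]);
  } finally {
    clearTimeout(timer); // no abort once the operation has settled
  }
}

// Hypothetical stand-in for an abortable I/O call that takes `ms` to finish
function slowTask(ms: number, log: string[]) {
  return (signal: AbortSignal) =>
    new Promise<string>((resolve, reject) => {
      const work = setTimeout(() => {
        log.push('completed');
        resolve('done');
      }, ms);
      signal.addEventListener(
        'abort',
        () => {
          clearTimeout(work); // actually stop the work
          log.push('cancelled');
          reject(new Error('aborted'));
        },
        { once: true }
      );
    });
}
```

The same signal can be passed as `fetch(url, { signal })`, so an HTTP request is torn down instead of being left to finish unobserved.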

Pattern 5: Fallback Mechanisms

Provide alternative functionality when primary systems fail.

// Multi-level fallback strategy
class FallbackChain<T> {
  private strategies: Array<{
    name: string;
    execute: () => Promise<T>;
    // Returns false for unrecoverable errors that should stop the chain
    condition?: (error: any) => boolean;
  }> = [];

  addStrategy(
    name: string,
    execute: () => Promise<T>,
    condition?: (error: any) => boolean
  ) {
    this.strategies.push({ name, execute, condition });
    return this;
  }

  async execute(): Promise<T> {
    const errors: Array<{ strategy: string; error: any }> = [];

    for (const strategy of this.strategies) {
      try {
        logger.debug('Trying strategy', { name: strategy.name });
        const result = await strategy.execute();
        logger.info('Strategy succeeded', { name: strategy.name });
        return result;
      } catch (error: any) {
        errors.push({ strategy: strategy.name, error });

        // Check whether we should try the next strategy
        if (strategy.condition && !strategy.condition(error)) {
          logger.error('Strategy failed with an unrecoverable error', {
            strategy: strategy.name,
            error
          });
          throw error;
        }

        logger.warn('Strategy failed, trying the next one', {
          strategy: strategy.name,
          error: error?.message
        });
      }
    }

    // All strategies failed. Error objects serialize to {} with JSON.stringify,
    // so build the summary from their messages instead
    const summary = errors
      .map(({ strategy, error }) => `${strategy}: ${error?.message ?? String(error)}`)
      .join('; ');
    throw new Error(`All fallback strategies failed: ${summary}`);
  }
}

// Real-world example: loading a user profile
async function getUserProfile(userId: string): Promise<UserProfile> {
  const fallback = new FallbackChain<UserProfile>();

  return fallback
    .addStrategy('primary-db', async () => {
      // Try the primary database
      return await primaryDb.getUser(userId);
    })
    .addStrategy('replica-db', async () => {
      // Fall back to a read-only replica
      logger.warn('Using read replica for user profile');
      return await replicaDb.getUser(userId);
    })
    .addStrategy('cache', async () => {
      // Fall back to the cache (data may be stale)
      logger.warn('Using cached user profile');
      const cached = await cache.get(`user:${userId}`);
      if (!cached) throw new Error('Not in cache');
      return { ...cached, stale: true };
    })
    .addStrategy('default', async () => {
      // Last resort: return a minimal profile
      logger.error('Using default user profile');
      return {
        id: userId,
        name: 'User',
        avatar: '/default-avatar.png',
        limited: true
      };
    })
    .execute();
}
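Because each tier is just a function, the ordering of a fallback chain can be exercised with stubs instead of real databases. A reduced, self-contained sketch (hypothetical `firstSuccessful` helper and `Tier` type) that also reports which tier answered, which is the information a fallback-usage metric needs:

```typescript
interface Tier<T> {
  name: string;
  run: () => Promise<T>;
}

// Try each tier in order; return the first value together with the tier that produced it
async function firstSuccessful<T>(tiers: Tier<T>[]): Promise<{ source: string; value: T }> {
  const failures: string[] = [];
  for (const tier of tiers) {
    try {
      return { source: tier.name, value: await tier.run() };
    } catch (error) {
      failures.push(`${tier.name}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  throw new Error(`All fallbacks failed (${failures.join('; ')})`);
}

// Stubbed tiers: both databases are down, the cache still has a (stale) copy
const failing = (message: string) => async (): Promise<string> => {
  throw new Error(message);
};

const profileTiers: Tier<string>[] = [
  { name: 'primary-db', run: failing('primary unreachable') },
  { name: 'replica-db', run: failing('replica lagging') },
  { name: 'cache', run: async () => 'cached profile' }
];

firstSuccessful(profileTiers).then(r => console.log(r.source, r.value)); // cache cached profile
```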

Advanced Recovery Patterns

Self-Healing Systems

Implement automated recovery that works without human intervention.

// Self-healing service manager
interface Diagnosis {
  type: string;
  [detail: string]: any; // check-specific data, e.g. pool statistics
}

interface HealthCheck {
  isHealthy(): Promise<boolean>;
  diagnose(): Promise<Diagnosis>;
}

interface HealingStrategy {
  name: string;
  canHeal(diagnosis: Diagnosis): boolean;
  heal(diagnosis: Diagnosis): Promise<void>;
}

class SelfHealingService {
  private healthChecks = new Map<string, HealthCheck>();
  private healingStrategies = new Map<string, HealingStrategy>();
  private isHealing = false;

  registerHealthCheck(name: string, check: HealthCheck) {
    this.healthChecks.set(name, check);
  }

  registerHealingStrategy(name: string, strategy: HealingStrategy) {
    this.healingStrategies.set(name, strategy);
  }

  // Starts periodic checks (default: every 30 seconds) and returns the
  // interval handle so monitoring can be stopped with clearInterval
  monitorAndHeal(intervalMs = 30000) {
    return setInterval(async () => {
      if (this.isHealing) return;

      for (const [name, check] of this.healthChecks) {
        try {
          const isHealthy = await check.isHealthy();

          if (!isHealthy) {
            await this.attemptHealing(name, check);
          }
        } catch (error) {
          logger.error('Health check failed', { name, error });
        }
      }
    }, intervalMs);
  }

  private async attemptHealing(
    checkName: string,
    check: HealthCheck
  ) {
    this.isHealing = true;

    try {
      // diagnose() runs inside try, so a failure cannot leave isHealing stuck at true
      const diagnosis = await check.diagnose();

      logger.warn('Unhealthy service detected', {
        check: checkName,
        diagnosis
      });

      // Find a matching healing strategy
      const strategy = this.findHealingStrategy(diagnosis);

      if (strategy) {
        logger.info('Attempting self-healing', {
          check: checkName,
          strategy: strategy.name
        });

        await strategy.heal(diagnosis);

        // Give the fix a moment, then verify that healing worked
        await this.sleep(5000);
        const isHealthy = await check.isHealthy();

        if (isHealthy) {
          logger.info('Self-healing succeeded', {
            check: checkName,
            strategy: strategy.name
          });
        } else {
          logger.error('Self-healing failed', {
            check: checkName,
            strategy: strategy.name
          });
          // This is where you could escalate to alerting
        }
      }
    } finally {
      this.isHealing = false;
    }
  }

  private findHealingStrategy(diagnosis: Diagnosis): HealingStrategy | null {
    for (const [, strategy] of this.healingStrategies) {
      if (strategy.canHeal(diagnosis)) {
        return strategy;
      }
    }
    return null;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Example: healing a database connection pool
const dbHealing = new SelfHealingService();

dbHealing.registerHealthCheck('db-connections', {
  isHealthy: async () => {
    const stats = await db.getPoolStats();
    return stats.active < stats.max * 0.9 &&
           stats.waiting === 0;
  },
  diagnose: async () => {
    const stats = await db.getPoolStats();
    return {
      type: 'connection-exhaustion',
      stats,
      timestamp: new Date()
    };
  }
});

dbHealing.registerHealingStrategy('connection-recovery', {
  name: 'connection-recovery',
  canHeal: (diagnosis) => diagnosis.type === 'connection-exhaustion',
  heal: async (diagnosis) => {
    // Clear idle connections
    await db.clearIdleConnections();

    // Temporarily increase the pool size (pool sizes must be whole numbers)
    await db.setPoolSize(Math.ceil(diagnosis.stats.max * 1.5));

    // Schedule shrinking the pool back to its original size
    setTimeout(async () => {
      await db.setPoolSize(diagnosis.stats.max);
    }, 300000); // 5 minutes
  }
});

// Start the periodic check-and-heal loop
dbHealing.monitorAndHeal();
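A healing strategy that keeps firing can mask a real problem or make it worse, for example by repeatedly resizing a connection pool. One common safeguard is a budget: allow only a few automatic attempts per time window, then stop and escalate to a human, as the comment in `attemptHealing` suggests. A minimal sketch with a hypothetical `HealingBudget` class; timestamps are passed in so the logic is testable.

```typescript
// Hypothetical sliding-window budget for automatic healing attempts
class HealingBudget {
  private attempts: number[] = []; // timestamps (ms) of recent attempts

  constructor(private readonly maxAttempts: number, private readonly windowMs: number) {}

  // Returns true if another automatic attempt is allowed at time `now`;
  // false means "stop self-healing and escalate"
  tryConsume(now: number): boolean {
    this.attempts = this.attempts.filter(t => now - t < this.windowMs);
    if (this.attempts.length >= this.maxAttempts) {
      return false;
    }
    this.attempts.push(now);
    return true;
  }
}

const budget = new HealingBudget(3, 60_000); // at most 3 attempts per minute
console.log([0, 1_000, 2_000, 3_000].map(t => budget.tryConsume(t)).join(', ')); // true, true, true, false
console.log(budget.tryConsume(61_000)); // true: the attempts at 0 and 1000 ms have aged out
```

In `attemptHealing`, such a check would sit just before `strategy.heal(diagnosis)`: when `tryConsume(Date.now())` returns false, skip healing and raise an alert instead.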

Chaos Engineering Integration

Test recovery mechanisms proactively.

// Chaos engineering framework for testing recovery
interface ChaosExperiment {
  name: string;
  description: string;
  inject(): Promise<() => Promise<void>>; // injects the fault, returns a cleanup function
  verifyRecovery(): Promise<Record<string, boolean>>;
  observations: string[];
}

interface RunOptions {
  duration?: number; // ms to let the system react (default: 60 seconds)
}

class ChaosMonkey {
  private experiments = new Map<string, ChaosExperiment>();

  registerExperiment(experiment: ChaosExperiment) {
    this.experiments.set(experiment.name, experiment);
  }

  async runExperiment(name: string, options?: RunOptions) {
    const experiment = this.experiments.get(name);
    if (!experiment) {
      throw new Error(`Experiment ${name} not found`);
    }

    logger.info('Starting chaos experiment', { name });

    // Record the baseline state
    const initialMetrics = await this.captureMetrics();

    // Inject the failure
    const cleanup = await experiment.inject();

    try {
      // Let the system respond
      await this.sleep(options?.duration ?? 60000);

      // Measure the impact while the fault is still active
      const impactMetrics = await this.captureMetrics();

      // Verify that the recovery mechanisms worked
      const recoverySuccess = await experiment.verifyRecovery();

      return {
        experiment: name,
        initialMetrics,
        impactMetrics,
        recoverySuccess,
        observations: experiment.observations
      };
    } finally {
      // Always clean up, even if verification throws
      await cleanup();
    }
  }

  private async captureMetrics() {
    return {
      errorRate: await metrics.getErrorRate(),
      responseTime: await metrics.getResponseTime(),
      throughput: await metrics.getThroughput(),
      availability: await metrics.getAvailability()
    };
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Example experiment: test the circuit breaker
const chaosMonkey = new ChaosMonkey();

chaosMonkey.registerExperiment({
  name: 'payment-service-failure',
  description: 'Simulate a payment service outage',

  inject: async () => {
    // Make the payment service return errors
    await mockServer.setResponse('/api/payment', {
      status: 500,
      body: { error: 'Internal Server Error' }
    });

    // Return a cleanup function
    return async () => {
      await mockServer.resetResponse('/api/payment');
    };
  },

  verifyRecovery: async () => {
    // Check that the circuit breaker opened
    const circuitState = await getCircuitBreakerState('payment');

    // Check that the fallback was used
    const fallbackMetrics = await metrics.getFallbackUsage();

    return {
      circuitBreakerOpened: circuitState === 'OPEN',
      fallbackUsed: fallbackMetrics.count > 0,
      userExperienceMaintained: await checkUserExperience()
    };
  },

  observations: []
});
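The experiment above switches a mock server to return 500s for every call. A finer-grained alternative, sketched here under the assumption that you can wrap the client call in code, is to fail only a fraction of calls, which exercises retries and partial degradation rather than a full outage. `injectFaults` is a hypothetical helper; the random source is injectable so the example is deterministic.

```typescript
// Hypothetical fault injector: each call fails with probability `failureRate`
function injectFaults<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  failureRate: number,
  random: () => number = Math.random
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    if (random() < failureRate) {
      throw new Error('Injected fault (chaos experiment)');
    }
    return fn(...args);
  };
}

// Fixed "random" values so the outcome is predictable
const rolls = [0.1, 0.7, 0.3, 0.9];
let roll = 0;
const flakyDouble = injectFaults(async (x: number) => x * 2, 0.5, () => rolls[roll++]);

Promise.allSettled([1, 2, 3, 4].map(x => flakyDouble(x))).then(results =>
  console.log(results.map(r => r.status).join(', ')) // rejected, fulfilled, rejected, fulfilled
);
```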

Recovery Best Practices

Fast detection

Use health checks and synthetic monitoring
Set appropriate timeouts
Monitor error rates and latency
Alert on anomalies, not just thresholds

Graceful degradation

Identify core vs. optional features
Implement feature flags to switch features off quickly
Provide meaningful fallbacks
Communicate degradation to users

Automatic recovery

Implement self-healing mechanisms
Use circuit breakers to prevent cascading failures
Automate common recovery procedures
Test recovery paths regularly

Learn and improve

Run blameless post-mortems
Update runbooks based on incidents
Ship fixes that prevent recurrence
Share knowledge across teams
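"Alert on anomalies, not just thresholds" can be made concrete with a simple statistical check. A minimal sketch, assuming a z-score rule (flag a value more than `k` standard deviations away from a recent baseline) rather than any particular monitoring product; `isAnomalous` is a hypothetical helper and `k = 3` is only an illustrative default.

```typescript
// Flag `sample` if it lies more than `k` sample standard deviations from the baseline mean
function isAnomalous(baseline: number[], sample: number, k = 3): boolean {
  if (baseline.length < 2) return false; // not enough history to judge
  const mean = baseline.reduce((sum, x) => sum + x, 0) / baseline.length;
  const variance =
    baseline.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (baseline.length - 1);
  const std = Math.sqrt(variance);
  if (std === 0) return sample !== mean; // perfectly flat baseline: any change stands out
  return Math.abs(sample - mean) / std > k;
}

// Error rate (%) over the last seven intervals
const errorRateBaseline = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.0];
console.log(isAnomalous(errorRateBaseline, 1.3)); // false: about 2.3 standard deviations
console.log(isAnomalous(errorRateBaseline, 4.0)); // true, even though 4% would pass a static 5% threshold
```

A production setup would typically compute the baseline from a rolling window that excludes the interval being evaluated, so an ongoing incident does not inflate its own standard deviation.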

Monitoring Recovery Effectiveness

Track these metrics to measure how well recovery works:

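// Recovery metrics collector
class RecoveryMetrics {
  private metrics = {
    mttr: new Map<string, number[]>(), // recovery durations, for Mean Time To Recovery
    recoveryAttempts: new Map<string, number>(),
    recoverySuccess: new Map<string, number>(),
    fallbackUsage: new Map<string, number>(),
    circuitBreakerTrips: new Map<string, number>()
  };

  recordRecovery(service: string, duration: number, success: boolean) {
    // Track recovery durations for MTTR
    if (!this.metrics.mttr.has(service)) {
      this.metrics.mttr.set(service, []);
    }
    this.metrics.mttr.get(service)!.push(duration);

    // Track attempts and successes so a success rate can be computed
    this.increment(this.metrics.recoveryAttempts, service);
    if (success) {
      this.increment(this.metrics.recoverySuccess, service);
    }
  }

  recordFallback(service: string) {
    this.increment(this.metrics.fallbackUsage, service);
  }

  recordCircuitBreakerTrip(service: string) {
    this.increment(this.metrics.circuitBreakerTrips, service);
  }

  getReport(service: string) {
    const mttrValues = this.metrics.mttr.get(service) || [];
    // Avoid NaN when no recoveries have been recorded yet
    const avgMttr = mttrValues.length > 0
      ? mttrValues.reduce((a, b) => a + b, 0) / mttrValues.length
      : 0;

    return {
      averageMTTR: avgMttr,
      recoverySuccessRate: this.calculateSuccessRate(service),
      fallbackUsageCount: this.metrics.fallbackUsage.get(service) || 0,
      circuitBreakerTrips: this.metrics.circuitBreakerTrips.get(service) || 0
    };
  }

  private calculateSuccessRate(service: string): number {
    const attempts = this.metrics.recoveryAttempts.get(service) || 0;
    const successes = this.metrics.recoverySuccess.get(service) || 0;
    return attempts > 0 ? successes / attempts : 0;
  }

  private increment(map: Map<string, number>, key: string) {
    map.set(key, (map.get(key) || 0) + 1);
  }
}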
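As a worked example of what `getReport` summarizes, take four recoveries lasting 2, 5, 1 and 12 minutes, the last of which failed. The mean time to recovery is (2 + 5 + 1 + 12) / 4 = 5 minutes and the success rate is 3 / 4 = 75%. A single long incident pulls the mean well above the median of 3.5 minutes, so reporting both can be more informative. A sketch with hypothetical `summarizeRecoveries` and `RecoveryRecord` names:

```typescript
interface RecoveryRecord {
  durationMs: number;
  success: boolean;
}

// Summarize recoveries: mean time to recovery, median duration, and success rate
function summarizeRecoveries(records: RecoveryRecord[]) {
  if (records.length === 0) {
    return { mttrMs: 0, medianMs: 0, successRate: 0 };
  }
  const durations = records.map(r => r.durationMs).sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  const medianMs =
    durations.length % 2 === 0 ? (durations[mid - 1] + durations[mid]) / 2 : durations[mid];
  return {
    mttrMs: durations.reduce((sum, d) => sum + d, 0) / durations.length,
    medianMs,
    successRate: records.filter(r => r.success).length / records.length
  };
}

const minute = 60_000;
const summary = summarizeRecoveries([
  { durationMs: 2 * minute, success: true },
  { durationMs: 5 * minute, success: true },
  { durationMs: 1 * minute, success: true },
  { durationMs: 12 * minute, success: false }
]);
console.log(summary.mttrMs / minute, summary.medianMs / minute, summary.successRate); // 5 3.5 0.75
```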

Recovery Runbooks

Create automated runbooks for common failure scenarios:

// Automated recovery runbook
interface FailureContext {
  severity: string; // e.g. 'critical'
  [detail: string]: unknown;
}

interface StepResult {
  step?: string;
  success: boolean;
  skipped?: boolean; // the step decided it did not apply
  error?: unknown;
}

interface RecoveryStep {
  name: string;
  critical: boolean; // a real failure of a critical step aborts the runbook
  execute(context: FailureContext): Promise<StepResult>;
}

interface RecoveryResult {
  runbook: string;
  success: boolean;
  steps: StepResult[];
  timestamp: Date;
}

class RecoveryRunbook {
  constructor(
    private name: string,
    private steps: RecoveryStep[]
  ) {}

  async execute(context: FailureContext): Promise<RecoveryResult> {
    const results: StepResult[] = [];

    logger.info('Executing recovery runbook', {
      runbook: this.name,
      context
    });

    for (const step of this.steps) {
      let result: StepResult;
      try {
        logger.info('Executing recovery step', {
          step: step.name
        });

        result = { ...(await step.execute(context)), step: step.name };
      } catch (error) {
        logger.error('Recovery step threw an error', {
          step: step.name,
          error
        });
        // Record the failure so it is reflected in the overall result
        result = { step: step.name, success: false, error };
      }
      results.push(result);

      // Skipped steps are not failures; only a real failure of a critical step aborts
      if (!result.success && !result.skipped && step.critical) {
        logger.error('Critical step failed, aborting runbook', {
          step: step.name,
          result
        });
        break;
      }
    }

    return {
      runbook: this.name,
      success: results.every(r => r.success || r.skipped === true),
      steps: results,
      timestamp: new Date()
    };
  }
}

// Example: database recovery runbook
const dbRecoveryRunbook = new RecoveryRunbook('database-recovery', [
  {
    name: 'verify-connectivity',
    critical: true,
    execute: async (context) => {
      const canConnect = await db.testConnection();
      return { success: canConnect };
    }
  },
  {
    name: 'clear-connection-pool',
    critical: false,
    execute: async (context) => {
      await db.clearPool();
      return { success: true };
    }
  },
  {
    name: 'failover-to-replica',
    critical: true,
    execute: async (context) => {
      // Fail over only for critical incidents; otherwise report the step as skipped
      if (context.severity === 'critical') {
        await db.failoverToReplica();
        return { success: true };
      }
      return { success: false, skipped: true };
    }
  },
  {
    name: 'notify-oncall',
    critical: false,
    execute: async (context) => {
      await alerting.notifyOncall({
        severity: context.severity,
        runbook: 'database-recovery',
        context
      });
      return { success: true };
    }
  }
]);
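The control flow that matters most in a runbook is how skipped and failed steps are treated. In the example above, `failover-to-replica` returns `skipped: true` for non-critical incidents. A compact, self-contained sketch (hypothetical `runSteps` helper, and a hypothetical `restart-service` step) in which a skipped step is not a failure, a thrown error counts as a failure, and only a failed critical step stops the run:

```typescript
interface Outcome {
  step: string;
  success: boolean;
  skipped?: boolean;
}

interface Step {
  name: string;
  critical: boolean;
  run: () => Promise<{ success: boolean; skipped?: boolean }>;
}

async function runSteps(steps: Step[]): Promise<{ success: boolean; outcomes: Outcome[] }> {
  const outcomes: Outcome[] = [];
  for (const step of steps) {
    let outcome: Outcome;
    try {
      outcome = { step: step.name, ...(await step.run()) };
    } catch {
      outcome = { step: step.name, success: false }; // a thrown error is a failure
    }
    outcomes.push(outcome);
    // Only a real (not skipped) failure of a critical step aborts the run
    if (!outcome.success && !outcome.skipped && step.critical) break;
  }
  return { success: outcomes.every(o => o.success || o.skipped === true), outcomes };
}

// Stub steps: the failover step is skipped, then a critical step throws
const steps: Step[] = [
  { name: 'verify-connectivity', critical: true, run: async () => ({ success: true }) },
  { name: 'failover-to-replica', critical: true, run: async () => ({ success: false, skipped: true }) },
  { name: 'restart-service', critical: true, run: async () => { throw new Error('restart failed'); } },
  { name: 'notify-oncall', critical: false, run: async () => ({ success: true }) }
];

runSteps(steps).then(r => console.log(r.success, r.outcomes.map(o => o.step).join(', ')));
// false verify-connectivity, failover-to-replica, restart-service
```

With this rule `notify-oncall` never runs once a critical step aborts the run. If paging should always happen, it may be better placed outside the step loop.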

Next steps

Continue mastering recovery patterns with:

Debugging patterns - Find root causes quickly
Logging patterns - Track recovery effectiveness
Monitoring patterns - Detect failures early