Managing Node and Application Health in Azure Service Fabric

In large Service Fabric clusters, continuous health monitoring is critical. Whether you are operating hundreds of services or thousands of nodes, knowing their health state helps you maintain reliability and uptime.

🩺 What is "Health" in Service Fabric?

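Each node (a physical machine or VM) and each application (a group of deployed services) has a health state.
Health states can be: OK, Warning, or Error.
Service Fabric aggregates health reports from child entities into their parents, so by default a single failing replica surfaces in the health of its service, application, and cluster.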

Real-World Analogy:

Think of Service Fabric as a hospital managing patients (nodes and apps). Nurses (your monitoring tools) constantly check vital signs such as temperature and blood pressure (metrics). If anything goes wrong, alarms are raised!

🚀 How Service Fabric Monitors Health

Built-in Health Monitors: Service Fabric system components automatically report on node, application, partition, and replica health.
Custom Health Reports: Your services can send additional health information about conditions only they can see, such as a slow dependency.

Service Fabric Health Hierarchy:

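Cluster Health
 ├── Node Health
 └── Application Health
       ├── Service Health
       │     └── Partition Health
       │           └── Replica/Instance Health
       └── Deployed Application Health
             └── Deployed Service Package Health

Nodes and applications are siblings under the cluster; an application is not a child of a node.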
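
The roll-up from replica to application can be illustrated with a small model. This is a simplified sketch, not Service Fabric's actual evaluation logic: the Entity class and the entity names are hypothetical, and real evaluation also applies health policies (for example, a tolerated percentage of unhealthy children), which are not modeled here.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class HealthState(IntEnum):
    # Ordered by severity so max() picks the worst state.
    OK = 0
    WARNING = 1
    ERROR = 2


@dataclass
class Entity:
    """Hypothetical stand-in for a health entity (application, service, ...)."""
    name: str
    own_state: HealthState = HealthState.OK  # worst state reported on this entity itself
    children: List["Entity"] = field(default_factory=list)

    def aggregated_state(self) -> HealthState:
        # Worst of the entity's own state and every child's aggregated state.
        child_states = [child.aggregated_state() for child in self.children]
        return max([self.own_state] + child_states)


# Application -> Service -> Partition -> Replica (names are illustrative only).
replica = Entity("replica-1", HealthState.WARNING)
partition = Entity("partition-0", children=[replica])
service = Entity("fabric:/MyApp/MyService", children=[partition])
app = Entity("fabric:/MyApp", children=[service])

print(app.aggregated_state().name)
```

Because the only non-OK report sits on the replica, the application inherits that replica's state; changing the replica to ERROR would change the application's aggregated state as well.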

🛠️ How to View Node and Application Health

Method 1: Using Service Fabric Explorer (SFX)

Open your browser and navigate to: http://localhost:19080/Explorer (local) or your Azure cluster URL.
Expand the Cluster Map.
Click on Nodes → See health status: OK, Warning, or Error.
Expand Applications → View per-service health reports.

Method 2: Using PowerShell

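Connect to your cluster, then query health (NodeName and fabric:/MyApp are placeholders):

Connect-ServiceFabricCluster   # with no parameters, connects to the local cluster
Get-ServiceFabricClusterHealth
Get-ServiceFabricNodeHealth -NodeName NodeName
Get-ServiceFabricApplicationHealth -ApplicationName fabric:/MyApp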

Method 3: Using CLI (sfctl)

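Install sfctl (it is distributed as a Python package), select your cluster, then query health:

sfctl cluster select --endpoint http://localhost:19080
sfctl cluster health
sfctl node health --node-name NodeName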
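
Cluster health output is structured data, so it can be filtered by a script. The sketch below lists the nodes and applications that are not healthy. It assumes a JSON payload with NodeHealthStates and ApplicationHealthStates lists whose entries carry Name and AggregatedHealthState; the exact key casing is an assumption, so lookups are case-insensitive. The sample payload is made up for illustration.

```python
import json

UNHEALTHY = {"warning", "error"}


def _get(mapping, key, default=None):
    """Case-insensitive key lookup (key casing is an assumption, so accept either)."""
    for k, v in mapping.items():
        if k.lower() == key.lower():
            return v
    return default


def unhealthy_entities(cluster_health):
    """Return (kind, name, state) for every node or application not reporting Ok."""
    found = []
    sources = (("node", "NodeHealthStates"), ("application", "ApplicationHealthStates"))
    for kind, list_key in sources:
        for entry in _get(cluster_health, list_key, []) or []:
            state = str(_get(entry, "AggregatedHealthState", "Unknown"))
            if state.lower() in UNHEALTHY:
                found.append((kind, _get(entry, "Name"), state))
    return found


# Hypothetical sample shaped like a cluster health payload.
sample = json.loads("""
{
  "AggregatedHealthState": "Error",
  "NodeHealthStates": [
    {"Name": "_Node_0", "AggregatedHealthState": "Ok"},
    {"Name": "_Node_1", "AggregatedHealthState": "Warning"}
  ],
  "ApplicationHealthStates": [
    {"Name": "fabric:/MyApp", "AggregatedHealthState": "Error"}
  ]
}
""")

for kind, name, state in unhealthy_entities(sample):
    print(f"{kind} {name}: {state}")
```

In practice the sample could be replaced by JSON read from standard input, for example by piping the cluster health command's output into the script and calling json.load(sys.stdin).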

🛠️ How to Send a Custom Health Report

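In your service code, you can manually report an error or warning. For example, from inside a Reliable Service class (where this.Partition is available):

using System.Fabric.Health;

// The source ID identifies the reporter; the property names the condition being reported.
var healthInformation = new HealthInformation("MyApp", "DatabaseLatency", HealthState.Warning)
{
    Description = "Database response time high."
};

// ReportPartitionHealth is synchronous: there is no Async variant to await.
this.Partition.ReportPartitionHealth(healthInformation);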

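Custom health reports surface conditions that only your service knows about. Once the condition clears, send an Ok report with the same source ID and property; otherwise the Warning stays in place.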

💡 Did You Know?

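Service Fabric does not restart or scale a service just because it reports Warning or Error health. Health does gate monitored upgrades, though: if health checks fail during a monitored application or cluster upgrade, Service Fabric can automatically roll the upgrade back!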

⚡ Common Monitoring Problems and Solutions

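Problem: Health report is not visible.
Solution: Check that your service code actually reaches the reporting call (for example ReportPartitionHealth), and that the report has not expired or been replaced by a newer report with the same source ID and property.
Problem: Node suddenly goes to "Error".
Solution: Review the node's resource usage (memory, disk, CPU) and inspect its health events; overload is a common cause of health transitions.
Problem: Frequent Warnings clutter the dashboard.
Solution: Implement smarter thresholds before raising health warnings.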
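
One way to implement a "smarter threshold" is to debounce: raise a Warning only after several consecutive bad samples, and clear it only after several consecutive good ones. The sketch below shows the idea in isolation. The class name, the 200 ms limit, and the streak lengths are illustrative assumptions, and the point where a state change is returned is where a service would send its health report.

```python
class DebouncedThreshold:
    """Change state only after enough consecutive samples disagree with it."""

    def __init__(self, limit_ms, trip_after=3, clear_after=3):
        self.limit_ms = limit_ms
        self.trip_after = trip_after    # consecutive breaches needed to raise Warning
        self.clear_after = clear_after  # consecutive good samples needed to return to Ok
        self.state = "Ok"
        self._streak = 0  # consecutive samples that disagree with the current state

    def observe(self, latency_ms):
        """Record one sample; return the new state if it changed, else None."""
        breached = latency_ms > self.limit_ms
        disagrees = breached if self.state == "Ok" else not breached
        self._streak = self._streak + 1 if disagrees else 0
        needed = self.trip_after if self.state == "Ok" else self.clear_after
        if self._streak >= needed:
            self.state = "Warning" if self.state == "Ok" else "Ok"
            self._streak = 0
            return self.state  # state changed: this is when a health report would be sent
        return None  # no change: nothing to report


monitor = DebouncedThreshold(limit_ms=200)
samples = [250, 120, 260, 270, 300, 150, 140, 130]
changes = [monitor.observe(s) for s in samples]
print(changes)
```

The single spike at the start of the sample series never produces a report; only the run of three slow samples does, and the Warning is cleared after three fast samples in a row.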

🚨 Best Practices

Use both built-in and custom health reporting together for full coverage.
Keep your health report descriptions meaningful and actionable.
Set up alerts in Azure Monitor based on Service Fabric health events.

✅ Self-Check Quiz

What are the three possible health states?
Where do you view health status inside Service Fabric Explorer?
Which .NET method reports partition health from your service code?
