Introduction: Why Your .NET API Feels Sluggish
Modern .NET APIs are incredibly powerful, yet many teams inadvertently introduce performance bottlenecks that degrade user experience and increase infrastructure costs. This guide examines six common pitfalls we've observed across numerous projects, from startups to enterprise systems. Each section follows a problem-solution format, explaining why the issue occurs, how to diagnose it, and concrete steps to fix it. By the end, you'll have a practical checklist to audit your own APIs.
One recurring pattern is that developers often trust framework defaults or follow patterns that worked in older versions of .NET without adapting to modern best practices. For example, the shift from synchronous to asynchronous programming introduced subtle traps around thread pool starvation and deadlocks. Similarly, the move to JSON-based APIs exposed serialization inefficiencies that were less noticeable with XML or binary formats.
Throughout this article, we'll use composite scenarios drawn from common experiences: a team building a high-throughput order processing service, another optimizing a reporting endpoint that aggregates data from multiple sources, and a third dealing with a legacy API that gradually slowed down as traffic grew. These stories illustrate how the same pitfalls manifest in different contexts and how targeted fixes can yield dramatic improvements.
Before diving in, a word on methodology: the recommendations here reflect practices widely adopted by the .NET community as of May 2026. Always verify critical decisions against current Microsoft documentation and your specific workload characteristics. Now, let's explore the first pitfall: the sync-over-async antipattern.
1. Sync-Over-Async: The Silent Thread Killer
One of the most pervasive performance killers in modern .NET APIs is the sync-over-async antipattern: blocking a thread while it waits for asynchronous work to finish. It usually shows up as .Result or .Wait() on a task, or as an action method that is marked async but still calls blocking operations internally. The result is thread pool starvation, increased latency, and reduced throughput.
Why This Hurts
When you block a thread in an async context, you tie up a thread that could have served other requests. Under load, the thread pool has to inject more threads, which it does only gradually, and each extra thread adds context-switching and memory overhead. In extreme cases, the API becomes unresponsive because every thread is blocked waiting for I/O to complete.
A common scenario: a developer writes an action method that calls an asynchronous database API and blocks on it with .Result. The call returns a task, but .Result holds the current thread until the database responds, and during that time the thread cannot process any other request. If many requests arrive at once, work items queue up behind the blocked threads and response times climb.
How to Fix
The fix is straightforward: use async all the way down. Ensure every method in the call chain returns Task or Task<T> and is awaited. For libraries that only offer sync APIs, Task.Run is sometimes used to move the blocking call off the current thread, but in ASP.NET Core it simply blocks a different thread from the same pool, so it is a workaround at best and not a solution. Ideally, use async-compatible libraries.
For example, instead of:
    public IActionResult GetData()
    {
        var result = _db.Query().Result;
        return Ok(result);
    }
Write:
    public async Task<IActionResult> GetData()
    {
        var result = await _db.QueryAsync();
        return Ok(result);
    }
Diagnosing the Issue
Use tools like Application Insights or dotTrace to monitor thread pool utilization. If the thread pool thread count climbs well above the number of CPU cores while latency rises under load, sync-over-async is a likely cause. Another sign: the API becomes unresponsive during peak traffic but recovers when load decreases.
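If a profiler is not at hand, the runtime's own thread pool counters give a rough first signal. The following is a minimal sketch, not a full diagnostic: ThreadPoolProbe is a hypothetical helper name of ours, and it only reads properties that System.Threading.ThreadPool exposes on .NET Core 3.0 and later.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not from the article): a one-line view of thread pool health.
public static class ThreadPoolProbe
{
    public static string Snapshot()
    {
        ThreadPool.GetAvailableThreads(out int freeWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);

        // threads: threads currently in the pool
        // queued:  work items waiting for a free thread
        return $"threads={ThreadPool.ThreadCount} " +
               $"queued={ThreadPool.PendingWorkItemCount} " +
               $"busyWorkers={maxWorkers - freeWorkers} " +
               $"cores={Environment.ProcessorCount}";
    }
}
```

Logging this snapshot every few seconds during a load test is usually enough to tell whether the queue keeps growing while the thread count sits far above the core count, which is the pattern described above.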
In a composite scenario, a team supporting a payment processing API noticed that latency spiked from 50ms to over 2 seconds during lunch hours. By profiling, they discovered a third-party SDK that only offered synchronous HTTP calls. They switched to an async-compatible SDK and latency dropped back to normal within days.
Remember that async is not free—it adds a small overhead per method call due to state machine creation. However, the benefit of freeing threads far outweighs this cost under load. For extremely high-throughput endpoints, consider using ValueTask to reduce allocations, but start with standard async patterns.
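To make the ValueTask suggestion concrete, here is a small sketch under stated assumptions: PriceLookup, its in-memory dictionary, and the Task.Delay stand-in for I/O are all hypothetical. It shows the one case where ValueTask typically pays off, a method that often completes synchronously.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical example type: returns cached prices without allocating a Task.
public sealed class PriceLookup
{
    private readonly ConcurrentDictionary<int, decimal> _cache = new();

    public ValueTask<decimal> GetPriceAsync(int productId)
    {
        // Cache hit: wrap the value directly, no Task object is created.
        if (_cache.TryGetValue(productId, out decimal cached))
            return new ValueTask<decimal>(cached);

        // Cache miss: fall back to the regular async path.
        return new ValueTask<decimal>(LoadAndCacheAsync(productId));
    }

    private async Task<decimal> LoadAndCacheAsync(int productId)
    {
        await Task.Delay(10);              // stand-in for a real I/O call
        decimal price = productId * 1.5m;  // placeholder value for the sketch
        _cache[productId] = price;
        return price;
    }
}
```

A ValueTask should be awaited exactly once and not stored for later reuse, which is one reason to start with plain Task and switch only where profiling shows the allocation matters.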
Transitioning to the next pitfall, let's examine how excessive memory allocations can silently degrade performance over time.
2. Excessive Allocations: The Hidden Tax on Throughput
Every heap allocation in .NET adds to the work the garbage collector (GC) must eventually do, and collections can pause application threads. While the modern GC is efficient, high allocation rates, especially in hot paths, can cause frequent Gen0 collections that degrade throughput and worsen tail latency. In .NET APIs, common culprits include string concatenation, LINQ overhead, and boxing of value types.
Why Allocations Matter
Allocating memory is fast, but collecting it is not. When the GC runs, it suspends managed threads to compact memory. For server workloads, even a few milliseconds of pause per collection can add up under high request rates. Moreover, allocations increase memory pressure, leading to more frequent collections and potential out-of-memory conditions.
A typical example: an API endpoint that processes a list of orders and returns a summary string. Using string.Join with a list of strings is fine, but using + in a loop creates many intermediate strings that must be collected. Similarly, using LINQ's .ToList() unnecessarily creates new list objects that could be avoided with streaming.
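The difference between the two string-building styles is easy to see side by side. This is an illustrative sketch only; OrderSummary and its method names are hypothetical, and both methods produce the same text as string.Join(", ", ids).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical example type comparing two ways to build the same summary.
public static class OrderSummary
{
    // Each += creates a new intermediate string that the GC must later collect.
    public static string WithConcat(IReadOnlyList<string> ids)
    {
        string summary = "";
        for (int i = 0; i < ids.Count; i++)
        {
            if (i > 0) summary += ", ";
            summary += ids[i];
        }
        return summary;
    }

    // Appends into one growable buffer; the final string is created once.
    public static string WithBuilder(IReadOnlyList<string> ids)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < ids.Count; i++)
        {
            if (i > 0) builder.Append(", ");
            builder.Append(ids[i]);
        }
        return builder.ToString();
    }
}
```

For a handful of items the difference is negligible; it starts to matter when the loop runs hundreds of times per request on a hot path.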
How to Fix
Use StringBuilder for complex string building, prefer ArrayPool<T> for temporary buffers, and avoid LINQ in performance-critical sections if it creates hidden allocations (e.g., .OrderBy). For value types, implement IEquatable<T> to avoid boxing when used in dictionaries or LINQ queries.
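As one illustration of the ArrayPool<T> advice, the sketch below rents a scratch buffer instead of allocating a new array on every call. BufferDemo and the sum-of-squares workload are hypothetical and deliberately trivial; the rent, use, and return pattern is the point.

```csharp
using System;
using System.Buffers;

// Hypothetical example type showing the rent / use / return pattern.
public static class BufferDemo
{
    public static long SumOfSquares(ReadOnlySpan<int> values)
    {
        // The rented array may be longer than requested, so only the
        // first values.Length slots are used.
        int[] scratch = ArrayPool<int>.Shared.Rent(values.Length);
        try
        {
            for (int i = 0; i < values.Length; i++)
                scratch[i] = values[i] * values[i];

            long total = 0;
            for (int i = 0; i < values.Length; i++)
                total += scratch[i];
            return total;
        }
        finally
        {
            // Always return the buffer, even if the work above throws.
            ArrayPool<int>.Shared.Return(scratch);
        }
    }
}
```

Rented buffers are not cleared by default, so code that reads beyond what it wrote will see stale data from a previous renter.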
Another powerful technique is to use Span<T> and Memory<T> for slice operations without allocating. For example, parsing a CSV line can be done with MemoryExtensions methods that operate on spans, avoiding substring allocations.
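A rough sketch of the span-based CSV idea follows. CsvSum is a hypothetical helper that assumes every field is an integer; a real parser would also need to handle quoting and empty fields, which this one does not.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: sums comma-separated integers without creating substrings.
public static class CsvSum
{
    public static int SumFields(ReadOnlySpan<char> line)
    {
        int total = 0;
        while (!line.IsEmpty)
        {
            int comma = line.IndexOf(',');

            // Slice is a view over the original characters, not a copy.
            ReadOnlySpan<char> field = comma < 0 ? line : line.Slice(0, comma);
            total += int.Parse(field.Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture);

            // Advance past the comma, or finish if this was the last field.
            line = comma < 0 ? ReadOnlySpan<char>.Empty : line.Slice(comma + 1);
        }
        return total;
    }
}
```

Compared with line.Split(','), this version creates no string array and no per-field strings, at the cost of slightly more manual index handling.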
Consider a reporting endpoint that aggregates sales data from multiple sources. The original code used ConcurrentDictionary with string keys and LINQ to group results. Profiling revealed that 30% of CPU time was spent in GC. By replacing LINQ with loops and using ArrayPool for temporary lists, allocations dropped by 80%, and throughput doubled.
Diagnosing Allocation Issues
Use dotMemory or PerfView to capture allocation snapshots. Look for high allocation rates in hot paths. In Visual Studio, the Diagnostic Tools window shows GC activity. A rule of thumb: if your API allocates more than 10 MB per request, investigate. For high-throughput services, aim for near-zero allocations in critical paths.
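For a quick check before reaching for a profiler, the runtime can report how many bytes the current thread has allocated. The helper below is a minimal sketch with a hypothetical name (AllocationMeter); it only measures synchronous work on the calling thread, so allocations made on other threads or after an await that resumes elsewhere are not counted.

```csharp
using System;

// Hypothetical helper: measures bytes allocated by a synchronous code path.
public static class AllocationMeter
{
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Wrapping a suspect method in AllocationMeter.Measure inside a unit test gives a cheap regression guard: if the number jumps after a refactoring, it is worth opening the profiler.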
It's also worth noting that the .NET runtime has improved significantly with each version. For example, .NET 6 added source generation to System.Text.Json, which cuts reflection and allocation overhead during serialization. But don't rely solely on runtime improvements: profile and fix proactively.
Now, let's move to the third pitfall: inefficient JSON serialization, a common pain point in REST APIs.
3. Inefficient JSON Serialization: The Cost of Reflection
JSON serialization is a core part of any modern API, yet it's often a performance bottleneck. By default, the System.Text.Json serializer uses reflection to discover each type's properties at run time, which adds CPU and startup overhead. For high-throughput endpoints, this can become a significant CPU consumer. Additionally, serialization of large payloads or deeply nested objects can exacerbate memory pressure.
Why Reflection Slows You Down
Reflection involves runtime type inspection and method invocation, which is slower than direct code. While System.Text.Json caches reflection results, the first serialization for each type still pays a startup cost. More importantly, reflection-based serialization cannot be optimized by the JIT compiler as effectively as hand-written code.
A common example: an API endpoint that returns a list of 10,000 items with many nested objects. With reflection-based serialization, the CPU spends a large fraction of time enumerating properties and writing them. The response size also matters—large JSON strings require more memory and bandwidth.
Another scenario: using Newtonsoft.Json (Json.NET) for legacy reasons. While it's flexible, it's generally slower and more allocation-heavy than System.Text.Json. Teams migrating to newer .NET versions often keep Json.NET for compatibility, inadvertently leaving performance on the table.
How to Fix
Enable source generators in System.Text.Json by creating a partial class that derives from JsonSerializerContext. This generates compile-time serialization code, eliminating reflection entirely. For example:
    [JsonSerializable(typeof(MyModel))]
    public partial class MyJsonContext : JsonSerializerContext { }
Then serialize using JsonSerializer.Serialize(myModel, MyJsonContext.Default.MyModel). This approach can double serialization throughput in some benchmarks.
If you are still on Json.NET, consider migrating to System.Text.Json with source generation. For cases where the JSON structure is dynamic, you can write with Utf8JsonWriter directly for maximum performance, at the cost of more complex code.
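To give a feel for the Utf8JsonWriter approach, here is a small sketch that writes a dynamic set of fields straight to UTF-8. DynamicJson, the "count" and "fields" property names, and the dictionary input are all hypothetical choices for illustration.

```csharp
using System.Buffers;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical example: writes {"count":N,"fields":{...}} without any model type.
public static class DynamicJson
{
    public static string Write(IReadOnlyDictionary<string, int> fields)
    {
        var buffer = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(buffer))
        {
            writer.WriteStartObject();
            writer.WriteNumber("count", fields.Count);

            writer.WriteStartObject("fields");
            foreach (var pair in fields)
                writer.WriteNumber(pair.Key, pair.Value);
            writer.WriteEndObject();

            writer.WriteEndObject();
        } // disposing the writer flushes any pending bytes into the buffer

        // Converted to a string here only so the result is easy to inspect.
        return Encoding.UTF8.GetString(buffer.WrittenSpan);
    }
}
```

In an actual endpoint you would more likely point the writer at the response stream and skip the string conversion, since the bytes are already in the wire format.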
Other Serialization Optimizations
Reduce payload size by using JsonPropertyName to shorten property names, or consider using Protocol Buffers for internal services. For public APIs, use compression (e.g., Brotli) to reduce wire size. Also, avoid serializing large binary data as Base64—use binary endpoints instead.
In a composite scenario, a team supporting a product catalog API found that serialization accounted for 40% of CPU time. They migrated to source-generated serialization and shortened property names, cutting CPU usage by 25% and reducing average response time from 200ms to 150ms.
Next, we'll tackle improper caching, a pitfall that can either save or destroy performance depending on implementation.
4. Improper Caching: When Your Cache Becomes a Liability
Caching is a powerful performance optimization, but done incorrectly, it can lead to stale data, memory bloat, or even increased latency due to cache overhead. Common mistakes include caching too much, caching too little, using inappropriate eviction policies, or not considering cache stampede (thundering herd) scenarios.
Why Caching Goes Wrong
One classic error is using an in-memory cache (like IMemoryCache) without size limits. Over time, the cache grows unbounded, consuming memory and causing GC pressure. Another mistake is caching data that changes frequently, resulting in constant invalidation and re-caching, which adds overhead without benefit.
Cache stampede occurs when many requests miss the cache simultaneously and all try to recompute the same expensive result. This can overwhelm the backend database or service. A common example: an API endpoint that computes a daily report. When the cache expires at midnight, hundreds of users request the report, each triggering a database query.
In distributed systems, using local in-memory cache for each instance leads to inconsistency. One instance may have stale data while another has fresh data. The solution is a distributed cache like Redis, but that introduces network latency.
How to Fix
First, apply a caching strategy that matches your data characteristics. For data that changes infrequently (e.g., configuration, product categories), use a long TTL and proactive invalidation. For data that changes frequently (e.g., user sessions), consider a sliding expiration and limit cache size.
To prevent cache stampede, make sure only one caller rebuilds a missing entry while the others wait for its result, or use the "early expiration" pattern, in which a single refresh recomputes the value shortly before it expires. Within one process, SemaphoreSlim can serialize concurrent cache builds; across instances, a distributed lock plays the same role.
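The SemaphoreSlim idea can be sketched in a few lines. SingleFlightCache is a hypothetical type of ours, not a framework class, and it deliberately leaves out expiration and size limits so the locking logic stays visible; it only protects a single process.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: at most one caller per key runs the expensive factory.
public sealed class SingleFlightCache<TValue>
{
    private readonly ConcurrentDictionary<string, TValue> _values = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _gates = new();

    public async Task<TValue> GetOrCreateAsync(string key, Func<Task<TValue>> factory)
    {
        // Fast path: the value is already cached, no locking needed.
        if (_values.TryGetValue(key, out var hit)) return hit;

        SemaphoreSlim gate = _gates.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            // Re-check: another caller may have filled the entry while we waited.
            if (_values.TryGetValue(key, out hit)) return hit;

            TValue created = await factory();
            _values[key] = created;
            return created;
        }
        finally
        {
            gate.Release();
        }
    }
}
```

With this in place, a burst of requests for the same missing key results in one backend call, and the remaining callers pick up the cached value after a short wait on the semaphore.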
For distributed caching, use IDistributedCache with Redis and serialize data efficiently. Consider using HybridCache (available in .NET 9+) that combines local and distributed caching with stampede protection.
Diagnosing Cache Issues
Monitor cache hit ratios. A very high ratio (>95%) might indicate you're caching too aggressively and serving stale data. A low ratio (