Your C# Collections Are Leaking: Common Performance Pitfalls and How to Plug Them

Every C# developer has written code that works fine in small tests but slows to a crawl under real load. Often the culprit isn't bad algorithms—it's how we use collections. The list you chose because it was familiar, the LINQ query you wrote without thinking about allocations, the dictionary you left with a default comparer—these seemingly innocent choices can silently leak performance. This guide walks through the most common collection pitfalls in C# and shows how to fix them without over-engineering.

Where Collection Leaks Show Up in Real Work

Collection performance problems rarely appear in unit tests or during local development. They emerge under production load, and they tend to cluster in a few predictable places. The most common is high-traffic API endpoints that return lists of entities. A typical controller action might fetch a set of records, filter them with LINQ, and materialize the result into a List<T>. If that endpoint handles thousands of requests per second, the repeated allocations and GC pressure become a bottleneck.

Another hotspot is batch processing pipelines. When you read a file line by line and accumulate results into a collection that grows unpredictably, the internal array resizing costs add up. A team I once worked with had a CSV import job that took 45 minutes for a million-row file. Replacing the intermediate List<T> with a pre-sized array and a separate index variable cut the time to under 10 minutes—no algorithm change, just better collection hygiene.

UI-thread bottlenecks are a third common site. WPF or WinForms applications that populate ListBox or DataGrid items from a collection built on the UI thread often cause visible lag. The culprit is often a naive search or sort operation on the collection inside a property getter or an event handler. Moving that logic to a background thread and using a more appropriate collection type (like a SortedSet or a Dictionary for lookups) can make the UI feel responsive again.

Why These Leaks Are Hard to Spot

The problem is that most performance analysis tools highlight CPU hotspots, not allocation patterns. A method that spends 80% of its time in garbage collection won't show up as a hot function—it will show up as high GC overhead in a memory profiler. Developers often miss this because they profile CPU first. The collection leak is a memory problem that masquerades as a CPU problem.

Additionally, collection leaks are cumulative. A single extra allocation per request might add only a few microseconds, but under 10,000 requests per second, that's 10 million extra allocations per second. The garbage collector eventually kicks in, and the pause times ruin throughput. The leak isn't a bug; it's a slow erosion of performance that's easy to ignore until it becomes a crisis.

Foundations Readers Confuse

Many C# developers conflate algorithmic complexity with real-world performance. They memorize that Dictionary lookup is O(1) and List lookup is O(n), but they forget that O(1) includes the cost of hashing, bucket lookup, and potential collision resolution. For small collections (fewer than 20 items), a List's linear scan can actually be faster than a Dictionary's hash computation. The constant factors matter.

Another common confusion is between value type and reference type collections. A List<int> stores the integers inline, so iterating it is cache-friendly. But a List<SomeClass> stores references, and the actual objects may be scattered across memory. Iterating that list causes cache misses, which are orders of magnitude slower than the iteration itself. Teams often benchmark the iteration without accounting for memory locality, then wonder why their real-world performance doesn't match the numbers.

The Misunderstood Role of Capacity

Every dynamic collection (List, Dictionary, HashSet) has an internal array that grows when the collection exceeds its capacity. The growth policy doubles the size, which means the last resize is wasted if you know the final size in advance. Setting the initial capacity to the expected number of items can eliminate all resizing overhead. Yet many developers leave the default capacity, even when they know the approximate size. This is one of the simplest and most effective optimizations, yet it's routinely ignored.

There's also confusion about when to use arrays versus lists. Arrays have fixed size but offer the best performance for indexed access and iteration. Lists add dynamic growth and convenience methods like Add and Remove, but they come with a small overhead for bounds checking and method dispatch. In hot paths, an array can be 10–20% faster than a list for simple iteration. The rule of thumb: if the size is known and fixed, use an array; if you need to add or remove items, use a list but pre-size it.

Patterns That Usually Work

When you need temporary buffers for parsing or serialization, System.Buffers.ArrayPool<T> can dramatically reduce allocation. Instead of allocating a new array each time, you rent one from the pool and return it when done. This eliminates allocation overhead for short-lived buffers. The pattern is especially effective in high-throughput scenarios like web servers or message processors. Just remember to return the array to the pool in a finally block to avoid leaks.

For membership checks—testing whether an item exists in a set—HashSet<T> is the standard choice. But its performance depends on a good hash code. If your custom type has a poor GetHashCode implementation (e.g., returning a constant), the HashSet degenerates into a list. Always override GetHashCode and Equals together, and ensure the hash code is well-distributed. For small sets (fewer than 10 items), a simple List with linear search might be faster due to the overhead of hashing.

Sorted Collections for Range Queries

If you need to retrieve items in sorted order or perform range queries (e.g., all items between two dates), a SortedList or SortedDictionary can be more efficient than sorting a list after each insertion. SortedList uses less memory and has faster indexed access, but insertion is O(n) because it shifts elements. SortedDictionary uses a tree and has O(log n) insertion, but uses more memory. The choice depends on your read-to-write ratio. If you insert rarely and read often, SortedList wins. If you insert frequently, SortedDictionary is better.

Another pattern that works well is using ConcurrentDictionary for thread-safe caching. Many teams try to use a regular Dictionary with locks, but ConcurrentDictionary is optimized for concurrent access with fine-grained locking. It also provides atomic operations like GetOrAdd and AddOrUpdate that eliminate race conditions without external synchronization. The key is to use the right overload—GetOrAdd with a value factory is convenient, but it may execute the factory multiple times under contention. If that's a problem, use TryAdd and fall back to a separate retrieval.

Anti-Patterns and Why Teams Revert

The most common anti-pattern is using List<T> for everything. It's the default collection that everyone learns first, and it's versatile. But in hot paths, its Add method causes array resizing, its RemoveAt is O(n), and its Contains is O(n). Teams often start with a List, and by the time performance becomes a problem, the code is so tangled that replacing it with a more appropriate collection requires a big refactor. The result: they leave it as is and add caching or hardware to compensate.

Another anti-pattern is overusing LINQ in tight loops. LINQ queries create enumerator objects, and chaining multiple operations (Where, Select, OrderBy) creates nested enumerators and intermediate allocations. In a loop that runs millions of times, these allocations add up. The fix is to materialize the query once outside the loop or replace LINQ with a plain foreach loop. Many teams resist this because LINQ is more readable, but in performance-critical code, readability must sometimes yield to efficiency.

The Custom Comparer Trap

When using Dictionary or HashSet with custom types, developers often write a comparer that boxes or allocates. For example, a comparer that calls ToString() on each key creates a new string every time the hash is computed. Or a comparer that uses reflection to compare properties is extremely slow. The fix is to implement IEquatable<T> on the type itself and override GetHashCode with a simple, allocation-free computation. If you must use a comparer, make it a struct to avoid heap allocation.

Teams also revert to using ArrayList or Hashtable from the non-generic collections, often because they're working with legacy code or they need to store mixed types. These collections box every value type, causing massive allocation overhead. The modern equivalent is List<object> or Dictionary<object, object>, but even those box. The better approach is to use a generic collection with a common base type or interface, or to use a dynamic type like dynamic if you're in a truly heterogeneous scenario.

Maintenance, Drift, and Long-Term Costs

Collection performance doesn't stay constant over the life of a project. As features are added, collections grow in size or are used in new ways that the original developer didn't anticipate. A Dictionary that was sized for 100 items might now hold 10,000, and the hash collisions that were negligible become a problem. The team that originally chose the collection may no longer be around, and the maintenance developers don't question the choice.

Another form of drift is the gradual addition of logging or metrics inside collection operations. A team adds a line to log every time an item is added to a list, and suddenly the Add method is doing I/O. Or they add a try-catch around a dictionary lookup to log missing keys, and the exception handling overhead kills performance. These changes seem harmless in isolation, but they compound over time.

The Hidden Cost of Default Comparers

When you create a Dictionary<string, string> without specifying a comparer, it uses the default string comparer, which is culture-sensitive. That means every lookup involves culture-specific rules, which are slower than ordinal comparison. In most scenarios, you want StringComparer.Ordinal or OrdinalIgnoreCase. The same applies to other types that implement IComparable with culture sensitivity. This is a silent performance leak that only shows up when you profile string operations.

Long-term, the cost of not auditing collections is technical debt that accumulates interest. A single collection that causes a 5% performance degradation might not be noticeable, but ten such collections across the codebase can make the entire application feel sluggish. The maintenance team ends up spending time on performance tuning instead of feature development. The best time to fix collection leaks is when the code is written; the second best time is during a targeted performance review.

When Not to Use This Approach

Not every piece of code needs optimized collections. In a desktop application that processes a few hundred records on a button click, the overhead of a List is negligible. Spending time to replace it with a pre-sized array or a HashSet is premature optimization. The rule of thumb: if the collection is used in a loop that runs fewer than 10,000 times and the total time is under a millisecond, leave it alone.

Similarly, if you're writing a prototype or a short-lived script, convenience trumps performance. Use List<T> and LINQ freely; the code will be easier to read and modify. Only when the code enters production and profiling shows a hotspot should you optimize. The danger is optimizing too early, which leads to complex code that's harder to maintain.

Avoid Over-Abstraction

Another case where collection optimization backfires is when you abstract the collection behind an interface like IList<T> or IEnumerable<T>. While interfaces are good for testability, they hide the concrete type and prevent the JIT from inlining methods. In hot paths, calling Add on an IList<T> is slower than calling Add on a List<T> because of virtual dispatch. If you control the implementation, use the concrete type in internal code and reserve interfaces for public APIs.

Finally, don't use concurrent collections on single-threaded code. ConcurrentDictionary adds overhead for atomic operations and memory barriers that you don't need if only one thread accesses the collection. Use a regular Dictionary with a lock if you need thread safety, or better yet, use immutable collections if you're in a functional style. The concurrent collections are designed for high-contention scenarios; using them unnecessarily is a performance leak in itself.

Open Questions / FAQ

Should I always set initial capacity?

If you know the approximate number of items, yes. For List, set capacity to the expected count. For Dictionary and HashSet, set capacity to the expected count divided by the load factor (default 0.72). This avoids resizing. If you don't know the size, don't guess—the default is fine.

Is Span<T> always faster than arrays?

Span<T> is a stack-allocated reference type that avoids heap allocation and provides bounds checking. It's faster for slicing and indexing on the stack, but if you need to store the data long-term, you'll still need an array or a memory pool. Use Span<T> for temporary operations on contiguous memory.

How do I choose between SortedList and SortedDictionary?

Use SortedList when you need indexed access (by position) and memory is tight. Use SortedDictionary when you need fast insertions and deletions. Both are sorted, but SortedList is better for read-heavy scenarios, while SortedDictionary is better for write-heavy ones.

What about ImmutableArray vs. ImmutableList?

ImmutableArray is a wrapper around an array and is very fast for reads, but creating a modified copy is O(n). ImmutableList uses a tree structure, so modifications are O(log n) but reads are slower. Use ImmutableArray for data that changes rarely; use ImmutableList for data that changes frequently but is accessed often.

Can I use a List as a queue or stack?

Yes, but it's not efficient. For a queue, use Queue<T>; for a stack, use Stack<T>. These are optimized for their respective operations. Using List with RemoveAt(0) for a queue is O(n) per dequeue, which is terrible for large collections.

After reading this guide, the next step is to profile your own code. Pick one hot path—an API endpoint, a batch job, or a UI operation—and run a memory profiler. Look for collections that are resized frequently, LINQ queries that allocate in loops, and dictionaries with default comparers. Fix those first. Then, in new code, make a habit of choosing the right collection from the start. Your future self—and your users—will thank you.

Your C# Collections Are Leaking: Common Performance Pitfalls and How to Plug Them

Table of Contents

Where Collection Leaks Show Up in Real Work

Why These Leaks Are Hard to Spot

Foundations Readers Confuse

The Misunderstood Role of Capacity

Patterns That Usually Work

Sorted Collections for Range Queries

Anti-Patterns and Why Teams Revert

The Custom Comparer Trap

Maintenance, Drift, and Long-Term Costs

The Hidden Cost of Default Comparers

When Not to Use This Approach

Avoid Over-Abstraction

Open Questions / FAQ

Should I always set initial capacity?

Is Span<T> always faster than arrays?

How do I choose between SortedList and SortedDictionary?

What about ImmutableArray vs. ImmutableList?

Can I use a List as a queue or stack?

Comments (0)

Table of Contents

Where Collection Leaks Show Up in Real Work

Why These Leaks Are Hard to Spot

Foundations Readers Confuse

The Misunderstood Role of Capacity

Patterns That Usually Work

Sorted Collections for Range Queries

Anti-Patterns and Why Teams Revert

The Custom Comparer Trap

Maintenance, Drift, and Long-Term Costs

The Hidden Cost of Default Comparers

When Not to Use This Approach

Avoid Over-Abstraction

Open Questions / FAQ

Should I always set initial capacity?

Is Span<T> always faster than arrays?

How do I choose between SortedList and SortedDictionary?

What about ImmutableArray vs. ImmutableList?

Can I use a List as a queue or stack?

Share this article:

Comments (0)