How a Feed App Survives: Two-Level Caching, Weak-Network Degradation, and OOM Defense

From Stuttering and Blank Images to Stable and Usable: How I Implemented Two-Level Caching, Weak Network Degradation, and OOM Management in a Feed App

When building a feed-based app, the most common problems almost always appear together:

The first screen of the list is slow, and users see a white screen first.
Images take forever to appear while scrolling, or they suddenly all load in a burst.
On a weak network, refreshes fire repeatedly, pagination fails, and the experience falls apart.
In multi-image scenarios, memory spikes and the app is eventually killed by the system.

On the surface, these problems seem to belong to separate domains like networking, caching, UI, and memory, but in reality, they are different links in the same chain.

In this article, I will use a UIKit feed project to walk through a set of solutions I consider practical:

Two-level caching for Feed data
Two-level caching for images
Request deduplication
RunLoop idle scheduling for decoding
Weak network degradation
OOM memory management
Observability loop with logging and analytics

This set of solutions does not pursue "flashy techniques"; the focus is on being implementable, stable, and able to clearly explain why it is done this way.

1. Where a Feed Project Is Most Likely to Die

Let's start with the conclusion: The real difficulty in a feed-based app is not loading data, but controlling peak resource usage.

Common problems fall into these categories:

The bitmap after decoding an original image is too large, causing an instantaneous memory spike.
Multi-image cells, fast scrolling, and prefetching trigger simultaneously, creating a very high concurrency peak.
The same image is repeatedly requested, decoded, and cached.
Off-screen cells continue to load images.
Meaningless refreshing and pagination still happen under a weak network.
After a memory warning, although the cache is cleared, background tasks continue to run, and memory quickly rebounds.

Therefore, optimizing a feed is never just about "adding a cache"; it is a whole set of coordinated mechanisms.

2. Establish the Architectural Boundaries First

In this project, I adopted a fairly standard layered approach:

UI Layer: ViewController, FeedPostCell
ViewModel Layer: FeedViewModel
Network/Infrastructure Layer: ImageLoader, NetworkStateMonitor, FeatureFlagCenter
Persistence Layer: SQLitePostStore
Composition Root: SceneDelegate

The core benefit of this separation can be summed up in one sentence:

The UI only handles display, the business logic only handles orchestration, and the infrastructure only provides capabilities.

This directly impacts subsequent performance management, because when image loading, weak network degradation, and caching strategies are not scattered across ViewControllers, optimization becomes much easier.

3. Two-Level Feed Data Caching: Get Content Displayed as Soon as Possible

The first principle of the list content experience is not "always show the latest," but "show something as soon as possible, then reconcile asynchronously."

So, for Feed data, I implemented two data sources:

L1: SQLite local cache
L2: FeedAPI network data

The cold start flow is:

First, try to read the cache from SQLite.
If data exists, display it first.
Then, asynchronously initiate a refresh to fetch the latest data and overwrite.
After success, write back to SQLite.

Core code:

func loadInitial() async {
    if featureFlags.bool(.diskPostCacheEnabled) {
        if let cached = try? store.fetchLatest(limit: 50), !cached.isEmpty {
            posts = cached
            analytics.track(.feedCacheHit, properties: ["count": cached.count])
            onStateChanged?(self)
        }
    }
    await refresh()
}

This strategy is very suitable for feeds:

The first screen on cold start is faster.
Content is still viewable when offline.
The data is brought up to date automatically once the network recovers.
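
The cold start flow above can be sketched end to end as follows. This is a minimal, synchronous illustration rather than the project's code: `Post`, `PostStore`, `InMemoryPostStore`, `RemoteUnavailable`, and `FeedLoader` are hypothetical stand-ins (the project uses `SQLitePostStore` and an async `refresh()`), and the in-memory store only mimics what a SQLite table would do.

```swift
import Foundation

// Hypothetical model and store; the real project persists posts in SQLite.
struct Post: Equatable {
    let id: Int
    let title: String
}

protocol PostStore {
    func fetchLatest(limit: Int) throws -> [Post]
    func replaceAll(with posts: [Post]) throws
}

final class InMemoryPostStore: PostStore {
    private var rows: [Post]
    init(rows: [Post] = []) { self.rows = rows }
    func fetchLatest(limit: Int) throws -> [Post] { Array(rows.prefix(limit)) }
    func replaceAll(with posts: [Post]) throws { rows = posts }
}

struct RemoteUnavailable: Error {}

final class FeedLoader {
    private let store: PostStore
    private let fetchRemote: () throws -> [Post]
    private(set) var posts: [Post] = []
    // Records every list handed to the UI, in order.
    private(set) var rendered: [[Post]] = []

    init(store: PostStore, fetchRemote: @escaping () throws -> [Post]) {
        self.store = store
        self.fetchRemote = fetchRemote
    }

    func loadInitial() {
        // Steps 1 and 2: show whatever the local store has, if anything.
        if let cached = try? store.fetchLatest(limit: 50), !cached.isEmpty {
            publish(cached)
        }
        // Steps 3 and 4: refresh from the network and write back.
        refresh()
    }

    func refresh() {
        // On failure, keep showing the cached posts.
        guard let fresh = try? fetchRemote() else { return }
        publish(fresh)
        try? store.replaceAll(with: fresh)
    }

    private func publish(_ newPosts: [Post]) {
        posts = newPosts
        rendered.append(newPosts)
    }
}
```

With a warm store, the UI receives the cached list first and the fresh list second. With a failing network, it keeps the cached list and nothing is overwritten.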

4. Two-Level Image Caching: The Key to Scrolling Experience

For image caching, I split it into two levels:

L1: LRUCache<String, UIImage>, caches decoded images.
L2: URLCache.shared, caches the original response data.

1) L1 Hit Returns Immediately

If the same image at the same size is already in memory, return the UIImage directly. This path is the fastest.

if let cached = memoryCache.value(forKey: key) {
    log.debug("memory cache hit key=\(key, privacy: .public)")
    AnalyticsTracker.shared.track(.imageCacheHit, properties: nil)
    completion(.success(cached))
    return token
}

2) Network Request Only on L2 Miss

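With .useProtocolCachePolicy, a URLSession backed by URLCache.shared serves a stored response when the HTTP caching headers allow it, and goes to the network on a miss or when the response must be revalidated.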

let config = URLSessionConfiguration.default
config.requestCachePolicy = .useProtocolCachePolicy
config.urlCache = .shared
config.httpMaximumConnectionsPerHost = 8
session = URLSession(configuration: config)

3) Downsample Before Writing Back to L1

Note that this is critical: what is written into L1 is not the original data, but a UIImage downsampled to the target size.

This step is essentially about controlling bitmap memory.
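
To make the memory argument concrete: a 4000×3000 photo decoded at 4 bytes per pixel occupies 48,000,000 bytes (roughly 46 MiB), while a 600×450 thumbnail occupies 1,080,000 bytes (roughly 1 MiB). The project's `ImageDownsampler` is not shown in this article, so the following is only a sketch of one common way to downsample with ImageIO. The function names `downsampledImage(from:maxPixelSize:)` and `decodedByteCount(width:height:)` are hypothetical.

```swift
import Foundation
import CoreGraphics
import ImageIO

/// Decodes `data` straight into a thumbnail whose longer side is at most
/// `maxPixelSize` pixels, without first materializing the full-size bitmap.
func downsampledImage(from data: Data, maxPixelSize: Int) -> CGImage? {
    // Do not cache a decoded full-size image when creating the source.
    let sourceOptions = [kCGImageSourceShouldCache: false] as CFDictionary
    guard let source = CGImageSourceCreateWithData(data as CFData, sourceOptions) else {
        return nil
    }
    let thumbnailOptions: [CFString: Any] = [
        kCGImageSourceCreateThumbnailFromImageAlways: true,
        kCGImageSourceShouldCacheImmediately: true,       // decode now, on the calling thread
        kCGImageSourceCreateThumbnailWithTransform: true, // honor the stored orientation
        kCGImageSourceThumbnailMaxPixelSize: maxPixelSize
    ]
    return CGImageSourceCreateThumbnailAtIndex(source, 0, thumbnailOptions as CFDictionary)
}

/// Rough size of a decoded bitmap, assuming 4 bytes per pixel (RGBA).
func decodedByteCount(width: Int, height: Int) -> Int {
    width * height * 4
}
```

In a UIKit target the result can be wrapped with `UIImage(cgImage:)` before it is written to L1. The pixel budget is typically the view's point size multiplied by the screen scale.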

5. Why "Images Are White While Scrolling, and Only Appear When You Stop"

Many people seeing this phenomenon for the first time assume the network is slow.

In fact, the delay often comes not from the network but from decoding being deliberately deferred.

In this project, downloaded data is not decoded immediately. It is first enqueued in RunLoopIdleWorkScheduler, which waits for the main thread to reach an idle point and then dispatches the task to a background decoding queue.

Core code:

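RunLoopIdleWorkScheduler.shared.enqueue { [weak self] in
    guard let self else { return }
    // This closure runs from the main RunLoop's idle callback,
    // so UIKit state is read here rather than on the decode queue.
    let scale = UIScreen.main.scale
    self.decodeQueue.async {
        // Decode and downsample off the main thread.
        let image = ImageDownsampler.downsample(
            data: data,
            to: targetPixelSize,
            scale: scale
        )
        if let image {
            // Cache the downsampled bitmap, charged by its approximate byte size.
            let cost = ImageLoader.approxCost(of: image)
            self.memoryCache.setValue(image, forKey: key, cost: cost)
            self.finish(key: key, result: .success(image))
        } else {
            self.finish(key: key, result: .failure(NSError(domain: "ImageLoader", code: -2)))
        }
    }
}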

RunLoop observation point:

let obs = CFRunLoopObserverCreate(
    kCFAllocatorDefault,
    CFRunLoopActivity.beforeWaiting.rawValue,
    true,
    0,
    { _, _, info in
        guard let info else { return }
        let scheduler = Unmanaged<RunLoopIdleWorkScheduler>.fromOpaque(info).takeUnretainedValue()
        scheduler.drain(maxCount: 2)
    },
    &context
)

What does this mean?

When the user is scrolling fast, the main thread is busy.
The RunLoop does not easily enter an idle state.
Decoding tasks accumulate.
When the user stops, draining begins.
Visually, this results in "images only start filling in when you stop."

This is not a bug, but a typical performance trade-off:

It is better to delay image display slightly than to drop frames while scrolling.
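
The scheduler itself is only partially shown above, so here is a minimal sketch of how such an idle queue could be put together. `IdleWorkScheduler` and its members are hypothetical names, not the project's `RunLoopIdleWorkScheduler`. The sketch assumes the observer is installed in the default run loop mode only, which would explain why nothing drains while a scroll keeps the run loop in tracking mode.

```swift
import Foundation

/// Hypothetical sketch of an idle-work scheduler driven by the main run loop.
final class IdleWorkScheduler {
    private var pending: [() -> Void] = []
    private let lock = NSLock()
    private var observer: CFRunLoopObserver?

    var pendingCount: Int {
        lock.lock(); defer { lock.unlock() }
        return pending.count
    }

    /// Call once from the main thread. Installs an observer that fires just
    /// before the main run loop goes to sleep.
    func start(maxPerPass: Int = 2) {
        guard observer == nil else { return }
        let obs = CFRunLoopObserverCreateWithHandler(
            kCFAllocatorDefault,
            CFRunLoopActivity.beforeWaiting.rawValue,
            true,  // repeats
            0      // order
        ) { [weak self] _, _ in
            self?.drain(maxCount: maxPerPass)
        }
        // Assumption: default mode only, so tracking (scrolling) passes are skipped.
        CFRunLoopAddObserver(CFRunLoopGetMain(), obs, .defaultMode)
        observer = obs
    }

    func enqueue(_ work: @escaping () -> Void) {
        lock.lock()
        pending.append(work)
        lock.unlock()
        // Wake the main run loop so it passes through beforeWaiting again.
        CFRunLoopWakeUp(CFRunLoopGetMain())
    }

    /// Runs at most `maxCount` tasks, oldest first, outside the lock.
    func drain(maxCount: Int) {
        lock.lock()
        let batch = Array(pending.prefix(max(0, maxCount)))
        pending.removeFirst(batch.count)
        let hasMore = !pending.isEmpty
        lock.unlock()
        batch.forEach { $0() }
        // Ask for another pass if work remains, so the queue keeps draining.
        if hasMore { CFRunLoopWakeUp(CFRunLoopGetMain()) }
    }

    /// Drops everything that has not run yet, e.g. on a memory warning.
    func removeAll() {
        lock.lock()
        pending.removeAll()
        lock.unlock()
    }
}
```

Draining a small, fixed number of tasks per pass is the design choice that matters here: it bounds how much work one idle point can release, at the cost of images appearing a little later.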

6. Request Deduplication: Saving More Than Just Bandwidth

Another easily overlooked problem in feeds is duplicate requests for the same image.

For example:

The same cell is rapidly reused.
Prefetching and actual display request simultaneously.
Multiple positions request the same URL at the same time.

Without deduplication, the problems are very obvious:

Network waste
Duplicate decoding
Duplicate memory usage
A surge in the number of in-flight requests

So, the project added a layer of inFlight merging:

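// `lock` was acquired earlier in this method.
if var inflight = inFlight[key] {
    // A request for this key is already running: join it instead of starting another.
    inflight.tokens.insert(token)
    inflight.completions.append(completion)
    inFlight[key] = inflight  // write the mutated copy back
    lock.unlock()
    return token
}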

The benefits of this design are huge:

Only one request is sent for the same key.
After the result returns, all waiters are called back uniformly.
It cuts both network traffic and memory use.
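
Stripped of tokens and URLSession tasks, the merging idea can be shown in isolation. The following is a sketch under simplifying assumptions: `InFlightMerger` is a hypothetical type, keys are plain strings, and the caller is responsible for calling `finish` when the single underlying request completes.

```swift
import Foundation

/// Hypothetical sketch: merges concurrent loads of the same key into one request.
final class InFlightMerger<Value> {
    typealias Completion = (Result<Value, Error>) -> Void

    private var waiters: [String: [Completion]] = [:]
    private let lock = NSLock()

    /// Registers `completion` for `key`. `start` runs only for the first caller;
    /// later callers join the request that is already running.
    func load(key: String, start: () -> Void, completion: @escaping Completion) {
        lock.lock()
        if waiters[key] != nil {
            waiters[key]?.append(completion)
            lock.unlock()
            return
        }
        waiters[key] = [completion]
        lock.unlock()
        start() // outside the lock, so `start` may call `finish` synchronously
    }

    /// Delivers one result to every waiter and clears the entry.
    func finish(key: String, result: Result<Value, Error>) {
        lock.lock()
        let completions = waiters.removeValue(forKey: key) ?? []
        lock.unlock()
        completions.forEach { $0(result) }
    }
}
```

Three calls to `load` for the same key trigger `start` once, and a single `finish` calls back all three waiters with the same value.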

7. Why LRUCache Must Evict by Cost

An image cache cannot be limited by image count, because the memory footprints of a small image and a large image can differ by orders of magnitude.

Therefore, the cache must evict by cost.

Here, I used a custom LRUCache with the internal structure:

A dictionary (dict) for O(1) lookup.
A doubly linked list to maintain the least-recently-used order.
Eviction from the tail when totalCostLimit is exceeded.

Core code:

func setValue(_ value: Value, forKey key: Key, cost: Int) {
    lock.lock()
    defer { lock.unlock() }

    if let node = dict[key] {
        totalCost -= node.cost
        node.value = value
        node.cost = max(0, cost)
        totalCost += node.cost
        moveToHead(node)
    } else {
        let node = Node(key: key, value: value, cost: max(0, cost))
        dict[key] = node
        insertAtHead(node)
        totalCost += node.cost
    }

    evictIfNeeded()
}

Bitmap cost estimation:

private static func approxCost(of image: UIImage) -> Int {
    guard let cg = image.cgImage else { return 1 }
    return cg.bytesPerRow * cg.height
}

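bytesPerRow × height approximates the size of the decoded bitmap in bytes, which is what actually occupies memory, so it is a far better eviction signal than an image count.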
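
The excerpt above relies on helpers that are not shown (`Node`, `moveToHead`, `insertAtHead`, `evictIfNeeded`). As an illustration only, here is one compact way the whole structure could look. `CostLRUCache` is a hypothetical name, and the eviction loop is inlined in place of `evictIfNeeded()`.

```swift
import Foundation

/// Hypothetical cost-based LRU cache: dictionary for lookup, linked list for order.
final class CostLRUCache<Key: Hashable, Value> {
    private final class Node {
        let key: Key
        var value: Value
        var cost: Int
        weak var prev: Node?  // weak, so neighbors do not retain each other in a cycle
        var next: Node?
        init(key: Key, value: Value, cost: Int) {
            self.key = key; self.value = value; self.cost = cost
        }
    }

    private var dict: [Key: Node] = [:]
    private var head: Node?  // most recently used
    private var tail: Node?  // least recently used
    private let lock = NSLock()
    private let totalCostLimit: Int
    private var totalCost = 0

    init(totalCostLimit: Int) { self.totalCostLimit = totalCostLimit }

    var currentCost: Int {
        lock.lock(); defer { lock.unlock() }
        return totalCost
    }

    func value(forKey key: Key) -> Value? {
        lock.lock(); defer { lock.unlock() }
        guard let node = dict[key] else { return nil }
        moveToHead(node)
        return node.value
    }

    func setValue(_ value: Value, forKey key: Key, cost: Int) {
        lock.lock(); defer { lock.unlock() }
        if let node = dict[key] {
            totalCost += max(0, cost) - node.cost
            node.value = value
            node.cost = max(0, cost)
            moveToHead(node)
        } else {
            let node = Node(key: key, value: value, cost: max(0, cost))
            dict[key] = node
            insertAtHead(node)
            totalCost += node.cost
        }
        // Evict least recently used entries until the budget is met.
        while totalCost > totalCostLimit, let victim = tail {
            unlink(victim)
            dict[victim.key] = nil
            totalCost -= victim.cost
        }
    }

    private func moveToHead(_ node: Node) {
        guard head !== node else { return }
        unlink(node)
        insertAtHead(node)
    }

    private func insertAtHead(_ node: Node) {
        node.prev = nil
        node.next = head
        head?.prev = node
        head = node
        if tail == nil { tail = node }
    }

    private func unlink(_ node: Node) {
        node.prev?.next = node.next
        node.next?.prev = node.prev
        if head === node { head = node.next }
        if tail === node { tail = node.prev }
        node.prev = nil
        node.next = nil
    }
}
```

With a limit of 100 and three entries of cost 40, the entry that was touched least recently is the one evicted when the third insert pushes the total to 120.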

8. Weak Network Degradation: Not Just a "No Network" Alert

In many projects, handling a weak network is just showing an alert box.

But for a Feed, truly useful weak network degradation should be:

Prioritize displaying existing cache when offline.
Do not disturb the user when there is content.
Give a clear prompt when there is no content.
Prevent meaningless refreshing and pagination.

The degradation logic in this project is:

if featureFlags.bool(.weakNetworkDegradeEnabled), networkMonitor.isOnline == false {
    if posts.isEmpty {
        onError?("Currently offline, downgraded to display local cache only")
    }
    return
}

Combined with the SQLite cold start backfill, the user experience in weak network or offline scenarios becomes much more stable:

No sudden blank screens.
No error pop-ups every time.
No endless retries of pagination.
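
The article does not show how `NetworkStateMonitor` obtains `isOnline`. One plausible source is `NWPathMonitor` from the Network framework, sketched below. `PathBasedNetworkMonitor` and `shouldSkipNetworkLoad` are hypothetical names, and the monitor reports only whether a usable path exists, not how good the connection is.

```swift
import Foundation
import Network

/// Hypothetical reachability wrapper built on NWPathMonitor.
final class PathBasedNetworkMonitor {
    private let monitor = NWPathMonitor()
    private let queue = DispatchQueue(label: "network.monitor")
    private let lock = NSLock()
    private var online = true        // optimistic until the first path update
    private var constrained = false  // true when the path is in Low Data Mode

    var isOnline: Bool {
        lock.lock(); defer { lock.unlock() }
        return online
    }

    var isConstrained: Bool {
        lock.lock(); defer { lock.unlock() }
        return constrained
    }

    func start() {
        monitor.pathUpdateHandler = { [weak self] path in
            self?.update(online: path.status == .satisfied,
                         constrained: path.isConstrained)
        }
        monitor.start(queue: queue)
    }

    func stop() {
        monitor.cancel()
    }

    /// Also callable directly, which keeps the class easy to exercise in tests.
    func update(online: Bool, constrained: Bool) {
        lock.lock()
        self.online = online
        self.constrained = constrained
        lock.unlock()
    }
}

/// Gate for refresh and pagination, mirroring the degradation check above.
func shouldSkipNetworkLoad(degradeEnabled: Bool, isOnline: Bool) -> Bool {
    degradeEnabled && !isOnline
}
```

Detecting a connection that is present but slow needs extra signals, for example recent request latency or failure counts, which this sketch leaves out.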

9. The Root Cause of OOM Is Not "Cache Too Large," but "Uncontrolled Peaks"

When many people encounter OOM, their first reaction is to reduce the cache size.

This certainly helps, but the more common real cause is: the instantaneous memory peak is too high.

For example, a typical scenario:

The user scrolls quickly.
Prefetching triggers many images.
Downloads all complete.
The user stops.
The RunLoop starts draining.
The decodeQueue processes multiple images consecutively.
A bunch of bitmaps enter memory simultaneously.
These stack on top of the LRU cache and the system cache.
The peak spikes instantly, and the app is killed by the system.

So, what OOM management really needs to do is:

Reduce the generation of unnecessary large objects.
Reduce duplicate decoding.
Stop work on resources that are no longer visible.
Immediately stop the bleeding on a memory warning.
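
One way to cap the decode peak, shown here as a sketch and not as what the project does, is to run decode jobs through an `OperationQueue` with a small `maxConcurrentOperationCount`. `BoundedDecoder` is a hypothetical name, and the limit is an assumption that would need tuning on real devices.

```swift
import Foundation

/// Hypothetical sketch: caps how many decode jobs run at once, so bitmaps
/// are produced a few at a time instead of all together.
final class BoundedDecoder {
    private let queue: OperationQueue

    init(maxConcurrent: Int) {
        queue = OperationQueue()
        queue.maxConcurrentOperationCount = maxConcurrent
        queue.qualityOfService = .utility
    }

    func submit(_ job: @escaping () -> Void) {
        queue.addOperation(job)
    }

    /// Drops jobs that have not started yet, e.g. on a memory warning.
    func cancelPending() {
        queue.cancelAllOperations()
    }

    /// Blocks the caller until every submitted job has finished.
    func waitUntilIdle() {
        queue.waitUntilAllOperationsAreFinished()
    }
}
```

With the limit set to 2, a burst of completed downloads produces at most two in-progress decodes at any moment, and the rest wait in the queue where `cancelPending()` can still discard them.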

10. After a Memory Warning, Why Just Clearing the Cache Is Not Enough

Many projects only do one thing on a memory warning:

cache.removeAll()

But in a feed, this is usually not enough.

Because right after you clear it, background downloads are still ongoing, decoding tasks in the idle queue are still ongoing, and memory will spike back up the next second.

So, the MemoryGuard in this project performs a whole set of actions:

@objc private func didReceiveMemoryWarning() {
    let cacheCost = ImageLoader.shared.currentCacheCost
    let inFlightCount = ImageLoader.shared.currentInFlightCount
    log.warning("memory warning cache_cost=\(cacheCost, privacy: .public) inflight=\(inFlightCount, privacy: .public)")
    AnalyticsTracker.shared.track(.memoryWarning, properties: [
        "cache_cost": cacheCost,
        "inflight": inFlightCount
    ])

    FeatureFlagCenter.shared.set(false, for: .imagePrefetchEnabled)
    imageCache?.removeAll()
    ImageLoader.shared.cancelAllLoads()
}

Combined with:

func cancelAllLoads() {
    lock.lock()
    let tasks = inFlight.values.map(\.task)
    tokenToKey.removeAll()
    inFlight.removeAll()
    lock.unlock()

    tasks.forEach { $0.cancel() }
    RunLoopIdleWorkScheduler.shared.removeAll()
}

The significance of this strategy is:

Clear the LRU.
Cancel all in-flight requests.
Clean up pending decoding tasks.
Automatically disable prefetching and enter a low-memory degraded mode.

This is the real "stop the bleeding."

11. Cancellation Strategy: Don't Wait for Reuse to Cancel

Many people only cancel image tasks in prepareForReuse().

This has a problem:

The cell is already off-screen.
But it hasn't been reused yet.
During this time, it may still continue downloading and decoding.

So, a more robust approach is a three-layer cancellation:

1) Cancel on Cell Reuse

override func prepareForReuse() {
    super.prepareForReuse()
    cancelImageLoading()
    imageViews.forEach { $0.image = nil }
    lastPostID = nil
}

2) Cancel When Cell Goes Off-Screen

func tableView(_ tableView: UITableView, didEndDisplaying cell: UITableViewCell, forRowAt indexPath: IndexPath) {
    (cell as? FeedPostCell)?.cancelImageLoading()
    prefetchTokensByIndexPath.removeValue(forKey: indexPath)?.forEach { imageLoader.cancelLoad($0) }
}

3) Cancel Prefetching

func tableView(_ tableView: UITableView, cancelPrefetchingForRowsAt indexPaths: [IndexPath]) {
    for indexPath in indexPaths {
        guard let tokens = prefetchTokensByIndexPath.removeValue(forKey: indexPath) else { continue }
        tokens.forEach { imageLoader.cancelLoad($0) }
    }
}

Only by layering these three can you truly avoid "meaningless loading."
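
Because requests for the same image are merged, cancelling one token must not kill a download that another cell still needs. The bookkeeping could look like the sketch below. `LoadRegistry` is a hypothetical type that stands in for the `tokenToKey` and `inFlight` tables mentioned earlier, and the `cancelTask` closure represents cancelling the underlying URLSession task.

```swift
import Foundation

/// Hypothetical sketch of token-based cancellation on top of merged requests.
/// The shared task for a key is cancelled only when its last token goes away.
final class LoadRegistry {
    private var tokensByKey: [String: Set<UUID>] = [:]
    private var keyByToken: [UUID: String] = [:]
    private var cancelTaskByKey: [String: () -> Void] = [:]
    private let lock = NSLock()

    /// Registers interest in `key`. `cancelTask` is stored only for the first caller.
    func register(key: String, cancelTask: @escaping () -> Void) -> UUID {
        let token = UUID()
        lock.lock(); defer { lock.unlock() }
        tokensByKey[key, default: []].insert(token)
        keyByToken[token] = key
        if cancelTaskByKey[key] == nil { cancelTaskByKey[key] = cancelTask }
        return token
    }

    /// Safe to call more than once, since reuse, off-screen, and
    /// prefetch cancellation may all fire for the same token.
    func cancel(_ token: UUID) {
        lock.lock()
        guard let key = keyByToken.removeValue(forKey: token) else {
            lock.unlock()
            return
        }
        tokensByKey[key]?.remove(token)
        var cancelTask: (() -> Void)?
        if tokensByKey[key]?.isEmpty ?? true {
            tokensByKey[key] = nil
            cancelTask = cancelTaskByKey.removeValue(forKey: key)
        }
        lock.unlock()
        cancelTask?() // run outside the lock
    }
}
```

If a visible cell and a prefetch both hold tokens for the same key, cancelling the prefetch leaves the download running. Only cancelling the last token stops it.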

12. Observability: Without Data, There Is No Real Optimization

The biggest risk in feed performance work is optimizing "by feel."

So, in this project, I added several key event types:

feedCacheHit
imageCacheHit
imageLoadSuccess
imageLoadFailure
memoryWarning

This allows answering these questions later:

Is the first-screen cache hit rate high?
Is the image memory cache actually effective?
Did the image failure rate suddenly spike in a certain version?
What were the cache cost and in-flight count during a memory warning?
Was there too much prefetching or decoding before an OOM?

This information is critical for locating online OOMs, because many OOMs don't produce a usable crash stack at all.
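
To show how such events turn into answers, here is a small in-process counter. It is a sketch, not the project's `AnalyticsTracker`: `FeedEvent` and `EventCounter` are hypothetical, and `imageCacheMiss` is an extra event added here as an assumption, because a hit rate cannot be computed from hits alone.

```swift
import Foundation

/// Hypothetical event names. Five mirror the list above;
/// `imageCacheMiss` is added so a hit rate can be computed.
enum FeedEvent: String {
    case feedCacheHit, imageCacheHit, imageCacheMiss
    case imageLoadSuccess, imageLoadFailure, memoryWarning
}

/// Minimal thread-safe counter; a real tracker would also upload events.
final class EventCounter {
    private var counts: [FeedEvent: Int] = [:]
    private let lock = NSLock()

    func track(_ event: FeedEvent) {
        lock.lock(); defer { lock.unlock() }
        counts[event, default: 0] += 1
    }

    func count(of event: FeedEvent) -> Int {
        lock.lock(); defer { lock.unlock() }
        return counts[event] ?? 0
    }

    /// Share of `event` among `event` plus `other`, for example hits among
    /// hits plus misses. Returns nil until at least one of them was recorded.
    func ratio(of event: FeedEvent, to other: FeedEvent) -> Double? {
        let numerator = count(of: event)
        let total = numerator + count(of: other)
        return total == 0 ? nil : Double(numerator) / Double(total)
    }
}
```

Calling `ratio(of: .imageCacheHit, to: .imageCacheMiss)` gives the L1 hit rate, and `ratio(of: .imageLoadFailure, to: .imageLoadSuccess)` gives the failure rate that can be compared across versions.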

13. Complete Link Flowchart

flowchart TD
  A[Enter Feed] --> B[SQLite Cold Start Backfill]
  A --> C[cellForRowAt -> FeedPostCell.configure]
  C --> D[ImageLoader.loadImage]

  D --> E{L1 Memory Cache Hit?}
  E -->|Yes| F[Display Image on Main Thread]
  E -->|No| G{Existing inFlight Request?}
  G -->|Yes| H[Reuse Same Task]
  G -->|No| I[Start URLSession Request]

  I --> J{URLCache Hit?}
  J -->|Yes| K[Return Data Directly]
  J -->|No| L[Network Download]
  L --> K

  K --> M[RunLoopIdleWorkScheduler.enqueue]
  M --> N{Main Thread Idle?}
  N -->|No| O[Wait for Idle Point]
  N -->|Yes| P[Background Downsample on decodeQueue]

  P --> Q[Write to LRUCache]
  Q --> R[Main Thread Completion]
  R --> F

  F --> S{Cell Off-screen / Fast Scrolled?}
  S -->|Yes| T[didEndDisplaying Cancel]
  S -->|Yes| U[cancelPrefetching Cancel Prefetch]

  F --> V{Memory Warning?}
  V -->|Yes| W[Log cache_cost / inflight]
  W --> X[Report memoryWarning]
  X --> Y[Disable imagePrefetchEnabled]
  Y --> Z[Clear LRU + cancelAllLoads]

14. Actual Benefits Brought by This Solution

For Users

Faster first screen, content is seen first.
Smoother scrolling, less likely to drop frames.
Still usable under a weak network.
Less likely to be killed by the system after a memory warning.

For Development

Problem diagnosis is more evidence-based.
Caching, degradation, and image loading logic can all evolve independently.
Performance optimization is no longer "guesswork."

15. Final Summary

Performance optimization for feeds is not a single small trick, but a system design.

If you only do caching without request deduplication, the effect is limited. If you only do prefetching without cancellation, you might actually blow up the memory. If you only clear the cache without stopping background tasks, memory will rebound after a memory warning. Without analytics and logging, you don't even know if the optimization had any effect.

The truly reliable approach is:

Get content out as soon as possible.
Then smoothly fill in images.
Reduce meaningless work under a weak network.
Immediately stop the bleeding on low memory.
Finally, close the loop with observability.

In one sentence:

The performance stability of a feed-based app is essentially resource lifecycle management.
