Heap Allocation in the Internet of Things
Some Internet of Things (IoT) devices are expected to run for long periods of time, both to minimize maintenance and replacement costs and to maintain availability and their local states. The primary limits on running lifetime are power and dynamic memory usage. Power is out of scope for this paper, but to minimize power consumption, many IoT nodes are implemented on highly constrained computing platforms. Several resources are typically constrained, and a critical one is the amount of data memory available. Data memory (typically RAM) is used dynamically, so it can be the source of unpredictable behavior. This paper shows the failures that can result from the sometimes counterintuitive behavior of heap allocation, discusses the implications for IoT applications, and introduces a class of solutions that eliminates the failures. The issues discussed here apply to any system where operating lifetime is a concern, regardless of the system's size. A detailed and comprehensive treatment of these solutions appears in a related paper.
The impact of heap allocation strategies on embedded systems has been discussed online at length [2, 3, 4, 5]. It is a critical issue in the design of constrained-platform operating systems, which are typically referred to as real-time operating systems (RTOSes) whether or not the applications using them are real-time.
Memory shortage has been an issue since the earliest embedded systems, but its importance has seemingly diminished as the RAM of a typical computing node has grown from KB to MB to GB. It is still a critical issue in the work of most embedded software engineers, but most software developers and recent college graduates have never dealt with a true shortage of resources.
The audience of this paper is software developers who are new to embedded software development, perhaps because of the growth of the Internet of Things.
Examples of constrained devices applicable to this paper include many Atmel* processors (used in Arduino boards and elsewhere) and the recently introduced Intel® Curie™ module. With as little as 2 KB of RAM, these processors require developers to carefully craft memory management rather than just call malloc(). This paper is also applicable to any system that must run for a long time, regardless of how much memory it has.
Without carefully crafting the dynamic memory usage of a constrained computing node, the node is liable to behave badly or otherwise fail.
This article first presents an explanation for why heap usage is a critical reliability factor. Then we illustrate the effect of heap allocation failures on IoT systems. Finally we survey several solutions for heap allocation failures. Further information is provided in IoT Memory Management: A Case Study.
Reliability and the heap
A heap failure manifests as the inability to allocate a desired buffer. It is also known as a malloc failure, since the malloc() library call is the most common way to request heap allocations. The malloc() call indicates failure by returning a null pointer instead of a pointer to the requested buffer. A program can test the return value and respond appropriately, and most engineers are aware of the need to test the malloc() return value, even if they have never seen a null return. Using the C++ new operator makes things worse because the allocation is implicit.
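The basic defensive test is worth showing concretely. This minimal sketch checks the malloc() return value and fails gracefully; the buffer size, function name, and fallback behavior are illustrative assumptions, not from the paper:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-transaction buffer; the size is an illustrative assumption. */
#define TXN_BUF_SIZE 256

char *start_transaction(void)
{
    char *buf = malloc(TXN_BUF_SIZE);
    if (buf == NULL) {
        /* The failure path must not itself allocate memory. */
        fputs("transaction dropped: out of memory\n", stderr);
        return NULL;
    }
    memset(buf, 0, TXN_BUF_SIZE);
    return buf;
}
```

The caller must still decide what "responding appropriately" means, which is the hard part discussed below.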
A malloc failure is often catastrophic, especially if it is not handled well. Some of the downsides to a malloc failure are:
The process making the malloc() call typically can't perform its purpose.
The most common response to a malloc failure is to report it, but the reporting mechanisms often depend on their own malloc() calls, which might also fail, often resulting in unrecoverable failure.
Mallocs often occur in groups, and the possibility of one malloc() failing means that the related allocated buffers must be freed (else memory leaks), so additional complexity and code size are incurred because of a rare possibility.
A malloc() failure often means that device functionality is lost, and it can sometimes even mean that the device is lost. Moreover, because this possibility is hard to test for, it often goes untested, so a malloc failure is a likely cause of silent device failure. The result can be highly unreliable systems whose developers don't know the source of the silent failures.
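The cleanup burden of grouped allocations can be sketched with the common goto-cleanup idiom. The structure and sizes below are hypothetical; the point is the extra code incurred solely for the rare failure case:

```c
#include <stdlib.h>

/* Hypothetical pieces of one transaction; the sizes are illustrative. */
struct txn {
    char *header;
    char *body;
    char *reply;
};

/* Returns 0 on success, -1 if any allocation fails. On failure, all
 * earlier allocations are released so that no memory leaks. */
int txn_init(struct txn *t)
{
    t->header = malloc(64);
    if (t->header == NULL)
        goto fail_header;
    t->body = malloc(512);
    if (t->body == NULL)
        goto fail_body;
    t->reply = malloc(128);
    if (t->reply == NULL)
        goto fail_reply;
    return 0;

fail_reply:
    free(t->body);
fail_body:
    free(t->header);
fail_header:
    t->header = t->body = t->reply = NULL;
    return -1;
}

void txn_release(struct txn *t)
{
    free(t->reply);
    free(t->body);
    free(t->header);
    t->header = t->body = t->reply = NULL;
}
```

Every additional grouped allocation adds another label and another failure path to keep correct.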
IoT systems make silent failures more problematic. IoT sensor and actuator nodes (called servers in some parlance) might be either inaccessible or very expensive and time-consuming to reach physically. A server node might contain state information, which can be lost, and will be unusable until the failure is handled. And even after successfully restarting a server, the source and nature of the failure can be opaque.
Note that the effects of other constrained resources, such as program memory and processing power, are more easily planned and tested for.
Sources of heap failure
Three primary types of heap failure can appear: memory leaks, heap corruption, and fragmentation. Leaks and corruption can reliably be solved by proper programming methods and testing. Fragmentation can't. The problems are related, though: a program with many free() calls is prone to leaks (a free() missed on some code path leaks memory), and every free() call contributes to fragmentation. If an application has hundreds of free() calls, the difficulty of testing for and dealing with memory leaks can be unworkably high. We will see that reducing the number of free() calls benefits heap fragmentation as well. The rest of this paper focuses on fragmentation and assumes that memory leaks and heap corruption are resolved issues.
A trivial example of heap fragmentation is a 4 KB heap with a small allocation centered in the middle. Almost 4 KB are available for allocation, but a 2 KB allocation will fail. Such seemingly unlikely problems commonly occur, and on larger systems they are usually masked by the ability to extend the heap with virtual memory.
Heap fragmentation is a side effect of memory reuse. Reuse results from calling free(), and heap fragmentation doesn't happen unless free() is called. By itself, dynamic allocation of memory is not a problem except for allocating too much, which is easy to test for in the absence of free(). Fragmentation results from using free() to allow reuse of previously used buffers. For this reason, some programming environments support malloc-only heaps as a primary allocation strategy.
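A malloc-only heap can be as simple as a bump allocator over a static arena. The following sketch (the arena size and names are assumptions) hands out buffers but never frees them, so it cannot fragment; its only failure mode, exhaustion, is easy to test for:

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 2048  /* Illustrative: matches a 2 KB RAM budget. */

static uint8_t arena[ARENA_SIZE];
static size_t arena_used;

/* Allocate len bytes, 4-byte aligned; returns NULL when the arena is
 * exhausted. There is no corresponding free: allocations are permanent. */
void *arena_alloc(size_t len)
{
    size_t aligned = (len + 3u) & ~(size_t)3u;
    if (aligned > ARENA_SIZE - arena_used)
        return NULL;
    void *p = &arena[arena_used];
    arena_used += aligned;
    return p;
}
```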
For a trivial example of how fragmentation occurs, consider a 4 KB (4096-byte) heap, a series of allocations carefully crafted for illustrative purposes, and a first-fit allocator (bookkeeping overhead is ignored) with these steps:
Allocate 1500.
Allocate 100.
Allocate 1500.
Allocate 100.
Free first 1500.
Allocate 100; first fit places it at the start of the freed 1500-byte hole, leaving 1400 free bytes there.
Then try to allocate 1500.
The final allocation fails even though there are 2296 free bytes (4 KB - 1500 - 3 * 100) remaining in the heap. Here is what the heap looks like at the point of failure, with the 100-byte allocations shown in brackets:
  [100] [1400 free space] [100] [1500 allocated space] [100] [896 free space]
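This kind of failure is easy to reproduce with a toy first-fit allocator. The following simulation is my own sketch (the block-table representation and function names are assumptions, and bookkeeping overhead is ignored); it tracks a 4096-byte heap as runs of free and allocated bytes:

```c
#include <stddef.h>

#define HEAP_SIZE  4096
#define MAX_BLOCKS 16

/* Each block is a contiguous run of bytes, either free or allocated. */
static struct { size_t off, len; int used; } blk[MAX_BLOCKS];
static size_t nblk;

void heap_init(void)
{
    nblk = 1;
    blk[0].off = 0; blk[0].len = HEAP_SIZE; blk[0].used = 0;
}

/* First fit: carve the request out of the first free block that is
 * large enough; return the offset, or -1 on failure. */
long ff_alloc(size_t len)
{
    for (size_t i = 0; i < nblk; i++) {
        if (blk[i].used || blk[i].len < len)
            continue;
        if (blk[i].len > len && nblk < MAX_BLOCKS) {
            /* Split: shift later blocks right to hold the remainder. */
            for (size_t j = nblk; j > i + 1; j--)
                blk[j] = blk[j - 1];
            blk[i + 1].off = blk[i].off + len;
            blk[i + 1].len = blk[i].len - len;
            blk[i + 1].used = 0;
            nblk++;
        }
        blk[i].len = len;
        blk[i].used = 1;
        return (long)blk[i].off;
    }
    return -1;
}

void ff_free(long off)
{
    for (size_t i = 0; i < nblk; i++)
        if (blk[i].used && (long)blk[i].off == off)
            blk[i].used = 0;   /* No coalescing, to keep the sketch short. */
}
```

Allocating 1500, 100, 1500, and 100 bytes, freeing the first 1500, and then allocating 100 bytes leaves holes of 1400 and 896 bytes, so a subsequent 1500-byte request fails despite 2296 free bytes in total.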
Heap fragmentation is not inevitable. Fragmentation means that a heap is unable to satisfy a given buffer request even though the total amount of free memory is greater than the size of the request. It results from the stochastic distribution of buffer allocations of varying sizes, durations, and frequencies. Some situations will result in fragmentation and a malloc failure, and others will not. You can’t tell the difference without deep analysis, and seemingly trivial changes to allocation patterns can have a major impact on the issue.
The author has spent many hours examining heap details. Given enough time, heaps tend to become disorganized in surprising and even perverse ways, reducing effective heap capacity. For example, a very small number of long-lived allocations can result in unfortunate allocation patterns long after they are freed.
Note that these comments about heap allocation refer to a conventional heap, such as the one managed by libc on various Linux systems. There are various replacements for the standard allocator, with differing performance characteristics, including how likely they are to fragment in specific situations. However, the differences are relatively small, and the problems result from arbitrary malloc() and free() occurrences over time in a single linear heap, not from a failing of the allocation algorithm used. Qualitatively different allocation methods offered by most RTOSes do contribute to the solution of fragmentation, but they must be understood and used properly, and they might add noticeable overhead relative to a more targeted solution.
Here are some myths surrounding heap fragmentation:
Fragmentation results from memory leakage. No, it can occur when no leakage exists, due to the random nature of heap allocations.
Memory management units (MMU) solve fragmentation. By itself, an MMU doesn't change the picture at all. The fragmentation occurs in the program's address space, which the MMU maps onto physical memory and paging storage. Only the addition of paging storage allows the MMU to help with fragmentation, and then only by increasing the apparent memory. Paging, where available, costs memory and causes page-fault delays, and it is useless in smaller systems, which is why smaller systems often don't have an MMU.
Garbage collection solves fragmentation. It helps, but it doesn’t eliminate fragmentation because it is time dependent. Few of the smaller IoT nodes are programmed in languages that support garbage collection because of the substantial overhead and uncertainty it brings.
A tested system won’t suffer from fragmentation. No amount of testing can eliminate the possibility of a fragmentation failure since fragmentation is essentially a random process. Running for a week or month does not ensure that a system will run for a year.
Because there is no general solution to heap fragmentation, we must have a toolbox of techniques. Careful memory design can reliably eliminate the effects of heap fragmentation, but it is usually applied only when the problem is fully understood.
The impact of heap fragmentation
Applications die. After an application is purged of all programming errors, the most likely source of failure is heap fragmentation. (The related problem of stack overrun under exceptional, untested circumstances might be considered a programming error or a configuration error.)
When an application dies, it fails to accomplish its purpose for a time. If the application restarts, it might resume performing its function, but the interruption can affect the success of the system as a whole. If the application doesn't restart, the larger system will likely be degraded. In other words, the random effects of memory fragmentation failures can reduce overall system reliability. And if one node fails because of a confluence of events, it is more likely that other nodes will face the same events and also fail, further impacting system behavior.
When a malloc failure occurs, it is not enough to test for the failure. The tests must lead to reliable reporting (so an external process can manage the application/node), deterministic continuation (for example, dropping a transaction), or an immediate restart (so a subsequent transaction can succeed).
Reliable reporting is a useful system response to a failure in any case and has the advantage that the process managing the issue is not itself compromised by the source of the failure. It has the additional advantage that the failure occurrence will be recognized by the larger system, and that the system can accommodate and keep track of the failure. The downside is that the steps to recover from the failure are more complex and are subject to failure themselves. In any case, the reporting mechanism must be carefully constructed to not be subject to allocation failures, since the state of the heap is unknown while the reporting is done.
Reliable continuation depends on careful design. When a malloc failure occurs, related memory allocations must be handled/freed appropriately to keep the problem from getting worse. Unwinding a transaction can be tricky, but the worst part is that the application has no way of knowing whether the next transaction will succeed. If one transaction fails due to a malloc failure, will the next one also fail? If so, extraordinary measures (like a restart) might be needed. Unfortunately, the application can’t know if the next transaction will fail (halting problem?), so in general the only solution is to restart.
Immediate restart is a certain response to a malloc failure. It ensures that the application/node will be available after a delay. It has downsides, though. First, the application/node is unavailable for a period of time. During that time, it isn’t performing its function, and the larger system must behave reasonably both when the node doesn’t respond and after the node resumes responding. Second, restarting often loses state. Perhaps an application is returning a smoothed parameter based on recent sensor values, so the parameter might appear to jump without smoothing. Third, some processes must recognize the need to restart and make sure the restart happens properly. This process must be highly reliable, probably should not depend on memory allocations, and usually won’t share code with the application. Fourth, the decision to restart must be exceptionally well defined. Some applications will run as processes on top of an OS, allowing the OS or another application to make the decision and mediate it. Applications on some RTOSes are so integrated with the OS that the decision must be made by the application itself, and the entire node must be restarted to restart the application.
Responses to failures might be a combination of these three approaches, but keep in mind that simplicity has merit.
One of the biggest weaknesses of these recovery mechanisms might be not knowing whether a node has executed one of them. Application state in a distributed system needs to be treated as a first-class deliverable, but it is often deferred or ignored. One reason state is ignored is the belief that once all the errors are wrung out of the application, there is no need for the overhead. Perhaps considering how hard it is to eliminate fragmentation failures will encourage better state transparency.
Solutions to heap fragmentation
Careful design can eliminate heap fragmentation and the ensuing malloc failures. Of course, this doesn’t mean an application will never run out of memory, but it means that running out of memory is recognized and handled reliably.
The most straightforward way to eliminate heap fragmentation is to avoid having and using a heap. Recognizing this, every RTOS provides memory allocation methods that avoid the use of heaps. Even without a heap, the dynamic reuse of memory required by many applications remains fraught with peril.
Writing a nontrivial application without using malloc/free is well understood in the embedded software community, but it is something of a lost art among the majority of application developers, most of whom have never seen a malloc() call fail. Virtual memory, indefinitely growing paging files, and garbage collection have hidden the realities of heaps from most of us.
The rest of this paper will survey the design changes needed to reduce and even eliminate fragmentation. A companion paper will provide concrete examples of the changes.
The most important change needed to deal with heap fragmentation is architectural. In many applications, the memory architecture is mostly ad hoc. As each object or buffer is needed, it is allocated and may be attached to other objects via pointers. This leads to a proliferation of objects and of memory allocation calls to create them. It also provides great flexibility, since objects can appear, change size, and disappear from the design and its execution, seemingly independent of other objects.
The requisite architectural change is to recognize the inherent similarities among allocations and combine them. For example, if an application responds to a series of similar transactions, each transaction might be represented by one or two complex structures that support all the needs of a single transaction, rather than by a collection of smaller allocations. Doing this requires more analysis of what a transaction is than an ad hoc architecture does.
Before getting into specifics, here are some general principles to guide the architecture and design of applications that avoid heap fragmentation:
Every use of memory must be considered for its lifetime and use by other data structures.
Predictability is more important than efficiency. Many developers are trained to accept that efficient use of resources underlies good software. We must break this pattern to use memory effectively.
Every malloc() call should be justified in terms of the functionality it provides and the available alternative approaches, with the allocation lifetime explicitly identified.
Use static, stack, and permanent heap allocations where possible.
Keep in mind that heap fragmentation is a side effect of free().
The use of malloc() and free() is useful in rapid prototyping, since it eliminates the early need to pin down where allocations are made and how large they need to be. Applications that use them might never need to get beyond the working prototype stage. The following tactics are for production software that must run continuously for a long time. It is reasonable to prototype with malloc/free and then, in a later development stage, apply more careful memory management.
Keep in mind that several string functions call malloc(), that realloc() calls malloc() and free(), and that some other library functions also call them.
Assume reentrancy
Start out assuming reentrancy of some sort, which requires allocations for widely used local variables. Of course, if you are certain that no cause for reentrancy will ever exist, then declaring parameterized static variables eliminates their heap usage and the resulting fragmentation. But starting out with static allocations can lead to unfortunate results if reentrancy is needed later and synchronization isn't supplied. Effective use of dynamic allocations for reentrancy imposes no performance disadvantage, and following the other tactics in this section can minimize or eliminate heap fragmentation. This tactic is not directly related to heap fragmentation, but this type of careful design can simplify memory issues later as the design is maintained.
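The reentrancy hazard of static variables can be sketched as follows; the function and buffer names are hypothetical:

```c
#include <stdio.h>
#include <stddef.h>

/* Non-reentrant: every caller shares one static buffer, so nested or
 * concurrent calls silently overwrite each other's results unless
 * synchronization is added later. */
const char *format_id_static(int id)
{
    static char buf[16];
    snprintf(buf, sizeof buf, "node-%d", id);
    return buf;
}

/* Reentrant: the caller supplies the storage (stack or per-transaction),
 * so simultaneous calls cannot interfere and no heap is involved. */
void format_id(char *out, size_t len, int id)
{
    snprintf(out, len, "node-%d", id);
}
```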
Parameterize strings and other buffers
One of the nice things about mallocing strings and buffers is that you don’t have to consider limits to their sizes. Ultimately, size matters, and defining maximum sizes should be considered an important activity in careful design. If you know the maximum size of a buffer, you have a lot more design flexibility.
The size of a buffer/string might be part of a specification you are working with, or it may need to be determined by analysis, and in some cases it may be chosen arbitrarily. All of these are valid as long as the source of the size is documented. In many cases, the sizes can be spelled out in a configuration file used to build the application, allowing adjustments based on experience or a specific usage.
Once a string or buffer has a maximum size, fixed-size buffers can be used to hold it. This leads to predictable application behavior in several ways (some of which are beyond the scope of this paper). Refer to principle 2 above. Also, many developers aren't aware that allocating from the heap incurs substantial memory overhead (for example, 16 bytes on a 32-bit architecture, plus alignment bytes), so allocating small buffers (under 20 bytes, say) often consumes more heap than using a fixed-size definition in another dynamic structure.
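As a sketch, a parameterized maximum (the name and limit below are illustrative assumptions) lets a string live inside a structure as a fixed-size array, with truncation rather than overflow or allocation:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative limit; in practice it might come from a spec or a
 * build-time configuration file. Includes the terminating NUL. */
#define MAX_SENSOR_NAME 32

struct sensor {
    char name[MAX_SENSOR_NAME];  /* fixed size: no heap, no heap overhead */
    int reading;
};

/* Copy a name with guaranteed truncation rather than overflow. */
void sensor_set_name(struct sensor *s, const char *name)
{
    snprintf(s->name, sizeof s->name, "%s", name);
}
```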
Combine related structures
If you have several structures that share a lifetime, they can often be combined to form a larger structure. This is common in transaction processing, where all aspects of the transaction appear when the transaction starts and go away when the transaction completes. Creating a single structure that contains all variables required for the transaction can replace multiple structures that must be allocated separately and made to point to each other.
The multiple structures might not all be required at all times, but principle 2 suggests that you can reasonably allocate them as one all the time anyway. The possibility that some allocated memory will go unused is less important than the predictability that joint allocation provides.
Parameterized strings and buffers are structures that can be combined. If a transaction needs one or more strings, you should include string buffers in the single transaction structure, each with its full parameterized length, rather than allocating them from the heap and putting pointers in the common structure.
The combined structure needn’t be flat. It can be a composition of structures that support the various parts of the transaction and can thus provide the same collection of structures that might have been allocated separately.
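A hypothetical combined transaction structure along these lines might look like the following sketch; the names and sizes are illustrative assumptions:

```c
#include <stddef.h>

/* Illustrative maximum sizes; each would be parameterized in a real design. */
#define MAX_URI      64
#define MAX_PAYLOAD 256
#define MAX_REPLY   128

/* Everything one transaction needs, in a single allocation. The nesting
 * mirrors the separate structures it replaces; the string buffers are
 * embedded at full parameterized length rather than pointed to. */
struct txn_request {
    char uri[MAX_URI];
    char payload[MAX_PAYLOAD];
};

struct txn_reply {
    int  status;
    char body[MAX_REPLY];
};

struct transaction {
    struct txn_request req;
    struct txn_reply   reply;
    int in_use;
};
```

One structure, one lifetime: the transaction is created and destroyed as a unit, with no pointers to dangle and no partial-allocation failure paths.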
Consider stack buffers
Sometimes you need a temporary buffer to do something like combine multiple strings in two functions that are called in proximity. Rather than allocating a heap buffer and passing it to both functions, declare a stack buffer and pass it to the two functions. This might seem obvious when described like this, but sometimes people don’t think about passing stack buffers to called functions.
A buffer can be allocated on the stack as long as it is used only by subsequent calls. This puts a burden on the stack size allocated by the OS, but no burden is placed on the heap. The buffer space is shared between transactions using that stack frame and is freed automatically.
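For instance (the helper names and message format below are hypothetical), both callees write into the caller's stack buffer and never touch the heap:

```c
#include <stdio.h>
#include <string.h>

#define MSG_MAX 128  /* illustrative maximum message length */

/* Both helpers fill a caller-supplied buffer; neither allocates. */
static void format_header(char *buf, size_t len, const char *node)
{
    snprintf(buf, len, "[%s] ", node);
}

static void append_reading(char *buf, size_t len, int value)
{
    size_t used = strlen(buf);
    snprintf(buf + used, len - used, "reading=%d", value);
}

void report(const char *node, int value)
{
    char msg[MSG_MAX];           /* stack buffer, freed automatically */
    format_header(msg, sizeof msg, node);
    append_reading(msg, sizeof msg, value);
    puts(msg);
}
```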
Preallocate buffers
Once the total number of heap allocations is reduced using the tactics above, only a handful of allocations might be necessary for even a complex process. After determining how many buffers of each size are needed, you can allocate all the buffers you will ever need at process startup. For instance, when a transaction arrives, the code can grab an unused, preallocated buffer and dedicate it to handling that transaction. If another transaction arrives, the code can grab another unused, preallocated buffer without interfering with the first transaction. When each transaction completes, it makes its buffer available for subsequent transactions. The application can still run out of transaction buffers, but the shortage is easier to notice and deal with.
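An ad hoc preallocated pool might be sketched as follows; the pool size and buffer size are illustrative assumptions:

```c
#include <stddef.h>

#define MAX_TXNS 4        /* illustrative: sized from worst-case analysis */

struct txn_buf {
    int  in_use;
    char data[768];       /* illustrative per-transaction workspace */
};

static struct txn_buf pool[MAX_TXNS];  /* allocated once, at link time */

/* Grab an unused, preallocated buffer. NULL means the (visible,
 * countable) pool is exhausted, not that the heap has fragmented. */
struct txn_buf *pool_acquire(void)
{
    for (size_t i = 0; i < MAX_TXNS; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

void pool_release(struct txn_buf *t)
{
    t->in_use = 0;
}
```

Exhaustion here is a well-defined, inspectable condition: a count of in-use buffers tells you exactly how close the system is to its limit.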
The preallocated buffers will likely contain space that is often unused (for example, the maximum string length), and one might regret the apparent waste; seeing how many structures are preallocated might sharpen that regret. The regret should be tempered by considering how much simpler and more predictable the heap usage is.
Other advantages of preallocating the buffers: access to a buffer is faster when it is needed (malloc() and free() are often quite expensive), single-threaded access to a preallocated buffer doesn't require synchronization, and you can better tell the transaction state by looking at the allocated and available transaction buffers.
Use generalized memory pools
The use of preallocated buffers was described assuming an ad hoc mechanism. However, a formal collection of memory pools, each of a different size (and maybe purpose), can add to memory allocation capability. A memory pool supplies buffers of a certain size to a caller, and a common mechanism can provide multiple memory pools to an application. This method is known as slab allocation in OS kernels.
Each memory pool might provide a size appropriate for a specific usage (for example, the 768 byte buffer required to handle a transaction). In that case, the memory pool request to a common mechanism would indicate what type of memory it wants. Or a memory pool mechanism might keep a range of buffer sizes (power of 2?), and pick the one that’s big enough for a given size request. The former requires more problem analysis and offers more specificity, while the latter allows more general usage.
Some RTOSes offer memory pools as a service. The offered service can be shared by multiple uses, and it might or might not be adequate for a specific usage. It is also easy enough to write a memory pool mechanism to support a specific application.
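A range-of-sizes pool mechanism needs only a size-class lookup in front of per-class pools. This sketch (the class sizes are illustrative assumptions) picks the smallest class that satisfies a request:

```c
#include <stddef.h>

/* Illustrative power-of-2 size classes for a general-purpose pool set. */
static const size_t pool_sizes[] = { 32, 64, 128, 256, 512, 1024 };
#define NPOOLS (sizeof pool_sizes / sizeof pool_sizes[0])

/* Pick the smallest class that satisfies the request. Returns the class
 * index, or -1 if the request is too large for any pool. Each class
 * would be backed by its own preallocated buffer array. */
int pool_class(size_t len)
{
    for (size_t i = 0; i < NPOOLS; i++)
        if (len <= pool_sizes[i])
            return (int)i;
    return -1;
}
```

The specific-purpose alternative replaces this lookup with a named request (for example, "give me a transaction buffer"), trading generality for tighter sizing.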
The tactics surveyed here point to a class of solutions for heap fragmentation. All of them require thinking differently about memory usage: treating memory (heap) usage as a fundamental design concept rather than relying on malloc/free as an implementation detail. The bottom line is that you should avoid casual use of malloc(), free(), and heap allocation, and instead carefully craft the dynamic memory usage of constrained computing nodes.
These tactics also provide advantages beyond mitigating heap fragmentation. For example, drastically limiting the number of free() calls greatly simplifies protecting against memory leaks. It also eliminates the sometimes considerable CPU time spent making many malloc() and free() calls.
A case study on applying these principles and tactics to IoT software is offered in IoT Memory Management: A Case Study.
Any long-running application can have memory allocation failures. The importance of considering this as a fundamental design issue grows with the use of highly constrained environments and longer intended lifetimes in the Internet of Things. The constrained resources in an IoT device and the networks in which many IoT devices live might make responding to problems difficult or impossible. Making things worse is a relative shortage of software developers who understand the imperatives of embedded programming.
This paper attempts to clearly delineate the constrained RAM issue, make the consequences of the issue unequivocal, and show how a change in development practices can alleviate or eliminate the issue.
I appreciate the feedback and encouragement I received from Dirk Brandewie, Thiago Macieira, and Bill Randle, all from the Intel Open Source Technology Center.