Creating a swap file – or how to deal with a temporary memory shortage

Creating a swap space file

One of the milestones in the development of operating systems was memory virtualization. It allows processes in the system to "see" memory as a continuous area that is available to them. A natural extension of the ability to virtualize memory is to hold memory pages not only in working memory (RAM), but also to dump and load them from block devices (such as disks). In today's article we will look at creating a swap file that will reside on one of our system's disks.

One of the problems developers may encounter when building "heavyweight" programs, such as .NET or database servers, is running out of memory. When that happens, entries in the following format appear in the kernel ring buffer:

[TIME] Out of memory: Killed process PID (PROCESS_NAME) # further information on memory
[TIME] oom_reaper: reaped process PID (PROCESS_NAME), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

A sample log looks like this:

[227711.146660] Out of memory: Killed process 842395 (cc1plus) total-vm:3291904kB, anon-rss:2984768kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:768kB oom_score_adj:0
[227711.168569] oom_reaper: reaped process 842395 (cc1plus), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
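
To look for such entries on a running system, the kernel log can be filtered for the two message types shown above. This is only a minimal sketch: the helper name filter_oom is made up for illustration, and reading the ring buffer with dmesg may require root on systems that restrict it.

```shell
# filter_oom: keep only OOM-killer lines from kernel log text read on stdin.
# (hypothetical helper name, not part of any existing tool)
filter_oom() {
    grep -E 'Out of memory: Killed process|oom_reaper: reaped process'
}

# Typical use (may need sudo where dmesg access is restricted):
#   dmesg | filter_oom
#   journalctl -k | filter_oom    # on systemd-based systems
```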

This phenomenon is generally associated with build parallelization: for example, 32 GiB of RAM for 16 logical CPU cores may prove to be an insufficient amount of operating memory. In addition, after buying a more powerful machine it is tempting to increase the volume of tasks it performs. This can be done, for example, by setting more execution units, increasing the maximum load or some other value in solutions that perform automatic project building. Nowadays, swap space is rarely used, and the old rules about its size have been pushed out in favor of abandoning it altogether. In this article, however, we assume that the system we are running on does not have swap space on a separate partition or has too little of it.
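
The arithmetic behind this can be made explicit. In the sample log above, a single cc1plus process held about 2.85 GiB of anonymous memory (anon-rss:2984768kB), so 16 parallel jobs of that size would need about 45 GiB, well above 32 GiB. Below is a rough sketch that caps the job count by memory. The helper jobs_for_memory is hypothetical, and the per-job figure is an assumption taken from that one log line, not a general rule.

```shell
# jobs_for_memory MEM_KIB PER_JOB_KIB CORES
# Prints how many parallel jobs fit in MEM_KIB, capped at CORES
# and never less than 1.
jobs_for_memory() {
    n=$(( $1 / $2 ))
    if [ "$n" -gt "$3" ]; then n=$3; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# 32 GiB of RAM, 2984768 KiB per compiler process, 16 logical cores
jobs_for_memory $(( 32 * 1024 * 1024 )) 2984768 16    # prints 11

# On a live system the inputs could come from /proc/meminfo and nproc:
#   jobs_for_memory "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)" 2984768 "$(nproc)"
```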

In such specific cases, a swap file, which we can use temporarily or mount permanently, can help. The process is described below.

Creating a memory swap area in a file

Creating a swap file involves the following:

  1. Creating a file filled with zeros.
  2. Giving the file secure permissions.
  3. Using the mkswap command to create appropriate manageable structures for the kernel.
  4. Adding the temporary swap file to active/used swap spaces.
  5. Mounting the swap file permanently, if needed, or removing it when it is no longer needed.

Despite appearances, the first step is not so obvious. The example in man 8 mkswap uses the fallocate command to create the swap file. Its syntax is friendlier than that of the ancient dd program. However, it has one major drawback: it may not work properly on the ext4, XFS or Btrfs file systems, that is, the most popular ones 🙂 This is due to mechanisms such as preallocation and CoW (Copy-on-Write). So you need to force the file system to actually allocate the blocks, because the Linux kernel accesses the swap file directly instead of going through the abstraction offered by the VFS (Virtual File System).

# Creating a 1 GiB file /swapfile (1024 blocks of 1 MiB each)
sudo dd if=/dev/zero of=/swapfile bs=1M count=1024
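
Whether a file really has all of its blocks allocated can be estimated by comparing its apparent size with the space reported as allocated. The following is a minimal sketch using GNU stat. Both helper names are hypothetical, and the check only detects holes (sparse files); it says nothing about copy-on-write or unwritten preallocated extents.

```shell
# has_holes SIZE_BYTES BLOCKS BLOCK_UNIT
# Succeeds (exit 0) when fewer bytes are allocated than the file's size.
has_holes() {
    [ $(( $2 * $3 )) -lt "$1" ]
}

# file_has_holes FILE: pass GNU stat's size (%s), allocated block count (%b)
# and the size of one such block (%B) to has_holes.
file_has_holes() {
    # Word splitting of the stat output into three arguments is intentional.
    has_holes $(stat -c '%s %b %B' "$1")
}

# Example:
#   file_has_holes /path/to/file && echo "file is sparse"
```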

The swap file should be readable and writable only by the root user. If other users could read it, data that has been written to the swap space might leak. Therefore, restrict its permissions:

sudo chmod 600 /swapfile
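
The result can be verified by reading the mode and owner back. Here is a small sketch with GNU stat, demonstrated on a throwaway file from mktemp rather than the real swap file so that it runs without root. The helper name mode_and_owner is made up for illustration.

```shell
# mode_and_owner FILE: print the octal permission bits and the owning user.
mode_and_owner() {
    stat -c '%a %U' "$1"
}

# Try it on a scratch file so no root access is needed.
demo_file=$(mktemp)
chmod 600 "$demo_file"
mode_and_owner "$demo_file"
rm -f "$demo_file"
```

For a swap file created with sudo as above, the same call should report mode 600 and owner root.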

The third step is to create the structure used by the kernel to manage the swap file or partition. You can compare the file before and after running mkswap.

$ xxd -l 100 /swapfile
0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
(...)

Those interested can verify that the file contains nothing but zeros:

$ xxd /swapfile | fgrep -v '................'

Then run sudo mkswap /swapfile and filter the dump of the file again using fgrep (or grep -F).

$ sudo mkswap /swapfile
$ xxd /swapfile | fgrep -v '................'
0000400: 0100 0000 ffff 0300 0000 0000 37c4 1595  ............7...
0000410: cb34 4a42 8b23 1c82 4a8e 10bc 0000 0000  .4JB.#..J.......
0000ff0: 0000 0000 0000 5357 4150 5350 4143 4532  ......SWAPSPACE2
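
The signature visible at the end of the dump occupies the last 10 bytes of the first memory page, which is why it starts at offset 0xff6 for a 4096-byte page. A minimal sketch that extracts it follows. The helper swap_magic is hypothetical, and the page size defaults to whatever getconf PAGESIZE reports.

```shell
# swap_magic FILE [PAGE_SIZE]
# Prints the 10-byte signature stored at the end of the first page.
swap_magic() {
    page_size=${2:-$(getconf PAGESIZE)}
    # Skip everything up to the last 10 bytes of the first page.
    dd if="$1" bs=1 skip=$(( page_size - 10 )) count=10 2>/dev/null
}

# Run as root against the real file, since it is readable only by root:
#   swap_magic /swapfile
```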

The structure created for the kernel is not impressive. However, there is a curiosity here: mkswap does not overwrite the first block of data, as it may hold information such as a bootloader (in older computers, booting was multi-stage and used the MBR [Master Boot Record]) or a partition table. The data therefore only appears after an offset of 1024 bytes (1 KiB, or 8192 bits; 0x400 in hexadecimal is 1024 in decimal). This is, incidentally, consistent with the swap_header union found in the Linux source code. A union is a C type whose members share the same memory, so the same bytes can be interpreted differently depending on the context.

// https://github.com/torvalds/linux/blob/master/include/linux/swap.h
union swap_header {
	struct {
		char reserved[PAGE_SIZE - 10];
		char magic[10];			/* SWAP-SPACE or SWAPSPACE2 */
	} magic;
	struct {
		char		bootbits[1024];	/* Space for disklabel etc. */
		__u32		version;
		__u32		last_page;
		__u32		nr_badpages;
		unsigned char	sws_uuid[16];
		unsigned char	sws_volume[16];
		__u32		padding[117];
		__u32		badpages[1];
	} info;
};
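
The fields of the info struct can be read straight from the file. In the dump above, the bytes at 0x400 are 01 00 00 00 (version 1) and those at 0x404 are ff ff 03 00, which as a little-endian 32-bit number is 0x0003ffff = 262143, the index of the last 4 KiB page in a 1 GiB file. Below is a minimal sketch with od. The helper read_u32_le is hypothetical, and it assumes the header was written in little-endian byte order, as on x86.

```shell
# read_u32_le FILE OFFSET
# Prints the unsigned 32-bit little-endian integer stored at OFFSET.
read_u32_le() {
    # od prints the four bytes as decimal numbers; set -- splits them
    # into $1..$4 (the substitution is expanded before set runs).
    set -- $(od -An -tu1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Run as root against the real file:
#   read_u32_le /swapfile 1024    # version
#   read_u32_le /swapfile 1028    # last_page
#   read_u32_le /swapfile 1032    # nr_badpages
```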

The next step is to use the swapon command. After checking permissions it makes a system call with the same name. Here comes another curiosity - it is assumed that Linux can have 32 swap spaces. However, according to man 2 swapon:

There is an upper limit on the number of swap files that may be used, defined by the kernel constant MAX_SWAPFILES. Before kernel 2.4.10, MAX_SWAPFILES has the value 8; since kernel 2.4.10, it has the value 32. Since kernel 2.6.18, the limit is decreased by 2 (thus: 30) if the kernel is built with the CONFIG_MIGRATION option (which reserves two swap table entries for the page migration features of mbind(2) and migrate_pages(2)). Since kernel 2.6.32, the limit is further decreased by 1 if the kernel is built with the CONFIG_MEMORY_FAILURE option.

In fact, the documentation contained an error: it did not take into account the kernel compilation option CONFIG_DEVICE_PRIVATE, which subtracts 2 possible swap spaces for kernels from 4.14 to 5.13 and 4 for kernels from 5.14 onward. This means that for Enterprise Linux kernels the limit is 27, or 25 when using a newer kernel (such as kernel-ml). We reported this observation, along with a patch, to the Linux man-pages mailing list: https://marc.info/?l=linux-man&m=164245800929084&w=2. After rewriting, the patch was accepted, so the latest version of the man page includes information on the effect of CONFIG_DEVICE_PRIVATE on the maximum number of swap files/partitions.

To check how many files or swap partitions can be mounted on the system, you can use the following script:
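
A minimal sketch of such a check, limited to the rules quoted above, might look like this. The function name max_swapfiles is hypothetical, the kernel configuration is assumed to be readable at /boot/config-$(uname -r) (some systems expose it elsewhere, or not at all), and kernels newer than those discussed here may reserve further entries that this sketch does not know about.

```shell
# max_swapfiles CONFIG_FILE KERNEL_MAJOR KERNEL_MINOR
# Applies only the rules described in the text: start from 32 and subtract
# the entries reserved by CONFIG_MIGRATION, CONFIG_MEMORY_FAILURE and
# CONFIG_DEVICE_PRIVATE. Kernels older than 2.6.32 are not handled.
max_swapfiles() {
    cfg=$1
    major=$2
    minor=$3
    max=32
    if grep -q '^CONFIG_MIGRATION=y' "$cfg"; then max=$(( max - 2 )); fi
    if grep -q '^CONFIG_MEMORY_FAILURE=y' "$cfg"; then max=$(( max - 1 )); fi
    if grep -q '^CONFIG_DEVICE_PRIVATE=y' "$cfg"; then
        if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 14 ]; }; then
            max=$(( max - 4 ))    # 5.14 and newer
        elif [ "$major" -eq 5 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 14 ]; }; then
            max=$(( max - 2 ))    # 4.14 to 5.13
        fi
    fi
    echo "$max"
}

# On a live system (assumed config location):
#   release=$(uname -r)
#   major=${release%%.*}; rest=${release#*.}; minor=${rest%%[!0-9]*}
#   max_swapfiles "/boot/config-$release" "$major" "$minor"
```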

Back to the main topic – activating the swap file is simple and only requires the command:

sudo swapon /path/to/file
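
Whether the file is actually in use can be confirmed with swapon --show, free -h, or by reading /proc/swaps directly. Here is a minimal sketch that sums the sizes listed there. The helper total_swap_kib is hypothetical, and it assumes the usual /proc/swaps layout with one header line and the size in KiB in the third column.

```shell
# total_swap_kib: sum the Size column of /proc/swaps-formatted text on stdin.
total_swap_kib() {
    awk 'NR > 1 { sum += $3 } END { print sum + 0 }'
}

# On a live system:
#   total_swap_kib < /proc/swaps
```

The size listed for a swap file is slightly smaller than the file itself, because the first page holds the header rather than swapped data.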

When we are done with the swap file, we can exclude it from the kernel swap space and delete it:

sudo swapoff /path/to/file
sudo rm /path/to/file

Alternatively, add it to the file system table /etc/fstab so that it is activated at every boot:

echo "/path/to/file    none    swap    defaults    0    0" | sudo tee -a /etc/fstab
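
Running the append command twice would leave a duplicate line in /etc/fstab. Below is a minimal sketch that adds the entry only when it is missing. The function add_swap_to_fstab is hypothetical; it takes the target file as an optional second argument so it can be tried on a copy first, and it would have to run as root to modify the real /etc/fstab.

```shell
# add_swap_to_fstab SWAP_PATH [FSTAB_FILE]
# Appends a swap entry unless one for SWAP_PATH already exists.
add_swap_to_fstab() {
    fstab=${2:-/etc/fstab}
    if awk -v p="$1" '$1 == p && $3 == "swap" { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $1 already present in $fstab" >&2
    else
        printf '%s    none    swap    defaults    0    0\n' "$1" >> "$fstab"
    fi
}

# Dry run against a copy:
#   cp /etc/fstab /tmp/fstab.test && add_swap_to_fstab /path/to/file /tmp/fstab.test
```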

Final script

# swap file location
SWAP_FILE=/swapfile
# swap size in MiB
SWAP_FILE_SIZE=4096
sudo dd if=/dev/zero of="$SWAP_FILE" bs=1M count="$SWAP_FILE_SIZE"
sudo chmod 600 "$SWAP_FILE"
sudo mkswap "$SWAP_FILE"
sudo swapon "$SWAP_FILE"
# Uncomment line below to add swapfile to fstab
# echo "$SWAP_FILE    none    swap    defaults    0    0" | sudo tee -a /etc/fstab

Summary

Although the title of the article might suggest a rather basic topic to Linux administrators, we have gone quite deep into it: from explaining the reasons for using the dd command instead of fallocate, through seeing what the swap structure looks like on disk, looking into the kernel sources and verifying the offset used, to finding an error in the documentation and writing a patch for the Linux man pages.