Understanding Linux cp Command and Implementing in Python

As a regular Linux user, I’ve been intrigued by the simplicity of the UNIX philosophy, which states, “Have one tool, and have that tool do its job well.” In this post, I will dive into how the cp command works in Linux, followed by a basic implementation of the command in Python for a deeper understanding.

How `cp` Command Works?

When you type cp into your terminal, you’re invoking a program that’s part of the GNU Core Utilities, equipped to handle a variety of file system operations.

Here’s the typical workflow of the cp command:

Argument Parsing: The command-line arguments, including any options (like -r for recursive copy or -i for interactive prompts), are parsed.
Path Resolution: The command resolves the absolute or relative paths provided for the source and destination.
Permission Checks: Before proceeding, cp checks if you have the read permission for the source file and the write permission for the destination directory.
File Opening: Utilizing the open() system call, cp opens the source file for reading. If the destination file doesn’t exist, it’s created using the creat() system call; otherwise, it’s opened with open().
Data Transfer: Through a loop that uses read() and write() system calls, data is transferred from the source to the destination file in chunks. This way, cp efficiently handles files of any size without consuming excessive memory.
Metadata Copying: The command duplicates the source file’s metadata, such as permissions and timestamps, to the new file using system calls like fchmod() and futimens().
Resource Cleanup: After the copy operation, cp closes both file descriptors using the close() system call.

Now, let’s dive deep into the system level to understand how system calls are utilized at a lower level when using the cp command.

System Calls: Diving Deeper into the internals of `cp`.

System calls provide the interface between a running process and the operating system. Here’s a closer look at the ones cp uses:

`open()`

The open() system call is used by cp to obtain a file descriptor for the source and destination files. This system call takes a pathname and flags as arguments, determining how the file should be accessed. When copying a file, cp opens the source file in read-only mode (O_RDONLY) to ensure the file is not modified. If the destination file does not exist, cp uses open() with the O_CREAT flag to create it, and with O_WRONLY to open it for writing.

`read()`

After opening the source file, cp uses the read() system call to read data from the file into a buffer. This buffer temporarily stores the data as it’s being copied. The read() function takes three arguments: the file descriptor, the buffer into which the data is read, and the number of bytes to read.

`write()`

write() is the system call used to transfer data from the buffer to the destination file. It takes a file descriptor for the destination file, the buffer with the data, and the number of bytes to write from the buffer. cp will repeatedly read from the source and write to the destination until all data is copied.

`close()`

Once the copy operation is complete, cp needs to release the file descriptors so they can be reused by the system. The close() system call is used for this purpose, closing both the source and destination file descriptors.

`fchmod()`

Copying a file also involves duplicating its permissions. The fchmod() system call is used by cp to set the permissions of the destination file to match those of the source file. It requires the file descriptor of the open file and the mode (permission settings) to be applied.

`futimens()`

The futimens() system call allows cp to preserve the timestamps of the source file, setting the access and modification times of the destination file to match. It takes a file descriptor and an array of timespec structures representing the new times.

`creat()`

creat() is worth mentioning as it’s often used as a shorthand for open() with flags set to create a new file or rewrite an old one. It’s equivalent to open() with O_CREAT | O_WRONLY | O_TRUNC flags.

These system calls are the building blocks that allow cp to function, orchestrating the process of duplicating file content, permissions, and timestamps from one location to another within linux.

Replicating `cp` in Python:

import os
import sys


def copy(src: str, dest: str) -> None:
    """
    Copy the contents of the source file to the destination file or directory.
    If dest is a directory, the file will be copied into it with the same filename.

    Args:
        src (str): The path to the source file.
        dest (str): The path to the destination file or directory.

    Raises:
        FileNotFoundError: If the source file does not exist or is not a file.
        IsADirectoryError: If the destination is a directory but the filename is not provided.
    """
    # Ensure the source file exists and is a file
    if not os.path.isfile(src):
        raise FileNotFoundError(f"Source file {src} not found or is not a file.")

    # Check if destination is a directory
    if os.path.isdir(dest):
        # Construct the full file path by joining the directory with the basename of the source file
        dest = os.path.join(dest, os.path.basename(src))

    # Open the source file in binary read mode and destination file in binary write mode
    with open(src, "rb") as fsrc, open(dest, "wb") as fdst:
        # Read the source file content in chunks
        while True:
            chunk = fsrc.read(1024)  # Read in 1KB chunks
            if not chunk:
                break
            fdst.write(chunk)


# Command-line interface
if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python script.py source_file destination_path")
        sys.exit(1)

    source_path = sys.argv[1]
    destination_path = sys.argv[2]

    try:
        copy(source_path, destination_path)
        print(f"Copied {source_path} to {destination_path}")
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)

The script reads the source file in manageable chunks (1KB in this case) and writes these chunks to the destination file, ensuring that the content is preserved during the copy process.

If the destination path is a directory, the script appends the name of the source file to the destination path to maintain the original filename in the new location. This behavior mimics the standard cp command in Linux when the target is a directory.

Conclusion

The cp command is a great example of the Unix philosophy: simple tools that do one thing well. By understanding the system calls it leverages, we gain insight into the operating system’s inner workings. Moreover, by implementing its functionality in Python, we can appreciate the power and simplicity provided by high-level programming languages.

While our Python script does not cover all features of cp, such as recursive copying or interaction with the user, it serves as an understanding of the cp command.

Finally, I hope you enjoyed reading this and had the opportunity to learn something new. If you have any feedback, please feel free to leave a comment below. If you prefer not to comment publicly, you can always send me an email.

If you’re interested in advancing your programming skills and would love to learn how to build cool projects like Docker, BitTorrent, or even understand the internals of your favorite tools such as Git by recreating them in your preferred programming language, I highly recommend you join Code Crafters.

If you loved this post, you can always support my work by buying me a coffee. your support would mean the world to me! Also, if you end up sharing this on X, definitely tag me @muhammad_o7. Also follow me on LinkedIn

Note: If you like to be notified about the upcoming posts you can subscribe to the RSS or you can leave your email here

About the Author

Muhammad Raza is a Senior DevOps Engineer and former AWS Professional Services Consultant with 5 years of experience in cloud infrastructure, CI/CD automation, and DevOps solutions. He has helped numerous clients optimize AWS costs, implement Infrastructure as Code, and build reliable deployment pipelines.

Need help with your DevOps workflows? I'm available for consulting on CI/CD pipelines, infrastructure automation, and AWS architecture. Book a free 30-min call or email me.

Connect on LinkedIn Follow on X/Twitter GitHub

How cp Command Works?

Here’s the typical workflow of the cp command:

System Calls: Diving Deeper into the internals of cp.

open()

read()

write()

close()

fchmod()

futimens()

creat()

Replicating cp in Python: