Creating a Simple Pastebin Service in Python and Flask

Fri, 05 Jul 2024 00:00:00 +0000

In this blog post, we will be building a simple Pastebin service using Python and Flask. Pastebin is a popular web application used to store plain text or code snippets for a certain period of time. We’ll create a basic version that allows users to paste text, select the programming language, and get a URL to share the paste. I have also created a YouTube video about this, which you can view here.

Getting Started

Before we start building the application, let’s set up the environment by following these steps.

First, let’s create a virtual environment in the project directory.

python -m venv venv

Now, once we have created the virtual environment, let’s activate it and install all the required libraries that are going to be used by this project.

source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install Flask shortuuid pygments

Flask is our web framework, shortuuid generates a unique ID for each paste, and pygments handles syntax highlighting.

Now that we have installed all the required libraries, let’s create the necessary files and folders.

mkdir -p pastes templates static && touch index.py templates/index.html static/styles.css

This is how your folder structure should look:

pastebin/
│
├── index.py
├── pastes/
├── templates/
│   └── index.html
└── static/
    └── styles.css

The pastes directory will store the text files for each paste. The templates directory contains our HTML templates, and the static directory contains CSS for styling.

Now that we have set up the environment, it’s time to code.

Writing Code

Let’s dive into the code. Create a file named index.py and add the following code:

Once you have created the Flask application, let’s create the HTML template in templates/index.html and the stylesheet in static/styles.css.

templates/index.html

static/styles.css

Now that we have created our application, before we run it, let’s try to understand how it works by breaking down the code.

Code Breakdown

First, we import the necessary libraries and modules. Flask is our web framework, shortuuid is used for generating unique IDs, and Pygments is for syntax highlighting. We also create the pastes/ directory, where each paste is stored, if it does not already exist.

from flask import Flask, request, render_template, abort
import shortuuid
import os
from pygments import highlight
from pygments.lexers import get_lexer_by_name, get_all_lexers
from pygments.formatters import HtmlFormatter

app = Flask(__name__)

PASTE_DIR = 'pastes'
if not os.path.exists(PASTE_DIR):
    os.makedirs(PASTE_DIR)

Then we write a function that retrieves all available programming languages supported by Pygments for syntax highlighting and returns them as a sorted list of tuples.

def get_language_options():
    return sorted([(lexer[1][0], lexer[0]) for lexer in get_all_lexers() if lexer[1]])

Then we write the main route for our application. If the request method is POST (i.e., when the user submits a form), it saves the content and language to a new file with a unique ID. The URL for the new paste is generated and displayed to the user. If the request method is GET, it simply renders the form.

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        content = request.form['content']
        language = request.form['language']
        paste_id = shortuuid.uuid()
        file_path = os.path.join(PASTE_DIR, paste_id)

        with open(file_path, 'w') as f:
            f.write(f"{language}\n{content}")

        paste_url = request.url_root + paste_id
        return render_template('index.html', paste_url=paste_url, languages=get_language_options())

    return render_template('index.html', languages=get_language_options())

This route handles viewing a specific paste. It reads the paste file, applies syntax highlighting using pygments, and renders the highlighted content.

@app.route('/<paste_id>')
def view_paste(paste_id):
    file_path = os.path.join(PASTE_DIR, paste_id)
    if not os.path.exists(file_path):
        abort(404)

    with open(file_path, 'r') as f:
        language = f.readline().strip()
        content = f.read()

    lexer = get_lexer_by_name(language, stripall=True)
    formatter = HtmlFormatter(linenos=True, cssclass="source")
    highlighted_content = highlight(content, lexer, formatter)
    highlight_css = formatter.get_style_defs('.source')

    return render_template('index.html', paste_content=highlighted_content, highlight_css=highlight_css)
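
The on-disk format shared by the two routes (language on the first line, content after it) can be tried without Flask. The following is a minimal sketch, not part of the original application: save_paste and load_paste are hypothetical helper names, and the standard library’s uuid module stands in for shortuuid so the example has no third-party dependencies.

```python
import os
import tempfile
import uuid


def save_paste(paste_dir, language, content):
    """Write a paste as '<language>' on line one, followed by the content."""
    paste_id = uuid.uuid4().hex[:8]  # assumption: stand-in for shortuuid.uuid()
    path = os.path.join(paste_dir, paste_id)
    # newline="" keeps the content's line endings exactly as submitted
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(f"{language}\n{content}")
    return paste_id


def load_paste(paste_dir, paste_id):
    """Read a paste back: the first line is the language, the rest is content."""
    path = os.path.join(paste_dir, paste_id)
    with open(path, "r", encoding="utf-8", newline="") as f:
        language = f.readline().strip()
        content = f.read()
    return language, content


with tempfile.TemporaryDirectory() as paste_dir:
    paste_id = save_paste(paste_dir, "python", "print('hi')\nprint('bye')\n")
    language, content = load_paste(paste_dir, paste_id)
    print(language)
    print(content)
```

Because the language sits on its own first line, the content can contain any number of newlines without confusing the reader function.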

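Now that we understand how everything works, you can run the application with python index.py. This assumes index.py ends with an app.run() call under an if __name__ == '__main__': guard; if it does not, flask --app index run starts the same application.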

Conclusion

You’ve built a simple Pastebin service using Python and Flask! This service allows users to paste text, select a programming language, and share the paste via a unique URL. You can expand this project by adding features like expiration times for pastes, user authentication, or even a database to store pastes more efficiently.

If you have any feedback, please feel free to leave a comment below. If you prefer not to comment publicly, you can always send me an email.

Announcements

I started a YouTube channel a year ago, and if you want me to create a video series about interesting stuff like this, feel free to subscribe.
I am open to Python consulting as well. If you have an interesting Python project or need advice regarding tech, you can always send me an email.
Lastly, a huge thanks for reading this and supporting my work.

If you loved this post, you can always support my work by buying me a coffee. Your support would mean the world to me! Also, if you end up sharing this on X, definitely tag me @muhammad_o7. You can also follow me on LinkedIn.

Note: If you’d like to be notified about upcoming posts, you can subscribe to the RSS feed or leave your email here.

Understanding HTTP Server by implementing in Python

Sun, 09 Jun 2024 00:00:00 +0000

I have been programming professionally for about three years now, and I have been using Python for about four years. I started learning Python back in my sophomore year of college, and since then, I have built web applications, performed data analysis, and written automation scripts, all using Python. As a curious mind, I have always wanted to learn how things work behind the scenes and potentially write about them to help others understand as well.

In this blog post, I will explore how HTTP servers work behind the scenes and then build a simple HTTP server in Python.

An HTTP server is an important part of web architecture that processes requests from clients, typically web browsers, and delivers the requested resources back to them. This server facilitates the communication between the client and the server, ensuring that web pages, images, and other web resources are accessible over the internet.

Understanding how an HTTP server works can be beneficial for several reasons. First and foremost, it forms the backbone of web development, helping you create efficient, secure, and reliable web applications. Understanding the basics also makes it easier to learn more complex web development frameworks and technologies.

Understanding the basics of an HTTP Server.

To simply understand how an HTTP server works, consider its following functionalities:

Handling Requests

The primary function of an HTTP server is to handle incoming HTTP requests from clients. Here’s a breakdown of this process:

Listening for Requests: The server constantly listens on a specific port (commonly port 80 for HTTP and port 443 for HTTPS) for incoming requests.
Receiving Requests: When a client sends a request, the server receives it and parses the request line, headers, and body. The request line specifies the HTTP method (e.g., GET, POST) and the requested resource (e.g., /index.html).
Interpreting the Request: The server interprets the request to determine what resource the client is asking for. This involves understanding the URL, the method used, and any parameters or data included in the request.
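
The receiving and interpreting steps above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (the whole request arrives in one piece and is well formed); parse_request is a hypothetical helper name, not something the server later in this post uses.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into method, path, version, headers, and body."""
    # A blank line (CRLF CRLF) separates the head of the request from the body
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # The first line is the request line: METHOD PATH VERSION
    method, path, version = lines[0].split(" ", 2)

    # Every remaining line is a "Name: value" header
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


raw_request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: localhost:8000\r\n"
    b"Accept: text/html\r\n"
    b"\r\n"
)
method, path, version, headers, body = parse_request(raw_request)
print(method, path, version)
print(headers["host"])
```

Header names are lowercased here because HTTP header names are case-insensitive, which makes lookups predictable.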

Processing Requests

Once the server has received and interpreted the request, it processes it accordingly:

Routing: The server determines which resource or endpoint should handle the request. For static content like HTML, CSS, or images, this may involve simply retrieving a file from the server’s file system. For dynamic content, this may involve invoking server-side scripts or applications (e.g., PHP, Python, Node.js) to generate the appropriate response.
Executing Server-Side Logic: For dynamic requests, the server may need to run server-side code to generate the response. This can include querying a database, performing calculations, or interacting with other web services.
Handling Security: The server often needs to handle security-related tasks, such as authentication and authorization, ensuring that only authorized users can access certain resources.
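
Routing, as described above, can be as simple as a lookup table from method and path to a handler. The sketch below is only illustrative: the handlers home and about, the ROUTES table, and the route function are hypothetical names, and real servers typically add pattern matching and static file handling on top of this idea.

```python
def home():
    return 200, "<h1>Home</h1>"


def about():
    return 200, "<h1>About</h1>"


# Map (method, path) pairs to the function that produces the response
ROUTES = {
    ("GET", "/"): home,
    ("GET", "/about"): about,
}


def route(method, path):
    """Return (status_code, body) for a request, or a 404 if nothing matches."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, "<h1>Not Found</h1>"
    return handler()


print(route("GET", "/about"))
print(route("GET", "/missing"))
```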

Sending Responses

After processing the request, the server sends an HTTP response back to the client. This response consists of several parts:

Status Line: The status line includes the HTTP version, a status code (e.g., 200 or 404), and a status message (e.g., OK or Not Found). The status code indicates the result of the request, such as whether it was successful, if the resource was not found, or if there was a server error.
Headers: The response headers provide additional information about the response. Common headers include Content-Type (indicating the type of content, such as text/html or application/json), Content-Length (indicating the size of the response body), and Server (providing information about the server software).
Body: The body of the response contains the actual content being sent to the client. This can be an HTML document, an image, a JSON object, or any other type of web content.
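
The three parts can be assembled programmatically. The following is a minimal sketch, assuming a text body; build_response is a hypothetical helper name. Note that each header line ends with CRLF, a blank line separates the headers from the body, and Content-Length counts bytes rather than characters.

```python
def build_response(status_code, reason, body, content_type="text/html; charset=UTF-8"):
    """Assemble status line, headers, and body into raw response bytes."""
    body_bytes = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status_code} {reason}\r\n"      # status line
        f"Content-Type: {content_type}\r\n"          # headers
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"                                       # blank line ends the headers
    )
    return head.encode("ascii") + body_bytes


response = build_response(200, "OK", "<h1>Hello, World!</h1>")
print(response.decode("utf-8"))
```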

Example Response

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 123
Server: Apache/2.4.1 (Unix)

<html>
<head>
   <title>Example</title>
</head>
<body>
   <h1>Hello, World!</h1>
</body>
</html>

Now that we have a solid understanding of how an HTTP server works, let’s implement a simple HTTP server in python.

Implementation in Python

Here’s an implementation of our simple http server in python:

import socket

# Define server address and port
HOST, PORT = "", 8000

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind((HOST, PORT))
server_socket.listen(5)
print(f"Listening on port {PORT}")

while True:
    client_connection, client_address = server_socket.accept()
    request_data = client_connection.recv(1024)
    print(request_data.decode("utf-8"))
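    response_body = (
        "<html>\n"
        "<head><title>Hello</title></head>\n"
        "<body><h1>Hello, World!</h1></body>\n"
        "</html>\n"
    )
    # The status line must be the first thing sent, and header lines end with CRLF
    http_response = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=UTF-8\r\n"
        f"Content-Length: {len(response_body.encode('utf-8'))}\r\n"
        "\r\n"
        f"{response_body}"
    )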
    client_connection.sendall(http_response.encode("utf-8"))
    client_connection.close()

Let’s break down the code.

import socket

This line imports the socket module, which provides low-level networking interfaces for creating and managing network connections. It is part of the Python standard library.

Define Server Address and Port

# Define server address and port

HOST, PORT = "", 8000

HOST is set to an empty string, meaning the server will accept connections on all available network interfaces.
PORT is set to 8000, specifying the port on which the server will listen for incoming connections.

Creating Socket Object

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

This line creates a new socket object.
socket.AF_INET specifies the address family for IPv4.
socket.SOCK_STREAM specifies the socket type for TCP, which is a connection-oriented protocol.

Bind the Socket to the Address and Port

server_socket.bind((HOST, PORT))

The bind method associates the socket with the specified network interface and port. This prepares the socket to accept connections on that address and port.

Listen for Incoming Connections

server_socket.listen(5)
print(f"Listening on port {PORT}")

The listen method enables the server to accept incoming connections.
The parameter 5 specifies the maximum number of queued connections.

Understand Main Loop

while True:
    client_connection, client_address = server_socket.accept()
    request_data = client_connection.recv(1024)
    print(request_data.decode("utf-8"))
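    response_body = (
        "<html>\n"
        "<head><title>Hello</title></head>\n"
        "<body><h1>Hello, World!</h1></body>\n"
        "</html>\n"
    )
    # The status line must be the first thing sent, and header lines end with CRLF
    http_response = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=UTF-8\r\n"
        f"Content-Length: {len(response_body.encode('utf-8'))}\r\n"
        "\r\n"
        f"{response_body}"
    )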
    client_connection.sendall(http_response.encode("utf-8"))
    client_connection.close()

In the main loop, while True starts an infinite loop so the server keeps handling incoming connections. client_connection, client_address = server_socket.accept() waits for a client to connect and returns a new socket, client_connection, for communicating with that client, along with the client’s address.

request_data holds the data received from the client, up to 1024 bytes. Next, we define the HTTP response body and headers, then send the response back with client_connection.sendall(http_response.encode("utf-8")), which encodes the response from a string to bytes. Finally, we close the connection with client_connection.close().
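
To see the accept, recv, sendall, and close cycle work end to end without opening a browser, the sketch below runs a one-shot version of the loop in a background thread and talks to it with http.client from the standard library. It is an illustration rather than the server above: serve_once is a hypothetical helper, the server binds to 127.0.0.1 on an OS-chosen port, and it handles exactly one request.

```python
import http.client
import socket
import threading


def serve_once(server_socket):
    """Accept one connection, read the request, and send a fixed response."""
    client_connection, _ = server_socket.accept()
    with client_connection:
        client_connection.recv(1024)
        body = "<h1>Hello, World!</h1>"
        response = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html; charset=UTF-8\r\n"
            f"Content-Length: {len(body.encode('utf-8'))}\r\n"
            "Connection: close\r\n"
            "\r\n"
            f"{body}"
        )
        client_connection.sendall(response.encode("utf-8"))


server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
server_socket.listen(1)
port = server_socket.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server_socket,), daemon=True)
worker.start()

# Act as the client: send a GET request and read the parsed response
connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
connection.request("GET", "/")
reply = connection.getresponse()
status = reply.status
content_type = reply.getheader("Content-Type")
text = reply.read().decode("utf-8")
connection.close()

worker.join(timeout=5)
server_socket.close()
print(status, content_type, text)
```

If the response were malformed (for example, a status line that is not the first line), http.client would raise an error instead of returning a status, which makes this a handy smoke test.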

Conclusion

Finally, I hope you enjoyed reading this and had the opportunity to learn about how an HTTP server works. If you have any feedback, please feel free to leave a comment below. If you prefer not to comment publicly, you can always send me an email.

You can also improve this server further. For example, when a request comes in, the server could look for the requested HTML file and send it back in the response.

If you loved reading this blog post and would like to learn more stuff like this, I highly recommend you join Code Crafters. They have amazing challenges that help you learn how to build Docker, how to build your own shell, and much more.

Announcements

I started a YouTube channel a year ago, and if you want me to create a video series about interesting stuff like this, feel free to subscribe and let me know your opinion in this anonymous survey.
I am also open to Python consulting. If you have a Python project or need advice regarding tech, you can always send me an email and we can discuss.
Lastly, a huge thanks for reading this and supporting my work.


Email Testing with Python's smtpd Module

Tue, 12 Mar 2024 00:00:00 +0000

As a seasoned Python developer, I am planning to start a new blog series where I will be covering different Python command-line modules which come pre-installed with your Python installation. In this blog, we will be looking into the Python smtpd module, which allows you to run your own local SMTP server for email testing.

The smtpd module, short for Simple Mail Transfer Protocol Daemon, allows developers to set up and run their own local SMTP server. This functionality is particularly useful for testing email-related features during development. Rather than relying on external email servers, developers can take advantage of smtpd to simulate email transactions in a local environment.

It’s part of Python’s standard library up to Python 3.11 (the module was deprecated and then removed in Python 3.12, where the third-party aiosmtpd package is the recommended replacement), so on those versions it is readily available without installing additional dependencies. At its core, this module provides a simple and lightweight implementation of an SMTP (Simple Mail Transfer Protocol) server. SMTP is the protocol used for transmitting electronic mail over the internet, and the smtpd module allows developers to create their own custom SMTP servers.

Setting up `smtpd` server.

To run the smtpd server locally, perform the following steps.

Open a Terminal and run the following command

python -m smtpd -n -c DebuggingServer localhost:1025

Breakdown of the command options:

-n: Short for --nosetuid. By default the program tries to switch to the nobody user, which fails unless it is started as root; this flag skips that step so the server can run as a regular user.
-c DebuggingServer: Specifies the class to be used for the SMTP server, in this case, DebuggingServer as we are testing email functionality.
localhost:1025 : Sets the address and port on which the server will listen. You can choose a different port if needed.

This command will run our SMTP server locally, and we can use it to test emails.

Writing a simple `python` script.

Once we have smtpd running, we can write a simple script to test it.
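
A minimal sketch of such a script is shown here. It is an assumption about what the script could look like, built with smtplib and email.mime.text from the standard library; the helper names build_message and send_test_email are hypothetical, and the subject, addresses, and body are chosen to match the sample output further down.

```python
import smtplib
from email.mime.text import MIMEText

# Must match the address and port passed to the smtpd command
SMTP_HOST, SMTP_PORT = "localhost", 1025


def build_message():
    """Create a plain-text email with test sender and recipient addresses."""
    message = MIMEText("This is a test email\n Hello World")
    message["Subject"] = "Test Email"
    message["From"] = "testing_email@xyz.com"
    message["To"] = "recipient_test@abc.com"
    return message


def send_test_email():
    """Hand the message to the local debugging SMTP server."""
    message = build_message()
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=5) as server:
        server.send_message(message)


if __name__ == "__main__":
    try:
        send_test_email()
        print("Message handed to the local SMTP server")
    except OSError as error:
        # Raised, for example, when the debugging server is not running
        print(f"Could not reach {SMTP_HOST}:{SMTP_PORT}: {error}")
```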

After creating that script you can simply run it. Once you run it you should see the following output on terminal where your smtpd command is running.

---------- MESSAGE FOLLOWS ----------
b'Content-Type: text/plain; charset="us-ascii"'
b'MIME-Version: 1.0'
b'Content-Transfer-Encoding: 7bit'
b'Subject: Test Email'
b'From: testing_email@xyz.com'
b'To: recipient_test@abc.com'
b'X-Peer: ::1'
b''
b'This is a test email'
b' Hello World'
------------ END MESSAGE ------------

Conclusion

Python is an awesome language, and it comes with lots of powerful command-line modules preinstalled. I hope you had a chance to learn something new! In future blog posts I will be covering more of these preinstalled command-line modules. If you have any feedback, please feel free to leave a comment below. If you prefer not to comment publicly, you can always send me an email.

Lastly, I have an exciting announcement about my YouTube channel. I launched this channel last year, but unfortunately, due to some personal reasons, I haven’t been very active on it. Your support means a lot to me, so I would genuinely appreciate it if you could subscribe. Keep an eye out for upcoming content on my YouTube channel – there’s more to come!


If you would love to learn how to build cool projects like Docker or BitTorrent, or to understand the internals of your favorite tools such as Git and grep by recreating them in your preferred programming language, I highly recommend you join Code Crafters. It’s an amazing platform that helps you learn by building different projects.

One Liners Python Edition

Sun, 19 Nov 2023 00:00:00 +0000

I’ve been immersed in the world of Python programming for approximately three years. During this time, I’ve come to appreciate the elegance and power of this versatile language. In this post, designed for both fun and education, I’ll be presenting a collection of one-liner Python code snippets. Whether you’re a seasoned developer or a beginner, these concise lines of code offer insights into the simplicity and effectiveness of Python, demonstrating how a single line can accomplish what might take several lines in other languages.

Reverse a String:

reversed_string = "Hello World"[::-1]

Check if a Number is Even:

is_even = lambda x: x % 2 == 0

Find the Intersection of Two Lists:

intersection = list(set(list1) & set(list2))

Remove Duplicates from a List:

no_duplicates = list(set(my_list))

Calculate the Length of a String without Using len():

length = sum(1 for _ in 'Hello World')

Check if a List Contains All Elements of Another List:

contains_all = all(elem in list1 for elem in list2)

Generate a String of Random Characters:

import random; 
random_str = ''.join(random.choices('abcdefghijklmnopqrstuvwxyz', k=10))

Convert a List of Integers to a Single Number:

num = int(''.join(map(str, [1, 2, 3, 4, 5])))

Palindromic Check:

is_palindrome = lambda s: s == s[::-1]

List Flattening:

flatten_list = sum([[1, 2], [3, 4]], [])

Find the Most Frequent Element in a List:

most_frequent = max(set(my_list), key=my_list.count)

Merge Two Dictionaries:

merged_dict = {**dict1, **dict2}
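
Several of the snippets above refer to names such as list1, list2, my_list, dict1, and dict2 without defining them. The sketch below supplies sample values (my own, chosen only for illustration) so the one-liners can be run as written, with comments on two behaviors worth knowing.

```python
# Sample inputs (assumed values, only for demonstration)
list1 = [1, 2, 3, 4]
list2 = [3, 4]
my_list = ["a", "b", "a", "c", "a"]
dict1 = {"x": 1, "y": 2}
dict2 = {"y": 20, "z": 30}

intersection = list(set(list1) & set(list2))
no_duplicates = list(set(my_list))
contains_all = all(elem in list1 for elem in list2)
most_frequent = max(set(my_list), key=my_list.count)
merged_dict = {**dict1, **dict2}

# Sets are unordered, so sort the results before displaying or comparing them
print(sorted(intersection))
print(sorted(no_duplicates))
print(contains_all)
print(most_frequent)
# When both dictionaries share a key, the value from dict2 wins
print(merged_dict)
```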

Finally, I hope you enjoyed reading this and had the opportunity to learn something new. If you have any feedback, please feel free to leave a comment below. If you prefer not to comment publicly, you can always send me an email. Also I would love to see your favorite python one liner code snippets.

If you’re interested in advancing your programming skills and would love to learn how to build cool projects like Docker or BitTorrent, or to understand the internals of your favorite tools such as Git and grep by recreating them in your preferred programming language, I highly recommend you join Code Crafters.


Understanding Linux cp Command and Implementing in Python

Thu, 09 Nov 2023 00:00:00 +0000

As a regular Linux user, I’ve been intrigued by the simplicity of the UNIX philosophy, which states, “Have one tool, and have that tool do its job well.” In this post, I will dive into how the cp command works in Linux, followed by a basic implementation of the command in Python for a deeper understanding.

How `cp` Command Works?

When you type cp into your terminal, you’re invoking a program that’s part of the GNU Core Utilities, equipped to handle a variety of file system operations.

Here’s the typical workflow of the cp command:

Argument Parsing: The command-line arguments, including any options (like -r for recursive copy or -i for interactive prompts), are parsed.
Path Resolution: The command resolves the absolute or relative paths provided for the source and destination.
Permission Checks: Before proceeding, cp checks if you have the read permission for the source file and the write permission for the destination directory.
File Opening: Utilizing the open() system call, cp opens the source file for reading. If the destination file doesn’t exist, it’s created using the creat() system call; otherwise, it’s opened with open().
Data Transfer: Through a loop that uses read() and write() system calls, data is transferred from the source to the destination file in chunks. This way, cp efficiently handles files of any size without consuming excessive memory.
Metadata Copying: When asked to preserve attributes (for example with cp -p), the command duplicates the source file’s metadata, such as permissions and timestamps, onto the new file using system calls like fchmod() and futimens().
Resource Cleanup: After the copy operation, cp closes both file descriptors using the close() system call.

Now, let’s dive deep into the system level to understand how system calls are utilized at a lower level when using the cp command.

System Calls: Diving Deeper into the internals of `cp`.

System calls provide the interface between a running process and the operating system. Here’s a closer look at the ones cp uses:

`open()`

The open() system call is used by cp to obtain a file descriptor for the source and destination files. This system call takes a pathname and flags as arguments, determining how the file should be accessed. When copying a file, cp opens the source file in read-only mode (O_RDONLY) to ensure the file is not modified. If the destination file does not exist, cp uses open() with the O_CREAT flag to create it, and with O_WRONLY to open it for writing.

`read()`

After opening the source file, cp uses the read() system call to read data from the file into a buffer. This buffer temporarily stores the data as it’s being copied. The read() function takes three arguments: the file descriptor, the buffer into which the data is read, and the number of bytes to read.

`write()`

write() is the system call used to transfer data from the buffer to the destination file. It takes a file descriptor for the destination file, the buffer with the data, and the number of bytes to write from the buffer. cp will repeatedly read from the source and write to the destination until all data is copied.

`close()`

Once the copy operation is complete, cp needs to release the file descriptors so they can be reused by the system. The close() system call is used for this purpose, closing both the source and destination file descriptors.

`fchmod()`

Copying a file also involves duplicating its permissions. The fchmod() system call is used by cp to set the permissions of the destination file to match those of the source file. It requires the file descriptor of the open file and the mode (permission settings) to be applied.

`futimens()`

The futimens() system call allows cp to preserve the timestamps of the source file, setting the access and modification times of the destination file to match. It takes a file descriptor and an array of timespec structures representing the new times.

`creat()`

creat() is worth mentioning as it’s often used as a shorthand for open() with flags set to create a new file or rewrite an old one. It’s equivalent to open() with O_CREAT | O_WRONLY | O_TRUNC flags.

These system calls are the building blocks that allow cp to function, orchestrating the process of duplicating file content, permissions, and timestamps from one location to another within Linux.
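
Python’s os module exposes thin wrappers around most of these system calls, so the sequence can be sketched almost one to one. This is an illustration for Linux and other Unix-like systems, not how cp itself is implemented: copy_with_syscalls is a hypothetical function name, and os.utime called with a file descriptor plays the role of futimens().

```python
import os
import tempfile

CHUNK_SIZE = 1024  # bytes read per read() call


def copy_with_syscalls(source, destination):
    """Copy a file using open/read/write/fchmod/utime/close wrappers."""
    src_fd = os.open(source, os.O_RDONLY)
    try:
        # Same flags that creat() implies: create, write-only, truncate
        dst_fd = os.open(destination, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
        try:
            while True:
                chunk = os.read(src_fd, CHUNK_SIZE)
                if not chunk:  # an empty result means end of file
                    break
                # write() may accept fewer bytes than offered, so loop until done
                view = memoryview(chunk)
                while view:
                    written = os.write(dst_fd, view)
                    view = view[written:]
            # Copy permission bits and timestamps from the source
            info = os.fstat(src_fd)
            os.fchmod(dst_fd, info.st_mode & 0o7777)
            os.utime(dst_fd, ns=(info.st_atime_ns, info.st_mtime_ns))
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.txt")
    destination = os.path.join(workdir, "copy.txt")
    with open(source, "wb") as f:
        f.write(b"hello\n" * 500)  # 3000 bytes, so several 1 KB chunks
    os.chmod(source, 0o640)
    copy_with_syscalls(source, destination)
    with open(destination, "rb") as f:
        copied = f.read()
    same_mode = os.stat(source).st_mode == os.stat(destination).st_mode
    print(len(copied), same_mode)
```

The nested try/finally blocks make sure both descriptors are closed even if a read or write fails partway through.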

Replicating `cp` in Python:

The script reads the source file in manageable chunks (1KB in this case) and writes these chunks to the destination file, ensuring that the content is preserved during the copy process.

If the destination path is a directory, the script appends the name of the source file to the destination path to maintain the original filename in the new location. This behavior mimics the standard cp command in Linux when the target is a directory.
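
A script matching that description could look like the following. Treat it as a minimal sketch under the behavior described above rather than the exact original: copy_file is a hypothetical function name, and the demonstration at the bottom uses a temporary directory instead of command-line arguments.

```python
import os
import tempfile

CHUNK_SIZE = 1024  # 1 KB per read


def copy_file(source, destination):
    """Copy source to destination in 1 KB chunks and return the final path."""
    # Like cp, keep the original filename when the target is a directory
    if os.path.isdir(destination):
        destination = os.path.join(destination, os.path.basename(source))

    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:  # empty bytes means the whole file has been read
                break
            dst.write(chunk)
    return destination


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "notes.txt")
    target_dir = os.path.join(workdir, "backup")
    os.mkdir(target_dir)
    with open(source, "wb") as f:
        f.write(b"x" * 2500)  # spans three 1 KB chunks
    written_to = copy_file(source, target_dir)
    copied_name = os.path.basename(written_to)
    copied_size = os.path.getsize(written_to)
    print(copied_name, copied_size)
```

To use it from the command line, the same function can be called with sys.argv[1] and sys.argv[2] as the source and destination.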

Conclusion

The cp command is a great example of the Unix philosophy: simple tools that do one thing well. By understanding the system calls it leverages, we gain insight into the operating system’s inner workings. Moreover, by implementing its functionality in Python, we can appreciate the power and simplicity provided by high-level programming languages.

While our Python script does not cover all features of cp, such as recursive copying or interactive prompts, it serves as a starting point for understanding how the cp command works.

If you’re interested in advancing your programming skills and would love to learn how to build cool projects like Docker, BitTorrent, or even understand the internals of your favorite tools such as Git by recreating them in your preferred programming language, I highly recommend you join Code Crafters.


My Useful Shell Functions

Tue, 31 Oct 2023 00:00:00 +0000

I have been working extensively in Linux recently and decided to write a post about my useful shell functions, which have significantly enhanced my workflow productivity. In this post, I will share my go-to shell functions that have improved the efficiency of my tasks. As a regular Linux user, I frequently use the command line for various daily operations, such as file creation, directory navigation, file movement, and text editing using vim.

Viewing CSV Files in a Better Format

function view_csv_pretty {
    if [ -z "$1" ]; then
        echo "Usage: view_csv_pretty <file.csv>"
    else
        cat "$1" | column -s, -t | less -F -S -X -K
    fi
}

This bash function comes in pretty handy when viewing csv files directly on the terminal. Here’s the explanation for this one liner.

cat "$1": Reads the content of the specified CSV file.
column -s, -t: Uses the column command to format the content into a table
1. -s,: Specifies that columns are separated by commas in the CSV file.
2. -t: Tells column to create the table output.
less -F -S -X -K:
1. less: Displays the formatted table output in the terminal.
2. -F: Quits if the entire file fits on one screen.
3. -S: Chops long lines to fit within the screen width.
4. -X: Leaves the screen’s contents intact upon exiting less
5. -K: Exits less on Ctrl+C.

Checking Recently Modified Files

This Bash function, recently_modified, proves to be quite handy for my team when keeping track of the latest modifications made to various files on the server.

function recently_modified() {
    recent_file=$(ls -t | head -n1)
    echo "Most recently modified file: $recent_file"
}

Compressing Multiple Files

function compress_files() {
    if [ -z "$1" ]; then
        echo "Usage: compress_files <archive_name.zip> <file1> <file2> ..."
    else
        zip -r "$1" "${@:2}"
    fi
}

Searching text in files

function search_text_in_files() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "Usage: search_text_in_files <directory> <search_term>"
    else
        grep -rnw "$1" -e "$2"
    fi
}

Checking high usage memory processes

function process_with_most_memory() {
    ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head
}

Listing Open Ports

function list_open_ports() {
    netstat -tuln
}

Listening Ports for specific process

function  find_listening_ports() {
    if [ -z "$1" ]; then
        echo "Usage: find_listening_ports <pid>"
    else
        ss -tulnp | grep "$1"
    fi
}

Finally, I hope you enjoyed reading this and had the opportunity to learn something new from this post. If you have any favorite shell functions that you use in your everyday workflow, I would love to see those in the comments. If you prefer not to comment, you can always send me an email.

If you loved this post, you can always support my work by buying me a coffee. Your support would mean the world to me! Also, if you end up sharing this on Twitter, definitely tag me @muhammad_o7.


Understanding Python Variables: Namespaces and Variable Scope

Thu, 05 Oct 2023 00:00:00 +0000

I have been using Python extensively throughout my career. I wanted to write this post to provide an understanding of Namespaces and Variable Scope. Like most programming languages, Python offers a structured way to store and access data through variables. However, understanding where and how these variables exist and interact can sometimes be complicated. This post will help you grasp the fundamental concepts related to Python variables: namespaces and variable scope.

What is a Namespace?

In computer programming, and more explicitly in Python, understanding the concept of a namespace is pivotal to managing variable references and ensuring code clarity. At its core, a namespace serves as a fundamental structure, encapsulating and organizing identifiers to avoid potential naming conflicts.

In the simplest terms, a namespace is a container that holds a collection of identifiers. These identifiers can be variable names, function names, class names, and more. Each of these identifiers is associated with a specific object (value) in memory. Think of it as a dictionary where the keys represent variable names (or other identifiers) and the values correspond to the actual objects or references in memory.
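
The dictionary analogy is close to literal: Python lets you inspect namespaces as dictionaries through built-in functions such as globals(), locals(), and vars(). The short sketch below uses made-up names (counter and greet) purely for illustration.

```python
import math

counter = 42


def greet(name):
    message = f"Hello, {name}"
    return locals()  # the function's local namespace as a dictionary


# The module (global) namespace maps the name "counter" to the object 42
print(globals()["counter"])

# A module's namespace is also a dictionary of its names
print(vars(math)["pi"] == math.pi)

# The local namespace holds the parameter and the local variable
print(greet("Ada"))
```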

Unique Naming System: Namespaces ensure that there is no ambiguity in the naming system. For instance, you can have a function named calculate in one namespace and another function with the same name in a different namespace without any conflict.

Lifetime of a Namespace: A namespace exists only as long as the scope that created it. For example, a function’s local namespace is created when the function is called and discarded when it returns, at which point the names defined in it become unbound.

Types of Namespaces

Python has various namespaces, created and deleted at different times:

Built-in Namespace: Contains Python’s built-in functions and exceptions. Created when the Python interpreter starts up.
Global (Module) Namespace: Specific to a module or script. Created when the module is imported or the script is run.
Enclosing (Function) Namespace: Exists for nested functions. It chains multiple function namespaces from innermost to outermost.
Local Namespace: Created when a function is called. Once the function execution completes, the namespace is discarded.

Variable Scope

Scope defines the region of the code where a variable can be accessed or modified. Python has four primary variable scopes:

Local (L): Inside the current function.
Enclosing (E): Inside enclosing functions.
Global (G): At the top level of the module.
Built-in (B): In the built-in namespace.

These scopes form the LEGB rule, which Python follows when resolving variable names.

Understanding Scope with Examples

x = 10  # global variable

def outer_function():
    y = 5  # enclosing variable
    
    def inner_function():
        z = 3 # local var
        
        print(x, y, z)
    
    inner_function()

outer_function()

When inner_function is called, it accesses:

z from its local scope.
y from the enclosing scope of outer_function.
x from the global scope.

The `global` and `nonlocal` Keywords

To modify global or enclosing variables within a function, Python provides the global and nonlocal keywords:

x = 10

def modify_global():
    global x
    x = 20

def outer_function():
    y = 5
    
    def modify_enclosing():
        nonlocal y
        y = 15
    
    modify_enclosing()
    print(y)

modify_global()
outer_function()
print(x)

This code will output:

15
20

The global keyword tells Python we’re referring to the global x, and the nonlocal keyword indicates we’re targeting the y from the enclosing function.
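It helps to see what happens without the keyword. In this sketch (the counter variable and both function names are illustrative), assigning to a name inside a function makes that name local, so reading it before it has a value fails:

```python
counter = 0


def increment_without_global():
    # Assigning to counter makes it local to this function, so reading it
    # first (as += does) fails before any value has been bound.
    try:
        counter += 1
    except UnboundLocalError:
        return "UnboundLocalError"
    return counter


def increment_with_global():
    global counter
    counter += 1
    return counter


print(increment_without_global())  # UnboundLocalError
print(increment_with_global())     # 1
```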

Avoid Variable Shadowing

If a local variable shares the same name as a global variable or a built-in, it shadows the global or built-in variable:

x = 10

def shadow_example():
    x = 5
    print(x)

shadow_example()  # Outputs: 5
print(x)  # Outputs: 10

Shadowing can lead to unexpected behaviors, so it’s recommended to avoid using the same names across different scopes.
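Shadowing a built-in is where this tends to bite. The following sketch uses a made-up average function that reuses the name sum for a running total:

```python
import builtins


def average(values):
    sum = 0  # shadows the built-in sum() inside this function
    for value in values:
        sum += value
    try:
        sum(values)  # the name now refers to an int, not the built-in
    except TypeError:
        called_builtin = False
    else:
        called_builtin = True
    return sum / len(values), called_builtin


print(average([2, 4, 6]))       # (4.0, False)
print(builtins.sum([2, 4, 6]))  # 12: the built-in itself is untouched
```

Renaming the local variable (to total, for instance) removes the problem entirely.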

Conclusion

Namespaces and variable scope form the bedrock of how Python manages and accesses data. By understanding these concepts, you can write clearer, more predictable code and avoid common pitfalls. Remember the LEGB rule, be cautious of shadowing, and use the global and nonlocal keywords judiciously to maintain clean and efficient code.

Happy Coding!

If you loved this post, you can always support my work by buying me a coffee. Your support would mean the world to me! Also, if you end up sharing this on Twitter, definitely tag me @muhammad_o7.

You can now also book a 30-minute call with me here. I would love to talk to you, especially if you have an Open Source project you would like me to contribute to.

Note: If you would like to be notified about upcoming posts, you can subscribe to the RSS feed or leave your email here.

fzf - The Command-Line Fuzzy Finder

Tue, 19 Sep 2023 00:00:00 +0000

I’ve been using the command line extensively at my day job. I rely on various command-line tools that enhance my workflow and boost my productivity. Therefore, I’m launching a new biweekly series where I’ll cover the tools I use, dedicating each post to a specific tool. In today’s post, we’ll explore fzf - The Command Line Fuzzy Finder, and discuss how it can improve your daily workflow.

What’s `fzf` and why use it?

Before I dive into fzf, I’d like to take a moment to explain what fuzzy matching is and discuss the algorithms used behind the scenes in fuzzy matching.

What’s Fuzzy Matching?

Fuzzy matching is a technique used in computing to find strings that are approximately equal or closely resemble each other. Unlike exact matching, where the aim is to find an identical string, fuzzy matching identifies matches that may not be perfect but are “close enough” based on a set of criteria.

Fuzzy matching is frequently used in data cleaning, where it helps in identifying duplicate records in large databases, even when the entries aren’t exactly the same (e.g., “McDonald’s” vs. “Mc Donalds”).

It’s also useful in search engines and autocorrect features, where slight variations or typos in a search term can still yield the desired results.

How it works? In Simplest Terms

Fuzzy matching algorithms evaluate strings based on various metrics, such as the number of changes required to turn one string into another (edit distance) or the number of shared character sequences. The outcome is often a score that represents the similarity between the two strings, with higher scores indicating greater similarity.
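As a rough illustration of such a score, Python's standard library ships difflib.SequenceMatcher, which returns a ratio between 0.0 and 1.0. It is based on matching blocks of characters rather than on edit distance, so treat this as one possible similarity measure; the similarity helper name is just for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0.0 and 1.0; higher means more alike."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


for candidate in ["Mc Donalds", "Burger King"]:
    print(candidate, round(similarity("McDonald's", candidate), 2))
```

The near-duplicate spelling scores much higher than the unrelated name, which is the property data-cleaning tools rely on.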

Algorithms used within Fuzzy Matching

Edit Distance (Levenshtein Distance): This algorithm measures the similarity between two strings by determining the minimum number of single-character edits (i.e., insertions, deletions, or substitutions) required to change one string into the other. For example, the Levenshtein distance between “kitten” and “sitting” is 3.
Damerau-Levenshtein Distance: This is an extension of the Levenshtein distance that also counts a transposition (swapping two adjacent characters) as a single edit. For instance, the distance between “form” and “from” is 1 using Damerau-Levenshtein, whereas plain Levenshtein needs 2 substitutions.
Smith-Waterman Algorithm: Originally developed for bioinformatics, this local sequence alignment algorithm can also be used for text comparisons. It’s particularly effective for scoring the similarity of substrings.
Jaro and Jaro-Winkler Distance: These are measures of similarity between two strings. The Jaro-Winkler distance gives more weight to the prefix of the strings and is especially useful for short strings like person names.
n-gram Analysis: In this technique, strings are broken down into overlapping substrings of ‘n’ characters. These n-grams are then compared to identify similarities. For example, using 2-grams (or bigrams), the word “hello” can be broken down into [“he”, “el”, “ll”, “lo”].
Token-Based Matching: This approach involves breaking strings into tokens (typically words) and comparing these tokens for similarity. Techniques like cosine similarity or Jaccard similarity can then be applied on these tokens.
Tf-idf (Term Frequency-Inverse Document Frequency): While more common in information retrieval systems, it can be applied to fuzzy matching. It measures how important a word is within a document relative to a collection of documents. It can be used in conjunction with cosine similarity for document comparisons.
Longest Common Subsequence (LCS): This algorithm identifies the longest sequence of characters that appears in both strings in the same order, though not necessarily contiguously. One LCS of “ABCBDAB” and “BDCAB” is “BCAB”.
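A few of these are short enough to sketch in plain Python. The versions below are textbook dynamic-programming implementations written for readability rather than speed; osa_distance is the restricted form of Damerau-Levenshtein (often called optimal string alignment), and all function names are my own.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def osa_distance(a, b):
    """Levenshtein plus swaps of two adjacent characters."""
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i
    for j in range(len(b) + 1):
        dist[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)
            # Transposition: the last two characters are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


def ngrams(text, n=2):
    """Overlapping substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def lcs_length(a, b):
    """Length of the longest common subsequence."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("form", "from"))       # 2
print(osa_distance("form", "from"))      # 1
print(ngrams("hello"))                   # ['he', 'el', 'll', 'lo']
print(lcs_length("ABCBDAB", "BDCAB"))    # 4
```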

Different use cases and applications may demand different algorithms or combinations of them. The choice often depends on the specific requirements of the task, such as the need for speed versus accuracy, the nature of the data, and the context in which the fuzzy matching is being applied.

So Now what’s `fzf` and why use it?

fzf is a flexible tool that allows you to search and navigate any list (files, command history, git branches, etc.) using fuzzy matching. In essence, fuzzy matching means that you don’t need to type exact search terms; instead, you can type a partial or abbreviated query, and fzf will suggest the entries that best match it.

For instance, if you have files named “important_document”, “imported_files”, and “impromptu_notes”, typing “imp doc” in fzf might highlight “important_document” as the top match even though the search isn’t an exact substring.
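The idea behind that example can be approximated in a few lines of Python. This is not fzf's actual algorithm (fzf also scores and ranks matches, which this sketch does not attempt); it only assumes that every space-separated term must appear in the candidate as an in-order, possibly non-contiguous, run of characters. The function names are hypothetical.

```python
def is_subsequence(term, candidate):
    """True if the characters of term appear in candidate in order."""
    remaining = iter(candidate)
    return all(char in remaining for char in term)


def fuzzy_filter(query, candidates):
    """Keep candidates that match every space-separated term of the query."""
    terms = query.lower().split()
    return [c for c in candidates
            if all(is_subsequence(term, c.lower()) for term in terms)]


files = ["important_document", "imported_files", "impromptu_notes"]
print(fuzzy_filter("imp doc", files))  # ['important_document']
print(fuzzy_filter("imp", files))      # all three names
```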

fzf is incredibly fast, enabling swift searches through files and command history. It offers an intuitive interface that lets you search through files in real-time as you type. Additionally, fzf provides numerous integrations with other tools, including Vim, among others.

Setting up and using `fzf`.

To install fzf, follow this link to the installation instructions for your OS. If you’re on macOS, you can install fzf with the command brew install fzf, then execute $(brew --prefix)/opt/fzf/install to set up the key bindings and shell completions (on Apple Silicon Macs this resolves to /opt/homebrew/opt/fzf/install).

`fzf` usage

Here’s the basic usage of fzf:

File Search

fzf

Command History Search

Press CTRL + R in your terminal to interactively search through your command history (this relies on the fzf key bindings being installed in your shell).

Preview Window

$ find dir/ -type f | fzf --preview 'cat {}'  # -type f skips directories, which cat cannot preview

Using fzf with Other Commands

$ ls -l $(fzf)  # List the details of a selected file

Select and Kill Processes

$ kill -9 $(ps aux | fzf | awk '{print $2}')

Filter Git Branches

$ git branch | fzf

Searching through your browser history (Firefox)

You can also use fzf to search through your browser history. The SQLite database that stores the history is typically found at the following path on a Mac:

~/Library/Application Support/Firefox/Profiles/*.default-release

After navigating to that directory, execute:

sqlite3 places.sqlite "SELECT url FROM moz_places" | fzf

Here we’ve covered the basic usage of fzf, but the tool offers so much more to explore and utilize. I hope this provided insight into how fzf can enhance your daily workflow. If you have any tips related to fzf, please share them in the comments. I’d also love to hear how you’ve been using fzf.


WebScraping in Bash

Mon, 04 Sep 2023 00:00:00 +0000

In the realm of web scraping, Python often takes the spotlight with robust libraries such as BeautifulSoup and Scrapy. But did you know that web scraping can also be accomplished using Bash scripting? In this blog post, we’ll delve into a Bash script that extracts links and titles from a webpage and stores them in a CSV file.

Spending most of my workday in the terminal, I’ve become intimately familiar with writing Bash automation scripts. However, to add a creative twist, I ventured into the world of web scraping using Bash. While Bash excels at scripting, I discovered its hidden talents in web scraping, which I’m excited to share in this blog post.

The Bash Script

#!/bin/bash

# Define the URL to scrape
base_url="https://lite.cnn.com"
url="https://lite.cnn.com/"

# Create a CSV file and add a header
echo "Link,Title" > cnn_links.csv

# Extract links and titles and save them to the CSV file
link_array=($(curl -s "$url" | awk -F 'href="' '/<a/{gsub(/".*/, "", $2); print $2}'))

for link in "${link_array[@]}"; do
    full_link="${base_url}${link}"
    title=$(curl -s "$full_link" | grep -o '<title[^>]*>[^<]*</title>' | sed -e 's/<title>//g' -e 's/<\/title>//g')
    echo "\"$full_link\",\"$title\"" >> cnn_links.csv
done

echo "Scraping and CSV creation complete. Links and titles saved to 'cnn_links.csv'."

How it Works?

This Bash script accomplishes the following:

It defines the base URL and the URL of the webpage you want to scrape.
It creates a CSV file named cnn_links.csv with a header row containing “Link” and “Title” columns.
Using curl, it fetches the HTML content of the specified webpage and extracts all the links found within anchor tags (<a>) using awk.
It then iterates through the array of links and extracts the page titles by making additional curl requests to each link.
Finally, it appends the extracted links and titles to the CSV file in the desired format.

Breaking it down further

grep -o '<title[^>]*>[^<]*</title>' extracts the title element from the HTML content using a regular expression:
1. The -o option tells grep to output only the matched part of the input text.
2. <title[^>]*> matches the opening <title> tag and any attributes (e.g., <title attribute="value">), if present.
3. [^<]* matches any characters that are not < (i.e., the text within the <title> tag).
4. </title> matches the closing </title> tag.
sed -e 's/<title>//g' -e 's/<\/title>//g' removes the <title> and </title> tags from the extracted title:
1. The -e option allows specifying multiple commands to be executed by sed.
2. 's/<title>//g' is a sed command that replaces all occurrences of <title> with an empty string (i.e., removes the opening <title> tag).
3. 's/<\/title>//g' is another sed command that replaces all occurrences of </title> with an empty string (i.e., removes the closing </title> tag).

Combining these commands:

grep extracts the whole title element, including the <title> and </title> tags.
sed then removes the tags themselves, leaving only the text content of the title.
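For readers more at home in Python, the same two-step extraction can be sketched with the re module. A capturing group plays the role of sed here by keeping only the text between the tags; extract_title is a hypothetical helper, not part of the script.

```python
import re

# Same pattern as the grep call, with a group around the title text.
TITLE_PATTERN = re.compile(r"<title[^>]*>([^<]*)</title>")


def extract_title(html):
    """Return the text of the first <title> element, or '' if none is found."""
    match = TITLE_PATTERN.search(html)
    return match.group(1) if match else ""


print(extract_title("<html><head><title>Breaking News</title></head></html>"))
# Breaking News
```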

The script also uses awk to extract URLs from the HTML document. Let’s break that command down step by step:

awk -F 'href="':
- awk is a text processing tool that operates on text files or input streams.
- -F 'href="' sets the field separator to 'href="'. This means awk will treat 'href="' as the delimiter for splitting input lines into fields.
'/<a/{gsub(/".*/, "", $2); print $2}':
- /<a/ is a pattern that selects lines containing <a. This ensures that the following actions are only applied to lines containing anchor tags.
- gsub(/".*/, "", $2) is an awk function that globally substitutes (gsub) everything from the first double quote (") to the end of the second field ($2) with an empty string. Because $2 begins right after href=", this strips the closing quote and everything after it, leaving only the URL.
- print $2 prints the modified field (the extracted URL).

So, this awk command looks for lines containing anchor tags (<a>) and extracts the URLs by keeping the text right after href=" and removing everything from the closing double quote (") onward. The extracted URLs are then printed as output.
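The same split-and-trim logic looks like this in Python. It is a line-by-line imitation of the awk command, not a proper HTML parser, and the sample markup and paths are invented for the demo.

```python
def extract_links(html):
    """Mimic the awk command: first href value on each line containing '<a'."""
    links = []
    for line in html.splitlines():
        if "<a" not in line:
            continue
        fields = line.split('href="')  # like -F 'href="'
        if len(fields) < 2:
            # awk would print an empty line here; the Bash array drops it
            # during word splitting, so this sketch skips it as well.
            continue
        links.append(fields[1].split('"', 1)[0])  # like gsub(/".*/, "", $2)
    return links


sample = """<ul>
<li><a href="/2023/09/04/world/story-one">Story one</a></li>
<li><a href="/2023/09/04/tech/story-two" class="x">Story two</a></li>
</ul>"""
print(extract_links(sample))
# ['/2023/09/04/world/story-one', '/2023/09/04/tech/story-two']
```

Like the awk version, this only picks up the first link on each line, which is fine for pages that put one anchor per line.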

Conclusion

So here’s a simple web scraper written in Bash that uses CLI tools such as awk, sed, grep, and curl. Because Bash is available on most Linux systems, it can be useful for scraping data from web pages without installing any additional software. It is not as powerful as Python or other programming languages for web scraping, though: it works well for simple tasks such as extracting links and titles from a webpage, but I would not recommend it for complex scraping tasks.

Anyway, this was a fun little script I created while learning about Bash scripting and CLI tools such as awk, grep, sed, and curl. I would still consider myself a beginner at this.

Lastly, I hope you enjoyed reading this and learned something new. If you have any Bash tips or have used Bash for a similar task, feel free to comment below; I would love to hear about it.

If you loved this post, you can always support my work by buying me a coffee. Your support would mean the world to me! You can also follow me on Twitter, and definitely tag me @muhammad_o7 when you share this post there.

Exploring Lesser-Known Commands and Advanced Features of Homebrew

Sat, 02 Sep 2023 00:00:00 +0000

Over a year ago, I began extensively using macOS. I originally came from a Linux background, where I started my Linux journey with Ubuntu, primarily using the apt package manager, and later switching to pacman. However, after using macOS at my day job, I decided to purchase a Mac for myself. Since then, it has become my daily driver. Like many on macOS, I’ve chosen Homebrew as my preferred package manager for installing software. In this post, I will delve into some lesser-known commands and advanced features of Homebrew.

What is Homebrew?

Homebrew, often referred to as brew, is a popular package manager for macOS and Linux operating systems. It provides a convenient way to install, update, and manage software packages and libraries on your system. Homebrew simplifies the process of installing and maintaining software by automating the downloading, compiling (if necessary), and installation of packages.

Homebrew is written in Ruby and uses Git for version control. It was created by Max Howell in 2009 and is currently maintained by a team of developers. Homebrew is open-source and distributed under the BSD 2-Clause License.

Here are some key features and aspects of Homebrew:

Package Management: Homebrew allows you to easily install a wide range of software packages, including command-line utilities, libraries, and applications, from a central repository.
Dependency Management: It automatically handles dependencies for you, ensuring that all required components are installed when you install a package.
Casks: In addition to command-line software, Homebrew Cask extends Homebrew’s functionality to include GUI applications. This is especially useful for installing and updating graphical software.
Taps: Homebrew supports the concept of “taps,” which are additional repositories maintained by the community or individuals. Taps allow you to access packages not found in the core Homebrew repository.
Custom Formulas: Users can create custom formulas (recipes) to install software that isn’t in the main repository. This feature is handy for maintaining your own packages or adding packages from other sources.

Homebrew Lesser Known Commands

In this section, I will be discussing some lesser-known commands and some of the advanced features of Homebrew. You can also find more information about these commands by running brew help or man brew.

Create a Formula for a Package

Did you know that the brew create command lets you create a new formula for a software package by specifying a URL to the package’s source code?

brew create <URL_to_source>

When you run this command it does the following:

Downloads the source code from the specified URL.
Generates a new formula file in Homebrew’s formulae directory, typically named after the package, with a .rb extension.

This command is useful for adding packages to Homebrew that are not part of the official repository, allowing you to create and manage custom formulas for your preferred software packages.

Retrieving Package Information

brew info <package_name>

This command simply allows you to retrieve information about a package, including its version, dependencies, and installation path. It also displays the URL of the package’s homepage and the formula file used to install it.

Listing Installed Packages

brew list

This command lists all installed packages on your system. You can also list only casks by running brew list --cask.

Homebrew Interactive Shell Session

brew sh

This command allows you to start an interactive shell session with Homebrew. This is useful for testing out commands and experimenting with Homebrew’s functionality. Because the shell is isolated, it does not affect your system’s global environment. It comes in pretty handy for testing packages or experimenting with different configurations.

Brew Bundle Dump

brew bundle dump

This command is pretty handy as it allows you to dump all the installed packages into a Brewfile. This is useful when you want to share the list of packages you have installed with someone else or you want to install the same packages on another system.

brew gist-logs <package_name>

The brew gist-logs command in Homebrew is used to upload the logs for a specific formula (package) installation to a Gist on GitHub. This command is particularly helpful for debugging and troubleshooting Homebrew-related issues or for seeking assistance from the Homebrew community.

Last but not least, here’s a custom shell function that I created, which uses brew to list all the installed packages with their sizes and display them in a human-readable format.

brewpackages (){
  brew list --formula | xargs -n1 -P8 -I {} \
    sh -c "brew info {} | egrep '[0-9]* files, ' | sed 's/^.*[0-9]* files, \(.*\)).*$/{} \1/'" | \
    sort -h -r -k2 - | column -t
}

Whether you’re a seasoned Homebrew user or new to the world of package management, these commands and features can make your life easier and more productive. I hope you found this post useful and learned something new about Homebrew. If you have any questions or comments, feel free to leave them below.

