How Python uses Garbage Collection for Efficient Memory Management

What are variables in Python?

A variable in Python is often assumed to be a label for a value. More precisely, a variable references an object that holds a value.

In Python, variables are references.
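As a minimal sketch of this idea (the names a and b are illustrative, not from the article), assigning one variable to another copies the reference rather than the object, so both names end up referring to the same list:

```python
# Two names bound to the same list object.
a = [1, 2, 3]
b = a              # copies the reference, not the list

b.append(4)        # mutate the object through the second name

print(a)           # [1, 2, 3, 4]: the change is visible through the first name
print(a is b)      # True: both names refer to one object
```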

How are objects stored in Python?

An object can be defined as a block of memory that holds a value and supports a specific set of operations determined by its type.

In Python, everything is an object.

A Python object is stored in memory and accessed through names and references:

  • Name - A label for an object. An object can have multiple names.
  • Reference - The link from a name to the object it refers to.

Every object consists of a reference count, a type, and a value.

Figure: How variables are stored in memory. Image by author.
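The three parts of an object can be inspected from Python itself. The sketch below assumes CPython and uses an illustrative variable named value; note that sys.getrefcount() generally reports a count that includes the temporary reference created by the call, so the exact number is not shown here.

```python
import sys

value = [10, 20]

print(type(value))             # the object's type: <class 'list'>
print(value)                   # the object's value: [10, 20]
print(sys.getrefcount(value))  # the reference count as reported by CPython
```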

Introduction to References

The following example assigns the integer 10 to the variable num:

num = 10

Under the hood, Python makes an integer object of type int available in memory (CPython may reuse a cached object for small integers instead of creating a new one). The variable num references that object.

To find the memory address of the object referenced by a variable, we can use the built-in id() function.

In CPython, the id() function returns the object's memory address as a base-10 number. We will convert it into hexadecimal using the built-in hex() function.

print(hex(id(num)))

--> 0x7ffdb446d448

Figure: Hex representation of a reference's memory address. Image by author.

Passing arguments in Python functions

Unlike some other languages, Python uses neither classic pass by value nor classic pass by reference.

Instead, Python has the concept of pass by assignment, also called pass by object reference.

When a function is called with an argument, the parameter variable is bound to the same object in memory, not to a copy of the object. In-place modifications made to that object within the function are visible outside the function, while rebinding the parameter to a different object is not.

In other words, the reference (in CPython, the memory address) is passed to the function, not a copy of the object's value.
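A small sketch can make this visible. The helper show_identity and the variable data are hypothetical names used only for illustration; the helper returns the identity of whatever object it receives, which can then be compared with the identity of the caller's object.

```python
def show_identity(param):
    # Hypothetical helper: return the identity of the object it received.
    return id(param)

data = {"name": "Mary"}

# The parameter inside the function refers to the caller's object, not a copy.
same_object = show_identity(data) == id(data)
print(same_object)  # True
```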

Example: The parameter is immutable

Immutable objects include built-in data types such as int, float, complex, bool, str, bytes and tuple.

def f(name):
    name = 'John'

new_name = 'Mary'
f(new_name)
print(new_name)

Output:

Mary

In the above example, name and new_name initially both point to the same string, 'Mary'. The assignment name = 'John' then binds name to a different string object with the value 'John', while new_name still points to 'Mary'. Hence the value of new_name does not change.

Example: The parameter is mutable

Mutable objects include list, dict and set.

def f(students):
    students.append(3)

students = [0,1,2]
f(students)
print(students)

Output:

[0, 1, 2, 3]

In the example above, students is a list, so appending to it inside the function modifies the single list object shared by every variable that points to it. Hence students becomes [0, 1, 2, 3].
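Mutability alone does not decide whether the caller sees a change; what matters is whether the function mutates the object in place or rebinds its parameter. The following sketch contrasts the two, using hypothetical functions rebind and mutate:

```python
def rebind(items):
    # Rebinding: 'items' now names a new list; the caller's list is untouched.
    items = items + [3]
    return items

def mutate(items):
    # In-place mutation: the caller sees this change.
    items.append(3)

students = [0, 1, 2]

rebound = rebind(students)
print(students)   # [0, 1, 2]

mutate(students)
print(students)   # [0, 1, 2, 3]
```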

Garbage Collection

Garbage collection in Python refers to the automatic process of reclaiming memory occupied by objects that are no longer in use. It frees the program from having to deallocate memory explicitly.

Python uses a garbage collector to automatically detect and remove objects that are no longer referenced or reachable by the program. When an object is no longer needed, the garbage collector identifies it as garbage and frees up the memory occupied by that object.

The two strategies used for garbage collection are:

  • reference counting
  • generational garbage collection

1. Reference Counting

Reference counting keeps track of the number of references to each object. When the count reaches zero, indicating that no references to the object exist, the object is considered garbage and its memory is reclaimed.
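One supported way to watch the count change is sys.getrefcount(). The sketch below assumes a standard CPython build and compares counts relative to each other, because the absolute number includes temporary references that depend on the interpreter version. The names data and alias are illustrative.

```python
import sys

data = ["a", "b"]
before = sys.getrefcount(data)

alias = data                        # one more reference to the same list
with_alias = sys.getrefcount(data)

del alias                           # drop the extra reference again
after = sys.getrefcount(data)

print(with_alias - before)          # 1
print(after == before)              # True
```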

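To read the reference count stored in an object, we can use the built-in ctypes module. This relies on a CPython implementation detail and is meant for exploration only.

import ctypes

def count_references(address):
    """
    Return the reference count of the object at the given address.
    CPython-specific: in standard builds the count is the first field
    of every object. Only valid while the object is still alive.
    """
    return ctypes.c_ssize_t.from_address(address).value

# A list is used because CPython caches small integers such as 15,
# which makes their reference counts misleading.
students = [15, 16, 17]
students_id = id(students)  # remember the address before rebinding names
print(count_references(students_id))  # 1

# Step 1
toppers = students
print(count_references(students_id))  # 2

# Step 2
toppers = 2
print(count_references(students_id))  # 1

# Step 3
students = 1
# No references remain, so CPython frees the list immediately.
# Reading students_id now would inspect freed memory, so we do not.

Step 1: reference count of the list = 2. Image by author.

Step 2: reference count of the list = 1. Image by author.

Step 3: The number of references to the list drops to 0, and the list is deallocated. Image by author.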

However, reference counting alone cannot handle cyclical references.

What is cyclical reference?

A cyclical reference, also known as a reference cycle or circular reference, occurs in Python when a group of objects reference each other in a way that forms a closed loop. Because every object in the loop is still referenced by another, their reference counts never reach zero, so reference counting alone cannot reclaim them. Without a further mechanism, this would lead to memory leaks.

Basic example of cyclical reference:

x = []
x.append(x)
print(x)

In the above example, x contains a reference to itself, which makes it a cyclical reference.

To solve this problem, Python uses Generational Garbage Collection.
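The behaviour of a cycle can be observed with a weak reference, which lets us check whether an object is still alive without keeping it alive. This is a sketch that assumes CPython; the Node class and the probe variable are hypothetical and exist only for this illustration.

```python
import gc
import weakref

class Node:
    """Hypothetical container used only for this illustration."""
    def __init__(self):
        self.other = None

gc.disable()                 # keep the cyclic collector from running on its own

a, b = Node(), Node()
a.other, b.other = b, a      # a -> b -> a forms a cycle
probe = weakref.ref(a)       # lets us check whether the first node is alive

del a, b                     # drop our names; the cycle keeps both nodes alive
alive_after_del = probe() is not None

gc.collect()                 # the cyclic collector finds and frees the cycle
alive_after_collect = probe() is not None

gc.enable()                  # restore automatic collection
print(alive_after_del, alive_after_collect)  # True False
```

Deleting the names is not enough to free the nodes, because each node still holds a reference to the other; only the explicit collection reclaims them.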

2. Generational Garbage Collection

Generational Garbage Collection uses a trace-based garbage collection technique.

Trace-based garbage collection is a technique used in some garbage collection algorithms to identify and collect unreachable objects. It works by following the references between objects, rather than by counting them, and treating as live only those objects that are accessible from root references.

Generational Garbage Collection divides objects into different generations based on their age, with the assumption that most objects become garbage relatively quickly after they are created.

The main idea behind Generational Garbage Collection is that younger objects are more likely to become garbage than older objects. Python's garbage collector focuses its efforts on the younger generations, performing frequent garbage collection on them. Older generations are garbage collected less frequently since they are expected to contain objects that have survived multiple collections and are less likely to become garbage.
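The gc module exposes the collector's generation settings. The sketch below only reads them; the actual numbers depend on the Python version and on any earlier calls to gc.set_threshold(), so no specific output is shown.

```python
import gc

thresholds = gc.get_threshold()   # collection thresholds as a tuple
counts = gc.get_count()           # current collection counts as a tuple

print(thresholds)
print(counts)
```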

Generational Garbage Collection helps address the problem of cyclical references by periodically examining objects in different generations and collecting those that are no longer reachable. It detects and breaks cyclical references by identifying unreachable objects through a process comparable to "mark and sweep."
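The idea of finding unreachable objects can be illustrated with a toy model. The sketch below is not CPython's actual algorithm; it is a simplified mark-and-sweep over a plain dictionary, where the function find_unreachable, the graph, and the node names are all assumptions made for illustration.

```python
def find_unreachable(graph, roots):
    """Return the set of nodes that cannot be reached from any root.

    graph maps each node name to the list of names it references.
    """
    marked = set()
    stack = list(roots)
    while stack:                      # mark phase: walk references from the roots
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        stack.extend(graph.get(node, []))
    return set(graph) - marked        # sweep phase: whatever was never marked

# 'c' and 'd' reference each other, but nothing reachable references them.
graph = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(graph, roots=["a"])
print(sorted(garbage))  # ['c', 'd']
```

The cycle between c and d is reported as garbage even though each node is still referenced, which is exactly the case reference counting cannot handle.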

Generational Garbage Collection thus helps ensure:

  • no memory leaks from reference cycles
  • proper utilization of system resources
  • efficient garbage collection

Programmatically interact with Python’s garbage collector

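In the example below, we create two classes, Students and Boys, whose instances reference each other, and perform garbage collection using the built-in gc module (the Garbage Collector interface).

The example disables automatic collection only so that the cycle survives until gc.collect() is called explicitly. You should never disable the garbage collector unless required.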

import gc
import ctypes

def count_references(address):
    """
    Return the reference count of the object at the given address.
    CPython-specific; only valid while the object is still alive.
    """
    return ctypes.c_ssize_t.from_address(address).value

def object_exists(obj_id):
    """
    Return True if the object with the given id exists.
    """
    for obj in gc.get_objects():
        if id(obj) == obj_id:
            return True
    return False

class Students:
    def __init__(self):
        self.boys = Boys(self)
        print(f'Students: {hex(id(self))}, Boys: {hex(id(self.boys))}')

class Boys:
    def __init__(self, students):
        self.students = students
        print(f'Boys: {hex(id(self))}, Students: {hex(id(self.students))}')

gc.disable()

students = Students()

students_id = id(students)
boys_id = id(students.boys)

print(f'Number of references to students: {count_references(students_id)}') # 2

print(f'Number of references to boys: {count_references(boys_id)}') # 1

print(f'Does students exist? {object_exists(students_id)}') # True
print(f'Does boys exist? {object_exists(boys_id)}') # True

students = None

print(f'Number of references to students: {count_references(students_id)}') # 1

print(f'Number of references to boys: {count_references(boys_id)}') # 1

print(f'Does students exist? {object_exists(students_id)}') # True
print(f'Does boys exist? {object_exists(boys_id)}') # True

print('Collecting garbage...')
gc.collect()

print(f'Does students exist? {object_exists(students_id)}') # False
print(f'Does boys exist? {object_exists(boys_id)}') # False

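# Both objects have been freed, so the reads below inspect released memory.
# They printed 0 in this run, but the value is not guaranteed.
print(f'Number of references to students: {count_references(students_id)}') # 0

print(f'Number of references to boys: {count_references(boys_id)}') # 0

gc.enable()  # restore automatic collection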

Output:

Boys: 0x1e18b68c6d0, Students: 0x1e18b698510
Students: 0x1e18b698510, Boys: 0x1e18b68c6d0
Number of references to students: 2
Number of references to boys: 1
Does students exist? True
Does boys exist? True
Number of references to students: 1
Number of references to boys: 1
Does students exist? True
Does boys exist? True
Collecting garbage...
Does students exist? False
Does boys exist? False
Number of references to students: 0
Number of references to boys: 0

Conclusion

Garbage collection in Python helps manage memory efficiently, automatically freeing up resources and preventing memory leaks, so developers can focus on writing code without explicitly managing memory deallocation.


GitHub - github.com/karishmashuklaa Twitter - twitter.com/amhsirak_