Understanding Python's New pathlib Module

February 8, 2018

python dev

If you’re still using os.path for file operations in Python, it’s time for an upgrade. The pathlib module, introduced in Python 3.4 and improved through 3.5 and 3.6, provides an object-oriented approach to filesystem paths that’s cleaner, more intuitive, and more powerful.

The Problem with os.path

Here’s typical old-style file path manipulation:

import os

# Building paths
base_dir = '/home/user/project'
config_path = os.path.join(base_dir, 'config', 'settings.json')

# Getting file info
filename = os.path.basename(config_path)
directory = os.path.dirname(config_path)
extension = os.path.splitext(config_path)[1]

# Checking file properties
exists = os.path.exists(config_path)
is_file = os.path.isfile(config_path)

# Reading file
with open(config_path, 'r') as f:
    content = f.read()

It works, but it’s verbose. Functions are scattered across os, os.path, and shutil. And you’re constantly manipulating strings.

Enter pathlib

Here are the same operations with pathlib:

from pathlib import Path

# Building paths
base_dir = Path('/home/user/project')
config_path = base_dir / 'config' / 'settings.json'

# Getting file info
filename = config_path.name
directory = config_path.parent
extension = config_path.suffix

# Checking file properties
exists = config_path.exists()
is_file = config_path.is_file()

# Reading file
content = config_path.read_text()

Notice the / operator for joining paths. That alone makes pathlib worth using.
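The operator works because path objects implement division, so a plain string can sit on either side as long as the other operand is a path. Here is a minimal sketch of that behavior. It uses PurePosixPath so the results are the same on any operating system, and the paths are made up for illustration.

```python
from pathlib import PurePosixPath

base = PurePosixPath('/home/user/project')

# A string on the right-hand side is joined onto the path
config_path = base / 'config' / 'settings.json'

# A string on the left-hand side works too
logs = '/var' / PurePosixPath('log')

# Caveat: an absolute right-hand operand discards everything before it,
# mirroring os.path.join
overridden = base / '/etc/passwd'

print(config_path)  # /home/user/project/config/settings.json
print(logs)         # /var/log
print(overridden)   # /etc/passwd
```

The last case is worth remembering when the right-hand side comes from user input.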

Key Features

Path Creation

from pathlib import Path

# Absolute paths
home = Path('/home/user')
root = Path('/')

# Relative paths
config = Path('config/settings.json')

# Current directory
cwd = Path.cwd()

# Home directory
home = Path.home()

# From __file__
script_dir = Path(__file__).parent

Path Operations

path = Path('/home/user/project/src/main.py')

path.name          # 'main.py'
path.stem          # 'main'
path.suffix        # '.py'
path.suffixes      # ['.py'] (a list, so 'x.tar.gz' gives ['.tar', '.gz'])
path.parent        # Path('/home/user/project/src')
path.parents       # Iterate up the directory tree
path.parts         # ('/', 'home', 'user', 'project', 'src', 'main.py')
path.anchor        # '/'
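The difference between suffix and suffixes only shows up with multi-part extensions, and the same goes for stem. The sketch below uses a hypothetical archive name to show that, along with parents and the with_name / with_suffix helpers for deriving new paths. PurePosixPath keeps the output identical across platforms.

```python
from pathlib import PurePosixPath

archive = PurePosixPath('/home/user/backups/site.tar.gz')

print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
print(archive.stem)      # site.tar

# parents is a sequence running from the immediate parent up to the root
ancestors = [str(p) for p in archive.parents]
print(ancestors)  # ['/home/user/backups', '/home/user', '/home', '/']

# Paths are immutable; with_name and with_suffix return new objects
renamed = archive.with_name('site-old.tar.gz')
swapped = archive.with_suffix('.bz2')
print(renamed)  # /home/user/backups/site-old.tar.gz
print(swapped)  # /home/user/backups/site.tar.bz2
```

Note that with_suffix only replaces the final suffix, so the '.tar' part survives.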

Joining Paths

The / operator is the star of the show:

base = Path('/var/www')

# Using /
full_path = base / 'html' / 'index.html'

# Using joinpath
full_path = base.joinpath('html', 'index.html')

# Both give: Path('/var/www/html/index.html')

File System Queries

path = Path('/some/path')

path.exists()      # Does it exist?
path.is_file()     # Is it a file?
path.is_dir()      # Is it a directory?
path.is_symlink()  # Is it a symlink?
path.is_absolute() # Is it absolute?

# File stats
stat = path.stat()
stat.st_size       # File size in bytes
stat.st_mtime      # Modification time
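One detail the query methods hide: exists() and friends return False for a missing path, while stat() raises. A small sketch, run against a temporary directory so it does not touch real files (the file names are placeholders):

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.write_bytes(b'hello')

    info = sample.stat()
    size = info.st_size  # 5 bytes were written above
    # st_mtime is a Unix timestamp; convert it for display
    modified = datetime.fromtimestamp(info.st_mtime)

    missing = Path(tmp) / 'missing.txt'
    missing_exists = missing.exists()  # False, no exception
    try:
        missing.stat()
        stat_raised = False
    except FileNotFoundError:
        stat_raised = True

print(size, modified.isoformat(), missing_exists, stat_raised)
```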

Reading and Writing

No more with open(...):

path = Path('data.txt')

# Reading
text = path.read_text()
binary = path.read_bytes()

# Writing (each call overwrites the file's existing contents)
path.write_text('Hello, World!')
path.write_bytes(b'binary data')

For large files, you’ll still want streaming, but for small files this is beautifully concise.
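For the streaming case, Path.open() takes the same arguments as the built-in open(). A minimal sketch, assuming a line-oriented text file; count_lines is a hypothetical helper, not part of pathlib:

```python
import tempfile
from pathlib import Path

def count_lines(path):
    """Count lines without loading the whole file into memory."""
    total = 0
    with path.open('r', encoding='utf-8') as handle:
        for _ in handle:  # the file object yields one line at a time
            total += 1
    return total

with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / 'big.log'
    log_file.write_text('first\nsecond\nthird\n', encoding='utf-8')
    line_count = count_lines(log_file)

print(line_count)  # 3
```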

Globbing and Iteration

Finding files is much cleaner:

project = Path('/home/user/project')

# Find all Python files
for py_file in project.glob('*.py'):
    print(py_file)

# Recursive search
for py_file in project.rglob('*.py'):
    print(py_file)

# Iterate directory contents
for child in project.iterdir():
    if child.is_file():
        print(f'File: {child.name}')
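To make the difference between glob and rglob concrete, here is a sketch that builds a tiny throwaway tree (the file names are invented) and compares the results. Both methods return generators with no guaranteed order, so the example sorts them.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / 'pkg').mkdir()
    (project / 'main.py').touch()
    (project / 'README.md').touch()
    (project / 'pkg' / 'util.py').touch()

    # glob('*.py') only looks at the top level
    top_level = sorted(p.name for p in project.glob('*.py'))

    # rglob('*.py') descends into subdirectories
    recursive = sorted(
        p.relative_to(project).as_posix() for p in project.rglob('*.py')
    )

    # A leading '**/' in a glob pattern does the same thing as rglob
    same_as_rglob = sorted(
        p.relative_to(project).as_posix() for p in project.glob('**/*.py')
    )

print(top_level)  # ['main.py']
print(recursive)  # ['main.py', 'pkg/util.py']
```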

Directory Operations

path = Path('new_directory/nested/deep')

# Create directories (like mkdir -p)
path.mkdir(parents=True, exist_ok=True)

# Remove
path.rmdir()  # Removes only 'deep', and only if it is empty

# For non-empty directories, you still need shutil
import shutil
shutil.rmtree('new_directory')  # Removes the whole tree
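The sketch below walks through those calls in a temporary directory to show what each one does and does not remove. The directory names mirror the example above; nothing here touches your working directory.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp) / 'new_directory'
    deep = top / 'nested' / 'deep'

    deep.mkdir(parents=True, exist_ok=True)
    deep.mkdir(parents=True, exist_ok=True)  # second call is a no-op

    try:
        top.rmdir()  # fails: 'nested' is still inside
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    deep.rmdir()  # succeeds: 'deep' is empty
    deep_removed = not deep.exists()

    shutil.rmtree(top)  # removes 'new_directory' and everything under it
    top_removed = not top.exists()

print(rmdir_failed, deep_removed, top_removed)  # True True True
```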

File Manipulation

source = Path('old_name.txt')
target = Path('new_name.txt')

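# Rename/move (on Windows, raises FileExistsError if target exists)
source.rename(target)

# Replace: like rename(), but always overwrites an existing target.
# Use it instead of rename(), not after it; source is gone once renamed.
# source.replace(target)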

# Delete file
target.unlink()

# Touch (create empty file)
Path('new_file.txt').touch()
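Since rename and replace are alternatives rather than steps, here is a sketch of the overwrite case on its own, run in a temporary directory with placeholder contents:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / 'old_name.txt'
    target = Path(tmp) / 'new_name.txt'
    source.write_text('new contents')
    target.write_text('stale contents')

    # replace() overwrites an existing target on every platform;
    # rename() would raise FileExistsError here on Windows
    source.replace(target)

    source_gone = not source.exists()
    final_text = target.read_text()

    # unlink() raises FileNotFoundError if the file is already gone
    target.unlink()
    target_gone = not target.exists()

print(source_gone, final_text, target_gone)  # True new contents True
```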

Practical Patterns

Working with Scripts

from pathlib import Path

# Get the directory containing this script
SCRIPT_DIR = Path(__file__).resolve().parent

# Reference files relative to script
CONFIG_FILE = SCRIPT_DIR / 'config.json'
DATA_DIR = SCRIPT_DIR / 'data'

Building Project Paths

from pathlib import Path

# Project root
PROJECT_ROOT = Path(__file__).resolve().parent.parent

# Common directories
SRC_DIR = PROJECT_ROOT / 'src'
TESTS_DIR = PROJECT_ROOT / 'tests'
BUILD_DIR = PROJECT_ROOT / 'build'

# Ensure directories exist
BUILD_DIR.mkdir(exist_ok=True)

Processing Multiple Files

import json
from pathlib import Path

def process_json_files(directory):
    """Process all JSON files in a directory."""
    results = []
    
    for json_file in Path(directory).glob('*.json'):
        data = json.loads(json_file.read_text())
        results.append({
            'file': json_file.name,
            'data': data
        })
    
    return results
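In practice a directory often contains a file or two that will not parse. One possible variation, offered as a sketch rather than the canonical way to do it: load_json_files is a hypothetical name, and it collects the names of unparseable files instead of stopping at the first one.

```python
import json
import tempfile
from pathlib import Path

def load_json_files(directory):
    """Return (results, skipped) for the *.json files in a directory."""
    results, skipped = [], []
    # sorted() gives a deterministic order; glob itself makes no promise
    for json_file in sorted(Path(directory).glob('*.json')):
        try:
            data = json.loads(json_file.read_text(encoding='utf-8'))
        except json.JSONDecodeError:
            skipped.append(json_file.name)
            continue
        results.append({'file': json_file.name, 'data': data})
    return results, skipped

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'a.json').write_text('{"id": 1}', encoding='utf-8')
    (Path(tmp) / 'b.json').write_text('not json', encoding='utf-8')
    (Path(tmp) / 'notes.txt').write_text('ignored', encoding='utf-8')
    results, skipped = load_json_files(tmp)

print(results)  # [{'file': 'a.json', 'data': {'id': 1}}]
print(skipped)  # ['b.json']
```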

Compatibility with os.path

Many libraries still expect string paths. Convert when needed:

import os
from pathlib import Path

path = Path('/some/path')

# Convert to string
str(path)                    # '/some/path'
path.__fspath__()           # '/some/path' - explicit
os.fspath(path)             # '/some/path' - recommended

# Most modern libraries accept Path objects directly
with open(path, 'r') as f:  # Works!
    ...

Since Python 3.6, most standard library functions accept Path objects directly via the os.PathLike protocol.
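The protocol itself is small: any object with an `__fspath__` method that returns a str or bytes counts as path-like. A minimal sketch, where Workspace is a hypothetical class invented for this example:

```python
import os
import tempfile
from pathlib import Path

class Workspace:
    """Hypothetical wrapper that can be passed wherever a path is accepted."""

    def __init__(self, root):
        self.root = Path(root)

    def __fspath__(self):
        # Must return str or bytes
        return str(self.root)

with tempfile.TemporaryDirectory() as tmp:
    ws = Workspace(tmp)
    as_string = os.fspath(ws)
    is_pathlike = isinstance(ws, os.PathLike)

    # Standard library functions call os.fspath() on their arguments
    notes = os.path.join(ws, 'notes.txt')
    with open(notes, 'w') as f:
        f.write('ok')
    listing = os.listdir(ws)

print(is_pathlike, listing)  # True ['notes.txt']
```

Path objects implement the same method, which is why they work with open() and os.path.join() without conversion.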

When You Still Need os and shutil

pathlib doesn’t replace everything:

os.walk() for recursive directory traversal (though rglob often works)
os.scandir() when you need maximum performance
shutil for operations like copytree and rmtree

But for most daily path operations, pathlib is the better choice.
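As an illustration of where os.walk() still earns its place: it lets you prune directories during traversal, which rglob cannot do. The sketch below skips hidden directories and then uses os.scandir() to list subdirectories. The tree is a throwaway one built for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'src').mkdir()
    (root / '.git').mkdir()
    (root / 'src' / 'app.py').touch()
    (root / '.git' / 'hook.py').touch()

    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into them
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for name in filenames:
            if name.endswith('.py'):
                found.append(Path(dirpath, name).relative_to(root).as_posix())

    # os.scandir yields DirEntry objects whose is_dir()/is_file() can
    # usually answer without an extra system call per entry
    with os.scandir(root) as entries:
        subdirs = sorted(e.name for e in entries if e.is_dir())

print(found)    # ['src/app.py']
print(subdirs)  # ['.git', 'src']
```

Both functions accept Path objects, so mixing them with pathlib code needs no conversion.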

Final Thoughts

pathlib is one of Python’s best quality-of-life improvements. It makes code more readable, reduces errors from string manipulation, and feels natural once you’re used to it.

Start using it in new code. Refactor old code when you touch it. Your future self will thank you.

Paths should be objects, not strings.