Understanding Python's New pathlib Module
If you’re still using os.path for file operations in Python, it’s time for an upgrade. The pathlib module, introduced in Python 3.4 and improved through 3.5 and 3.6, provides an object-oriented approach to filesystem paths that’s cleaner, more intuitive, and more powerful.
The Problem with os.path
Here’s typical old-style file path manipulation:
import os
# Building paths
base_dir = '/home/user/project'
config_path = os.path.join(base_dir, 'config', 'settings.json')
# Getting file info
filename = os.path.basename(config_path)
directory = os.path.dirname(config_path)
extension = os.path.splitext(config_path)[1]
# Checking file properties
exists = os.path.exists(config_path)
is_file = os.path.isfile(config_path)
# Reading file
with open(config_path, 'r') as f:
content = f.read()
It works, but it’s verbose. Functions are scattered across os, os.path, and shutil. And you’re constantly manipulating strings.
Enter pathlib
Here’s the same operations with pathlib:
from pathlib import Path
# Building paths
base_dir = Path('/home/user/project')
config_path = base_dir / 'config' / 'settings.json'
# Getting file info
filename = config_path.name
directory = config_path.parent
extension = config_path.suffix
# Checking file properties
exists = config_path.exists()
is_file = config_path.is_file()
# Reading file
content = config_path.read_text()
Notice the / operator for joining paths. That alone makes pathlib worth using.
Key Features
Path Creation
from pathlib import Path
# Absolute paths
home = Path('/home/user')
root = Path('/')
# Relative paths
config = Path('config/settings.json')
# Current directory
cwd = Path.cwd()
# Home directory
home = Path.home()
# From __file__
script_dir = Path(__file__).parent
Path Operations
path = Path('/home/user/project/src/main.py')
path.name # 'main.py'
path.stem # 'main'
path.suffix # '.py'
path.suffixes # ['.py'] (handles .tar.gz)
path.parent # Path('/home/user/project/src')
path.parents # Iterate up the directory tree
path.parts # ('/', 'home', 'user', 'project', 'src', 'main.py')
path.anchor # '/'
Joining Paths
The / operator is the star of the show:
base = Path('/var/www')
# Using /
full_path = base / 'html' / 'index.html'
# Using joinpath
full_path = base.joinpath('html', 'index.html')
# Both give: Path('/var/www/html/index.html')
File System Queries
path = Path('/some/path')
path.exists() # Does it exist?
path.is_file() # Is it a file?
path.is_dir() # Is it a directory?
path.is_symlink() # Is it a symlink?
path.is_absolute() # Is it absolute?
# File stats
stat = path.stat()
stat.st_size # File size in bytes
stat.st_mtime # Modification time
Reading and Writing
No more with open(...):
path = Path('data.txt')
# Reading
text = path.read_text()
binary = path.read_bytes()
# Writing
path.write_text('Hello, World!')
path.write_bytes(b'binary data')
For large files, you’ll still want streaming, but for small files this is beautifully concise.
Globbing and Iteration
Finding files is much cleaner:
project = Path('/home/user/project')
# Find all Python files
for py_file in project.glob('*.py'):
print(py_file)
# Recursive search
for py_file in project.rglob('*.py'):
print(py_file)
# Iterate directory contents
for child in project.iterdir():
if child.is_file():
print(f'File: {child.name}')
Directory Operations
path = Path('new_directory/nested/deep')
# Create directories (like mkdir -p)
path.mkdir(parents=True, exist_ok=True)
# Remove
path.rmdir() # Only works on empty directories
# For non-empty directories, you still need shutil
import shutil
shutil.rmtree(path)
File Manipulation
source = Path('old_name.txt')
target = Path('new_name.txt')
# Rename/move
source.rename(target)
# Replace (overwrites if exists)
source.replace(target)
# Delete file
target.unlink()
# Touch (create empty file)
Path('new_file.txt').touch()
Practical Patterns
Working with Scripts
from pathlib import Path
# Get the directory containing this script
SCRIPT_DIR = Path(__file__).resolve().parent
# Reference files relative to script
CONFIG_FILE = SCRIPT_DIR / 'config.json'
DATA_DIR = SCRIPT_DIR / 'data'
Building Project Paths
from pathlib import Path
# Project root
PROJECT_ROOT = Path(__file__).resolve().parent.parent
# Common directories
SRC_DIR = PROJECT_ROOT / 'src'
TESTS_DIR = PROJECT_ROOT / 'tests'
BUILD_DIR = PROJECT_ROOT / 'build'
# Ensure directories exist
BUILD_DIR.mkdir(exist_ok=True)
Processing Multiple Files
from pathlib import Path
def process_json_files(directory):
"""Process all JSON files in a directory."""
results = []
for json_file in Path(directory).glob('*.json'):
data = json.loads(json_file.read_text())
results.append({
'file': json_file.name,
'data': data
})
return results
Compatibility with os.path
Many libraries still expect string paths. Convert when needed:
path = Path('/some/path')
# Convert to string
str(path) # '/some/path'
path.__fspath__() # '/some/path' - explicit
os.fspath(path) # '/some/path' - recommended
# Most modern libraries accept Path objects directly
with open(path, 'r') as f: # Works!
...
Since Python 3.6, most standard library functions accept Path objects directly via the os.PathLike protocol.
When to Use os.path Still
pathlib doesn’t replace everything:
os.walk()for recursive directory traversal (thoughrgloboften works)os.scandir()when you need maximum performanceshutilfor operations likecopytreeandrmtree
But for most daily path operations, pathlib is the better choice.
Final Thoughts
pathlib is one of Python’s best quality-of-life improvements. It makes code more readable, reduces errors from string manipulation, and feels natural once you’re used to it.
Start using it in new code. Refactor old code when you touch it. Your future self will thank you.
Paths should be objects, not strings.