AI cost amplification attacks
A developer working on a new feature that integrates with a commercial Large Language Model (LLM) provider, like OpenAI or Anthropic, hardcodes an API key directly into a source file for a quick prototype. They commit the code to a feature branch and push it to a public GitHub repository. Within minutes, an attacker’s automated scanner, which constantly monitors public repositories for secret patterns (e.g., `sk-...` for OpenAI), discovers the key.
The attacker immediately starts a script that uses the key to make thousands of requests per minute to the most expensive model endpoint available (e.g., the latest GPT model for complex image analysis). The script’s goal is not to get meaningful results, but simply to maximize the cost of each API call. The developer’s company is unaware until the next morning when they receive an automated billing alert from their cloud provider showing a projected bill of tens of thousands of dollars, a massive spike from their usual daily spend. The API key hits its rate limit, but not before causing significant financial damage and a service outage for legitimate users.
Reconnaissance
Explanation
This is the discovery phase. Attackers aren’t manually browsing your code; they use automated tools to scan public data sources for secrets. The most common target is public source code repositories like GitHub, but keys can also be found in public S3 buckets, exposed container images, pastebin sites, or even in the minified JavaScript of a web application. These bots search for specific patterns that identify keys for popular services (AWS, Google Cloud, OpenAI, Stripe, etc.). Once found, the keys are tested for validity and then cataloged for immediate abuse or for sale on dark web marketplaces.
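To make the threat concrete, here is a minimal sketch of the kind of pattern matching these bots perform. The patterns are deliberately simplified illustrations; real scanners like TruffleHog and gitleaks ship hundreds of rules and verify candidates against the provider before flagging them.

```python
import re
from pathlib import Path

# Illustrative patterns only -- real scanners use far richer rule sets
# and validate candidate keys before reporting them.
KEY_PATTERNS = {
    "openai": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_assignment": re.compile(
        r"(?i)(api_key|secret_token)\s*=\s*['\"][^'\"]{16,}['\"]"
    ),
}

def scan_file(path: Path) -> list[tuple[str, int]]:
    """Return (rule_name, line_number) for every suspicious match in a file."""
    hits = []
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return hits
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in KEY_PATTERNS.items():
            if pattern.search(line):
                hits.append((name, lineno))
    return hits

if __name__ == "__main__":
    for path in Path(".").rglob("*.py"):
        for rule, lineno in scan_file(path):
            print(f"{path}:{lineno}: possible {rule} secret")
```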
Insight
A secret is exposed the moment it’s committed, even if you immediately delete it or amend the commit. Git history is a permanent record. Attackers specifically scan the entire commit history of repositories, not just the latest version of the code. A key committed and removed months ago is just as vulnerable as one committed today.
Practical
- Audit your code visibility: Understand which of your repositories are public vs. private. Be aware that even private repositories can be compromised if a developer’s account is breached.
- Scan your history: It’s not enough to check your current `main` branch. You must scan the entire commit history of all branches for secrets that were committed and later removed (a minimal history-scan sketch follows this list).
- Think beyond code: Consider where else secrets might be exposed: build logs in your CI/CD system, error messages, internal documentation, or shared Slack channels.
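As a concrete illustration, this sketch (Python, assuming the `git` CLI is available) greps every line ever added anywhere in history for an OpenAI-style key pattern. For real audits, prefer TruffleHog or gitleaks, which do this at scale with much better rules.

```python
import re
import subprocess

# A simplified OpenAI-style key pattern; dedicated tools use richer rule sets.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{20,}")

def scan_git_history() -> list[str]:
    """Check every added line in the full history of all branches for key-like strings."""
    # `git log -p --all` prints the patch for every commit on every branch,
    # so keys that were committed and later removed still show up.
    log = subprocess.run(
        ["git", "log", "-p", "--all"],
        capture_output=True, text=True, errors="ignore", check=True,
    ).stdout
    findings = []
    for line in log.splitlines():
        if line.startswith("+") and KEY_PATTERN.search(line):
            findings.append(line)
    return findings

if __name__ == "__main__":
    for hit in scan_git_history():
        print("possible secret in history:", hit[:80])
```

Note that this loads the full history into memory, which is fine for a sketch but not for a monorepo; the dedicated tools stream and deduplicate their findings.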
Tools/Techniques
- Secret Scanners:
- TruffleHog: Scans repository commit history for secrets. Can be run locally or in CI/CD.
- gitleaks: A static analysis tool for detecting hardcoded secrets like passwords, API keys, and tokens in git repos.
- Pre-commit Hooks:
- git-secrets: Prevents you from committing passwords and other sensitive information to a git repository.
- pre-commit: A framework for managing and maintaining multi-language pre-commit hooks. You can configure it to run a secret scanner before any commit is finalized.
Metrics/Signal
- Alert Volume: Number of high-confidence secret alerts generated by your scanning tools. A high number indicates a systemic problem.
- Historic Findings: Number of secrets found in git history vs. new commits. This helps you prioritize cleaning up old mistakes versus preventing new ones.
Evaluation
Explanation
This stage involves looking inward to assess your team’s vulnerability. It’s about asking hard questions: Where are we currently storing API keys and other secrets? Are they in code, in configuration files, or in environment variables? Who has access to them? What is our process for rotating keys? Answering these questions helps you understand your current risk posture before an attacker does.
Insight
Your “blast radius” is determined by your current practices. If you use one super-powered API key across all your applications (dev, staging, prod), the compromise of that one key is a catastrophic event. If keys are siloed and have limited permissions, you are inherently more resilient. Vulnerabilities often lie in the “temporary” solutions developers create that become permanent fixtures.
Practical
- Run a comprehensive audit: Use a secret scanning tool to perform a one-time, deep scan across all your organization’s repositories.
- Interview developers: Ask team members how they manage secrets for local development. You might uncover risky practices like storing keys in shell history (`.bash_history`), unencrypted files, or shared documents.
- Review your CI/CD pipeline: Examine how secrets are passed to build and deployment scripts. Are they masked in logs? Are they stored securely?
Tools/Techniques
- Centralized Secret Scanners:
- GitGuardian: A platform that integrates with your SCM (like GitHub or GitLab) to provide real-time secret scanning and alerting.
- Snyk Code: Scans your code for security vulnerabilities, including hardcoded secrets.
- Manual Review:
- Use `grep` or your IDE’s search function to look for common key prefixes (`sk-`, `AKIA`, etc.) and variable names (`API_KEY`, `SECRET_TOKEN`).
Metrics/Signal
- Secrets per Repository: A raw count of secrets found during the audit, which helps prioritize the riskiest projects.
- Mean Time to Remediate (MTTR): How long, on average, does it take from the moment a secret is discovered to when it is fully revoked and replaced? A high MTTR is a major risk indicator.
Fortify
Explanation
Fortification is about building defenses to prevent secrets from being exposed in the first place. The goal is to make the “right way” of handling secrets the “easy way” for developers. This means moving secrets out of the codebase and into a secure, managed environment, and automating the process of injecting them into your application at runtime.
Insight
Developers will always choose the path of least resistance to get their job done. If your security process is cumbersome, they will find workarounds. Therefore, the best security controls are those that are transparent and integrated directly into the developer’s workflow (e.g., in the IDE, in the CI/CD pipeline).
Practical
- Adopt a Secrets Manager: Centralize all application secrets in a dedicated service. This should be the single source of truth.
- Use Environment Variables for Local Dev: Store keys for local development in `.env` files and ensure `.env` is listed in your project’s `.gitignore` file. Use a library to load these variables at runtime (a minimal sketch follows this list).
- Integrate Secrets Management into CI/CD: Your CI/CD system should securely inject secrets into the application environment during build or deployment. The secrets themselves should never be visible in pipeline logs.
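For the local-development case, a minimal sketch using python-dotenv might look like this (the variable name `OPENAI_API_KEY` is just a common convention):

```python
# Requires: pip install python-dotenv
import os

from dotenv import load_dotenv

# Reads key=value pairs from a local .env file (which must be in .gitignore)
# into the process environment. In production, skip the file entirely and let
# your secrets manager or CI/CD system inject the variable.
load_dotenv()

api_key = os.environ.get("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set; check your .env or runtime config")
```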
Tools/Techniques
- Secrets Management Platforms:
- HashiCorp Vault: A powerful, open-source tool for managing secrets.
- AWS Secrets Manager: A managed service for storing and rotating secrets on AWS.
- Google Secret Manager: Google Cloud’s native secret management service.
- Azure Key Vault: Microsoft Azure’s solution for secrets management.
- Local Development Libraries:
- Python: `python-dotenv`
- Node.js: `dotenv`
- Java: libraries like `dotenv-java` can be used.
- CI/CD Integration:
- GitHub Actions Encrypted Secrets
- GitLab CI/CD Variables
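As an illustration of pulling a secret at runtime instead of hardcoding it, here is a minimal sketch using AWS Secrets Manager via boto3; the region and secret name are placeholders:

```python
# Requires: pip install boto3. Assumes the secret "prod/chatbot/openai-api-key"
# already exists and the caller's IAM role is allowed to read it.
import boto3

def get_openai_key() -> str:
    """Fetch the API key from AWS Secrets Manager at startup instead of from code."""
    client = boto3.client("secretsmanager", region_name="us-east-1")
    response = client.get_secret_value(SecretId="prod/chatbot/openai-api-key")
    return response["SecretString"]
```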
Metrics/Signal
- Secrets Manager Adoption: Percentage of services/applications that pull their secrets from a central manager instead of configuration files.
- Zero Hardcoded Secrets: Your secret scanner should report zero new hardcoded secrets in commits to your main branches.
Limit
Explanation
Limiting the blast radius assumes a breach will happen. The goal is to contain the damage when an API key is inevitably leaked. This involves applying the principle of least privilege: a key should only have the absolute minimum permissions, quota, and lifetime required to perform its specific task.
Insight
A leaked key’s potential for damage is directly proportional to its permissions. A key that can only read data is less dangerous than one that can write. A key with a $100/month spending cap cannot cause a $100,000 billing incident. You should treat API keys like you treat user permissions—with strict, granular control.
Practical
- Set Billing Alerts and Hard Limits: Configure aggressive billing alerts in your cloud or AI provider’s dashboard. Where possible, set hard spending caps that automatically disable the key or service when exceeded.
- Create Service-Specific Keys: Don’t use a single master key for all your applications. Create a unique API key for each service (e.g., `product-recommendation-api-key`, `chatbot-dev-key`).
- Scope Permissions: If a key only needs to use a specific AI model or a read-only endpoint, restrict it to only that capability.
- Use IP Allowlisting: If your application runs on a fixed set of IP addresses, configure the API provider to only accept requests from those IPs.
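As a sketch of what "hard limits as code" can look like, this example creates an AWS budget with an 80% alert threshold via boto3. The account ID, budget name, and email are placeholders, and note that a plain budget only alerts; automatically disabling a key when the cap is hit requires a further action (e.g., a Lambda triggered by the alert).

```python
# Requires: pip install boto3. Account ID, budget name, and email are placeholders.
import boto3

client = boto3.client("budgets")

# Alert (via email) when actual monthly spend crosses 80% of a $100 cap.
client.create_budget(
    AccountId="123456789012",
    Budget={
        "BudgetName": "llm-api-spend",
        "BudgetLimit": {"Amount": "100", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "oncall@example.com"}
            ],
        }
    ],
)
```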
Tools/Techniques
- Provider-Level Controls:
- AWS Budgets: Set custom cost and usage budgets that alert you when thresholds are exceeded.
- OpenAI Usage Limits: Configure monthly budget limits and rate limits for your organization.
- Anthropic Rate Limits: Understand and monitor the default rate limits on your account.
- Architectural Controls:
- API Gateway: Use a gateway like Amazon API Gateway or Kong as a proxy. The gateway can manage a single, secure key to the external service while exposing internal, easily-revocable keys to your microservices. It can also enforce its own rate limiting and caching.
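The gateway pattern is easy to prototype. The sketch below (Flask and requests; all names and the upstream URL are illustrative) holds the one real provider key, hands internal services their own revocable keys, and applies a crude per-key sliding-window rate limit. A production setup would use a real gateway, but the shape is the same.

```python
# Requires: pip install flask requests
import os
import time

import requests
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

PROVIDER_URL = "https://api.openai.com/v1/chat/completions"  # example upstream
PROVIDER_KEY = os.environ["PROVIDER_API_KEY"]  # injected at runtime, never in code

# Internal keys -> per-minute budgets. Revoking a service = deleting one entry.
INTERNAL_KEYS = {"svc-recommendations-k1": 60, "svc-chatbot-k2": 120}
call_log: dict[str, list[float]] = {}

@app.route("/v1/chat/completions", methods=["POST"])
def proxy():
    internal_key = request.headers.get("X-Internal-Key", "")
    limit = INTERNAL_KEYS.get(internal_key)
    if limit is None:
        abort(401)
    # Sliding-window rate limit: reject if over this key's per-minute budget.
    now = time.time()
    window = [t for t in call_log.get(internal_key, []) if now - t < 60]
    if len(window) >= limit:
        abort(429)
    window.append(now)
    call_log[internal_key] = window
    # Only the proxy ever attaches the real provider key.
    upstream = requests.post(
        PROVIDER_URL,
        headers={"Authorization": f"Bearer {PROVIDER_KEY}"},
        json=request.get_json(force=True),
        timeout=30,
    )
    return jsonify(upstream.json()), upstream.status_code
```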
Metrics/Signal
- Billing Alert Coverage: Percentage of critical third-party services that have a billing alert configured.
- Key Granularity: Ratio of API keys to services. A ratio close to 1:1 is a good sign.
- Quotas Set: Percentage of API keys with explicit usage or spending quotas configured.
Expose
Explanation
This is about visibility. How do you know an attack is happening right now? You need to collect the right signals and monitor them for anomalies. A cost-amplification attack has a very distinct signature: a sudden, massive spike in API calls and associated costs, often from unusual geographic locations. Effective exposure means having the logging, monitoring, and alerting in place to detect this signature in near real-time.
Insight
Your application logs are a valuable source of data, but they only tell part of the story. You must also ingest and monitor logs from the third-party service itself (e.g., API usage logs, audit logs) and your cloud provider (e.g., billing data). Correlating these different data sources is key to getting a complete picture of an attack.
Practical
- Log Every API Call: In your application, log every outbound call to a high-cost AI endpoint. Include metadata like which internal service made the call, the source IP, and the user context.
- Enable Provider-Level Logging: Turn on all available audit and usage logging from your AI/ML service provider.
- Create Anomaly Detection Alerts: Set up dashboards and alerts that trigger on sudden deviations from the norm. Don’t just alert on a fixed threshold (e.g., “>1000 calls/min”); alert on a sudden percentage increase (e.g., “500% increase in calls over the last 15 minutes”).
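The percentage-based alerting idea can be expressed in a few lines. This sketch compares each 15-minute window’s call count against the average of recent windows and flags anything at or above a 5x (500%) jump; a real deployment would run this logic in your monitoring platform rather than in application code.

```python
from collections import deque

class SpikeDetector:
    """Alert when the current window's call count jumps a given multiple
    above the average of recent windows (e.g., 5x = a 500% increase)."""

    def __init__(self, spike_factor: float = 5.0, history_windows: int = 24):
        self.spike_factor = spike_factor
        self.history: deque = deque(maxlen=history_windows)

    def observe(self, calls_in_window: int) -> bool:
        """Feed one window's call count; return True if it looks like a spike."""
        baseline = sum(self.history) / len(self.history) if self.history else None
        self.history.append(calls_in_window)
        if baseline is None or baseline == 0:
            return False
        return calls_in_window >= self.spike_factor * baseline

# Example: a steady ~200 calls per window, then a sudden burst.
detector = SpikeDetector()
for count in [210, 190, 205, 198, 2400]:
    if detector.observe(count):
        print(f"ALERT: {count} calls this window vs. recent baseline")
```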
Tools/Techniques
- Observability & APM:
- Datadog, New Relic: Monitor application performance and can be configured to track custom metrics like API call volume and costs.
- Prometheus & Grafana: A popular open-source combination for time-series monitoring and dashboarding.
- AI Observability Platforms:
- LangSmith: Provides detailed tracing and cost analysis for LLM applications.
- Arize AI: An ML observability platform that can help track model usage and performance metrics which often correlate with cost.
- Cloud Provider Tools:
- AWS CloudTrail: Logs all API activity within your AWS account.
- Google Cloud’s operations suite: Provides logging, monitoring, and tracing for GCP services.
Metrics/Signal
- API Call Rate: A sudden, sustained spike in the number of calls per second/minute to a specific API endpoint.
- Cost Velocity: The rate at which your bill is increasing. Many dashboards can show a real-time forecast. A sharp upward trend is a primary indicator of an attack.
- Geographic Distribution: A sudden shift in API calls originating from unexpected countries or IP ranges.
- Error Rate: A high rate of HTTP `429 (Too Many Requests)` errors indicates your quota is being exhausted.
eXercise
Explanation
This is where you practice for failure. You wouldn’t expect a firefighter to be effective without drills; likewise, your development team can’t be expected to respond to a security incident without practice. By simulating an attack in a controlled way, you can test your defenses, identify weaknesses in your processes, and build the “muscle memory” needed to respond quickly and effectively when a real incident occurs.
Insight
The goal of an exercise is not to pass or fail, but to learn. A drill that goes perfectly might mean your test was too easy. A drill that uncovers a major flaw in your response plan is a huge success because you found it before an attacker did. It’s about building a culture of resilient engineering.
Practical
- Tabletop Exercise: Gather the team and walk through the attack scenario verbally. “A developer just posted an alert in Slack: ‘GitHub found an OpenAI key in my last commit!’ What is our immediate first step? Who has the credentials to revoke the key? How do we assess the damage? Who communicates with leadership?” Document the answers in a runbook.
- Live Fire Drill (Controlled):
- Create a brand new, dummy API key with a very low, non-critical quota and a $1 spending limit.
- Intentionally commit this key to a public test repository.
- Start a timer and see how long it takes for (a) your automated Expose systems to detect and alert on it, and (b) the on-call developer to follow the runbook and revoke the key.
- Security Education: Use the results of these exercises to educate the wider team. Show them how quickly a leaked key can be found and abused. Reinforce the best practices from the Fortify stage.
Tools/Techniques
- Runbook/Playbook Documentation:
- Store your incident response plan in a central, accessible place like Confluence or Notion. The plan should have clear, step-by-step instructions.
- Incident Management:
- PagerDuty, Opsgenie: Tools for managing on-call schedules and incident response workflows.
- Simulated Attack Tools:
- For this specific scenario, a simple script using the “leaked” key is often sufficient. More advanced scenarios could use red team services or platforms.
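For the live fire drill described above, the "attack" script can be as simple as this sketch (using the openai Python client; the model name and call shape are illustrative, and the key must be the low-quota canary, never a production key):

```python
# Requires: pip install openai
import time

from openai import OpenAI

client = OpenAI(api_key="sk-CANARY-KEY-WITH-1-DOLLAR-CAP")  # dummy drill key only

start = time.time()
for i in range(50):  # a small, cheap burst -- enough to trip a rate alert
    client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"drill ping {i}"}],
        max_tokens=1,
    )
    time.sleep(1)

print(f"burst finished after {time.time() - start:.0f}s; now measure Time to Detect")
```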
Metrics/Signal
- Time to Detect (TTD): The time from the simulated key leak to the first automated alert being generated.
- Time to Remediate (TTR): The total time from the leak to the key being successfully revoked. Your goal should be to drive this down from hours to minutes.
- Runbook Accuracy: During the exercise, did the team find the runbook helpful and accurate? Note any steps that were confusing or missing.