⚡ LECTURE 12

GenAI Commercial APIs

Use powerful AI models without training your own. Learn the OpenAI and Google AI APIs, how to manage API keys securely, how billing and rate limits work, and how to get embeddings via API.

Syllabus topics 45–49 · ⏱ ~24 min read · 12 practice questions

In this lecture

What are Commercial AI APIs?
OpenAI API & Google AI API
API Key Management
Costs, Billing & Rate Limits
Word Embeddings via API
Practice Questions

12.1 What are Commercial AI APIs?

Commercial AI API — a service that gives secure, reliable access to powerful cloud-hosted AI models through simple HTTP requests, without you having to train or host the model yourself.

The provider manages all the heavy infrastructure — GPUs, scaling, updates. You just send a request and get a response. Key characteristics:

REST-based — JSON requests and JSON responses over HTTP.
Pay-as-you-go: usage-based pricing, sometimes alongside subscription plans; no upfront infrastructure cost to start.
Authenticated — every request needs a valid API key.
Enterprise-grade — built for security and scalability.
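To make the "REST-based" and "Authenticated" points concrete, here is a minimal sketch that builds, but does not send, a raw HTTP request to OpenAI's chat completions endpoint using only the standard library. The helper name `build_chat_request` and the placeholder key are illustrative assumptions; in practice most code uses the official SDK shown later in this lecture.

```python
import json
import urllib.request


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-over-HTTP chat request."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url="https://api.openai.com/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),      # JSON request body
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",   # the API key authenticates the call
        },
        method="POST",
    )


# Placeholder key for illustration only; a real key would come from the environment.
request = build_chat_request("sk-placeholder", "Hello")
print(request.get_method(), request.full_url)
```

Nothing is sent over the network here; passing `request` to `urllib.request.urlopen` would perform the actual call and return a JSON response.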

12.2 OpenAI API & Google AI API

OpenAI API

Gives access to OpenAI's models via HTTP requests for text generation, summarization, translation and embeddings. Core components:

Component	Purpose
Client App	Sends API requests
API Gateway	Authentication & routing
AI Models	Generate responses
Usage Tracker	Billing & limits
Response Handler	Returns JSON output

OpenAI offers chat models (conversations, Q&A), embedding models (search, RAG), lightweight models (low-cost tasks) and advanced models (complex reasoning).

Google AI API (Gemini)

Gemini is Google's AI model family — text, reasoning, code, and some image support — accessed via Google AI Studio or Vertex AI, using REST or SDKs. It is part of the broader Google Cloud ecosystem and integrates with BigQuery and Cloud Storage.
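As a rough illustration of the REST route, the sketch below builds (without sending) a request to the Gemini `generateContent` endpoint. The helper name `build_gemini_request` is hypothetical, and the model name `gemini-1.5-flash` is an assumption that may need updating, since available models change over time.

```python
import json
import urllib.request

GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_gemini_request(api_key: str, prompt: str,
                         model: str = "gemini-1.5-flash") -> urllib.request.Request:
    """Build (but do not send) a generateContent request. Model name is an assumption."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{GEMINI_BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,   # key sent as a header, not hard-coded in the URL
        },
        method="POST",
    )


# Placeholder key for illustration only.
gemini_request = build_gemini_request("placeholder-key", "Explain RAG in one line.")
print(gemini_request.full_url)
```

The overall shape matches the OpenAI call: an authenticated POST with a JSON body; only the URL, header name, and body layout differ.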

OpenAI vs Google AI

Aspect	OpenAI API	Google AI API
Focus	Simple APIs, fast developer onboarding	Enterprise integration, cloud-native
Platform style	Standalone AI platform	Part of Google Cloud
Billing	Token-based pricing	Integrated with Google Cloud billing
Best for	Startups, rapid development	Large-scale enterprise applications

Python · calling the OpenAI API

import os
from openai import OpenAI

# Key is read from an environment variable - NEVER hard-coded
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain RAG in one line."}]
)
print(response.choices[0].message.content)

Output: RAG retrieves relevant documents and feeds them to an LLM so its answers are grounded in real, external data.

12.3 API Key Management

API Key — a unique secret identifier that authenticates each API request: it tells the provider who is calling, what access is allowed, and lets them track usage and billing.

API key lifecycle

Stage	What happens
Creation	Generate the key in the provider's dashboard
Configuration	Set permissions / usage restrictions
Usage	The application uses the key to authenticate calls
Rotation	Periodically replace the key to reduce risk
Revocation	Immediately disable compromised or unused keys

⚠️ Never hard-code API keys

A key committed to a public GitHub repo can be found and abused within minutes, causing huge unexpected bills and letting attackers generate harmful content under your account. Keys are sensitive credentials; treat them like passwords.

Best practices for storing keys

Store keys in environment variables, not in source code.
Use a secrets manager for stronger protection.
Never expose keys in frontend code — the browser can be inspected by anyone.
Keep API calls on the backend server.
Rotate keys periodically and limit access by role/environment.

🔑 Why backend, not frontend?

Frontend code runs in the user's browser, where anyone can open developer tools and read it. If your API key is there, any visitor can copy it. Keeping the key and the API call on the backend server hides the key from users.
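The backend pattern can be sketched as a single server-side function. The names `handle_browser_request` and `send_to_provider` are hypothetical, and the 2000-character input cap is an arbitrary assumption; the point is only that the browser sends a prompt, the server adds the key, and the key never appears in the reply.

```python
import os


def handle_browser_request(payload: dict, send_to_provider) -> dict:
    """Runs on the backend. `payload` comes from the browser and never contains the key."""
    api_key = os.environ["OPENAI_API_KEY"]          # exists only on the server
    prompt = str(payload.get("prompt", ""))[:2000]  # basic size limit on user input
    reply = send_to_provider(api_key=api_key, prompt=prompt)
    return {"reply": reply}                         # the key is never echoed back
```

In a real application `send_to_provider` would wrap the SDK call, and this function would sit behind a web framework route that the frontend calls.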

12.4 Costs, Billing & Rate Limits

Token-based pricing

OpenAI charges based on the number of tokens processed — and this counts both directions:

Input tokens — the size of your prompt.
Output tokens — the size of the model's response.

Different models have different per-token prices: lightweight models are cheaper but less capable; advanced models cost more but reason better. Pricing is usage-based with no fixed monthly fee by default.
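A quick worked example shows how input and output tokens add up. The prices below are made-up numbers for illustration, not real provider rates, and `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Estimate the cost of one request; both directions are billed."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices (USD per million tokens), NOT real rates.
cost = estimate_cost_usd(
    input_tokens=1200, output_tokens=300,
    input_price_per_million=0.50, output_price_per_million=2.00,
)
print(f"${cost:.4f}")   # $0.0012
```

Here the short response costs as much as the longer prompt because the assumed output price is four times the input price, which is why trimming both prompts and responses matters.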

Rate Limits

Rate Limit — a cap on how many API requests are allowed in a given time period. It protects systems from overload, ensures fair usage, and enforces pricing tiers.

Limits may apply per API key, per user, or per IP. Exceeding them temporarily blocks requests.

🔑 HTTP 429: Too Many Requests

When you exceed a rate limit, the API returns HTTP 429. The response may include a Retry-After header telling you how long to wait before retrying. Other common codes: 401 = authentication issue, 403 = access restricted.

Handling rate limits gracefully

Exponential backoff — on a 429, wait, retry; if it fails again wait longer, retry; keep doubling the wait. This avoids hammering the server.
Queue requests instead of sending bursts.
Cache results to avoid duplicate calls.
Monitor usage; upgrade the plan for higher throughput.
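The caching strategy from the list above can be sketched with a plain dictionary. `make_cached` and `fake_model` are hypothetical names; the fake model stands in for a real, billable API call.

```python
def make_cached(call_model):
    """Wrap a model-calling function so identical prompts are only sent once."""
    cache = {}

    def cached_call(prompt: str) -> str:
        if prompt not in cache:          # only call the API on a cache miss
            cache[prompt] = call_model(prompt)
        return cache[prompt]

    return cached_call


calls = []

def fake_model(prompt: str) -> str:
    """Stand-in for a real API call; records each time it is invoked."""
    calls.append(prompt)
    return prompt.upper()


ask = make_cached(fake_model)
ask("what is rag?")
ask("what is rag?")      # served from the cache, no second call
print(len(calls))        # 1
```

An in-memory dictionary is the simplest option; a production system would more likely use a shared cache with an expiry time so stale answers are eventually refreshed.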

Python · exponential backoff on HTTP 429

import time

def call_with_backoff(make_request, max_retries=5):
    delay = 1
    for attempt in range(max_retries):
        response = make_request()
        if response.status_code != 429:        # success or other error
            return response
        print(f"Rate limited. Waiting {delay}s...")
        time.sleep(delay)
        delay *= 2                             # double the wait each time
    raise RuntimeError("Still rate-limited after retries")

Output:
Rate limited. Waiting 1s...
Rate limited. Waiting 2s...
(request succeeds on 3rd attempt)
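A possible refinement of the backoff function above is to honor the Retry-After header when the server sends one, and to add a little random jitter so many clients do not retry at the same instant. This is a sketch under assumptions: the response object is assumed to expose `status_code` and a dict-like `headers` (as the `requests` library does), and the names `wait_seconds` and `call_with_retry_after`, the 60-second cap, and the 10% jitter are illustrative choices.

```python
import random
import time


def wait_seconds(response, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Use Retry-After if it is a number of seconds, else exponential backoff with jitter."""
    retry_after = response.headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    backoff = min(base * (2 ** attempt), cap)          # 1s, 2s, 4s, ...
    return backoff + random.uniform(0, backoff * 0.1)  # up to 10% jitter


def call_with_retry_after(make_request, max_retries: int = 5, sleep=time.sleep):
    """Retry on HTTP 429, waiting as long as the server asks (or backing off)."""
    for attempt in range(max_retries):
        response = make_request()
        if response.status_code != 429:   # success or a non-rate-limit error
            return response
        sleep(wait_seconds(response, attempt))
    raise RuntimeError("Still rate-limited after retries")
```

Passing `sleep` as a parameter is a design choice that makes the function easy to test without actually waiting.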

12.5 Word Embeddings via API

Both OpenAI and Google offer dedicated embedding models via API. They convert text into fixed-length numerical vectors that capture semantic meaning — the foundation of search, recommendations and RAG (Lecture 16).

Similar texts → vectors that are close in vector space.
Embeddings for a document are generated once, stored, and reused, which is far cheaper than recomputing them on every query.
They are stored in vector databases for fast similarity search.

Python · getting an embedding from the OpenAI API

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How many vacation days do I get?"
)
vector = response.data[0].embedding
print("Embedding length:", len(vector))   # a fixed-length numeric vector

Output: Embedding length: 1536

💡 Tip: why embeddings beat keyword search

Keyword search needs the exact word. Embeddings match by meaning: a search for "vacation days" can find a document that says "earned leave", because the two phrases have similar vectors.
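"Similar vectors" is usually measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings come from the API and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors: near 1.0 means very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example (not real embeddings).
vacation_days = [0.9, 0.1, 0.2]
earned_leave = [0.8, 0.2, 0.3]
pizza_recipe = [0.1, 0.9, 0.1]

related = cosine_similarity(vacation_days, earned_leave)
unrelated = cosine_similarity(vacation_days, pizza_recipe)
print(related > unrelated)   # True
```

A vector database performs essentially this comparison at scale, returning the stored vectors closest to the query vector.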

Practice Questions

API security and rate limits are common MCQ material.

MCQ · Q1 · API basics

The main benefit of using a commercial AI API is that:

A You must buy your own GPUs
B You can use powerful models without training or hosting them yourself
C The model runs offline on your phone
D It is always completely free

Answer: B

The provider hosts the models and manages the GPUs/scaling. You just send HTTP requests — no training or infrastructure needed.

MCQ · Q2 · API key

An API key is primarily used to:

A Speed up the model
B Authenticate the request and track usage/billing
C Translate the response
D Store the model's weights

Answer: B

The key identifies who is calling, what access they have, and lets the provider track usage and apply billing and rate limits.

MCQ · Q3 · Security

Where should an API key be stored in a production application?

A Hard-coded directly in the source code
B In the frontend JavaScript
C In environment variables or a secrets manager, used by the backend
D In a public GitHub repository

Answer: C

Keys belong in environment variables or a secrets manager and should only be used server-side. Hard-coding them, or exposing them in frontend code or public repos, makes them easy to steal.

MCQ · Q4 · Frontend/backend

Why must the API call code be kept on the backend, not the frontend?

A The frontend cannot do maths
B Frontend code is visible to users, so the key would be exposed and stolen
C The backend is always faster
D APIs only work on servers

Answer: B

Anyone can open browser dev tools and read frontend code. Keeping the key and call on the backend hides the credential from users.

MCQ · Q5 · Pricing

OpenAI's token-based pricing charges for:

A Only the input prompt tokens
B Only the output response tokens
C Both input (prompt) and output (response) tokens
D Only the number of API keys you own

Answer: C

Both the prompt and the generated response count toward token usage — that is why concise prompts and outputs reduce cost.

MCQ · Q6 · Rate limits

Which HTTP status code means "Too Many Requests" (rate limit exceeded)?

A 401
B 403
C 429
D 200

Answer: C

HTTP 429 = Too Many Requests. 401 = authentication issue, 403 = access restricted, 200 = success.

MCQ · Q7 · Backoff

"Exponential backoff" means that after each failed retry you:

A Send more requests at once
B Wait progressively longer before retrying
C Give up immediately
D Change the API key

Answer: B

Exponential backoff doubles the wait time after each failure (1s, 2s, 4s…), reducing load on the server and improving the chance of eventual success.

MCQ · Q8 · Embeddings

Why are embeddings better than keyword search?

A They require the exact same word to match
B They match by meaning, so related phrases ("vacation" ≈ "earned leave") match
C They never need a vector database
D They only work on numbers

Answer: B

Embeddings capture semantic meaning, so semantically similar texts have nearby vectors — even when they share no exact keywords.

Short Answer · Q9 · Security

What happens if an API key is leaked publicly, and what two actions should you take?

Model answer

A leaked key can be used by attackers to make requests on your account — causing unexpected billing spikes and possibly generating harmful content under your name. You should immediately revoke (disable) the compromised key and generate a new one (rotate), then update your application to use it from a secure store.

Coding · Q10 · Secure key

Write code that creates an OpenAI client by reading the API key securely from an environment variable.

Solution

Python

import os
from openai import OpenAI

# Read the key from the environment - never hard-code it
api_key = os.environ.get("OPENAI_API_KEY")
if api_key is None:
    raise ValueError("OPENAI_API_KEY not set")

client = OpenAI(api_key=api_key)
print("Client created securely.")

Output: Client created securely.

The key lives in the environment, not in the code, so it is never committed to version control.

Coding · Q11 · API call

Write code to send a chat request to the OpenAI API asking it to "Summarise photosynthesis in one sentence" and print the reply.

Solution

Python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user",
         "content": "Summarise photosynthesis in one sentence"}
    ]
)
print(response.choices[0].message.content)

Output: Photosynthesis is the process by which plants convert sunlight, water and carbon dioxide into glucose and oxygen.

Short Answer · Q12 · Concept

Why do AI providers enforce rate limits? Name two strategies to handle them gracefully.

Model answer

AI inference is compute-heavy, so rate limits prevent system overload during traffic spikes, ensure fair usage for all customers, maintain low latency, and enforce pricing tiers. Two graceful strategies are exponential backoff (wait progressively longer before each retry) and caching results to avoid duplicate calls; queuing requests to smooth out bursts is a third option.

🎯 Lecture 12: must-remember

Commercial APIs = hosted models via HTTP/JSON, pay-as-you-go.
API key authenticates and tracks usage: store it in env vars or a secrets manager, never hard-code it or expose it in the frontend.
Lifecycle: create → configure → use → rotate → revoke.
Token pricing = input + output tokens.
HTTP 429 = rate limit → use exponential backoff.
Embedding APIs power semantic search and RAG.
