GenAI Exam Prep
⚡ LECTURE 12

GenAI Commercial APIs

Use powerful AI models without training your own. Learn the OpenAI and Google AI APIs, how to manage API keys securely, how billing and rate limits work, and how to get embeddings via API.

Syllabus topics 45–49 · ⏱ ~24 min read · 12 practice questions

12.1 What are Commercial AI APIs?

Commercial AI API — a service that gives secure, reliable access to powerful cloud-hosted AI models through simple HTTP requests, without you having to train or host the model yourself.

The provider manages all the heavy infrastructure: GPUs, scaling, updates. You just send a request and get a response. Key characteristics: hosted models, access over HTTP with JSON responses, and pay-as-you-go pricing.

12.2 OpenAI API & Google AI API

OpenAI API

Gives access to OpenAI's models via HTTP requests for text generation, summarization, translation and embeddings. Core components:

Component | Purpose
Client App | Sends API requests
API Gateway | Authentication & routing
AI Models | Generate responses
Usage Tracker | Billing & limits
Response Handler | Returns JSON output
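
As a rough sketch of what the client app hands to the gateway, the snippet below builds (but does not send) a chat request by hand. The endpoint path and bearer-token header follow OpenAI's public REST API; the helper name `build_chat_request` and the placeholder key are illustrative assumptions, not part of any SDK.

```python
import json

OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Return the parts of an HTTP POST request without sending it."""
    return {
        "method": "POST",
        "url": OPENAI_CHAT_URL,
        "headers": {
            # The gateway uses this header to authenticate the caller
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # The request body is plain JSON: which model to use and the messages
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }


# "sk-placeholder" is a dummy value; a real key would come from the environment
request = build_chat_request("sk-placeholder", "Explain RAG in one line.")
print(request["method"], request["url"])
```

In practice the official SDK assembles this request for you; seeing it spelled out makes it clearer what the key and the JSON body are doing.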

OpenAI offers chat models (conversations, Q&A), embedding models (search, RAG), lightweight models (low-cost tasks) and advanced models (complex reasoning).

Google AI API (Gemini)

Gemini is Google's AI model family — text, reasoning, code, and some image support — accessed via Google AI Studio or Vertex AI, using REST or SDKs. It is part of the broader Google Cloud ecosystem and integrates with BigQuery and Cloud Storage.
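
For comparison, a Gemini REST request can be sketched the same way. This assumes the Google AI Studio style endpoint (`generateContent` on `generativelanguage.googleapis.com`) with the key sent in an `x-goog-api-key` header; the model name `gemini-2.0-flash` and the helper `build_gemini_request` are assumptions for illustration, and Vertex AI uses a different URL and authentication scheme. Nothing is sent over the network here.

```python
import json

GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_gemini_request(api_key: str, prompt: str, model: str = "gemini-2.0-flash") -> dict:
    """Return the parts of a generateContent request without sending it."""
    return {
        "method": "POST",
        "url": f"{GEMINI_BASE}/models/{model}:generateContent",
        "headers": {
            "x-goog-api-key": api_key,          # key travels in a header, not the body
            "Content-Type": "application/json",
        },
        # Gemini wraps the prompt text in contents -> parts -> text
        "body": json.dumps({"contents": [{"parts": [{"text": prompt}]}]}),
    }


request = build_gemini_request("placeholder-key", "Explain RAG in one line.")
print(request["url"])
```

Model names change over time, so treat the default here as a stand-in and check the provider's current model list.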

OpenAI vs Google AI

Aspect | OpenAI API | Google AI API
Focus | Simple APIs, fast developer onboarding | Enterprise integration, cloud-native
Platform style | Standalone AI platform | Part of Google Cloud
Billing | Token-based pricing | Integrated with Google Cloud billing
Best for | Startups, rapid development | Large-scale enterprise applications

Python · calling the OpenAI API
import os
from openai import OpenAI

# Key is read from an environment variable - NEVER hard-coded
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain RAG in one line."}]
)
print(response.choices[0].message.content)
Output (example; the model's wording will vary): RAG retrieves relevant documents and feeds them to an LLM so its answers are grounded in real, external data.
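
The SDK hides the JSON that comes back, so here is a small sketch of reading it directly. The sample payload is hypothetical and trimmed, but it follows the shape of a chat completions response (`choices`, `message`, `usage`); the helper `extract_reply` is an illustrative name.

```python
import json
from typing import Tuple

# Hypothetical, trimmed sample in the shape the chat completions endpoint returns
sample = """{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "RAG grounds answers in retrieved documents."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 9, "total_tokens": 23}
}"""


def extract_reply(raw_json: str) -> Tuple[str, int]:
    """Return the assistant's text and the total tokens reported for the call."""
    data = json.loads(raw_json)
    reply = data["choices"][0]["message"]["content"]
    total_tokens = data["usage"]["total_tokens"]
    return reply, total_tokens


reply, total_tokens = extract_reply(sample)
print(reply)
print("Tokens used:", total_tokens)
```

The `usage` block is what the usage tracker bills against, which is why it is worth logging alongside the reply.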

12.3 API Key Management

API Key — a unique secret identifier that authenticates each API request: it tells the provider who is calling, what access is allowed, and lets them track usage and billing.

API key lifecycle

Stage | What happens
Creation | Generate the key in the provider's dashboard
Configuration | Set permissions / usage restrictions
Usage | The application uses the key to authenticate calls
Rotation | Periodically replace the key to reduce risk
Revocation | Immediately disable compromised or unused keys

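
The rotation and revocation stages can be sketched as a simple status check over key metadata. Everything here is hypothetical: the `KeyRecord` class, the service names, and the 90-day rotation window are assumptions chosen for illustration, not a provider policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class KeyRecord:
    """Metadata about a key (never the secret itself)."""
    name: str
    created_at: datetime
    revoked: bool = False


def key_status(key: KeyRecord, now: datetime, max_age_days: int = 90) -> str:
    """Classify a key as 'revoked', 'rotate' (too old) or 'active'."""
    if key.revoked:
        return "revoked"
    if now - key.created_at > timedelta(days=max_age_days):
        return "rotate"
    return "active"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
old = KeyRecord("billing-service", now - timedelta(days=120))
fresh = KeyRecord("search-service", now - timedelta(days=10))
leaked = KeyRecord("old-demo", now - timedelta(days=5), revoked=True)

for key in (old, fresh, leaked):
    print(key.name, "->", key_status(key, now))
```

A scheduled job running a check like this is one way to make "periodically replace the key" an enforced habit rather than a good intention.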
⚠️ Never hard-code API keys
A key committed to a public GitHub repo can be found and abused within minutes, causing huge unexpected bills and letting attackers generate harmful content under your account. Keys are sensitive credentials; treat them like passwords.

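Best practices for storing keys

Store keys in environment variables or a secrets manager, use them only from backend code, and never hard-code them or commit them to version control.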

🔑 Why backend, not frontend?
Frontend code runs in the user's browser, where anyone can open developer tools and read it. An API key placed there is exposed to every visitor. Keeping the key and the API call on the backend server hides the key from users.
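
A minimal server-side sketch of these practices: read the key from the environment, fail loudly if it is missing, and mask it before it reaches a log line. The helper names `load_api_key` and `mask_key` and the demo key are hypothetical.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Read the key from the environment; raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; configure it on the server")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Return a log-safe version that shows only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# A stand-in environment so the sketch runs without a real key
demo_env = {"OPENAI_API_KEY": "sk-demo-1234567890abcd"}
print("Loaded key:", mask_key(load_api_key(demo_env)))
```

Passing the environment in as a parameter is a design choice that keeps the function easy to test; production code would simply rely on the `os.environ` default.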

12.4 Costs, Billing & Rate Limits

Token-based pricing

OpenAI charges based on the number of tokens processed, and this counts both directions: input (prompt) tokens and output (response) tokens.

Different models have different per-token prices: lightweight models are cheaper but less capable; advanced models cost more but reason better. Pricing is usage-based with no fixed monthly fee by default.
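
The arithmetic behind token pricing can be sketched in a few lines. The prices below are placeholders, not values from any real price list, and `estimate_cost` is an illustrative helper name.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Cost of one call: input and output tokens are priced separately."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Placeholder prices in USD per million tokens (NOT real price-list values)
cost = estimate_cost(1200, 300,
                     input_price_per_million=1.0,
                     output_price_per_million=4.0)
print(f"Estimated cost: ${cost:.6f}")   # prints: Estimated cost: $0.002400
```

Because output tokens are often priced higher than input tokens, capping response length can matter as much as trimming the prompt.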

Rate Limits

Rate Limit — a cap on how many API requests are allowed in a given time period. It protects systems from overload, ensures fair usage, and enforces pricing tiers.

Limits may apply per API key, per user, or per IP. Exceeding them temporarily blocks requests.
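
To make "a cap per time period" concrete, here is a sketch of a sliding-window counter for a single API key. It is one plausible way to track such a limit, not a description of how any provider actually implements it; the class name and the 3-requests-per-60-seconds figures are assumptions. Time is passed in explicitly so the example is deterministic.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()   # times of recently accepted requests

    def allow(self, now: float) -> bool:
        # Forget requests that have aged out of the window
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False                 # over the cap: this is where a 429 would be sent


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
results = [limiter.allow(t) for t in (0, 1, 2, 3, 61)]
print(results)   # [True, True, True, False, True]
```

The fourth request is rejected because three were already accepted inside the window; by second 61 the earliest ones have expired, so requests are accepted again.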

🔑 HTTP 429: Too Many Requests
When you exceed a rate limit, the API returns HTTP 429. The response may include a Retry-After header telling the client how long to wait before trying again. Other common codes: 401 = authentication issue, 403 = access restricted.
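
A small sketch of how a client might act on these codes: wait only on 429, and prefer the server's Retry-After value when it is a plain number of seconds. The helper `wait_seconds` and the 1-second fallback are assumptions; the header can also carry an HTTP date, which this sketch does not parse and simply treats as missing. Headers are modelled as a plain dict with exact-case keys.

```python
from typing import Mapping, Optional


def wait_seconds(status_code: int, headers: Mapping[str, str],
                 fallback: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None when the response is not a 429."""
    if status_code != 429:
        return None                      # 401/403 need a fix, not a retry
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))    # numeric form: seconds to wait
    except (TypeError, ValueError):
        return fallback                  # header missing or not a number


print(wait_seconds(429, {"Retry-After": "7"}))
print(wait_seconds(429, {}))
print(wait_seconds(401, {}))
```

Retrying a 401 or 403 will not help, since the problem is the key or its permissions rather than request volume.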

Handling rate limits gracefully

Python · exponential backoff on HTTP 429
import time

def call_with_backoff(make_request, max_retries=5):
    """Call make_request(), retrying with doubling waits while it returns HTTP 429."""
    delay = 1                                  # initial wait in seconds
    for attempt in range(max_retries):
        response = make_request()
        if response.status_code != 429:        # success or other error
            return response
        print(f"Rate limited. Waiting {delay}s...")
        time.sleep(delay)
        delay *= 2                             # double the wait each time
    raise RuntimeError("Still rate-limited after retries")
Output:
Rate limited. Waiting 1s...
Rate limited. Waiting 2s...
(request succeeds on 3rd attempt)
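
The doubling schedule can also be computed up front. The sketch below adds two things that are not in the function above and are offered only as a common variation: an upper cap on the wait, and optional random "jitter" so that many clients do not all retry at the same instant. The function name, the 30-second cap and the jitter style are assumptions.

```python
import random
from typing import List, Optional


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return the wait (in seconds) before each retry: base * 2**attempt, capped."""
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        if rng is not None:
            # Jitter: pick a random wait between 0 and the computed delay
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays


print(backoff_delays())                          # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_delays(max_retries=7, cap=10.0))   # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0, 10.0]
print(backoff_delays(rng=random.Random(0)))      # randomised, each value <= the plain schedule
```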

12.5 Word Embeddings via API

Both OpenAI and Google offer dedicated embedding models via API. They convert text into fixed-length numerical vectors that capture semantic meaning — the foundation of search, recommendations and RAG (Lecture 16).

Python · getting an embedding from the OpenAI API
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How many vacation days do I get?"
)
vector = response.data[0].embedding
print("Embedding length:", len(vector))   # a fixed-length numeric vector
Output: Embedding length: 1536
💡 Tip: why embeddings beat keyword search
Keyword search needs the exact word. Embeddings match by meaning: a search for "vacation days" can find a document that says "earned leave", because the two phrases have similar vectors.
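
"Similar vectors" is usually measured with cosine similarity. The sketch below uses tiny made-up 3-dimensional vectors in place of real embeddings (which have far more dimensions) to show how a query is matched to the closest document; the document titles and numbers are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, made up for illustration (real ones would come from an embedding API)
query = [0.9, 0.1, 0.0]                          # stands in for "vacation days"
docs = {
    "earned leave policy": [0.8, 0.2, 0.1],
    "server maintenance window": [0.0, 0.1, 0.9],
}

best = max(docs, key=lambda title: cosine_similarity(query, docs[title]))
print(best)   # earned leave policy
```

The query shares no keyword with "earned leave policy", yet it wins because its vector points in nearly the same direction.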

Practice Questions

API security and rate limits are common MCQ material.

MCQ · Q1 · API basics

The main benefit of using a commercial AI API is that:

  • A You must buy your own GPUs
  • B You can use powerful models without training or hosting them yourself
  • C The model runs offline on your phone
  • D It is always completely free
Answer: B

The provider hosts the models and manages the GPUs/scaling. You just send HTTP requests — no training or infrastructure needed.

MCQ · Q2 · API key

An API key is primarily used to:

  • A Speed up the model
  • B Authenticate the request and track usage/billing
  • C Translate the response
  • D Store the model's weights
Answer: B

The key identifies who is calling, what access they have, and lets the provider track usage and apply billing and rate limits.

MCQ · Q3 · Security

Where should an API key be stored in a production application?

  • A Hard-coded directly in the source code
  • B In the frontend JavaScript
  • C In environment variables or a secrets manager, used by the backend
  • D In a public GitHub repository
Answer: C

Keys belong in environment variables / secrets managers and should only be used server-side. Hard-coding or exposing them in frontend/public repos leads to instant theft.

MCQ · Q4 · Frontend/backend

Why must the API call code be kept on the backend, not the frontend?

  • A The frontend cannot do maths
  • B Frontend code is visible to users, so the key would be exposed and stolen
  • C The backend is always faster
  • D APIs only work on servers
Answer: B

Anyone can open browser dev tools and read frontend code. Keeping the key and call on the backend hides the credential from users.

MCQ · Q5 · Pricing

OpenAI's token-based pricing charges for:

  • A Only the input prompt tokens
  • B Only the output response tokens
  • C Both input (prompt) and output (response) tokens
  • D Only the number of API keys you own
Answer: C

Both the prompt and the generated response count toward token usage — that is why concise prompts and outputs reduce cost.

MCQ · Q6 · Rate limits

Which HTTP status code means "Too Many Requests" (rate limit exceeded)?

  • A 401
  • B 403
  • C 429
  • D 200
Answer: C

HTTP 429 = Too Many Requests. 401 = authentication issue, 403 = access restricted, 200 = success.

MCQ · Q7 · Backoff

"Exponential backoff" means that after each failed retry you:

  • A Send more requests at once
  • B Wait progressively longer before retrying
  • C Give up immediately
  • D Change the API key
Answer: B

Exponential backoff doubles the wait time after each failure (1s, 2s, 4s…), reducing load on the server and improving the chance of eventual success.

MCQ · Q8 · Embeddings

Why are embeddings better than keyword search?

  • A They require the exact same word to match
  • B They match by meaning, so related phrases ("vacation" ≈ "earned leave") match
  • C They never need a vector database
  • D They only work on numbers
Answer: B

Embeddings capture semantic meaning, so semantically similar texts have nearby vectors — even when they share no exact keywords.

Short Answer · Q9 · Security

What happens if an API key is leaked publicly, and what two actions should you take?

Model answer

A leaked key can be used by attackers to make requests on your account — causing unexpected billing spikes and possibly generating harmful content under your name. You should immediately revoke (disable) the compromised key and generate a new one (rotate), then update your application to use it from a secure store.

Coding · Q10 · Secure key

Write code that creates an OpenAI client by reading the API key securely from an environment variable.

Solution
Python
import os
from openai import OpenAI

# Read the key from the environment - never hard-code it
api_key = os.environ.get("OPENAI_API_KEY")
if api_key is None:
    raise ValueError("OPENAI_API_KEY not set")

client = OpenAI(api_key=api_key)
print("Client created securely.")
Output: Client created securely.

The key lives in the environment, not in the code, so it is never committed to version control.

Coding · Q11 · API call

Write code to send a chat request to the OpenAI API asking it to "Summarise photosynthesis in one sentence" and print the reply.

Solution
Python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user",
         "content": "Summarise photosynthesis in one sentence"}
    ]
)
print(response.choices[0].message.content)
Output (example; the model's wording will vary): Photosynthesis is the process by which plants convert sunlight, water and carbon dioxide into glucose and oxygen.

Short Answer · Q12 · Concept

Why do AI providers enforce rate limits? Name two strategies for handling them gracefully.

Model answer

AI inference is compute-heavy, so rate limits prevent system overload during traffic spikes, ensure fair usage for all customers, maintain low latency, and enforce pricing tiers. Two graceful strategies: exponential backoff (wait progressively longer before each retry) and caching results / queuing requests to avoid bursts and duplicate calls.
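
The caching strategy can be sketched with a plain dictionary in front of the model call. `make_cached` and `fake_model` are hypothetical names; `fake_model` stands in for a real API call so the example runs offline and the number of calls can be counted.

```python
from typing import Callable, Dict


def make_cached(call_model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model call so identical prompts are answered from memory."""
    cache: Dict[str, str] = {}

    def cached(prompt: str) -> str:
        if prompt not in cache:          # only new prompts reach the API
            cache[prompt] = call_model(prompt)
        return cache[prompt]

    return cached


calls = []


def fake_model(prompt: str) -> str:
    """Stand-in for a real API call; records each request it receives."""
    calls.append(prompt)
    return f"answer to: {prompt}"


ask = make_cached(fake_model)
ask("What is RAG?")
ask("What is RAG?")            # served from the cache, no second API call
ask("What is an API key?")
print("API calls made:", len(calls))   # API calls made: 2
```

Fewer duplicate calls means fewer requests counted against the rate limit and fewer tokens billed.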

🎯 Lecture 12: must-remember
  • Commercial APIs = hosted models via HTTP/JSON, pay-as-you-go.
  • API key authenticates & tracks usage: store in env vars / secrets manager, never hard-code or expose in frontend.
  • Lifecycle: create → configure → use → rotate → revoke.
  • Token pricing = input + output tokens.
  • HTTP 429 = rate limit → use exponential backoff.
  • Embedding APIs power semantic search & RAG.