What Is LLM Ad Infrastructure?
LLM ad infrastructure is the emerging stack of systems that make advertising possible inside AI assistants. It is the plumbing beneath the ad you see under a ChatGPT answer: the demand-and-auction engine that selects which ad appears, the context signals that decide whether your ad is relevant to the conversation, the creative-generation pipeline that produces the ad unit itself, and the measurement layer that ties an in-chat impression back to a conversion. When people ask “how do ads work inside AI,” the honest answer is that they work through this stack, most of which did not exist eighteen months ago.
The clearest way to understand the concept is to contrast it with the two ad stacks that came before it. Search advertising is built around a results page: a query, a set of keywords, an auction for positions on a SERP, and a click that can be tracked to a landing page. Social advertising is built around a feed: an interest and behavior profile assembled from cookies and pixels, an auction for impressions in an infinite scroll, and a creative sized for that feed. Both stacks assume a surface (a page of results, a feed of posts), a targeting primitive (keywords, audience profiles), and a measurable click.
Advertising inside an LLM has none of those. There is no SERP and no feed. There is a single conversation, rendered one turn at a time. There are no keywords to bid on and, increasingly, no third-party cookies to profile with. And there is often no clean click, because the assistant may answer the question, cite the brand, and shape the decision without the user ever leaving the chat. Remove the page, the feed, the keyword, the cookie, and the click, and almost every assumption of the old ad stack falls away. LLM ad infrastructure is what replaces those assumptions.
800M+
weekly active users of ChatGPT, the surface where in-conversation advertising is now live
This matters because the surface is enormous and it is monetizing fast. OpenAI began showing ads inside ChatGPT in February 2026 and opened self-serve buying at ads.openai.com in April 2026. With more than 800 million weekly users, ChatGPT is now one of the largest ad surfaces on the internet, and it behaves nothing like the surfaces advertisers spent two decades learning. Understanding the infrastructure is no longer academic; it is the prerequisite for spending money there without wasting it.
Why a New Stack Is Forming
Four structural shifts are forcing a new ad stack into existence at the same time. None of them is optional, and together they explain why the old tooling cannot simply be ported over.
The surface is conversational, not a page. A search result or a feed post is a fixed slot you can design once and reuse. A conversation is different every time. The ad has to fit the flow of a specific exchange, appear at the moment it is genuinely relevant, and read as a helpful continuation rather than an interruption. There is no evergreen “banner” that works across every conversation, which means the creative has to be produced for context, not just for placement.
The creative is generative, produced at machine speed. Because each assistant renders ads in its own native format, and because conversational relevance rewards specificity, the number of creatives an advertiser needs explodes. You are no longer making three hero images for a campaign; you are producing many tightly targeted units per assistant, per format, per topic cluster. That volume is only achievable with generative production.
Targeting is privacy-first and contextual. The industry is moving away from third-party cookies and cross-site identity graphs. Inside an LLM, targeting is done through context: the assistant matches an ad to the meaning of the live conversation rather than to a stored profile of the person. This is a genuinely different primitive, and it changes both who can be reached and how.
Attribution has a conversation gap. When an assistant can research, compare, and recommend without a click, the standard last-click model breaks. A meaningful share of the influence an ad has on a purchase now happens inside the conversation, invisible to a pixel that only fires on a landing page. Measurement has to be rebuilt to account for that.
$1.17 trillion
projected worldwide media ad spending in 2026, with digital now roughly three-quarters of the total and the fastest-growing engine
The stakes are set by the size of the market that is now in motion. Worldwide media ad spending is projected to reach roughly $1.17 trillion in 2026, and the AI portion of marketing technology is compounding faster than almost any other category. Industry estimates put the global AI-in-marketing market at around $47 billion in 2025, projected to more than double to roughly $107 billion by 2028. As assistants capture attention that used to flow to search and social, a proportional share of that spend will follow, and it will need infrastructure to land on. For a fuller tour of the surfaces themselves, see our guide to advertising in AI assistants across ChatGPT, Gemini, and Perplexity.
~$47B → ~$107B
projected growth of the global AI-in-marketing market from 2025 to 2028 as AI moves from experiment to core infrastructure
Layer 1: Demand and Auction
The demand-and-auction layer is the marketplace that decides which ad appears in a given moment. In the search and social world, this is Google Ads and Meta Ads Manager. Inside LLMs, it is being built by the model providers themselves, and it works differently enough to deserve close attention.
OpenAI’s system is the reference implementation. Ads appear as a single clearly labeled sponsored card beneath the assistant’s answer: at most one ad per response, so there is no cluttered results page to compete on. Placement is decided by a relevance-weighted second-price auction. That phrase does a lot of work. “Second-price” means the winner pays just above the next-highest relevant bid, not the full amount of its own maximum bid. “Relevance-weighted” means the bid is multiplied by how well the ad fits the conversation, so a tightly relevant ad at a lower bid can outrank a generic ad at a higher one. In practice, ranking is roughly bid times relevance, with relevance evaluated across signals like your context hints, your ad copy, and your landing page. Pricing runs on CPM (a default max bid around $60) or CPC (a recommended $3 to $5 range), with cost-per-action buying reported to be in development.
The strategic consequence is that specificity beats spend. Because relevance multiplies the bid, the advertiser who produces a precisely matched creative for a narrow context can win placements away from a bigger budget. That reward for specificity is exactly what makes the creative layer so important later in this stack.
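To make the "bid times relevance" idea concrete, here is a minimal sketch of a relevance-weighted second-price auction. It is an illustration under stated assumptions, not OpenAI's actual implementation: the `Bid` class, the 0 to 1 `relevance` score, the `min_relevance` cutoff, and the pricing rule (the smallest bid that would still beat the runner-up's score, a common textbook formulation) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    max_cpm: float    # advertiser's maximum bid, in dollars
    relevance: float  # hypothetical 0-1 fit score for this conversation


def run_auction(bids, min_relevance=0.2):
    """Pick at most one winner by bid * relevance and charge a second-price amount."""
    # Ads that do not fit the conversation are dropped before ranking.
    eligible = [b for b in bids if b.relevance >= min_relevance and b.relevance > 0]
    if not eligible:
        return None  # no sufficiently relevant ad: show nothing
    ranked = sorted(eligible, key=lambda b: b.max_cpm * b.relevance, reverse=True)
    winner = ranked[0]
    if len(ranked) == 1:
        price = winner.max_cpm  # simplification: no reserve price is modeled
    else:
        runner_up_score = ranked[1].max_cpm * ranked[1].relevance
        # Smallest bid that would still have beaten the runner-up's score,
        # capped at the winner's own maximum.
        price = min(winner.max_cpm, runner_up_score / winner.relevance + 0.01)
    return winner.advertiser, round(price, 2)


bids = [
    Bid("generic_big_budget", max_cpm=60.0, relevance=0.30),    # score 18.0
    Bid("precise_small_budget", max_cpm=25.0, relevance=0.90),  # score 22.5
]
print(run_auction(bids))  # ('precise_small_budget', 20.01)
```

In this toy run the advertiser bidding $25 beats the one bidding $60 because its relevance-adjusted score is higher, and it pays about $20 rather than its full $25 maximum.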
Who Operates This Layer
This layer is owned by whoever owns the model and the surface, and the field is already sorting itself out. OpenAI operates the most developed self-serve marketplace. Google is inserting ads into its AI Mode and AI Overviews, largely by making existing campaigns eligible for those placements rather than by asking advertisers to buy them separately. Microsoft is enabling brand experiences and shopping agents inside Copilot. Perplexity was first to experiment with sponsored follow-up questions but stepped back from advertising to focus on a subscription model. And Anthropic has publicly committed to keeping Claude ad-free, drawing a deliberate line between assistants that monetize with ads and assistants that do not.
The takeaway for advertisers is blunt: you do not build this layer, and you cannot negotiate its rules. You buy into it on the operator’s terms. For a deeper look at how buying works on the largest of these marketplaces, see our complete guide to ChatGPT Ads.
Layer 2: Context and Identity
If Layer 1 is the auction, Layer 2 is what the auction matches on. This is the targeting primitive of the LLM ad stack, and it is the cleanest break from everything that came before.
In search, you bid on keywords. In social, you target audience profiles built from cookies, pixels, and cross-site behavior. Inside an LLM, you do neither. Instead, advertisers supply context hints: short, natural-language descriptions of the conversations where a product genuinely belongs, set at the ad-group level. The system uses those hints as semantic guidance, matching your ad to the meaning and intent of the live conversation. Hints are explicitly not exact-match keywords, and they do not guarantee delivery in any specific conversation. They describe a topic cluster and a mindset, and the relevance engine does the matching in real time.
This is contextual targeting reinvented for a semantic surface. Rather than following a person around the web with an identity graph, the system reads the situation the person is in right now and places a relevant ad into it. When the conversation ends, so does the signal. There is no durable profile being sold or synced across sites.
This has a practical implication that surprises most performance marketers: the language of your creative and your landing page is part of your targeting. Because the relevance engine reads copy and destination as signals, a vague ad is not just less persuasive, it is less deliverable. The ad and its context are welded together, which raises the bar for how much creative you need and how precisely each unit is written.
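A toy example can show why vague copy is "less deliverable." The sketch below scores a conversation against a context hint, ad copy, and landing-page text using plain word-overlap cosine similarity. This is only a lexical stand-in: a production relevance engine presumably uses learned semantic models, and the function names, the equal weighting of the three signals, and the sample texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance(conversation, hint, ad_copy, landing_page):
    """Average similarity of the conversation to each advertiser-supplied signal."""
    convo = tokens(conversation)
    signals = [hint, ad_copy, landing_page]
    return sum(cosine(convo, tokens(s)) for s in signals) / len(signals)


conversation = "which trail running shoes are best for wet rocky terrain"
precise = relevance(
    conversation,
    hint="people comparing trail running shoes for wet or rocky terrain",
    ad_copy="Trail running shoes with grip built for wet rocky terrain",
    landing_page="Shop trail running shoes: waterproof, rocky terrain grip",
)
vague = relevance(
    conversation,
    hint="people interested in sports",
    ad_copy="Great gear at great prices",
    landing_page="Shop our full catalog today",
)
print(f"precise={precise:.2f} vague={vague:.2f}")
```

The vague ad shares no vocabulary with the conversation, so it scores zero on every signal, while the precise ad scores well on all three. A semantic engine would be more forgiving of wording, but the direction of the effect is the same.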
Layer 3: Creative Generation
Creative generation is where the stack meets a hard human limit, and it is the layer this article argues you must automate. In the search and social eras, creative was a bottleneck but a survivable one: a team could produce a handful of variants per campaign and lean on the platform to distribute them. In the LLM era, that math collapses.
Consider what conversational advertising actually demands. Each assistant renders ads in its own native format, so a single campaign now needs a ChatGPT sponsored card, and if you also run traditional placements, a Meta creative, a Google asset set, a Reddit unit, and a LinkedIn variant. Multiply those formats by the number of topic clusters you want to be relevant to, then multiply again by the variants each context needs to stay fresh and to win a relevance-weighted auction. A modest campaign that once needed five creatives now needs dozens or hundreds, each one specific, on-brand, and correctly sized for its surface.
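The multiplication is easy to check. The sketch below enumerates one creative brief per format, topic cluster, and variant; the topic clusters and the figure of five variants per context are illustrative assumptions, not platform requirements.

```python
from itertools import product

formats = ["chatgpt_sponsored_card", "meta", "google", "reddit", "linkedin"]
topic_clusters = ["trail running", "marathon training", "injury recovery", "beginner plans"]
variants_per_context = 5  # assumed refresh and testing depth

# One creative brief per (format, topic cluster, variant) combination.
briefs = [
    {"format": f, "topic": t, "variant": v}
    for f, t, v in product(formats, topic_clusters, range(1, variants_per_context + 1))
]
print(len(briefs))  # 5 formats * 4 clusters * 5 variants = 100
```

Even with only four topic clusters, a five-surface campaign reaches 100 distinct units, which is the scale at which manual production stops being practical.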
No human team scales to that volume without either blowing the budget or lowering quality until relevance collapses. This is the layer humans cannot do by hand, and it is precisely why generative production stops being a convenience and becomes infrastructure. The same shift toward autonomous, goal-driven systems is happening across the workflow, as covered in our guide to agentic ads, but creative is where the volume pressure lands first and hardest.
87%
of high-performing marketing teams now use AI, versus 77% of underperformers, with creative production a leading use case
This is the layer Lapis owns. From a single prompt, Lapis generates production-ready, per-format creative for ChatGPT alongside Meta, Google, Reddit, and LinkedIn in under three minutes, with brand identity, copy, and correct sizing applied automatically. It is the first and only AI ad platform ready for ChatGPT, which means it produces the native sponsored-card format the new surface requires rather than forcing a resized feed ad into a place it does not fit. When the auction rewards specificity and the targeting engine reads your copy, a system that can produce many precise, on-brand variants on demand is not a nice-to-have. It is the difference between competing in the auction and being priced out of it.
Layer 4: Measurement and Attribution
The final layer answers the question every finance team will ask: did it work? In the LLM stack, that question is harder than it has ever been, because of what the industry now calls the conversation gap.
The conversation gap is the influence an assistant exerts that no click can capture. A user asks the assistant to compare options, the assistant surfaces and describes a brand, the user forms a preference, and then converts later through a branded search, a direct visit, or an offline channel. The ad did its job inside the conversation, but the last-click model credits whatever touch happened to be closest to the purchase. Industry estimates suggest that as much as 60% of the conversions influenced by an in-assistant ad occur outside the measurable click window, which means teams relying on last-click attribution will systematically undervalue this channel and underfund it.
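A short worked example shows how much the gap can distort cost per acquisition. The function below is a back-of-the-envelope sketch: `share_outside_window` is the assumed fraction of influenced conversions that last-click tracking never sees, and the spend and conversion figures are invented for illustration. The 60% value is the upper-end estimate quoted above, not a measured constant.

```python
def gap_adjusted_cpa(spend, tracked_conversions, share_outside_window):
    """Compare last-click CPA with a CPA adjusted for untracked conversions."""
    if not 0 <= share_outside_window < 1:
        raise ValueError("share_outside_window must be in [0, 1)")
    if tracked_conversions <= 0:
        raise ValueError("tracked_conversions must be positive")
    apparent_cpa = spend / tracked_conversions
    # If tracking only sees (1 - share) of conversions, scale up to the estimated total.
    estimated_total = tracked_conversions / (1 - share_outside_window)
    adjusted_cpa = spend / estimated_total
    return apparent_cpa, adjusted_cpa


apparent, adjusted = gap_adjusted_cpa(
    spend=10_000, tracked_conversions=40, share_outside_window=0.6
)
print(f"last-click CPA: ${apparent:.0f}, gap-adjusted CPA: ${adjusted:.0f}")
```

With these inputs, last-click reporting shows a $250 CPA while the gap-adjusted figure is about $100, a 2.5x difference that is large enough to flip a budget decision.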
Rebuilding measurement for this reality rests on three tools. First, server-side event tracking: the OpenAI pixel and Conversions API, alongside Meta’s Conversions API and Google’s Enhanced Conversions, so that conversion signals flow back even when browser-side tracking is blocked. Second, disciplined tagging and creative-level analytics that connect a specific generated variant to the sessions and outcomes it drove. Third, incrementality testing: geo-based holdouts and lift studies that measure the causal impact of the channel rather than trusting a click path. For a step-by-step implementation, see our ChatGPT Ads conversion tracking pixel and API guide.
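The arithmetic behind a geo-based holdout is simple enough to sketch. The function below compares conversion rates in regions where ads ran against matched regions where they were withheld. The function name and all figures are hypothetical, and a real lift study would also need well-matched geographies and a significance test, which this sketch omits.

```python
def incremental_lift(test_conversions, test_population,
                     holdout_conversions, holdout_population):
    """Estimate relative lift and incremental conversions from a geo holdout."""
    test_rate = test_conversions / test_population
    holdout_rate = holdout_conversions / holdout_population
    if holdout_rate == 0:
        raise ValueError("holdout conversion rate is zero; relative lift is undefined")
    relative_lift = (test_rate - holdout_rate) / holdout_rate
    # Conversions in the test regions beyond what the holdout baseline predicts.
    incremental = (test_rate - holdout_rate) * test_population
    return relative_lift, incremental


lift, extra = incremental_lift(
    test_conversions=1_300, test_population=100_000,     # regions with ads running
    holdout_conversions=500, holdout_population=50_000,  # matched regions without ads
)
print(f"lift: {lift:.0%}, incremental conversions: {extra:.0f}")
```

Here the test regions convert at 1.3% against a 1.0% baseline, which works out to roughly 30% lift and about 300 incremental conversions, regardless of which click path each buyer took.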
~60%
of conversions influenced by an in-assistant ad are estimated to happen outside the measurable click window (the conversation gap)
The practical guidance is to measure this channel the way sophisticated brands already measure connected TV or influencer marketing: with a blend of server-side signals, incrementality, and modeled attribution rather than a single click-based number. The teams that adapt their measurement will keep investing while the gap is wide and cheap; the teams that do not will conclude the channel “does not convert” and cede it to competitors.
Who Plays Where in the Stack
Putting the four layers together produces a map of the whole stack, and of who controls each part of it. The pattern is consistent: model providers own the layers closest to the surface, and advertisers own the layers closest to the creative and the customer.
| Layer | What It Does | Core Mechanism | Who Controls It |
|---|---|---|---|
| 1. Demand and auction | Selects which ad shows and at what price | Relevance-weighted second-price bid | Model providers |
| 2. Context and identity | Matches ads to the live conversation | Context hints and topic clusters | Model providers (advertisers supply hints) |
| 3. Creative generation | Produces per-assistant creative at volume | Prompt to per-format output | Advertisers (via platforms like Lapis) |
| 4. Measurement and attribution | Connects impressions to revenue | Server-side events and incrementality | Advertisers (with provider pixels) |
Mapping representative players to those layers shows where the market is dense and where it is thin. The demand and context layers are crowded with the largest technology companies in the world. The creative and campaign layers are where independent platforms operate, and where the advertiser actually has leverage.
| Layer | Representative Players | Advertiser’s Move |
|---|---|---|
| Demand and auction | OpenAI (ChatGPT Ads), Google (AI Mode and AI Overviews), Microsoft (Copilot) | Buy in on the operator’s terms |
| Context and identity | Provider targeting engines; answer-engine visibility tools | Write precise context hints |
| Creative and campaign | Lapis (creative generation, forecasting, Campaign Studio) | Own it; automate production at volume |
| Measurement and attribution | Provider pixels and Conversions APIs, GA4, Lapis Web Analytics, incrementality tools | Blend server-side signals with lift |
Build vs Adopt: What Advertisers Should Own
Once the stack is clear, the strategy question answers itself layer by layer. The rule is simple: do not try to build what the model providers own, and do not try to hand-produce what only automation can scale.
Demand and auction: adopt, always. You cannot build an ad marketplace inside someone else’s model. Your job is to buy into it well, which means understanding the auction mechanics rather than reinventing them. Treat the operator’s rules as fixed constraints and optimize within them.
Context and identity: adopt the engine, own the inputs. The matching engine belongs to the provider, but the context hints, ad copy, and landing pages that feed it are yours. This is where advertiser skill compounds: the brand that writes the most precise hints and aligns its creative to them wins relevance it did not have to pay for.
Creative generation: adopt a platform, do not build a factory. This is the layer most teams get wrong. In-house creative feels ownable, but the volume and format math of conversational advertising breaks any manual process. Building your own generation pipeline means competing with venture-funded platforms on model quality and format coverage, which is not a good use of a marketing budget. Adopt a purpose-built platform and redirect your people to strategy and judgment.
Measurement: own the discipline, adopt the tools. Provider pixels and Conversions APIs are adopted, but the measurement philosophy, insisting on incrementality and creative-level tagging rather than last-click, is a discipline you own. For a full walkthrough of assembling these pieces into a working system, see our guide on how to build an LLM advertising stack.
The starting sequence is the same for almost every team: adopt the auction as-is, sharpen your context inputs, automate creative first because it is the binding constraint, and rebuild measurement in parallel so you can prove the channel before competitors crowd in.
Lapis as the Creative and Campaign Layer
Lapis is built to be the creative and campaign infrastructure layer of this new stack. The thesis is straightforward: model providers will own demand, auction, and context, and they will keep those layers closed. The durable, ownable advantage for advertisers is the ability to produce precise, on-brand creative for every surface at machine speed, and to run the campaign workflow around it. That is the layer Lapis occupies.
In practice, it starts with one prompt. Lapis generates production-ready, per-format creative for ChatGPT alongside Meta, Google, Reddit, and LinkedIn in under three minutes, inheriting your brand identity automatically and sizing each unit for its surface, including the native ChatGPT sponsored-card format. Because it is the first and only AI ad platform ready for ChatGPT, it produces the format the new demand layer actually accepts rather than a resized social ad. On top of generation sit the rest of the campaign primitives: Performance Forecasting to predict which variants will perform before you spend, Competitor Tracking to see what rivals are running across surfaces, Web Analytics and attribution to close the conversation gap with creative-level data, and Campaign Studio to plan and orchestrate the whole workflow in one place.
The credibility behind that is concrete. Lapis is a Y Combinator company (F25), rated 5.0 on G2, and has powered more than 10,000 campaigns across 30-plus industries. It offers a free tier to start, with the Pro plan at $599 per month recommended for teams running paid campaigns at volume. The point is not that Lapis replaces the model providers; it is that Lapis is the layer they will never build for you, and the one that determines whether you compete in their auctions or get priced out of them.
Advertising has moved inside the assistant, and the infrastructure to reach people there is being poured right now. The demand and context layers are already spoken for. The creative and campaign layer is where the advertisers who move first will build a lasting edge. Start with Lapis and own it.