
What Is LLM Ad Infrastructure? The New Advertising Stack Inside AI (2026)

As advertising moves inside AI assistants, a new infrastructure layer is forming. This guide defines LLM ad infrastructure, breaks down its four layers, and explains where creative-generation platforms like Lapis fit.

Sofia · 14 min read

What Is LLM Ad Infrastructure?

LLM ad infrastructure is the emerging stack of systems that make advertising possible inside AI assistants. It is the plumbing beneath the ad you see under a ChatGPT answer: the demand-and-auction engine that selects which ad appears, the context signals that decide whether your ad is relevant to the conversation, the creative-generation pipeline that produces the ad unit itself, and the measurement layer that ties an in-chat impression back to a conversion. When people ask “how do ads work inside AI,” the honest answer is that they work through this stack, most of which did not exist eighteen months ago.

The clearest way to understand the concept is to contrast it with the two ad stacks that came before it. Search advertising is built around a results page: a query, a set of keywords, an auction for positions on a SERP, and a click that can be tracked to a landing page. Social advertising is built around a feed: an interest and behavior profile assembled from cookies and pixels, an auction for impressions in an infinite scroll, and a creative sized for that feed. Both stacks assume a surface (a page of results, a feed of posts), a targeting primitive (keywords, audience profiles), and a measurable click.

Advertising inside an LLM has none of those. There is no SERP and no feed. There is a single conversation, rendered one turn at a time. There are no keywords to bid on and, increasingly, no third-party cookies to profile with. And there is often no clean click, because the assistant may answer the question, cite the brand, and shape the decision without the user ever leaving the chat. Remove the page, the feed, the keyword, the cookie, and the click, and almost every assumption of the old ad stack falls away. LLM ad infrastructure is what replaces those assumptions.

800M+

weekly active users of ChatGPT, the surface where in-conversation advertising is now live

Source: OpenAI, 2026

This matters because the surface is enormous and it is monetizing fast. OpenAI began showing ads inside ChatGPT in February 2026 and opened self-serve buying at ads.openai.com in April 2026. With more than 800 million weekly users, ChatGPT is now one of the largest ad surfaces on the internet, and it behaves nothing like the surfaces advertisers spent two decades learning. Understanding the infrastructure is no longer academic; it is the prerequisite for spending money there without wasting it.

Why a New Stack Is Forming

Four structural shifts are forcing a new ad stack into existence at the same time. None of them is optional, and together they explain why the old tooling cannot simply be ported over.

The surface is conversational, not a page. A search result or a feed post is a fixed slot you can design once and reuse. A conversation is different every time. The ad has to fit the flow of a specific exchange, appear at the moment it is genuinely relevant, and read as a helpful continuation rather than an interruption. There is no evergreen “banner” that works across every conversation, which means the creative has to be produced for context, not just for placement.

The creative is generative, produced at machine speed. Because each assistant renders ads in its own native format, and because conversational relevance rewards specificity, the number of creatives an advertiser needs explodes. You are no longer making three hero images for a campaign; you are producing many tightly targeted units per assistant, per format, per topic cluster. That volume is only achievable with generative production.

Targeting is privacy-first and contextual. The industry is moving away from third-party cookies and cross-site identity graphs. Inside an LLM, targeting is done through context: the assistant matches an ad to the meaning of the live conversation rather than to a stored profile of the person. This is a genuinely different primitive, and it changes both who can be reached and how.

Attribution has a conversation gap. When an assistant can research, compare, and recommend without a click, the standard last-click model breaks. A meaningful share of the influence an ad has on a purchase now happens inside the conversation, invisible to a pixel that only fires on a landing page. Measurement has to be rebuilt to account for that.

$1.17 trillion

projected worldwide media ad spending in 2026, with digital now roughly three-quarters of the total and the fastest-growing engine

Source: eMarketer, Worldwide Ad Spending 2026

The stakes are set by the size of the market that is now in motion. Worldwide media ad spending is projected to reach roughly $1.17 trillion in 2026, and the AI portion of marketing technology is compounding faster than almost any other category. Industry estimates put the global AI-in-marketing market at around $47 billion in 2025, projected to more than double to roughly $107 billion by 2028. As assistants capture attention that used to flow to search and social, a proportional share of that spend will follow, and it will need infrastructure to land on. For a fuller tour of the surfaces themselves, see our guide to advertising in AI assistants across ChatGPT, Gemini, and Perplexity.

~$47B → ~$107B

projected growth of the global AI-in-marketing market from 2025 to 2028 as AI moves from experiment to core infrastructure

Source: Statista and industry estimates, 2026 (projection; figures vary by methodology)

Layer 1: Demand and Auction

The demand-and-auction layer is the marketplace that decides which ad appears in a given moment. In the search and social world, this is Google Ads and Meta Ads Manager. Inside LLMs, it is being built by the model providers themselves, and it works differently enough to deserve close attention.

OpenAI’s system is the reference implementation. Ads appear as a single clearly labeled sponsored card beneath the assistant’s answer: at most one ad per response, so there is no cluttered results page to compete on. Placement is decided by a relevance-weighted second-price auction. That phrase does a lot of work. “Second-price” means the winner pays just above the next-highest relevant bid. “Relevance-weighted” means the bid is multiplied by how well the ad fits the conversation, so a tightly relevant ad at a lower bid can outrank a generic ad at a higher one. In practice, ranking is roughly bid times relevance, evaluated across signals like your context hints, your ad copy, and your landing page. Pricing runs on CPM (a default max bid around $60) or CPC (a recommended $3 to $5 range), with cost-per-action buying reported to be in development.

The strategic consequence is that specificity beats spend. Because relevance multiplies the bid, the advertiser who produces a precisely matched creative for a narrow context can win placements away from a bigger budget. That reward for specificity is exactly what makes the creative layer so important later in this stack.
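
To make the "bid times relevance" ranking concrete, here is a minimal Python sketch of a relevance-weighted second-price auction. It is an illustration under stated assumptions, not OpenAI's implementation: the `Bid` type, the 0-to-1 `relevance` score, the one-cent `increment`, and the pricing rule (runner-up score divided by the winner's relevance, capped at the winner's own bid) are hypothetical choices borrowed from the common quality-weighted second-price convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bid:
    advertiser: str
    max_bid: float    # most the advertiser will pay, in dollars (hypothetical unit)
    relevance: float  # assumed 0..1 fit between the ad and the conversation


def run_auction(bids: list[Bid], increment: float = 0.01) -> Optional[tuple[Bid, float]]:
    """Return (winner, price) for a single ad slot, or None if nothing is eligible."""
    # Ads with zero relevance are treated as ineligible (an assumption).
    eligible = [b for b in bids if b.relevance > 0]
    if not eligible:
        return None

    # Rank by bid * relevance, highest score first.
    ranked = sorted(eligible, key=lambda b: b.max_bid * b.relevance, reverse=True)
    winner = ranked[0]

    # Second price: the smallest bid that would still beat the runner-up's score,
    # plus a small increment, and never more than the winner's own max bid.
    runner_up_score = ranked[1].max_bid * ranked[1].relevance if len(ranked) > 1 else 0.0
    price = min(winner.max_bid, runner_up_score / winner.relevance + increment)
    return winner, round(price, 2)


bids = [
    Bid("generic_big_budget", max_bid=60.0, relevance=0.3),     # score 18
    Bid("specific_small_budget", max_bid=40.0, relevance=0.8),  # score 32
]
winner, price = run_auction(bids)
print(winner.advertiser, price)
```

In this toy run the advertiser with the lower bid wins because its score (32) beats the generic ad's score (18), and it pays well under its $40 maximum. That is the "specificity beats spend" effect in miniature; the real weighting and pricing details are set by the operator and may differ.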

Who Operates This Layer

This layer is owned by whoever owns the model and the surface, and the field is already sorting itself out. OpenAI operates the most developed self-serve marketplace. Google is inserting ads into its AI Mode and AI Overviews, largely by making existing campaigns eligible for those placements rather than by asking advertisers to buy them separately. Microsoft is enabling brand experiences and shopping agents inside Copilot. Perplexity was first to experiment with sponsored follow-up questions but stepped back from advertising to focus on a subscription model. And Anthropic has publicly committed to keeping Claude ad-free, drawing a deliberate line between assistants that monetize with ads and assistants that do not.

The takeaway for advertisers is blunt: you do not build this layer, and you cannot negotiate its rules. You buy into it on the operator’s terms. For a deeper look at how buying works on the largest of these marketplaces, see our complete guide to ChatGPT Ads.

Layer 2: Context and Identity

If Layer 1 is the auction, Layer 2 is what the auction matches on. This is the targeting primitive of the LLM ad stack, and it is the cleanest break from everything that came before.

In search, you bid on keywords. In social, you target audience profiles built from cookies, pixels, and cross-site behavior. Inside an LLM, you do neither. Instead, advertisers supply context hints: short, natural-language descriptions of the conversations where a product genuinely belongs, set at the ad-group level. The system uses those hints as semantic guidance, matching your ad to the meaning and intent of the live conversation. Hints are explicitly not exact-match keywords, and they do not guarantee delivery in any specific conversation. They describe a topic cluster and a mindset, and the relevance engine does the matching in real time.

This is contextual targeting reinvented for a semantic surface. Rather than following a person around the web with an identity graph, the system reads the situation the person is in right now and places a relevant ad into it. When the conversation ends, so does the signal. There is no durable profile being sold or synced across sites.

This has a practical implication that surprises most performance marketers: the language of your creative and your landing page is part of your targeting. Because the relevance engine reads copy and destination as signals, a vague ad is not just less persuasive, it is less deliverable. The ad and its context are welded together, which raises the bar for how much creative you need and how precisely each unit is written.
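
The shape of this layer (hints attached to an ad group, scored against the live conversation, with no guarantee of delivery) can be sketched in a few lines of Python. This is a deliberately crude stand-in: the article describes matching on meaning and intent, which plain token overlap cannot capture, so the `overlap` function below is only a placeholder for whatever semantic model a provider actually uses. The ad-group names, hints, and `threshold` value are all hypothetical.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a semantic representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(hint: str, conversation: str) -> float:
    """Jaccard overlap between hint and conversation tokens, from 0 to 1."""
    h, c = tokens(hint), tokens(conversation)
    if not h or not c:
        return 0.0
    return len(h & c) / len(h | c)


def match_ad_group(
    ad_groups: dict[str, list[str]], conversation: str, threshold: float = 0.2
) -> Optional[str]:
    """Return the best-scoring ad group, or None if no hint clears the threshold."""
    scores = {
        name: max((overlap(h, conversation) for h in hints), default=0.0)
        for name, hints in ad_groups.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Below the threshold nothing is shown: hints never guarantee delivery.
    return best if scores[best] >= threshold else None


# Hypothetical ad groups, each with natural-language context hints.
ad_groups = {
    "trail_shoes": ["choosing trail running shoes for wet or muddy terrain"],
    "road_shoes": ["training for a first road marathon"],
}

print(match_ad_group(ad_groups, "what are the best trail running shoes for muddy winter trails"))
print(match_ad_group(ad_groups, "how do I file my taxes as a freelancer"))
```

The first conversation matches the `trail_shoes` group and the second matches nothing, which mirrors the point that hints describe where a product belongs rather than forcing it into every conversation.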

Layer 3: Creative Generation

Creative generation is where the stack meets a hard human limit, and it is the layer this article argues you must automate. In the search and social eras, creative was a bottleneck but a survivable one: a team could produce a handful of variants per campaign and lean on the platform to distribute them. In the LLM era, that math collapses.

Consider what conversational advertising actually demands. Each assistant renders ads in its own native format, so a single campaign now needs a ChatGPT sponsored card and, if you also run traditional placements, a Meta creative, a Google asset set, a Reddit unit, and a LinkedIn variant. Multiply those formats by the number of topic clusters you want to be relevant to, then multiply again by the variants each context needs to stay fresh and to win a relevance-weighted auction. A modest campaign that once needed five creatives now needs dozens or hundreds, each one specific, on-brand, and correctly sized for its surface.

No human team scales to that volume without either blowing the budget or lowering quality until relevance collapses. This is the layer humans cannot do by hand, and it is precisely why generative production stops being a convenience and becomes infrastructure. The same shift toward autonomous, goal-driven systems is happening across the workflow, as covered in our guide to agentic ads, but creative is where the volume pressure lands first and hardest.
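
The multiplication is easy to check with a short Python sketch. The five formats come from the surfaces named above; the topic clusters and the number of variants per context are hypothetical placeholders for one modest campaign.

```python
from itertools import product

# Formats follow the surfaces named in the article; identifiers are illustrative.
formats = ["chatgpt_sponsored_card", "meta", "google", "reddit", "linkedin"]

# Hypothetical topic clusters and variant count for a single campaign.
topic_clusters = ["trail running", "marathon training", "injury recovery", "gift ideas"]
variants_per_context = 3  # fresh variants per (format, topic) pair

# One creative brief per format, topic cluster, and variant.
briefs = [
    {"format": fmt, "topic": topic, "variant": n}
    for fmt, topic in product(formats, topic_clusters)
    for n in range(1, variants_per_context + 1)
]

print(len(briefs))  # 5 formats * 4 topics * 3 variants = 60
```

Even with small inputs the campaign needs 60 distinct units, and adding one more topic cluster adds 15 more.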

87%

of high-performing marketing teams now use AI, versus 77% of underperformers, with creative production a leading use case

Source: Salesforce, State of Marketing

This is the layer Lapis owns. From a single prompt, Lapis generates production-ready, per-format creative for ChatGPT alongside Meta, Google, Reddit, and LinkedIn in under three minutes, with brand identity, copy, and correct sizing applied automatically. It is the first and only AI ad platform ready for ChatGPT, which means it produces the native sponsored-card format the new surface requires rather than forcing a resized feed ad into a place it does not fit. When the auction rewards specificity and the targeting engine reads your copy, a system that can produce many precise, on-brand variants on demand is not a nice-to-have. It is the difference between competing in the auction and being priced out of it.

Layer 4: Measurement and Attribution

The final layer answers the question every finance team will ask: did it work? In the LLM stack, that question is harder than it has ever been, because of what the industry now calls the conversation gap.

The conversation gap is the influence an assistant exerts that no click can capture. A user asks the assistant to compare options, the assistant surfaces and describes a brand, the user forms a preference, and then converts later through a branded search, a direct visit, or an offline channel. The ad did its job inside the conversation, but the last-click model credits whatever touch happened to be closest to the purchase. Industry estimates suggest that as much as 60% of the conversions influenced by an in-assistant ad occur outside the measurable click window, which means teams relying on last-click attribution will systematically undervalue this channel and underfund it.
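
A short Python sketch shows how large the undervaluation can be. It takes the roughly 60% industry estimate above at face value and assumes every untracked conversion was genuinely influenced by the ad, so treat the output as an upper-end illustration rather than a measurement. The spend and conversion figures, and the `gap_adjusted` helper itself, are hypothetical.

```python
def gap_adjusted(spend: float, tracked_conversions: float, gap_share: float) -> dict[str, float]:
    """Estimate total conversions and cost per acquisition (CPA) when a share
    of influenced conversions falls outside the measurable click window."""
    if not 0 <= gap_share < 1:
        raise ValueError("gap_share must be in [0, 1)")
    if tracked_conversions <= 0:
        raise ValueError("tracked_conversions must be positive")

    # Tracked conversions are only the (1 - gap_share) slice of the total.
    estimated_total = tracked_conversions / (1 - gap_share)
    return {
        "reported_cpa": spend / tracked_conversions,
        "estimated_conversions": estimated_total,
        "estimated_cpa": spend / estimated_total,
    }


# Hypothetical campaign: $10,000 spend, 40 conversions visible to last-click.
result = gap_adjusted(spend=10_000, tracked_conversions=40, gap_share=0.60)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the last-click report shows a $250 CPA while the gap-adjusted estimate is about $100, a 2.5x difference that is large enough to change a budget decision.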

Rebuilding measurement for this reality rests on three tools. First, server-side event tracking: the OpenAI pixel and Conversions API, alongside Meta’s Conversions API and Google’s Enhanced Conversions, so that conversion signals flow back even when browser-side tracking is blocked. Second, disciplined tagging and creative-level analytics that connect a specific generated variant to the sessions and outcomes it drove. Third, incrementality testing: geo-based holdouts and lift studies that measure the causal impact of the channel rather than trusting a click path. For a step-by-step implementation, see our ChatGPT Ads conversion tracking pixel and API guide.
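
The arithmetic behind a geo-based holdout is simple enough to sketch in Python. The function below compares conversion rates in regions that saw the ads against regions that did not; the `geo_lift` name and the sample numbers are hypothetical, and the result is a point estimate only, so a real study would also need comparable regions and a significance test.

```python
def geo_lift(
    treated_conversions: float,
    treated_population: float,
    holdout_conversions: float,
    holdout_population: float,
) -> tuple[float, float]:
    """Return (incremental_conversions, relative_lift) for a geo holdout test."""
    if treated_population <= 0 or holdout_population <= 0:
        raise ValueError("populations must be positive")
    if holdout_conversions <= 0:
        raise ValueError("holdout needs at least one conversion to form a baseline")

    treated_rate = treated_conversions / treated_population
    holdout_rate = holdout_conversions / holdout_population

    # Conversions the treated regions would have produced without ads,
    # assuming they behave like the holdout regions.
    expected_without_ads = holdout_rate * treated_population
    incremental = treated_conversions - expected_without_ads
    relative_lift = (treated_rate - holdout_rate) / holdout_rate
    return incremental, relative_lift


# Hypothetical test: ads shown in regions with 200,000 people, held out for 100,000.
incremental, lift = geo_lift(1_300, 200_000, 500, 100_000)
print(f"incremental conversions: {incremental:.0f}, lift: {lift:.0%}")
```

Here the holdout baseline predicts about 1,000 conversions in the treated regions, so roughly 300 of the 1,300 observed are attributable to the channel, regardless of which touch received last-click credit.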

~60%

of conversions influenced by an in-assistant ad are estimated to happen outside the measurable click window (the conversation gap)

Source: industry estimates on AI-assistant attribution, 2026

The practical guidance is to measure this channel the way sophisticated brands already measure connected TV or influencer marketing: with a blend of server-side signals, incrementality, and modeled attribution rather than a single click-based number. The teams that adapt their measurement will keep investing while the gap is wide and cheap; the teams that do not will conclude the channel “does not convert” and cede it to competitors.

Who Plays Where in the Stack

Putting the four layers together produces a map of the whole stack, and of who controls each part of it. The pattern is consistent: model providers own the layers closest to the surface, and advertisers own the layers closest to the creative and the customer.

Layer | What It Does | Targeting Primitive | Who Controls It
1. Demand and auction | Selects which ad shows and at what price | Relevance-weighted second-price bid | Model providers
2. Context and identity | Matches ads to the live conversation | Context hints and topic clusters | Model providers (advertisers supply hints)
3. Creative generation | Produces per-assistant creative at volume | Prompt to per-format output | Advertisers (via platforms like Lapis)
4. Measurement and attribution | Connects impressions to revenue | Server-side events and incrementality | Advertisers (with provider pixels)

Mapping representative players to those layers shows where the market is dense and where it is thin. The demand and context layers are crowded with the largest technology companies in the world. The creative and campaign layers are where independent platforms operate, and where the advertiser actually has leverage.

Layer | Representative Players | Advertiser’s Move
Demand and auction | OpenAI (ChatGPT Ads), Google (AI Mode and AI Overviews), Microsoft (Copilot) | Buy in on the operator’s terms
Context and identity | Provider targeting engines; answer-engine visibility tools | Write precise context hints
Creative and campaign | Lapis (creative generation, forecasting, Campaign Studio) | Own it; automate production at volume
Measurement and attribution | Provider pixels and Conversions APIs, GA4, Lapis Web Analytics, incrementality tools | Blend server-side signals with lift

Build vs Adopt: What Advertisers Should Own

Once the stack is clear, the strategy question answers itself layer by layer. The rule is simple: do not try to build what the model providers own, and do not try to hand-produce what only automation can scale.

Demand and auction: adopt, always. You cannot build an ad marketplace inside someone else’s model. Your job is to buy into it well, which means understanding the auction mechanics rather than reinventing them. Treat the operator’s rules as fixed constraints and optimize within them.

Context and identity: adopt the engine, own the inputs. The matching engine belongs to the provider, but the context hints, ad copy, and landing pages that feed it are yours. This is where advertiser skill compounds: the brand that writes the most precise hints and aligns its creative to them wins relevance it did not have to pay for.

Creative generation: adopt a platform, do not build a factory. This is the layer most teams get wrong. In-house creative feels ownable, but the volume and format math of conversational advertising breaks any manual process. Building your own generation pipeline means competing with venture-funded platforms on model quality and format coverage, which is not a good use of a marketing budget. Adopt a purpose-built platform and redirect your people to strategy and judgment.

Measurement: own the discipline, adopt the tools. Provider pixels and Conversions APIs are adopted, but the measurement philosophy, insisting on incrementality and creative-level tagging rather than last-click, is a discipline you own. For a full walkthrough of assembling these pieces into a working system, see our guide on how to build an LLM advertising stack.

Where to start is the same for almost every team: adopt the auction as-is, sharpen your context inputs, automate creative first because it is the binding constraint, and rebuild measurement in parallel so you can prove the channel before competitors crowd in.

Lapis as the Creative and Campaign Layer

Lapis is built to be the creative and campaign infrastructure layer of this new stack. The thesis is straightforward: model providers will own demand, auction, and context, and they will keep those layers closed. The durable, ownable advantage for advertisers is the ability to produce precise, on-brand creative for every surface at machine speed, and to run the campaign workflow around it. That is the layer Lapis occupies.

In practice, it starts with one prompt. Lapis generates production-ready, per-format creative for ChatGPT alongside Meta, Google, Reddit, and LinkedIn in under three minutes, inheriting your brand identity automatically and sizing each unit for its surface, including the native ChatGPT sponsored-card format. Because it is the first and only AI ad platform ready for ChatGPT, it produces the format the new demand layer actually accepts rather than a resized social ad. On top of generation sit the rest of the campaign primitives: Performance Forecasting to predict which variants will perform before you spend, Competitor Tracking to see what rivals are running across surfaces, Web Analytics and attribution to close the conversation gap with creative-level data, and Campaign Studio to plan and orchestrate the whole workflow in one place.

The credibility behind that is concrete. Lapis is a Y Combinator company (F25), rated 5.0 on G2, and has powered more than 10,000 campaigns across 30-plus industries. It offers a free tier to start, with the Pro plan at $599 per month recommended for teams running paid campaigns at volume. The point is not that Lapis replaces the model providers; it is that Lapis is the layer they will never build for you, and the one that determines whether you compete in their auctions or get priced out of them.

Advertising has moved inside the assistant, and the infrastructure to reach people there is being poured right now. The demand and context layers are already spoken for. The creative and campaign layer is where the advertisers who move first will build a lasting edge. Start with Lapis and own it.

Frequently Asked Questions

What is LLM ad infrastructure?

LLM ad infrastructure is the emerging stack of systems that make advertising possible inside AI assistants like ChatGPT, Gemini, and Copilot. It has four layers: a demand-and-auction engine that selects which ad appears, a context-and-identity layer that matches ads to the live conversation instead of keywords or cookies, a creative-generation layer that produces the ad units at volume, and a measurement layer that connects an in-chat impression back to revenue. Together these layers replace the page, feed, keyword, cookie, and click that the search and social ad stacks were built on.

How is LLM ad infrastructure different from Google and Meta ad tech?

The traditional stacks assume a fixed surface (a results page or a feed), a targeting primitive (keywords or cookie-based audience profiles), and a trackable click. Advertising inside an LLM has none of those. There is a single conversation rendered one turn at a time, targeting is done through natural-language context hints rather than keywords or cookies, and the assistant can influence a decision without the user ever clicking. That is why almost none of the old tooling ports over cleanly and a new infrastructure layer is forming.

What are the layers of the LLM ad stack?

There are four. Layer 1, demand and auction, is the marketplace that decides which ad shows and at what price, typically through a relevance-weighted second-price auction with one ad per response. Layer 2, context and identity, matches ads to the meaning of the live conversation using context hints and topic clusters. Layer 3, creative generation, produces per-assistant, per-format creative at the volume conversational placements require. Layer 4, measurement and attribution, connects in-chat impressions to conversions despite the conversation gap.

Who builds each layer of the stack?

Model providers own the layers closest to the surface. OpenAI, Google, and Microsoft operate the demand-and-auction layer and the context-targeting engine, and advertisers can only buy into those on the operator’s terms. Perplexity experimented with sponsored questions before stepping back, and Anthropic has committed to keeping Claude ad-free. Advertisers own the layers closest to the creative and the customer: creative generation, run through platforms like Lapis, and measurement, run through provider pixels combined with the advertiser’s own attribution discipline.

Do I need special tools to advertise inside AI?

You do not need special tools to place a bid, because you buy into the model provider’s auction directly. What you do need is a way to produce creative in each assistant’s native format at volume, because conversational advertising multiplies the number of creatives a campaign requires across formats, topic clusters, and variants. Manual production breaks at that scale, so a generative creative platform like Lapis becomes the practical requirement, along with server-side conversion tracking to measure results.

What is the conversation gap in attribution?

The conversation gap is the influence an AI assistant exerts on a purchase that no click can capture. A user might ask the assistant to compare options, see a sponsored brand described, form a preference, and then convert later through branded search, a direct visit, or offline. The ad worked inside the conversation, but last-click attribution credits a different touch. Industry estimates suggest as much as 60% of the conversions influenced by an in-assistant ad happen outside the measurable click window, which is why measurement needs server-side signals and incrementality testing rather than last-click alone.

Is LLM ad infrastructure the same as agentic ads?

They are related but not identical. LLM ad infrastructure describes the stack that makes advertising possible inside AI assistants: the auction, context targeting, creative generation, and measurement layers. Agentic ads describe a workflow model in which autonomous AI systems plan, generate, launch, and optimize campaigns with minimal human direction. Agentic systems increasingly run on top of LLM ad infrastructure, and the creative-generation layer is where the two ideas overlap most directly, because producing per-format creative at volume is both an infrastructure requirement and an agentic capability.

Where does Lapis fit in the LLM ad stack?

Lapis is the creative and campaign layer. Model providers own demand, auction, and context, but they do not produce your creative or run your campaign workflow. From one prompt, Lapis generates production-ready, per-format creative for ChatGPT alongside Meta, Google, Reddit, and LinkedIn in under three minutes, and adds Performance Forecasting, Competitor Tracking, Web Analytics and attribution, and Campaign Studio on top. It is the first and only AI ad platform ready for ChatGPT, is a YC company (F25) rated 5.0 on G2, and has powered more than 10,000 campaigns across 30-plus industries, with a free tier to start and a $599 Pro plan for teams running at volume.