
AI-Powered Stack — My Sitecore delivery stack for 2025


Audience: Sitecore architects, project managers, and developers
Goal: Document the exact tools, services, and patterns I lean on as a company of one delivering XM Cloud, Search, Content Hub, and Customer Data Platform (CDP) / Personalize projects with AI coding agents.
Prerequisites: XM Cloud access, GitHub, and an Azure subscription for On Your Data networking.


AI-powered project delivery

Most of my Sitecore work now runs as a “company of one.” On paper I am the only engineer on the project; in practice I work with a small bench of AI coding agents that take on the roles a team used to cover: business analyst, architect, tech lead, full‑stack developer, and quality engineer. That mix only works because I treat agents as specialists, not generalists, and I wire them into a delivery loop that stays grounded in official documentation such as the Next.js App Router Content SDK guide, Bring your own components, Experience Edge best practices, and the Search JavaScript SDK for React integration guide.

On a typical sprint I start with reconnaissance prompts that map the legacy site, move into AI‑assisted requirements and estimates, and finish with agents scaffolding bring-your-own components (BYOC), tests, and deployment runbooks that I review. I still own every architectural decision and every merge, but I let agents handle the heavy lifting: crawling, normalization, boilerplate, and cross‑file edits. That is how I make one architect’s calendar cover discovery, build, and operations without burning out.

At a high level, the way I think about this stack looks like this:

Client brief
  → Planning agents: ChatGPT Pro, NotebookLM
  → Architecture & design agents: Claude Code, Figma + Model Context Protocol
  → Coding agents: ChatGPT Pro, Claude Code, Cursor
  → Testing agents: Playwright + Model Context Protocol
  → Delivery & governance: XM Cloud command-line tools, Warp


Core planning and coding agents

ChatGPT Pro – structured planning autopilot

ChatGPT Pro is my default planning surface. In one UI I get multiple GPT‑class models, a dedicated code profile, and tools like code execution and file uploads. I feed it sitemap crawls, GraphQL schemas, or Content Hub taxonomies and ask for delivery plans, migration checklists, or refactors that I can then push into version control. When I add up the Pro subscription, API usage, and occasional team add‑ons, I budget roughly $200/month for ChatGPT. That is a serious line item for a company of one, but it is still dramatically cheaper than keeping a part‑time architect or senior developer on retainer.

I used to treat ChatGPT as my “secondary” agent, but over the last year it has become my primary cockpit. The five models exposed in the Pro UI map neatly onto how I split work: a lighter model for quick edits and glue code, a full‑strength model for architecture reasoning, a code‑optimized model for refactors, and so on. I can keep one conversation focused on sprint planning, another on Content SDK integration details, and a third on test generation without cross‑contamination.

Claude Code – long-form reasoning desk

Claude Code is still the agent I trust most for narrative explanations and big diffs. The desktop app indexes my repo locally, so I can highlight a serialized template and ask it to map the fields into a Content SDK request while it cites the relevant rule from the docs. Historically I reached for Claude first and used ChatGPT as the backup; these days that has flipped. I lean on Claude when I need long‑form design notes, careful reasoning across many files, or a second opinion on an architecture proposal that ChatGPT drafted.

In practice I budget about $200/month for Claude as well once Pro, team seats, and API usage are factored in. That sounds expensive until you remember I am effectively buying a senior solution architect who can context‑switch across multiple XM Cloud repos in minutes.

NotebookLM – living project brief

NotebookLM is where I dump requirements, estimates, stakeholder notes, and recordings for each engagement. Because it behaves like a project‑specific retrieval‑augmented generation (RAG) workspace, I can ask “How did we model authors last sprint?” or “What did we promise in the governance workshop?” and get an answer sourced to the correct doc or transcript. Every agent query stays grounded in our own material, which keeps compliance and tone consistent. At the time of writing it is still offered as a US‑only beta rather than a paid SKU, so I treat it as “effectively free” until Google formalizes a price.


UI and UX research, critique, and scaffolds

Figma + Model Context Protocol bridge

Figma stays the single source of truth for design on my projects, and the Model Context Protocol bridge lets agents read component trees, measurements, and tokens directly. I ask Claude Code to double‑check spacing or typography using the same IDs designers see, which eliminates most “what token did you mean?” back‑and‑forth that used to burn hours in review.

Uizard – disposable ideation canvas

When stakeholders want three layout options by noon, I lean on Uizard to generate throwaway wireframes that I later recreate in Figma. I export the winning variant as PNG or HTML and feed it into ChatGPT or Claude, so my coding agents can see the same layout the designer approved and scaffold Storybook stories around it.

Claude Artifacts + OpenAI Codex-style presets – screenshot-to-component loop

Claude’s Artifacts view is still my favorite way to critique an existing UI: I upload a screenshot, ask for accessibility fixes, and it flags contrast, aria‑label, or focus‑order issues. When I need “instant” scaffolds, the Codex‑style presets inside ChatGPT Pro convert the same screenshot into React/Storybook code, which I harden manually and push into the component library.
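The critique loop itself is easy to approximate in code. Here is a toy sketch of my own (not Claude's actual checks) that flags two of the issues I routinely ask about — images without `alt` text and buttons without an accessible name — in a raw markup string:

```typescript
// Naive accessibility lint over an HTML string: flags <img> tags missing
// alt text and <button> tags with neither text content nor an aria-label.
// Regex-based on purpose -- good enough for a quick agent-style critique pass,
// not a substitute for a real accessibility audit.
function auditMarkup(html: string): string[] {
  const issues: string[] = [];

  for (const img of html.match(/<img\b[^>]*>/gi) ?? []) {
    if (!/\balt\s*=/i.test(img)) {
      issues.push(`img missing alt: ${img}`);
    }
  }

  for (const btn of html.match(/<button\b[^>]*>[\s\S]*?<\/button>/gi) ?? []) {
    const hasLabel = /\baria-label\s*=/i.test(btn);
    const hasText = />\s*[^<\s]/.test(btn);
    if (!hasLabel && !hasText) {
      issues.push(`button without accessible name: ${btn}`);
    }
  }

  return issues;
}
```

In practice the agent's pass is richer (contrast, focus order), but even this regex-level check catches the mistakes that slip through most often.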


Image generation and Content Hub automations

Vertex AI Imagen 3 – production hero shots

Imagen 3 handles anything that needs photorealism or strict brand fidelity. I trigger it from a small Content Hub extension so creatives never leave the DAM; SynthID verifies the watermark, and metadata syncs back into Content Hub for rights tracking. In practice I treat Imagen as the “production grade” model in my stack: it is not the cheapest option, but it reliably produces on‑brand hero shots once the prompts are tuned.

“Nano Banana” checkpoints + Gemini 2.5 Flash Image

For cheaper variations I lean on Google’s lighter “Nano Banana” Imagen checkpoints. When I need compositing or reference‑based edits, I switch to Gemini 2.5 Flash Image. Both live on Vertex AI, inherit the same IAM and monitoring setup, and push their metadata right back into Content Hub so usage stays auditable.
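On the Imagen side, Vertex AI exposes a generic `:predict` contract with an `{ instances, parameters }` body, so the Content Hub extension mostly just swaps the model ID between checkpoints. A hedged sketch of the request builder — the parameter names (`sampleCount`, `aspectRatio`) are my assumptions about the current Imagen surface, so verify them against the Vertex AI reference before relying on them:

```typescript
// Build a Vertex AI :predict request body for an Imagen-family model.
// Vertex predict bodies follow the generic { instances, parameters } shape;
// the specific parameter names below are assumptions about the Imagen
// surface -- check the current Vertex AI API reference.
interface ImagenRequest {
  instances: { prompt: string }[];
  parameters: { sampleCount: number; aspectRatio?: string };
}

function buildImagenRequest(
  prompt: string,
  opts: { samples?: number; aspectRatio?: string } = {}
): ImagenRequest {
  return {
    instances: [{ prompt }],
    parameters: {
      sampleCount: opts.samples ?? 1,
      ...(opts.aspectRatio ? { aspectRatio: opts.aspectRatio } : {}),
    },
  };
}
```

The same body POSTs against the publisher model endpoint for whichever checkpoint is selected, which is what keeps the cheap-variation and production-quality paths auditable under one IAM setup.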


Fast-think ideation

Google Gemini – quick brainstorming

Gemini runs nonstop inside AI Studio for outlines, ADR drafts, or QA test ideas. Free and low‑cost tiers cover most of my “how might we…” prompts; when a thread looks promising, I usually promote the results into Claude or ChatGPT for deeper reasoning and code generation. This keeps my more expensive agents focused on work that directly shapes the XM Cloud implementation.


Automations via MCP and Playwright

Model Context Protocol + Playwright tooling

I use MCP (with varying degrees of success) to expose internal tools. I keep a Playwright MCP endpoint that crawls a target site, dumps headings/classes/aria labels into JSON, and feeds that data into NotebookLM or ChatGPT for content-model inference. Another MCP wraps my XM Cloud scripts (ser pull, serialized diffs, deploy guards) so agents interact through explicit commands instead of raw shells.
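The shape of the data that crawl produces matters more than the crawl itself. A sketch of the summarization step, assuming the Playwright side (for example a `page.$$eval` pass) has already dumped raw nodes — the `CrawledNode` shape and field names here are my own convention, not part of any SDK:

```typescript
// Summarize nodes scraped by a Playwright crawl into the compact JSON
// outline fed to NotebookLM / ChatGPT for content-model inference.
// CrawledNode is a hypothetical shape produced by the crawl step.
interface CrawledNode {
  tag: string;        // "h1", "h2", "nav", ...
  text: string;
  ariaLabel?: string;
  classes: string[];
}

interface PageOutline {
  headings: { level: number; text: string }[];
  ariaLabels: string[];
  classFrequency: Record<string, number>;
}

function summarizeNodes(nodes: CrawledNode[]): PageOutline {
  const outline: PageOutline = { headings: [], ariaLabels: [], classFrequency: {} };

  for (const node of nodes) {
    const headingMatch = /^h([1-6])$/.exec(node.tag);
    if (headingMatch) {
      outline.headings.push({ level: Number(headingMatch[1]), text: node.text.trim() });
    }
    if (node.ariaLabel) outline.ariaLabels.push(node.ariaLabel);
    for (const cls of node.classes) {
      outline.classFrequency[cls] = (outline.classFrequency[cls] ?? 0) + 1;
    }
  }

  return outline;
}
```

Keeping this step pure and deterministic means the agents always see the same outline for the same crawl, which makes their content-model proposals comparable across runs.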


IDE and terminal flow

Cursor IDE – solo agent cockpit

Cursor’s paid tier indexes the repo locally and pipes that context into GPT‑class models automatically. I default to Cursor when I am the only engineer on a repo and need multi‑file edits without juggling prompt windows. Most of my “implement this refactor across ten files” work now happens as a three‑way conversation between Cursor, ChatGPT Pro, and Claude Code.

Windsurf IDE – plan-first pair programming

Windsurf shines when I am onboarding new contributors or running riskier changes. Every edit starts with a plan the agent proposes, I approve or tweak it, and only then does the diff apply. That extra review step feels slower when I am working solo, but it is perfect when I need a second pair of eyes on serialization or deployment scripts.

Warp – templatized command-line interface

Warp keeps my command-line interface (CLI) templates—ser syncs, Experience Edge key rotations, GraphQL scripts—ready for both humans and agents. Shared workflows mean anyone (or any agent) can rerun the exact command with the right environment variables and auditing hooks. I treat it as the command‑line front end for my DevOps agents.
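The value of those shared workflows is that the command is a template, not a memory. The substitution logic is simple enough to replicate for agents that cannot call Warp directly — this sketch is my own `{{name}}`-style expansion, not Warp's implementation:

```typescript
// Expand a {{placeholder}}-style command template with named arguments,
// throwing on any missing value so an agent can never fire a
// half-built command against a real environment.
function expandTemplate(template: string, args: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name: string) => {
    const value = args[name];
    if (value === undefined) {
      throw new Error(`missing argument: ${name}`);
    }
    return value;
  });
}
```

Failing loudly on a missing argument is the point: the agent either produces the exact command a human would have run, or it produces nothing.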


Choosing the right coding agent per task

I usually start with the least expensive model that can handle the job, escalate to Claude Code for thorny migrations, and pin every model version and temperature so reviewers can reproduce the output later.
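That escalation rule is mechanical enough to encode. A sketch of the routing table I keep in mind — every model name, version, and cost tier below is illustrative, not a real price list or model catalog:

```typescript
// Pin model version + temperature per task class so any reviewer can
// reproduce an agent run. Names and tiers are illustrative placeholders.
interface AgentProfile {
  model: string;       // pinned model identifier, including version
  temperature: number; // pinned for reproducibility
  costTier: 1 | 2 | 3; // 1 = cheapest; escalate upward on failure
}

const profiles: Record<string, AgentProfile> = {
  "glue-code":        { model: "light-model-v1", temperature: 0.2, costTier: 1 },
  "refactor":         { model: "code-model-v2",  temperature: 0.1, costTier: 2 },
  "thorny-migration": { model: "claude-code",    temperature: 0.0, costTier: 3 },
};

// Move one cost tier up when a run fails review; undefined means
// there is nothing more expensive left to escalate to.
function escalate(current: AgentProfile): AgentProfile | undefined {
  return Object.values(profiles)
    .filter(p => p.costTier > current.costTier)
    .sort((a, b) => a.costTier - b.costTier)[0];
}
```

Pinning temperature alongside the model version is what makes the "reproduce the output later" promise realistic for reviewers.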


Security, privacy, and RBAC

Azure OpenAI On Your Data hosts private runbooks, deployment guides, and serialized inventories without exposing secrets. Role-based access control for Azure OpenAI keeps the index limited to the right team, and the documented On your data network configuration patterns let me keep traffic inside a virtual network (VNET) with private endpoints. Before I let editors touch an agent, I run prompt‑injection drills, check log retention, and confirm secrets never enter any prompt.
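Wiring an agent to that private index happens through the chat completions call itself. A hedged sketch of the request body — the `data_sources` / `azure_search` shape matches the documented On Your Data contract at the time of writing, but the endpoint and index names below are placeholders and the API version moves, so check the current Azure OpenAI reference:

```typescript
// Build an Azure OpenAI chat completions body that grounds answers in a
// private Azure AI Search index via On Your Data. Managed identity auth
// keeps keys out of the payload; endpoint and index values are placeholders.
function buildGroundedRequest(question: string) {
  return {
    messages: [{ role: "user", content: question }],
    data_sources: [
      {
        type: "azure_search",
        parameters: {
          endpoint: "https://example-search.search.windows.net", // placeholder
          index_name: "runbooks",                                // placeholder
          authentication: { type: "system_assigned_managed_identity" },
        },
      },
    ],
  };
}
```

Because authentication rides on the managed identity rather than a key in the body, nothing secret ever appears in the prompt or the logged request.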


Prompt hygiene and knowledge capture

Prompts live beside code (/prompts/<area>/<task>.md) with inputs, outputs, and required citations. Execution logs (model, temperature, files touched) land under /runs/, which means I can replay a migration or audit who approved a scaffold. Treating prompts like code reviews keeps the “secret sauce” in Git, not in someone’s chat sidebar, and gives my agents a stable playbook from sprint to sprint.
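The run log itself is just structured JSON next to the repo. A sketch of the record I write per execution — the field names and path convention are my own, not a standard:

```typescript
// One execution record per agent run, written under /runs/ so a
// migration can be replayed or audited later. Shape is my own convention.
interface RunRecord {
  promptPath: string;      // e.g. "prompts/search/facet-mapping.md" (illustrative)
  model: string;           // pinned model identifier
  temperature: number;
  filesTouched: string[];
  approvedBy: string;
  startedAt: string;       // ISO 8601 timestamp
}

// Derive runs/<date>/<prompt-name>.json from the prompt file path,
// so every run log is traceable back to the prompt that produced it.
function runLogPath(record: RunRecord): string {
  const date = record.startedAt.slice(0, 10);
  const name = record.promptPath.split("/").pop()!.replace(/\.md$/, "");
  return `runs/${date}/${name}.json`;
}
```

Deriving the log path from the prompt path is deliberate: a reviewer can jump from any scaffold straight to the prompt, model, and approval that produced it.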


Closing notes

Every tool above is one I keep open daily, pay for, and ground in Sitecore’s official guidance. AI accelerates the work, but it never replaces reviews, governance, or security. For me, the win is that this stack lets a single architect behave like a full delivery team without cutting corners on quality. In the next posts in this theme, I walk through how I apply the same agents and patterns in practice.
