How to Automate SKU Creation and Image Tagging for New Products

Every time you onboard a new product, someone on your team opens a spreadsheet, types a SKU by hand, cross-references a naming convention doc that hasn't been updated since 2022, uploads product images one by one, writes alt text from half-remembered SEO guidelines, and then copies everything into Shopify, your warehouse system, and maybe Amazon. This takes anywhere from 20 minutes for a simple product to six hours for something with 50+ variants.
You already know this is a bad use of human time. Let's talk about what to do about it.
The Manual Workflow Most Teams Are Actually Running
Before we get into automation, let's be honest about what "creating a new SKU" actually involves. It's not one step. It's ten, and most of them are tedious.
Here's what a typical product onboarding workflow looks like:
1. Product intake. Someone receives specs from a supplier, designer, or merchandiser. This arrives as a PDF, a spreadsheet, a WeChat message, or sometimes just photos with handwritten notes.
2. Attribute normalization. The supplier calls it "Navy Blue." Your catalog says "NVY." Amazon wants "Dark Blue." Someone has to decide which is canonical and map everything to it. This step is almost always manual, and almost always where errors creep in.
3. SKU format decision. Are you using descriptive SKUs like BRAND-TSHIRT-RED-M-2026 or sequential ones like ITEM-0047281? Someone has to know the convention and apply it correctly.
4. Variant matrix creation. Eight colors times seven sizes equals 56 SKUs. But not every combination is valid—maybe you don't stock XXS in every color, or a specific fabric only comes in three shades. A human has to map this out.
5. SKU generation. This is where people bust out =CONCATENATE() in Excel or, worse, just type each one manually.
6. Data enrichment. Add descriptions, pricing, cost, dimensions, weight, HS codes, compliance info. Each field has its own source and its own format requirements.
7. Uniqueness validation. Check against your existing catalog to make sure you haven't accidentally duplicated a SKU. If you have 10,000+ products, this is non-trivial.
8. Image processing. Rename files to match SKUs, resize for different channels, write alt text, assign images to the correct variants. For a product with eight colors, that's potentially 40+ images to handle.
9. System entry. Push everything into Shopify, Amazon, your ERP, your warehouse management system, and your accounting software. Each system has its own field mapping.
10. Review and approval. A merchandising or operations manager eyeballs everything for brand consistency and logic errors.
Time cost, based on real-world data:
- Simple product with 3–5 variants: 15–40 minutes
- Complex product with 30–100 variants (apparel, furniture, electronics): 2–6 hours per style
- Fashion brand launching a 200-style seasonal collection: 200–600 person-hours of data work
- A 2022 Salsify study found retailers spend an average of 26 days to bring a new product live, with data entry and SKU creation as a top bottleneck
If you're running a catalog of any meaningful size, this isn't just annoying. It's a structural drag on your business.
Why This Hurts More Than You Think
The obvious cost is time. But the less obvious costs are the ones that actually kill you.
Error rates are higher than anyone admits. GS1 and Gartner estimate manual product data entry error rates between 1.5% and 4%. Each error can cost $50–$300 in mis-shipments, returns, or lost inventory visibility. If you create 5,000 SKUs a year at a 3% error rate, that's 150 errors. At an average cost of $100 per error, you're burning $15,000 a year on mistakes that shouldn't exist.
Inconsistency destroys your data over time. When three different people create SKUs using three slightly different conventions, your reporting breaks. Your inventory search breaks. Your ability to analyze product performance by category breaks. This compounds. By the time you notice it, you're looking at a catalog cleanup project that takes months.
Supplier data is chaos. If you work with more than a handful of suppliers, you know this. Every supplier sends data in a different format. Some give you Excel files with merged cells. Some send PDFs. Some send you a link to a Google Drive folder full of unorganized images named IMG_4392.jpg. Normalizing this data is the single biggest time sink in the whole process, and it's the step most teams underestimate.
It delays your time-to-market. Every day a product sits in the "waiting for data entry" queue is a day you're not selling it. For seasonal products or trend-driven categories, that delay can mean missing the window entirely.
It doesn't scale. One apparel brand reported spending roughly 35% of their merchandising team's time on catalog maintenance. That's not merchandising. That's data entry wearing a merchandising hat.
What AI Can Handle Right Now
Here's where I want to be precise, because the AI hype cycle has made people either believe everything is magic or nothing works. The truth is specific: some parts of this workflow are extremely well-suited to AI automation today, and some parts still need a human.
What AI does well in this context:
- Attribute extraction from unstructured data. Feed it a supplier PDF, a product spec sheet, or even a product photo, and it can pull out material, color, dimensions, weight, and other attributes with high accuracy. This is where large language models genuinely shine: they're pattern recognition engines, and product data is full of patterns.
- Normalization and standardization. Mapping "navy," "nvy," "Navy Blue," and "#000080" to a single canonical value is exactly the kind of fuzzy matching task AI handles better than VLOOKUP formulas.
- Bulk SKU generation from rules. Give it your naming convention and an attribute matrix, and it generates every valid SKU combination in seconds. No more CONCATENATE nightmares.
- Image tagging and alt text generation. Computer vision models can identify product type, color, material, and context from photos and generate SEO-friendly alt text and file names automatically.
- Duplicate detection. AI can score similarity between a new SKU and your existing catalog, flagging potential collisions before they cause problems.
- Description and metadata generation. Product titles, bullet points, and meta descriptions are bread and butter for language models, especially when given your brand guidelines as context.
- Variant matrix optimization. Flag combinations that are mathematically possible but commercially nonsensical (e.g., "wool" in a swimwear line, or size XXXL in a children's collection).
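To make the duplicate detection idea concrete, here is a minimal sketch that scores string similarity between a new SKU and an existing catalog. It uses Python's standard-library difflib as a simple stand-in for whatever similarity scoring an AI agent would apply. The function name similar_skus and the 0.9 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similar_skus(new_sku, catalog, threshold=0.9):
    """Return (existing_sku, score) pairs that match or closely resemble new_sku."""
    hits = []
    for existing in catalog:
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        score = SequenceMatcher(None, new_sku.upper(), existing.upper()).ratio()
        if score >= threshold:
            hits.append((existing, round(score, 3)))
    # Strongest matches first so exact collisions surface at the top.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


catalog = ["CLWM-APP-TEE-NVY-M-25", "CLWM-APP-HOD-BLK-L-25"]
exact = similar_skus("CLWM-APP-TEE-NVY-M-25", catalog)
near = similar_skus("CLWM-APP-TEE-NVY-M-26", catalog)
print(exact)
print(near)
```

An exact collision scores 1.0, while a SKU that differs only in the year segment still clears the threshold and gets flagged for a second look.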
How to Build This with OpenClaw: Step by Step
Here's the practical part. I'm going to walk through how to build a SKU creation and image tagging agent using OpenClaw that handles the bulk of this workflow, so your team only touches the exceptions.
Step 1: Define Your SKU Schema
Before you automate anything, you need a clear, documented SKU format. If your current format is inconsistent, now is the time to fix it.
Create a schema document that specifies:
SKU Format: [BRAND]-[CATEGORY]-[SUBCATEGORY]-[COLOR]-[SIZE]-[YEAR]
Example: CLWM-APP-TEE-NVY-M-25
Attribute Rules:
- BRAND: 4-char abbreviation (CLWM = Claw Mart)
- CATEGORY: 3-char code from approved list (APP, ACC, HOM, ELC)
- SUBCATEGORY: 3-char code (TEE, HOD, JKT, PNT)
- COLOR: 3-char code from canonical color map
- SIZE: Standard sizing codes (XS, S, M, L, XL, 2X, 3X)
- YEAR: 2-digit year
This schema becomes the foundation of your OpenClaw agent's instructions.
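The schema above can also be expressed as code, which is useful for checking the agent's output independently. The following is a sketch only: the function names build_sku and variant_skus, the regular expression, and the excluded parameter are illustrative assumptions. The approved codes are copied from the attribute rules.

```python
import re
from itertools import product

# Approved codes copied from the attribute rules above.
CATEGORIES = {"APP", "ACC", "HOM", "ELC"}
SUBCATEGORIES = {"TEE", "HOD", "JKT", "PNT"}
SIZES = {"XS", "S", "M", "L", "XL", "2X", "3X"}

# BRAND(4)-CATEGORY(3)-SUBCATEGORY(3)-COLOR(3)-SIZE(1-2)-YEAR(2)
SKU_PATTERN = re.compile(
    r"[A-Z]{4}-[A-Z]{3}-[A-Z]{3}-[A-Z]{3}-[A-Z0-9]{1,2}-\d{2}"
)


def build_sku(brand, category, subcategory, color, size, year):
    """Assemble one SKU and reject anything that breaks the schema."""
    if category not in CATEGORIES:
        raise ValueError(f"unapproved category code: {category}")
    if subcategory not in SUBCATEGORIES:
        raise ValueError(f"unapproved subcategory code: {subcategory}")
    if size not in SIZES:
        raise ValueError(f"unapproved size code: {size}")
    sku = f"{brand}-{category}-{subcategory}-{color}-{size}-{year % 100:02d}"
    if not SKU_PATTERN.fullmatch(sku):
        raise ValueError(f"SKU does not match schema: {sku}")
    return sku


def variant_skus(brand, category, subcategory, colors, sizes, year, excluded=()):
    """Expand colors x sizes, skipping (color, size) pairs listed in excluded."""
    skip = set(excluded)
    return [
        build_sku(brand, category, subcategory, color, size, year)
        for color, size in product(colors, sizes)
        if (color, size) not in skip
    ]


skus = variant_skus(
    "CLWM", "APP", "TEE", ["NVY", "RED"], ["S", "M", "L"], 2025,
    excluded=[("RED", "L")],  # e.g. red is not stocked in L
)
print(skus)
```

Two colors times three sizes gives six combinations, and the one excluded pair leaves five SKUs.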
Step 2: Build Your Canonical Attribute Maps
Create reference tables that your agent will use for normalization. These live as knowledge base documents in OpenClaw:
Color Canonical Map:
- Navy Blue, Navy, Nvy, Dark Blue, #000080 → NVY
- Red, Crimson, Cherry, Scarlet, #FF0000 → RED
- Black, Blk, Onyx, Jet, #000000 → BLK
- White, Wht, Ivory, Snow, #FFFFFF → WHT
Subcategory Map:
- T-shirt, Tee, T-Shirt, Tshirt → TEE
- Hoodie, Hooded Sweatshirt, Pullover Hoodie → HOD
- Jacket, Coat, Outerwear → JKT
Size Map:
- Extra Small, XSmall → XS
- Small, Sm → S
- Medium, Med → M
The beauty of using OpenClaw here is that these maps aren't static lookup tables—the agent understands context. If a supplier sends "dark midnight blue" and that's not in your map, the agent can infer the closest canonical match and flag it for confirmation rather than just failing.
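For comparison, here is roughly what the deterministic part of that normalization looks like in plain Python: an exact lookup against the canonical color map, with a fuzzy fallback that flags its guess for confirmation. This is a sketch, not how OpenClaw works internally. The function name normalize_color and the 0.8 cutoff are assumptions, and simple string matching will not infer a value like "dark midnight blue" the way a language model can.

```python
from difflib import get_close_matches

# Canonical color map from the reference table above.
CANONICAL_COLORS = {
    "NVY": ["Navy Blue", "Navy", "Nvy", "Dark Blue", "#000080"],
    "RED": ["Red", "Crimson", "Cherry", "Scarlet", "#FF0000"],
    "BLK": ["Black", "Blk", "Onyx", "Jet", "#000000"],
    "WHT": ["White", "Wht", "Ivory", "Snow", "#FFFFFF"],
}

# Invert to a case-insensitive alias -> code lookup.
COLOR_LOOKUP = {
    alias.lower(): code
    for code, aliases in CANONICAL_COLORS.items()
    for alias in aliases
}


def normalize_color(raw, cutoff=0.8):
    """Return (code, needs_review). code is None when nothing is close enough."""
    key = raw.strip().lower()
    if key in COLOR_LOOKUP:
        return COLOR_LOOKUP[key], False
    # Fuzzy fallback: accept the closest alias but flag it for confirmation.
    close = get_close_matches(key, list(COLOR_LOOKUP), n=1, cutoff=cutoff)
    if close:
        return COLOR_LOOKUP[close[0]], True
    return None, True


print(normalize_color("Navy Blue"))
print(normalize_color("navy bleu"))
print(normalize_color("dark midnight blue"))
```

The second element of the result is the review flag, so a typo-level mismatch still resolves to a code but is never accepted silently.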
Step 3: Configure Your OpenClaw Agent for Product Intake
Set up an OpenClaw agent with the following instruction set:
Agent Role: Product Data Processor
Instructions:
- Accept product data in any format (CSV, spreadsheet, PDF, JSON, plain text description, or image)
- Extract all product attributes: name, category, subcategory, material,
  colors available, sizes available, weight, dimensions, price, cost
- Normalize all attributes against the canonical maps in your knowledge base
- Generate SKUs following the schema: [BRAND]-[CATEGORY]-[SUBCATEGORY]-[COLOR]-[SIZE]-[YEAR]
- Create the full variant matrix, excluding invalid combinations
- Flag any attributes that don't have a clear canonical match
- Output structured JSON for each product with all variants
Validation Rules:
- No duplicate SKUs against existing catalog (reference: current_catalog.csv)
- All SKU segments must use approved codes only
- Flag but don't reject products with missing non-critical attributes
- Reject products missing: category, at least one color, at least one size
Upload your existing catalog as a reference file so the agent can check for collisions in real time.
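The validation rules above are simple enough to mirror in a short script, which gives you an independent check on what the agent returns. This is a minimal sketch: the function names, the dictionary keys, and the "sku" column name in current_catalog.csv are assumptions for illustration.

```python
import csv

CRITICAL_FIELDS = ("category", "colors", "sizes")
OPTIONAL_FIELDS = ("material", "weight", "dimensions", "price", "cost")


def load_existing_skus(path="current_catalog.csv", column="sku"):
    """Read existing SKUs from the catalog export (column name is an assumption)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[column] for row in csv.DictReader(handle)}


def validate_product(product, new_skus, existing_skus):
    """Reject on missing critical fields or duplicates; warn on optional gaps."""
    errors = [
        f"missing critical field: {field}"
        for field in CRITICAL_FIELDS
        if not product.get(field)
    ]
    warnings = [
        f"missing optional field: {field}"
        for field in OPTIONAL_FIELDS
        if product.get(field) in (None, "", [])
    ]
    duplicates = sorted(set(new_skus) & set(existing_skus))
    errors += [f"duplicate SKU: {sku}" for sku in duplicates]
    status = "REJECT" if errors else "PASS"
    return {"status": status, "errors": errors, "warnings": warnings}


existing = {"CLWM-APP-TEE-NVY-M-25"}  # in practice: load_existing_skus()
result = validate_product(
    {"category": "APP", "colors": ["NVY"], "sizes": ["S", "M"], "price": 29.99},
    new_skus=["CLWM-APP-TEE-NVY-S-25", "CLWM-APP-TEE-NVY-M-25"],
    existing_skus=existing,
)
print(result)
```

Missing optional attributes only produce warnings, which matches the "flag but don't reject" rule, while a duplicate SKU or a missing critical field rejects the record.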
Step 4: Add Image Processing to the Pipeline
This is where the workflow gets powerful. Configure a second agent function (or extend the same agent) to handle image tagging:
Image Processing Instructions:
- Accept product images in batch
- For each image, identify: product type, primary color, secondary colors,
  material (if visible), context (lifestyle, flat lay, model, detail shot)
- Generate file name following convention: [SKU]-[image_type]-[sequence].jpg
  Example: CLWM-APP-TEE-NVY-M-25-lifestyle-01.jpg
- Generate alt text following SEO template:
  "[Brand] [Product Name] in [Color] - [Description of what's shown]"
  Example: "Claw Mart Classic Tee in Navy - front view on model"
- Generate meta descriptions for each product's primary image
- Map images to correct SKU variants based on color matching
The agent uses vision capabilities to match product images to the correct color variants—something that previously required a human to do manually for every single photo.
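Once a vision model has identified each image's color, the renaming, alt text, and variant mapping are mechanical. The sketch below assumes the detection step has already produced a color_code per image. That input shape and all three function names are hypothetical, chosen only to illustrate the naming convention and alt text template above.

```python
def image_filename(sku, image_type, sequence):
    """Build a file name following [SKU]-[image_type]-[sequence].jpg."""
    return f"{sku}-{image_type}-{sequence:02d}.jpg"


def alt_text(brand, product_name, color, shown):
    """Fill the alt text template: [Brand] [Product Name] in [Color] - [shown]."""
    return f"{brand} {product_name} in {color} - {shown}"


def map_images_to_skus(detections, skus):
    """Attach each image to every SKU whose color segment matches its detected color."""
    by_color = {}
    for sku in skus:
        # The color code is the fourth segment of the SKU schema.
        by_color.setdefault(sku.split("-")[3], []).append(sku)
    mapping, unmatched = {}, []
    for item in detections:
        targets = by_color.get(item["color_code"], [])
        if not targets:
            unmatched.append(item["file"])  # surfaced for review, not guessed
        for sku in targets:
            mapping.setdefault(sku, []).append(item["file"])
    return mapping, unmatched


skus = ["CLWM-APP-TEE-NVY-S-25", "CLWM-APP-TEE-NVY-M-25", "CLWM-APP-TEE-RED-M-25"]
detections = [
    {"file": "IMG_4392.jpg", "color_code": "NVY"},
    {"file": "IMG_4393.jpg", "color_code": "GRN"},
]
mapping, unmatched = map_images_to_skus(detections, skus)
print(image_filename("CLWM-APP-TEE-NVY-M-25", "lifestyle", 1))
print(alt_text("Claw Mart", "Classic Tee", "Navy", "front view on model"))
print(mapping, unmatched)
```

Images whose detected color matches no variant are returned separately instead of being attached to the nearest guess.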
Step 5: Set Up the Workflow
Here's how the end-to-end flow works in practice:
Input: Your team receives supplier data. Instead of opening Excel, they drop the file (PDF, spreadsheet, whatever) into the OpenClaw agent.
Processing: The agent extracts attributes, normalizes them, generates the full variant matrix, creates all SKUs, and produces a structured output. If product images are included, it processes those simultaneously—tagging, renaming, generating alt text, and mapping images to variants.
Output: You get a clean, structured file ready to import into Shopify, Amazon, or your ERP. Something like:
{
  "product_name": "Classic Cotton Tee",
  "brand": "CLWM",
  "category": "APP",
  "subcategory": "TEE",
  "variants": [
    {
      "sku": "CLWM-APP-TEE-NVY-S-25",
      "color": "Navy",
      "color_code": "NVY",
      "size": "S",
      "price": 29.99,
      "weight_oz": 6.2,
      "images": [
        {
          "filename": "CLWM-APP-TEE-NVY-S-25-flat-01.jpg",
          "alt_text": "Claw Mart Classic Cotton Tee in Navy - flat lay front view",
          "type": "primary"
        }
      ]
    },
    {
      "sku": "CLWM-APP-TEE-NVY-M-25",
      "color": "Navy",
      "color_code": "NVY",
      "size": "M",
      "price": 29.99,
      "weight_oz": 6.5
    }
  ],
  "flags": [],
  "validation": {
    "duplicate_check": "PASS",
    "schema_compliance": "PASS",
    "missing_attributes": []
  }
}
Human review: Your merchandising lead reviews the output, approves it, and pushes to your systems. They're now spending 5 minutes reviewing instead of 3 hours building.
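Once the output is approved, the structured JSON shown above usually has to be flattened into one row per variant before a bulk import. Here is a minimal sketch of that step. The column names are illustrative rather than any platform's official import headers, and variants_to_csv is a hypothetical helper.

```python
import csv
import io
import json

# Illustrative columns; map these to each channel's real import template.
COLUMNS = ["sku", "product_name", "color", "size", "price",
           "image_filename", "image_alt_text"]


def variants_to_csv(product_json):
    """Flatten the agent's JSON output into one CSV row per variant."""
    product = json.loads(product_json)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for variant in product["variants"]:
        # Variants without images get empty image columns.
        images = variant.get("images") or [{}]
        writer.writerow({
            "sku": variant["sku"],
            "product_name": product["product_name"],
            "color": variant["color"],
            "size": variant["size"],
            "price": variant["price"],
            "image_filename": images[0].get("filename", ""),
            "image_alt_text": images[0].get("alt_text", ""),
        })
    return buffer.getvalue()


sample = json.dumps({
    "product_name": "Classic Cotton Tee",
    "variants": [
        {"sku": "CLWM-APP-TEE-NVY-S-25", "color": "Navy", "size": "S",
         "price": 29.99,
         "images": [{"filename": "CLWM-APP-TEE-NVY-S-25-flat-01.jpg",
                     "alt_text": "Claw Mart Classic Cotton Tee in Navy - flat lay front view"}]},
        {"sku": "CLWM-APP-TEE-NVY-M-25", "color": "Navy", "size": "M",
         "price": 29.99},
    ],
})
print(variants_to_csv(sample))
```

Keeping the flattening in a small script means each channel only needs its own column mapping, not its own copy of the product data.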
Step 6: Handle Edge Cases with Escalation Logic
Configure your OpenClaw agent to escalate rather than guess when it hits ambiguity:
Escalation Rules:
- Confidence < 85% on any attribute normalization → flag for human review
- New category or subcategory not in existing maps → suggest code, await approval
- Supplier data missing critical fields → generate partial record, list gaps
- Image color doesn't match any existing variant → flag mismatch
- Potential SKU collision with existing catalog → halt and alert
This is critical. The agent should never silently make a bad decision. The goal is to automate the 80–90% that's straightforward and surface the 10–20% that genuinely needs human judgment.
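The escalation rules above amount to a small routing function. The sketch below shows one way to express them, assuming the agent's output carries per-attribute confidence scores and lists of problems. Every key name here (confidence, proposed_codes, missing_critical, unmatched_images, sku_collisions) is a hypothetical field, not a documented OpenClaw output.

```python
CONFIDENCE_THRESHOLD = 0.85


def route(record):
    """Decide whether a processed record proceeds, escalates, or halts."""
    # A potential SKU collision stops everything else.
    if record.get("sku_collisions"):
        reasons = [f"SKU collision: {sku}" for sku in record["sku_collisions"]]
        return {"action": "halt", "reasons": reasons}

    reasons = []
    for attribute, score in record.get("confidence", {}).items():
        if score < CONFIDENCE_THRESHOLD:
            reasons.append(f"low confidence on {attribute} ({score:.2f})")
    for code in record.get("proposed_codes", []):
        reasons.append(f"new code awaiting approval: {code}")
    for field in record.get("missing_critical", []):
        reasons.append(f"missing critical field: {field}")
    for image in record.get("unmatched_images", []):
        reasons.append(f"image color matches no variant: {image}")

    action = "escalate" if reasons else "proceed"
    return {"action": action, "reasons": reasons}


print(route({"confidence": {"color": 0.72, "size": 0.99}}))
print(route({"confidence": {"color": 0.95}}))
print(route({"sku_collisions": ["CLWM-APP-TEE-NVY-M-25"]}))
```

Every escalation carries its reasons, so the reviewer sees why a record was held rather than just that it was.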
What Still Needs a Human
Let me be direct about this, because overselling AI capabilities is how you end up with a mess.
Humans should still own:
- SKU strategy decisions. Whether to use descriptive vs. sequential SKUs, what attributes to encode, how to handle brand acquisitions: these are business decisions, not data tasks.
- Commercial variant selection. AI can generate every possible combination. A human decides which ones you actually want to stock and sell. "Should we offer this jacket in chartreuse?" is a merchandising call, not a data call.
- Compliance and safety. Especially in food, cosmetics, electronics, or children's products. Regulatory requirements vary by jurisdiction and change frequently. Don't delegate this to an AI agent.
- Brand voice and aesthetic. The agent can generate product descriptions and alt text, but someone should be reviewing them for tone and accuracy, especially when you're establishing your voice.
- Novel products. When you're launching something that doesn't fit any existing category in your taxonomy, a human needs to extend the schema before the agent can handle it.
- Pricing and margin logic. SKU-level pricing decisions involve competitive intelligence, supplier negotiations, and strategic positioning that AI doesn't have context for.
The pattern here is clear: AI handles data transformation, humans handle business judgment. When you draw the line correctly, both sides work better.
Expected Time and Cost Savings
Based on the industry benchmarks and what teams are reporting with AI-assisted product data workflows:
Time reduction: 60–85% less time spent on data entry and SKU creation. A product that took 4 hours to onboard drops to roughly 45 minutes, with most of that being review rather than creation.
Error reduction: Teams report error rates dropping from the 1.5–4% range to under 0.5% when AI handles normalization and validation, largely because the agent applies rules consistently every time. It doesn't get tired at 4pm on Friday and accidentally type "NVY" as "NVT."
Concrete math for a mid-size catalog:
- 500 new products per year, average 20 variants each = 10,000 new SKUs
- At 3 hours average per product manually = 1,500 hours/year
- At 45 minutes per product with OpenClaw = 375 hours/year
- Savings: 1,125 hours/year
- At $35/hour fully loaded labor cost = ~$39,000/year in direct time savings
- Plus reduced error costs, faster time-to-market, and better data consistency
For larger catalogs, these numbers scale linearly. A brand launching 2,000 products per year is looking at $150,000+ in annual savings from this single workflow.
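To rerun this math with your own numbers, a few lines of Python are enough. The function name annual_savings is illustrative. The inputs below are the article's figures, and the linear scaling is the same simplifying assumption made above.

```python
def annual_savings(products_per_year, manual_hours, assisted_hours, hourly_cost):
    """Compare manual vs. AI-assisted onboarding time over one year."""
    manual_total = products_per_year * manual_hours
    assisted_total = products_per_year * assisted_hours
    hours_saved = manual_total - assisted_total
    return {
        "manual_hours": manual_total,
        "assisted_hours": assisted_total,
        "hours_saved": hours_saved,
        "dollars_saved": hours_saved * hourly_cost,
    }


# 500 products/year, 3 hours manual vs. 45 minutes assisted, $35/hour.
mid_size = annual_savings(500, 3, 0.75, 35)
larger = annual_savings(2000, 3, 0.75, 35)
print(mid_size)  # hours_saved: 1125.0, dollars_saved: 39375.0
print(larger)    # dollars_saved: 157500.0
```

The mid-size case reproduces the 1,125 hours and roughly $39,000 above, and the 2,000-product case lands at $157,500, consistent with the $150,000+ estimate.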
Time-to-market impact: Products that used to take 2–4 weeks from supplier delivery to live listing can go live in 2–3 days. For trend-driven categories, this alone can be worth more than the labor savings.
Getting Started
You don't need to automate everything on day one. Start with one product category where you have the cleanest data and the most volume. Build your canonical maps for that category, configure the OpenClaw agent, run 20 products through it alongside your manual process, and compare outputs.
Once you trust the results, expand to additional categories. Within a few weeks, you'll have an agent that handles the vast majority of your product onboarding while your team focuses on the decisions that actually need human brains.
If you want help building this out—or you'd rather skip the DIY phase entirely—browse the Claw Mart marketplace for pre-built OpenClaw agents designed for product data workflows. Or, if you've already built something like this and want to sell it to other operators, check out Clawsourcing to list your agent and start earning from the workflows you've already figured out.