AI Tools

Custom Claude skills: when to write one (and when not to)

Claude skills are reusable agent capabilities. They're powerful — but writing one for the wrong workflow is wasted effort. Here's the practical guide.

Honam Kang · 5 min read

Custom Claude skills became prominent in 2025 as a way to make recurring agent workflows reusable. They're powerful but easy to misuse — writing a skill for a one-off task is over-engineering, while writing prompts for a recurring complex workflow is under-engineering.

This post is the practical decision guide.

What a skill actually is

A skill is a structured directory containing:

  • SKILL.md — instructions the agent reads when the skill is invoked.
  • Supporting files — templates, scripts, reference data.
  • Trigger conditions — when the LLM should consider using this skill.

When you describe a task, the LLM checks if any skill matches; if one does, it's invoked, and the SKILL.md instructions guide the agent's behavior.

This is more structured than a prompt. Prompts are conversational; skills are reusable, named capabilities.
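
In file terms, the smallest useful skill is a directory containing a single SKILL.md, with the trigger in the frontmatter and the instructions in the body. A hypothetical skeleton (the field names follow the examples later in this post, not an exhaustive list of what the format supports):

```md
---
name: <short, specific name>
when_to_use: <the precise situation in which the agent should invoke this skill>
---

<Instructions the agent follows once invoked: the steps, the output shape,
and any supporting files in this directory it should read first.>
```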

When to write a skill

Three signals that a workflow merits a skill:

1. You've used the same prompt 10+ times

If you've copy-pasted the same prompt template more than 10 times across sessions, it has earned a skill upgrade. The cost of writing the skill is amortized across all future uses.

2. The workflow is multi-step

If the task is "do X, then check Y, then do Z," a skill that codifies the sequence is more reliable than re-explaining the steps each time.

3. The workflow needs specific files or tools

If the task always involves reading specific reference files or invoking specific tools, a skill that captures the setup is faster than re-specifying each time.

When NOT to write a skill

Most workflows shouldn't be skills.

One-off tasks

A task you'll do once or twice this quarter doesn't warrant a skill. A regular prompt is faster to write and adequate.

Highly variable tasks

If each instance of "the same task" actually needs different instructions, a skill won't help — you'll customize each time anyway.

Tasks that already have great built-in tools

Don't write a skill for "read this file" — Claude Code's built-in Read is already the right answer.

Tasks where the agent already does well

If you ask Claude "write me a TypeScript type for this object" and it works 95% of the time, don't add a skill. Skills are for the 5% workflows where unstructured prompting fails.

What good skills look like

Skill: "code-review-strict"

---
name: Code review (strict)
when_to_use: When user asks for a strict / thorough / critical code review of a diff or file.
---

You are reviewing code for defects. Produce a severity-rated list (P0/P1/P2):

- P0: bugs, security issues, broken behavior
- P1: significant code quality issues, design problems
- P2: minor stylistic issues

For each finding:
1. Quote the line(s) in a fenced block
2. State the defect in one sentence
3. Suggest the smallest fix
4. Prefix [STYLE] if it's a stylistic preference, not a defect

Don't list things that are correct. Don't summarize the PR.

Why this works as a skill: the workflow is recurring (you review code regularly), multi-step (severity rating + structured output), and the prompt is specific enough that it needs no per-instance customization.

Skill: "blog-post-from-notes"

---
name: Blog post from notes
when_to_use: When user asks to convert notes/outline into a publishable blog post in our voice.
---

Convert the provided notes into a blog post following these rules:

1. Read `~/dev/_shared/references/brand-voice.md` for tone.
2. Length: 1500-2500 words.
3. Structure: lead → context → 3-5 main sections (H2s) → conclusion.
4. Include:
   - First paragraph that states the problem and the take.
   - Concrete examples in each main section.
   - One "what we'd skip" honest critique.
5. Don't use:
   - Hype language ("game-changer", "revolutionize").
   - Bullet-point-heavy style (prefer paragraphs).
   - Buried lede.

Why this works: highly recurring (you publish weekly), specific (length, structure, voice rules), and references a brand-voice file that the skill auto-loads.

Skill: "swift-codable-migration"

---
name: Swift Codable migration
when_to_use: When adding a field to a persisted Codable struct in mqdirCore.
---

When adding a field to a Codable struct in mqdirCore:

1. Add the field with a default value.
2. Hand-roll `init(from decoder: Decoder)` if not already present.
3. Use `try container.decodeIfPresent(<Type>.self, forKey: .<key>) ?? <default>` for the new field.
4. Add a test in `<Module>MigrationTests.swift`:
   ```swift
   func testMigration_vN_to_vN+1_preservesAllFields() throws {
     let v_N_json = """{ ... }"""
     let decoded = try JSONDecoder().decode(<Type>.self, from: ...)
     XCTAssertEqual(decoded.<existing field>, ...)  // old fields preserved
     XCTAssertEqual(decoded.<new field>, <default>) // new field defaulted
   }
   ```
5. Bump the version number in `<Type>.currentVersion`.
6. Run `swift test --filter <Module>MigrationTests` to verify.

Why this works: project-specific recurring task with strict pattern, where mistakes cost user data. Codifying as a skill ensures every instance follows the right shape.

What bad skills look like

Skill: "be helpful"

---
name: Be helpful
when_to_use: Always.
---

Be helpful. Answer the user's question.

Adds no information. Wastes the skill mechanism.

Skill: "write tests"

---
name: Write tests
when_to_use: When user asks for tests.
---

Write tests for the function provided.

Too vague. The agent already knows how to write tests; this skill doesn't add specifics.

Skill: "do everything"

A 500-line SKILL.md that tries to encompass an entire workflow domain. Skills should be focused — one workflow per skill.

If your skill has more than ~150 lines, you've probably bundled too much. Split it into multiple focused skills.

Skill structure recommendations

Keep SKILL.md short

Aim for <150 lines. Concise skills are reliable; sprawling skills get partially followed.

Define when_to_use precisely

The LLM uses this to decide when to invoke. Vague triggers ("when relevant") cause skill misfires; precise triggers ("when reviewing TypeScript code in our auth module") are reliable.
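
Side by side, the two trigger styles above look like this (frontmatter fragments only; the precise one is scoped to a single workflow you actually have):

```md
# Vague: the agent can't reliably tell when this applies
when_to_use: When relevant.

# Precise: fires only on the workflow it was written for
when_to_use: When reviewing TypeScript code in our auth module.
```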

Reference external files for long content

Don't embed a 500-word brand voice guide in SKILL.md. Reference an external file:

- Read `~/dev/_shared/references/brand-voice.md` for tone.

Skills should be instructions; the data they reference can be elsewhere.

Include negative examples

Don't:
- Use the phrase "moving forward" — corporate-speak.
- Start posts with "In today's fast-paced world" — tired opener.

Negative examples anchor the skill's standards.

Where skills live

Anthropic's skill format places them in a structured directory. Common pattern:

~/.claude/skills/
├── code-review-strict/
│   └── SKILL.md
├── blog-post-from-notes/
│   ├── SKILL.md
│   └── voice-reference.md
└── swift-codable-migration/
    └── SKILL.md

Each skill is a directory. SKILL.md is the entry point. Supporting files live alongside.

For team-shared skills, commit the directory tree to a shared repo and point each team member's Claude Code config at it.

Maintenance

Skills aren't write-once. Two practices:

When the agent doesn't follow a skill instruction, fix the instruction.

If you wrote "Don't use bullet-heavy style" and the agent still produces bullets, the instruction wasn't strong enough. Update it:

Don't use bullet-heavy style. If you find yourself producing 3+ bullets in
a row, convert to a paragraph instead. Bullets are for genuine lists (>5
items, or hierarchical), not paragraph chunking.

The clarification anchors what "bullet-heavy" means.

When you find yourself overriding a skill instruction, update the skill.

If you regularly say "ignore the part about X" when invoking the skill, X shouldn't be in the skill. Remove it.

Common skill patterns

Pattern 1: review skills

code-review-strict, code-review-style, pr-description-writer. One per review type.

Pattern 2: generation skills

generate-react-component, generate-jest-test, generate-changelog. One per output type.
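
As a sketch, a generation skill for one of these might look like the following (generate-changelog is hypothetical here; adapt the categories and rules to your project):

```md
---
name: Generate changelog
when_to_use: When user asks to turn merged PRs or recent commits into a changelog entry.
---

Produce a changelog entry for the release:

1. Group changes under Added / Changed / Fixed / Removed.
2. One line per change, written for users of the product, not readers of the diff.
3. End each line with the PR number in parentheses.
4. Skip internal refactors unless they change observable behavior.
```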

Pattern 3: transformation skills

markdown-to-blog, feature-spec-to-tickets, api-error-to-user-message. One per transformation.

Pattern 4: project-specific patterns

swift-codable-migration, react-form-with-zod, nextjs-route-with-tests. One per recurring pattern in your project.

File-manager setup for skill development

mq-dir's pane setup helps when authoring skills:

  • Pane 1: skills directory ~/.claude/skills/.
  • Pane 2: a SKILL.md you're editing.
  • Pane 3: a reference file the skill points to.
  • Pane 4: cmux session where you test the skill.

You see the skill list, the current edit, the referenced data, and the test session at once.

Verdict

Skills are a powerful Claude Code feature for workflows that:

  • Recur weekly+.
  • Have multi-step structure.
  • Have specific output shapes.
  • Reference specific files or tools.

They're misused when written for:

  • One-off tasks.
  • Highly variable tasks.
  • Tasks the agent already handles well.

Aim for ~10-30 well-crafted skills, each focused, each <150 lines. Don't sprawl.

The investment is real (30 minutes to 2 hours per skill), but for recurring workflows the return compounds daily.

mq-dir is the navigation layer for the artifacts your skill-driven workflows produce. Free, MIT.

Open source

mq-dir is fully open source.

MIT licensed, zero telemetry. Read the source, file an issue, send a PR.


Frequently asked questions

How is a skill different from a regular prompt?

A prompt is something you write inline in a conversation. A skill is a structured, reusable capability with a defined entry point, instructions, and (optionally) tool integrations. Skills are for repeating workflows; prompts are for one-off tasks. Skills also expose 'when to use' triggers to the LLM.

