Getting your domain model right is the highest-leverage work in software design — a good model makes everything downstream easier, and AI can dramatically accelerate the discovery process.
Why Domain Modeling Comes Before Data Modeling
Most engineers, when handed requirements, open their database tool and start designing tables. This is a mistake. Database tables are an implementation detail. The domain model — the conceptual representation of the business problem — should exist independently of any storage technology. If you design tables first, you will end up with a schema that reflects your first (usually wrong) understanding of the problem domain rather than the actual business reality.
Domain modeling is the practice of identifying the key concepts in a problem domain, the relationships between them, and the rules that govern them. It is a conversation between engineers and business stakeholders, mediated by a shared vocabulary. AI can accelerate this conversation by generating candidate models from requirements text and asking clarifying questions that force precision.
The discipline of domain-driven design (DDD), introduced by Eric Evans, provides a vocabulary and set of techniques for this work: entities, value objects, aggregates, bounded contexts, and domain events. You do not need to be a DDD expert to benefit from these concepts, but understanding them helps you use AI more effectively for domain modeling because you can ask for outputs using the right vocabulary.
Learning tip: Write your domain model as a glossary first — a list of key terms with one-sentence definitions. Ask the AI to critique this glossary for ambiguity and inconsistency. Fixing the vocabulary fixes most of the subsequent modeling problems.
Identifying Domain Entities and Relationships from Requirements
Domain entities are the things in your system that have identity — they persist over time and are distinguishable from other things of the same type. A User is an entity because user #42 and user #43 are different, even if they have the same name. A Money value is not an entity — $50 is $50 regardless of which instance it is.
Given a requirements document, AI can identify candidate entities, their attributes, and their relationships. The output will not be perfect, but it will be a strong starting point that surfaces 80% of the entities you need and raises the right questions about the remaining 20%.
The key to getting good entity identification is to give the AI requirements that include concrete examples of the system working. Abstract requirements ("users can manage their subscriptions") generate weaker models than concrete scenarios ("Alice upgrades from a Free plan to a Pro plan mid-billing-cycle; she should be charged a prorated amount and her new features should activate immediately").
Relationships between entities should be analyzed along three dimensions: cardinality (one-to-one, one-to-many, many-to-many), lifecycle dependency (does deleting a User delete their Orders?), and directionality (does a Subscription know about its Invoices, or do Invoices know about their Subscription, or both?).
Learning tip: For every relationship your AI identifies, ask: "Who is responsible for maintaining this relationship?" The answer tells you which entity owns the foreign key and which service owns the relationship in a distributed system.
Generating Entity Relationship Diagrams with Mermaid
Mermaid is a text-based diagramming syntax that renders directly in GitHub, Notion, and most modern documentation tools. It is the ideal format for AI-generated ERDs because it is easy to produce, diff, and iterate on. A Mermaid ERD captures tables, columns, types, and relationships in a format that is both human-readable and renderable.
When asking AI to generate a Mermaid ERD, provide your domain model (entities and relationships) and specify the level of detail you want. For a conceptual ERD, you want entities and relationships only. For a logical ERD, you want attributes and data types. For a physical ERD, you want all of the above plus indexes, constraints, and foreign keys.
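As a reference point, here is what a logical-level output might look like for the subscription/invoice relationship discussed earlier. The entity names, attributes, and types here are illustrative assumptions, not a prescribed answer:

```mermaid
erDiagram
    subscription {
        uuid id PK
        uuid customer_id FK
        string plan_name
        string status
    }
    invoice {
        uuid id PK
        uuid subscription_id FK
        int amount_cents
        date issued_on
    }
    subscription ||--o{ invoice : "is billed via"
```

The `||--o{` notation reads as "one subscription has zero or more invoices" — the crow's-foot cardinality is part of the text, so a reviewer can check it in a diff without rendering the diagram.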
A good Mermaid ERD prompt will also specify what to exclude: internal implementation columns (created_at, updated_at) can be left out of early-stage diagrams to reduce noise. You can add them later.
After receiving the generated ERD, validate it against your requirements: can you trace every user story through the data model? If a user story requires information that cannot be derived from the model, you are missing an entity or relationship.
Learning tip: Print or export the ERD and walk through your top five user stories manually, tracing the data joins. If any story requires more than three or four joins, consider whether there is a denormalization or aggregate that would make the read path cleaner.
AI-Assisted Schema Design: Normalization, Indexing, and Data Types
Translating a logical domain model into a physical database schema involves a set of engineering decisions that directly affect performance, maintainability, and correctness. AI can help you reason through these decisions with far more context about your specific system than a generic database textbook can offer.
Normalization: AI can analyze a schema and identify normalization violations (repeating groups, partial dependencies, transitive dependencies) and explain the implications of each. More importantly, it can explain why you might deliberately denormalize for performance — and what consistency trade-offs that introduces.
Indexing strategy: Given your most common read patterns (which AI can infer from your user stories and API design), AI can suggest a starting index strategy. This is valuable because index decisions are often made by gut feeling rather than systematic analysis of query patterns.
Data types: Type choices have significant performance implications. A VARCHAR(255) used for a field that always stores UUID values wastes space and index efficiency. An INTEGER primary key on a table that will grow past 2 billion rows will overflow, because a signed 32-bit integer tops out at 2,147,483,647. AI can review a schema and flag type choices that are likely to cause problems.
The caveat is that AI schema advice is generic unless you give it context about your specific database engine and version, your expected data volumes, and your query patterns. Advice optimized for PostgreSQL 15 may be wrong for MySQL 8, and advice appropriate for 1 million rows may be wrong for 1 billion.
Learning tip: When asking for indexing advice, always state your database engine, expected row counts, and your top five most frequent query patterns. The more context you give, the more useful the advice.
Event Storming with AI
Event storming is a collaborative design workshop technique where a group of stakeholders maps out the system by identifying domain events (things that happened, expressed as past-tense verbs: "OrderPlaced", "PaymentProcessed", "ItemShipped"). The output is a timeline of events that reveals the full business process and surfaces the commands and actors that trigger each event.
AI can simulate a lightweight version of event storming by analyzing requirements and generating a candidate event stream. This is particularly useful for:
- Identifying events that are missing from requirements
- Finding events that imply state machines (an Order goes from Draft to Confirmed to Shipped to Delivered)
- Discovering where bounded contexts should be drawn (events that require a lot of context from adjacent domains often indicate a boundary problem)
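The Order lifecycle mentioned above is a good example of an implied state machine, which Mermaid can also capture directly. The transition labels below are assumed event names, not names taken from any particular requirements document:

```mermaid
stateDiagram-v2
    [*] --> Draft
    Draft --> Confirmed : OrderConfirmed
    Confirmed --> Shipped : ItemShipped
    Shipped --> Delivered : OrderDelivered
    Delivered --> [*]
```

Asking the AI to emit the state machine in this form makes missing transitions (can a Confirmed order be cancelled?) easy to spot.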
To run an AI event storm, provide your domain description and ask the AI to generate all domain events in chronological order for a typical business flow. Then ask it to identify the commands (user actions or system triggers) that produce each event, and the read models (projections) that downstream systems need to consume.
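One lightweight way to review the resulting event stream is to sketch the command-to-event chain for the happy path. This fragment is hypothetical, reusing the event names from above:

```mermaid
flowchart LR
    c1[/Place Order/] --> e1([OrderPlaced])
    e1 --> c2[/Process Payment/]
    c2 --> e2([PaymentProcessed])
    e2 --> c3[/Ship Item/]
    c3 --> e3([ItemShipped])
```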
Learning tip: After the AI generates an event stream, look for events that have no corresponding command. These often indicate missing requirements — something causes the event but you have not described what.
Domain-Driven Design Concepts with AI Assistance
Even if you are not implementing full DDD, three concepts are immediately practical for any system design: aggregates, bounded contexts, and domain events.
Aggregates are clusters of entities and value objects that are always treated as a unit for data changes. The aggregate root is the only entry point for modifications. AI can help identify aggregate boundaries by looking for which entities are always changed together and which consistency rules span multiple entities. A common mistake is making aggregates too large — if your aggregate contains ten entities, transactions will be slow and contention will be high.
Bounded contexts are explicit boundaries within which a model is consistent. The same word ("customer") can mean different things in different bounded contexts (a customer in billing is a payment account; a customer in fulfillment is a shipping address and a list of orders). AI can identify candidate bounded context boundaries by looking for places where the same term appears in the requirements but means different things to different stakeholders.
Domain events are the mechanism by which bounded contexts communicate. Rather than direct service-to-service calls (which create coupling), a bounded context publishes events that other contexts can subscribe to. AI can help design the event schema and identify which events each bounded context needs to publish and consume.
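The publish/subscribe relationship between contexts can itself be sketched in Mermaid. The context and event names below are illustrative assumptions:

```mermaid
flowchart LR
    subgraph ordering [Ordering context]
        op([OrderPlaced])
    end
    subgraph billing [Billing context]
        inv[Create invoice]
    end
    subgraph fulfillment [Fulfillment context]
        ship[Reserve shipment]
    end
    op -- event --> inv
    op -- event --> ship
```

A diagram like this makes coupling visible: if one context's events fan out to every other context, the boundaries probably need rethinking.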
Learning tip: Ask the AI to give you a "ubiquitous language glossary" for your domain — a list of terms with definitions that would mean the same thing to both engineers and business stakeholders. Disagreements about this glossary reveal bounded context boundaries.
Validating Data Models Against Business Rules
A data model that cannot enforce the business rules of your domain is not a good data model. It just moves the enforcement problem into application code, where it is inconsistent and hard to audit.
Business rules include things like: "A user can have at most one active subscription," "An order total cannot be negative," "An invoice must be associated with a customer before it can be sent." Some of these can be enforced at the database level (unique constraints, check constraints, foreign keys). Others require application-level enforcement. AI can help you categorize which rules belong where and generate the constraint definitions.
Ask the AI to analyze your data model against a provided list of business rules and identify for each rule: whether it can be enforced at the database level, what constraint definition would implement it, and what gaps exist if the rule cannot be fully enforced in the schema.
Learning tip: For every business rule that cannot be enforced at the schema level, make sure there is an explicit application-level enforcement point and a corresponding test. Undocumented rules enforced only in scattered application code are the most common source of data integrity bugs.
Hands-On: Domain Modeling Session from Requirements
Work through this exercise with a simple e-commerce requirements set to practice the full domain modeling workflow.
Step 1: Provide requirements and ask for domain entities
Here are the requirements for an e-commerce order management system:
- Customers can place orders containing one or more products
- Each product has a name, description, price, and stock quantity
- Orders go through states: Draft, Confirmed, Paid, Shipped, Delivered, Cancelled
- When an order is Confirmed, inventory is reserved; when Cancelled, inventory is released
- Customers can apply discount codes; each code has a discount type (percentage or fixed amount) and can be single-use or multi-use
- Each order generates an invoice; invoices can be paid by credit card or bank transfer
- Customers can request returns within 30 days of delivery
Identify all domain entities, their key attributes, and the relationships between them. For each entity, indicate whether it is an Entity (has identity) or a Value Object (defined only by its attributes). Do not generate a schema yet.
Step 2: Generate a Mermaid ERD
Based on the domain model you just created, generate a Mermaid ERD (logical level — include attributes and data types, but omit created_at/updated_at columns). Use snake_case for table and column names.
After the diagram, list any assumptions you made about data types or relationships that I should validate.
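Before reading the AI's diagram, it can help to sketch your own conceptual-level fragment and compare. The fragment below is one possible shape for this domain, not an answer key — your AI's output may reasonably differ:

```mermaid
erDiagram
    customer ||--o{ order : places
    order ||--|{ order_line : contains
    product ||--o{ order_line : "appears in"
    order ||--|| invoice : generates
    customer ||--o{ return_request : submits
```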
Step 3: Identify aggregate boundaries
Looking at this domain model, propose aggregate boundaries. For each aggregate:
- Name the aggregate root
- List the entities and value objects within it
- Explain what consistency rule makes them a unit
- Describe the trade-offs of this boundary choice
Also flag any entities you are uncertain about, and explain the options for where they could belong.
Step 4: Run an event storming pass
Generate the complete domain event stream for a customer placing and receiving an order in this system, including the happy path and the cancellation/return paths. For each event:
- Event name (past tense, PascalCase)
- What command or trigger caused it
- What state change it represents
- What other bounded contexts would need to know about it
Step 5: Validate business rules against the schema
Here is a list of business rules:
1. A customer can have at most one active shopping cart
2. A discount code cannot be applied to an order that is already in Paid state
3. An order cannot transition from Shipped back to Confirmed
4. Product stock quantity cannot go below zero
For each rule: can it be enforced at the database level? If yes, provide the SQL constraint definition. If no, describe where in the application it must be enforced and what tests are needed.
Key Takeaways
- Domain modeling should precede data modeling — get the business concepts right before thinking about tables and schemas.
- AI can identify domain entities, relationships, and aggregate boundaries from requirements text, but the output requires validation against concrete user scenarios.
- Mermaid ERDs are the ideal format for AI-generated data models — they are diffable, renderable in most documentation tools, and easy to iterate on.
- Event storming with AI surfaces missing requirements and bounded context boundaries that are invisible in a static data model.
- Every business rule should be explicitly categorized as schema-enforced or application-enforced, with no rules left to ad-hoc implementation.