RF
Corsello Research Foundation
Data Modeling Interviews
Lines of Questioning
Basics
Data modeling is about defining standard structures for data
Many data sets may share a common structure
Each thing in the real world should have only one data structure
Each data structure may appear in multiple data models
Data models come in 2 primary flavors
Domain model
Models all entities specific to a domain
Aligns to task automation and workflows
Entity model
Models entities regardless of domain
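As a minimal sketch of the distinction, assuming Python dataclasses as a stand-in for a modeling notation: one entity model (`Person`) is defined once and reused by two domain models. All class and field names here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


# Entity model: one structure for one real-world thing, independent of domain.
@dataclass(frozen=True)
class Person:
    person_id: str
    name: str


# Hypothetical clinical domain model that reuses the Person entity.
@dataclass
class ClinicVisit:
    patient: Person
    reason: str


# Hypothetical payroll domain model that reuses the same Person entity.
@dataclass
class PayrollRecord:
    employee: Person
    salary: float


alice = Person("p-1", "Alice")
visit = ClinicVisit(patient=alice, reason="checkup")
pay = PayrollRecord(employee=alice, salary=50000.0)

# The same data structure appears in both domain models.
print(visit.patient == pay.employee)
```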
Software
Software works with or on data
Software is actually a form of data as well
Software should be keyed to a data model
Software may be built for dynamic data models
Allows for mapping to a specific implementation of a data model
Results in general purpose software
Generally, lower performance
Lower specialization, Higher generality
Software may be keyed to a specific data model
Allows for high-performance, specialized tooling
Allows for integration with workflows specific to the domain
Lower generality
Neither model is better, just different
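The trade-off can be sketched as follows, under the assumption that a "dynamic" model is simply a field-to-type mapping supplied at run time. The function `validate_dynamic`, the class `Sample`, and the 0.5 ml threshold are hypothetical illustrations, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# General-purpose software: the data model is passed in at run time.
def validate_dynamic(record: Dict[str, Any], model: Dict[str, type]) -> List[str]:
    """Return the problems found when checking a record against a model."""
    problems = []
    for name, expected in model.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}")
    return problems


# Specialized software: keyed to one specific data model.
@dataclass
class Sample:
    sample_id: str
    volume_ml: float

    def is_usable(self) -> bool:
        # Domain-specific rule; the threshold is an assumed example value.
        return self.volume_ml >= 0.5


sample_model = {"sample_id": str, "volume_ml": float}
print(validate_dynamic({"sample_id": "s-1", "volume_ml": 1.2}, sample_model))
print(validate_dynamic({"sample_id": "s-1"}, sample_model))
print(Sample("s-1", 1.2).is_usable())
```

The generic validator works for any model but knows nothing about the domain; the `Sample` class can carry domain rules but serves only one model.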
Data Stores
A collection of data based upon a single data model in a coherent repository is a data store
A relational database is a form of repository for data stores
A single RDBMS instance may contain multiple data stores
Data stores may be abstracted by software in numerous ways to enable access
Web-based services (SOAP/REST/JSON/RSS)
Database API (e.g. ODBC/JDBC)
Remote Service (non-web, like CORBA/DCOM/IIOP)
Native API (e.g. code library/dll/jar)
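One way to picture these access abstractions is a single interface with interchangeable back ends. The sketch below assumes a hypothetical `PersonStore` interface with two implementations: a native in-process one and one that goes through a database API (Python's standard `sqlite3` module). The table and column names are invented for the example.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class PersonStore(Protocol):
    """Hypothetical access interface; callers depend only on this."""

    def get_name(self, person_id: str) -> Optional[str]: ...


class InMemoryPersonStore:
    """Native API style: data held in the calling process."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = dict(rows)

    def get_name(self, person_id: str) -> Optional[str]:
        return self._rows.get(person_id)


class SqlitePersonStore:
    """Database API style: data held in a relational repository."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_name(self, person_id: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM person WHERE person_id = ?", (person_id,)
        ).fetchone()
        return row[0] if row else None


def greeting(store: PersonStore, person_id: str) -> str:
    name = store.get_name(person_id)
    return f"Hello, {name}" if name is not None else "Unknown person"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (?, ?)", ("p-1", "Alice"))

print(greeting(InMemoryPersonStore({"p-1": "Alice"}), "p-1"))
print(greeting(SqlitePersonStore(conn), "p-1"))
```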
Data Models
Data models serve several purposes
Standard data models enable standard data formats, which enable sharing
Standard data models enable standard software implementations, which enable application integration
Data models provide standard vocabularies for communicating
Data models provide references for standardizing workflows
A workflow will require and produce data from the model
Better enables defining standard entry and exit criteria
Data format standards are not data models
A standard XML schema is a standard encoding of data that implies structure, but is not itself a data model
A data model is more abstract; it does not constrain implementation, encoding or use
Data models are only part of the bigger picture of standardization of practice
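The difference between a data model and a data format can be illustrated with one model encoded two ways. This is a sketch only: the dataclass `Observation` stands in for the abstract model (a real model would be more abstract than any code), and the element and key names are assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Observation:
    """Stand-in for the data model: structure only, no encoding."""
    subject_id: str
    value: float


# Encoding 1: JSON
def to_json(obs: Observation) -> str:
    return json.dumps(asdict(obs), sort_keys=True)


def from_json(text: str) -> Observation:
    return Observation(**json.loads(text))


# Encoding 2: XML
def to_xml(obs: Observation) -> str:
    root = ET.Element("observation")
    ET.SubElement(root, "subject_id").text = obs.subject_id
    ET.SubElement(root, "value").text = str(obs.value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Observation:
    root = ET.fromstring(text)
    return Observation(root.findtext("subject_id"), float(root.findtext("value")))


obs = Observation("s-1", 37.5)
# Both formats carry the same model; neither format is the model.
print(from_json(to_json(obs)) == from_xml(to_xml(obs)))
```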
Parts and Pieces
The goal is consistency, repeatability, measurability and reuse (sharing)
This goal requires multiple facets:
Standard data models
Standard methodologies
Technical models, algorithms and approaches
Standard business processes
Delineation of responsibility
Processes and procedures
Workflow models
In short, standards
Does not require agreement, only acceptance
Standards do not need to fit everyone's needs, only the cross-section of needs
Standards should be composable to get more detail; that is how to support everyone (a web of standards)
People
All activities are performed for and/or by people
A task is automated to remove a person from needing to perform the task; however, the result of the task will flow to a person
People will appreciate the results of standardization, if done well, but:
There is a fear that automation is meant to put them out of work
There is a dislike for being required to do things in a different manner than they are used to (xenophobia)
People want results, and standardization is not quick
Coping with change
To enable standardization to work well, expect long timelines
Expect people to not support the timelines
Deliver results in the interim, without the promise of the standardization
The grand vision of the resulting utopia from standardization should be avoided
There is no silver bullet, only hard work and good intent
Don't hide the goals, but emphasize the short-term goals
Don't let the short-term goals undermine the grand vision
The long-term goals are the most important to maintain relevance
The short-term goals are the most important to maintain support
Interviewing (finally)
When holding a data modeling / business process session, remember it is a collaborative interview
Get relevant people involved:
Average user in the domain
Hotshot or Hero in the domain
Trouble child or Technophobe in the domain
Minimal managers in general meetings
Meet with management in a separate meeting both before and after for differing views
Get a cross-section of what the domain is
The Session(s)
Ask questions to spur discussion
The people are a cross-section of the domain to ensure active discussion
The facilitator / modeler does not actually create the model, the audience does
Maintain enough control and direction to stay on topic
Some discussions need to go off-topic to get to a point
The modeler guides the model development based upon their knowledge of modeling practices, not the domain
The modeler should understand the domain well enough to know what is on or off track
The outcome of the meetings is a high-level abstract data model and process model
One is of little use without the other in a specific domain
Entity data model sessions
Should result in a domain map indicating what domains this entity model is relevant to
Map should directly intersect the audience
Questions
There are no fixed questions to ask
It is imperative to teach data model basics in most cases
The line of questioning should be exploratory
Try to answer:
What does your domain do (and not do)
Establishes boundary of the domain
Who does your domain contain (and not contain)
Establishes a list of organizations of responsibility and regulatory environment
Establishes a relative size for the domain
Who do you serve and interact with
Establishes a list of consumers of what the domain produces
Establishes a list of suppliers the domain consumes from
How does your domain accomplish this
Establishes a list of processes / practices
Modeling
Continue elaborating the previous questions
Extract from the answers
What do you use (tools, data, techniques)
Where do you use X (for each data entity X)
What is the same/different about each data entity
Establish a baseline of entities
Forms the core data model
Extract fields/attributes
Extract metadata (descriptions)
Extract relations/multiplicities
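What gets extracted for each baseline entity can be recorded in a simple structure during the session. The sketch below is one possible shape, with hypothetical class names (`Entity`, `Attribute`, `Relation`) and an invented `Study` example; multiplicities use the common "0..*" style notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    description: str          # metadata captured from the discussion
    required: bool = True


@dataclass
class Relation:
    target: str               # name of the related entity
    multiplicity: str         # e.g. "1", "0..1", "0..*", "1..*"


@dataclass
class Entity:
    name: str
    description: str
    attributes: List[Attribute] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


def summarize(entity: Entity) -> str:
    """Render an entity as plain text for review with the audience."""
    lines = [f"{entity.name}: {entity.description}"]
    for a in entity.attributes:
        flag = "required" if a.required else "optional"
        lines.append(f"  {a.name} ({flag}): {a.description}")
    for r in entity.relations:
        lines.append(f"  -> {r.target} [{r.multiplicity}]")
    return "\n".join(lines)


study = Entity(
    name="Study",
    description="A planned investigation",
    attributes=[
        Attribute("title", "Human-readable name"),
        Attribute("end_date", "Planned completion", required=False),
    ],
    relations=[Relation("Sample", "0..*")],
)
print(summarize(study))
```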
Build a Model
Still during the meeting
Depict graphically:
Data entities
Entity relations
Process uses / domain mappings
Probe users for issues with the model
What is missing
What is not always true with the model
What is domain specific about the model
What cannot be lived without
What is too costly to require or is inherently optional
Build the Real Model
After the meeting is over
Decompose the model into a logical data model representation (e.g. in UML)
Partition the model
Find natural break points in the entities
Isolate each entity
Resolve dependencies into a parent and child
Extends the relational concept in that the parent data model owns the link to the child, the child is not required to know about the link
Address partition consistency issues
Define any mandatory constraints in the model
Expect implementations will not be 100% able to enforce constraints
Expect implementations to be fully distributed, loosely coupled and inter-organizational
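The parent-owned link and the expectation of imperfect constraint enforcement can be sketched together. Here the parent holds child identifiers and the child carries no reference back. The names `Study`, `Sample`, and `unresolved_links` are hypothetical, and the consistency check is just one assumed way of coping with links that a distributed implementation cannot guarantee.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    """Child entity: knows nothing about which parents link to it."""
    sample_id: str


@dataclass
class Study:
    """Parent entity: owns the links, stored as child identifiers."""
    study_id: str
    sample_ids: List[str] = field(default_factory=list)


def unresolved_links(study: Study, known_samples: Dict[str, Sample]) -> List[str]:
    """Report links the parent holds that cannot currently be resolved.

    In a loosely coupled, inter-organizational setting the child store may
    be remote or incomplete, so dangling links are reported, not prevented.
    """
    return [sid for sid in study.sample_ids if sid not in known_samples]


samples = {"s-1": Sample("s-1")}
study = Study("st-1", sample_ids=["s-1", "s-2"])
print(unresolved_links(study, samples))
```

Because the child is not required to know about the link, the same `Sample` partition can be referenced by any number of parent models without being changed.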
Review and Splanations
Provide the real model to the community
Expect concerns and issues
No word generally means nobody understands, or nobody cares
Expect most issues will be addressed not by changing the model, but by explaining the concepts of the model
Educate, explain and provide examples
Most users will want to directly relate a model to an implementation of the model
It is extremely hard to convey the difference
It is critical to maintain a complete separation of the model from its implementation
If (when) example implementations are shown, they should test the boundary of what is compliant with the model
Questions