Improve Your Data Modeling Skills
    Improving data modeling skills increases a data professional's
    productivity and efficiency. Certifications can demonstrate these skills,
    which also improves marketability.
    Motivation
    o essential data modeling skills: entity types, attributes, naming
      conventions, relationships, key assignment, and normalization
    Data Modeling
    o process of analyzing data-oriented structures
    o includes a variety of specific model types
        - types range from models for physical data to models for high-level
          concepts
    o similar to class modeling for object-oriented (OO) design
        - data modelers versus OO developers:
            . model entity types versus classes
            . assign attributes to entity types versus attributes and
              operations to classes
            . define associations between entities versus associations
              between classes; the two are similar
 
    Entity Types
    o understanding entity types is a fundamental skill for data modeling
    o entity types represent:
        - collections of similar objects (such as people, places, and things)
        - non-physical concepts (such as events)
        - example: in an order entry database, Customer, Order, and Item are
          common entity types
    o entity types represent only data, whereas classes also describe an
      object's behavior
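
    The contrast between an entity type and a class can be sketched in
    Python. This is only an illustration: the names CustomerEntity, Customer,
    and full_name are hypothetical, and the First Name and Last Name
    attributes are borrowed from the Customer example in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerEntity:
    """Entity type: attributes only, no behavior."""
    first_name: str
    last_name: str


class Customer:
    """OO class: the same attributes plus an operation."""

    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name = first_name
        self.last_name = last_name

    def full_name(self) -> str:
        # Behavior lives on the class, not on the entity type.
        return f"{self.first_name} {self.last_name}"


# Sample values are made up for illustration.
entity = CustomerEntity("Ada", "Lovelace")
customer = Customer(entity.first_name, entity.last_name)
```

    The entity type carries the data that would end up in a table, while the
    class adds operations on top of the same attributes.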
 
    Attributes
    o entity types have at least one attribute
    o example: attributes for the entity type Customer typically include
      First Name and Last Name
    o developers typically implement attributes as columns in database tables
        - achieving the optimum level of detail is often challenging
    o expressing a single attribute with multiple columns:
        - can provide greater control over data
        - incurs development and maintenance costs
    o example: phone number in North America
        - has three components: Area Code, Prefix, and Line Number
        - rarely needs each component assigned to a separate column
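
    The phone number trade-off can be sketched with Python's built-in sqlite3
    module. The table names customer_a and customer_b, the column names, and
    the sample number are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option A: one column holds the whole phone number.
conn.execute(
    "CREATE TABLE customer_a (customer_id INTEGER PRIMARY KEY, phone TEXT)"
)
conn.execute("INSERT INTO customer_a (phone) VALUES ('415-555-0132')")

# Option B: one column per component; more control, more columns to maintain.
conn.execute(
    "CREATE TABLE customer_b ("
    " customer_id INTEGER PRIMARY KEY,"
    " area_code TEXT, prefix TEXT, line_number TEXT)"
)
conn.execute(
    "INSERT INTO customer_b (area_code, prefix, line_number)"
    " VALUES ('415', '555', '0132')"
)

# With option A, a component is derived only when a query needs it.
area_code_a = conn.execute(
    "SELECT substr(phone, 1, 3) FROM customer_a"
).fetchone()[0]

# With option B, the component is directly addressable.
area_code_b = conn.execute("SELECT area_code FROM customer_b").fetchone()[0]
```

    Both options answer the same question; option B pays for the extra
    control with two additional columns to create, validate, and maintain.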
         
    Naming Conventions
    o naming conventions for data modeling:
        - typically maintained by enterprise administrators
        - essential for making code easy to understand and modify
        - physical and logical data models typically have different naming
          conventions since they have different purposes
    o example:
        - for logical data models: give greater priority to human readability
        - for physical models: focus more on technical considerations
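
    One way to picture the split between logical and physical names is a
    small mapping function. The snake_case rule below is an assumed
    convention chosen for illustration; an actual enterprise standard may
    differ. The helper name to_physical_name is hypothetical.

```python
import re


def to_physical_name(logical_name: str) -> str:
    """Map a readable logical name to a snake_case column name.

    The snake_case rule is an assumed convention, not a standard.
    """
    words = re.findall(r"[A-Za-z0-9]+", logical_name)
    return "_".join(word.lower() for word in words)


# Logical names favor human readability.
logical_attributes = ["First Name", "Last Name", "Address Identifier"]

# Physical names favor technical constraints (no spaces, consistent case).
physical_columns = [to_physical_name(name) for name in logical_attributes]
```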
         
    Relationships
    o relationships between entities:
        - understanding them is a key requirement for developing data
          modeling skills
        - conceptually identical to associations between objects in OO
          programming
        - example: order entry system:
            . Customers place Orders, so placement is a typical relationship
              between customers and orders
            . Customers live at an Address, and Zip Code is part of Address
    o naming relationships often becomes unnecessary when entities’ roles in
      the relationships are specified with sufficient clarity
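
    The "Customers place Orders" relationship can be sketched as a foreign
    key using Python's built-in sqlite3 module. Table and column names are
    assumptions; placed_by_customer_id is a hypothetical name chosen so that
    the column states the customer's role and the relationship itself needs
    no separate name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY, last_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE customer_order ("
    " order_id INTEGER PRIMARY KEY,"
    " placed_by_customer_id INTEGER NOT NULL"
    " REFERENCES customer (customer_id))"
)

conn.execute("INSERT INTO customer (customer_id, last_name) VALUES (1, 'Lovelace')")
conn.execute(
    "INSERT INTO customer_order (order_id, placed_by_customer_id) VALUES (10, 1)"
)
conn.execute(
    "INSERT INTO customer_order (order_id, placed_by_customer_id) VALUES (11, 1)"
)

# Follow the relationship from a customer to the orders that customer placed.
orders_for_customer = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM customer_order"
        " WHERE placed_by_customer_id = 1 ORDER BY order_id"
    )
]

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO customer_order (order_id, placed_by_customer_id)"
        " VALUES (12, 99)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```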
 
    Key Assignment
    o data modeling uses two basic strategies to assign keys to tables:
        - assign a natural key:
            . usually the best option when the table has at least one
              attribute that is unique to the table’s business concept
        - create a surrogate key:
            . data modelers need to add a new column for tables without such
              an attribute
                ~ has no business meaning
                ~ merely serves to identify the entity
            . example:
                ~ addresses do not have an obvious natural key because the
                  entire address is needed to identify one
                ~ data modelers often identify addresses with a surrogate
                  key called something like Address Identifier
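
    The two key strategies can be compared in a short sketch using Python's
    built-in sqlite3 module. The item_code natural key and all sample values
    are hypothetical; address_id stands in for the Address Identifier
    mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Natural key: the (hypothetical) item code is already unique to the
# business concept, so it serves as the primary key directly.
conn.execute("CREATE TABLE item (item_code TEXT PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO item VALUES ('WIDGET-01', 'Widget')")

# Surrogate key: address_id has no business meaning; SQLite generates it.
conn.execute(
    "CREATE TABLE address ("
    " address_id INTEGER PRIMARY KEY,"
    " street TEXT, city TEXT, zip_code TEXT)"
)
cursor = conn.execute(
    "INSERT INTO address (street, city, zip_code)"
    " VALUES ('1 Main St', 'Springfield', '00000')"
)
first_address_id = cursor.lastrowid
cursor = conn.execute(
    "INSERT INTO address (street, city, zip_code)"
    " VALUES ('2 Main St', 'Springfield', '00000')"
)
second_address_id = cursor.lastrowid

# The natural key rejects a second row for the same business concept.
try:
    conn.execute("INSERT INTO item VALUES ('WIDGET-01', 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```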
 
    Normalization
    o process of organizing data within data models
    o makes entity types of data models more cohesive
    o generally involves reducing data redundancy
        - highly beneficial for application development
        - storing objects in relational databases becomes much easier when
          information about those objects is maintained in only one place
    o first three levels of normalization are most common
    o higher levels are possible
    o progressive hierarchy: each level meets all requirements of the
      previous level
    o example:
        - entity type in first normal form (1NF):
            . does not contain repeating data groups
        - entity type in second normal form (2NF):
            . is in 1NF
            . its non-key attributes are fully dependent on its primary key
        - entity type in third normal form (3NF):
            . is in 2NF
            . its attributes depend directly on the primary key, not on
              other non-key attributes
    o incurs performance cost
        - denormalization is also an important skill for data modelers
        - denormalized data models often bear little resemblance to their
          normalized schema
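
    The idea of keeping each fact in only one place, and the cost of joining
    it back together, can be sketched in plain Python. The row layout and
    sample values are assumptions made for illustration.

```python
# Flat rows: the customer's last name is repeated on every order.
flat_orders = [
    {"order_id": 10, "customer_id": 1, "customer_last_name": "Lovelace"},
    {"order_id": 11, "customer_id": 1, "customer_last_name": "Lovelace"},
    {"order_id": 12, "customer_id": 2, "customer_last_name": "Hopper"},
]

# Normalize: customer_last_name depends on customer_id rather than directly
# on the key order_id, so it moves out and is stored once per customer.
customers = {
    row["customer_id"]: row["customer_last_name"] for row in flat_orders
}
orders = [
    {"order_id": row["order_id"], "customer_id": row["customer_id"]}
    for row in flat_orders
]

# Denormalize on read: rebuild the flat rows with a join, paying one
# lookup per order. This is the performance cost of the normalized form.
rejoined = [
    {**order, "customer_last_name": customers[order["customer_id"]]}
    for order in orders
]
```

    The normalized form stores each last name once; the rejoined rows show
    what a denormalized schema would hold to avoid the lookup.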
         
    In addition to proper training, the key to improving data modeling skills
    is practice.