I and C Architecture Design 2020
L1: Introduction
L2: Overview architecture frameworks
L3: Meta architecture and design framework
E1: Guest lecture; Digital Sandbox Bharosa
L4: Modular architectures and technology (!)
L6: Big data and data quality
L7: From BPMN to orchestration
L8 = E2: Guest lecture; CRANIUM
L9: Middleware (!!)
L10: Transactions concurrency and blockchain
L11 = E3: Guest Lecture; VKA (missed this one, check effect on exam grade)
L12: Project presentation
L14: Exam questions
Lecture 1
    •    Course objectives
    •    History and developments of ICT in public and private organizations
    •    EDI, XML, XSLT (S2S data exchange)
    •    Need for ICT-systems architecting
Path dependencies: explains how the set of decisions one faces for any given circumstance is
limited by the decisions one has made in the past.
       → Within ICT, this could be the installed base of systems, chosen standards,
procedures and routines that influence future behavior.
       → A first mover’s advantage is temporary: first movers are held back by their earlier
progress, which ultimately causes them to lag behind (‘Wet van remmende voorsprong’, the
law of the handicap of a head start).
Coherency Management: Architecting the Enterprise for Alignment, Agility, and Assurance.
Starting points for I&C design:
    • Multi-actor situation
    • Limited influence/authority of all stakeholders
    • All kinds and types of systems are already available
    • Need for understanding the big picture
    • Creating a shared understanding
    • No ‘optimal’ but a negotiated solution
    • Strategic fit: interrelation of internal/external components
    • Translation from strategy to ICT and vice versa
    • Switching between views: technological, economical, organizational, psychological, user
    • Attention to issues like security and privacy, scalability, robustness, flexibility, and standards
History of I&C development
Case: Ohra
Insurance company, direct writer, product-oriented information systems, large transaction
processing systems, large number of players in the market, no market transparency, large
number of insurance and banking products.
‘MainFrame’ situation in the 1980s; Characteristics:
   • Fully centralized architecture
   • One mainframe built as a monolithic entity
   • All applications reside on the mainframe
   • Dumb terminals
   • Central control and maintenance
   • Only employees have access
   • Simple client/server architecture
Towards Distributed networks in 1990; Characteristics:
   • Multiple applications on various geographic locations
   • More complex architectures
   • Basic interactions
          o File transfer
          o Remote printing
           o Terminal transfer
           o Remote file access
Functional applications 1995:
   • For each product (car, medical, .. insurance) a separate information system
   • For each department (accounting, human resources, ..) a separate information
       system
   • Each department selects technology and solutions independent of other
       departments
   • ‘Management by magazine’
   • No communication between applications
   • However,
           – Similar data in applications
           – Similar functionality used
Insurance companies have many different products as well: B2B, B2C, Direct/Indirect.
Mission-critical legacy systems:
Built in Cobol, still running and reliable, yet very hard to change; organizations wait until the
RoI has been earned back.
Need for integration and architectures.
Electronic Data Interchange (EDI): a way of communication by means of which formatted
business documents are sent electronically from one organization’s computer to another
organization’s computer. Characteristics:
                   - Data standards
                   - Transfer of structured data
                   - Between organizations
                   - Application to application
                   - Across heterogeneous computer platforms
Middleware technology:
    • Hides the complexity of source and target systems.
    • Makes systems even more complex.
    • Deals with protocols.
    • Focus on sharing data between heterogeneous information systems.
→ Use architecture and modularity to implement middleware technology to simplify the
system! (Need to gain an overview of the mess / avoid information redundancy / keep it understandable.)
XML Extensible Markup Language (Need to know the concepts!!)
Content <-> Presentation <-> Structure <-> Content (Format transformation)
‘Separation of concerns’
Check for well-formedness errors: <x><y> … </x> </y> is invalid, because tags must be properly nested.
Difference between HTML and XML: HTML is only for the presentation – what you see in a
web browser. From XML, a stylesheet (XSL) can be used to create HTML, but the content can
also be transformed to PDF, for example.
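A minimal Python sketch (not from the lecture; element names are made up) of these two points: an XML parser works purely on structure and content, with no presentation involved, and it rejects improperly nested tags as not well-formed.
# Separation of content and structure in XML; ill-nested tags are rejected.
import xml.etree.ElementTree as ET

well_formed = "<policy><holder>J. Jansen</holder><product>car</product></policy>"
badly_nested = "<x><y>value</x></y>"   # closing tags out of order, as in the note above

root = ET.fromstring(well_formed)      # structure is explicit and machine-readable
print(root.find("product").text)       # -> "car" (pure content, no presentation)

try:
    ET.fromstring(badly_nested)
except ET.ParseError as err:           # an XML parser refuses documents that are not well-formed
    print("not well-formed:", err)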
By splitting the structure:
    • Modularization: compose systems and enterprises of readily available components.
    • Adaptive enterprise: adapt to changing circumstances.
    • Networks of organizations: enterprise architectures need to be interoperable among
        organizations.
Namespaces: A collection of all element types and attributes names for a certain domain.
   • Prevent naming conflicts
   • Easier to assemble large schemata from smaller ones
   • Each namespace is tied to a uniform resource identifier (URI), which often looks like a URL
   • The namespace name and the local name of the element together form a globally
      unique name known as a qualified name
→ Always use namespaces to avoid naming collisions (see the sketch below)
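An illustrative Python sketch of qualified names (the namespace URIs below are hypothetical): two vocabularies both define an "address" element, and the namespace keeps them apart.
# Two vocabularies both define "address"; the namespace URI disambiguates them.
import xml.etree.ElementTree as ET

doc = """
<order xmlns:cust="http://example.org/customer"
       xmlns:ship="http://example.org/shipping">
  <cust:address>Delft</cust:address>
  <ship:address>Rotterdam</ship:address>
</order>
"""

root = ET.fromstring(doc)
# ElementTree exposes qualified names in {namespace-URI}local-name form.
for child in root:
    print(child.tag, "=", child.text)
# {http://example.org/customer}address = Delft
# {http://example.org/shipping}address = Rotterdam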
Cloud computing (2012)
With just 1 Application Programming Interface (API):
   • Infrastructure becomes less important
   • Software as a Service (SaaS)
   • Scalability is easier
   • Saving costs
   • Many issues.. : Long term sustainability, lock in, …
Lecture 2
   •   Knowledge of design science principles
   •   Understand relationship design and architecture
   •   Knowledge of various conceptualizations of architecture
   •   Knowledge about EA frameworks
                § Zachman
                § Tapscott
                § TOGAF
                § ArchiMate
What is architecture?
Architecture is about how all the parts are connected to one another and about ensuring an
overview of what is going on. In ICT, the architecture is not tangible… Architecture can refer
to the structure, the process, or a profession. Architecting is a process.
Check the Hevner paper on design science:
Rigor (taking theories and models into account) vs. relevance (addressing the practical problem).
Designing in a socio-technical setting sits in between rigor and relevance.
Prototyping is important to make designs tangible.
Levels of design
   • Conceptual design
   • Implementation design
   • Implementation
Business process (re)design? Database design? (Laws, regulations, culture à rules)
Goals: Functions and specifications for process/product. (can be conflicting / Tradeoffs)
Design space: Options, Alternatives (Decision variables, values, attributes, and ranges) (DoF)
Tests or models: an agreed-to procedure or computer program used to transform the values of
the decision variables into an evaluation of the proposed design alternatives.
Starting points: Existing solutions, goals and tests. (Path dependencies / ‘No green field’)
Herder & Stikkelman: Elements of design process model
Analyzing the design space
*often a maximum of three months is given for an architecture project. (Due to changes over
time)
Goals Enterprise Architecture (EA) frameworks
   • Dealing with complexity
   • Defines and interrelates the various elements from multiple (stakeholders’) views
   • Related sub architectures
   • Means to order architecture results
   • A means to guard their completeness, both in terms of scoping and level of detail
   • Insight into the interrelationships of architecture results, enabling the traceability of
       decisions and their impact
   • Refrain from technological details
   • Helps to translate to implementation
Zachman Framework – the first architecture framework
Integration and coordination across enterprise: (Matrix)
    • Rows define stakeholders’ views, addressing the perspectives of the planner, owner,
       designer, builder, programmer, and those involved in operation.
    • The columns define various abstractions of the system. Describes using interrogative
       words, insight may be gained into different aspects of an enterprise (Actors, timing,
       processes, functionality, … )
Disadvantages:
   • Not relating the cells to each other. No relationships are being shown.
   • Time horizon is also not shown.
Tapscott Framework – The five views
Business context: the business parts and responsibilities; shows the actual work processes.
(Requirements and needs.)
IT context: the applications, the software, and the infrastructure that support the actions and
processes. (IT solution, supply.)
Dynamic Enterprise Architecture (DYA) framework – Show a process
BIT = Business, Information, Technical.
Model-driven architecture (OMG-MDA) – not a real framework. Independent of the
platform. Enables reuse.
IEEE1471 Framework – formal elements of software architecture
It takes a system level view – can be decomposed.
     • System: A collection of components organized to accomplish a specific function or set
        of functions.
     • Architecture: The fundamental organization of a system embodied in its components,
        their relationships to each other, and to the environment, and the principles guiding
        its design and evolution.
     • Architecture description (AD): A collection of products to document an architecture.
     • View: A representation of a whole system from the perspective of a related set of
        concerns.
Basic concepts IEEE1471
TOGAF – it’s the standard – The Open Group Architecture Framework
→ ADM: Architecture Development Method. → The ADM cycle = process model.
Collection of best practices, models, and checklists. The architecture function is central. IT-
centered, slowly adding more business architecture.
Disadvantage: too much to handle. Very bureaucratic. ‘All is included’.
Archimate – a description language – Closely related to TOGAF.
ArchiMate connects architectural domains:
   • Broader scope, but less detail than UML (software) and BPMN (processes)
   • Does not replace more specialized languages such as UML, BPMN, and others.
ArchiMate layers and Aspects
Resource-based view
   • Resources as organizational assets
   • Resource attributes: Valuable, Rare, Inimitable, Non-substitutable (VRIN)
   • à Human resources, budget, …
Dynamic capabilities
   • To change the resources to comply with new environment
   • Aspects: path dependencies, …
A business event is something that happens (externally) and may influence business
processes, functions, or interactions.
A business process represents a sequence of business behaviors that achieves a specific
outcome such as a defined set of products or business services.
Lecture 3
Enterprise ICT-architecture = to support the design (not the actual design itself)
(Meta) Framework:
    • Understand architectural framework in this course
    • Understand the core concepts of architecture (framework, layers, views, principles)
In the exam: elements and principles from certain architectures.
Principles guide. Standards can be used.
Typical balancing aspects:
   • Reasonable level of abstraction
   • Adequate coverage of the real world
   • Reasonably familiar and assessable concepts
   • Communication vehicle
   • Link to both strategy and implementation
   • Describes the current situation and evolution projects, and prescribes desired situations
   • Defines standards, principles and guidelines
   • Entities differ while maintaining similarities in domains
Know the environment, drivers and developments; market, customers and segments; available
resources and expertise; distribution channels; products. → These are situational factors
influencing the architecture. From these, derive a set of business requirements.
Programme of Business Demands (PBD)
Bridge between the; business environment and strategic objectives – and the enterprise
architecture.
Serves as a guide. → Include a goal hierarchy.
MoSCoW analysis:
Must, Should, Could, Would.
→ What kind of trade-offs are you expecting?
Layered-based engineering:
   • Each layer can be used to represent one type of entities
   • Reduce complexity and scope or understand relationships
   • Design each layer independent of other layers
   • Use of different views and objectives
   • Reduce complexity
   • One layer can be designed relatively independently of others
How are the layers connected to each other?
Layers can be split or merged, depends on what you want to show.
Grouping: element aggregates or composes concepts that belong together based on some
common characteristics.
Business architecture – the highest layer
   •   Architecture as strategic capability (as core competences), a vision to guide
       development of information systems in a ‘complex’ organization.
           o Single capability of the firm cannot provide a sustainable competitive
              advantage to the firm
           o Competitive capabilities of the firm should be “complementary” or
              “synergistic”
   •   Business Architecture takes into consideration the business strategy of the firm, its
       long-term goals and objectives, the technological environment, and the external
       environment
   •   Business Architecture: the arrangement of responsibilities around the most
       important business activities (e.g., production, distribution, marketing) or the
       economic activities (e.g., manufacturing, assembly, transport)
   •   Per business domain a different goal hierarchy is possible.
   Business process architecture
   • Collection of business processes triggered by events:
          o Each customer interaction results often in a business process
          o Periodic triggers
          o Internal triggers
   • Interdependencies among sequences of tasks
   • Operational (primary) and control (secondary) business process
   • Include human as well as automated tasks
   • Process decomposition: from value chain to detailed tasks
Information architecture
Describes the relationship between the business processes, applications and information
sources aimed at storing, processing, reusing and distribution of information across
information resources.
→ Information architecture is the organization of information to aid information sharing
among actors. E.g., a vital records registry.
Application architecture
   • Describes the software applications, components and objects, and the relationship
       between these parts.
   • Best-of-breed vs. frameworks
   • Integration and middleware
   • IT systems vs. enterprise architecture
           o IT systems architecture: decompose into individual functional software
              components
           o Enterprise architecture: decompose into manageable parts
Technical architecture
   • Technical architecture is about generic facilities used by many application systems. It
       is about functionality that is a common need of many different systems.
   • Topics include Next Generation Infrastructure (NGI), grids, wireless networks
   •  In a NGI no new hardware is bought for each new system, but a standard
      infrastructure is provided.
→ Changes due to the cloud!
Implementation, Control, and Maintenance
   • After architecture has been designed it needs to be implemented, controlled, and
      maintained.
   • Involves further development of systems (new releases)
   • Often consumes most of the resources and is usually the bulk of the IT expenses
   • Implementation will likely deviate from intentions and result in a revised architecture
Architectural guidelines principles and standards (!!)
   • Design guidelines
           o Supporting design, e.g., use of open-source software
           o Often cannot completely be followed and need a trade-off (access vs security)
           o Direct design decisions and are based on experiences of other designers
   • Architecture principles
           o Rules one has to follow, e.g., the front office focuses on customers while the
               back office focuses on efficiency
           o Emphasize ‘doing the right things’ or give direction to behavior and are often
               based on proven practices
           o Are expected to give significant improvement
   • Implementation principles
           o Helps to translate the architecture into implementation
           o Can be used to develop prototypes
   • Standards
           o Technology standards, e.g., HTTP, XML
           o Data standards, e.g., name before address; an address contains street, number, zip
           o Application standards, e.g., Oracle for databases
These can all be categorized using layers or other categories
Principle-based Design and Architecting
Principles are leading instead of models. Useful when solving ill-structured ‘complex’
problems, which cannot be formulated in explicit and quantitative terms.
à Principles guide the designer in a certain direction, are generic by nature and do not
constrain creativity or possible solutions.
Name of the principle
Statement; what we do
Rationale; why we do it
Implications; when we (don’t) do it
E1: Digital Sandbox, Bharosa
Building the next generation of public services requires a Digital Sandbox
The government is there to help and support the citizens. Yearly, 125 billion euros is spent on
public services in the Netherlands, and these costs are rising, while citizens expect these
services to be free, available, and well-established (personalized services, digital inclusion,
proactive services, responsible data sharing, life event support). To provide these services,
data exchange takes place. At the moment, most agencies use portals (one-way vs. two-way
information flow portals). Companies do not have such portals yet for Standard Business Reporting.
There are also many calls for improvement; too difficult and vague to apply for subsidies or
services. Unfortunately, many recent big ICT projects have failed to improve this situation.
What to do?
GovTech: Startups and SMEs want to provide public services directly to citizens,
(HuurPaspoort, Cleverbase).
Barriers for public service innovation:
   1. Lack of guidance
   2. Access to data at government agencies
   3. Building blocks from government agencies
   4. No shared learning experimentation platform
   5. Heterogeneous / non-interoperable public service building blocks
   6. Complex mix of rules and regulations
   7. Lack of funding
   8. Unclear process for the transition from experiments to roll-out
Way of working: Innovation pipeline. Connecting research with policy making; the policy
cycle.
    1. What is Digicampus?
The quadruple helix approach to public service innovation.
Combines: Government Agencies, User groups, Software providers (corporate & startups),
and Academia. The Digicampus is the Digital Sandbox for learning and experimentation.
Digicampus is helping to formulate the right agenda and policies for services. It combines
research with policy making.
   2. Why do we need a digital sandbox?
Barriers 1-5 can be solved by using a digital sandbox.
   3. What is a digital sandbox?
Goal; smoother transition from prototype to implementation.
   4. What are the high-level requirements of the digital sandbox?
   5. What are the use cases?
Over 100 calls for collaboration. E.g., help the elderly with digital authorization, help citizens
with personal financial management, enable less tech-savvy citizens to use voice
authentication via the phone, identify the barriers for digital inclusion.
Lecture 4
Modular Architectures and Technology – makes it possible for re-use of certain
(sub)systems. Thereby it only has to be developed once. Thereafter, it can be provided to
many other institutions. You do need one well-working UI.
   •   Understand principles of modular architectures
   •   Being able to modularize
   •   Understand basics of web services
   •   Knowledge about the main web services protocols (XML, SOAP, REST, WSDL, UDDI,
       BPEL4WS)
   •   Understand how web services can be used to create a loosely-coupled, modular
       application architecture
   •   Learn about developments in the web services protocol stack
Why modular architecture?
  • Reuse of “working and proven” modules – many ICT projects do fail
  • Shorter development time by reuse
  • Focus on integration and orchestrating of modules
  • Dealing with complexity: higher reliable systems
  • Flexibility to modify and alter systems
  • Building for modularity looks easy, but is challenging
        o Interface design and configuration is a key aspect
         o Information hiding: high cohesion within modules and loose coupling
              between modules (Parnas, 1972) – a small sketch follows after this list
                 § Providing the intended user with all the information needed to use the
                    module correctly and nothing more
                 § Providing the implementer with all the information needed to
                    implement the module correctly and nothing more
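A small Python sketch of Parnas-style information hiding, using a hypothetical CustomerRegistry module: callers only see register() and lookup(); the storage choice is a hidden design decision that can change without affecting them.
# The interface gives the user exactly what is needed and nothing more;
# the implementation detail (an in-memory dict) is hidden and replaceable.
class CustomerRegistry:
    def __init__(self):
        self._records = {}                 # hidden design decision (could become a database)

    def register(self, customer_id: str, name: str) -> None:
        """All the caller needs to know: provide an id and a name."""
        self._records[customer_id] = name

    def lookup(self, customer_id: str) -> str:
        return self._records.get(customer_id, "unknown")

registry = CustomerRegistry()
registry.register("C001", "Ohra client")
print(registry.lookup("C001"))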
What is a module?
Objects and components
Whether something is developed and used as an object or a component depends on your viewpoint.
   •   Component-oriented programming focuses on interchangeable code modules that
       work independently
   •   Black box: components don’t require you to be familiar with their inner workings to
       use them; the focus is on the interface
   •   Components typically serve a specific purpose and functionality (e.g., identification)
   •   Object-oriented (OO) programming focuses on the relationships between classes
   •   Objects are at a more granular (smaller) level and serve as building blocks of larger
       components/systems
   •   OO enables software reuse
   •   Once those classes are compiled, the result is monolithic binary code
A component
   • is language independent
   •   A way of organizing and thinking about the runtime structures of a system
   •   Loose coupling: with components, you loosen the coupling between classes and
       between the developers responsible for them
   •   Stateless: components can be replaced and substituted in near real-time dependent
       on interface
   •   Self-contained: enabling black-box reuse
   •   Component-and-connector model
   •   Combinations of components can be new components
An object
   • Can be viewed as a type of component
   • Is an abstraction and needs to be given context
   • An object’s class instance has specific attributes and behaviors
   • Encapsulation: implementation details are hidden, and only methods are exposed
   • Inheritance as a way to reuse – this requires knowledge about the implementation
       details of the base class
   • Polymorphism (many forms): subclasses can define their own behaviors and attributes
       while retaining some of the functionality of the parent class (see the sketch after this list)
   • If multiple developers work on the same code base, they have to share source files.
       In such an application, a change made to one class can trigger a massive re-linking of
       the entire application and necessitate retesting and redeployment of all the other
       classes (White-box reuse)
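An illustrative Python sketch (hypothetical insurance classes, not from the slides) of the OO concepts listed above: encapsulation, inheritance as white-box reuse, and polymorphism.
# Encapsulation: _premium is hidden, only yearly_cost() is exposed.
class InsuranceProduct:
    def __init__(self, premium: float):
        self._premium = premium

    def yearly_cost(self) -> float:
        return 12 * self._premium

# Inheritance reuses the base class; polymorphism lets the same call behave differently.
class CarInsurance(InsuranceProduct):
    def yearly_cost(self) -> float:
        return super().yearly_cost() + 50   # e.g. a fixed road-assistance fee (made up)

for product in (InsuranceProduct(20.0), CarInsurance(20.0)):
    print(type(product).__name__, product.yearly_cost())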
Modularization Guidelines of Parnas
  • The effectiveness of a “modularization” is dependent upon the criteria used in
      dividing the system into modules
  • Parnas (1972) recommends that systems should be decomposed along lines
      encapsulating design decisions. Design decisions that are likely to result in changes
      need to be hidden.
  1. Minimize the interactions with the environment and standardize the services
      interfaces
  2. Create a well-defined interface and make a set of service level agreements
  3. Every component contains a logical cluster of business objects and information needs
      that can be used to operate a business process autonomously
  4. There should be clear interfaces describing the inputs, outputs and responsibilities to
      ensure accountability
  5. Establish governance mechanisms to integrate the components not only at the
      technical level, but also at the organizational level
Design principles for software modules
   • A module should capture a business function
   • A module should be self-contained (no information)
   • Communication between modules should be minimized (loosely coupled)
   • A module should be reusable, this is determined by:
           o The scalability
           o Interface extendibility
           o Ability to configure
           o Ability to replace
   •   Number of interactions
   •   Don’t forget the system response to the actions of actors
   •   Alternate courses of action are important
Design principles for creating a modular architecture
   • Information should be captured only once at the source and reused by other
       modules (coordination)
   • There should be a (central) process control component integrating business process
       steps with functionality provided by modules
   • The module should, whenever possible, be offered as reliable and proven
       commercial-off-the-shelf (COTS) software products supplied by a vendor
   • Be able to manage the quality of modules (QoS, performance, security, ..)
   • A module should be reusable and capture a business function
   • Use of versioning (extensibility, multiple instances)
   • Develop domain-specific modules (use of namespaces)
Orchestration is a way of controlling the dependencies between the modules.
Protocols
HTTP = Hypertext Transfer Protocol.
Drawback: no service-level support.
SOAP: Simple Object Access Protocol
    • Platform-independent protocol
    • SOAP messages provide an envelope in order to exchange structured data
           o Header: meta-information needed to process the contents
           o Body: the data
Drawback: not simple enough, and therefore not a success. (An envelope sketch follows below.)
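A minimal sketch of the envelope idea: a SOAP 1.1 message with a header (meta-information) and a body (data), parsed in Python. The payload elements (getQuote, product, transactionId) are hypothetical.
# The SOAP envelope wraps a header and a body; the data sits in the body.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
message = f"""
<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Header><transactionId>42</transactionId></soap:Header>
  <soap:Body><getQuote><product>car-insurance</product></getQuote></soap:Body>
</soap:Envelope>
"""

envelope = ET.fromstring(message)
body = envelope.find(f"{{{SOAP_NS}}}Body")               # the actual data lives in the body
print(body.find("getQuote/product").text)                # -> "car-insurance"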
Web services
  • Convergence of technology streams
          o Ubiquitous infrastructure (IP, HTTP)
           o Proven approaches (CORBA, RPC)
          o XML
          o Business standards (EDIFACT, X.12)
  • Middleware for middleware
  • Middleware agnostic
  • RPC or messaging-based
  • Access remote applications
  • Accepted by most software vendors
Goal: To abstract business logic from implementation
   • Web services perform encapsulated business functions
   • Loosely coupled, self-contained, stateless properties (independent)
SOAP: provides an envelope around an XML message in order to exchange structured
information.
UDDI: a directory that offers a way to locate and register web services.
WSDL: an XML-based language that describes the capabilities of a web service.
REST: Representational State Transfer;
Based on the underlying architecture of the WWW and its two core specifications: URIs and
HTTP. → Simple and scalable, yet, lacking a formal contract such as WSDL, more governance
is needed.
Difference SOAP request vs REST request
SOAP
   • Not bound to HTTP
   • WSDL interface and contracting support
   • Performance
   • A POST statement is needed; a plain URL cannot be used
REST
   • Simple
   • Suitable for simple CRUD (Create, read, update, delete) applications
   • High performance
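A hedged sketch of the REST style from this comparison: a plain HTTP GET on a resource URL, commonly returning JSON, whereas SOAP requires POSTing an XML envelope. The endpoint below is hypothetical.
# REST: the URL identifies the resource and a plain GET retrieves its representation.
import json
import urllib.request

url = "https://api.example.com/customers/C001"     # hypothetical resource URL

with urllib.request.urlopen(url) as response:      # GET is the default HTTP method
    customer = json.loads(response.read())         # REST services commonly return JSON
print(customer)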
API: Application Programming Interface
API vs. Webservices
   • “An API is a set of functions and procedures that allow the creation of applications
        which access the features or data of an operating system, application, or other
        service”
   • An API acts as an interface to an application to enable communication
   • A webservice exposes an API over HTTP
   •   The only difference with a webservice is that the latter facilitates interaction between
       two machines over a network
   •   In general, all webservices are APIs but not all APIs are webservices
   •   API can use any style of communication
   •   API is often part of the application, whereas webservice is only a wrapper
Webservice vs. Micro service
  • Both are language and platform independent
  • Microservices often perform a single function
  • Microservices often have a finer granularity and are used at the programming level
  • Microservices are often used to break down a monolithic software application into
      reusable components
  • Webservices are often HTTP-based, whereas microservices might not be
Typical roles of an ICT-architect
   • Create a library of reusable components
   • Managing library of components
   • Enabling reuse of components when projects develop a new component (which
        comes at a price)
   • Ensuring interoperability, adaptability, scalability, security, etc of components
   • Stimulating reuse
How to determine services?
  • Coarse or fine grained services?
          o Business / Composite / Application
   • Top-down vs. bottom-up
          o Top-down: strategy, processes and areas of business (DCE) (preferred, yet not
             always possible because of ‘no green field’)
          o Bottom-up: existing application services
   • Functional vs. process-based
          o Functional: derived using use-case diagrams
          o Business processes: take interdependencies into account
          o For both, consider case and alternative scenarios to support the life-cycle of systems
Lecture 5
Hints and tips on mid-term presentations;
   • Do not forget the societal side of the problem (not only the technology)
   • Opposing and conflicting requirements (also between stakeholders)
Delft’s Architectural and Design Framework
Continues with presentations from the other groups.
Lecture 6
Big data quality and data architecture; quality is enhanced when the architecture is stable.
Big data makes for the need for a better architecture.
   •   Understand Big Data characteristics and impact on information quality
   •   Being able to evaluate a dataset and an information architecture based on
       information and systems quality
   •   Understand why various views need to be taken into account when designing an
       information architecture and understand its limitations and benefits
   •   Gain an overview of information quality improvement methods
   •   Know information architecture basics: decoupling point, information flow vs. store
       approach, stewardship
Why is information quality such an issue?
→ Data glitches = systemic changes to data which are external to the recorded process (a small specification check is sketched after this list)
    • Changes in data layout / data types
           o Integer becomes a string, fields swap positions, etc
    • Changes in scale / format
           o Dollars vs. euros
    • Temporary reversion to defaults
           o Failure of a processing step
    • Missing and default values
           o Application programs do not handle NULL values well..
    • Gaps in time series
           o Especially when records represent incremental changes
    • Missing data
           o Match data specification against data – are all attributes present?
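A small Python sketch (assumed field names and records, not lecture material) of matching data against a specification to spot such glitches: missing or NULL values and type changes such as a number turning into a string.
# Check each record against a simple data specification (field -> expected type).
spec = {"customer_id": str, "premium_eur": float, "start_year": int}

records = [
    {"customer_id": "C001", "premium_eur": 19.95, "start_year": 2019},
    {"customer_id": "C002", "premium_eur": "19,95", "start_year": None},   # glitched record
]

for i, record in enumerate(records):
    for field, expected_type in spec.items():
        value = record.get(field)
        if value is None:
            print(f"record {i}: missing or NULL value for '{field}'")
        elif not isinstance(value, expected_type):
            print(f"record {i}: '{field}' has type {type(value).__name__}, "
                  f"expected {expected_type.__name__}")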
System data ≠ real data
Can be due to the system (crashing, sending, recovering) as well as due to human mistakes.
Definitions
Information quality (IQ) is the characteristic of information to meet the functional,
technical, cognitive, and aesthetic requirements of information producers, administrators,
consumers and experts.
Quality information is information that meets specifications or requirements.
Information quality: a set of dimensions describing the quality of the information produced
by the information system. Information quality is one of the six factors that are used to
measure information systems success.
→ Information quality is ‘fitness for use’.
Not all information is always needed; think of the dimensions that are needed, e.g. accuracy,
completeness, timeliness.
It is subjective. What counts as quality depends on:
     • the stakeholders’ view and context
     • the dimensions considered (quality has many aspects)
     • ‘how’ it has been measured.
IQ framework
 Perspective                  Criteria
 Content                      Relevance, obtainability, clarity of definition
 Scope                        Comprehensiveness, essentialness
 Level of detail              Attribute granularity, precision of domains
 Composition                  Naturalness, identifiability, homogeneity, minimum
                              unnecessary redundancy
 View consistency             Semantic consistency, structural consistency, conceptual
                              view
 Reaction to change           Robustness, flexibility
 Values                       Accuracy, completeness, consistency, currency/ cycle time
IQ dimensions
    • Accuracy
    • Timeliness
    • Relevance
    • Quantity
    • Completeness
IQ issues (system)
    • Format
    • Security
    • Consistency
    • Availability
System quality issues
    • Accessibility
    • Response time
    • Reliability
    • Flexibility
    • Integration (Inter-operability)
Quality improvement techniques
   • Standardization
   • Record linkage (connect data referring to the same object – see the sketch after this list)
   • Data and schema integration (master data management)
   • Source trustworthiness (stewardship, selecting trustworthy data, recollecting of data
       at source)
   • Process control (checks and control procedures)
   • Process redesign (reward accurate data entry)
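An illustrative record-linkage sketch in Python with made-up records: two sources spell the same customer slightly differently, and a simple string-similarity measure links them. Real tools use much richer matching rules.
# Link records from two sources that likely refer to the same real-world object.
from difflib import SequenceMatcher

source_a = [{"id": "A1", "name": "J. de Vries", "city": "Delft"}]
source_b = [{"id": "B7", "name": "Jan de Vries", "city": "Delft"}]

def similarity(r1, r2):
    """Crude similarity over name and city; production tools use richer, weighted rules."""
    return SequenceMatcher(None, r1["name"] + r1["city"],
                           r2["name"] + r2["city"]).ratio()

for a in source_a:
    for b in source_b:
        if similarity(a, b) > 0.8:      # the threshold is an arbitrary choice here
            print(f"link {a['id']} <-> {b['id']} (likely the same customer)")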
Retrospective improvement techniques
   • Data audits and reviews
   • Cleaning focus (duplicate removal, merge/purge, name & address matching, field
       value standardization)
   • Acquisition of new data
   •   Error localization and correction
   •   Cost optimization (cost-benefits)
V’s of big data → these influence the quality
    1. Volume
    2. Velocity
    3. Variety
    4. Variability
    5. Veracity (accuracy)
    6. Validity
    7. Volatility
    8. Visibility
    9. Viability
    10. Vast resources
    11. Value
General Data Protection Regulation (GDPR)
Data storage vs. deletion
Individual vs. aggregated data
Storage of transactions (proof)
Privacy-by-design
Data portability – how well can you transfer the data to other systems?
Information as an asset – it is the ‘glue’ in many processes and organizations.
→ The vision is to have an information infrastructure that is able to answer all kinds of
questions.
A variety of information needs
   • States of a product request (operational)
   • Improvement of service (operational)
   • Making of special offers (marketing)
   • Do your customers like your services (sentiment analysis)
   • Who are the most beneficial customers (customer management)
   • To gain an overview of all products bought by one user (customer management)
   • To identify trends and developments (business intelligence)
   • To determine if decisions are just and fair (control and accountability)
   • And many more
Who is responsible for maintaining the data?
What is the information quality?
When there are conflicts between sources: which source has the right information?
How can redundancy be avoided?
Response time and speed?
Who has the authority to remove low-quality data or to block hackers?
Chinese walls = not being allowed to collect data from one site to another
Always think about the system and information quality, it differs per context!
Opposing and complementary approaches
   • The information systems solution
          o Brings the information required to the persons using information systems
          o Technology is used to solve the problem
          o E.g.: create a multi-purpose portal
   • Organizational redesign solution
          o Requires the redesign of organization structures
          o Decision-making is where the pertinent information is
          o E.g.: decision-making as much as possible in the front office
   • “information management” and “knowledge management”
          o Information management: focuses on the business processes and functions
             that create, manipulate, and manage information
          o Knowledge management: focuses on how organizational units interact and
             how organizational units add to the store of information
Information architecture
   •   Is a blueprint describing the relationship between the business processes,
       applications and information sources aimed at storing, processing, reusing and
       distribution of information across information resources
   •   The organization of information to aid information sharing among actors
   •   The information architecture determines which information will be stored in which
       database, application, software components and so on. It is a meta information
       model
   •   Helps to navigate and find the right information. The art and science of structuring,
       organizing and labeling information so people can find it
   •   High-level map of information requirements of an organization
Information stewardship principle
Information needs and roles
    • Managers
    • Customers
    • Administrative staff
    • ICT staff
    • …
What to do with the information?
  • Compile a registry
  • Develop a “yellow page” (library)
  • Construct a proto ontology
  • Map flows, sequences, and dependencies among organizational units and business
      processes
  • Identify:
          o Knowledge stewards
          o Gatekeepers
          o Isolated islands
          o Narrow communication channels
          o Improvement points
Examples of information architecture principles
Information store vs. Information flow approach
    • ‘Need to know’ principle
    • Data is driven by incoming data instead of queries
Information decoupling point
Where is the information being stored?
Lecture 7
Process architecture: from business process management to webservice orchestration.
Business process management tools – BPM tools
Know difference between BPM and webservice orchestration… Depends on application and
technology.
   •   Understand complexity of determining services
   •   Understand relationship between processes and services
   •   Know the terms; workflow, WFM and BPM, BPMN, XPDL, BPEL and webservice
       orchestration
   •   Know typical trade-offs
   •   Being able to make trade-offs given a case study
Process architecture provides an abstract overview of all processes, the business
units/departments/persons involved and their relationship with other processes, businesses,
information and application architecture.
    • Abstract: not too detailed, but detailed enough to determine the impact.
    • A process: represents the sequence of activities performed to accomplish a certain
       business function as a hierarchical representation of process steps, subprocesses,
       activities, tasks and decisions.
    • At least three types of processes:
           o Product or primary process (“critical processes”)
           o Supporting processes (finance, HRM, …) (secondary processes)
           o Control and management processes (secondary processes)
Where to put the “needed information” (user, case, product?) in the overall process.
“True Business Process Management is an amalgam of traditional workflow and the ‘new’
BPM technology. It then follows that as BPM is a natural extension of – and not separate
technology to – workflow, BPM is in fact the merging of process technology covering 3
process categories: interactions between (i) people-to-people; (ii) systems-to-systems (iii)
systems-to-people – all from a process-centric perspective. This is what true BPM is all
about.” – Jon Pyke, CTO Staffware.
In combination with Lean Six Sigma – who is in control? → Process owner and authority.
Basic idea of Workflow Management Systems (WFMS)
   • Separation of processes, resources and applications
   • Focus on the logistics of work processes, not on the contents of individual tasks
   • Process perspective (tasks and the routing of cases)
   • Resource perspective (workers, roles)
   • Case/data perspective (process instances and their attributes)
   • Operation/application perspective (forms, application integration)
   • Control perspective (progress, tracking and tracing)
Separation of Control and Execution
Control layer = WFM, Application layer = execution
Users in BPM
 WSDL             Service presentation                    Functionality description
 UDDI             Service registration and publications   Publication of service interfaces
                                                          (service level)
 BPMN             Service selection and composition       Selection of service providers,
                                                          comparison, and customization
 BPEL4WS          Service execution                       Combining existing and new
                                                          services to execute a process
Is this a service architecture or a service library?
A display of all processes that are being handled by several applications.
→ Compare with the Zachman framework: no time axis, and no (or little) relation between
cells. The as-is to to-be situation is not shown. Therefore, it is more a library / overview of the
services than an architecture.
Web Service Orchestration (WSO) defines the control and data flow between web services
to achieve a business process. Orchestration defines an “executable process” or the rules for
a business process flow defined in an XML document which can be given to a business
process engine to “orchestrate” the process, from the viewpoint of one participant. (Carol
McDonald, SUN)
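A hedged Python sketch of this orchestration idea from one participant's viewpoint: a process definition (here a plain list of steps instead of a BPEL XML document) is handed to a tiny "engine" that invokes each service in order and passes the data along. The service names and functions are hypothetical.
# Orchestration: a central engine controls the flow and the data between services.
def check_credit(order):   return {**order, "credit_ok": True}
def reserve_stock(order):  return {**order, "reserved": True}
def send_invoice(order):   return {**order, "invoiced": True}

process_definition = [check_credit, reserve_stock, send_invoice]   # the control flow

def orchestrate(order, steps):
    """Execute the process by calling each 'web service' step in order."""
    for step in steps:
        order = step(order)
    return order

print(orchestrate({"order_id": 1}, process_definition))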
Business Process Execution Language for Web Services (BPEL4WS):
XML Process Definition Language (XPDL):
  • An XML-based language to interchange business process definitions between different
      BPM products
  • Standardized and maintained by the Workflow Management Coalition (WfMC)
  • XPDL defines an XML schema for specifying the declarative part of workflow /
      business process
  • XPDL is often used for the exchange of BPMN diagrams
  • XPDL contains elements to hold graphical information and executable aspects,
      whereas BPEL focuses exclusively on the executable aspects of the process (no
      graphical aspect)
Business Process Modeling Notation (BPMN)
   • is a standardized graphical notation for drawing business processes in a workflow.
   • Maintained by the Object Management Group (OMG)
   • Enables communication between business and ICT
   • BPMN is constrained to support only the concepts of modeling that are applicable to
       business processes
   • Four basic element categories
          o Flow objects (events, activities, gateways)
          o Connecting objects (sequence flow, message flow, association)
          o Swimlanes (Pool, lane)
          o Artifacts (data objects, group, annotation)
   • Can be executed in BPEL
www.bpmn.org
Who is the orchestrator (the one in charge)? → There can be more than one in a process!
The one that is in charge is also responsible. Who manages the flow? To whom?
Granularity of services:
Transparency of underlying processes: White-box / Grey-box / Black-box
Data and information flow
Balancing central and decentral management
Which roles should be executed centrally and which decentrally?
Central IT: Facilitating reuse and sharing of business processes, support making of
agreements, overseeing funding and investments, initiating service portfolios, change
management initiation, mandate standards and processes.
Business units: business case, prioritization, service levels, change board, defining standards,
services, processes, policies.
Decision tables / business rules are more applicable when there are many ‘if-then’
statements. They are simpler than a flow diagram in that case (a small sketch follows below).
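An illustrative decision table in Python (rules and values are made up): each row is a business rule mapping conditions to an outcome, which stays readable where a flow diagram with many 'if-then' branches would not.
# Each row: (is_customer, order_value_over_100, outcome).
decision_table = [
    (True,  True,  "10% discount"),
    (True,  False, "5% discount"),
    (False, True,  "free shipping"),
    (False, False, "no discount"),
]

def decide(is_customer: bool, over_100: bool) -> str:
    """Look up the first rule whose conditions match, instead of nested if-then logic."""
    for customer_rule, value_rule, outcome in decision_table:
        if customer_rule == is_customer and value_rule == over_100:
            return outcome
    return "no rule matched"

print(decide(True, False))   # -> "5% discount"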
Design Guidelines – Orchestration (same as the rules for modular design)
   • Information should be captured only once at the source and reused by other
       modules (coordination)
   • There should be a (central) process control component integrating business process
       steps with functionality provided by modules
   • The module should, whenever possible, be offered as reliable and proven
       commercial-off-the-shelf (COTS) software products supplied by a vendor
   • Be able to manage the quality of modules (QoS, performance, security, ..)
   • A module should be reusable and capture a business function
   • Use of versioning (extensibility, multiple instances)
   • Develop domain-specific modules (use of namespaces)
E2: Introduction to Data Management, Data Privacy & Data Security
Nick Martijn, CRANIUM
Data management is the practice of organizing and maintaining data processes to meet
ongoing information lifecycle needs.
Data privacy focuses on the identification and mitigation of the risk of non-compliance with
the requirements of the GDPR in Europe, ensuring proper handling of personally identifiable
information.
Data security focuses on the implementation of the technical measures of the GDPR and all
other activities necessary to safeguard confidentiality, integrity, and availability of
information.
   1. CRANIUM
Belgium based company, mainly focusing on data management in relation to data privacy
and data security (consultancy). Active in projects in 12 countries world-wide. KPMG did
many assessments, gather situation and bring out advice. CRANIUM is more involved in the
implementation of the advice (data management etc).
Topics: GDPR Compliance, Information security, Data strategy, Test data management, Data
governance, Big data and advanced analytics, Privacy and security retainer, data retention
and deletion, internet of things. → Data-driven enterprise.
Approach: First aid, support, control, improve.
    2. Enterprise Data
Complexity due to organization.
What data does the company have? What external data sources does the company have? How
does it need to manage, secure, and protect that data?
Consider the (main) data flow.
4 axes: Data / People & governance / Processes / IT
Many linked together systems (APIs), much duplicated data, extremely hard to get an
overview.
Data management is more than just buying extra applications!! You must consider the human
aspect as well, since humans interact with the processes and IT. The consulting work is often
more about ‘training and educating people’ than about ‘fixing the environment’.
Structured vs unstructured
Internal vs external
Static vs non-static
The Case for data
Compliance: Comply with legal requirements in order to maintain the ‘license to operate’, take
care of personal data, make sure there is only one version of the truth (no duplicates).
→ GDPR, regulatory reporting, financial statements, other data quality regulations, risk
management, BCBS 239, Basel III, Solvency II. (CRO / CFO)
Efficiency: Accurate and timely management information, lower maintenance costs, sharing
data and knowledge throughout the enterprise.
→ Improved process efficiency, improved decision making, decreased running costs (OpEx),
data-driven enterprise, knowledge management, management information, decreased
capital costs (CapEx), Business-IT alignment (through data). (CIO / COO)
Growth: Innovative revenue models, improved insights into the requirements of the customer.
→ Artificial intelligence, big data / data lakes, data-driven business models, advanced data
analytics, internet of things. (CEO)
Should a CDO (Chief Data Officer) be added within companies?
   3. Data Management
Two parts: Data Life Cycle & Data Management.
To define the maturity of the DMM within a company.
Data governance: who is the owner of the data? How do the stakeholders within a company
work with the same data? → Everyone is using the data, but nobody feels responsible.
Rules and roles need to be established to safeguard the privacy and security of the system.
Check: Data Lineage
    4. Data Privacy
Start with Data Minimization.
Main concept: Principles, Legal ground, Subject rights, Processing, Data breaches, DPO.
→ Principles: transparency, accuracy, purpose limitation, data minimization, integrity and
confidentiality.
   1. Collect only what you need
   2. Do not make useless copies
   3. Safeguard the quality of data
   4. Discard data when obsolete
Privacy by default (and design) is the new advancement that is being considered.
Full back-ups vs. incremental back-ups
   5. Data Security
How to get a grip on the risk that a company has.
Who can access what data?
Check: DMB model
Block / disable hardware, or train people.
ISMS (based on ISO 27x) = Information Security Management System = need to have
everything in place
   6. Client Cases (eg)
 Main Challenges                               Solutions
 Compliance to GDPR                            Information Security Management System (ISMS)
 Compliance to health & safety guidelines      Data privacy officer role
 (regarding e.g. the use of alcohol and drugs)
 Data security concerns                        Data privacy impact assessment
                                               Renewed Code of Conduct
                                               Privacy policies and measures
Lecture 9 (!!)
Enterprise Application Integration (EAI) and Middleware
   •   Learn types of EAI approaches
   •   Learn various classifications of middleware technology
   •   Learn and understand characteristics middleware technologies:
          o RPC, MOM, Transaction monitors, brokers, database, Distributed objects
   •   Understand challenges of and solutions for distributed transactions (!)
   •   Be able to select integration approaches and middleware technology
Data integrity
   • Refers to the property that the information stored in a system corresponds to what is
       being represented in reality.
   • Refers to aspects like consistency, security, reliability, timeliness, non-repudiation,
       non-manipulation, that need to be warranted
   • CIA Triad – these are conditions for information security:
           o Confidentiality – Data should only be accessed by authorized persons
           o Integrity – ensures that data is accurate and consistent. In other words, data
               stored in a system should correspond to what is being presented in reality.
           o Availability – authorization to make data available at the right time to fulfill a
               need
   • Ensuring data integrity requires data management/governance and middleware
→ Data integrity touches almost all data and system quality dimensions.
Batch-oriented systems are often not available all the time (maintenance, back-ups, updates).
Objects can be programmed and interact with each other.
Database replicators = copy of a database (for faster response time)
Batch data extraction = not continuous
Legacy applications = a software program that is outdated or obsolete. Although a legacy
app can still work, it may be unstable because of compatibility issues with current OSes.
Wrappers = are placed in front of a legacy system; they provide access to the system.
Application integration approaches
   • Information-oriented
           o Data replication
           o Data federation
           o Interface processing
           o Semantic integration
   • Service-oriented
   • Portal-oriented
   • Business process-oriented
Often mixed approaches are being used.
The point of departure (business, information, databases, applications) often influences the
outcomes.
The ‘no green field’ (=path dependencies) might block certain approaches.
Information-oriented integration
   1. Interface processing: App A → API → App B
   2. Data replication: Datab A ↔ Datab B
          a. Very easy to replicate data; minimizes risks and increases speed
   3. Data federation: Virtual Datab 1 ↔ Datab A, Datab B, Datab C, Datab D
          a. Often older databases; a virtual database is set up so that it feels like ‘one’
             database
   4. Semantic integration: a network of connected nodes
          a. Bottom-up approach (no standardization…)
Service-oriented integration
   • Using web-services
   • Reusability
   • Need for transaction support (if one of the Apps is not working!! Causes
       inconsistencies)
   • Limited view on Service-oriented Architectures (SOA)
→ A composite application is made that consists of App A, App B, App C for example.
Portal-oriented integration (Externalizing information)
   • Web-services
   • Common interface
   • Heterogeneous content
   • Externalizing information
   • User integrates
Human → Web browser → Portal Server → Internet, Datab, App, Office
Very easy to use, but not always considered to be a real application integration because ‘the
human’ is doing the integration job. It is important to have a common user interface and
heterogeneous content in the back end.
Business Process-oriented integration
Example of the webservice orchestration
   • Process and service oriented
   • Control information (business process and the data)
   • Combining middleware technology and business process automation (BPM)
   • The future of EAI (according to some)
Highly intuitive form
Middleware classification
   • Interaction patterns (1-1, 1-n, n-n)
   • Synchronous or asynchronous
         o Directly (in real time = the internet) or not directly (example = email)
   • Connection-oriented or connectionless
   • Language specific or independent
   • Proprietary or standard-based
         o Dependent or Object-management-group standards
   • Embedded vs. Enterprise
         o Embedded = “hidden”, is more and more applicable (all IoT devices) – Lack of
             security and lack of processing power often
Vree’s model on middleware
Asymmetry is very important for personal data – not all data in the same place
Interaction patterns (directed)
Conversational mode is the ideal situation – a conversation can happen
Synchronous vs. asynchronous communication
Synchronous: an immediate reaction is expected (seconds), the caller does not do anything
else until a response is received, and errors are seen immediately. (Calling.)
Asynchronous: emailing. (A small sketch follows below.)
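A small Python sketch of the difference: a synchronous call blocks the caller until the reply is there ('calling'), while an asynchronous message is dropped on a queue and handled later ('emailing'). The service function is hypothetical.
# Synchronous vs. asynchronous interaction with a slow service.
import queue
import threading
import time

def handle(request):                     # some service that takes noticeable time
    time.sleep(0.1)
    return f"reply to {request}"

# Synchronous: the caller waits for the answer before doing anything else.
print(handle("sync request"))

# Asynchronous: the caller drops a message on a queue and continues immediately.
mailbox = queue.Queue()

def worker():
    request = mailbox.get()              # picked up whenever the receiver is ready
    print(handle(request))

threading.Thread(target=worker).start()
mailbox.put("async request")
print("caller continues immediately")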
Middleware language
Standard (Windows)                                  Proprietary (stock market)
Interoperability                                    Cover areas standards do not address (yet)
Easy replacement                                    Differentiate from competitors
Economies of scale resulting in low costs           Customer lock-in
Longer life-cycle                                   Support can be part of the buying process
Long-lasting standardization processes
Support not dependent on one vendor
Levels of middleware (main focus is application)
   • Application
   • Domain-specific middleware services
   • Common middleware services
   • Distribution middleware
   • Host infrastructure middleware
   • Hardware devices
Everything that is used to support this lecture in reaching the listener.
Approaches to integration
Integration at the data level is often most preferred, yet also most difficult. The internet, for
example, is only integrated at the UI level.
Data level integration
   • Typically, relatively easy approach
   • Extract data directly from databases
   • Most applications make it possible to circumvent their business logic and access data
       directly
   • Transform data
   • Frequency:
           o Scheduled
           o Instantaneous
           o Triggers
Application level integration
   • Method level: distributed computing
   • Often integration based on accessing APIs
   • API exposes application service to outside world
   • API functionality dictates how an application can be accessed
           o Business process
           o Low level services
           o Data
   • Very wide variety of levels of services and quality of APIs, some are extremely
       complex
   • Wrappers: provide an interface based on some standard, e.g. CORBA, Java, .NET,
       webservices
   • Wrappers expose business services as methods in an interface
   • Wrappers require effort to build, test, and maintain (disadvantage) – a small sketch
       follows after this list
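A hypothetical Python sketch of a wrapper: a legacy routine with an awkward, record-based interface is hidden behind a small class that exposes the business service as a clean method.
# Pretend legacy code: expects a flat 'id;name' record and returns a cryptic status code.
def legacy_policy_lookup(flat_record: str) -> str:
    return "00" if flat_record.startswith("C") else "99"

class PolicyServiceWrapper:
    """Exposes the legacy behaviour as a business-level method."""
    def is_active(self, customer_id: str, name: str) -> bool:
        status = legacy_policy_lookup(f"{customer_id};{name}")
        return status == "00"

wrapper = PolicyServiceWrapper()
print(wrapper.is_active("C001", "J. de Vries"))   # -> True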
User-interface integration
   • Sometimes only way an application logic can be called
   • Screen scraping (green screen)
           o Application thinks it is interacting with users
   • Primitive but often very necessary
           o Not always efficient navigation
           o Not generally scalable
           o Must follow the interface format, extract the results, and ignore the formatting
               returned from the application
           o High maintenance costs
   • Extremely relevant for internet (HTML)
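A toy screen-scraping sketch: the “screen” below is a made-up HTML page, and the scraper extracts one
value while ignoring the rest of the formatting. Its brittleness (it breaks as soon as the layout changes)
is what drives the high maintenance costs mentioned above.

    import re

    # Made-up HTML "screen" as the application would render it for a user.
    screen = """
    <html><body>
      <h1>Policy overview</h1>
      <table><tr><td>Policy</td><td>P-1234</td></tr>
             <tr><td>Balance</td><td>EUR 512.30</td></tr></table>
    </body></html>
    """

    match = re.search(r"Balance</td><td>EUR\s+([\d.]+)", screen)
    balance = float(match.group(1)) if match else None
    print(balance)  # 512.3 -- brittle: breaks as soon as the layout changes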
7 types of middleware
    • Remote Procedure Call (RPC)
           o Client-server interaction that makes it possible for the functionality of an
              application to be distributed across multiple platforms. Local program
               requests a service from a program located on a remote computer, without
               having to know the network details. Used for synchronous data transfers, where the
               client and server need to be online during the communication.
    • Message Oriented Middleware (MOM)
           o Makes it less complicated to use applications spread over various platforms.
               Enables the transmission of messages across distributed applications. Also has
               a queuing mechanism that allows the interaction between the server and the
               client to happen asynchronously. (Overlaps with message brokers)
    • Message brokers
           o Communication by using queues supporting asynchronous and synchronous
              message passing. Validity check on data structures and completeness.
              Database for supporting publish-subscribe models. (is the central / receiving
              system of messages)
    • Database middleware
           o Between the databases and the applications. Call-level interface (CLI)
              between databases (drivers) and applications.
   •   Transaction Processing (TP)
           o Two major types: TP Monitors and Application servers. Transaction; unit of
              work consisting of a number of interactions with a beginning and an end.
              Generally: tightly coupled, method sharing, need to change source and
               destination IS for transactions à strict monitoring. Two phases: prepare and
               commit.
   •   Application service (wrappers)
           o TM becomes applications server by incorporating application logic, typically
              web-enabled, more and more functionality of message brokers included:
              messaging, transformation, intelligent routing.
   •   Distributed objects (very complicated)
           o Middleware or application development? Creating distributed applications,
               for cross-enterprise method sharing. E.g. CORBA and DCOM. Client side = stub,
               server side = skeleton.
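A toy contrast between the first two types, using only the Python standard library (so not a real
middleware product): an RPC-style call blocks until the result comes back, while a MOM-style interaction
puts a message on a queue and lets a consumer process it asynchronously.

    import queue
    import threading

    def remote_service(x):
        return x * 2

    # RPC style: the caller blocks until the "remote" result comes back.
    print("RPC result:", remote_service(21))

    # MOM style: the sender puts a message on a queue and continues; a consumer
    # picks it up whenever it is ready (asynchronous, loosely coupled).
    messages = queue.Queue()

    def consumer():
        while True:
            msg = messages.get()
            if msg is None:          # sentinel: stop consuming
                break
            print("MOM processed:", remote_service(msg))

    worker = threading.Thread(target=consumer)
    worker.start()
    messages.put(21)                 # the sender does not wait for a reply
    messages.put(None)
    worker.join()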
(Object) Transaction Monitors (OTM) (used as a back-up against information/transaction losses)
   • Ensure that sequence of actions is committed or rolled back
   • Incorporate application logic encapsulated in a transaction
   • Often use of persistent message queues
   • Creates overhead
   • ACID requirements
            o A: Atomic – the set of tasks and interactions is executed in its entirety
            o C: Consistent – the state of all applications remains consistent
            o I: Isolated – other applications transact with the TM as if they were alone
            o D: Durable – data is not lost when changes are redone or undone
OTM vs. Message brokers
  • OTM
          o Synchronous communication
         o CORBA, DCOM, RMI, …
  • Message brokers
         o Asynchronous communication
         o Messages XML, SOAP
         o Queuing
  • Combinations
          o COM+ = COM, MTS (Microsoft Transaction Server) and MSMQ (Microsoft
            Message Queue)
Lecture 10 – Concurrency and transactions in distributed architectures
Difficult topic; it requires a good understanding of the concepts.
   •   Understand the potential use of distributed ledger technology (blockchain), smart
       contracts, and zero-knowledge proofs
   •   Be able to explain the problems and the need for transactions mechanisms for
       distributed application architecture (due to the loss and manipulation of information)
   •   Be able to explain the working of concurrency (do things in parallel) and locking
       mechanisms, 2PL, and deadlock control and blockchain
   •   Be able to explain transactions concepts, 2PC, ACID properties, transaction control,
       transaction monitors, roll back (undo everything what you’ve done) and
       compensation
Examples of Distributed architectures:
Insurance company (having data stored at multiple locations)
Multiple servers (S1 waits for S2 to process and proceed, S1 holds in between à deadlock)
Data replication (Users and data are distributed, how to process change?)
Transaction = a series of actions, carried out by a user or application, which accesses or
changes contents of database or other system.
Distributed systems = having everything decentralized, which makes it more secure and
faster, but also more difficult. Example: book a flight, hotel and rental car à get insurance policy
information which is distributed over a number of (legacy) systems.
In databases, a transaction is a logical unit of work on the database à it transforms data from one
consistent state to another, although consistency may be violated during the transaction
Transaction can result in:
   • Success – transaction commits and database reaches a new consistent state
   • Failure – transaction aborts, and database must be restored to consistent state
       before it started à is rolled back or undone
   • Committed transactions cannot be aborted
   • Aborted transaction that is rolled back can be restarted later
Properties (ACID):
   • A: Atomicity = ‘All or nothing’ property
   • C: Consistency = Must transform database from one consistent state to another
   • I: Isolation = Partial effects of incomplete transactions should not be visible to other
       transactions
   • D: Durable = effects of a committed transaction are permanent and must not be lost
       because of later failure
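A minimal commit/rollback sketch with SQLite (standard library, in-memory database; the “account”
table is made up). It illustrates the “all or nothing” property: because the simulated failure triggers a
rollback, the partial update to A is undone.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account (name TEXT, balance REAL)")
    con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 0.0)])
    con.commit()

    try:
        # Transfer 50 from A to B as one unit of work.
        con.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        raise RuntimeError("simulated failure between the two updates")
        # con.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
        # con.commit()   # only reached on the success path
    except RuntimeError:
        con.rollback()   # roll back: the partial update to A is undone

    print(con.execute("SELECT * FROM account").fetchall())  # [('A', 100.0), ('B', 0.0)]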
Clearing transactions
A centralized ledger tracks asset movement within the financial system between institutions.
Users have a key to make transactions, which contain a timestamp. The ledger stores all
transactions as a list. Ledgers are maintained by banks or an intermediary and need to
be secured. The key issue is how to secure the ledgers so that they cannot be manipulated.
The solution: Distributed ledger technology (= Block chain)
    • Distributed autonomous ledger
           o Timestamped blocks that hold batches of valid transactions
           o Each block includes the hash of the prior block
           o The linked blocks form a chain
           o The blocks are distributed and synchronized (=distributed ledger)
           o Creating new blocks is known as mining
    • Integrity is created by distributed consent (majority voting)
    • Every node in a decentralized system has a copy of the block chain
    • Longest chain represents the truth
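A toy hash chain (Python standard library only) to illustrate the bullets above: each block stores the
hash of the prior block, so tampering with an old block breaks every later link. There is no mining or
consensus here, just the chaining idea.

    import hashlib, json, time

    def make_block(transactions, prev_hash):
        block = {"time": time.time(), "tx": transactions, "prev": prev_hash}
        block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
        return block

    genesis = make_block(["genesis"], prev_hash="0" * 64)
    block1 = make_block(["Alice pays Bob 5"], prev_hash=genesis["hash"])
    block2 = make_block(["Bob pays Carol 2"], prev_hash=block1["hash"])

    # Tampering with block1 invalidates its stored hash (and block2's 'prev' link).
    block1["tx"] = ["Alice pays Bob 500"]
    recomputed = hashlib.sha256(json.dumps(
        {k: block1[k] for k in ("time", "tx", "prev")}, sort_keys=True).encode()).hexdigest()
    print(recomputed == block1["hash"])  # False: the chain no longer verifies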
Opinion: Not suitable for personal information because the information is open and can be
accessed by everyone.
Problem: The blockchain gets longer and longer, so more processing power is needed.
Therefore, sometimes an older part of the blockchain is stored elsewhere. ‘Tangles’ can also
be an option.
Old: 10 people à 1 Database
New: 10 people à X number of Ledgers (=distributed ledgers, many nodes: data integrity)
Possibilities of blockchain:
   • Enabling tokenization
   • Time proof sealing
   • Data record immutability
   • Viewing history
   • Automatic execution of transactions (smart contracts)
   • Various blockchain infrastructures have different properties
Types of blockchain infrastructure (Think of who has access?) (!!)
Try to minimize what you store inside the blockchain.
Evaluation of the blockchain technology
Smart contracts describing inputs needed resulting in actions.
  • Self-executing contracts with the terms of agreement between buyer and seller being
       directly written into lines of code
  • Code and agreements are contained and stored in a distributed ledger
  • The code controls the execution, the transactions are trackable and irreversible
  • No central trusted third party (TTP) needed
Zero-knowledge and blockchain
   • Need: Validate cryptocurrency transactions managed on a blockchain and combat
       fraud without revealing data about which wallet a payment came from, where it was
       sent, or how much currency changed hands?
   • Why? Protection of personal data related to the identity of individuals (date of birth,
       bank statements, transaction histories, education credentials)
   • Zero-knowledge proof is a method by which one party (the prover) can prove to
       another party (the verifier) that they know a value x
           o Without disclosing any information apart from the fact that they know value x
           o Statement being proved must include the assertion that the prover has such
               knowledge to avoid fraud
   • A zero-knowledge proof of knowledge is a special case when the statement consists
       only of the fact that the prover possesses the secret information.
Properties:
Soundness – Everything that is provable is true, no cheating
Completeness – Everything that is true has a proof
Zero-knowledge – only the statement being proven is revealed
à eg; client password when login on a system
The collection of technologies contributes to benefits; not only blockchain or only smart
contracts.
Governance OF blockchain (programmers) // Governance BY blockchain (zero-knowledge)
Concurrency Control
   • Process of managing simultaneous operations on systems without having them
      interfere with one another
   • Prevents interference when two or more users are accessing database or shared
      object simultaneously and at least one is updating data
   • Although two transactions may be correct in themselves, interleaving of operations
      may produce an incorrect result
   • Potential problems:
          o Lost update problem
                 § Successfully completed update is overridden
          o Uncommitted dependency problem
                 § Occurs when one transaction can see intermediate results of another
                     transaction before it has committed (might be rolled back)
          o Inconsistent analysis problem
                 § Occurs when transaction reads several values but second transaction
                     updates some of them during execution of first transaction
Check the read(x) and write(x) operations!! These are important (see the sketch below).
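A step-by-step trace of the lost update problem, written as the read(x)/write(x) schedule from the
slides: T1 and T2 both read the same data item before either has written back, so T1’s update is lost.

    x = 100                     # shared data item

    t1_local = x                # T1: read(x)
    t2_local = x                # T2: read(x), before T1 has written back
    t1_local += 10              # T1 computes its update
    t2_local += 20              # T2 computes its update
    x = t1_local                # T1: write(x) -> 110
    x = t2_local                # T2: write(x) -> 120, T1's update is lost
    print(x)                    # 120 instead of the correct 130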
Solutions to the problem:
Serializability: Objective of a concurrency control protocol is to schedule transactions in such
a way as to avoid any interference.
    • Could run transactions serially, but this limits the degree of concurrency or parallelism in
        the system (speed!!)
    • Serializability identifies those executions of transactions guaranteed to ensure
        consistency:
            o Schedule – Sequence of reads/writes by a set of concurrent transactions
            o Serial Schedule – schedule where operations of each transaction are executed
                 consecutively without any interleaved operations from other transactions
    • No guarantee that results of all serial executions of a given set of transactions will be
        identical
A precedence graph shows ‘who waits for whom’.
If it contains a cycle, the schedule is not serializable.
Distributed Transaction Management
    • Divided into a number of sub-transactions, one for each site that has to be accessed,
       represented by an agent
    • Systems must ensure indivisibility of each sub-transaction:
       • Synchronization of sub-transactions with other local transactions executing
           concurrently at a site
       • Synchronizing of sub-transactions with global transaction running simultaneously
           at same or different sites
Two-Phase Commit (2PC) Protocols
   • Governs whether a transaction is to be aborted or committed
   • Can be used for nested transactions
   • Two phases: Voting phase and Decision phase
   • Coordinator asks all participants whether they are prepared to commit transaction:
         o If one participant votes abort, or fails to respond within a timeout period,
             coordinator instructs all participants to abort transaction (veto)
         o If all vote commit, coordinator instructs all participants to commit
   • All participants must adopt global decision
Two-phase Commit (2PC)
If a participant votes abort, it is free to abort the transaction immediately (on its own).
In Bitcoin a ‘majority vote’ is used instead, because with many nodes it would happen too often
that a single node fails to respond and thereby aborts the whole transaction.
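A toy sketch of the voting and decision phases described above. The participants are plain Python
functions here (named after the flight/hotel/car example); in reality they would be remote resource
managers reached over the network.

    def two_phase_commit(participants):
        # Phase 1 (voting): ask every participant whether it can commit.
        votes = [p("prepare") for p in participants]
        decision = "commit" if all(v == "yes" for v in votes) else "abort"
        # Phase 2 (decision): all participants must adopt the global decision.
        for p in participants:
            p(decision)
        return decision

    def participant(name, can_commit=True):
        def handle(message):
            if message == "prepare":
                return "yes" if can_commit else "no"
            print(f"{name}: {message}")
            return "ack"
        return handle

    print(two_phase_commit([participant("flight"), participant("hotel")]))                  # commit
    print(two_phase_commit([participant("flight"), participant("car", can_commit=False)]))  # abort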
Summary Concurrency
Locking can be used to deny access to other transactions and so prevent incorrect updates
   • Most widely used approach to ensure serializability
   • Generally, a transaction must claim a shared (read) or exclusive (write) lock on data
       item before read or write
   • Lock prevents another transaction from modifying item or even reading it, in case of
       write lock
Locking rules:
   1. If transaction has shared lock on item, can read but not update item
   2. If transaction has exclusive lock on item, can both read and update item
   3. Reads cannot conflict, so more than one transaction can hold shared locks
       simultaneously on same item
   4. Exclusive lock gives transaction exclusive access to that item
   5. Some systems allow transaction to upgrade read lock to an exclusive lock, or
       downgrade exclusive lock to a shared lock
à two-phase locking (2PL)
   •   Transaction follows 2PL protocol if all locking operations precede first unlock
       operation in the transaction
   •   Two phases for transaction:
          o Growing phase – acquires all locks but cannot release any locks
          o Shrinking phase – releases locks but cannot acquire any new locks
2PL to prevent Lost Update Problem
2PL to prevent Uncommitted Dependency Problem
2PL to prevent Inconsistent Analysis Problem
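A minimal 2PL-style sketch for the lost update example above: each transaction acquires an exclusive
lock on x before reading (growing phase) and only releases it after writing (shrinking phase), so the
read/write pairs can no longer interleave.

    import threading

    x = 100
    x_lock = threading.Lock()

    def add(amount):
        global x
        x_lock.acquire()      # growing phase: obtain the (write) lock on x
        local = x             # read(x)
        local += amount
        x = local             # write(x)
        x_lock.release()      # shrinking phase: release, acquire nothing new

    t1 = threading.Thread(target=add, args=(10,))
    t2 = threading.Thread(target=add, args=(20,))
    t1.start(); t2.start(); t1.join(); t2.join()
    print(x)                  # always 130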
   •   Deadlock is an impasse that may result when two (or more) transactions are each
       waiting for locks held by the other to be released
   •   Deadlocks should be transparent to the user, so DBMS should restart transactions
   •   Three general techniques for handling deadlock:
          o Timeouts (monitoring)
                  § abort one (or both) transaction(s), commonly used
                   § Disadvantage: a transaction may be aborted without there being a deadlock,
                       and long-running transactions are penalized
          o Deadlock prevention
                  § Looks ahead to see if transaction would cause deadlock and never
                      allows deadlock to occur
                          • Wait-Die- only an older transaction can wait for younger one,
                              otherwise transaction is aborted (dies) and restarted with
                              same timestamp
                          • Wound-wait- only a younger transaction can wait for an older
                              one. If older transaction requests lock held by younger one,
                              younger one is aborted (wounded)
          o Deadlock detection and recovery
                   § Allows deadlocks to occur, monitors for them, and breaks them
                  § Monitor constructs of wait-for graph (WFG) showing transaction
                      dependencies
                          • Create a node for each transaction
                          • Create edge T1 à T2, if T1 waiting to lock item locked by T2
                  § Deadlock exists if and only if WFG contains cycle
                  § WFG is created at regular intervals
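A small sketch of deadlock detection on a wait-for graph: the WFG is a dictionary in which an edge
T1 → T2 means “T1 waits for a lock held by T2”, and a deadlock exists if and only if the graph contains
a cycle.

    def has_cycle(wfg):
        visiting, done = set(), set()

        def visit(node):
            if node in visiting:
                return True                      # back edge -> cycle -> deadlock
            if node in done:
                return False
            visiting.add(node)
            for nxt in wfg.get(node, []):
                if visit(nxt):
                    return True
            visiting.remove(node)
            done.add(node)
            return False

        return any(visit(n) for n in wfg)

    print(has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": []}))  # False: no deadlock
    print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))            # True: deadlock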
Timestamping: transactions are ordered globally so that older transactions, i.e. transactions with
smaller timestamps, get priority in the event of conflict
à Conflict is resolved by rolling back and restarting transaction
No locks so no deadlock
Four main locking strategies:
   1. Dirty read: where apps may read data, which has been updated but not yet
      committed to a database
   2. Committed read: where apps may not read dirty data
   3. Cursor stability: where a row being read by T1 is not allowed to be changed by T2
   4. Repeatable read: All data items are locked until a transaction reaches a commit point
Lecture 11 – Data architecture & governance in practice in a multi-actor domain
Guest lecture by Verdonck, Klooster & Associates (VKA)
   •   Data > information > knowledge (differences)
   •   Data fundament = data management
           o Data architecture patterns
           o BI architectures
   • Data exchange and sharing
   • Artificial Intelligence and data science
   • Our experience in practice DO’s and DON’T’s
Architecting as a way of structuring reality for others.
The concept of data-information-knowledge (Boisot, 2004) (Data >Information> knowledge)
Data is subjective since it has been perceived. The same ‘outside world’ data can be
perceived differently. How does this affect AI? The same…
Data fundament = Data management
Data Management Body Of Knowledge (DMBOK)
à a practitioner’s guide; a knowledge base with a model of 10 data management areas:
Data governance is not the same as data management.
Data governance is at the heart of the DMBOK model and focuses on the questions: What to do
with data? And how to organize it? It is the exercise of authority and control over data assets:
ownership, governance of everyday business, and the organization’s strategy for data.
Data management focuses on the implications of data governance on, for example, the architecture.
Data governance
   1. Data is an asset and production factor
   2. Data has an owner and a steward (accountable+responsible)
   3. Data has a lifecycle (metadata model)
   4. Data governance is a responsibility of the CIO/CDO
   5. Data is documented in a data dictionary
   6. Data access is through authorization (need to know) OR data access is open (need to
       share)
   7. Master data is only altered at the source
   8. Data is validated on CREATE and UPDATE
   9. Data addition/ enrichment leaves master data intact
   10. Data is “Open by design”
Breaking down Data Silos starts with Governance.
Silos = collection of data that is isolated from other parts of the organization. à prevent free
flow of data within an organization.
The ownership is often fragmented.
Employees
   • Chief Data Officer (CDO): Oversees a range of data-related functions to ensure your
       organization is getting the most from what could be its most valuable asset.
  • Data Stewards: are accountable for the day-to-day management of data.
  • Business Analyst / Data Translators: play a critical role in bridging the technical
      expertise of data engineers and data scientists with the ‘business’.
  • Data scientists: are analytical data experts who have the technical skills to solve
      complex problems.
   • Data architects: conceptualize and visualize data frameworks; Data engineers build
       and maintain them.
Types of data
Structured:
    • Master data: Company’s own data such as client’s info, personal info
    • Reference data: common data such as NLD = Netherlands, from an external silo
    • Transaction data: Reflection of a transaction
    • reporting data: aggregated data, can be refined from Transaction data
Unstructured: (become increasingly important)
    • Documents
    • Media
    • Photographs
Technology changes rapidly, but data is relatively stable. Master data forms the collective
memory of an organization. Focus shifts from internal à external data.
Master data Management
A wheel within DMBOK scheme
   • External codes
   • Internal codes
   • Customer data
   • Product data
   • Dimension Mgmt
Data dictionary
Business object model
Data models
     1. Semantical (dictionary) - What is the meaning of a certain word/ object?
      2. Conceptual (Business objects) – Shows relationships, for example between products and actors
     3. Logical (object relations and attributes) – Customer has ID, address, account, IBAN
     4. Technical (database design) – How can we put all these things in a database?
The 4th (technical) layer is the only one that differs per design à there you choose which database to
use (Oracle, for example).
Data architecture patterns
How to create value from data (data in itself is ‘dumb’)
Is also in the DMBOK scheme
3 patterns:
    • Business intelligence (BI) – How you organize it into a systematic architecture
    • Data science – analytics
    • Data sharing – between organizations or departments
BI Architecture
Crisp-DM model
    • Cross-Industry Standard Process for Data Mining
    • Widely used as the standard for data science projects; finished (no longer maintained)
    • Non-proprietary
   •   Application/ Industry neutral
   •   Tool neutral
   •   Focus on Business issues
          o As well as technical analysis
   •   Framework for guidance
   •   Experience base
          o Templates for analysis
    •   Focus on continuous evaluation
(Figure: the CRISP-DM cycle)
Damhof Model: 1&2 Combined
Hadoop for Data Analytics and Use: (a sort of data lake)
Data discovery:
   • Keep data warehouse for operational BI and analytics
   • Allow data scientists to gain new discoveries on raw data (no format or structure)
   • Operationalize discoveries back into the warehouse
Data Exchange and Sharing
   • Canonical model (predefined dictionary)
          o Predictable data to exchange, closed
          o Partners are known
           o Design paradigm (design up front, before actual use)
   • Linked data (RDF, semantic web)
          o Flexible exchange, define local data context, open
          o Partners/ users unknown beforehand
            o Organic development paradigm (add new parts of the data as you go)
Service delivery in chains
Artificial Intelligence and data science
    • Artificial Intelligence = A system that is capable of coming up with a solution to a
        problem on its own
    • Machine learning = Programming computers to optimize a performance criterion
        using example data or past experience
    • Data science = data science combines multiple field including statistics, scientific
        methods, and data analysis to extract value from data
AI and algorithms
NLP = Natural language processing
Three types of algorithms:
   1. Classification
   2. Regression
   3. Clustering
Examples are named.
Predictive policing
Predictive maintenance
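A tiny illustration of the three algorithm types, assuming scikit-learn is installed; the data points are
made up for the example.

    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression, LogisticRegression

    X = [[1], [2], [3], [10], [11], [12]]

    # Classification: predict a discrete label (small vs. large).
    clf = LogisticRegression().fit(X, [0, 0, 0, 1, 1, 1])
    print(clf.predict([[2.5], [10.5]]))          # e.g. [0 1]

    # Regression: predict a continuous value (here the data follow y = 2x).
    reg = LinearRegression().fit(X, [2, 4, 6, 20, 22, 24])
    print(reg.predict([[5]]))                    # ~[10.]

    # Clustering: group the unlabeled points into 2 clusters.
    print(KMeans(n_clusters=2, n_init=10).fit_predict(X))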
Do’s & Don’ts
Having one truth is very difficult
Why data quality is important
Don’t:
Start BIG (Think BIG, but act small)
Think you know what the business wants à analyze!
Look at the data without context to see if the correlation makes sense
Forget to assess quality; garbage in = garbage out
Rules of thumb:
Always start with a clear business question
Know and engage the business domain
Semantics count! Understand the data!
Solve something quick, harvest small success
Remember the goal of the operation
Shared data must be used according to GDPR
Lecture 12
Project presentations
Lecture 14
Separate file on Brightspace with more Example Exam Questions.
Q1: What are characteristics of blockchain technology?
   • Ledger for storing transactions
   • Users have a key to make transactions
   • Timestamped blocks that hold batches of valid transactions
   • Each block includes the hash of the prior block
   • Every node in a decentralized system has a copy of the block chain
   • Longest chain represents the truth
Figure of Orlikowski (1992): there is always an influence of the technology on the
governance. à Different meanings are given to the same technology (= duality of
technology).
Extra information from the internet
GDPR
The General Data Protection Regulation protects consumers with regard to the collection of data by
companies and governments. The collectors need to prove that you gave permission to collect the
data, that they handle it carefully, and that they delete it after a certain time.
Fines can be up to 4% of their annual revenue. The right of access and the right to be forgotten
apply.
Remarks: Can be a burden to businesses, while also still being too vague to be easily applied.
Web service vs. API
Both serve as a means of communication. Yet, a web service facilitates interaction between
two machines over a network, while an API serves as an interface between two different
applications so that they can communicate with each other.
Web services can send requests in the form of JSON, XML, an HTML file, images, Audio, etc.
API: Not always Need for Network. (online & offline)
Web Service: Always Need for Network.
Not all APIs are web services à all web services are APIs.
API: Light weight architecture and good for devices which have limited bandwidth.
Web service: No lightweight architecture. Require SOAP protocol.
API: Any style of communication.
Web Service: only three styles; SOAP, REST, XML-RPC.
API functions:
    1. Access to data
    2. Hide complexity
    3. Extend functionality
    4. Security (gatekeepers)
Middleware
“Software-glue”. Between operating system and user application for example.
Database – JDBC – Java application.
Meta information
Meta information is information about information. For example, if a document is
considered to be information, its title, location, and subject are examples of meta
information. This term is sometimes used interchangeably with the term metadata.
BPEL4WS
Is a standard executable language for specifying actions within business processes with web
services. Processes in BPEL export and import information by using web service interfaces
exclusively.
Data Lineage
Refers to the traceable path for specific critical data element (CDE) from end user report
upstream to the ultimate source (that path includes aggregated sources such as data
warehouse and data marts, operational data stores, staging areas, and transactional
system).
Data Lake
A system or repository of data stored in its natural/raw format. Usually a single store of data
including raw copies of source system data, sensor data, social data, and transformed data
used for tasks such as reporting, visualization, advanced analytics and machine learning. Can
include structured, semi-structured, unstructured, and binary data. Can be ‘on premised’ or
‘in the cloud’.
A “data swamp” is a deteriorated and unmanaged data lake that is either inaccessible to its
intended users or is providing little value.
Code of Conduct
Is a set of rules outlining the norms, rules, and responsibilities of proper practices of an
individual party or an organization.
Wrapper (as middleware)
Batch extraction vs continuous extraction
Clearing transactions with a Ledger system
TP monitors
A control program that monitors the transfer of data between multiple local and remote
terminals to ensure that the transaction processes completely or, if an error occurs, to take
appropriate actions.
Distributed Ledger Technologies (DLT)
Machine learning (Classification, Regression, Clustering)
Supervised: input and output, labels are given by user.
Unsupervised: Input, no attached labels (training set/ real set).
Reinforcement learning: Feedback with reward.
Data marts
Subject oriented data assets
Example Exam Questions
Q1: What is the best definition of enterprise IT-architecture according to Ross (2003)?
A) Policies for using IT in the organization
B) The organizing logic for data, applications, information and business processes
C) A destination plan for the IT-landscape
D) A description of the relationship between business and IT
Q2: Which of the following is the reason why a layered approach is preferred in architecting?
a) Each layer can be used to represent similar types of entities
b) Layers create greater complexity and scope
c) Each layer can be designed dependent on each other
d) Layers avoid different views and objectives
e) None of above
Q3: What are the four types of communication presented on the figure below? (from top to
bottom)
A) A: User interface, B: Application method level, C: Application interface level, D: Data level
B) A: Software and operational systems, B: System applications, C: Automated data collection,
D: Databases exchange
C) A: User interface applications, B: Logical operational systems, C: Application integration level,
D: Data exchange level
D) None of them
Q4: Which of the following answer(s) is (are) NOT TRUE about the differences between
architecting and engineering? (more than one answer can be correct)
a) Architecting takes place in ill-structured situation, meanwhile engineering takes place in
better defined environment
b) Engineering serves the client, whereas architecting serves the builder
c) Heuristics/synthesis is mostly used in architecting, whereas engineering uses equations
and analysis
D) Architecting focuses on components, whereas engineering focuses on misfits/interfaces
Q5: What are characteristics of block chain technology? (more than one answer can be
correct)
a) Ledger for storing transactions
b) Users have a (public/private) key to make transactions
c) Timestamped blocks that hold batches of valid transactions
d) Each block includes the hash of the prior block
e) Every node in a decentralized system has a copy of the block chain
f) Longest chain represents the truth
g) None of them
Q6: What characteristics belong to information stewardship? (more than one answer can be
correct)
a) Third parties can make changes
B) Third parties report changes and mistakes to the information stewardship
c) All (third) parties should reuse information from the steward
d) Third parties have the obligation to keep data actual
e) Information stewards have the obligation to keep data actual
f) None of them
Q7: What characteristics belong to Enterprise Data Management (EDM)?
a) data management strategy
b) data governance
c) data quality
d) platform & architecture
e) data maintenance
f) supporting process
Q8: Which of the following statements is (are) correct? (Governance)
a. Governance can deal with business and IT alignment
b. Governance can deal with translating Strategy into implementation
c. Architecture use should be governed
d. Architecture development should be governed
Web services Beginner tutorial
4 YouTube videos
Introduction – What is a Web Service
What is a web service?
  • Service available over the web.
  • Determine the criticality of the webservice.
  • Enables communication between applications over the web.
  • Provides a standard protocol/format for communication.
  • Platform independent communication.
  • Using web services, two different applications (implementation) can talk to each
       other and exchange data/ information.
For example: in a restaurant, the waiter is the means of communication between ‘you’ and
‘the kitchen’. The waiter is the web service / API: he communicates between two
applications and makes sure the communication is successful.
How web services work (overview)
Client -- request --> Server
Client <-- response -- Server
Enables communication between applications over the web
Applications written in different languages, using different databases.
Server = Service Provider: develops and implements the application (web service) and makes it
available over the web (internet). There should also be a client (service consumer).
Needs for communication in webservices:
Medium – HTTP/ Internet
Format – XML/ JSON
Two main types of webservices:
   1. Simple Object Access Protocol (SOAP)
          a. Medium: HTTP (Post)
          b. Format: XML
   2. Representational State Transfer (REST)
          a. Medium: HTTP (Post, Get, Put, Delete)
          b. Format: XML/ JSON/ TEXT..
REST is more flexible than SOAP.
What is WSDL and UDDI
Components of a web services
Consumer / Client needs to know:
What are the services available?
What are the request and response parameters?
How to call the web service?
Structure & description of the webservice.
Web Service Description Language (WSDL)
Is an interface that the Service Provider publishes that describes all attributes and
functionalities of the web service. It is XML based so it can be easily requested.
When the Service Provider and Service Consumer do not know each other, how do they share the
WSDL? >> “A web Service Provider publishes his web service (through its WSDL) on an online
directory from where consumers can query and search the web services. This online
registry/directory is called Universal Description, Discovery and Integration (UDDI).”
What are SOAP web service?
A web service that complies with the SOAP web services specifications is a SOAP web service.
     • Defined by W3C (World Wide Web Consortium) – An international community that
         develops open standards for the world wide web.
     • Service specifications:
             o Basic
                     § SOAP
                     § WSDL
                     § UDDI
             o Extended
                     § WS-security
                     § WS-policy
                     § WS-I
                     § …
It is a protocol (a set of rules/definitions) for how two applications will talk to each other over the
web.
A SOAP message consists of: an Envelope (the root element), a Header, and a Body.
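A sketch of that structure as it would travel over HTTP POST; the service name, XML namespace of the
body element, and endpoint below are made up, so the actual call is left commented out.

    import urllib.request

    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
      <soap:Header/>
      <soap:Body>
        <GetQuote xmlns="http://example.com/insurance">
          <PolicyId>P-1234</PolicyId>
        </GetQuote>
      </soap:Body>
    </soap:Envelope>"""

    request = urllib.request.Request(
        "http://example.com/quoteservice",           # hypothetical endpoint
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "application/soap+xml"},
        method="POST",                               # SOAP messages travel over HTTP POST
    )
    # urllib.request.urlopen(request) would send it.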
What are REST web service?
A web service that communicates / exchanges information between 2 applications using
REST architecture/ principles is called a RESTful web service.
Representational State Transfer (REST)
   • Unlike SOAP (which is a protocol), REST is an architectural style.
   • There is no central body determining the standards; REST defines a set of principles
      to be followed while designing a service for communication / data exchange between
      2 applications. When the principles are applied >> RESTful Web Service.
Constraints / Principles:
   • Uniform interface
           o Resource (nouns): everything is a resource (all modules/ databases etc are
               available as resource when defined)
           o URI: any resource/data can be accessed by a URI (=URL)
           o HTTP (verbs): make explicit use of HTTP methods (‘CRUD’ = Get, Delete, Post,
               Put)
   •    Stateless
            o All client-server communications are stateless (Server = stateless, request
               from client must contain all of the necessary data to handle the request)
               (Improves the web service performance)
   •    Cacheable
            o Happens at client side (Cache-control and Last-modified, What information
               should be saved?)
   •    Layered System
            o Layers can exist between server and client (proxies / gateways)
   •    Code on Demand (optional)
            o Ability to download and execute code on client side
“The key abstraction of information in REST is a resource. Any information that can be
named can be a resource: a document or image and so on… “ – Roy Fielding
Representation = description of the current state of the resource
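A sketch of the uniform interface: one resource URI (a noun) and the four HTTP verbs for CRUD. The
URI is hypothetical and the requests are only constructed, not sent.

    import json
    import urllib.request

    BASE = "https://api.example.com/customers"       # resource collection (noun)

    def rest_request(method, url, body=None):
        data = json.dumps(body).encode() if body is not None else None
        return urllib.request.Request(
            url, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )

    create = rest_request("POST", BASE, {"name": "Alice"})       # Create
    read   = rest_request("GET", f"{BASE}/42")                   # Read
    update = rest_request("PUT", f"{BASE}/42", {"name": "Bob"})  # Update
    delete = rest_request("DELETE", f"{BASE}/42")                # Delete
    # Each request is stateless: it carries everything the server needs.
    # urllib.request.urlopen(read) would execute the call.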
Authorization vs Authentication
Authentication = Who you are.
Authorization = What authority you have.
Recap
A protocol can be considered a common agreement (a set of rules) between two or more
parties (components) used for communication with each other. Most of the time a protocol
includes the steps and/or procedures that should be used when communicating with each
other.
API allows and defines how two applications can communicate with each other by using the
methodologies defined by the service providing application. Compared to a protocol, API
describes the programmatic ways to communicate in-between applications. Service calling
application must properly adhere to the standards in order to get the required service.
Web services are very similar to APIs. The notable thing with web services is that
developing a web service expects users to access it over the internet. Therefore a web service
can be considered an online API.
Middleware allows communication between distributed application components located on
several computers (it simply links the components located on various machines in order to get
the full application’s capabilities). Middleware minimizes development effort by overcoming
heterogeneous factors (OS, hardware, network equipment, etc.). Middleware sits in
between the application (application components) and the OS.
Cloud Computing Architecture
Why cloud computing?
Previous situation:
   • On-premise is expensive
   • Less scalability
   • Allot huge space for servers
   • Less chance of data recovery
   • Long deployment times
   • Lack of flexibility
   • Poor data security
   • Less collaboration
   • Data cannot be accessed remotely
With cloud computing:
   • No server space required
   • No experts required for hardware and software maintenance
   • Better data security
   • Disaster recovery
   • Ease of deployment
   • Cost-effective (pay as you go)
   • Collaboration is efficient
   • Management of services is easy
What is cloud computing?
“the delivery of on-demand resources (such as server, database, software, etc) over the
internet.”
Cloud providers – Companies offering the cloud (AWS, Azure, Google Cloud)
Cloud Computing Service provider – are the vendors that provide services to manage
applications through a global network.
Benefits
Easily upgraded
Cost-efficient
Scalability
Automated
Highly available
Flexible
Better security
Customization
Cloud computing architecture
Front end;
   • The cloud infrastructure consists of hardware and software components such as data
       storage, servers, virtualization software, etc.
   • It also provides Graphical User Interface (GUI) to end users in order to perform
       respective tasks
Back end;
   • Manages all the programs that run the application on the front end
   • It has a large number of data storage systems and servers
   • It can be software or a platform
   • “task is to provide utility in the architecture”
   • Eg; Amazon S3, Oracle Cloud-storage, Microsoft Azure Storage
Components:
   1. Hypervisor
         a. Virtual Operating Platform, for every user
         b. Divide and allocate resources
   2. Management software
         a. Manage and monitor the cloud operations
         b. Improving the performance of the cloud
   3. Deployment software
         a. SaaS (Gmail)
         b. PaaS (Microsoft Azure)
         c. IaaS (pay-as-you-go pricing model)
   4. Network
   5. Cloud server
   6. Cloud storage
Data Management (Online-course)
Course objectives:
   • Understand data management capabilities from the people, process and technology
       perspective.
   • Understand how each capability fits into overall Data Management Framework.
Introduction
Data management refers to the development and execution of architectures, policies,
practices, and procedures in order to manage the information lifecycle of an enterprise in an
effective manner.
>> All lecture titles are capabilities of data management. Each capability has three
aspects: People, Process, and Technology.
L1; Metadata management
Data Element (DE) = a unit of data for which the definition, identification, representation,
and permissible values are specified by means of a set of attributes.
> Critical data elements (CDE) = the data element that is “critical to success” in a specific
business or process.
Criteria to CDE: (list is not exclusive)
    • Business facts that are deemed critical to the organization
    • Support Critical Business Processes across an organization and its components
    • Data used to derive values that appear in key reports
    • Unique identifiers of things important to the business (e.g. Customer ID)
Metadata management involves managing data about other data, whereby ‘other data’
generally refers to data models and structures, not the content (e.g. business terms in a
glossary, attributes in a logical data model, or tables and columns in the database). It is to see
how the data is being managed by, and through, the organization.
Roles & Responsibilities
Business owner: is ultimately accountable for the definition of all data and metadata.
Responsible for confirming that data is used in a fashion consistent with the overall strategy.
Also responsible for driving data management processes and activities. (Business role)
Data steward: responsible for operational oversight of assigned data and interactions with
subject matter experts across organization as well as identifying the approach to
standardize, measure, and monitor data quality. (Business role)
Technical owner: ultimately accountable that data from a particular data system is managed
and used according to the defined data standards. (Technical role)
Data Custodian: technology specialist that is responsible for the secure storage and
management of the data for the particular system. (Technical role)
Operational metadata includes information about application runs: their frequency, record
counts, and component-by-component analysis.
Metadata management process:
    1.   Identify Critical Data elements (CDE)
    2.   Collect CDE business metadata
    3.   Collect CDE technical metadata
    4.   Create CDE data standard (360 view)
    5.   Enforce CDE data standard
>> System Development Lifecycle (SDL) = Plan > Create > Test > Deploy > Plan
L2; Data Quality Management
Data quality refers to the methodical approach, policies and processes by which an
organization manages the accuracy, validity, timeliness, completeness, uniqueness, and
consistency (these are the dimensions) of its data in systems and data flows.
>> Is the data accurate? Is it valid? Is it on time? Is it complete? Is it unique? Is it consistent?
Data quality dimensions refer to the aspects or features of information that can be assessed
and used to determine the quality of data.
Accuracy – Data accurately represents “the real world” values. Example: incorrect spellings of a name.
Validity – Data conforms to the syntax (format, type, range) of its definition. Example: incorrect
classification values for the gender of a customer type.
Timeliness – Data represents reality from the required point of time. Example: customer address changes.
Completeness – Data are complete in terms of the required potential of the data. Example: address
missing the zip code.
Uniqueness – Data are properly identified and recorded only once. Example: a single customer is
recorded twice.
Consistency – Data are represented consistently across the data set. Example: customer account is
closed, but there is a new order to that account.
Data quality rules refer to business rules that are set up to protect the data quality.
Data quality process: Define DQ requirements > Conduct DQ assessment > Resolve DQ
issues > Monitor and control.
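A sketch of a few data quality rules (completeness, uniqueness, validity) run against a made-up
customer list; the field names and allowed values are assumptions for illustration only.

    import re

    customers = [
        {"id": "C1", "zip": "2628CD", "gender": "F"},
        {"id": "C2", "zip": "",       "gender": "F"},   # completeness issue
        {"id": "C2", "zip": "1012AB", "gender": "X"},   # uniqueness + validity issues
    ]

    def assess(records):
        issues, seen = [], set()
        for r in records:
            if not r["zip"]:
                issues.append((r["id"], "completeness: zip code missing"))
            if r["id"] in seen:
                issues.append((r["id"], "uniqueness: duplicate customer id"))
            seen.add(r["id"])
            if not re.fullmatch(r"[MF]", r["gender"]):
                issues.append((r["id"], "validity: gender outside the allowed range"))
        return issues

    for issue in assess(customers):
        print(issue)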
L3; Data Governance
L4; Master and Reference data Management
L5; Data Integration
L6; Analytics
L7; Data Privacy
L8; Data Architecture
Data architecture refers to the models, policies, rules, or standards that govern which data
is collected, and how it is stored, arranged, and put to use in a database system and/or in
an organization.
Proof of work vs Proof of stake
Proof of work: requires all of its miners to attempt to solve a complex sum, with the winner in
practice determined by whoever has the most powerful hardware / the largest amount of
computing power.
Proof of stake: the model randomly chooses the winner, weighted by the amount they have staked.
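A toy proof-of-work sketch: the “complex sum” is finding a nonce such that the block hash starts with
a number of zeros; the difficulty here is kept tiny so it runs instantly.

    import hashlib

    def mine(block_data, difficulty=4):
        nonce = 0
        while True:
            digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
            if digest.startswith("0" * difficulty):
                return nonce, digest
            nonce += 1

    print(mine("Alice pays Bob 5"))   # (nonce, hash with 4 leading zeros)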