1 Data, Databases, and the Software Engineering Process
1.1 INTRODUCTION
In this chapter, we introduce some concepts and ideas that are fundamental to our presentation of the design of a database. We define data, describe the notion of a database, and explore a process for designing a database.
1.2 DATA
Data, as we use the term, are facts about something or someone. For example, a person has a name, an address, and a gender. Some data (facts) about a specific person might be “Mary Smith,” “123 4th St.,” and “female.” If we had a list of several people’s names, addresses, and genders, we would have a set of facts about several people. A database is a collection of related data. For this “set of facts about several people” to be a database, we would expect the people in the database to have something in common, that is, to be “related” in some way. Here related does not imply a familial relationship, but rather something more like “people who play golf,” “people who have dogs,” or “people I interviewed on the street today.” In a “database of people,” one expects the people to have some common characteristic that ties them together. A “set of facts about some people” is not a database until the common characteristic is also defined. To put it another way: Why are these people’s names and addresses being kept in one list?
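A small Python sketch may make the distinction concrete. The `Person` and `PeopleDatabase` names are hypothetical, chosen only for illustration: a record holds three facts about one person, and the collection becomes a database (in the sense used here) only once it also records the characteristic that relates its members.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    """Three facts (data) about one person."""
    name: str
    address: str
    gender: str


@dataclass
class PeopleDatabase:
    """Related data: people tied together by a stated common characteristic."""
    common_characteristic: str
    people: list[Person] = field(default_factory=list)

    def add(self, person: Person) -> None:
        self.people.append(person)


# A bare list of facts: nothing says why these people are kept together.
facts = [Person("Mary Smith", "123 4th St.", "female")]

# Once the common characteristic is defined, the collection is a database.
golfers = PeopleDatabase(common_characteristic="people who play golf")
for person in facts:
    golfers.add(person)

print(golfers.common_characteristic, len(golfers.people))  # people who play golf 1
```

The facts themselves do not change between `facts` and `golfers`; what changes is that the answer to “Why are these people kept in one list?” is now stored with them.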
2
•
Database Design Using Entity-Relationship Diagrams
CHECKPOINT 1.1
1. A tree is classified as a “large oak tree about 100 years old.” What are three facts about this tree?
2. Another tree has the following characteristics: pine, small, 15 years old. If I write about the two trees and their facts on a piece of paper, what do I have?
3. Why is the piece of paper not a database of trees?
1.3 BUILDING A DATABASE
How do we construct a database? Suppose you were asked to put together a database of items one keeps in a pantry. How would you go about doing this? You might grab a piece of paper and begin listing the items that you see. When you are done, you would have a database of items in the pantry. Simple enough, but is it a good database or a poor one? Was your approach to database construction a good methodology or a not-so-good one? The answer to these questions depends on why you constructed the list: who will use the list and for what. If you are more methodical, you might first ask yourself how best to construct this database before you grab the paper and begin listing items. A bit of prethinking might save time in the long run because it makes you consider how the list is to be used and by whom.
When dealing with software and computer-related activities like databases, we have a science of “how to” called software engineering (SE). SE is a process of specifying systems and writing software. To design a good database, we will use ideas from SE. By being aware of SE and respecting its systematic approach, we can see why we handle database design the way we do. In this chapter, we present a brief outline of SE. After this overview, we explore database models, in particular the relational database model, in subsequent chapters. While there are many kinds of database models, most of the databases in use today are relational. Our focus in this book is to put forward a methodology based on SE for designing a sound relational database (as opposed to other database models).
CHECKPOINT 1.2
You have a set of books on bookshelves in your house. Your mother asks you to create a list of all the books she has.
1. Who is going to use this list?
2. When the list is completed, is it a database?
3. What questions should be asked before you begin?
4. What is the question-and-answer procedure in question 3 going to accomplish?
1.4 WHAT IS THE SOFTWARE ENGINEERING PROCESS?
The term software engineering refers to a process of specifying, designing, writing, delivering, maintaining, and finally retiring software. Software engineers often refer to the “life cycle” of software; software has a beginning and an ending. There are many excellent references on the topic of SE (Schach, 2011). Some authors use the term software engineering synonymously with “systems analysis and design,” but the underlying point is that any information system requires some process to develop it correctly. SE spans a wide range of information system tasks. The task we are primarily interested in here is that of specifying and designing a database. “Specifying a database” means that we will decide on and document what the database is supposed to contain and how we will go about the overall task itself.
A basic idea in SE is that to build software correctly, a series of steps or phases is required to progress through a life cycle. These steps ensure that thinking precedes action: thinking through “what is needed” precedes “what software is written.” Further, “thinking before action” requires that all parties involved in software development understand and communicate with one another. One common way of presenting this think-before-acting scenario is the “waterfall” model (Schach, 2011); the software development process is supposed to flow in one direction without retracing.
Generally, the first step in the SE process involves formally specifying what is to be done. We actually break this first step into two steps: requirements elucidation and the actual writing of the specification document. The waterfall model implies that once the specification of the software is written and accepted by a user, it is not changed but is used as the basis for design. One may liken the overall SE exercise to building a house. The specification is the phase of “what you want in your house.” Once it is agreed on, the next step is to design the house to the specification. As the house is designed and the blueprint is drawn, it is not acceptable to revisit the specification except for minor alterations. There has to be a “meeting of the minds” at the end of the specification phase before moving on to the design (the blueprint) of the house to be constructed. So it is with software and database development. Software production is a life-cycle process: software (a database) is created, used, and eventually retired.
The “players” in the software development life cycle may be placed into two camps, often referred to as the user and the analyst. Software is designed by the analyst for the user according to the user’s specification. In our presentation, we will think of ourselves as the analyst trying to put into words what the users think they want. Recall the example in this chapter in which your mother asked you to draw up a list of the books in a home library. Here, the mother is the user; the person drawing up the list is the analyst.
There is no general agreement among software engineers regarding the exact number of steps or phases in the waterfall-type software development model. Models vary depending on the SE researcher’s interest in one part of the process or another. A very brief description of the software process goes like this (software in the following may be taken to mean a database):
Step 1 (or Phase 1): Requirements. Find out what the user wants/needs. The “finding-out procedure” is often called “elucidation.”
Step 2: Specification. Write out the user’s wants/needs as precisely as possible. In this step, the user and analyst document not only what is desired but also how much it will cost and how long it will take. A credo of SE is to generate software on time and on budget.
Step 2a: Feed back the specification to the user. A formal review of the specification document is performed to see if the analyst (you) has it right.
Step 2b: Redo the specification as necessary and return to step 2a until the analyst and the user both understand one another and agree to move on.
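The review loop of steps 2a and 2b can be sketched in Python as follows. The function `agree_on_specification`, its `review` and `revise` callbacks, and the `max_rounds` cutoff are all hypothetical; in practice the review is a conversation between people, not a function call.

```python
def agree_on_specification(draft, review, revise, max_rounds=10):
    """Hypothetical sketch of steps 2a/2b: review and revise until the user accepts.

    review(spec) returns the user's objections (an empty list means "accepted");
    revise(spec, objections) returns a new draft that addresses them.
    """
    spec = draft
    for round_number in range(1, max_rounds + 1):
        objections = review(spec)           # step 2a: formal review with the user
        if not objections:
            return spec, round_number       # meeting of the minds: move on to design
        spec = revise(spec, objections)     # step 2b: redo the specification
    raise RuntimeError("no agreement after max_rounds reviews")


# Hypothetical usage: the user rejects one statement until the analyst corrects it.
def user_review(spec):
    return [s for s in spec if s == "All employees must generate invoices."]


def analyst_revise(spec, objections):
    return ["Some employees may generate invoices." if s in objections else s for s in spec]


final_spec, rounds = agree_on_specification(
    ["Each customer has a name.", "All employees must generate invoices."],
    user_review,
    analyst_revise,
)
print(rounds)      # 2
print(final_spec)  # ['Each customer has a name.', 'Some employees may generate invoices.']
```

The loop only exits forward (to design) once the review returns no objections, which mirrors the “agree to move on” condition of step 2b.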
Step 3: Design. Software is designed to meet the specification from step 2. As in house building, now that the analyst knows what is required, the plan for the software is formalized: the blueprint is drawn up.
Step 3a: Software design is independently checked against the specification. If necessary, the design is repaired or redone until the analyst has clearly met the specification. Note the sense of agreement in step 2 and the use of step 2 as a basis for further action. When step 3 begins, going back up the waterfall is difficult; it is supposed to be that way. Perhaps minor specification details might be revisited, but the idea is to move on once each step is finished. Once step 3a is completed, both the user and the analyst know what is to be done. In the building-a-house analogy, the blueprint is now drawn up.
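As a rough sketch of the kind of check step 3a calls for, one can list the items the specification requires and ask which ones no part of the design covers. The function name `unmet_requirements` and the plumbing-style items are hypothetical; a real review is carried out by people and covers far more than a set difference.

```python
def unmet_requirements(specification: set[str], design: dict[str, set[str]]) -> set[str]:
    """Return the specification items that no element of the design covers."""
    covered = set().union(*design.values())
    return specification - covered


# Hypothetical specification items and a first-draft design (table -> items covered).
spec = {"customer name", "customer address", "service call date"}
design = {
    "CUSTOMER": {"customer name", "customer address"},
    "SERVICE_CALL": set(),  # the first draft forgot to record when the call happened
}
print(unmet_requirements(spec, design))  # {'service call date'}

# Step 3a: repair the design, then check it again.
design["SERVICE_CALL"].add("service call date")
print(unmet_requirements(spec, design))  # set()
```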
One final point here: In the specification, a budget and timeline are proposed by the analyst and accepted by the user. In the design, this budgetary part of the overall plan is sometimes refined. All SE takes money and time; not only is it vital to produce a given product correctly, but the ancillary items of time and money must also be clear to all parties.
Step 4: Development. Software is written; a database is created.
Step 4a: In the development phase, the software, as written, is checked against the design until the analyst has clearly met the design. Note that the specification in step 2 is long past, and only minor modifications of the design would be tolerated here. The point of step 4 is to build the software according to the design (the blueprint, if you will) from step 3. In our case, the database is actually created and populated in this phase.
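In database terms, steps 4 and 4a might look like the following sketch, which uses Python’s built-in `sqlite3` module and a throwaway in-memory database. The `customer` table and the `DESIGN` mapping are hypothetical stand-ins for the blueprint produced in step 3.

```python
import sqlite3

# Hypothetical design for one table: column name -> declared SQL type.
DESIGN = {"customer_id": "INTEGER", "name": "TEXT", "address": "TEXT"}

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Step 4: create the database according to the design, then populate it.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, address TEXT)"
)
conn.execute("INSERT INTO customer (name, address) VALUES (?, ?)", ("Mary Smith", "123 4th St."))
conn.commit()

# Step 4a: check what was built against the design.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column.
built = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(customer)")}
print(built == DESIGN)  # True
```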
Step 5: Implementation. Software is turned over to the user to be used in the application.
Step 5a: The user tests the software and accepts or rejects it until it is written correctly (that is, until it meets the specification and design). In our case, the database is queried, data are added or deleted, and the user uses what was created. One may think that this is the end of the software life cycle, but there are two more important steps.
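The activities named in step 5a (querying, adding, and deleting data) can be sketched the same way. The table and rows are hypothetical, and an in-memory SQLite database stands in for the delivered system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the delivered database
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [("Mary Smith",), ("Fred Jones",)])

# The user adds data ...
conn.execute("INSERT INTO customer (name) VALUES (?)", ("Sally Smith",))
# ... deletes data that is no longer wanted ...
conn.execute("DELETE FROM customer WHERE name = ?", ("Fred Jones",))
conn.commit()

# ... and queries what was created.
names = [name for (name,) in conn.execute("SELECT name FROM customer ORDER BY name")]
print(names)  # ['Mary Smith', 'Sally Smith']
```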
Step 6: Maintenance. Maintenance is performed on the software until it is retired. No matter how well specified, designed, and written, some parts of the software may fail. Some parts may need to be modified over time to suit the user. Times change; demands and needs change. Maintenance is a very time-consuming and expensive part of the software process, particularly if the SE process has not been done well. Maintenance involves correcting hidden software faults as well as enhancing the functionality of the software.
In databases, new data are often required; some old data may no longer be needed. Hardware changes. Operating systems change. The database engine itself, which is software, is often upgraded, and new versions are regularly pushed onto the market. The data in the database must conform to these changes, and a system for changing the data in the database has to be in place.
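One common way to keep such a “system of changing the data” in place is to record an ordered list of schema changes and apply only the ones a database has not yet seen. The sketch below assumes SQLite, uses its `user_version` pragma as the version counter, and uses a hypothetical `MIGRATIONS` list whose single entry adds a newly required column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (name) VALUES ('Mary Smith')")
conn.commit()

# Hypothetical ordered list of changes; entry i moves the schema to version i + 1.
MIGRATIONS = [
    "ALTER TABLE customer ADD COLUMN email TEXT",  # version 1: new data are required
]


def migrate(conn):
    """Apply any changes this database has not seen yet; return the new version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        # number is an int we generated ourselves, so formatting it into the SQL is safe.
        conn.execute(f"PRAGMA user_version = {number}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


print(migrate(conn))  # 1
print(migrate(conn))  # 1 (already up to date; nothing is applied twice)
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)  # ['customer_id', 'name', 'email']
```

Existing rows keep their data; the new `email` column is simply NULL for them until the user fills it in.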
Step 7: Retirement. Eventually, whatever software is written becomes outdated. Database engines, computers, and technology in general are all evolving. Think of the old software package you used on some old personal computer. It no longer works because the operating system has been updated, the computer is obsolete, and the old software has to be retired. Basically, the SE process has to start all over with new specifications. The same is true of databases and designed systems. At times, the most cost-effective thing to do is to start anew.
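The forward-only character of the waterfall can be sketched as an ordered sequence of steps in which the only permitted move is to the next one. The `Phase`, `can_move`, and `advance` names are hypothetical, and the sketch deliberately leaves out the lettered sub-steps (2a, 2b, 3a, 4a, 5a), which are checks and reviews within a step rather than moves between steps.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The seven steps of this chapter's waterfall description, in order."""
    REQUIREMENTS = 1
    SPECIFICATION = 2
    DESIGN = 3
    DEVELOPMENT = 4
    IMPLEMENTATION = 5
    MAINTENANCE = 6
    RETIREMENT = 7


def can_move(current: Phase, target: Phase) -> bool:
    """Only the next step is allowed: the process flows down without retracing."""
    return target == current + 1


def advance(current: Phase) -> Phase:
    """Return the step that follows `current`."""
    if current is Phase.RETIREMENT:
        raise ValueError("retirement is the last step; new software starts over at step 1")
    return Phase(current + 1)


phase = Phase.REQUIREMENTS
while phase is not Phase.RETIREMENT:
    phase = advance(phase)
print(phase.name)                                   # RETIREMENT
print(can_move(Phase.DESIGN, Phase.SPECIFICATION))  # False
```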
CHECKPOINT 1.3
1. In what phase is the database actually created?
2. Which person tests the database?
3. Where does the user say what is wanted in the database?
1.5 ENTITY RELATIONSHIP DIAGRAMS AND THE SOFTWARE ENGINEERING LIFE CYCLE
This text concentrates on steps 1 through 3 of the software life cycle for databases. A database is a collection of related data. The concept of related data means that a database stores information about one enterprise: a business, an organization, or a grouping of related people or processes. For example, a database might contain data about Acme Plumbing and involve customers and service calls. A different database might be about the members and activities of the Over 55 Club in town. It would be inappropriate to have data about the Over 55 Club and Acme Plumbing in the same database because the two organizations are not related. Again, a database is a collection of related data. Keeping a database about each of these organizations is fine, but not in the same database.
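A minimal sketch of “one database per enterprise” using SQLite follows. Each connection stands for a separate database (in practice, separate files such as the hypothetical `acme_plumbing.db` and `over55_club.db`). The table layouts are illustrative assumptions, not designs taken from this book.

```python
import sqlite3

# One database per enterprise; in-memory here, separate files in practice.
acme = sqlite3.connect(":memory:")    # e.g. a hypothetical "acme_plumbing.db"
over55 = sqlite3.connect(":memory:")  # e.g. a hypothetical "over55_club.db"

acme.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE service_call (
        call_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        call_date   TEXT NOT NULL
    );
""")
over55.executescript("""
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE activity (activity_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
""")


def tables(conn):
    """List the tables defined in one database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [name for (name,) in conn.execute(query)]


print(tables(acme))    # ['customer', 'service_call']
print(tables(over55))  # ['activity', 'member']
```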
Database systems are often modeled using an entity relationship (ER) diagram as the blueprint according to which the actual data are stored; the blueprint is the output of the design phase. The ER diagram is an analyst’s tool for diagramming the data to be stored in a database system. Phase 1, the requirements phase, can be quite frustrating, as the analyst has to elicit needs and wants from the user. The user may or may not be computer sophisticated and may or may not know the capabilities of a software system. The analyst often has a difficult time deciphering a user’s needs and wants well enough to create a specification that (a) makes sense to both parties (user and analyst) and (b) allows the analyst to do the design efficiently.
In the real world, the user and the analyst may each be committees of professionals, but the idea is that users (or user groups) must convey their ideas to an analyst (or team of analysts). Users have to express what they want and what they think they need; analysts have to elicit these desires, document them, and create a plan to realize them.
User descriptions may seem vague and unstructured. Typically, users are successful at a business. They know the business; they understand the business model. The computer person is typically ignorant of the business but understands the computer end of the problem. To the computer-oriented person, the user’s description of the business is as new as the computer jargon is to the user. We present a methodology designed to make the analyst’s language clear enough that the user is comfortable with the to-be-designed database, while still giving the analyst a tool precise enough to be mapped directly into a database.
In brief, we next review the early steps in the SE life cycle as they apply to database design.
1.5.1 Phase 1: Get the Requirements for the Database
In phase 1, we listen and ask questions about what facts (data) the user wants to organize into a database retrieval system. This step often involves letting users describe how they intend to use the data. You, the analyst, will eventually provide a process for loading data into and retrieving data from a database. There is often a “learning curve” for the analyst as the user explains the system he or she knows so well to a person who may be unfamiliar with that particular business.
1.5.2 Phase 2: Specify the Database
Phase 2 involves grammatical descriptions and diagrams of what the analyst thinks the user wants. Database design is usually accomplished with an ER diagram that functions as the blueprint for the to-be-designed database. Since most users are unfamiliar with the notion of an ER diagram, our methodology supplements the ER diagram with grammatical descriptions of what the database is supposed to contain and how the parts of the database relate to one another. The technical description of a database can be dry and uninteresting to a user; however, when the analysts put what they think they heard into English statements, the users and the analysts have a better meeting of the minds. For example, if the analyst makes a statement like “All employees must generate invoices,” the user may then affirm, deny, or modify the declaration to fit what is actually the case. To continue the example, it makes a big difference in the database whether “all employees must generate invoices” or “some employees may generate invoices.”
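The difference between the two statements shows up in the data. In the hypothetical SQLite schema below, the query lists employees who have generated no invoice. Under “some employees may generate invoices” such rows are perfectly valid; under “all employees must generate invoices” each one would violate the rule the user agreed to. Table and column names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id)
    );
    INSERT INTO employee (employee_id, name) VALUES (1, 'Mary Smith'), (2, 'Fred Jones');
    INSERT INTO invoice (employee_id) VALUES (1);
""")

# Employees who have generated no invoice at all.
no_invoice = [name for (name,) in conn.execute("""
    SELECT e.name
    FROM employee AS e
    WHERE NOT EXISTS (SELECT 1 FROM invoice AS i WHERE i.employee_id = e.employee_id)
""")]

# "Some employees may generate invoices": this list is acceptable data.
# "All employees must generate invoices": every name here breaks the agreed rule.
print(no_invoice)  # ['Fred Jones']
```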
1.5.3 Phase 3: Design the Database
Once the database has been diagrammed and agreed to, the ER diagram becomes the finalized blueprint for construction of the database in phase 3. Moving from the ER diagram to the actual database is akin to asking a builder of houses to take a blueprint and commence construction.
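To give a feel for moving from blueprint to construction, the sketch below represents one entity as a small Python data structure and turns it into a `CREATE TABLE` statement. The `Entity` class and `to_create_table` function are hypothetical, and the mapping is deliberately simplified (for example, every key is assumed to be an integer); a real mapping from an ER diagram involves more rules than this.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Entity:
    """A hypothetical, minimal stand-in for one entity in an ER diagram."""
    name: str
    key: str                    # identifying attribute (assumed to be an integer)
    attributes: dict[str, str]  # other attributes -> SQL type


def to_create_table(entity: Entity) -> str:
    """Turn one entity of the blueprint into a CREATE TABLE statement."""
    columns = [f"{entity.key} INTEGER PRIMARY KEY"]
    columns += [f"{attr} {sql_type}" for attr, sql_type in entity.attributes.items()]
    return f"CREATE TABLE {entity.name} ({', '.join(columns)})"


customer = Entity("customer", "customer_id", {"name": "TEXT", "address": "TEXT"})
ddl = to_create_table(customer)
print(ddl)  # CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # "construction" from the blueprint
```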
As we have seen, there are more steps in the SE process, but, as stated, this book is about design, and hence the remaining steps of the waterfall model are not emphasized.
CHECKPOINT 1.4
1. Briefly describe the major steps of the SE life cycle as it applies to databases.
2. Who are the two main players in the software development life cycle?
3. Why is written communication between the parties in the design process important?
1.6 CHAPTER SUMMARY
This chapter serves as a background chapter. It briefly describes data, databases, and the SE process. The SE process is presented as it applies to ER diagrams, the database design blueprint.
CHAPTER 1 EXERCISES
Fred Jones operates a golf shop. He has golf equipment and customers, and his primary business is selling retail to customers. Fred has so many customers that he wants to keep track of them on a computer. He approaches Sally Smith, who is knowledgeable about computers, and asks her what to do.
1. In our context, Fred is a __________; Sally is a ______________.
2. When Fred explains to Sally what he wants, Sally begins writing what?
3. When Fred says, “Sally, this specification is all wrong,” what happens next?
4. If Fred says, “Sally, this specification is acceptable,” what happens next?
5. If, during the design, Sally finds out that Fred forgot to tell her about something he wants, what is Sally to do?
6. How does Sally get Fred’s specifications in the first place?
7. Step 3a says: “Software design is independently checked against the specification.” What does this mean?
BIBLIOGRAPHY
Schach, S. R. 2011. Object-Oriented and Classical Software Engineering. New York: McGraw-Hill.