- Once our E-R model has been converted into relations, we may find that some relations are not properly specified. There can be a number of problems:
- Deletion Anomaly: Deleting one fact or data point from a relation results in other information being lost.
- Insertion Anomaly: Inserting a new fact or tuple into a relation requires we have information from two or more entities – this situation might not be feasible.
- Update Anomaly: Updating one fact in a relation requires us to update multiple tuples.
Anomaly Example 1
- Here is an example to illustrate these anomalies: Consider a very common CUSTOMER relation:
CUSTOMER(CustomerID, CustomerName, Street, City, State, PostalCode)
- In the United States, the PostalCode (or ZipCode) references a specific City and State so one might have data such as:
- Insertion Anomaly: What happens if we go to add a new Customer: C106, Joe Feldman, 99 Ninth St., Springfield, NJ
What we know about Joe is that he lives in Springfield, NJ (one fact) but we may not know his PostalCode.
We will need to get that additional fact (the fact that the PostalCode for Springfield, NJ is 07081.
- Deletion Anomaly: What happens if we delete customer C105: Then we not only remove the customer information but we also remove (lose) the fact that Farmingdale, NY has postal code 11735.
- Modification Anomaly: It is possible that when a town grows in population, the zip code will be split into two (or more) new zip codes.
For example, if Old Bridge, NJ splits its zip code, then we will have to update many different tuples even though we are only changing one “fact” about Old Bridge’s zip code.
|C101||Bill Smith||123 First St.||New Brunswick||NJ||07101|
|C102||Mary Green||11 Birch St.||Old Bridge||NJ||07066|
|C103||Ted Jones||3 Academy St.||Old Bridge||NJ||07066|
|C104||Sally Taylor||446 First Ave.||New Brunswick||NJ||07101|
|C105||Mary Miller||44 Toga Ct.||Farmingdale||NY||11735|
Anomaly Example 2
- Here is another example to illustrate anomalies: A company has a Purchase Order form:
- Our dutiful consultant creates the E-R Model directly matching the purchase order:
- When we follow the steps to convert to a set of relations this results in two relations (keys are underlined):
PO_HEADER (PO_Number, PODate, Vendor, Ship_To, ...) LINE_ITEMS (PO_Number, ItemNum, PartNum, Description, Price, Qty)
- Consider some sample data for the LINE_ITEMS relation:
PO_Number ItemNum PartNum Description Price Qty O101 I01 P99 Plate $3.00 7 O101 I02 P98 Cup $1.00 11 O101 I03 P77 Bowl $2.00 6 O102 I01 P99 Plate $3.00 5 O102 I02 P77 Bowl $2.00 5 O103 I01 P33 Fork $2.50 8
- What are some of the problems with this relation ?
- What happens if we want to add the fact that Order O103 has quantity 5 of part P99 ?
- What happens when we delete item I02 from Order O101 ?
- What happens if we want to change the price of the Plate (P99)?
- These problems occur because the relation in question contains data about 2 or more themes.
- Typical way to solve these anomalies is to split the relation in to two or more relations – This is part of the Process called Normalization discussed next.
On the next page we will formally define the Normalization Process.