Chapter 1: ---------- Read it carefully. You should understand it completely. After you have read chapter 1, you should be able to answer the following questions: - what is a database? - why do we need databases? - what is a database management system? - What is a transaction? - what is a declarative language? - what are the roles of the people involved in the design and implementation of a database-backed application (what they call transaction processing system in the book)? - what is the difference between transaction processing, decision support and data mining? Chapter 2: ---------- Do not be afraid. You need to read 2.1 because the case study is referenced throughout the book. However you can skip the rest of the chapter until the end of the class. If you read it before that, do it quickly to get an idea of the concepts and of the vocabulary which is introduced. You are not going to understand much of this chapter now. Remember to come back to it by the end of the course, this chapter should be very clear by then and a good way to review the basic concepts of relational data model, relational query language and transaction. Chapter 3: ---------- At the start of 3.1, the authors mention sectors, tracks and cylinders. If you want to refresh your memory on how data is represented on disk (we cover this briefly in the first class), look up 9.1 (pages 322-329 in the 2nd edition). The notions of data independance and data model introduced in 3.1 are absolutely central. You cannot get a passing grade without understanding them. The relational model introduced in 3.2 is again crucial. No passing grade without a good command of the basic concepts of 3.2.1 and the notion of key. Note that Codd got his Turing award for having defined the relational model back in 1970. The book covers the SQL-92 standard version of the SQL data definition language. As the authors mention, database vendors do NOT support this standard. So do not be surprised that the version of SQL you use with MySQL does not correspond much with the book. Actually MySQL supports quite a narrow subset of the standard. We will have a lab on Oracle so that you can exercise with some of the functionalities that MySQL does not support. By the way, try and find the differences between the standard and the SQL subset supported by MySQL. That is a good exercise. Note that the authors propose brain teasers throughout the chapter. It is worth spending some time on them. Chapter 4 --------- Conceptual modelling is the process of defining a database schema based on a textual description of the data. This is a very intuitive process for simple databases and a rather cumbersome process for large databases. This chapter covers the E-R and UML methodologies. You should keep in mind throughout this chapter that E-R and UML are tools. They give you a framework you can use to structure the design of a database. They do not help you take decisions during the modelling process. Section 4.9 details some of the key problems a designer faces when using the modelling tools. Understanding the notion of cardinality constraint is a necessary first step. If you get a good understanding of 4.4 (and 4.5.3, 4.5.4) you are well equipped to face any database design task. 4.8 follows up on the case study introduced in chapter 2. E-R and UML are quite similar. The book uses E-R to introduce the concepts and then introduces their UML representations. We will cover the UML representation in details. This is the representation you will be using in the labs, exercises and at the exam. Chapter 5 --------- I consider the introduction as important because it introduces SQL as a declarative query language. Note that SQL is a lot of things: it is a query language, a data definition language, a data manipulation language. The data definition aspects have been covered in chapter 3. This chapter focuses on the query language aspects which are the most important in the class. The data manipulation aspects are quickly covered in section 5.3. It should be clear that you will NOT be SQL experts on completion of this class. We will focus on the core SQL features: set-oriented programming, multisets and duplicate elimination, nested queries, aggregation, null values. I think that the coverage of the relational algebra and the query language part of SQL are intuitive. The notions are introduced in a natural order. The crucial point is that the relational algebra and SQL operate on (multi-)set(s) of tuples: these programming languages are set-oriented. This is a crucial difference compared to most of the programming languages you have seen so far. Note that the relational algebra and SQL are special-purpose programming languages. They are not Turing complete. No passing grade without a good command of 5.1 (until outer join p142) and 5.2 (from the intro p147 until nested queries p157, and 5.2.5/5.2.6 on aggregates). Chapter 6 --------- As you will see, I am no big fan of normalization. This is however a necessary evil and the notions of functional dependency is fundamental (as well as very intuitive). We will cover the Boyce-Codd Normal form and third normal form (6.5) and discuss their impact on relation decompositions (6.6). We will not cover any automatic way to decompose relations. This is because, as explained in 6.13, normalization might hurt performance and should thus be handled with care. Chapter 7 --------- We will cover this material briefly. However, triggers are a key element of a database management system. It is important to get an overview over the potential of triggers (7.1) and the potential problems (7.2). Chapter 8 --------- Note that we will cover chapter 8 relatively late in the course of the class as it is a self contained chapter. It is crucial to understand the problems associated with combining SQL statements with a conventional language (i.e., a turing complete programming language). We will cover JDBC and SQLJ in details. Some of the elements introduced with embedded and dynamic SQL (8.2-8.4) are taken up again in the context of JDBC and SQLJ. We will also cover object-relational mappings (ORM), which are not discussed in this book. We will rely on the material prepared by Betty O'Neal from UMass Boston: http://www.cs.umb.edu/~eoneil/orm/ This material will help for your project. Chapter 9 ---------- Most of the class focuses on how to use a database system, not on how to use it efficiently. We will however spend some time at the end of the class on indexing. You will learn how data is represented on secondary storage (9.2,9.3), how different types of indexes speed up specific access patterns (9.4). We will also cover the data structures used to implement indexes (9.5, 9.6, 9.7). You should also use the introduction of this chapter (9.1) to look up what secondary storage actually is. Chapter 12 ---------- This is the most interesting chapter in this book - but you will have to wait until a MSc class on database tuning to work with this material. Section 12.2.1 on indices will help you understand how indexes impact query/update performances. You can lookup http://www.databasetuning.org/ for some outstanding slides and some shameless self-promotion. Chapter 13 ---------- The (tuple) relational calculus is important (i) because it is declarative and defines the semantics of SQL (Section 13.1) and (ii) because its duality with the relational algebra is THE fundamental aspect of relational query languages. The domain relational calculus (13.2) is interesting as a counterpoint and as the underlying basis of visual query languages. These aspects are unfortunately not part of the pensum, so we will not cover them. Chapter 14 ---------- It is not part of the pensum. You can have a look the limitations of the relational model (14.1). It is a good complement/counterpoint to what you have seen in chapters 3, 4, 8. Chapter 15 ---------- XML is important as it is gaining acceptance. It is interesting as a counterpoint to the relational data model. It will also be the basis for web services (chapter 25) introduced in the distributed systems class. Chapter 15 is pretty long (100 pages). DO NOT READ IT ALL!!! You should understand 15.1 and 15.2; you should understand the key principles of 15.3 - what I suggest is that you scheme through 15.3 in a first pass and get back to it after the class based on the exercises. You should understand 15.4.1, and you should QUICKLY scheme through 15.4.3. Drop 15.4.2 and 15.4.4. It is important to focus on the essential: (1) the data model (15.1, 15.2 and the essence of 15.3). Note that the notion of structure in 15.1 is a key concept. We will insist on the aspects that are common/different wrt the relational model (attributes, declaration, name spaces, types, integrity constraints). (2) the query languages. We will cover XPath and an overview of XQuery. You can skip XSLT - I am working to get it out of the teaching goals (15.4.2) Chapter 18 ---------- This is a short chapter but we will spend the time necessary so that you get a clear understanding of the ACID properties (they have been mentioned in chapter 2). No passing grade without a good understanding of those. Chapter 19 ---------- We will focus on the flat transaction model (19.1), chained transactions (19.3.1), transaction demarcation (19.3.2), and recoverable queues (19.3.5). Some of these notions have been mentioned in 8.5 in the context of JDBC. We get a more general overview here. Chapter 20 ---------- Strictly speaking, only 20.4.2 is part of your pensum. We will cover in class the basics of concurrency control so that you can understand deadlocks. What I suggest is that you read quickly 20.1-20.5 to get an idea of the key concepts (in bold), and to get familiar with the representation of a schedule - do not spend too much time on the theorems though. Note however that this material is a key contribution to CS - Jim Gray got his Turing Award for this. Chapter 22 ---------- This is not part of the pensum. We will cover 22.1 and 22.2 - I think it will help you better understand the notion of transaction. Just scheme these chapters, you should not read them to understand everything. You will have to wait until an advanced DB class. Chapter 23 ---------- This chapter covers architectural/distribution issues for database-backed applications (or transaction processing systems as they call them in the book). We will not cover this chapter in this class; however this might be handy as a review of distributed systems concepts. The chapter (A) reviews general principles in a first part (centralized/distributed in 23.1 and 23.2, key concepts of a TP monitor (i) Remote Procedure Call - RPC -, (ii) Interface Definition Language - IDL - and directory services, (iii) distributed commitment, (iv) event brokering, (v) storage area network) and then (B) focuses on database-backed web applications and the popular application server J2EE. It is a relatively long chapter but it is hard to slice it. The best might be to read the whole chapter, focus on understanding the key concepts mentioned above in 23.3-23.8 and find their applications in the context of J2EE. Chapter 25 on web services will provide other application examples of RPC, IDL, directory services. Chapter 25 ---------- Again, a chapter for the distributed systems class. This chapter covers Web services, i.e., a means to integrate database-backed web applications into distributed applications. The chapter reviews Web basics (25.2 and 25.3) and introduces the details of the Web services RPC mechanism called SOAP (applied to XML data), of the Web services IDL (WSDL), its directory service (UDDI), a kind of scripting language used as a glue to compose and organize complex services (BPEL), and a form of distributed control over service invocations (WS-Coordination). Appendix B, C (online) ------------------------------- Appendix B and C are complement to the project run in parallel with this class. They discuss the different aspects of a project lifecycle in the context of a database application. It is interesting complementary reading wrt the reading list of the project course. Appendix B presents the requirement and specification documents as a complement to the case study introduced in Chapter 2. Appendix C discusses design and implementation/testing issues with a reference to the case study.