09 September 2009

Basic Structure of SQL

The basic structure of an SQL expression consists of three clauses:
  • The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query.
  • The from clause corresponds to the Cartesian-product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression.
  • The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause.

SQL

SQL uses a combination of relational-algebra and relational-calculus constructs. The SQL language has several parts:
  • Data-Definition Language (DDL) : The SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas.
  • Interactive Data-Manipulation Language (DML) : The SQL DML includes a query language based on both the relational algebra and the tuple relational calculus. It includes also commands to insert tuples into, delete tuples from, and modify tuples in the database.
  • View Definition : The SQL DDL includes commands for defining views.
  • Transaction Control : SQL includes commands for specifying the beginning and ending of transactions.
  • Embedded SQL and Dynamic SQL : Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages, such as C, C++, Java, PL/I, Cobol, Pascal, and Fortran.
  • Integrity : The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.
  • Authorization : The SQL DDL includes commands for specifying access rights to relations and views.

Data Abstraction

For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. Since many database-systems users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system:
  • Physical Level : The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.
  • Logical Level : The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction.
The three levels of data abstraction
  • View Level : The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.

Database System Applications

  • Banking : For customer information, accounts, and loans, and banking transactions.
  • Airlines : For reservations and schedule information. Airlines were among the first to use databases in a geographically distributed manner—terminals situated around the world accessed the central database system through phone lines and other data networks.
  • Universities : For student information, course registrations, and grades.
  • Credit Card Transactions : For purchases on credit cards and generation of monthly statements.
  • Telecommunication : For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks.
  • Finance : For storing information about holdings, sales, and purchases of financial instruments such as stocks and bonds.
  • Sales : For customer, product, and purchase information.
  • Manufacturing : For management of supply chain and for tracking production of items in factories, inventories of items in warehouses/stores, and orders for items.
  • Human Resources : For information about employees, salaries, payroll taxes and benefits, and for generation of paychecks.

Data Model

  • A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.
  • There are two data models : the entity–relationship model and the relational model.
  • The entity–relationship (E-R) model is a high-level data model. It is based on a perception of a real world that consists of a collection of basic objects, called entities, and of relationships among these objects.
  • The relational model is a lower-level model. It uses a collection of tables to represent both data and the relationships among those data. Its conceptual simplicity has led to its widespread adoption; today a vast majority of database products are based on the relational model. Designers often formulate database schema design by first modeling data at a high level, using the E-R model, and then translating it into the the relational model.

07 September 2009

Database System Utilities

  • Loading : A loading utility is used to load existing data files-such as text files or sequential files-into the database. Usually, the current (source) format of the data file and the desired (target) database file structure are specified to the utility, which then automatically reformats the data and stores it in the database. With the proliferation of DBMSs, transferring data from one DBMS to another is becoming common in many organizations. Some vendors are offering products that generate the appropriate loading programs, given the existing source and target database storage descriptions (internal schemas). Such tools are also called conversion tools.
  • Backup: A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape. The backup copy can be used to restore the database in case of catastrophic failure. Incremental backups are also often used, where only changes since the previous backup are recorded. Incremental backup is more complex but saves space.
  • File Reorganization: This utility can be used to reorganize a database file into a different file organization to improve performance.
  • Performance Monitoring: Such a utility monitors database usage and provides statistics to the DBA. The DBA uses the statistics in making decisions such as whether or not to reorganize files to improve performance.

06 September 2009

DBMS Interfaces

  1. Menu-Based Interfaces for Web Clients or Browsing : These interfaces present the user with lists of options, called menus, that lead the user through the formulation of a request. Menus do away with the need to memorize the specific commands and syntax of a query language; rather, the query is composed step by step by picking options from a menu that is displayed by the system. Pull-down menus are a very popular technique in Web-based user interfaces. They are also often used in browsing interfaces, which allow a user to look through the contents of a database in an exploratory and unstructured manner.
  2. Forms-Based Interfaces : A forms-based interface displays a form to each user. Users can fill out all of the form entries to insert new data, or they fill out only certain entries, in which case the DBMS will retrieve matching data for the remaining entries. Forms are usually designed and programmed for naive users as interfaces to canned transactions. Many DBMSs have forms specification languages, which are special languages that help programmers specify such forms. Some systems have utilities that define a form by letting the end user interactively construct a sample form on the screen.
  3. Graphical User Interfaces : A graphical interface (CUI) typically displays a schema to the user in diagrammatic form. The user can then specify a query by manipulating the diagram. In many cases, CUIs utilize both menus and forms. Most CUIs use a pointing device, such as a mouse, to pick certain parts of the displayed schema diagram.
  4. Natural Language Interfaces : These interfaces accept requests written in English or some other language and attempt to "understand" them. A natural language interface usually has its own "schema," which is similar to the database conceptual schema, as well as a dictionary of important words. The natural language interface refers to the words in its schema, as well as to the set of standard words in its dictionary, to interpret the request. If the interpretation is successful, the interface generates a high-level query corresponding to the natural language request and submits it to the DBMS for processing; otherwise, a dialogue is started with the user to clarify the request.
  5. Interfaces for Parametri c Users : Parametric users, such as bank tellers, often have a small set of operations that they must perform repeatedly. Systems analysts and programmers design and implement a special interface for each known class of naive users. Usually, a small set of abbreviated commands is included, with the goal of minimizing the number of keystrokes required for each request.
  6. Interfaces for the DBA : Most database systems contain privileged commands that can be used only by the DBA's staff. These include commands for creating accounts, setting system parameters, granting account authorization, changing a schema, and reorganizing the storage structures of a database.

A Brief History of Database Applications

    1. Early Database Applications Using Hierarchical and Network Systems: Many early database applications maintained records in large organzations, such as corporations, universities, hospitals, and banks. In many of these applications, there were large numbers of records of similar structure. One of the main problems with early database systems was the intermixing of conceptual relationships with the physical storage and placement of records on disk. Another shortcoming of early systems was that they provided only programming language interfaces. This made it time-consuming and expensive to implement new queries and transactions, since new programs had to be written, tested, and debugged.
    2. Providing Application Flexibility with Relational Databases : Relational databases were originally proposed to separate the physical storage of data from its conceptual representation and to provide a mathematical foundation for databases. The relational data model also introduced high-level query languages that provided an alternative to programming language interfaces; hence, it was a lot quicker to write new queries. Eventually, relational databases became the dominant type of database systems for traditional database applications. Relational databases now exist on almost all types of computers, from small personal computers to large servers.
    3. Object-Oriented Applications and the Need for More Complex Databases : The emergence of object-oriented programming languages in the 1980s and the need to store and share complex-structured objects led to the development of object-oriented databases. Initially, they were considered a competitor to relational databases, since they provided more general data structures. They also incorporated many of the useful object oriented paradigms, such as abstract data types, encapsulation of operations, inheritance, and object identity. However, the complexity of the model and the lack of an early standard contributed to their limited usc. They are now mainly used in specialized applications, such as engineering design, multimedia publishing, and manufacturing systems.
    4. Interchanging Data on the Web for E-Commerce : The World Wide Web provided a large network of interconnected computers. Users can create documents using a Web publishing language, such as HTML (HyperText Markup Language), and store these documents on Web servers where other users (clients) can access them. Documents can be linked together through hvpcrlinks, which are pointers to other documents. A variety of techniques were developed to allow the interchange of data on the Web. Currently, XML (eXtended Markup Language) is considered to be the primary standard for interchanging data among various types of databases and Web pages. XML combines concepts from the models used in document systems with database modeling concepts.
    5. Extending Database Capabilities for New Applications : The success of database systems in traditional applications encouraged developers of other types of applications to attempt to use them. Such applications traditionally used their own specialized file and data structures.

    The main characteristics of the database management system

    • Self-describing nature of a database system
    • Insulation between programs and data, and data abstraction
    • Support of multiple views of the data
    • Sharing of data and multiuser transaction procesing

    04 September 2009

    What is the most important difference between a disk and a tape?

    Tapes are sequential devices that do not support direct access to a desired page. We must essentially step through all pages in order. Disks support direct access to a desired page.

    Why are I/O costs important in a DBMS?

    I/O costs are of primary important to a DBMS because these costs typically dominate the time it takes to run most database operations. Optimizing the amount of I/O’s for an operation can result in a substantial increase in speed in the time it takes to run that operation.

    Why does a DBMS store data on external storage?

    A DBMS stores data on external storage because the quantity of data is vast, and must persist across program executions.

    When is an XML document well-formed? When is an XML document valid?

    An XML document is valid if it has an associated DTD and the document follows the rules of the DTD. An XML document is well-formed if it follows three guidelines:
    (1) it starts with an XML declaration, (2) it contains a root element that contains all other elements and (3) all elements are properly nested.

    What is the difference between a webserver and an application server? What funcionality do typical application servers provide?

    A web server handles the interaction with the client’s web browser. Application servers are used to maintain a pool of processes for handling requests. Typically, they are the middleware tier between the web server and the data sources such as database systems. Application servers eliminate the problems with processcreation overload and can also provide extra functionality like abstracting away heterogeneous data sources and maintaining session state information.

    Role Indicator

    Role Indicator - If an entity set plays more than one role, role indicators describe the different purpose in the relationship.

    Aggregation

    Aggregation - a feature of the entity relationship model that allows a relationship set to participate in another relationship set. This is indicated on an ER diagram by drawing a dashed box around the aggregation.

    Weak Entity Set

    Weak Entity Set - an entity that cannot be identified uniquely without considering some primary key attributes of another identifying owner entity.

    Covering Constraint

    Covering Constraint - within an ISA hierarchy, a covering constraint determines where the entities in the subclasses collectively include all entities in the superclass.

    Overlap Constraint

    Overlap Constraint -An overlap constraint determines whether or not two subclasses can contain the same entity.

    Participation Constraint

    Participation Constraint - a participation constraint determines whether relationships must involve certain entities.

    Many-to-Many Relationship

    Many-to-Many Relationship - a key constraint that indicates that many of one entity can be associated with many of another entity. An example of a many-to-many relationship is employees and their hobbies: a person can have many different hobbies, and many people can have the same hobby.

    One-to-Many Relationship

    One-to-Many Relationship - a key constraint that indicates that one entity can be associated with many of another entity. An example of a one-to-many relationship is when an employee can work for only one department, and a department can have many employees.

    Relationship Set

    Relationship Set - a collection of similar relationships.

    Entity Set

    Entity Set - a collection of similar entities such as all of the toys in the toy department.

    Relationship

    Relationship - an association among two or more entities.

    Entity

    Entity - an object in the real world that is distinguishable from other objects.

    Domain

    Domain - a set of possible values for an attribute.

    Attribute

    Attribute - a property or description of an entity. A toy department employee entity could have attributes describing the employee’s name, salary, and years of service.

    Why does a DBMS interleave the actions of different transactions instead of executing transactions one after the other?

    A DBMS is typically shared among many users. Transactions from these users can be interleaved to improve the execution time of users’ queries. By interleaving queries, users do not have to wait for other user’s transactions to complete fully before their own transaction begins. Without interleaving, if user A begins a transaction that will take 10 seconds to complete, and user B wants to begin a transaction, user B would have to wait an additional 10 seconds for user A’s transaction to complete before the database would begin processing user B’s request.

    Transaction

    A transaction is any one execution of a user program in a DBMS. This is the basic unit of change in a DBMS.

    The responsibilities of a DBA

    • Designing the logical and physical schemas, as well as widely-used portions of the external schema.
    • Security and authorization.
    • Data availability and recovery from failures.
    • Database tuning: The DBA is responsible for evolving the database, in particular the conceptual and physical schemas, to ensure adequate performance as user requirements change.

    The difference between logical and physical data independence

    Logical data independence means that users are shielded from changes in the logical structure of the data, while physical data independence insulates users from changes in the physical storage of the data.

    The advantages of using a DBMS are given below:

    • Data independence and efficient access. Database application programs are independent of the details of data representation and storage. The conceptual and external schemas provide independence from physical storage decisions and logical design decisions respectively. In addition, a DBMS provides efficient storage and retrieval mechanisms, including support for very large files, index structures and query optimization.
    • Reduced application development time. Since the DBMS provides several important functions required by applications, such as concurrency control and crash recovery, high level query facilities, etc., only application-specific code needs to be written. Even this is facilitated by suites of application development tools available from vendors for many database management systems.
    • Data integrity and security. The view mechanism and the authorization facilities of a DBMS provide a powerful access control mechanism. Further, updates to the data that violate the semantics of the data can be detected and rejected by the DBMS if users specify the appropriate integrity constraints.
    • Data administration. By providing a common umbrella for a large collection of data that is shared by several users, a DBMS facilitates maintenance and data administration tasks. A good DBA can effectively shield end-users from the chores of fine-tuning the data representation, periodic back-ups etc.
    • Concurrent access and crash recovery. A DBMS supports the notion of a transaction, which is conceptually a single user’s sequential program. Users can write transactions as if their programs were running in isolation against the database. The DBMS executes the actions of transactions in an interleaved fashion to obtain good performance, but schedules them in such a way as to ensure that conflicting operations are not permitted to proceed concurrently. Further, the DBMS maintains a continuous log of the changes to the data, and if there is a system crash, it can restore the database to a transaction-consistent state. That is, the actions of incomplete transactions are undone, so that the database state reflects only the actions of completed transactions. Thus, if each complete transaction, executing alone, maintains the consistency criteria, then the database state after recovery from a crash is consistent.

    Why would you choose a database system instead of simply storing data in operating system files?

      A database is an integrated collection of data, usually so large that it has to be stored on secondary storage devices such as disks or tapes. This data canbe maintained as a collection of operating system files,or stored in a DBMS (database management system).