These are my notes while studying for the Developing SQL Databases using the Exam Ref 70-762 book from Microsoft. It is written by Louis Davidson ans Stacia Varga. The book is a study companion for the actual Certification Exam which I’m most likely is not going to take, I’m doing this studying because I want to learn more about SQL databases but I feel that I don’t need the actual certification at this point just the knowledge. I’m doing this study as a part of a 30 day challenge where I commit to 20-30 minutes of studying per day starting today.
2020-04-18 (day 7)
Starting Skill 1.2 Design & implement indexes
In this skill B-Tree indexes on on-disk tables are covered. A term that also is used for the B-Tree indexes is rowstore.
Their structures are designed to contain related data for the a row.Indexes are used to speed access to rows using a scan of values in a table or index, or a seek for specific row(s) in an index.
Rowstore indexes on the on-disk tables are based on the concept of B-Tree structure. consisting of index nodes that sort the data to speed finding one value. In a search via a B-Tree you follow pointers to intermediate nodes which refines the search until you reach the leaf node that holds the data (see below).
There are 2 types of indexes in SQL Server, clustered and non-clustered.
Clustered indexes are indexes where the leaf node in the tree contains the actual data in the table.
Non-Clustered indexes is a separate structure that has a copy of the data in the leaf node that is in the keys along with a pointer to the heap or clustered index. The structure of the non-clustered leaf pages depends on whether it is a heap or a clustered index. For heaps it contains a pointer to the physical structure where the data is stored (resides).
For clustered table it contains the value of the clustered index keys (a.k.a. clustering keys) when the index key is a single column it is called a simple index. If it contains multiple columns it is called a composite index.
Indexes will be sorted in the order of the leading column first , then second column etc. For composite indexes it is best to choose the column with the most unique values as the first column. Max key size for a non-clustered index is 1700 bytes. Clustered indexes has a max size of 900 bytes
Indexes has pages that are 8060 bytes in size which includes the keys so it is important to design your keys appropriately so that the keys don’t use to much of the 8060 bytes on the page which then leads to the situation where there is not enough space for the data and to many pages are used which can affect performance of the index.
Note to self, think I need to find a separate article outside the boos to fully understand the concept of clustered vs. non-clustered indexes.
2020-04-17 (Day 4-5-6)
I have not updated in a few days but I have continued to study each day and take notes. Lately I have been covering how to determine the most efficient data types to use. Choosing the best type has immense value for the systems implemented using the database.
- It serves as the first limitation of domain of data values that columns can store
- It is important for performance
SQL Server has an extensive set of data types that we can choose from. For me it is easier to link to a listing of data types than to list them all myself. But i will provide an overview picture.
Beyond the basic data type, there are a couple of additional constructs that extends the concepts of a data type, they are;
Computed columns let you manifest expression as a column for usage. The engine maintains values that not meet the normalization rules. Example, you have columns for FirstName and LastName and you want to include a column for named FullName (as the concatenadted value of FirstName and LastName). If FullName was a column it would be duplicated data that would have to be managed and maintained. But adding it as a computed column means that the data is either instantiated at query time or you can specify it deterministic, PERSISTED. Persisted returns same value every execution and is saved in the physical storage.
Dynamic data masking can be included in the column definition when the table is created. If a column has had data masking added the masking rule is used when data is selected from the table.There are 4 types of mask functions available in SQL Server.
- DEFAULT – Takes the default mask for the data type
- EMAIL – Masks email so you only see a few meaningful characters
- RANDOM – Masks only the numeric data types (int,smallint, decimal et.) with a random value within a range
- PARTIAL – Allows you to take values from the front and back ans replace center characters with a fixed string value.
Once masking is applied the functin emits a mased value unless the column value is NULL in which case NUKK is returned.
Who can see masked or unmasked values iss controlled by a database level permission called UNMASKED. the dbo user always has has this right.
Next up we look at indexes.
2020-04-14 (Day 3)
Today was just continuing to read the book and take notes about normalization, “Rules covering the relationship of non-key attributes and key attributes”. When I first read the heading of this section it made no sense to me but after reading the section it did.
The key concept for normalization that is described is that all non-key attributes need to describe the key and what it represent.
Here column Manufacturer_URL should not be included in the table CAR because it does not help describing the car with LicenseTag = XYZ.
This is called the Boyce-Codd Normal form
Every candidate key is identified, all attributes are fully dependent on a key,and all columns must identify a fact about a key and nothing but a key.
It’s not harder than that!
2020-04-13 (Day 2)
I just installed SQL Server Express 19 and upgraded my SQL Sever Management Studio to v. 18.5.
I also dove in to chapter 1 and Skill 1.1 Design and implement database objects. Here are some notes I made.
Section Design and implement a relational database schema.
- It is important to gather business requirements before starting to create the database objects. This can take a lot of time and attention to detail.
- The process of database design involves scanning the requirements documents looking for key words and phrases. Nouns that can represent the tables or subsets or maybe types in the database.
- Requirement text can store information that might be stored about each object
- Schemas are security/organizational groupings of tables and code for the database.
Normalization is a set of ‘rules’ that cover som of the most fundamental structural issues with relational database design. All of these rules are very simple at their core and each will deal with eliminating some issue that is problematic to the users of the database when trying to store data with the least redundancy and highest potential for performance using SQL Server’s relational engine.
There are 2 families of normalization concepts
- Rules covering the shape of the table
- Rules covering the relationship of non-key attributes and key attributes
Rules covering the shape of the table
- Relations require that you have no duplicate rows
- A column or a set of columns that are used to uniquely identify one row from another row is called a key
- There are several types of keys (will be discussed later).
- All keys identify a row
- Foreign keys identify a row in another table
- Natural keys is a value that naturally identifies a row, i.e. EmployeeNumber
- Artificial keys, a.k.a. Surrogate keys, is a value that is created in the design phase to identify the row, i.e. EmployeeId
- More about thi slater when we dive in to UNIQUE and PRIMARY KEY constraints
- The desire is that every column in the table represents just a single value.
- Name = ‘Villén, Mats’
- LastName = ‘Villén’
- FirstName = ‘Mats’
- Avoid using arrays in tables. Use a second sub-table instead.
2020-04-12 (Day 1)
Below is a listing of what this study book will teach you. As the first step I went over the overview of chapter 1 to orient myself on what will be covered and under lined some areas that seemed important. These terms and areas are bold and italicized below. More tomorrow when I dive in to Skill 1.1. I plan to install SQL Express before then so I can follow along with the exercises which I think will be important in this book more that the Exam Ref 70-761 that I went through in January and February.
Skill 1.1 – Design and implement database objects
- Design and implement a relational database schema
- Design tables and schemas based on business requirements
- Improve design with normalisation
- Write create table statements
- Determine the most efficient data types
Skill 1.2 – Design and Implement indexes
- Design new indexes based on provided tables, queries or plans
- Distinguish between indexed columns and included columns
- Implement clustered index columns by using best practices
- Recommend new indexes based on execution plans
Skill 1.3 – Design and implement views
- Design a view to select data based on user or business requrements
- Identify steps necessary to design an updatable view
- Implement partitioned views
- Implement indexed view
Skill 1.4 – Implement columnstore indexes
- Determine use cases that support the use of columnstore indexes
- Identify proper usage of clustered and non-clustered columnstore indexes
- Implement columnstore index maintenance
The following chapters are
- Chapter 2 – Implement programmability objects
- Chapter 3 – Manage database concurrency
- Chapter 4 – Optimize database objects and SQL infrastucture