CS614 Finalterm Solved Paper 2009

Updated on Thursday, 13 September 2012

FINALTERM EXAMINATION
Spring 2009
CS614- Data Warehousing

Marks: 70

Question No: 1 (Marks: 1) http://vuzs.net
It is observed that every year the amount of data recorded in an organization is

Doubles (handouts page # 6)

Triples

Quadruples

Remains same as previous year

Question No: 2 (Marks: 1)
Multidimensional databases typically use proprietary __________ format to store
pre-summarized cube structures.

File ( Page # 69 )

Application

Aggregate

Database

Question No: 3 (Marks: 1)
Pre-computed _______ can solve performance problems

Aggregates (page # 101)

Facts

Dimensions

Question No: 4 (Marks: 1)
_______________, if it fits into memory, costs only one disk I/O access to locate a
record by a given key.

A Dense Index (page # 211)

A Sparse Index

An Inverted Index

None of These

Question No: 5 (Marks: 1)
The degree of similarity between two records, often measured by a numerical
value between _______, usually depends on application characteristics.

0 and 1 (page # 157 )

0 and 10

0 and 100

0 and 99

Question No: 6 (Marks: 1)
The purpose of the House of Quality technique is to reduce ______ types of risk.

Two (page # 181)

Three

Four

All

Question No: 7 (Marks: 1)
NUMA stands for __________

Non-uniform Memory Access ( page # 194)

Non-updateable Memory Architecture

New Universal Memory Architecture

Question No: 8 (Marks: 1)
Which is the least appropriate join operation for Pipeline parallelism?

Hash Join

Inner Join

Outer Join

Sort-Merge Join

Question No: 9 (Marks: 1)
There are many variants of the traditional nested-loop join. If the index is built as
part of the query plan and subsequently dropped, it is called

Naive nested-loop join

Index nested-loop join

Temporary index nested-loop join ( page # 230)

None of these

Question No: 10 (Marks: 1)
Data mining derives its name from the similarities between searching for valuable
business information in a large database, for example, finding linked products in
gigabytes of store scanner data, and mining a mountain for a _________ of
valuable ore.

Furrow

Streak

Trough

Vein

Question No: 11 (Marks: 1)
With data mining, the best way to accomplish this is by setting aside some of
your data in a ________ to isolate it from the mining process; once the mining is
complete, the results can be tested against the isolated data to confirm the
model's validity.

Cell

Disk

Folder

Vault

Question No: 12 (Marks: 1)
Kimball's iterative data warehouse development approach drew on decades
of experience to develop the _____________.

Business Dimensional Lifecycle (page # 276 )

Data Warehouse Dimension

Business Definition Lifecycle

OLAP Dimension

Question No: 13 (Marks: 1)
We must try to find the one access tool that will handle all the needs of their
users.

True

False

Question No: 14 (Marks: 1)
For a smooth DWH implementation we must be a technologist.

True

False (page # 306)

Question No: 15 (Marks: 1)
During the application specification activity, we also must give consideration to
the organization of the applications.

True ( page # 294 )

False

Question No: 16 (Marks: 1)
Investing years in architecture and forgetting the primary purpose of solving
business problems results in an inefficient application. This is an example of the
_________ mistake.

Extreme Technology Design

Extreme Architecture Design

None of these (page # 303)

Question No: 17 (Marks: 1)
The most recent attack is the ________ attack on the cotton crop during 2003-04,
resulting in a loss of nearly 0.5 million bales.

Boll Worm (Video Lecture # 38)

Purple Worm

Blue Worm

Cotton Worm

Question No: 18 (Marks: 1)
The users of a data warehouse are knowledge workers; in other words, they are
_________ in the organization.

Decision makers (page # 10)

Manager

Database Administrator

DWH Analyst

Question No: 19 (Marks: 1)
_________ breaks a table into multiple tables based upon common column
values.

Horizontal splitting (page # 46 )

Vertical splitting

Question No: 20 (Marks: 1)
Execution can be completed successfully or it may be stopped due to some
error. In case of successful completion of execution all the transactions will be
___________

Committed to the database (page # 398 last line)

Rolled back

Question No: 21 (Marks: 2)
What is meant by the statement "Be a diplomat NOT a technologist" in the
context of a data warehouse development project?

7. Be a diplomat NOT a technologist

The biggest problem you will face during a warehouse implementation will be people, not the technology or the development. You’re going to have senior management complaining about completion dates and unclear objectives. You’re going to have development people protesting that everything takes too long and why can’t they do it the old way? You’re going to have users with outrageously unrealistic expectations, who are used to systems that require mouse-clicking but not much intellectual investment on their part. And you’re going to grow exhausted, separating out Needs from Wants at all levels. Commit from the outset to work very hard at communicating the realities, encouraging investment, and cultivating the development of new skills in your team and your users (and even your bosses).

Question No: 22 (Marks: 2)
Elaborate the concept of data parallelism.

Parallel execution of a single data manipulation task across multiple partitions of data.

Partitions may be static or dynamic.

Tasks are executed almost independently across partitions.

A "query coordinator" must coordinate between the independently executing processes.

Data parallelism is arguably the simplest form of parallelization: a single data operation is executed in parallel across multiple partitions of the data. The partitions may be defined statically or dynamically, but the same operator is applied to all of them concurrently. The idea of data parallelism has existed for a very long time.
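The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the handouts' implementation: the `sales` rows, the round-robin `partition` helper and the `parallel_total` coordinator are all hypothetical names, and a thread pool stands in for the parallel processes of a real RDBMS (in CPython, threads will not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Statically split rows into n round-robin partitions."""
    return [rows[i::n] for i in range(n)]


def partial_sum(part):
    """The single operator, applied to one partition independently."""
    return sum(amount for _, amount in part)


def parallel_total(rows, workers=4):
    """Plays the role of the query coordinator: fan out, then combine."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The same operator runs on every partition concurrently.
        partials = list(pool.map(partial_sum, parts))
    # Combine the per-partition results into the final answer.
    return sum(partials)


# Hypothetical (id, amount) rows, invented for this example.
sales = [(i, i * 10) for i in range(1, 101)]
total = parallel_total(sales)
print(total)
```

Each worker sees only its own partition, so the tasks never need to talk to each other; the only coordination is the final combine step.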

Question No: 23 (Marks: 2)
What will be the effect if we program a package by using the DTS object model?

Question No: 24 (Marks: 3)
What is meant by the classification process? How do we measure the accuracy of
classifiers?

Classification means that, based on the properties of existing data, we form groups (classes) and assign each record to one of them.
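For the second part of the question, one common way to measure accuracy is to hold back some labelled records, run the classifier on them, and compute the fraction it classifies correctly. The sketch below assumes a made-up rule-based classifier and a made-up test set; `classify`, `accuracy` and the income threshold are illustrative names and values, not taken from the handouts.

```python
def accuracy(predicted, actual):
    """Fraction of records whose predicted class matches the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def classify(income):
    """Hypothetical classifier: a single threshold rule on income."""
    return "high" if income >= 50000 else "low"


# Hypothetical held-out records: (income, true class).
test_set = [(62000, "high"), (30000, "low"), (51000, "low"), (45000, "low")]

predicted = [classify(income) for income, _ in test_set]
actual = [label for _, label in test_set]
acc = accuracy(predicted, actual)
print(acc)
```

The test records must not have been used to build the classifier; otherwise the measured accuracy would be overly optimistic.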

Question No: 25 (Marks: 3)
How does the page dimension capture the static and dynamic nature of different web
pages?

Question No: 26 (Marks: 3)
Write down the limitations of pipeline parallelism.

Pipeline parallelism is a good fit for data warehousing (where we are working with lots of data), but it makes no sense for OLTP because OLTP tasks are not big enough to justify breaking them down into subtasks.
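To make the idea of "breaking a task down into subtasks" concrete, here is a minimal sketch of a three-stage pipeline (scan, filter, aggregate) in which each stage runs in its own thread and passes rows to the next stage through a queue. The stage names, the even-number predicate and `run_pipeline` are assumptions made for illustration only.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the row stream


def scan(rows, out_q):
    """Stage 1: emit rows one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def filter_stage(in_q, out_q, predicate):
    """Stage 2: pass on rows that satisfy the predicate, as they arrive."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            out_q.put(SENTINEL)
            return
        if predicate(row):
            out_q.put(row)


def aggregate(in_q, result):
    """Stage 3: sum the rows that survive the filter."""
    total = 0
    while True:
        row = in_q.get()
        if row is SENTINEL:
            result.append(total)
            return
        total += row


def run_pipeline(rows):
    q1, q2, result = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=scan, args=(rows, q1)),
        threading.Thread(target=filter_stage,
                         args=(q1, q2, lambda r: r % 2 == 0)),
        threading.Thread(target=aggregate, args=(q2, result)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return result[0]


pipeline_total = run_pipeline(range(1, 11))
print(pipeline_total)
```

With only a handful of rows, as in a typical OLTP request, the cost of starting the stages and handing rows between them outweighs any gain, which is the limitation the answer describes.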

Question No: 27 (Marks: 5)
For maximum performance of a bitmapped index, what characteristics should a query
have?

Question No: 28 (Marks: 5)
How do the three parallel tracks capture the user requirements in Kimball's data
warehouse lifecycle road map?

Question No: 29 (Marks: 5)
How are time-contiguous log entries and the HTTP Secure Sockets Layer used for user
session identification? What are the limitations of these techniques?

Question No: 30 (Marks: 10)
What are the issues regarding the record management tools at campuses where
text files are used to store data?

Main issues

Data duplication

Updating data

Data deletion

Each of these issues can be elaborated in the answer.

Question No: 31 (Marks: 10)
Shared RDBMS architecture requires static partitioning. How do you perform the
partitioning?
