Mock 1 | DP-203 | Microsoft Certified: Azure Data Engineer Associate
Welcome to the Microsoft Certified: Azure Data Engineer Associate | DP-203 mock exam. We are committed to providing you with the best quality exam mocks, absolutely free.
1 / 51
You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The size of the files will vary based on the number of events that occur per hour. File sizes range from 4 KB to 5 GB. You need to ensure that the files stored in the container are optimized for batch processing. What should you do?
2 / 51
You are planning a solution to aggregate streaming data that originates in Apache Kafka and is output to Azure Data Lake Storage Gen2. The developers who will implement the stream processing solution use Java. Which service should you recommend using to process the streaming data?
3 / 51
You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool named Pool1. You plan to create a database named DB1 in Pool1. You need to ensure that when tables are created in DB1, the tables are available automatically as external tables to the built-in serverless SQL pool. Which format should you use for the tables in DB1?
4 / 51
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to transform data for use in inventory reports. The inventory reports will use the data and additional WHERE parameters depending on the report. The reports will be produced once daily. You need to implement a solution to make the dataset available for the reports. The solution must minimize query times. What should you implement?
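For reference, one way a dedicated SQL pool can persist the results of a complex transformation is a materialized view. The sketch below shows the syntax only; all table and column names are illustrative, and this is one of several options an answer choice may cover.

-- Syntax sketch only; names are illustrative, not from the question.
CREATE MATERIALIZED VIEW dbo.mvInventoryBase
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT ProductKey,
       COUNT_BIG(*) AS RowCnt,
       SUM(Quantity) AS TotalQty
FROM dbo.FactInventory
GROUP BY ProductKey;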
5 / 51
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You modify the files to ensure that each row is more than 1 MB. Does this meet the goal?
6 / 51
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You copy the files to a table that has a columnstore index. Does this meet the goal?
7 / 51
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You convert the files to compressed delimited text files. Does this meet the goal?
8 / 51
From a website analytics system, you receive data extracts about user interactions such as downloads, link clicks, form submissions, and video plays. The data contains the following columns.
You need to design a star schema to support analytical queries of the data. The star schema will contain four tables including a date dimension. To which table should you add each column? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
TotalEvents:
9 / 51
ChannelGrouping:
10 / 51
EventCategory:
11 / 51
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data. Transact-SQL queries similar to the following query will be executed daily.
SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
  AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey, IsOrderFinalized
Which table distribution will minimize query times?
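For reference, table distribution in a dedicated SQL pool is declared in the WITH clause of the CREATE TABLE statement. A minimal syntax sketch with an abbreviated column list; the DISTRIBUTION value shown is illustrative, not the answer.

-- Syntax sketch; the DISTRIBUTION choice here is illustrative only.
CREATE TABLE dbo.FactPurchase
(
    DateKey          INT NOT NULL,
    SupplierKey      INT NOT NULL,
    StockItemKey     INT NOT NULL,
    IsOrderFinalized BIT NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(SupplierKey),  -- alternatives: ROUND_ROBIN, REPLICATE
    CLUSTERED COLUMNSTORE INDEX
);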
12 / 51
You have a SQL pool in Azure Synapse.
You plan to load data from Azure Blob storage to a staging table. Approximately 1 million rows of data will be loaded daily. The table will be truncated before each daily load.
You need to create the staging table. The solution must minimize how long it takes to load the data to the staging table. How should you configure the table? To answer, select the appropriate options in the answer area.
Distribution:
13 / 51
Indexing:
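For reference, both the distribution and the index are declared in the table DDL. A minimal sketch with illustrative columns; the options shown are one common staging pattern, not necessarily the answer.

-- Syntax sketch; column names and both WITH options are illustrative.
CREATE TABLE dbo.StageSales
(
    RowID     INT            NOT NULL,
    RawRecord NVARCHAR(4000) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,  -- illustrative choice
    HEAP                         -- illustrative choice
);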
14 / 51
15 / 51
You plan to implement an Azure Data Lake Storage Gen2 account.
You need to ensure that the data lake will remain available if a data center fails in the primary Azure region. The solution must minimize costs. Which type of replication should you use for the storage account?
16 / 51
You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data.
You need to ensure that the data in the container is available for read workloads in a secondary region if an outage occurs in the primary region. The solution must minimize costs.
Which type of data redundancy should you use?
17 / 51
You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one container and has the hierarchical namespace enabled. The containers store files that contain data in the Apache Parquet format.
You need to copy folders and files from Storage1 to Storage2 by using a Data Factory copy activity. The solution must meet the following requirements:
- No transformations must be performed.
- The original folder structure must be retained.
- Minimize time required to perform the copy activity.
How should you configure the copy activity? To answer, select the appropriate options in the answer area.
Copy activity copy behavior:
18 / 51
Source dataset type:
19 / 51
You have an Azure Synapse Analytics dedicated SQL pool that contains the users shown in the following table.
User1 executes a query on the database, and the query returns the results shown in the following exhibit.
User1 is the only user who has access to the unmasked data.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
When User1 queries the BirthDate column, the values returned will be:
20 / 51
When User2 queries the YearlyIncome column, the values returned will be:
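For reference, Dynamic Data Masking rules such as those in the exhibit are defined per column in T-SQL. A minimal sketch, assuming illustrative table, column, and mask choices; the actual masks are shown in the question's exhibit.

-- Illustrative only; the real masks come from the question's exhibit.
ALTER TABLE dbo.DimCustomer
    ALTER COLUMN BirthDate ADD MASKED WITH (FUNCTION = 'default()');
ALTER TABLE dbo.DimCustomer
    ALTER COLUMN YearlyIncome ADD MASKED WITH (FUNCTION = 'random(1, 100)');
GRANT UNMASK TO User1;  -- users granted UNMASK see the original values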
21 / 51
You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible only through an Azure virtual network named VNET1.
You are building a SQL pool in Azure Synapse that will use data from the data lake. Your company has a sales team. All the members of the sales team are in an Azure Active Directory group named Sales. POSIX controls are used to assign the Sales group access to the files in the data lake.
You plan to load data to the SQL pool every hour. You need to ensure that the SQL pool can load the sales data from the data lake.
Which three actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
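For reference, one building block often involved in this kind of scenario is a database scoped credential that lets the SQL pool authenticate to the data lake. A minimal sketch with illustrative names; this is at most one piece of a complete answer.

-- Sketch only; names are illustrative and this is not the full answer.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword123!>';
CREATE DATABASE SCOPED CREDENTIAL DataLakeCredential
WITH IDENTITY = 'Managed Service Identity';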
22 / 51
You need to design an Azure Synapse Analytics dedicated SQL pool that meets the following requirements:
- Can return an employee record from a given point in time.
- Maintains the latest employee information.
- Minimizes query complexity.
How should you model the employee data?
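For reference, the usual modeling choices here are slowly changing dimension (SCD) patterns. As one point of comparison, a Type 2 dimension tracks history with row versioning; a minimal sketch with illustrative columns.

-- Illustrative Type 2 SCD shape; one of several modeling options.
CREATE TABLE dbo.DimEmployee
(
    EmployeeKey  INT IDENTITY(1, 1) NOT NULL,  -- surrogate key
    EmployeeID   INT            NOT NULL,      -- business key
    EmployeeName NVARCHAR(100)  NOT NULL,
    StartDate    DATE           NOT NULL,      -- row valid from
    EndDate      DATE           NULL,          -- NULL = current row
    IsCurrent    BIT            NOT NULL
);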
23 / 51
You need to create a partitioned table in an Azure Synapse Analytics dedicated SQL pool.
How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Select the two most appropriate answers. In the real exam, you may get a drag-and-drop option.
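For reference, a partitioned table in a dedicated SQL pool combines a distribution, an index choice, and a PARTITION clause in the WITH block. A minimal sketch, assuming illustrative column names and boundary values.

-- Syntax sketch; columns and boundary values are illustrative.
CREATE TABLE dbo.FactSales
(
    OrderDateKey INT            NOT NULL,
    CustomerKey  INT            NOT NULL,
    SalesAmount  DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20230101, 20240101))
);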
24 / 51
HOTSPOT (Drag and Drop is not supported)
You have an Azure Data Lake Storage Gen2 container.
Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files.
You need to design a data archiving solution that meets the following requirements:
- New data is accessed frequently and must be available as quickly as possible.
- Data that is older than five years is accessed infrequently but must be available within one second when requested.
- Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible.
- Costs must be minimized while maintaining the required availability.
How should you manage the data? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Seven-year old data:
25 / 51
Five-year old data:
26 / 51
You have a data model that you plan to implement in a data warehouse in Azure Synapse Analytics as shown in the following exhibit.
All the dimension tables will be less than 2 GB after compression, and the fact table will be approximately 6 TB. The dimension tables will be relatively static with very few data inserts and updates.
Which type of table should you use for each table? To answer, select the appropriate options in the answer area.
Fact_DailyBookings:
27 / 51
Dim_Time:
28 / 51
Dim_Employee:
29 / 51
Dim_Customer:
30 / 51
HOTSPOT (Drag and Drop is not supported)
You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools.
Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file contains the same data attributes and data from a subsidiary of your company.
You need to move the files to a different folder and transform the data to meet the following requirements:
- Provide the fastest possible query times.
- Automatically infer the schema from the underlying files.
How should you configure the Data Factory copy activity? To answer, select the appropriate options in the answer area.
Sink file type:
31 / 51
 Copy behavior:
32 / 51
HOTSPOT (Drag and Drop is not supported)
You need to output files from Azure Data Factory.
Which file format should you use for each type of output? To answer, select the appropriate options in the answer area.
JSON with a timestamp:
33 / 51
Columnar format:
34 / 51
You are designing the folder structure for an Azure Data Lake Storage Gen2 container.
Users will query data by using a variety of services including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or current month.
Which folder structure should you recommend to support fast queries and simplified folder security?
35 / 51
HOTSPOT (Drag and Drop is not supported)
You are planning the deployment of Azure Data Lake Storage Gen2. You have the following two reports that will access the data lake:
- Report1: Reads three columns from a file that contains 50 columns.
- Report2: Queries a single record based on a timestamp.
You need to recommend in which format to store the data in the data lake to support the reports. The solution must minimize read times. What should you recommend for each report? To answer, select the appropriate options in the answer area.
REPORT2:
36 / 51
REPORT1:
37 / 51
You have files and folders in Azure Data Lake Storage Gen2 for an Azure Synapse workspace as shown in the following exhibit.
You create an external table named ExtTable that has LOCATION='/topfolder/'. When you query ExtTable by using an Azure Synapse Analytics serverless SQL pool, which files are returned?
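For reference, the DDL the question describes looks roughly like the sketch below; the column list, data source, and file format names are illustrative assumptions, not given in the question.

-- Sketch only; DATA_SOURCE, FILE_FORMAT, and columns are assumed.
CREATE EXTERNAL TABLE ExtTable
(
    Col1 INT,
    Col2 NVARCHAR(100)
)
WITH
(
    LOCATION    = '/topfolder/',
    DATA_SOURCE = MyDataLakeSource,  -- assumed external data source
    FILE_FORMAT = MyParquetFormat    -- assumed external file format
);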
38 / 51
DRAG DROP (Drag and Drop is not supported)
You have a table named SalesFact in an enterprise data warehouse in Azure Synapse Analytics. SalesFact contains sales data from the past 36 months and has the following characteristics:
- Is partitioned by month
- Contains one billion rows
- Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as quickly as possible.
Which three actions should you perform in sequence in a stored procedure? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
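For reference, the fastest way to remove whole partitions in a dedicated SQL pool is metadata-only partition switching rather than row-by-row deletes. A minimal sketch of the mechanism with illustrative names and partition numbers; it is not the full ordered answer.

-- Assumes dbo.SalesFact_Work already exists, empty, with the same schema
-- and partition boundaries as dbo.SalesFact. The switch is metadata-only.
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;
TRUNCATE TABLE dbo.SalesFact_Work;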
39 / 51
You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb. You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace.
CREATE TABLE mytestdb.myParquetTable (
    EmployeeID int,
    EmployeeName string,
    EmployeeStartDate date
) USING Parquet
You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data.
One minute later, you execute the following query from a serverless SQL pool in MyWorkspace.

SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE name = 'Alice';

What will be returned by the query?
40 / 51
You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following Transact-SQL statement.
You need to alter the table to meet the following requirements:
- Ensure that users can identify the current manager of employees.
- Support creating an employee reporting hierarchy for your entire company.
- Provide fast lookup of the managers’ attributes such as name and job title.
Which column should you add to the table?
41 / 51
You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements. Which Azure Storage functionality should you include in the solution?
42 / 51
HOTSPOT (Drag and Drop is not supported)
You need to implement an Azure Synapse Analytics database object for storing the sales transactions data. The solution must meet the sales transaction dataset requirements.
What should you do? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Partitioning option to use in the WITH clause of the DDL statement:
43 / 51
Transact-SQL DDL command to use:
44 / 51
HOTSPOT (Drag and Drop is not supported)
You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements. What should you include in the solution? To answer, select the appropriate options in the answer area.
Table type to store promotional data:
45 / 51
Table type to store retail store data:
46 / 51
You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction dataset requirements. What should you create?
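For reference, surrogate keys in a dedicated SQL pool are commonly generated with an IDENTITY column; a minimal sketch assuming an illustrative table shape.

-- Illustrative sketch of an IDENTITY-based surrogate key.
CREATE TABLE dbo.DimRetailStore
(
    StoreKey  INT IDENTITY(1, 1) NOT NULL,  -- surrogate key
    StoreID   INT                NOT NULL,  -- business key
    StoreName NVARCHAR(100)      NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN);  -- illustrative distribution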
47 / 51
HOTSPOT (Drag and Drop is not supported)
You need to design the partitions for the product sales transactions. The solution must meet the sales transaction dataset requirements. What should you include in the solution? To answer, select the appropriate options in the answer area.
Store product sale transactions data in:
48 / 51
Partition sale transactions data by:
49 / 51
DRAG DROP (Drag and Drop is not supported)
You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytics requirements.
Which three Transact-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.
Select and Place: (We don't have the drag-and-drop feature; for now, just select all the correct options.)
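For reference, exposing external files to a dedicated SQL pool typically involves a short chain of DDL objects. A minimal sketch with illustrative names and a Parquet source; the exact objects and their order are what the question asks you to determine.

-- Sketch only; every name, path, and column here is an assumption.
CREATE EXTERNAL DATA SOURCE TwitterFeedSource
WITH
(
    LOCATION = 'abfss://container@account.dfs.core.windows.net',
    TYPE     = HADOOP  -- PolyBase-style source in a dedicated SQL pool
);
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);
CREATE EXTERNAL TABLE dbo.TwitterFeed
(
    TweetID   BIGINT,
    Sentiment NVARCHAR(20)
)
WITH
(
    LOCATION    = '/twitter/',
    DATA_SOURCE = TwitterFeedSource,
    FILE_FORMAT = ParquetFileFormat
);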
50 / 51
HOTSPOT (Drag and Drop is not supported)
You need to design a data storage structure for the product sales transactions. The solution must meet the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
When creating the table for sales transactions:
51 / 51
Table type to store the product sales transactions: