Mock 1 | DP-203 | Microsoft Certified: Azure Data Engineer Associate
Welcome to the Microsoft Certified: Azure Data Engineer Associate | DP-203 mock exam. We are committed to providing you with the best quality exam mocks, absolutely free.
1 / 51
You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The size of the files will vary based on the number of events that occur per hour. File sizes range from 4 KB to 5 GB. You need to ensure that the files stored in the container are optimized for batch processing. What should you do?
2 / 51
You are planning a solution to aggregate streaming data that originates in Apache Kafka and is output to Azure Data Lake Storage Gen2. The developers who will implement the stream processing solution use Java. Which service should you recommend using to process the streaming data?
3 / 51
You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool named Pool1. You plan to create a database named DB1 in Pool1. You need to ensure that when tables are created in DB1, the tables are available automatically as external tables to the built-in serverless SQL pool. Which format should you use for the tables in DB1?
4 / 51
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to transform data for use in inventory reports. The inventory reports will use the data and additional WHERE parameters depending on the report. The reports will be produced once daily. You need to implement a solution to make the dataset available for the reports. The solution must minimize query times. What should you implement?
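For reference, one way a dedicated SQL pool can persist the results of a complex transformation is a materialized view. The sketch below shows the syntax only; all table and column names are illustrative, and this is one of several options an answer choice may cover.

-- Syntax sketch only; names are illustrative, not from the question.
CREATE MATERIALIZED VIEW dbo.mvInventoryBase
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT ProductKey,
       COUNT_BIG(*) AS RowCnt,
       SUM(Quantity) AS TotalQty
FROM dbo.FactInventory
GROUP BY ProductKey;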
5 / 51
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You modify the files to ensure that each row is more than 1 MB. Does this meet the goal?
6 / 51
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You copy the files to a table that has a columnstore index. Does this meet the goal?
7 / 51
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You convert the files to compressed delimited text files. Does this meet the goal?
8 / 51
From a website analytics system, you receive data extracts about user interactions such as downloads, link clicks, form submissions, and video plays. The data contains the following columns.
You need to design a star schema to support analytical queries of the data. The star schema will contain four tables including a date dimension. To which table should you add each column? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
TotalEvents:
9 / 51
ChannelGrouping:
10 / 51
EventCategory:
11 / 51
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data. Transact-SQL queries similar to the following query will be executed daily.
SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
  AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey, IsOrderFinalized
Which table distribution will minimize query times?
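For reference, table distribution in a dedicated SQL pool is declared in the WITH clause of the CREATE TABLE statement. A minimal syntax sketch with an abbreviated column list; the DISTRIBUTION value shown is illustrative, not the answer.

-- Syntax sketch; the DISTRIBUTION choice here is illustrative only.
CREATE TABLE dbo.FactPurchase
(
    DateKey          INT NOT NULL,
    SupplierKey      INT NOT NULL,
    StockItemKey     INT NOT NULL,
    IsOrderFinalized BIT NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(SupplierKey),  -- alternatives: ROUND_ROBIN, REPLICATE
    CLUSTERED COLUMNSTORE INDEX
);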
12 / 51
You have a SQL pool in Azure Synapse.
You plan to load data from Azure Blob storage to a staging table. Approximately 1 million rows of data will be loaded daily. The table will be truncated before each daily load.
You need to create the staging table. The solution must minimize how long it takes to load the data to the staging table. How should you configure the table? To answer, select the appropriate options in the answer area.
Distribution:
13 / 51
Indexing:
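For reference, both the distribution and the index are declared in the table DDL. A minimal sketch with illustrative columns; the options shown are one common staging pattern, not necessarily the answer.

-- Syntax sketch; column names and both WITH options are illustrative.
CREATE TABLE dbo.StageSales
(
    RowID     INT            NOT NULL,
    RawRecord NVARCHAR(4000) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,  -- illustrative choice
    HEAP                         -- illustrative choice
);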
14 / 51
15 / 51
You plan to implement an Azure Data Lake Storage Gen2 account.
You need to ensure that the data lake will remain available if a data center fails in the primary Azure region. The solution must minimize costs. Which type of replication should you use for the storage account?
16 / 51
You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data.
You need to ensure that the data in the container is available for read workloads in a secondary region if an outage occurs in the primary region. The solution must minimize costs.
Which type of data redundancy should you use?
17 / 51
You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one container and has the hierarchical namespace enabled. The containers store files that contain data in the Apache Parquet format.
You need to copy folders and files from Storage1 to Storage2 by using a Data Factory copy activity. The solution must meet the following requirements:
- No transformations must be performed.
- The original folder structure must be retained.
- Minimize time required to perform the copy activity.
How should you configure the copy activity? To answer, select the appropriate options in the answer area.
Copy activity copy behavior:
18 / 51
Source dataset type:
19 / 51
You have an Azure Synapse Analytics dedicated SQL pool that contains the users shown in the following table.
User1 executes a query on the database, and the query returns the results shown in the following exhibit.
User1 is the only user who has access to the unmasked data.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
When User1 queries the BirthDate column, the values returned will be:
20 / 51
When User2 queries the YearlyIncome column, the values returned will be:
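For reference, Dynamic Data Masking rules such as those in the exhibit are defined per column in T-SQL. A minimal sketch, assuming illustrative table, column, and mask choices; the actual masks are shown in the question's exhibit.

-- Illustrative only; the real masks come from the question's exhibit.
ALTER TABLE dbo.DimCustomer
    ALTER COLUMN BirthDate ADD MASKED WITH (FUNCTION = 'default()');
ALTER TABLE dbo.DimCustomer
    ALTER COLUMN YearlyIncome ADD MASKED WITH (FUNCTION = 'random(1, 100)');
GRANT UNMASK TO User1;  -- users granted UNMASK see the original values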
21 / 51
You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible only through an Azure virtual network named VNET1.
You are building a SQL pool in Azure Synapse that will use data from the data lake. Your company has a sales team. All the members of the sales team are in an Azure Active Directory group named Sales. POSIX controls are used to assign the Sales group access to the files in the data lake.
You plan to load data to the SQL pool every hour. You need to ensure that the SQL pool can load the sales data from the data lake.
Which three actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
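For reference, one building block often involved in this kind of scenario is a database scoped credential that lets the SQL pool authenticate to the data lake. A minimal sketch with illustrative names; this is at most one piece of a complete answer.

-- Sketch only; names are illustrative and this is not the full answer.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword123!>';
CREATE DATABASE SCOPED CREDENTIAL DataLakeCredential
WITH IDENTITY = 'Managed Service Identity';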
22 / 51
You need to design an Azure Synapse Analytics dedicated SQL pool that meets the following requirements:
- Can return an employee record from a given point in time.
- Maintains the latest employee information.
- Minimizes query complexity.
How should you model the employee data?
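For reference, the usual modeling choices here are slowly changing dimension (SCD) patterns. As one point of comparison, a Type 2 dimension tracks history with row versioning; a minimal sketch with illustrative columns.

-- Illustrative Type 2 SCD shape; one of several modeling options.
CREATE TABLE dbo.DimEmployee
(
    EmployeeKey  INT IDENTITY(1, 1) NOT NULL,  -- surrogate key
    EmployeeID   INT            NOT NULL,      -- business key
    EmployeeName NVARCHAR(100)  NOT NULL,
    StartDate    DATE           NOT NULL,      -- row valid from
    EndDate      DATE           NULL,          -- NULL = current row
    IsCurrent    BIT            NOT NULL
);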
23 / 51
You need to create a partitioned table in an Azure Synapse Analytics dedicated SQL pool.
How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Select the two most appropriate answers. In the real exam, you may get a drag-and-drop option.
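For reference, a partitioned table in a dedicated SQL pool combines a distribution, an index choice, and a PARTITION clause in the WITH block. A minimal sketch, assuming illustrative column names and boundary values.

-- Syntax sketch; columns and boundary values are illustrative.
CREATE TABLE dbo.FactSales
(
    OrderDateKey INT            NOT NULL,
    CustomerKey  INT            NOT NULL,
    SalesAmount  DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20230101, 20240101))
);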
24 / 51
HOTSPOT (Drag and Drop is not supported)
You have an Azure Data Lake Storage Gen2 container.
Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files.
You need to design a data archiving solution that meets the following requirements:
- New data is accessed frequently and must be available as quickly as possible.
- Data that is older than five years is accessed infrequently but must be available within one second when requested.
- Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible.
- Costs must be minimized while maintaining the required availability.
How should you manage the data? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Seven-year old data:
25 / 51
Five-year old data:
26 / 51
You have a data model that you plan to implement in a data warehouse in Azure Synapse Analytics as shown in the following exhibit.
All the dimension tables will be less than 2 GB after compression, and the fact table will be approximately 6 TB. The dimension tables will be relatively static with very few data inserts and updates.
Which type of table should you use for each table? To answer, select the appropriate options in the answer area.
Fact_DailyBookings:
27 / 51
Dim_Time:
28 / 51
Dim_Employee:
29 / 51
Dim_Customer:
30 / 51
HOTSPOT (Drag and Drop is not supported)
You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools.
Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file contains the same data attributes and data from a subsidiary of your company.
You need to move the files to a different folder and transform the data to meet the following requirements:
- Provide the fastest possible query times.
- Automatically infer the schema from the underlying files.
How should you configure the Data Factory copy activity? To answer, select the appropriate options in the answer area.
Sink file type:
31 / 51
 Copy behavior:
32 / 51
HOTSPOT (Drag and Drop is not supported)
You need to output files from Azure Data Factory.
Which file format should you use for each type of output? To answer, select the appropriate options in the answer area.
JSON with a timestamp:
33 / 51
Columnar format:
34 / 51
You are designing the folder structure for an Azure Data Lake Storage Gen2 container.
Users will query data by using a variety of services including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or current month.
Which folder structure should you recommend to support fast queries and simplified folder security?
35 / 51
HOTSPOT (Drag and Drop is not supported)
You are planning the deployment of Azure Data Lake Storage Gen2. You have the following two reports that will access the data lake:
- Report1: Reads three columns from a file that contains 50 columns.
- Report2: Queries a single record based on a timestamp.
You need to recommend in which format to store the data in the data lake to support the reports. The solution must minimize read times. What should you recommend for each report? To answer, select the appropriate options in the answer area.
REPORT2:
36 / 51
REPORT1:
37 / 51
You have files and folders in Azure Data Lake Storage Gen2 for an Azure Synapse workspace as shown in the following exhibit.
You create an external table named ExtTable that has LOCATION='/topfolder/'. When you query ExtTable by using an Azure Synapse Analytics serverless SQL pool, which files are returned?
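For reference, the DDL the question describes looks roughly like the sketch below; the column list, data source, and file format names are illustrative assumptions, not given in the question.

-- Sketch only; DATA_SOURCE, FILE_FORMAT, and columns are assumed.
CREATE EXTERNAL TABLE ExtTable
(
    Col1 INT,
    Col2 NVARCHAR(100)
)
WITH
(
    LOCATION    = '/topfolder/',
    DATA_SOURCE = MyDataLakeSource,  -- assumed external data source
    FILE_FORMAT = MyParquetFormat    -- assumed external file format
);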
38 / 51
DRAG DROP (Drag and Drop is not supported)
You have a table named SalesFact in an enterprise data warehouse in Azure Synapse Analytics. SalesFact contains sales data from the past 36 months and has the following characteristics:
- Is partitioned by month
- Contains one billion rows
- Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as quickly as possible.
Which three actions should you perform in sequence in a stored procedure? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
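For reference, the fastest way to remove whole partitions in a dedicated SQL pool is metadata-only partition switching rather than row-by-row deletes. A minimal sketch of the mechanism with illustrative names and partition numbers; it is not the full ordered answer.

-- Assumes dbo.SalesFact_Work already exists, empty, with the same schema
-- and partition boundaries as dbo.SalesFact. The switch is metadata-only.
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;
TRUNCATE TABLE dbo.SalesFact_Work;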
39 / 51
You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb. You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace.
CREATE TABLE mytestdb.myParquetTable (
    EmployeeID int,
    EmployeeName string,
    EmployeeStartDate date
) USING Parquet
You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data.
One minute later, you execute the following query from a serverless SQL pool in MyWorkspace.

SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE name = 'Alice';

What will be returned by the query?
40 / 51
You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following Transact-SQL statement.
You need to alter the table to meet the following requirements:
- Ensure that users can identify the current manager of employees.
- Support creating an employee reporting hierarchy for your entire company.
- Provide fast lookup of the managers’ attributes such as name and job title.
Which column should you add to the table?
41 / 51
You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements. Which Azure Storage functionality should you include in the solution?
42 / 51
HOTSPOT (Drag and Drop is not supported)
You need to implement an Azure Synapse Analytics database object for storing the sales transactions data. The solution must meet the sales transaction dataset requirements.
What should you do? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Partitioning option to use in the WITH clause of the DDL statement:
43 / 51
Transact-SQL DDL command to use:
44 / 51
HOTSPOT (Drag and Drop is not supported)
You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements. What should you include in the solution? To answer, select the appropriate options in the answer area.
Table type to store promotional data:
45 / 51
Table type to store retail store data:
46 / 51
You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction dataset requirements. What should you create?
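For reference, surrogate keys in a dedicated SQL pool are commonly generated with an IDENTITY column; a minimal sketch assuming an illustrative table shape.

-- Illustrative sketch of an IDENTITY-based surrogate key.
CREATE TABLE dbo.DimRetailStore
(
    StoreKey  INT IDENTITY(1, 1) NOT NULL,  -- surrogate key
    StoreID   INT                NOT NULL,  -- business key
    StoreName NVARCHAR(100)      NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN);  -- illustrative distribution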
47 / 51
HOTSPOT (Drag and Drop is not supported)
You need to design the partitions for the product sales transactions. The solution must meet the sales transaction dataset requirements. What should you include in the solution? To answer, select the appropriate options in the answer area.
Store product sale transactions data in:
48 / 51
Partition sale transactions data by:
49 / 51
DRAG DROP (Drag and Drop is not supported)
You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytics requirements.
Which three Transact-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.
Select and Place: (We don't have the drag-and-drop feature; for now, just select all the correct options.)
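For reference, exposing external files to a dedicated SQL pool typically involves a short chain of DDL objects. A minimal sketch with illustrative names and a Parquet source; the exact objects and their order are what the question asks you to determine.

-- Sketch only; every name, path, and column here is an assumption.
CREATE EXTERNAL DATA SOURCE TwitterFeedSource
WITH
(
    LOCATION = 'abfss://container@account.dfs.core.windows.net',
    TYPE     = HADOOP  -- PolyBase-style source in a dedicated SQL pool
);
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);
CREATE EXTERNAL TABLE dbo.TwitterFeed
(
    TweetID   BIGINT,
    Sentiment NVARCHAR(20)
)
WITH
(
    LOCATION    = '/twitter/',
    DATA_SOURCE = TwitterFeedSource,
    FILE_FORMAT = ParquetFileFormat
);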
50 / 51
HOTSPOT (Drag and Drop is not supported)
You need to design a data storage structure for the product sales transactions. The solution must meet the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
When creating the table for sales transactions:
51 / 51
Table type to store the product sales transactions: