Minimizing Number of Steps using Custom SQL in SSDP

Modified on Thu, 28 Mar at 9:26 PM

Scenario:

The organization wants to track sales data for each designation in order to gain insight into how different roles contribute to overall sales performance. This requirement allows the organization to tailor strategies and allocate resources based on the specific needs and contributions of each employee group, resulting in improved sales outcomes across the board.


Requirement:

Our objective is to analyse the overall sales performance of each employee, categorized by their job title. This analysis will focus on the influence of different roles on sales performance, providing guidance for targeted strategies to enhance results.


About the Use case


Let’s analyse the input data that we have:


1. Employees Detail Table: It contains master details of the all the employee of the organization.


Employee Id

Employee Name

Joining Date

Birth Date

Designation

1

Vikas Patel

01-01-2024

02-02-1984

Sales Manager

2

Shyam Mishra

03-01-2024

02-05-1986

Regional Manager

3

Vikram Pandey

05-01-2024

02-05-1990

Department Manager

 

2. Sales Detail Table: Sales transaction data with entire sales details.

 

Expected Output: We need Employee name wise Sales along with their respective Designation


Employee Name

Designation

Sales

Vikas Patel

Sales Manager

8650

Shyam Mishra

Regional Manager

9400

Vikram Pandey

Department Manager

8225

 

Solution:
This use case can be solved by two approaches however the number of steps involved and requirement of data duplication is the main difference between the two approaches.


A.  Without Custom SQL

The following are the logical steps:

1. Sales Detail will be our base dataset and for getting names of the employee we will join Employee master details to it.

2. As we have different id columns for each of the designation along with their sales detail, we will perform join with employee data one by one as follows:

  • We will join Sales manager ID to Employee ID column,
  • Then join Department manager ID to Employee ID column,
  • Lastly join Regional manager ID column to Employee ID.

3. After joining, we will carry out Transformation steps that includes:

  • Removing unnecessary columns.
  • Renaming columns for more clarity.

4. For achieving required output, we will now Unpivot the resultant data in order to list name of sales manager, department manager and regional manager coming in different columns into one single column along with their Sales.

5. Now in order to get their overall sales we will Aggregate the data.

 

Now we will implement same in smarten

Step 1 – Adding Employee Detail Dataset to Sales Detail Dataset.

We will add the Employee details dataset by using a “Add dataset” option which will allow us to then join it with Sales data.

 

 

Step 2 – Joining the Datasets using Blend-Join option
Our initial step would involve joining the Sales details data with the Employee details data using the Join option.


For instance, we will join Sales Manager ID column to Employee ID column. This will give us the name of Sales Manager.



Hence by joining sale manager id column from the sales detail table with the employee id column from the employee detail table, we got the Name of the Sales Manager.
Similarly, In order to get the names of Department and Regional manager we will join Department Manager Id and Regional Manager id one by one with the employee Id.


Step 3 - Removing unwanted columns and renaming the columns 
As we have performed multiple joins in the previous steps we will get some unwanted redundant columns for Employee id, Birth date, Joining date.
We will remove such columns.

Also we need to rename the columns having the names of employee with their associated designation.
For instance, The column EmployeeName have the name of Sales Manager i.e. Vikram Patel.
Similarly EmployeeName_1 have Regional Manager name i.e. Shyam Mishra and EmployeeName_2 have Department Manager name i.e. Vikram Pandey.

 

 




An alternative method for removing columns is that users can directly delete the columns by using the delete option available by right-click operation on the dataset. When a user right-clicks on the column they want to remove, they can delete it.

 


In case the employee names are to be consolidated into a single column, the resulting dataset can be unpivoted to provide an aggregated output consisting of a single record per employee.
Below are the steps to achieve the requirement after performing the above mentioned 3 steps.


Step 4 – Unpivoting the columns


We are getting the names of the employee in different columns. For listing all the designations and its sales in single column for both designation and sales respectively we will unpivot the data.



Here Select the columns we need to unpivot and give names of the column required after unpivoting the data. Then add this group.


For instance, we will unpivot the columns named as Sales Manager, Regional Manager and Department Manager columns and list them in single column say Designation and their names will come in a separate single column say Employee Name.



Below is the resultant data we will get after unpivoting the columns.



Step 5 – Aggregate the resultant Data

We will now aggregate the dataset based on Employee Name and Designation.



As we require the overall Sales done by each employee along with their designation, we will sum up the Sales to achieve our requirement.



Below is the resultant dataset we get after aggregation.



Hence with the above method we achieved our requirement.
But here we faced the major drawbacks are

  • We could not apply multiple joins on single column in one step, we need to apply three different joins.
  • It involved multiple steps to Join and aggregate the data.


B. With Custom SQL


Users can utilize the "Custom SQL" feature in Smarten to directly write and execute SQL queries within the SSDP environment. This provides users with a versatile and effective approach for manipulating and analyzing data. By utilizing this feature, users can streamline data manipulation processes and achieve desired outcomes more efficiently. The "Custom SQL" feature offers a high level of control, intriguing to users who enjoy a hands-on approach to data analysis.


Note: Smarten SSDP operates on the Spark Engine and the Custom Spark SQL is a module for structured data processing that runs on top of Spark Engine, enabling users to process data using SQL scripts.


Below are the steps to use “SQL query feature” of SSDP to achieve the expected output:


1. We will first add Employee details dataset into the Sales details dataset to make it available for SQL query feature

2. Then we will write our custom SQL query by using SQL feature of smarten.

 

Step 1 – Adding Employee Detail Dataset to Sales Detail Dataset.

We will add the Employee details dataset by using a “Add dataset” option which will allow us to then join it with Sales data.



Step 2 – Utilising the “SQL query” feature.

We will write the SQL query using SQL Query option available.



Use the below query in the SQL query window and click on Preview which will allow us to validate the output. 
After it click on OK to get the resultant output onto your dataset.


With the help of below query we will get the will give all the name of employee corresponding to designations in different columns and the Sales of the same.


Also, while joining on same column of employee data multiple times, we will give different alias for every join with the Sales Data.


For instance, here for getting the name for each designation while joining to employee table we have used alias as EM_SM for Sales Manager, for Regional Manager we have used alias as EM_RM and for Department Manager we have used EM_DM.

 



Below is the resultant data we get through this query.


 

In case if our requirement is to get the aggregated data along with the employee names listed in one single column along with their designation mentioned.
Below query will suffice our requirement

 

This query combines sales data for all the designations using the UNION ALL ensuring a unified view without duplicates. It performs LEFT JOINs to link sales details to employee names based on IDs, focusing only on records with valid IDs through WHERE conditions. Each segment aggregates sales by their respective designation using SUM and GROUP BY to calculate and organize totals. The use of LEFT JOIN specifically allows us to include all sales records and associate them with the relevant employee’s name associated with their designation, even if some sales don't have a corresponding designation, those records are excluded by the WHERE condition.



Below is the resultant output.



Conclusion:


By utilizing the previously mentioned approaches, we can address our specific scenario. Furthermore, each approach includes a distinct number of steps. 
For users who are not very tech-savvy, the first approach would be more practical, while the second approach would be better suited for those with technical expertise with good command on SQL. 

 

Was this article helpful?

That’s Great!

Thank you for your feedback

Sorry! We couldn't be helpful

Thank you for your feedback

Let us know how can we improve this article!

Select at least one of the reasons
CAPTCHA verification is required.

Feedback sent

We appreciate your effort and will try to fix the article