Articles On Testing

Welcome to http://www.articlesontesting.com !!!


How to create test data using MS SQL Server

What is test data creation in testing a Business Intelligence application?
What is the relevance of test data creation?
How is test effectiveness measured in terms of the amount of test data we create to do BI testing?

There are many other questions that test enthusiasts can ask religiously and profusely, but most of us may not be in a position to answer them without touching on the basic point that no testing can be performed on a BI-based application without a proper test data generator in place. When I say proper, it means a lot: even with so many tools available, we still need the ability to code and get this test data preparation done as per our needs.

Manual creation of test data may not be a good way of performing testing activities in the BI domain. The simple reason behind this is application performance.
Application performance deteriorates when we have a huge chunk of data in place and a simple job is run to load data from one source to another as per the business model's needs.
For example, we might need to load data from SAP systems into a simple flat file. There can be numerous such occasions when we need to load data from typical line-of-business applications into separate systems, in this case just a flat file.

So, coming to the main point: how do we create test data?
Suppose we need to create a thousand records within an Excel sheet. We may use Excel tricks like dragging the fill handle or Ctrl + D, but to what extent?

It has always been a known fact that coding is the only way engineering excellence can be achieved, especially in the software world; there is absolutely no other choice if you are really interested in bringing the best quality product in place.

Very recently I came across a small challenge in one of my activities. I had to create a couple of million test records in a specific format to suit the testing of the application under test. Let us take a sample case to make things easy to understand.
            I have a simple job which, when run, transforms various record sets from a flat file into a table. So basically the source file is my test data in the current context, and when I run the job, the records from the source file are loaded into a table within a particular database. By the way, I have been using the term 'job' here.
Has it got anything to do with the daily office-going job? Jokes apart, I know people from the Business Intelligence domain are pretty accustomed to the term, but for others I would just like to add some details. A job is basically a sequence of coded steps which can be configured using MS SQL Server or many other tools. This combination of steps results in a set of activities as needed by the development architecture. In this article I shall be discussing it in the context of ETL activities.
Just to give some insight on the business requirement front: I have MS SQL Server on my machine, on which the job has been configured; I have a source file that holds some record sets, placed in a folder on the same machine. Now all I do is run the job. After some time the job completes with a success message, implying that it has successfully extracted the data from the source file and loaded the valid records into the table within the database.
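If the job is scheduled under SQL Server Agent (a common way of configuring such jobs in MS SQL Server), it can also be kicked off straight from T-SQL rather than from the UI; the job name below is only a placeholder for illustration, not an actual job from this example.

-- Start a SQL Server Agent job by name (placeholder job name, adjust to your own)
EXEC msdb.dbo.sp_start_job @job_name = N'Load_SourceFile_To_Table';

-- Review the outcome once it completes
EXEC msdb.dbo.sp_help_jobhistory @job_name = N'Load_SourceFile_To_Table';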
That is as simple as moving out of one home into another, just to make office commuting easy in day-to-day life. I hope I did not break your rhythm.

By the way, I was just looking at the title of this blog post, "How to create test data using MS SQL Server", and wondering whether my post till now did any justice to the topic. Yeah, of course: it creates a base on which we can now understand the relevance of test data creation and the importance of coding skills in creating huge chunks of data in a matter of seconds.
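To give a taste of that, here is a minimal T-SQL sketch of the kind of query that can generate a couple of million records in seconds; the table and column names (TestData_Source, EmpId, EmpName, Salary) are assumptions for illustration and should be adjusted to whatever layout the job's source file expects.

-- Minimal sketch: generate 2,000,000 test records into a work table in one shot
SET NOCOUNT ON;

IF OBJECT_ID('dbo.TestData_Source', 'U') IS NOT NULL
    DROP TABLE dbo.TestData_Source;

;WITH Numbers AS
(
    -- A cross join of two system views yields a large row set very cheaply
    SELECT TOP (2000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
SELECT n                                           AS EmpId,
       'Employee_' + CAST(n AS VARCHAR(10))        AS EmpName,
       CAST(10000 + (n % 90000) AS DECIMAL(10, 2)) AS Salary
INTO   dbo.TestData_Source
FROM   Numbers;

-- The table can then be exported to the flat file the job reads, for example with bcp:
-- bcp YourDatabase.dbo.TestData_Source out C:\TestData\source.txt -c -t"|" -T -S DB_Server_Name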

But I will continue some time later; for now, feeling sleepy !!!






ETL testing in a Business Intelligence application

What is ETL testing all about?
                   ETL is the extract, transform and load operation performed on the source data in order to build a strong base for developing an application that serves the decision-making teams in large enterprise systems.

Such is the importance of a Business Intelligence (BI) application that at times big enterprises end up developing a complete application for just a single user to access and analyze the reports available in it. Developing it involves huge effort and cost, not only due to the huge chunk of data and its testing but also the degree of complexity associated with developing the logic to achieve it. Reports, however, are not solely dependent on the ETL but on several other logically inter-related objects, and on the access and authorization rules implemented at the cube level as well as the UI level. Someone who is an expert at BI application development might just not be sufficient to develop such an application: the end user's requirements need to be documented, loads of analysis on the nature of the data needs to be done to frame a volley of questions for the end user, and clarification documents need to be tracked right throughout the project development life cycle. This is because in these systems a bug that gets discovered at a very late stage generally has a very high cost associated with fixing it.

Let's not get out of context, and try to understand the approach and priority of testing just the ETL logic within the application under test. For a Business Intelligence application developed using Microsoft technologies, we have Microsoft SQL Server in place, and the ETL can be developed using the SSIS (SQL Server Integration Services) feature available therein.
Using the SSIS feature, packages can be developed which hold the ETL logic, based on which the source data is filtered as per the requirements traced out for the application.

Once the ETL packages have been developed, testing them becomes an uphill task due to the huge amount of data in the source environment which gets extracted, transformed and loaded into the destination, or rather a sort of pre-destination environment. Just imagine verifying each and every record set from the source against the target environment. From common principle we might just conclude that the data load is based on SQL queries which will in any case go fine, but the main target area is verifying the logic that decides which data must be ignored while loading into the target environment. There might be cases wherein we have duplicate records in the source, and we may not be able to load both records simply because that would create a high level of discrepancy when we browse through the reports in the end product. Running the ETL packages loads the data from the source into the staging environment and, depending on the context and nature of the application, the same gets loaded into the data mart as well, based on the transformation logic applied and also on the nature of the load, which may be a full load (truncate and load) or an incremental load (just the additional data gets loaded into the environment).
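As a rough illustration of those two load patterns (the Staging and DataMart database names and the Employee table below are assumptions for the sketch, not the actual application objects):

-- Full load (truncate and load): the target table is emptied and completely reloaded
TRUNCATE TABLE DataMart.dbo.Employee;

INSERT INTO DataMart.dbo.Employee (EmpId, EmpName, Salary)
SELECT EmpId, EmpName, Salary
FROM   Staging.dbo.Employee;

-- Incremental load: only records not already present in the target get appended
INSERT INTO DataMart.dbo.Employee (EmpId, EmpName, Salary)
SELECT s.EmpId, s.EmpName, s.Salary
FROM   Staging.dbo.Employee AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   DataMart.dbo.Employee AS d
                   WHERE  d.EmpId = s.EmpId);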

When we have the uphill task of validating such huge chunks of data, we take the help of database automation tools that help us verify each and every record. There are many tools available in the industry, and at times we can create tools ourselves using Excel and macro programming, but what I prefer to do is use the database unit test project feature available within the Visual Studio IDE. Now it is time to build up the logic that will help us validate each record set in the source against the target. The general approach considered sufficient for this is a two-staged SQL query verification: first, the counts of the source and target data environments must match on applying the filters documented by the client, and second, the data itself must match. We genuinely find the option of an empty return value sufficient for both. All we need to apply is the EXCEPT keyword between the two queries, along with the addition of the test condition from within the added DB unit test file.
Just browse through some screenshots of the working approach, which should definitely make things easy for SSIS testers.

[Screenshot: the database connection configuration dialog that appears when the database unit test is added]
Just hit cancel on the above dialog box. This is actually the database configuration file creation, which we can do directly by adding an app.config file as shown below. To do so, add a new item to the project by right-clicking the project, clicking Add New Item, and then selecting an Application Configuration File as shown:

[Screenshot: the Add New Item dialog with Application Configuration File selected]
The content of the same will be something as under :
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="DatabaseUnitTesting"
             type="Microsoft.Data.Schema.UnitTesting.Configuration.DatabaseUnitTestingSection, Microsoft.Data.Schema.UnitTesting, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" />
  </configSections>
  <DatabaseUnitTesting>
    <DataGeneration ClearDatabase="true" />
    <ExecutionContext Provider="System.Data.SqlClient"
                      ConnectionString="Data Source=DB_Server_Name;Initial Catalog=Master;Integrated Security=True;Pooling=False"
                      CommandTimeout="220" />
  </DatabaseUnitTesting>
</configuration>

Once this has been set up, we can go ahead with the logic to validate the data, which we do as under.
Here we have just renamed the method from DatabaseTest to a more relevant one, Employee_RowCount. Similarly, we add another test method by clicking the plus icon, adding it to the same DB unit test class file (Employee) to validate the data content, as under.

[Screenshot: the Employee_RowCount test method with its EXCEPT row count query and the Empty ResultSet test condition]
So what is it that I have done in this first level of verification, as in the above image? It is simple: I have just written the query to fetch the row count from each side and used the EXCEPT keyword. Now, if the counts of the two queries match, then thanks to the EXCEPT in place we expect an "empty resultset" as the return value of the complete query execution. Hence, in the lower section of the image, you can see that I have removed the default test condition that got added and added a new condition, namely Empty ResultSet. We are thus ready with one validation, the row count.
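In case the screenshot is not clear, the row count check boils down to something like the sketch below; the Source and Destination database names and the Employee table are assumptions for illustration.

-- Row count check: if the two counts match, EXCEPT returns an empty result set,
-- which is exactly the Empty ResultSet test condition added in the DB unit test
SELECT COUNT(*) AS RecordCount
FROM   Source.dbo.Employee
EXCEPT
SELECT COUNT(*) AS RecordCount
FROM   Destination.dbo.Employee;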

The second level of verification is the data match. We add a new method via the plus icon at the top, rename it to Employee_DataCheck, provide the queries for it with the EXCEPT keyword in between, and the rest is as was done above to get the empty resultset as the return value of the query execution. This will look as under:
As a tip from experience, we at times have issues with data validation, especially for string datatype attributes. Check for a collation conflict, which creates such issues, and specify the collation explicitly to sort them out.
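A sketch of what that data check query could look like (again, the database, table and column names are assumptions for illustration; the COLLATE clause on the string column is the fix for the collation conflicts just mentioned):

-- Data check: every record in the source must also be present in the destination;
-- a non-empty result set means some source rows were not loaded or were altered
SELECT EmpId,
       EmpName COLLATE Latin1_General_CI_AI AS EmpName,
       Salary
FROM   Source.dbo.Employee
EXCEPT
SELECT EmpId,
       EmpName COLLATE Latin1_General_CI_AI AS EmpName,
       Salary
FROM   Destination.dbo.Employee;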

A third level of verification, which adds quality to the testing of the SSIS packages and the ETL execution, is verification of the schema of the database objects in the source against the destination environment. This adds value because it helps us verify whether the data will be loaded with the same precision values or not.
The general query that will fetch the schema details of any database table is as under:

SELECT COLUMN_NAME COLLATE Latin1_General_CI_AI,
       DATA_TYPE   COLLATE Latin1_General_CI_AI,
       CHARACTER_MAXIMUM_LENGTH,
       NUMERIC_PRECISION,
       DATETIME_PRECISION
FROM   Source.INFORMATION_SCHEMA.COLUMNS
WHERE  TABLE_NAME = 'Employee'
EXCEPT
SELECT COLUMN_NAME COLLATE Latin1_General_CI_AI,
       DATA_TYPE   COLLATE Latin1_General_CI_AI,
       CHARACTER_MAXIMUM_LENGTH,
       NUMERIC_PRECISION,
       DATETIME_PRECISION
FROM   Destination.INFORMATION_SCHEMA.COLUMNS
WHERE  TABLE_NAME = 'Employee'
  AND  COLUMN_NAME NOT IN ('column not to be validated, especially some auto-generated ID column')

The code above validates the column names across the two environments. Do keep in mind that there are certain columns that get auto generated, especially in the staging environment, where we have a staging ID and so on, so do exclude them from verification as has been done above. The remaining columns get verified on the attributes we select above, namely data type, size (that is, character maximum length), numeric precision and datetime precision. We could have easily validated another aspect such as nullability (IS_NULLABLE), but the general approach in Business Intelligence (BI) is that if we have some column to validate the ID of the complete record set, we tend to ignore the nullability check, and the same is reflected in the code above.


Thus we have successfully automated the ETL testing using three levels of verification: row count, data check and schema check. This provides the team with a greater level of confidence in the quality of testing of the ETL packages, as data verification has been done at a rigorous level rather than in a sampling manner.

So we have explored ETL testing and ways to achieve a high degree of quality for the same.