Articles On Testing

Welcome to http://www.articlesontesting.com !!!



How to create test data using MS Sql Server

What is test data creation in testing a Business Intelligence application?
What is the relevance of test data creation?
How is test effectiveness measured in terms of the amount of test data we create for BI testing?

There are many other questions that test enthusiasts may religiously and profusely ask, but most of us may not be in a position to answer them all without touching upon the basic point: no testing can be performed in a BI-based application without a proper test data generator in place. And when I say proper, it means a lot; even with so many tools available, we still need the ability to code and get this test data preparation done as per our needs.

Manual creation of test data is rarely a good way to perform testing activities in the BI domain, and the simple reason behind this is application performance.
An application's performance deteriorates when we have a huge chunk of data in place and a simple job is run to load data from one source to another as per the business model's needs.
For example, we might need to load data from a SAP system into a simple flat file. There are numerous occasions when we need to load data from typical line-of-business applications into separate systems; in this case, just a flat file.

So, coming to the main point: how do we create test data?
Suppose we need to create a thousand records within an Excel sheet. We may use Excel tricks like dragging or Ctrl+D, but to what extent?

It has always been a known fact that coding is the way engineering excellence is achieved, especially in the software world; there is absolutely no other choice if you are really interested in building the best quality product.

Very recently I came across a small challenge in one of my activities: I had to create a couple of million test records in a specific format to suit the testing of the application under test. Let us take a sample case to make things easy to understand.
I have a simple job which, when run, transforms record sets from a flat file into a table. So the source file is my test data in the current context, and when I run the job, the records from the source file are loaded into a table within a particular database. By the way, I have been using the term 'job' here.
Has it got anything to do with the daily office-going job? Jokes apart, I know people from the Business Intelligence domain are pretty accustomed to the term, but for others I would just like to add some detail. A job is basically a sequence of coded steps that can be configured using MS Sql Server or many other tools. This combination of steps carries out a set of activities as needed by the development architecture; in this article I shall be discussing it in the context of ETL activities.
To give some insight on the business requirement front: I have MS Sql Server on my machine, on which the job has been configured, and I have a source file with some record sets in it, placed in a folder on the same machine. Now all I do is run the job. After some time the job completes with a success message, implying that it has successfully extracted data from the source file and loaded the valid data into the table within the database.
That is as simple as moving from one home to another just to make the daily office commute easier. I hope I did not break your rhythm.

By the way, I was just looking at the title of this blog post, "How to create test data using MS Sql Server", and wondering whether my post so far did any justice to the topic. Yes, of course: it creates a base on which we can now understand the relevance of test data creation and the importance of coding skills in creating huge chunks of data in a matter of seconds.
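So before I pause, here is a minimal sketch of what I mean, assuming a hypothetical table dbo.TestCustomer and a target of two million rows; the trick is to cross-join a catalog view against itself to manufacture a numbers list, instead of dragging cells in Excel:

    -- Hypothetical target table for the generated test data
    CREATE TABLE dbo.TestCustomer
    (
        CustomerId INT           NOT NULL,
        FirstName  VARCHAR(50)   NOT NULL,
        OrderDate  DATE          NOT NULL,
        Amount     DECIMAL(10,2) NOT NULL
    );

    -- Generate 2,000,000 rows in one set-based INSERT by cross-joining
    -- sys.all_objects with itself, a common tally-table trick in MS Sql Server
    WITH Numbers AS
    (
        SELECT TOP (2000000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects AS a
        CROSS JOIN sys.all_objects AS b
    )
    INSERT INTO dbo.TestCustomer (CustomerId, FirstName, OrderDate, Amount)
    SELECT n,
           'Customer' + CAST(n AS VARCHAR(10)),
           DATEADD(DAY, CAST(n % 365 AS INT), '2012-01-01'), -- spread dates across a year
           CAST((n % 10000) / 100.0 AS DECIMAL(10,2))        -- vary the amounts
    FROM Numbers;

The result can then be pushed out to a flat file with the bcp utility, for example "bcp TestDb.dbo.TestCustomer out C:\data\source.txt -c -T" (the database, path, and file name here are again made up), and that file becomes the source file for the job described above. A set-based script like this finishes in seconds where Excel dragging would take hours.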

But I will continue some time later; for now, feeling sleepy !!!






What is Business Intelligence Testing


What is a BI software application all about?
What is BI testing all about?
How does a BI application ease the business process in the current world scenario?
What types of industry enterprises look forward to developing a BI application?
What are the challenges one faces while testing a BI application?
How much of an impact can a BI application make for an organization within its peer group?

So many questions strike at the back of one's mind when we start with a Business Intelligence domain software application from a testing perspective.

What is a BI software application all about?
 
Let me just list out the terms of BI testing:
Source data, target data, data warehouse, data mart, reports, ETL (Extract, Transform, Load), ETL logic (incremental or full), Integration Services, Reporting Services, Analysis Services, database engine, cubes, measures, measure groups, dimensions, facts, reports testing (drill down, drill through), build deployment. I mean, there are just so many of them.

What does the chain look like? Maybe something like this:
Source -> Staging -> Datawarehouse -> Cube -> Reports

Source: Multiple data sources such as MS Sql Server, Oracle, MySql, SAP, Teradata, DB2, or Sybase. It can be any combination of the listed as well as unlisted ones, for example something as simple as flat text files. These multiple data sources serve as input to an application that helps the end user take future strategic decisions.

Staging: An intermediary state of the source data that acts as input to the data warehouse, where data is loaded by the ETL logic using Integration Services. It is more or less the source data in its original form, with those tables discarded that are not needed for report generation within the application.

Datawarehouse: The final entity within the database engine, created in the sequence mentioned above. It is here that the final objects get created, and based on these objects the cubes are built for analysis purposes. The object types into which the complete data is organized are called dimensions and facts. They are just like simple tables, but the attributes within each dimension and fact have specific associations among themselves, representing realistic facts and associated information as per the business requirement.
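As a minimal illustration (the table and column names here are mine, not from any real project), a dimension and fact pair in the mart might look like this:

    -- Hypothetical dimension table: one row per product
    CREATE TABLE dbo.DimProduct
    (
        ProductKey  INT IDENTITY(1,1) PRIMARY KEY,
        ProductName VARCHAR(100) NOT NULL,
        Category    VARCHAR(50)  NOT NULL
    );

    -- Hypothetical fact table: one row per sale, each row pointing
    -- back at a dimension member through the ProductKey foreign key
    CREATE TABLE dbo.FactSales
    (
        SalesKey    INT IDENTITY(1,1) PRIMARY KEY,
        ProductKey  INT NOT NULL REFERENCES dbo.DimProduct (ProductKey),
        OrderDate   DATE NOT NULL,
        SalesAmount DECIMAL(12,2) NOT NULL
    );

The fact table holds the measurable numbers (here SalesAmount), while the dimension holds the descriptive attributes we slice those numbers by; that association is exactly what the cube later builds upon.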

Cube: The Analysis Services object within the complete BI application development phase. As the name suggests, it is not just the two-dimensional tables that hold relevance in a typical RDBMS; it is a multi-dimensional data modeling technique wherein we analyze a data set in more than just two dimensions. The facts associated with an application are analyzed as per the associations they have with various parameters, which in BI terms are called measures; measure groups are combinations of more than one related measure. Thus we get greater insight into how a specific aspect of business decision making is impacted by variations in those parameters.

Reports: Reports are nothing but the cube data represented on a user-friendly interface, with the option to parameterise the reports as per the business needs. In simple terms, they are the end product that gets developed, and the data we look for in the cubes is available in them for viewing. We can drill down and drill through them based on the scenario we need. For example, we can by default see the report for a specific fiscal year, showing how much in sales has materialized, and then drill down to the quarterly, monthly, weekly, and finally daily level. Similarly, drill-through also applies to the reports, and the data can be visualized as per need. Authorization and authentication is another feature that plays its role in Reporting Services, but the authorization on the cubes overrides the privileges granted on the reports.

I hope by now we have gained some insight into what these applications look like and how they differ from any other typical web application. It is more of an application that delves into data, data modeling, and data management techniques.

What is BI testing all about?
BI testing is pretty different from testing any other application, in more ways than we can list. Only domain expertise can make a tester's life easy, given the otherwise unfamiliar nature of such applications. There is no way one can wait till the reports get developed so that we have a UI to start testing with!!! And that is what makes it even more interesting and challenging.

So what do we do?
When do we do it?
How do we do it?

Do we wait till the last phase of the development cycle, when the reports get deployed onto a web application, and only then start testing? Or do we have scope to start our testing activities well within the initial development cycle, or maybe even before the full-flow development activities start?
There is scope for automation as well, loads of scope for performance testing, and of course manual testing has to be the core of any testing activity.
Things really start once the Integration Services packages get into shape, wherein the data gets extracted from its raw format, transformed, and loaded into the new environment; in short, that is what we call ETL execution. Test case authoring can start once the ETL packages get developed, because that is where a tester's activity gets eased out. This is also the area of high automation ROI, wherein the ETL-related test cases can be automated using the various database automation techniques.
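As a hedged illustration of such a database-level check (stg.Customer and dw.DimCustomer are hypothetical names standing in for a staging table and its loaded target), a typical automated ETL test reconciles the two sides:

    -- Row-count reconciliation between the staging source and the loaded target
    SELECT
        (SELECT COUNT(*) FROM stg.Customer)   AS SourceRows,
        (SELECT COUNT(*) FROM dw.DimCustomer) AS TargetRows;

    -- Content check: keys present in the source but missing from the target.
    -- After a clean ETL run, anything returned here should only be the
    -- records the load logic was meant to reject.
    SELECT s.CustomerId
    FROM stg.Customer AS s
    EXCEPT
    SELECT t.CustomerId
    FROM dw.DimCustomer AS t;

Checks like these are easy to wire into any database automation harness, which is where the high ROI mentioned above comes from.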
Once Integration Services are in place, we target the Analysis Services section, wherein cube testing starts; at its base lie the dimensions, measures, and measure groups. These cubes have the raw data organized in such an effectively designed and inter-related manner that real-time data analysis is possible. These cubes form the basis of the reporting services, which are the base of the UI testing.
UI testing, or Reporting Services testing, is more about the filters and the various combinations of filters that can be applied to a set of inter-related measures and measure groups for the various facts, as per the dimensions supplied. Within the same lies the validation of the drill-down and drill-through capabilities of the reports, by right- and left-clicking on the reports' graphs and axes. The terms that matter during data validation for the complete application are the dimension and fact tables; these are the actual tables that form the basis of the cube data.
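For instance, to cross-check a drill-down from fiscal year to quarter against what the report displays, one might aggregate the hypothetical mart tables sketched earlier (treating the calendar year as the fiscal year for simplicity) and compare the figures:

    -- Expected drill-down figures, to compare against the report UI
    SELECT YEAR(f.OrderDate)              AS FiscalYear,
           DATEPART(QUARTER, f.OrderDate) AS FiscalQuarter,
           SUM(f.SalesAmount)             AS TotalSales
    FROM dbo.FactSales AS f
    GROUP BY YEAR(f.OrderDate), DATEPART(QUARTER, f.OrderDate)
    ORDER BY FiscalYear, FiscalQuarter;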

As far as data quality is concerned, the major target area is the Integration Services packages, that is, the ETL packages and the logic with which the data gets extracted, transformed, and loaded into the target systems. It is here that we design the data loading logic that ensures junk data, or data that will in no way be used for the generated reports, does not get loaded into the staging system. This is simply because the data mart is the next stage within the complete BI application development flow, and for the data mart to deal with only relevant data, the staging environment must contain only the needed data, with its quality intact.

Then stored procedures come into play to arrange the data from the staging tables into the data mart tables, and two different categories of tables get created, namely the dimensions and the facts. The data arrangement logic has to be validated in these tables, with the major target area being the stored procedures; these also get created during the ETL execution phase, that is, the Integration Services phase. Once the mart is finalized, the cubes need to be tested, wherein the testing involves not Sql queries but MDX queries. Here the measures and measure groups and their data for the various inter-related entities come up; this is done by browsing the cubes within the Analysis Services engine. Finally, this data becomes the base for showing the relevant reports in the UI as per the business users' requirements.
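One last hedged sketch, again on the made-up tables from earlier: before the cubes are browsed, a quick mart-level sanity check confirms that every fact row points at a valid dimension member, since orphaned fact rows surface later as cube processing errors or wrong report totals:

    -- Orphaned fact rows: FactSales entries whose ProductKey has no
    -- matching member in DimProduct. A healthy mart returns zero rows.
    SELECT f.SalesKey, f.ProductKey
    FROM dbo.FactSales AS f
    LEFT JOIN dbo.DimProduct AS d
           ON d.ProductKey = f.ProductKey
    WHERE d.ProductKey IS NULL;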

There we end up with a BI application in place, ready to be delivered to the business users. Post-production defects are unavoidable in this type of application. The major cause is that the application is developed offshore, at a location far away from the end users; in most cases this means the test data provided does not resemble their real-time data, due to various data security constraints. However, that is the nature of these applications, so do learn to live with it as a challenge.