Test automation using Pytest in VSCode

While test automation itself is nothing new in its basics, how we can drive it using pytest is what we will have a look at here.

VSCode is an IDE well suited for automation test creation and maintenance. Do ensure the right tools are installed on the host machine; a typical pytest test automation project needs:

python 

pytest

The above utilities need to be installed, and this can easily be done via the VSCode IDE.

Just execute the pip command to install the packages, and don't forget to add the pip path to the environment variables on the host machine.

A sample command is as under:

    pip install pytest

A few other reusable frameworks can make things easier and better utilized; we can talk about them as and when we hit the need to do so.

A very quick and easy way to get started is to simply download a sample project online and dive in. Do not delve too deeply into the how-to without getting your hands dirty: every test automation effort has a different set of needs, and you need to pay attention to the right details in the right ratio to get things done.

Sample code to connect to Azure Databricks and fetch details of clusters, jobs, etc. is pretty simple:

    import databricks_client

    client = databricks_client.create("https://databricksinstance.azuredatabricks.net/api/2.0")
    client.auth_pat_token("dapitoken")
    client.ensure_available()

    clusters_list = client.get('clusters/list')
    for cluster in clusters_list["clusters"]:
        print(cluster)

    jobs_list = client.get('jobs/list')
    for job in jobs_list["jobs"]:
        print(job)
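The cluster entries returned by the API are plain Python dicts, so downstream checks are easy to write as pure functions. A minimal sketch (the `cluster_name` and `state` field names follow the Databricks REST API 2.0 response shape; the sample data here is made up for illustration):

```python
def running_cluster_names(clusters):
    """Return the names of clusters whose state is RUNNING."""
    return [c["cluster_name"] for c in clusters if c.get("state") == "RUNNING"]

# sample data shaped like the clusters/list response
sample = [
    {"cluster_name": "etl-cluster", "state": "RUNNING"},
    {"cluster_name": "adhoc-cluster", "state": "TERMINATED"},
]
print(running_cluster_names(sample))  # prints ['etl-cluster']
```

In a real suite, `clusters_list["clusters"]` from the code above would be passed in instead of the sample data.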

If you want to run a job that has been pre-configured, as part of the pre-run of your test automation scripts, the code below will help:
    from databricks_api import DatabricksAPI

    db = DatabricksAPI(
        host="adb-instance",
        token="dapitoken"
    )

    # job_id is the id of a job that has already been created and attached to
    # a cluster; this job should have a notebook hooked onto it
    run_id = db.jobs.run_now(
        job_id=123  # placeholder job id
    )
    print(run_id)
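run_now only starts the job; to make the pre-run step block until the job finishes, you need to poll the run's state. A sketch with the polling decoupled from the client (get_state is any callable returning the current life-cycle state; in a real suite it would wrap a jobs run-status call such as databricks_api's db.jobs.get_run, which is an assumption here, with the terminal state names taken from the Jobs API):

```python
import time

def wait_for_run(get_state, timeout=600, interval=1):
    """Poll get_state() until a terminal life-cycle state or the timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        state = get_state()
        if state in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return state
        time.sleep(interval)
    raise TimeoutError("job run did not finish within %s seconds" % timeout)

# stub standing in for repeated run-status calls (illustrative only)
states = iter(["PENDING", "RUNNING", "TERMINATED"])
print(wait_for_run(lambda: next(states), interval=0))  # prints TERMINATED
```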

The above involves multiple packages that need to be installed on the host, which can be done as explained earlier.

Sample code to connect to Azure Databricks and fetch details of database tables and other objects is also pretty simple.

    from databricks import sql

    with sql.connect(server_hostname="databricksinstance.azuredatabricks.net",
                     http_path="sql/protocolv1/.......",
                     access_token="dapitoken") as connection:
        with connection.cursor() as cursor:
            cursor.execute("select * from Deltadb.Deltatable LIMIT 2")
            result = cursor.fetchall()
            for row in result:
                print(row)

This can also be done by configuring a Simba ODBC driver instance on the host machine and pointing the automation suite's connection string to that DSN:

    import pyodbc

    connection = pyodbc.connect("DSN=AzureDatabricks_DSN", autocommit=True)

Do remember to configure the above DSN via the ODBC driver: install the Simba driver and configure the same details, i.e. the ADB instance details, the HTTP path, and the dapi token.

How to do validation on the Delta tables of Azure Databricks is a tricky question, since we want it done unattended and seamlessly.

We need to implement a hybrid of the unittest and pytest frameworks to achieve this, and the sample code below will help. We will not cover the basics here, since that knowledge is readily available elsewhere.

So the basic question that arises is how to achieve low-maintenance automation suites.

We achieve this by making the SQL queries data-driven: we keep the queries (or their parameters) in a CSV file and feed them into test methods via ddt. This way, the whole set of validations for the relevant tables lives outside the test method in a CSV file, and the ddt framework data-drives the test method's execution.

A sample code below shall be helpful to understand the flow

    import pytest
    import sys
    import string
    from fwk_code.functions import *
    from fwk_code.constant import *
    from ddt import ddt, data, file_data, unpack
    import unittest

The code excerpt below is an initialization script that instantiates the connection to Azure Databricks and closes it when teardown happens at the end of execution of the class file in which it is maintained.

The connection is essentially the connection string for the Azure Databricks Delta tables; its details (server hostname, HTTP path) can be obtained from the Azure Databricks portal.

    from databricks import sql

    connection = sql.connect(server_hostname="azuredatabricks instance url",
                             http_path="sql http path",
                             access_token="dapitoken")


    cursor = None

    def setup_module():
        print('----------setup-------------')
        global cursor
        cursor = connection.cursor()

    def teardown_module():
        print('----------teardown-------------')
        cursor.close()
        connection.close()
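In a pytest-first suite, the same lifecycle can also be expressed as a module-scoped fixture instead of setup_module/teardown_module, which removes the global. A sketch of the pattern (FakeConnection stands in for the real sql.connect result and is purely illustrative):

```python
import pytest

class FakeConnection:
    """Stand-in for the real databricks sql connection (illustration only)."""
    def __init__(self):
        self.closed = False
    def cursor(self):
        return "fake-cursor"
    def close(self):
        self.closed = True

@pytest.fixture(scope="module")
def cursor():
    connection = FakeConnection()  # real suite: sql.connect(...)
    cur = connection.cursor()
    yield cur                      # every test in the module receives this cursor
    connection.close()             # runs once, after the module's last test
```

Test methods then simply take `cursor` as a parameter instead of reading a module global.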

In the code below, the entire query content can be put in the referenced CSV file so as to make the test method independent of the queries, and the same test method can run for all validations on this table.

    @ddt
    class TestValidateNullValuesSamplefactTable(unittest.TestCase):
        @pytest.mark.dataquality
        @data(*Utils.read_data_from_csv(rootpath + "datatest\\Samplefact.csv"))
        @unpack
        def test_validate_null_values_samplefact_table(self, attribtotest):
            # attribtotest is the column name to validate, data-driven from the CSV
            query = ("select " + attribtotest + ", column2, count(*) "
                     "from " + dbinstance + ".Samplefact "
                     "where column2 in ('Sample1','sample2') "
                     "and " + attribtotest + " is null "
                     "group by column2, " + attribtotest)
            rows = reusable_methodgetcolumns(cursor, query)
            if rows:
                print("Count of rows having null values in Samplefact table:", rows)
                assert False
            else:
                assert True

Below is the reusable code to fetch data from the Azure Databricks Delta tables:

    def reusable_methodgetcolumns(cursor, query):
        cursor.execute(query)
        result = cursor.fetchall()
        return result

The last thing to take care of is making the run unattended, and we can easily drive it via the Windows Task Scheduler. So create a task in the scheduler; the question then is how to point the scheduler task at this automation project.

The solution is pretty simple. Create a batch file and put the right commands in it, so that the scheduler task makes the automation run absolutely smooth and unattended. You then only need to open the report for result reconciliation and share it with stakeholders.

Sample content for the batch file is below (the --html flag requires the pytest-html plugin; five, eight, and nine are custom pytest markers registered in the project):

    set rootpath=C:\repos\AutomationTestProject\

    set todaysdate=%date:~-4,4%%date:~-7,2%%date:~-10,2%

    cd /d %rootpath%reports

    mkdir %todaysdate%

    cd %todaysdate%

    pytest -v -m five --html=report_Module1.html %rootpath%tests\test_Module1.py

    pytest -v -m eight --html=report_Module2.html %rootpath%tests\test_Module2.py

    pytest -v -m nine --html=report_Module3.html %rootpath%tests\test_Module3.py


The output of this scheduler-driven execution of the test automation project is a set of HTML reports outlining pass/fail details, with detailed logging of the activities performed as part of the test execution.


