Automation of Extracting JIRA Issues and Loading to Hive Table Using Python and Shell Script

Orchestrate the end-to-end flow of extracting JIRA issues (including custom fields) with Python and loading them into a Hive table with a shell script.

Requirements:

Python Modules:

jira

pandas
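
Both packages are available on PyPI and can typically be installed with pip (assuming a standard Python 3 environment):

pip install jira pandas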

 

Process:

The jira Python module extracts issues through the JIRA REST API; for more information see https://pypi.org/project/jira/.

The pandas module is used to convert the list of issues to a DataFrame and write it to a CSV file with a specified delimiter, which can then be loaded into the Hive table.

CODE:

 

from jira import JIRA

import pandas as pd

 

#Specify your company's base URL for the JIRA board

URL='https://xxx.jira.com'

 

#Authenticate to JIRA using basic authentication (username and password)

jira = JIRA(server=URL, basic_auth=('username', 'password'))
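#Note: JIRA Cloud instances generally require an API token in place of the password for basic auth
#(an assumption about your environment, not covered in the original post), e.g.:
#jira = JIRA(server=URL, basic_auth=('user@example.com', '<api_token>'))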

 

#Pull all the available issues in JIRA for the specified project.

blockSize=1000

blockNo=0

#initialize the list for adding the issues

jiraIssues=[]

while True:

    startIDX=blockNo*blockSize

    projectAllTkts=jira.search_issues('project=<project_code>',startIDX,blockSize)

    if len(projectAllTkts) == 0:

        break

       

    blockNo +=1

    for tkt in projectAllTkts:

        jiraIssues.append(tkt)

 

#Display the count of issues extracted

print(len(jiraIssues))

 

tktsDF=pd.DataFrame()

 

for tkt in jiraIssues:

    #specify dictionary to extract required fields from JIRA board

    fieldsDict={

    'PROJECT_CODE': tkt.fields.project.key,    #4 CHARS Project Code in Key

    'JIRA_TKT': tkt.key,                       #4 CHARS PROJECT CODE + NUMBER 

    'ISSUE_TYPE': tkt.fields.issuetype.name,  #Bug,Story etc

    'ASSIGNEE': tkt.fields.assignee,

    'CREATOR': tkt.fields.creator,

    'CREATED_DATE': tkt.fields.created,

    'REPORTER': tkt.fields.reporter,

    'SUMMARY': tkt.fields.summary,

    'PRIORITY': tkt.fields.priority.name,

    'STATUS': tkt.fields.status.name,

    'LAST_UPDATED_DATE': tkt.fields.updated

    }

   

    tktsDF=tktsDF.append(fieldsDict,ignore_index=True)

   

 

#Reorder the columns so the CSV layout matches the Hive table schema defined in the wrapper script below

colORDER=['CREATED_DATE','PROJECT_CODE','JIRA_TKT','ISSUE_TYPE','STATUS','PRIORITY','CREATOR','REPORTER','ASSIGNEE','SUMMARY','LAST_UPDATED_DATE']

tktsDF=tktsDF.reindex(columns=colORDER)

 

#Convert CREATED_DATE and LAST_UPDATED_DATE to date format

tktsDF['CREATED_DATE']=tktsDF['CREATED_DATE'].apply(lambda field:pd.Timestamp(field).strftime('%Y-%m-%d'))

tktsDF['LAST_UPDATED_DATE']=tktsDF['LAST_UPDATED_DATE'].apply(lambda field:pd.Timestamp(field).strftime('%Y-%m-%d'))

 

tktsDF.to_csv("all_jira_tkts.csv",encoding='utf-8',header=True,index=False,sep='|')
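
Note: DataFrame.append is deprecated and has been removed in recent pandas releases. On newer environments the same table can be built from a list of dictionaries instead; a minimal sketch under that assumption (it also converts the assignee/creator/reporter resources to their display names, which is an assumption about the desired output, and guards against unassigned issues):

rows = []
for tkt in jiraIssues:
    rows.append({
        'PROJECT_CODE': tkt.fields.project.key,
        'JIRA_TKT': tkt.key,
        'ISSUE_TYPE': tkt.fields.issuetype.name,
        'ASSIGNEE': tkt.fields.assignee.displayName if tkt.fields.assignee else None,
        'CREATOR': tkt.fields.creator.displayName if tkt.fields.creator else None,
        'CREATED_DATE': tkt.fields.created,
        'REPORTER': tkt.fields.reporter.displayName if tkt.fields.reporter else None,
        'SUMMARY': tkt.fields.summary,
        'PRIORITY': tkt.fields.priority.name if tkt.fields.priority else None,
        'STATUS': tkt.fields.status.name,
        'LAST_UPDATED_DATE': tkt.fields.updated
    })
tktsDF = pd.DataFrame(rows)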

 

 

Shell wrapper script to execute the Python script and load the data into a Hive external table.

 

#!/bin/ksh

TODAY=`date +'%Y%m%d'`

 

echo "starting shell script ${0} on ${TODAY}"

 

#Remove the previous run's output file

rm -f /output/all_jira_tkts.csv

 

#activate conda environment

export PATH="/app/minconda3/bin:$PATH"

source activate py_env37

 

#start of python script

python <python_script.py>

 

if [ $? != 0 ] ; then

        echo "ALERT: Encountered Error in Python Script Execution"

        exit 1

else

        echo "SUCESS: Python Script Compelted"

fi

 

#hive command to drop existing table

hive -S -e "DROP TABLE IF EXISTS schema.tbl_name;"

 

echo "Dropped Table "

 

#Remove the HDFS all_jira_tkts.csv file from the previous run

hdfs dfs -rm -skipTrash hdfs://<cluster>/user/jira/all_jira_tkts.csv

 

#copy created csv file from local to HDFS

chmod 777 /output/all_jira_tkts.csv

hdfs dfs -copyFromLocal /output/all_jira_tkts.csv hdfs://<cluster>/user/jira

 

#Verify the file copied successfully

hadoop fs -test -f hdfs://<cluster>/user/jira/all_jira_tkts.csv

if [ $? != 0 ] ; then

        echo "ALERT: Encountered Error in previous command csv File Not Present"

        exit 2

else

        echo "SUCESS: Copied File from Local to HDFS "

fi

 

#Create an external table whose schema matches the columns in all_jira_tkts.csv

hive -S -e "CREATE EXTERNAL TABLE IF NOT EXISTS schema.tbl_name (created_date date,

project_code string,jira_issue string,issue_type string,status string,priority string,

creator string,reporter string,assignee string,summary string,last_updated_date date)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '|'

STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION 'hdfs://<cluster>/user/jira'

tblproperties ('skip.header.line.count'='1');

"

if [ $? != 0 ] ; then

        echo "ALERT: Encountered Error in creating external table"

        exit 3

else

        echo "SUCESS: Created External Table "

 

fi
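
#Optional sanity check (not in the original script): confirm the load with a quick row count

hive -S -e "SELECT COUNT(*) FROM schema.tbl_name;"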

 

echo "Completed Execution of Script ${0}"

 

 

 

 

 

Schedule it to run twice a day, at 8:45 AM and 12:45 PM, on the Hadoop edge node using the cron scheduler.

 

crontab -e

 

45 8,12 * * * /<wrapperScriptwithloc>.sh > /log/$(date +\%Y\%m\%d\%H\%M\%S)_<wrapperScript>.log 2>&1

 

 

 
