Automation of Extracting JIRA Issues and Loading to Hive Table Using Python and Shell Script
Orchestrate the end-to-end flow of extracting JIRA issues with custom fields using Python and loading them into a Hive table using a shell script.
Requirements:
Python Modules:
jira
pandas
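Both modules can be installed with pip, for example:

pip install jira pandas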
Process:
The jira Python module extracts issues using the JIRA REST API; for more information, see https://pypi.org/project/jira/.
The pandas module is used to convert the list of issues to a DataFrame and write it to a CSV file with a specified delimiter, so it can be loaded into a Hive table.
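For illustration, the first two lines of the resulting pipe-delimited file might look like this (hypothetical values; the column order follows the colORDER list in the code below):

CREATED_DATE|PROJECT_CODE|JIRA_TKT|ISSUE_TYPE|STATUS|PRIORITY|CREATOR|REPORTER|ASSIGNEE|SUMMARY|LAST_UPDATED_DATE
2021-03-01|ABCD|ABCD-101|Bug|Open|High|jdoe|jdoe|asmith|Fix login page error|2021-03-02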
CODE:

from jira import JIRA
import pandas as pd

# Specify your company's base URL for the JIRA board
URL = 'https://xxx.jira.com'

# Authenticate to JIRA using basic authentication (username and password)
jira = JIRA(server=URL, basic_auth=('username', 'password'))

# Pull all the available issues in JIRA for the specified project,
# paging through the results in blocks
blockSize = 1000
blockNo = 0

# Initialize the list for collecting the issues
jiraIssues = []
while True:
    startIDX = blockNo * blockSize
    projectAllTkts = jira.search_issues('project=<project_code>', startIDX, blockSize)
    if len(projectAllTkts) == 0:
        break
    blockNo += 1
    for tkt in projectAllTkts:
        jiraIssues.append(tkt)

# Display the count of issues extracted
print(len(jiraIssues))

tktsDF = pd.DataFrame()
for tkt in jiraIssues:
    # Dictionary specifying the required fields to extract from the JIRA board
    fieldsDict = {
        'PROJECT_CODE': tkt.fields.project.key,    # 4-char project code in key
        'JIRA_TKT': tkt.key,                       # 4-char project code + number
        'ISSUE_TYPE': tkt.fields.issuetype.name,   # Bug, Story, etc.
        'ASSIGNEE': tkt.fields.assignee,
        'CREATOR': tkt.fields.creator,
        'CREATED_DATE': tkt.fields.created,
        'REPORTER': tkt.fields.reporter,
        'SUMMARY': tkt.fields.summary,
        'PRIORITY': tkt.fields.priority.name,
        'STATUS': tkt.fields.status.name,
        'LAST_UPDATED_DATE': tkt.fields.updated
    }
    tktsDF = tktsDF.append(fieldsDict, ignore_index=True)

# Reorder the columns to match the Hive external table schema
colORDER = ['CREATED_DATE','PROJECT_CODE','JIRA_TKT','ISSUE_TYPE','STATUS','PRIORITY','CREATOR','REPORTER','ASSIGNEE','SUMMARY','LAST_UPDATED_DATE']
tktsDF = tktsDF.reindex(columns=colORDER)

# Convert CREATED_DATE and LAST_UPDATED_DATE to date format
tktsDF['CREATED_DATE'] = tktsDF['CREATED_DATE'].apply(lambda field: pd.Timestamp(field).strftime('%Y-%m-%d'))
tktsDF['LAST_UPDATED_DATE'] = tktsDF['LAST_UPDATED_DATE'].apply(lambda field: pd.Timestamp(field).strftime('%Y-%m-%d'))

# Write the issues to a pipe-delimited CSV file
tktsDF.to_csv("all_jira_tkts.csv", encoding='utf-8', header=True, index=False, sep='|')
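Two notes on the extraction code. First, custom fields are exposed by the jira module on tkt.fields as customfield_<id> attributes, where the numeric id is specific to your JIRA instance. Second, DataFrame.append was deprecated and removed in pandas 2.0, so on newer pandas it is more robust to collect dictionaries in a list and build the DataFrame once. A minimal sketch combining both ideas (the field id customfield_10004 and the STORY_POINTS column name are placeholders, not values from this post):

import pandas as pd

records = []
for tkt in jiraIssues:
    fieldsDict = {
        'PROJECT_CODE': tkt.fields.project.key,
        'JIRA_TKT': tkt.key,
        # Custom fields appear as customfield_<id>; getattr avoids an
        # AttributeError when the field is absent on an issue type.
        'STORY_POINTS': getattr(tkt.fields, 'customfield_10004', None),  # placeholder id
    }
    records.append(fieldsDict)

# Build the DataFrame in one call instead of repeated .append()
tktsDF = pd.DataFrame(records)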
Shell Wrapper Script to Execute the Python Script and Load to a Hive External Table.
#!/bin/ksh
TODAY=`date +'%Y%m%d'`
echo "Starting shell script ${0} on ${TODAY}"

# Remove the files from the previous run
rm /output/all_jira_tkts.csv

# Activate the conda environment
export PATH="/app/miniconda3/bin:$PATH"
source activate py_env37

# Start the Python extraction script
python <python_script.py>
if [ $? != 0 ] ; then
    echo "ALERT: Encountered error in Python script execution"
    exit 1
else
    echo "SUCCESS: Python script completed"
fi

# Hive command to drop the existing table
hive -S -e "DROP TABLE IF EXISTS schema.tbl_name;"
echo "Dropped table"

# Remove the HDFS all_jira_tkts.csv file left over from the previous run
hdfs dfs -rm -skipTrash hdfs://<cluster>/user/jira/all_jira_tkts.csv

# Copy the newly created CSV file from local to HDFS
chmod 777 /output/all_jira_tkts.csv
hdfs dfs -copyFromLocal /output/all_jira_tkts.csv hdfs://<cluster>/user/jira

# Verify that the file copied successfully
hadoop fs -test -f hdfs://<cluster>/user/jira/all_jira_tkts.csv
if [ $? != 0 ] ; then
    echo "ALERT: Encountered error in previous command; CSV file not present"
    exit 2
else
    echo "SUCCESS: Copied file from local to HDFS"
fi

# Create the external table, specifying the schema for the all_jira_tkts.csv file
hive -S -e "CREATE EXTERNAL TABLE IF NOT EXISTS schema.tbl_name (
    created_date date,
    project_code string,
    jira_issue string,
    issue_type string,
    status string,
    priority string,
    creator string,
    reporter string,
    assignee string,
    summary string,
    last_updated_date date)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 'hdfs://<cluster>/user/jira'
    TBLPROPERTIES ('skip.header.line.count'='1');"
if [ $? != 0 ] ; then
    echo "ALERT: Encountered error in creating external table"
    exit 3
else
    echo "SUCCESS: Created external table"
fi

echo "Completed execution of script ${0}"
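As a quick sanity check after the load, one option is to run a row-count query against the new table (using the placeholder schema.tbl_name from the script):

hive -S -e "SELECT COUNT(*) FROM schema.tbl_name;"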
Schedule it to run twice a day, at 8:45 AM and 12:45 PM, on the Hadoop edge node using the cron scheduler:
crontab -e
45 8,12 * * * /<wrapperScriptwithloc>.sh > /log/$(date +\%Y\%m\%d\%H\%M\%S)_<wrapperScript>.log 2>&1
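If an extraction run can occasionally take longer than expected, one option is to wrap the cron entry with flock (assuming flock is available on the edge node; the lock file path is a placeholder), so an overlapping run is skipped instead of started:

45 8,12 * * * /usr/bin/flock -n /tmp/jira_load.lock /<wrapperScriptwithloc>.sh > /log/$(date +\%Y\%m\%d\%H\%M\%S)_<wrapperScript>.log 2>&1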