Curating 'CSV' files with JSON metadata Using Python

Download Click here to download full source code file

Here, sample_models.json is for your reference

                      
                      {
                        "model": "suppression_model1",
"columns": "company name,address1,city,state,zipcode,email",
"created_date": "3/23/2021" }

you can also download sample csv files from here csv file and you may have this file in clientfiles folder

Step 1 : Load the Json columns in a variable

                      
                      with open('sample_models.json', 'r') as file:
data_json = json.load(file)
jsonColumns = data_json["columns"]
jsonColumns = jsonColumns.split(",")

Step 2: Loop through the CSV files from a specific folder and match with the json column

                      
                      inputFile = r"C:\yourfile\path\clientfiles" 
all_files = glob.glob(inputFile + "/*.csv")

Step 3: Clean the column names and make it as lower case

                      
                      for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
#print(repr(filename[-1]))
csvfilename = os.path.basename(filename)
df.columns= df.columns.str.lower()

Step 4: Loop through the json columns for each file, If the cleaned column name match with the json column then take that matching column and write to a new curate file

                      
                      df.columns= df.columns.str.replace(' ', '', regex=True).str.replace('.', '_', regex=True) 
datacsvfile = list(df.columns)
new_df = pd.DataFrame()
for jdata in jsonColumns:
if jdata in datacsvfile:
new_df = new_df.append(df[jdata])
else:
df[jdata] = ''
new_df = new_df.append(df[jdata])

Step 5: Copy the curate file to another folder named curated

                      
                      new_df.T.to_csv(f'.\\curatedfiles\\' + csvfilename + '', index=False) 
                    
                  

Download Click here to download full source code file