Downloads

Download Click here to download full source code file

Here, sample_models.json is for your reference

                      
                      {
                        "model": "suppression_model1",

                        "columns": "company name,address1,city,state,zipcode,email",

                        "created_date": "3/23/2021"
                       }

you can also download sample csv files from here csv file and you may have this file in clientfiles folder

Step 1 : Load the Json columns in a variable

                      
                      with open('sample_models.json', 'r') as file:

                      data_json = json.load(file) 

                      jsonColumns = data_json["columns"]   

                      jsonColumns = jsonColumns.split(",")

Step 2: Loop through the CSV files from a specific folder and match with the json column

                      
                      inputFile = r"C:\yourfile\path\clientfiles" 

                      all_files = glob.glob(inputFile + "/*.csv")

Step 3: Clean the column names and make it as lower case

                      
                      for filename in all_files:

                      df = pd.read_csv(filename, index_col=None, header=0)

                      #print(repr(filename[-1]))

                      csvfilename = os.path.basename(filename)

                      df.columns= df.columns.str.lower()

Step 4: Loop through the json columns for each file, If the cleaned column name match with the json column then take that matching column and write to a new curate file

                      
                      df.columns= df.columns.str.replace(' ', '', regex=True).str.replace('.', '_', regex=True) 

                      datacsvfile = list(df.columns) 

    
                      new_df = pd.DataFrame()

                      for jdata in jsonColumns:

                          if jdata in datacsvfile: 

                              new_df = new_df.append(df[jdata]) 

                          else:

                              df[jdata] = '' 

                              new_df = new_df.append(df[jdata])

Step 5: Copy the curate file to another folder named curated

                      
                      new_df.T.to_csv(f'.\\curatedfiles\\' + csvfilename + '', index=False)

Curating 'CSV' files with JSON metadata Using Python

Here, sample_models.json is for your reference

Step 1 : Load the Json columns in a variable

Step 2: Loop through the CSV files from a specific folder and match with the json column

Step 3: Clean the column names and make it as lower case

Step 4: Loop through the json columns for each file, If the cleaned column name match with the json column then take that matching column and write to a new curate file

Step 5: Copy the curate file to another folder named curated