Uncle Pete’s Corner – Generic File Import Function

unclepetecorner
We are switching gears this week and looking at a WinDev function. In a recent project there was a need to support importing data for several different tables. Instead of handling each of these as a separate window, I created a Window that handles the import for any of our tables. This provides a consistent interface, and there is just a few lines of code to add to the window initialization to support a new table. It has a number of features as you will see in this article, so please read on.
Before we begin, I would like to give a nod to Bob Roos, it showed some us some things he was doing with imports for one of his projects which inspired me to add the ability to set the column order for my imports.

Let’s start with a brief overview of the features and a few screen shots then we will dive into the code.

  1. The window has one parameter (TableName) so that we can call it from anywhere in the application and just pass in which table we are wanting to import.
  2. The end user can set configure several settings for each table, such as:
    1. The location of the file to import
    2. If the file includes a header row
    3. If the existing data should be cleared before importing
    4. What separator is used for the file (Comma, Pipe, or Tab)
    5. Which field is used to match existing data for updates instead of additions
    6. And the order of the fields in the import file, including setting a field position to 0 meaning the field isn’t included in the import.
  3. It uses the NextAge Opensource Default Manager class to store the configuration settings that a user makes for each table
  4. There is a test import button that, imports the first 10 rows of the table into a table control so the user can verify their configuration settings
  5. And finally it has a nice progress screen as the import takes place.

Here are some screen shots
2014-01-16_0817
2014-01-16_0818
2014-01-16_0819

So now that we have a feature list and some screen shots, on to the code. First a confession to make, this procedure is not 100% generic. With a few tweak it could be, but for our purposes we decided that it wasn’t needed. This means that there is a few lines of code that has to be added to this function for each database table you intend to import. So lets look at those lines of code first.

2014-01-16_0939

This code is in the Global Declarations of the Window and as I said is the only code that gets adjusted for each new table. It is just a giant SWITCH statement, that looks at the table name passed in. And then setups up the defaults for that table. For instance if Product is passed in, it sets the Title Bar to “Import Products” and then adds the tables fields to an array. We will look at the array definition shortly, but first lets look at a couple of things in this code.

Notice the CASE statements is using Product..Name instead of a hard coded name string value. By using the Analysis properties instead of hard coded strings, the code is a little more generic and fault tolerant of typos, as intellisense can assist.  Similarly, when setting the FieldName, again I am using the name property for each field instead of the hard coding their names. If you have followed my webinars or use my open source classes you will notice I use this trick a lot, its just an easy way to avoid unnecessary errors in my code.

So even before we look at the code, you can get a pretty good idea about what is happening here. We setup an array with each field from the database table, what position it should be in the CSV file, and a boolean value called MatchForUpdates. We will look at the code for MatchForUpdates a little later, but its purpose is to say if a field should be used to determine if a record already exists in the table and update it instead of adding a new field.

Before moving on to looking at the rest of the code of the procedure a word about making this function 100% generic. There are several ways you could approach it, this code could be placed in a database field or a configuration text file and then executed dynamically. You could create configuration tables and store these values in them, and ship that file with your application. Or you could even go as far as to enumerate the Analysis and pull the information directly from the Analysis for a table and build the array based on that information. If you do decide to enumerate the analysis I suggest you get you hands on Glenn Rathke of Soft Design Consulting’s excellent training materials as he has example of that type of code. Or if you really need it generic and don’t have time to work on it yourself, remember I am always available for contract work 🙂

So now that we have looked at the only code that has to be changed in the procedure lets look at the rest of the code.

As mentioned the Window accepts one Parameter, the name of the Table to import, which as we just saw is used in the SWITCH statement.

2014-01-16_1003

And while we are on the topic, here is the code that calls the window from the Product browse. It simply passes in the name of the table, again using the properties to avoid typos, and when it returns it reinitialized the table control so we see the changes made by the import.

2014-01-16_1004

Next we declare the structure that is used for the array that we just looked at in the SWITCH statement. If I haven’t said it lately, I LOVE structured arrays!

2014-01-16_1007

Notice there is one field in the array that we didn’t discuss (ConversionCode). This is another stub for future feature additions. Often when importing files there is some clean up that needs to be done to fields, such as striping the formatting from a phone number etc. My plan would be to store this code in the array, exposing it to the interface which would allow the code to be adjusted as necessary, and then dynamically executing it as part of the import logic. If you have looked at my Open Source Business Rules class you have seen something very similar. Again I haven’t worked on this yet, just waiting on the need or a paying client to bring it to the top of the to do list.

Next we declare another structure, this one is used with the Default manager class in order to save all the configuration settings the user makes. We will look at that code in detail a little later.

2014-01-16_1015

Next we create a structure and array to hold the separator options. Tons of other ways this could be done, but I like doing it this way, as the combo on the screen can be configured to display the descriptive name, but return the actual separator value, and there is no additional coding in other areas to do a “translation” between the description and the actual separator value.

2014-01-16_1017

Next is a structure that we use to declare an array to hold the imported rows. I have declared it with 50 fields. There is probably a way to do this in a more generic way, but for my purposes this works and lets me link the sample import display to the array as we will see later. If I run into a need to import a table with more than 50 fields, I will first have a discussion with the client about proper database design :-), but failing that it would be very easy to just expand the number of fields in this structure.

2014-01-16_1032

Next we declare the arrays of the above structures, and a couple of other fields that we will see used in the code.

2014-01-16_1040

And the final bit of code in the Global Declarations is the SWITCH statement we already looked at.

So before moving on lets review what has happened at this point when the procedure runs. The name of the table to import will be passed in, the Field list will be populated via the SWITCH statment based on the table name, and several additional structures, variables, and arrays are declared.

So now let’s look at the code in the End of Initialization event.

2014-01-16_1045

The first 5 lines of code setup variables in the Prompts structured variable we declared. You should not that one of those fields is actually the fields array. This is where structures and structured arrays really start shining. What the Prompts structured variable has done is group the field array and all of the values for the other screen prompts into a single variable.

The next line of code uses the Default Manager class to retrieve the Prompts settings. You may be asking, wait we just set the prompts, and now we are retrieving them from the Default Manager. Well what we are doing is setting the beginning defaults, then the GetDefault method overwrites the defaults with whatever has been stored in it, if the Default doesn’t currently exists it gets created with the beginning defaults we just set.

The next four lines of code move the values from the Prompts structured variable to the screen fields.

The IF statement handles a situation where we have added a field to the database definition. For instance say our table original had 3 fields, that is what would be stored in the defaults, if we later add a 4 field to the database and the switch statement, we would never see it as the default would always be retrieved instead. So this IF statement detects that the number of fields has changed and uses the defaults from the SWITCH statement instead. Eventual these settings will be saved, and the next pass through the import it will again behave as normal and use the defaults from the default manger.

And finally we reinitialize the table which shows the field list.

So now let’s look at the code behind the buttons. The easiest one is the Restore Default button. This code just resets the screen the original defaults. Its very similar to the code in the End of Initialization event.

2014-01-16_1219

The code behind the Test and Import buttons calls the same procedure with a parameter to indicate if it is test mode or not.

2014-01-16_1319    2014-01-16_1320

So lets look at the DoImport Procedure. The first lines of code declare several variable that will be used. Line 11 sets the static on the Progress tab to “Reading Data”

2014-01-16_1323

Line 13 and 14, sorts the Field array and determines the number of columns in the import file.

2014-01-16_1325

Next we check the fields array to see if there is a field set as MatchForUpdates

2014-01-16_1326

Next the Progress tab is displayed, the sample tab is hidden, and the Progress tab is selected.

2014-01-16_1327

Next the CSV file is opened.

2014-01-16_1329

If opening the file fails, for instance if it doesn’t exists or is locked, then an error is displayed to the user, otherwise the “real” import logic starts after the else statement.

2014-01-16_1332

The size of the file is captured (this will be used by the progress display). And then if there is a header row it is skipped. And then the progress display is updated by calling the UpdateProgress procedure with the number of bytes from this row of data

2014-01-16_1336

Looking at the UpdateProgess procedure it simply uses math to determine the percentage of the file that has been read and updates the progress bar, which in this case is a dial.

2014-01-16_1336_001

Back to the DoImport procedure, a loop is start to import the CSV file into the ImportedRows array.

2014-01-16_1340

Let’s break that code down a little. Line 39 starts with FieldNumber 1. Line 40 uses VariableReset to clear the Structured Variable ImportedRow. Line 41 Reads a row of data from the CSV file. And Line 42 updates the progress display just like we saw above. Line 43 increase a counter that tracks how many rows are imported.  And Line 44 breaks out of the loop when we read the end of the CSV file.

Line 47 starts an interior secondary loop. This is the loop that processes each field of the row. The IF statement on line 49 makes the call to GetField with the “firstrank” constant instead of the “nextrank” constant if it is the first field processed. Lines 50 and 52  use the GetField procedure to set the OneField variable to value in current field in the CSV.

Now lets take a look at the GetField Procedure.

2014-01-16_1352

This code is a little hard to read but its where the real magic of parsing the CSV file takes place. The 3 parameters are Row which is the text for the entire row. Rank is the constants that we just discussed and we will see how they are used shortly, and the final parameters is the separator used for the file.

Line 2 uses the WX function ExtractString, this is a very powerful function that extracts a substring based the separator. And the constant “firstrank” does the first substring, and after that each “nextrank” gets the next substring, the same thing could be accomplished with a counter that starts at 1 and increase each time, its just a little cleaner to use the constants.

The IF statement on Line 3 is all about Double Quotes fields. In a CSV file a string can be double quoted in order for it to have a comma inside the field, such as an address line “123 Street, Suite1”. Unfortantely Extract string does not handle this so what is returned instead is “123 Street. So the IF statement on Line 3 test OneField to see if it starts with Double Quotes but does not end with Double Quotes and if so it loops through executing additional extract strings and stitching them together until it finds the closing Double Quote. Once it has the closing double quote, in strips the double quotes out of the OneField.

The ELSE IF on line 9 is similar, it detects a field content that has both an opening and closing double quotes and just strips them off.

And finally the parsed field is returned.

Returning to the DOImport Procedure, that brings us to Line 54, which uses the current field number to search the fields array for that position. Line 55 checks to see if the field number was found. If it was, then line 56 uses indirection to set the appropriate field in the structured variable ImportedRow to the value that was return from GetField. BTW, it is this logic that will let us set a field position to 0 if it is not included in the import.

Line 58 checks to see if this is the last field to be processed. If so then Line 59 adds the row to the ImportedRows array and breaks out of the interior loop. Otherwise the logic fall through to like 62 which increase the field number and cycles back to the top of the interior loop.

Once all of the fields for a row are processed and the break on line 60 is executed the logic falls through to line 64, which test if we are in test mode, if so and we have processed 10 rows the the break on line 65 breaks out of the logic. This way we only import the first 10 rows in test mode.

So once either 10 rows has been processed in test mode or the entire file if not in test mode, the logic fall through to the next portion of the DoImport Function, Line 69, which closes the CSV file.

2014-01-16_1444

The IF statement on line 71, checks to see if we are not in Test mode, and if not it begins the writing data Logic.

2014-01-16_1447

Line 72 changes the Static on the progress window to “Writing Data”. Line 73 Reset the Progress Bar. Lines 74-78, saves the configuration options to the Default Manager, which we have already discuss a number of times.

The IF statement on line 79 checks to see if the database table should be cleared first and Line 80 uses the HDeleteAll function to clear the database table if need.

Line 82 is again prepping the Progress Bar for a new loop. And that Loop starts at line 83.

2014-01-16_1456

Line 83 uses a FOR EACH loop to loop through each row of the ImportedRows array. Line 84 Updates the progress bar, this time instead of basing it on bytes read, it is based on rows processed, since we know the exact number of rows.

Line 85 sets a Boolean (DoUpdate) that is used to determine if this should be an add or modify.

The IF statement on line 86 checks to see if a MatchField was defined. IF so Line 87 seeks it in the array, and then Line 89 uses HReadSeek with some indirection to see if the record is already in the database. If the record is found Line 90 sets the DoUpdate variable to true.

Line 94 starts the field number at 1. And then Line 95 starts an interior loop to process each field. Line 96 looks for the field position in the array and if found, Line 98 uses indirection to set the database field equal to the field from the array.  Line 100 checks for the last field to see if all the fields have been processed and breaks out of the loop, if not Line 103 increase the FieldNumber and then cycles the loop.

The IF statement on Line 105 checks to see if an Update or Add should be done, which is accomplished by Line 106 and 108.

Once all the rows have been written the logic falls though to Line 112 which send a message letting the user know the import has completed and closes the window.

2014-01-16_1519

The final lines of code of the DoImport procedure are only reach if in test mode.

2014-01-16_1521

Line 117-119 hides the Progress Tab and displays Sample Tab.

Line 120 start a loop for 50 times, the number of fields in our Row array.

Line 121 Looks up the Field Number in the Fields array. If found the caption of the column is set to the field name in the sample table control, and the column is made visible with Lines 123 and 124, otherwise the column is hidden via Line 126.

And finally after all the columns are handled the Table is reinitialized with Line 129.

The last thing to look at is the Sample Table. As you can see from these screen shots it is a table based on the ImportedRows array with all 50 fields linked to the array and hidden.

2014-01-16_1530

2014-01-16_1532

2014-01-16_1532

And there we have it, a generic import function. As mentioned in its current state there is a little bit of code that needs to be defined for each database table you intend to support, but even that could be made generic if you have the interest. And we also discuss the ability to add some logic for converting a field as part of the import. If you do decide to do either of these I hope you share them with the rest of us.

Be sure to go over to wxLive.us and watch the Uncle Pete’s corner webinar that is the companion to this article.

Uncle Pete’s Corner is weekly webinar on all things WX every Friday , to watch the recorded version of this webinar, many other WX related webinars or to watch future ones live go to wxLive.us

[suffusion-the-author display=’author’]

Pete Halsted[suffusion-the-author display=’description’]

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s