Difference between revisions of "ITrack/Enterprise/Import Types"

From ISoft Wiki
Jump to navigationJump to search
 
(3 intermediate revisions by the same user not shown)
Line 26: Line 26:
===`databasefield`===
===`databasefield`===
If this is set to True, the `field` column is expected to be the name of a column in the destination table where data will go.  If this is set to False, this entry will not go into the final INSERT/UPDATE.  However, that doesn't mean it can't affect the import process.  Non-database fields are often used for users to specify import options or for mapping information that is required for another field's conversion.  For example, looking up a modelid often requires inventorytypeid, manufacturerid (through id or code), and a model name.  This means the conversion for such a field will probably reference a couple non-database fields for inventory type and manufacturer.
If this is set to True, the `field` column is expected to be the name of a column in the destination table where data will go.  If this is set to False, this entry will not go into the final INSERT/UPDATE.  However, that doesn't mean it can't affect the import process.  Non-database fields are often used for users to specify import options or for mapping information that is required for another field's conversion.  For example, looking up a modelid often requires inventorytypeid, manufacturerid (through id or code), and a model name.  This means the conversion for such a field will probably reference a couple non-database fields for inventory type and manufacturer.
NOTE: when this is False, these 'non-database' fields will still be present in {?converted_data_table} though they will not be part of {?mapping_data_table} or the import into the target table.


===`defaultconversion`===
===`defaultconversion`===
Line 46: Line 48:


==`importtypecommand`==
==`importtypecommand`==
This is a type that stores queries (commands) that can be run before and after an import is performed to extend current import behavior.  These queries follow the [[#Conversion Field Format|conversion field format]] for references to other columns and data.  Currently, the user doesn't see these commands (this may change in the future), but every command should have a user friendly name, description (for describing what is happening as the command runs), error description (for describing what went wrong/couldn't finish), and values for several more complex fields which are described below:
This is a type that stores queries (commands) that can be run before and after an import is performed to extend current import behavior.  These queries that reference {?raw_data_table} will follow the [[#Conversion Field Format|conversion field format]] for references to other columns and data.  For commands referencing data in {?converted_data_table} and {?mapping_data_table}, use the value in importtypefield.field to reference column names. 
 
Currently, the user doesn't see these commands (this may change in the future), but every command should have a user friendly name, description (for describing what is happening as the command runs), error description (for describing what went wrong/couldn't finish), and values for several more complex fields which are described below:
* `timing`: this dictates when in the [[#Import Process|import process]] that a command is run.  The current options are (See the import process section for more info):
* `timing`: this dictates when in the [[#Import Process|import process]] that a command is run.  The current options are (See the import process section for more info):
** Before Conversion: this happens after the data is put into the raw data table.  No other tables exist yet.
** Before Conversion: this happens after the data is put into the raw data table.  No other tables exist yet.

Latest revision as of 10:06, 15 October 2015

Import types are essentially 'recipes' defined in the database that ITrack uses that tell it how to import different types of data. Import types are currently used on the Data Management screen for users to import data.

ITrack Enterprise comes with many common import types used for basic functions. This list will expand over time. If you wish to automate a particular import type through data management, either:

  1. Contact ISoft Data Systems for assistance
  2. If you are a power-user who knows MySQL, the following documentation should help you create your own import types

Database Schema

There are several tables involved in creating a single import type. These will be addressed at a high level here. If you need in depth information on any particular table, please contact ISoft or refer to the examples.

`importtype`

This is the main table, and it contains one entry for each type of import that a user can perform with the screen. You can define a name, description, destination table, etc. There are also several columns that allow you to specify whether this import is insert only/update only and whether these options are on by default.

`importtypefield`

For each importtype entry, there are generally many entries in this table. Every entry in this table represents a 'field' that can be mapped by the user on the data management screen. Each field represents (generally) a column in the destination table that can/should be filled out. It is important to understand the difference between this and importtypefieldsource, which dictates ways that this data can come in. Each field should have an appropriate data type and column name set. Several of the more complicated and important fields are described here:

`requirement`

  • 'Optional' means a normal field that doesn't need to be mapped.
  • 'Required' is a field that must be mapped (this is common for specifying fields that really should be filled out in the destination table).
  • 'Update' means a field that, when specified is considered to be part of the set of columns that will be used to match up against existing rows. For example, when updating existing models, the model name might be marked as Update, causing input rows to be compared to existing rows based on this field. Multiple columns can be marked as update if the table has a multi-column 'identity'.
  • 'Update Required' means the same as Update but it also must be mapped before import is allowed.

`visible`

Visible columns can be seen by the user an mapped. Invisible fields can still affect the import process if they have a default conversion (see below). Invisible columns with a default are a good way to give columns complex defaults.

`databasefield`

If this is set to True, the `field` column is expected to be the name of a column in the destination table where data will go. If this is set to False, this entry will not go into the final INSERT/UPDATE. However, that doesn't mean it can't affect the import process. Non-database fields are often used for users to specify import options or for mapping information that is required for another field's conversion. For example, looking up a modelid often requires inventorytypeid, manufacturerid (through id or code), and a model name. This means the conversion for such a field will probably reference a couple non-database fields for inventory type and manufacturer.

NOTE: when this is False, these 'non-database' fields will still be present in {?converted_data_table} though they will not be part of {?mapping_data_table} or the import into the target table.

`defaultconversion`

This is an advanced field that follows the conversion field format. If a field is present that has a default conversion, it will be used if that field has no import type field source that is mapped or has a fixed value. Generally, a default value is specified in here (such as NOW() for a date-time or 0 for quantity), or a default conversion query (such as REPLACE({?partnumber}, '-', ) which would strip dashes out of what is input into the partnumber field.

`conversionvalidation`

This field specifies whether to perform validation on the data after the conversion step. 'Invalid values' count as either NULL, or less than or equal to 0 for numeric fields, or empty string for all other types of fields. It should be noted that, currently, every invalid cell is reported with a seperate entry in the warning/error list. This could lead to a lot of entries on large files with big problems.

  • None: do not perform validation, any post-conversion values will be implicitly cast to the destination table during the mapping step
  • Warn: any invalid values should be put into the warning section of the import, but import should not halt
  • Error: any invalid values should be put into the error section of the import, and halts import

`importtypefieldsource`

Most of the time, every importtypefield will have a single entry in this table. However, sometimes there is more than one, in the cases where the data the user will supply differs from the data that is saved. This commonly takes the form of a field that saves a unique ID, but a field source that takes a name (and has a conversion that looks up the id for that name). If a field has only one source, the user will not see information about the source on the screen. In this case, the 'output' data type of the source is expected to be the same as the data type for the field. This means either that the data types match, or in the case of a source with a conversion (see below), the output of the conversion matches the field data type. If they don't, implicit casting will happen, causing unpredictable behavior, use at your own risk. If a field has 2 or more sources, the user will see all the sources in a tree on the screen. The sources' name column should help the user understand which they should map to. This commonly happens where a field is an ID, and there are multiple sources, one that uses the raw ID itself (no conversion), and one that uses the name to look up the ID using a conversion query. The user is only allowed to map (or supply a default fixed value to) 1 source for any particular field, making sources essentially different 'options' for how to fill out a field.

`conversion`

This is an advanced field that follows the conversion field format. If this source is 'used' for a particular field, this conversion will be used to translate between the input data and the data type required by the importtypefield. A common application for this is an importtypefield representing a unique ID and a importtypefieldsource that matches to a 'code' or a 'name' and a conversion query that looks up the ID based on the code or name. Refer to the conversion field format form more information.

`importtypecommand`

This is a type that stores queries (commands) that can be run before and after an import is performed to extend current import behavior. These queries that reference {?raw_data_table} will follow the conversion field format for references to other columns and data. For commands referencing data in {?converted_data_table} and {?mapping_data_table}, use the value in importtypefield.field to reference column names.

Currently, the user doesn't see these commands (this may change in the future), but every command should have a user friendly name, description (for describing what is happening as the command runs), error description (for describing what went wrong/couldn't finish), and values for several more complex fields which are described below:

  • `timing`: this dictates when in the import process that a command is run. The current options are (See the import process section for more info):
    • Before Conversion: this happens after the data is put into the raw data table. No other tables exist yet.
    • Before Conversion Validation: this happens after data has been converted and put into the converted table, but before validation has been run. This step exists for the user to run queries that will perform extra validation or fill out more data that needs to be validated.
    • Before Mapping: this happens after the 'mapping table' is created, but before data is copied in from the conversion table. Commands at this stage allow preparing the mapping table for the converted and validated data.
    • Before Import: this happens after data is copied from the conversion table to the mapping table, but before data is inserted into the final table.
    • After Import: this happens after import is complete. This is after everything except for the cleanup.
  • `rank`: this field (defaults to 1) dictates the order of command execution for all commands with the same timing.
  • `permanence`: Dictates what types of operations the command is used on. 'Test' means only on test imports, 'Import' is only on real imports, and 'Both' is for both.
  • `testresults`: if this is set to true, this query's results should be displayed in the test results dialog. This is currently not implemented, but will be eventually.
  • `commandeffect`: tells the import screen whether the results should be ignored ('None'), or added to the total number of inserted rows ('Inserted'), updated rows ('Updated'), or deleted rows ('Deleted').
  • `postfail`: dictates error handling. If this is set to 'Stop', a query error will halt import execution. If its set to 'Error', execution stops and the errordescription is shown to the user, and if its set to 'Continue' execution continues.

Conversion Field Format

The data importer in ITrack uses a special query markup language with several advanced features that allows the developer to encode a certain amount of logic into a string. Conversion fields are strings that can contain fixed values, references to other fields, and/or MySQL code. These options can be combined in any logical way that will make sense inside of a query. Each option is explained below. Refer to current import types for details.

This format is used both in the conversion queries and in commands that reference columns in the {?raw_data_table}. References to later tables will use importtypefield.field as a column name instead.

  • Fixed values: a conversion field can simply have a fixed value specified, such as NULL, 0, or 'NE'. These values are common for hidden fields with default conversions. However, users often map fields to 'fixed values' using the software too. In those cases, they may specify such values and strings instead of a column reference.
  • References to other fields: every field in the importtype can be referred to by its unique `field` column. For example, if an import type had a `inventoryid` field, another field's conversion could refer to this field with {?inventoryid}. This can be (and often are) mixed in with the other other options of fixed values and mysql queries. When another field is referred to this way, the reference will be replaced with that other field's conversion string. So, if a conversion string for the 'test' field was '{?inventoryid} + 1', and the inventoryid field was mapped to column3 in the input file, this conversion will turn into '`column3` + 1'. This works with fixed values and subqueries. If a circular reference chain is detected, the system will stop replacing when a loop is detected.
  • Use of MySQL: since conversions are written directly into one of the queries during import (after field replacements), conversion queries can contain any amount of MySQL code. This includes:
    • References to functions ('f_get_uuid()')
    • Simple data conversion ('REPLACE({?partnumber}, '-', )')
    • Lookups in other tables ('(SELECT inventorytypeid FROM inventorytype WHERE name = {?self} LIMIT 1)'). Notice that, since this is a subselect, I added parentheses and a LIMIT 1. The LIMIT 1 is good practice because MySQL throws an error if a subselect returns more than one result when being used in a SELECT clause. This comes up when a certain lookup is expected to be unique, but a customer's database is missing a unique key or something.
  • Other useful references:
    • Specific columns in the import file can be referred to by column number like this: `column1` or `column22`. Normally this is not suggested because import files can have their columns in any order. The system does this automatically in the background if a conversion query refers to a mapped field using {?fieldname}. However, this can be useful for creating a fixed import that expects a particular input format and you don't want to have any user input.
    • {?self}: this special variable can be used in conversion queries. Its essentially a shorthand to the mapping of this specific column. If this column is mapped to a column, self will be replaced with `columnX`. If this column has a fixed value or query, {?self} will be replaced with that value.
    • Specially defined variable mappings. See special variables for more details. Most of these variables are only available at some points in the import process

Import Process

In order to understand how to build an import type, you need to understand the process that the system goes through in order to import the data the user supplies. The following are the steps that the system takes after the user hits import or test.

  1. Perform user data check and ensure all required data is mapped
  2. Create an empty table with column names as `column1` through `columnN` for the user input data. That table's name is something like import_<destination>XXXXXXX where destination is the destination table and the XXX is a timestamp to avoid collision. The name of this table is mapped to the variable named {?raw_data_table}. All column types are TEXT.
  3. Data is imported to this raw data table
  4. Conversion queries that are found to not involve references to the raw data table are run a single time now, for speed purposes.
  5. All 'Before Conversion' import commands are run, in order of rank. All mapped variables are replaced. This includes the special variable {?raw_data_table} and all the mappings the user created in the mapping step. You should use these variables instead of directly referencing `columnX`. If a query fails, error handling depends on the postfail and errordescription fields (see importtypecommand for more details).
  6. The 'conversion' table is created. This table stores 1 row for every input table row, but as data is inserted into it, all conversions are run against it. There is a column for every mapped field in this table, and the column names are the same as they will be in the destination table. However, the data types of these columns are dictated by MySQL and are NULLABLE. This is different than the mapping table, which has all the data types and keys of the destination table. This table also has an AUTO_INCREMENT column which gives the programmer access to which input row id each item relates to. The name of this table is stored in the special variable {?converted_data_table}.
  7. All 'Before Conversion Validation' commands are run. All mapped variables are replaced, including the special converted data and raw data table names. This step is present to allow an opportunity to perform extra pre-validation or add more data to be validated.
  8. A 'mapping' table is created that has the same structure as the destination table. This is done by running CREATE TABLE XXX LIKE `destination_table`. This table name is stored in {?mapping_data_table}.
  9. All 'Before Mapping' commands are run. All mapped variables are replaced, including the converted, raw, and mapping table names. This step is present to allow an opportunity to modify the mapping table before it receives data. Sometimes unique keys on the mapping table require extra steps to be taken before importing data.
  10. Data is copied from the conversion table into the mapping table. This step is important because the mapping table (which is like the destination table) contains primary and unique keys, which can cause duplicate entries to be compressed into one row. This is dealt with by the copy INSERT using ON DUPLICATE KEY UPDATE (and incrementing a special count variable on each row, which can be queried).
  11. [Import only (not test)]: [If the backup table before import checkbox was checked by the user, the destination table is copied into a backup. This usually takes the format of backup_importer.<table_name><timestamp>. The backup_importer database is the default value for a global setting that can be defined at each company.]
  12. All 'Before Import' commands ar erun. All mapped variables are replaced, including all special table name variables.
  13. [Import only (not test)]: [If the user has the 'empty table before import' checkbox checked, a TRUNCATE TABLE statement is run]
  14. [If the user has the insert new rows checkbox checked, new rows are inserted into the destination table (import only) using an INSERT INTO <destination> query. If this is a test, a special query is run that JOINs the mapping table to the destination table, looking for rows with no matches on all fields marked as Update or Update Required.]
  15. [If the user has the update existing rows checkbox checked, existing rows that match the input data get their fields updated (import only) using an UPDATE statement that joins the mapping data table to the destination table using all the Update and Update Required fields. If this is a test import, a special query is run that JOINs the conversion table to the destination table, lookign for rows with matches on all Update and Update Required fields.]
  16. [Import only] [If the 'Optimize Table after import' checkbox is checked, an OPTIMIZE TABLE statement is run against the destination table]
  17. All 'After Import' import commands are run, in order of rank. This is pretty much identical to the Before Import command step, except more tables exist.
  18. All the temporary tables created in earlier steps are dropped.

Special Variable Mapping

  • {?raw_data_table}: this variable is mapped before the 'Before Import' commands are run. Its value is the name of the raw data table (this is the table that contains all the user's raw string data in columns named column1 through columnX.
  • {?converted_data_table}: this stores the name of the table whose column names match the destination column names, but there is still one row per input row. All fields are NULLABLE, and there are no keys except for a special AUTO_INCREMENT key that is used to keep track of which input row created each item. This variable is not mapped in the Before Conversion command stage.
  • {?mapping_data_table} : this stores the name of the table whose schema is like the destination table. This variable is not mapped in the Before Conversion or Before Conversion Validation command stages.