Skip to main content

Avatar Incremental Data Warehouse Data Flow

Illustration and description of the data flow for myAvatar  Incremental Data Warehouse.

Data Warehouse Flowchart

 

Data Warehouse Incremental Middleware

The Data Warehouse Incremental Middleware reviews SQL table data from myAvatar and determines if there have been changes to table rows since the data was last loaded in the Data Warehouse. Any modified rows flagged for inclusion in the Data Warehouse are transmitted during a nightly load via a private MPLS network between Netsmart Plexus Cloud and the target database.

Upon completion of the nightly load process, an email notification will be distributed to an agreed upon distribution list.

Null values in columns will be transmitted as null values while empty strings will be transmitted as empty strings.

To process values in transient fields, the Middleware retrieves a list of transient rows for a table, queries the target database (Data Warehouse), and creates an SHA1 hash with the concatenation of the returned values for each row in the table. The Middleware performs a scheduled weekly run that pushes the SHA1 hash snapshot from the Data Warehouse to myAvatar for comparison purposes to determine what data to update in the Data Warehouse. By default, this run will occur on Sundays after the incremental load is completed, but the day of execution can be configured in the dss.properties file which is located in the ./lib folder. The properties are defined upon initial installation of the Data Warehouse. 

The transient field comparison is only done for tables/columns that are specified on the DW Comparison Configuration form. The Transient Field Comparator operates as a two-sided function. The comparison itself occurs within Cache, but the middleware initializes the comparison and waits for the completion for a table before proceeding with the load of the table.

Queue Manager

SQL table data from the Data Warehouse is either loaded manually or with the Queue Manager. The Queue Manager loads the SQL tables for processing in Cache and can be run from the command line. 

The queue manager is configurable for up to N number of threads. Due to processing constraints, the comparator processes in Cache is limited to 5 simultaneous threads. For example, if 10 threads are running in the middleware process that require the comparison process to run, 5 of those threads will be queued to wait until a thread in Cache opens up from the 5 that are currently running.

As an alternative to using the Queue Manager, manually start the load process with the ./dss.bat {table name}. This is preferable in the case of a single table failing during the load, and the user wants to rerun the single table.

SHA1 Hash snapshot

The SHA1 Hash snapshot provides a static view of the data in the Data Warehouse. This snapshot is produced on a weekly basis to compare data in the Data Warehouse to the data in myAvatar to detect any changes in columns marked as transient in a Cache table. This comparison allows data rows that contain transient columns to be updated incrementally. The SHA1 hash value is inserted into the myAvatar database and compared to the corresponding table in Cache.

Netsmart recommends scheduling the snapshot process for a one-time run for each middleware instance before the next database load, preferably on a weekly basis during a low-traffic period, such as after a nightly Data Warehouse load. The comparisons will not commence until either the SHA1 hash process is completed or the weekly process completes.

The snapshot is created just once a week due to the amount of time needed for processing. If a difference is found for a row, the snapshot information for that row is updated on the Cache side so that the data is kept up-to-date for future comparisons. 

Data Validation

The Data Warehouse Comparison Configuration form lets you specify SQL tables to be used in validating data sent to the Data Warehouse.

After each SQL table finishes its nightly load, the incremental middleware kicks off a comparison process. This process checks the table 'SYSTEM.dss_comparison_tables' for each SQL table and, if applicable, runs the comparison query against the Data Warehouse database for the columns specified. The comparison query returns a list of Row IDs that did not pass the comparison. These Row IDs are listed in a log file that is included in the nightly Dss completion email. 

If a table fails to load, error reports are sent via email as set up by the Netsmart Engineering team. Recipients are determined by the organization.

The comparison process will requeue any tables that failed and push the rows found by the comparison query to the Data Warehouse database.

Properties in the SYSTEM.dss_comparison_tables table include:

  • USERID - A valid Data Warehouse user set up in the Dss Middleware install.
  • table_name - An SQL table that will be validated.
  • ID_column - Default column for every table comparison. This field will be populated by myAvatar.
  • first_comp_column - First comparison column.
  • second_comp_column - Second comparison column.

Data Load

Any errors encountered in the Data Warehouse nightly load for any tables in the queue will kick off a restart process that will log the error in the table's log file and wait for 60 seconds before attempting to resend the remaining rows for the affected tables. There will be up to three attempts to resend the data before stopping. 

The 'dss_completion_flag' table on the SQL server, Oracle, or Informix database tracks the successful completion of the nightly load and indicates the load status as follows: 

  • The 'dss_status' field is updated with a value of I (Incomplete) at the start of each nightly load to indicate that the load is still running.
  • When the nightly load completes, the 'dss_status' field is set to 1 (Finished Successfully). 
  • If the restart process fails to complete the load after three tries, the 'dss_status' field is updated with a value of 0 (Errored). Otherwise, the status will always be logged as a 1.