Enabling Versioned Views

As noted in Release Type Options, SNOMED CT data can be imported into database tables from snapshot release files, from full release files or from both these sets of release files. The objective of importing the data into the database is to provide effective access to useful views of that data. In practical terms this means facilitating access to some or all of the views summarized in Table 1 below.

Table 1: Summary of Versioned Views of SNOMED CT Components and Reference Set Members

View
Content Description
Use Cases
Value

Current Snapshot

The most recent version of each SNOMED CT component and reference set member.

  • All practical uses of the current version of SNOMED CT

Essential for any use of SNOMED CT

Retrospective Snapshot

The most recent version of each SNOMED CT component and reference set member released prior to a specified earlier snapshot time.

  • A baseline against which to review of changes to SNOMED CT during or after installing a new release.

  • Review of health records taking account of the version of SNOMED CT at the time the data was recorded.

  • Comparative analysis of health record data collected at different times taking account of changes to SNOMED CT.

Valuable

Most Recent Delta View

The latest version of each SNOMED CT component and reference set member added, changed or inactivated in the most recent release. Typically, all these items with have an effectiveTime equal to the most recent release date. However, in cases where interim releases are made available between releases, the most recent delta view may be specified as including all items with an effectiveTime after the previous major release date.

  • Identification of changes to SNOMED CT arising from the most recent release.

Valuable as an indicator of recent changes

Other Delta Views

The versions of each SNOMED CT component and reference set member added, changed or inactivated after a specified delta start time and at or before a specified delta end time.

  • Identification of changes to SNOMED CT over a period of time.

Useful for longer term monitoring of changes.

Delta Views with Details of Changes

The content of a specified delta view combined with the retrospective snapshot view of SNOMED CT components and reference set members in the delta view at the specified delta start time.

  • Reviewing full details of changes to SNOMED CT between two releases or over a period of time.

  • Assessing and managing the impact of updates to SNOMED CT.

Required for effective change management

Table 2 summarizes the way in which different release type options affect the ability to access particular snapshot and delta views of SNOMED CT data. Importing the snapshot release supports direct access to the current snapshot view and query access to the most recent delta view. The full release provides access to all snapshot and delta views but is likely to perform slightly less well with the current snapshot view. Importing both full and current snapshot releases offers all the advantages of importing the full release and also provides direct access to the current snapshot view. This combined option requires more storage capacity but may be worthwhile because the current snapshot is likely to be the most commonly used view.

Table 2: Summary of Versioned View Access Capabilities Depending on Release Types Imported

Release Types Imported.

Views Supported

Current Snapshot

Full

Full & Current Snapshot

Current Snapshot View

Direct

Query

Direct

Retrospective Snapshot Views

Not supported

Query

Query

Most Recent Delta View

Query

Query

Query

Other Delta Views

Not supported

Query

Query

Delta Views with Details of Changes

Not supported

Query

Query

The following subsections explore specific mechanisms that can be used to deliver these views in a relational database.

  • Versioned View Queries

  • Versioned Database Table Views

  • Optimizing Versioned Table Views

See also Appendix C: Release Types and Versioned Views

Versioned View Queries

Current Snapshot Queries

Extracting the current snapshot from a full release table requires a query that can identify the rows with the most recent effectiveTime for each uniquely identified component or reference set member. The general form of a current snapshot view query is shown below.

General Snapshot Query - Replace full_tableName with Name of a Full Release Table

select * from full_tableName tbl
    where tbl.effectiveTime = (select max(sub.effectiveTime) from full_tableName sub
                                where sub.id = tbl.id);

Including Other Constraints in a Snapshot Query

It may be tempting to write queries that add criteria specific to a particular query within the structure of a general snapshot query. However if this is done, it must be done with care because additional conditions may cause incorrect results.

Additional criteria to be applied to snapshot view must be added to the outer query to deliver the expected results. Otherwise they may inadvertently exclude the most recent component from the snapshot and thus leading to a misleading result.

Retrospective Snapshot Queries

Queries for earlier snapshot views require an additional condition so that only versions with an effectiveTime that is equal to or earlier than (i.e. less than) the date of the snapshot. In this case, it is correct to include this condition in the nested query because the objective is to constrain the maximum effectiveTime to that the subquery returns. This ensures that the outer query does not return component versions added after the specified snapshot time.

Retrospective Snapshot Query for Snapshot as at 2019-01-31

select * from full_tableName tbl
    where tbl.effectiveTime = (select max(sub.effectiveTime) from full_tableName sub
                                where sub.id = tbl.id and sub.effectiveTime<='20190131');

Most Recent Delta View Query

. The general form of a query for the most recent delta view is shown below. Note that this query can be applied either to full release tables or the current snapshot release tables.

Most Recent Delta View Query for 2019-07-31 release. Replace tableName with Name of a Full or Snapshot Release Table for 2019-07-31 Release

select * from tableName tbl
    where tbl.effectiveTime = '20190731';

Specified Period Delta View Query

A more general purpose approach to delta views is to include all changes between two specified dates (or times). This can be applied to the period between two releases or to a defined period during which multiple changes may have occurred to the same component. The general form of a delta view query for a specified period is shown below.

General Most Recent Delta View Query - for Change After 2019-01-31 and On or Before 2019-07-31

select * from full_tableName tbl
    where tbl.effectiveTime > '20190131' and tbl.effectiveTime <= '20190731' ;

Delta View with Details of Changes

It is also possible to create an enhanced delta view that not only shows which components have changed but allows the pre-change state of that component to be seen. From a practical perspective this simply combines a delta view for a range of dates with a retrospective snapshot view for the delta view start date.

Delta View with Details of the Component Prior to the Changes

select * from full_tableName tbl
    where tbl.effectiveTime > '20190131' and tbl.effectiveTime <= '20190731'
union
select * from full_tableName tbl
    where tbl.effectiveTime = (select max(sub.effectiveTime) from full_tableName sub
                                where sub.id = tbl.id and sub.effectiveTime<='20190131')
    and tbl.id IN (select id from full_tableName
                    where effectiveTime > '20190131' and effectiveTime <= '20190731')
order by id;

Versioned Database Table Views

The previous section described queries that can be applied to database tables to access snapshot and delta views of individual database tables. However, meaningful access to SNOMED CT requires querying of interrelated data in multiple database tables including concepts, descriptions, relationships and reference sets. In most cases, the data from each of those tables needs to be from a snapshot view for the same date. This implies these composite need to derive information from snapshot queries applied to the table rather than directly accessing the table. Consequently, what should be relatively simple queries rapidly become complex, making them hard to understand and prone to errors.

Fortunately, relational databases provide solutions that reduce this complexity by allowing a query to be used to define a database view. These defined views can then be queried in exactly the same way as a database table. The result of this is that if all the required snapshot and delta views are defined for every table in a SNOMED CT database, it is possible to write relatively simple queries that return useful combinations of data from snapshot views of different tables.

Representing Snapshots as Database Views

The examples below two snapshot database views of a full release table. The first is current snapshot view and the second a is snapshot for 2019-01-31.

Creating a Current Snapshot View from the Full Concept Table

CREATE VIEW snap_concept as (select * from full_concept tbl
    where tbl.effectiveTime = (select max(sub.effectiveTime) from full_concept sub
                                where sub.id = tbl.id));

Creating a Retrospective Snapshot View for 2019-01-31 from the Full Concept Table

CREATE VIEW snap20190131_concept as (select * from full_concept tbl
    where tbl.effectiveTime = (select max(sub.effectiveTime) from full_concept sub
                                where sub.id = tbl.id and sub.effectiveTime<='20190131');

Having created those two views it is then possible to write queries like the examples below to display data from these snapshot views.

Creating a Current Snapshot View from the Full Concept Table

select * from snap_concept where id in (3859001,3704008);

Creating a Retrospective Snapshot View for 2019-01-31 from the Full Concept Table

select * from snap20190131_concept where id in (3859001,3704008);

Comparing the results of these two queries shows that in January 2019 both concepts changed in the 2019-07-31 release. The concept with id 3859001 was made inactive and the definitionStatusId of concept with id 3704008 was changed from 900000000000074008 (primitive) to 900000000000073002 (defined).

Creating Configurable Snapshot Views

It is possible to create snapshot views of every release file for every SNOMED CT release of SNOMED CT. However, this would result in over 600 distinct snapshot table views, most of which would rarely be used. A more practical solution is to create views for the current snapshot and two or three retrospective views with a configurable snapshot date. The SNOMED CT example database includes two retrospective snapshot views with a snapshotTime value in an identified row in a configuration table specifying the snapshot date.

The definition of one of these views is shown here. The other view (snap2_concept) has the same definition except that it specifies `cfg`.`id` = 2 rather than `cfg`.`id` = 1 so it refers to a different row in the configuration table. Having two independently configurable snapshots, allows queries to be written than compare different snapshots. Because the same configurable views are provided for all full release tables, changes to the snapshotTime for a view apply simultaneously to all those components and reference set members.

Example of Configurable Retrospective View Definition

CREATE VIEW `snap1_concept` AS select `tbl`.`id` AS `id`,`tbl`.`effectiveTime` AS `effectiveTime`,`tbl`.`active` AS `active`,`tbl`.`moduleId` AS `moduleId`,`tbl`.`definitionStatusId` AS `definitionStatusId` from `full_concept` `tbl` where (`tbl`.`effectiveTime` = (select max(`sub`.`effectiveTime`) from (`full_concept` `sub` join `config_settings` `cfg`) where ((`sub`.`id` = `tbl`.`id`) and (`cfg`.`id` = 1) and (`sub`.`effectiveTime` <= `cfg`.`snapshotTime`))));

Creating Configurable Delta Views

The SNOMED CT example database also includes two configurable delta views. These are created and configured in the same way as the configurable snapshot views. However in this case, deltaStartTime and deltaEndTime values in the identified configuration table row specify the delta period.

Example of Configurable Retrospective View Definition

CREATE VIEW `delta1_concept` AS select `tbl`.`id` AS `id`,`tbl`.`effectiveTime` AS `effectiveTime`,`tbl`.`active` AS `active`,`tbl`.`moduleId` AS `moduleId`,`tbl`.`definitionStatusId` AS `definitionStatusId` from (`full_concept` `tbl` join `config_settings` `cfg`) where ((`cfg`.`id` = 1) and (`tbl`.`effectiveTime` <= `cfg`.`deltaEndTime`) and (`tbl`.`effectiveTime` > `cfg`.`deltaStartTime`));

Optimizing Versioned Table Views

Using Separate Snapshot Tables

One of the points identified in the previous section is that snapshot queries and database views are less likely to perform as well as direct access to database tables. As shown in Table 3 on the SNOMED CT example database confirm that there is a substantial performance difference between these two approaches.

Table 3: Testing Performance of Queries on Snapshot Tables and Snapshot Views

Snapshot Table Seconds

Performance Ratio

Read 1 million rows from relationship snapshot

1.52

11.06

15%

Read 1 million rows from description snapshot

3.57

12.73

28%

Read all rows from concept snapshot

0.66

2.45

26%

Total time for all operations above

5.75

25.74

22%

Advanced test reading 10,000 relationships and with joins to descriptions and language reference set for the fully specified names of source, type and target concept

2.34

4.70

50%

Based on these findings the most effective way to optimize access to a snapshot view, is to replace the use of database views with snapshot tables. Representing the a snapshot with tables, rather than using a database view, adds roughly 2.6 Gb to the storage requirements for the example database.

The current snapshot view is essential and is used for most interactions with the database. Therefore, the performance enhancements justify use of the additional disk space required to store the current snapshot in separate tables. If there are specific reasons for extensive access to one or two retrospective snapshots, it might also be worthwhile representing those snapshots in separate tables. However, it would not be worthwhile to apply the same approach to the full range of less frequently used retrospective snapshots. Therefore, if the snapshot views defined in Versioned Database Table Views do not perform sufficiently well, it is worth considering ways to optimize snapshot access.

Unlike the views described earlier, the optimization methods described below require additional columns to be added to each of the full release tables. After importing release files, the tables are processed to generate values for the additional columns and this data is used to simplify the snapshot view queries by avoiding the need for nested queries.

Snapshot Flags Optimization Method

Overview

A column is added to each full release table. This column is used to represent flags that indicate which snapshot view each row is included in.

Practical Example

  1. A single 64-bit integer column called flag is added to all full release tables with a default value of 0 (zero).

  2. A distinct number which is a power of 2 between 20 and 263 is assigned to each required retrospective snapshot time.

  3. The flag column in each row is set to the sum of the values of all the snapshots in which that row appears.

    • To be precise this means that a specific bit in the flag value is set if the row is part of a particular snapshot and is not set if it is not part of that snapshot.

  4. Once this process is complete, it is possible to select the rows of a retrospective snapshot with a simple query that tests the relevant bit in the flag column value.

    • This avoids the need for the nested query required to identify rows that are part of a snapshot.

The example SQL below illustrates the process of flag setting. In practice, while this query works it is not very efficient for several reasons. A more efficient approach would be to use a stored procedure that computes the full set of flags applicable to each row. This approach would allow the flag column in each row to be updated once rather than requiring a separate update for each snapshot view.

Illustrative Example of a Flag Setting Query

SET SQL_SAFE_UPDATES=0;
update full_tableName tbl
 set flag=flag | 1
 where tbl.effectiveTime=(select max(sub.effectiveTime) from copy_full_tableName sub where sub.id=tbl.id and sub.effectiveTime<='20190731');
update full_tableName tbl
 set flag=flag | 2
 where tbl.effectiveTime=(select max(sub.effectiveTime) from copy_full_tableName sub where sub.id=tbl.id and sub.effectiveTime<='20190131');
update full_tableName tbl
 set flag=flag | 4
 where tbl.effectiveTime=(select max(sub.effectiveTime) from copy_full_tableName sub where sub.id=tbl.id and sub.effectiveTime<='20180731');
update full_tableName tbl
 set flag=flag | 8
 where tbl.effectiveTime=(select max(sub.effectiveTime) from copy_full_tableName sub where sub.id=tbl.id and sub.effectiveTime<='20180131');
SET SQL_SAFE_UPDATES=1;

Once the flags are set, a query such as the one below can be used to return component versions that are part of a particular snapshot view. The example query returns row in which the flag the second bit (value 2) is set. Based on the settings in the query above this means it would include rows that are part of the 2019-01-31 snapshot view of this table

Example of a Snapshot Query Using the Flag Method

select * from full_tableName where flag & 2;

Performance

Limited testing of this optimization approach indicates that it is between 2 and 3 times faster than the unoptimized snapshot views. Direct access to a snapshot table is still twice as fast as this optimized approach.

Storage Requirements

The full release files in the 2019-07-31 release contain a total of approximately 16 million rows. If flags are added each of these rows will require a further 8 bytes of storage. No additional indexes are required to support this optimization. As a result the overall increase in storage requirements to support this optimization is less than 150 Mb.

As described here the approach is limited to 64 snapshot times. This is probably more than sufficient for most practical requirements. However, it and could be extended by adding an another flag column or by changing the data type of the flag column to binary.

Disadvantages

  • The process of setting the flags required for this approach adds significantly to the time taken to build the database.

  • Adding an additional column to every table means that queries using "SELECT * FROM ... " will return a flag column that is not part of the original SNOMED CT data.

  • The flag values are technically essential to the process but this may not be apparent to anyone exploring the database.

  • Significantly slower than direct access to snapshot tables. Optimum current snapshot performance still requires the snapshot table.

Advantages

  • A significant improvement in retrospective snapshot performance compared with unoptimized tables.

  • Minimal impact on disk capacity (adds less that 5% to the size of the full release tables).

  • Provides a fallback option for the current snapshot view if storage capacity is limited.

Superseded Time Optimization Method

Overview

An additional datetime column is added to each row. This column is used to represent the time at when a row was replaced by the next version of that component or reference set member.

Practical Example

  1. A single datetime column called supersededTime is added to all full release tables with a default value of a long distant future date (e.g. 9999-12-31).

  2. Each full table is queried to establish the sequence of versions of each component or reference set member in effectiveTime order.

  3. The supersededTime of a component that have been updated is set to the effectiveTime of the immediately following version of that component

  4. Once this process is complete each component is part of all snapshot views with a snapshot time greater than or equal to its effectiveTime and less than its supersededTime.

    • This can be tested without the need for a nested query.

The example SQL below illustrates the process of flag setting. In practice, while this query works it is not very efficient for several reasons. A more efficient approach would be to use a stored procedure that computes the full set of flags applicable to each row. This approach would allow the flag column in each row to be updated once rather than requiring a separate update for each snapshot view.

Illustrative Example of Queries Setting the supersededTime Value

-- Create temporary table for the supersededTime values
CREATE TEMPORARY TABLE tmp (id CHAR(36) NOT NULL,effectiveTime DATETIME,supersededTime DATETIME, PRIMARY KEY (id,effectiveTime));
 
-- Compute the supersededTime values for each combination of id+effectiveTime and add these to the temporary file
INSERT INTO tmp SELECT tbl.id, tbl.effectiveTime, (SELECT IFNULL(MIN(sub.effectiveTime),DATE "99991231") FROM full_tableName sub
    WHERE tbl.id=sub.id AND tbl.effectiveTime<sub.effectiveTime) supersededTime FROM full_tableName tbl;
 
-- Apply the appropriate supersededTime values to each row in the full table
UPDATE full_tableName tbl
    JOIN tmp
    SET tbl.supersededTime=tmp.supersededTime
        WHERE tmp.id=tbl.id AND tmp.effectiveTime=tbl.effectiveTime;

Once the superseded time values are set, a query such as the one below can be used to return component versions that are part of a particular snapshot view. The example query returns row in which the flag the second bit (value 2) is set. Based on the settings in the query above this means it would include rows that are part of the 2019-01-31 snapshot view of this table

Example of a Current Snapshot Query Using the supersededTime Method

-- This query assumes the default supersededTime is 9999-12-31
SELECT *
    FROM full_tableName
    WHERE supersededTime = DATE '99991231';

Example of a Retrospective Snapshot Query Using the supersededTime Method

-- This query assumes that [snapshotTime] is replaced by the required snapshotTime
SELECT *
    FROM full_tableName
    WHERE [snapshotTime] >= effectiveTime AND [snapshotTime] < supersededTime;

Performance

. However, this figure varies depending on the complexity of the queries it is used in. Direct access to a snapshot table is between 2 and 3 times as fast as this optimized approach.

Storage Requirements

The full release files in the 2019-07-31 release contain a total of approximately 16 million rows. If a datetime column is added, each of these rows will require a further 5 bytes of storage.. As a result, the storage to fully support this approach required an additional 750 Mb.

Disadvantages

  • The process of setting the supersededTime required for this approach adds significantly to the time taken to build the database.

  • Adding an additional column to every table means that queries using "SELECT * FROM ... " will return the supersededTime column that is not part of the original SNOMED CT data.

  • The supersededTime values are technically essential to the process but this may not be apparent to anyone exploring the database.

  • The use of supersededTime together with associated indexes increases the storage capacity required for the full release tables by approximately 20%.

  • Significantly slower than direct access to snapshot tables and since the introduction of MySQL 8.0 are also slower than the snapshot flagging method.

Advantages

  • An improvement in retrospective snapshot performance compared with unoptimized tables but is out-performed by the snapshot flagging method.

  • A fallback option for the current snapshot view if storage capacity is limited but requires more storage than the snapshot flagging method.

  • Supports an unlimited number of snapshot times.

Last updated