Denodo - Creating Integrated Views
The following sections describe the process of creating the following types of views using our example to illustrate the process:
- Union views: see section Creating Union Views
- Join views: see section Creating Join Views
- Selection views: see section Creating Selection Views
- Flatten views: see section Creating Flatten Views
- Intersect views: see section Creating Intersection Views
- Minus views: see section Creating Minus Views
- Interface views: see section Creating Interface Views
We will use the following example as a guide when describing the process:
Example: Unified data about customer sales and incidents.
- A telecommunications company offers phone and internet services to its clients. Data on the incidents reported in the phone service are stored in a relational database, which is accessed through JDBC. In addition, data on the incidents reported in the Internet service are stored in another relational database also accessed through JDBC.
- In our example, the director of the IT department wants to monitor the number of incidents (either telephony or Internet) notified by the clients with the greatest sales volume to establish whether measures should be taken to increase client satisfaction.
- Data on customer sales volumes are managed by another department of the company. That department provides a Web Service so the other departments can access to that data.
- In this example, we will see how Virtual DataPort can be used to build a unified data view to meet the needs of the IT department, by obtaining the total number of incidents from clients with the greatest sales volumes.
Union Views
Creating Union Views
- A union view merges the tuples from various views into a single view. These views can belong to different databases.
- In standard relational algebra, all the relations or tables must have the same schema (same attributes with the same type). Virtual DataPort provides the standard relational algebra UNION operation. However, Virtual DataPort also has an extended union operation, whereby if any of the input views has an attribute that is not present in the other views, it is added to the output view.
Extended Union Views
Creating Extended Union Views
- To create an extended union view, right-click on the Server Explorer and click New > Union (extended).
- The Tool will open the “Extended union view” dialog and it will add the view that is selected in the Server Explorer. To add views to the union view, drag them from the Server Explorer to this dialog. You can drag views from different databases.
- In our example, we will drag the base views phone_inc and internet_inc (we explain how to create these views in the section Creating Base Views from a JDBC Data Source).
- The “Extended union view” dialog has six tabs:
- Model: tab where you have to drag the views that will form the union and establish the associations between the fields of the views. The fields linked by an association are considered a sole attribute in the output schema of the view.
- By default, the Tool will create associations between the attributes of the same name and type. You can also add more associations by graphically linking an attribute of one view with other attribute of the other view. In views with many fields, you can use the Find option to locate the desired field in a view (right-click on the view).
- Using the Design Studio, if the associations were not automatically created, connect them by dragging the “link” button (on the top-right side of the source view) to the target view. Once the views are connected, click the “link” and a dialog will show up at the bottom of the Model tab with the fields of both views. Use this dialog to add more associations graphically by linking an attribute of the left view with other attribute of the view on the right side. Connecting internet inc and phone inc with the Design Studio
- In order to associate two fields, they must have the same type or compatible. For example, you can associate a float field with a double, an int with a long, etc.
- To delete a view from the union, click the button image0 (on the top-right side of the view). In the Design Studio, click the trash button located on the top-left side of the view.
- In this tab, you can add view parameters by clicking on image1, at the top of the dialog. The section Parameters of Derived Views explains what are view parameters and the rules you need to follow when using them.
- Associations: tab that lists the associations between fields. You can delete them by clicking on image2.
- Where Conditions: tab that allows you to add WHERE conditions to the definition of the view. There are two modes to add WHERE conditions:
NOTE: The Simple condition mode is currently unavailable on the Design Studio and the default mode is Specify Where condition. It is on the roadmap to add the option Simple condition mode.
- Simple condition mode: use it to create a list of conditions combined with the AND operator, where each condition has an attribute of the view as left operand.
- To add a new condition, click on image3. For each condition you have to indicate:
- An attribute of the view.
- The operator of the condition: =, <>, contains, etc.
- The condition operators are described in detail in the section Comparison Operators of the VQL Guide.
- The right operand. It can be either any expression using constants, view attributes and functions (see section Functions for Conditions and Derived Attributes of the VQL Guide).
- Click on the menu Help > Functions list to display the list of existing functions.
- If you want to enter a constant and depending on the type of the selected field, you can open the “Value editor” of the condition by clicking on image4:
- If the type of the field is text or xml, it will open the “Value editor”.
- This editor will surround the value with single quotes ('). If the value contains single quotes, it will escape them when you click Ok to close the editor.
- If the type of the selected field is compound (register or array), it will open the “Compound value editor” (see section Creating Conditions with the Compound Values Editor)
- If the type of the field is text or xml, it will open the “Value editor”.
- Specify Where condition mode: used to write any condition. This is required, for instance, when you need to use the OR and NOT operators, or when the left operand of the condition is an expression. Press Ctrl Space to display the auto-complete list that will help you form syntactically valid expressions.
- Group By: tab that allows you to add GROUP BY fields to the view. To do this, select the Use group by check box, select a field of one of the views in the All fields list and click Add >>.
- To select two or more fields, hold Ctrl or Shift and click on each field.
- If the input views have many fields, press Ctrl F to display a search box that will help you find the field you are looking for.
- Besides adding fields, you can define Group By expressions by clicking Add expression.
- To change the order of a field in the Group By clause of the view, select a field in the Selected group by fields and expressions list and click on the buttons and .
- After adding one or more group by fields or expressions, you can define a HAVING condition in the Having condition box.
- You can define aggregation expressions by clicking on the New aggr. field button of the Output tab.
- Note that if you add Group By fields, the view can only have Group By fields and aggregation expressions (see section Group BY Clause of the VQL Guide).
- Output: in this tab you configure the output of the view:
- Change the name of the view.
- Change the fields’ name. If the view has many fields, press Ctrl F to display a search box that will help you find the field you are looking for.
- Reorder the projected fields. To do this, select the check box beside the type of the field you want to reorder and click image5 or image6 to move the field up or down.
- Project the subfields of a register field by right-clicking it and then, on Project subfields of…
- Virtual DataPort models data types with a complex structure using the types register (indicated with the icon image9) and array (indicated with the icon image10).
- In this tab, you can project the inner fields of register fields. To do this, right-click on the field and then, on Project subfields of… Instead of projecting all the fields of a register, you can project only some of them by right-clicking on each field and then, on Project subfield…
- To project the content of an array field, create a Flatten view (see section Creating Flatten Views).
- Sort the fields alphabetically: right-click on the table of fields and click Sort fields alphabetically.
- Add derived attributes: click New field to open the expressions editor.
- You can edit these attributes later by clicking on image1, on the “Field Type” column.
- In this editor, you have to specify the name of the new field and an expression. This expression can have constants, attributes of the projected views and functions (see the section “Functions for Conditions and Derived Attributes” of the VQL Guide).
- Click on the menu Help > Functions list to display the list of existing functions.
- Press Ctrl Space to display the auto-complete list that will help you form syntactically valid expressions.
- After defining a derived field, you can set its “Source type properties” by clicking on image1, in the cell that displays the field’s type. The section Viewing the Schema of a Base View explains what these properties are used for.
- If you have selected the Use group by check box in the Group by tab, you can add aggregation expressions to the view by clicking New. aggr. expression.
- Delete fields: select the check boxes of the fields you want to delete and click Remove Selected.
NOTE: To select several fields at once, select the first field, press Shift and then, select the last field. Also, you can right-click on each field and click Drop. - If you have deleted one or more fields from the output of the view and you want to restore them, click Restore > Fields. The Tool will display the “Restore Fields” dialog where you have to select the fields that you want to add again to the output.
- If you want to reset the entire output schema of the view, click Restore > Schema. This will remove the Group by definition of the view and add to the “Output” all the fields from the source views.
- Remove the duplicate rows from the output: select the DISTINCT clause check box.
- Order the output by one or more fields: select a field in the ORDER BY fields drop-down and click on image3. After adding an ORDER BY field, select if you want to order the output in ascending (ASC) or descending (DESC) order.
- Define the primary key of the view: select the check box beside the type of the fields that form the primary key and click on Set selected as PK. In addition, you can right-click on the field and click Set field as PK.
- See more about the primary key of a view in the section Primary Keys of Views.
- Add a description to the field by clicking on image1, on the column “Description”. You can obtain the description of the fields from:
- The JDBC interface.
- The ODBC interface.
- The Denodo stored procedure CATALOG_VDP_METADATA_VIEWS. The section CATALOG_VDP_METADATA_VIEWS of the VQL Guide describes this procedure.
- Metadata: tab to define the folder where the new view will be stored and provide a description for the new view.
- Click Save (image17)
- Model: tab where you have to drag the views that will form the union and establish the associations between the fields of the views. The fields linked by an association are considered a sole attribute in the output schema of the view.
- In our example using the Virtual DataPort administration tool:
- Click on the menu File > New > Union (extended).
- Drag the views phone_inc and internet_inc to the “Model” tab and add the association internet_inc.summary = phone_inc.description. See Creating the union view of phone_inc and internet_inc (“Model” tab).
- In the “Output” tab (see Creating the union view of phone_inc and internet_inc (“Output” tab)):
- Rename the view to incidents
- Remove the fields pinc_id, iinc_id, specific_field1, specific_field2 and specific_field3. To do this, select these fields and click on Remove selected.
- Define the primary key of the view with the field taxid. To do this, select the check box beside the type of taxid and click on Set selected as PK.
- After this, click Save to create the view. Then, the Tool will display the schema of the new view.
- In our example using the Design Studio:
- Click on the menu File > New > Union (extended).
- Drag the views internet_inc and phone_inc to the “Model” tab. If the views were not automatically linked, connect them by clicking on link button (on the top-right side of internet_inc view) and drag to phone_inc view.
- Add an association internet_inc.summary = phone_inc.description. To add the association, click on the link between the 2 views and in the panel below, with a drag and drop movement connect the summary field of the left side with the description field on the right side. See Creating the union view of phone_inc and internet_inc (“Model” tab).
- In the “Output” tab (see Creating the union view of phone_inc and internet_inc (“Output” tab)):
- Rename the view to incidents.
- Remove the fields pinc_id, iinc_id, specific_field1, specific_field2 and specific_field3. To do this, select these fields and click on Remove.
- Define the primary key of the view with the field taxid. To do this, click on the key icon in the las column of the taxid row.
- After this, click Save to create the view. Then, the Tool will display the schema of the new view.
Standard Union Views
Creating Standard Union Views
To create a standard union from the administration tool, follow these steps:
- Click the menu File > New > Union (standard SQL).
- In the Model tab:
- Drop the views to this tab. The views do not need to have the same number of fields.
- The first view added will be considered the “left view” of the union. This view will be highlighted with a thicker border.
- The schema of this new view will be composed by the fields of the left view that are linked with fields of the other views. The fields that are not linked will not participate in the UNION.
- An association is completed when it links a field of all views. In that case, the linked fields will be highlighted in bold and their “representative field”, the one in the left view, can be used in a condition in the Where Condition tab, as group by field or in the having condition in the Group by tab, or projected in the Output tab.
- When an association does not link a field of all views, fields will be highlighted in red and the tool will ask for completing the association or removing it.
- Fields that are not associated with other fields, do not participate in the UNION, so they will be ignored and cannot be used in the Where Condition, Group By or Output tabs.
- The fields of the left view will be projected in its original order, and the fields of the other views will be reordered to match the specified associations.
- The tabs Associations, Where Conditions, Group By and Metadata work in the same way as in the other derived views.
- Output tab: by default, this wizard creates a UNION ALL view (i.e. the duplicate rows are not removed). To create a UNION, which removes duplicated rows, clear the check box Union ALL.
NOTE: You should choose UNION ALL whenever is possible because it is much more efficient than UNION and UNION DISTINCT. With UNION and UNION DISTINCT, the optimizer cannot apply two of the most important optimizations: Aggregation Push-down and Join-Union Push-Down. This happens because this UNION performs an implicit DISTINCT operation, and therefore calculating a join or group by operation before or after the distinct is not equivalent in general. - For example:
- In the screenshot above, two views were added to the wizard. This will generate the following UNION ALL:
- SELECT customerid, customername, contactname FROM customer
- SQL UNION ALL
- SELECT supplierid, suppliername, contactname FROM supplier
- Note that address is not included in the UNION.
- Then if, for example, you add some where condition, enable group by and add a new aggregation field you can obtain something like this:
- SELECT count(*) FROM (
- SELECT customerid, customername, contactname FROM customer
- SQL UNION ALL
- SELECT supplierid, suppliername, contactname FROM supplier
- WHERE customername like 'someregex' GROUP BY customername
Partitioned Union Views
Creating Partitioned Union Views
- In a logical data warehouse scenario, it is common that the data from the facts table (and sometimes the dimensions) is located in two different systems. For example, let us say we are a retailer company and the information about sales is stored in two systems: an enterprise data warehouse and a data lake based on Hadoop. In this case, in order to create the canonical view containing all the sales information, you can create a view in Denodo called sales defined as a UNION of both tables.
- However, we know that each system has a specific part of the data, defined by a certain criterion:
- The enterprise data warehouse contains the sales data from the current year.
- The data lake based on Hadoop contains the sales data from previous years.
- Therefore, if a query asks for the sales from 2016, for instance, it is not necessary (and it would not be efficient) to access the data warehouse system as we know that the information from past years is in the data lake only. In cases like this, you need an extra step to provide Denodo the information about the partition criterion. What you need to do to build this partitioned union is:
- create an intermediate selection view over each partition containing the condition that partition fulfills, and
- create the union as the union of these intermediate selections.
- The conditions in our example would be:
- year = getyear(CURRENT_DATE) for the data warehouse, and
- year < getyear(CURRENT_DATE) for the data lake
- This way, if a query contains a filter condition that is not compatible with some of the partitions, the query optimizer will detect that situation and it will remove all the union branches that are not necessary for that particular query. Using the previous example, if the query contains the condition year = 2016 it will detect the condition year = getyear(CURRENT_DATE) is not compatible with the current selection and the only partition that is compatible is the one accessing the Hadoop system.
- Finally, there are cases where the partitioning criterion is not defined by a pre-existing field. For example, let us imagine our company sells products worldwide and the sales information is partitioned in three systems:
- One containing the information for EMEA (Europe, the Middle East and Africa)
- One containing the information for America
- One containing the information for APAC (Asia-Pacific)
- The partition in this case is made by region, but we do not have a column in sales specifying a region. In order to build the partitioned UNION, you can create the intermediate selections using view parameters. You can find an example of this scenario in the section Parameters of Derived Views.
Join Views
Creating Join Views
A join view executes the relational algebra operation JOIN on a series of input views. These views can belong to different databases.
To create a join view, click Join on the File > New menu or, right-click the Server Explorer and click Join on the menu New.
The Tool will open the “Join view” dialog and it will add the view that is currently selected in the Server Explorer. To add more views to the join view, click the Model tab and drag the views from the Server Explorer to this tab.
You can join more than two views in the same join view.
In our example, we will join the view incidents created in the previous section and the Web service view average_monthly_sales, created in the section Creating Base Views from a SOAP Web Service.
The “Join view” dialog has six tabs:
- Model: drag the views that will form the join. Consider this:
-
-
When you add two views to this tab, the Administration Tool will automatically create a join condition if there is an association between these views and this association is a referential constraint (i.e. it is a primary key/foreign key relationship). The join condition will be the same as the condition mappings of the association.
-
When adding a view to a join view, the Administration Tool now suggests the views you may want to add to the join view, based on the associations of that view. The suggestions are next to the label Associated views; just click the view to add it automatically.
-
To define a simple join condition do this:
-
-
If you use the Administration Tool, drag the field of one view to the field of the other view. To configure the operator used in this JOIN condition, double-click the arrow that joins both fields and select the operator. To add more complex join conditions, go to the “Join Conditions” tab.
-
If you use the Design Studio, first, click on the link button (on the top-right side of the source view) and drag it to the target view. After connecting the views, click the link and a dialog will show up at the bottom of the Model tab with the fields of both views. In this dialog, link a field of the left view with another of the right view. The source view and the target view are the left view of the join and the right view of the join respectively.
-
Connecting internet_inc with phone_inc using the Design Studio
To define a cross join, just drag the views into the dialog and do not link any field.
In the “Model” tab of the view, you can add “View parameters”. The section Parameters of Derived Views explains what are view parameters and the rules you need to follow when using them.
-
Join Conditions: tab where you can define the join strategy and the join conditions:
-
Configure the join strategy. To do this, click on the triangle icon located between the names involved in the join. The options are:
-
Join type: Inner, Left outer, Right outer and Full.
-
Execution method: Hash, Merge, Nested and Nested parallel. If you select nested parallel, you have to provide the maximum number of parallel subqueries (“Nested parallel number”). If you select the option “Any” method, the Server will try to select the best suitable join method.
-
Order in which the input views are considered. If it is “Ordered”, the view on the left of the join will be considered the first view.
-
If you want to define a join condition between a field of a simple type (int, long, etc.) and a subfield of a compound type, the subfield of the compound type has to be placed in the left side of the join condition.
For example, let us say that you want to create a join view with these two views:
-
view_a, which has a field f1 of type int
-
view_b, which has a field f2 of type array. This array has two int subfields: f2_a and f2_b.
If you want the join condition to be f1 = f2_b, you have to define the join condition as ... JOIN ... ON (f2).f2_b = f1 That is, the subfield of the array must be placed on the left side of the join condition.
-
Note: These preferences may have a significant impact on the performance of queries. Read the section Optimizing Join Operations for more information about this.
-
-
Edit the join conditions added in the “Model” tab, by clicking on them.
Click Add new condition to define a new join condition and Add cross join to define a cross join between two views.
In simple scenarios, you can define join conditions and cross joins graphically from the “Model” tab. However, in complex situations you need to define these from the “Join Conditions tab”:
-
By default, the conditions added with the “New Join Condition” dialog are added to the most internal join that contains all the fields of the condition.
In complex joins, you may need to specify to which join the new condition belongs. To do that, from the “New Join Condition” dialog, select the check box at the bottom and select the views involved in the target join.
-
From the “Model” tab, you cannot specify a join between a view and the result of a cross join of two views. E.g.:
(view_a CROSS JOIN view_b) INNER JOIN view_c ON view_a.field1 = view_c.fieldANote: In the Design Studio, to define a join between a view and the result of a cross join of two views, follow these steps:
-
Connect the views view_a and view_b (do not add join conditions)
-
Connect view_a and view_c.
-
Add the condition by connecting the field view_a.field1 with the field iew_c.fieldA.
The Add cross join button is not available in the Design Studio.
To create this view, follow these steps:
-
Drag the three views into the “Model” tab.
-
In the “Join Conditions” tab, click Add cross join, select view_a and view_b and click Ok.
-
Click Add join condition and enter the condition view_a.field1 = view_c.fieldA. Then, select the check box at the bottom of the dialog. In the Left subview list, select (view_a CROSS JOIN view_b) and in Right subview, view_c.
-
-
The section Join Conditions with Similarity Operators explains the meaning of the operator ~.
Note: To create Cross Join views, you just have to drag the views into the “Model” tab and do not define any join condition in the “Model” tab, the “Join” tab or the “Where condition” tab.
Note: Do not use fields that are “View parameters” in the join conditions.
-
-
Where Conditions: tab that allows you to add WHERE conditions to the definition of the view.
-
Group By: tab that allows you to add GROUP BY attributes to the view.
-
Output: tab that allows you to configure the output of the view. That is, renaming the view and its fields, add derived attributes, define the primary key of the view, etc.
-
Metadata: tab that allows you to define the folder where the new view will be stored and provide a description for the new view.
The tabs Where Conditions, Group By, Output and Metadata work in the same way as in the Union view dialog. The section Creating Union Views explains in more detail how to use them.
In our example:
-
Click on Join on the menu File > New.
-
Drag the views incidents and average_monthly_sales to the “Model” tab.
-
Add the join condition incidents.taxId=average_monthly_sales.taxId by dragging the name of the field taxId of one of the tables, to the field taxId of the other table.
-
In the “Output” tab:
-
Rename the view to incidents_sales.
-
Remove the field incidents.taxId. To do this, select this field and click on Remove selected.
-
After this, click Save to create the view. Then, the Tool will display the schema of the new view.
Join Conditions with Similarity Operators
Virtual DataPort supports join conditions that use the similarity operator. This operator returns a value between 0 and 1 that estimates the similarity between the operands using a certain algorithm. As well as the operands to compare, this operator receives the similarity algorithm to use and a minimum similarity threshold as parameters. Where the similarity between operands reaches the threshold, the condition is true. Otherwise, it is false.
The operator ~ (Alt-126) evaluates the similarity between operands of type text. The algorithms available are based on the edition distance between two texts:
-
ScaledLevenshtein
-
JaroWinkler
-
Jaro
-
Level2Jaro
-
MongeElkan
-
Level2MongeElkan
It also includes algorithms based on the appearance of words in both texts:
-
TFIDF
-
Jaccard
-
UnsmoothedJS
Besides, it incorporates combinations of some of these algorithms. I.e. JaroWinklerTFIDF.
If no algorithm is specified, Virtual DataPort chooses the one to apply. In that case, only the similarity threshold has to be specified.
Example: consider a variation of the example in which customers are identified by their name instead of their taxId. Unfortunately, the name of the customers in the incidents databases does not exactly match their name in the sales Web Service and, therefore, a join operation with the equality operator will not give good results. In this case, a join can be used with a similarity operator to solve this problem.
The incidents example included with the Denodo Platform can be used to reproduce this situation. Two additional relational tables are included known as internet_inc_cname and phone_inc_cname. These tables are similar to the internet_inc and phone_inc tables used in the example, although they include a customer_name attribute to indicate the name of the customer instead of the taxId attribute. The sales Web Service also includes an operation known as GetAverageMonthlyRevenueCName that returns a compound data element containing the name and the monthly revenue of all the customers. To solve the problem posed by this example, follow the steps below:
-
Create the JDBC base views for the tables internet_inc_cname and phone_inc_cname tables (see section Creating Base Views from a JDBC Data Source).
-
Open the Web service “sales”, click on Create base view beside the operation GetAverageMonthlyRevenueCName operation and select Do not stream output. Then, click Ok to see the schema of the new view and click Save to create the view.
-
Join the base views created for internet_inc_cname and phone_inc_cname (see section Creating Join Views).
-
“Flatten” the compound data element returned by the view created from the GetAverageMonthlyRevenueCName operation. See section Creating Flatten Views for a detailed explanation of the “flattening” process.
-
Define a join view using the customer_name field between the view obtained as a result of step 3 and the view obtained as a result of step 4. Specify operator ~ as the join operator and set the similarity threshold to 0.7.
Note: The simple editor is not currently unavailable in the Design Studio. You need to manually set the similarity condition of the join with the following syntax: join_incidents_cname.customer_name ~ (flatten_revenue_cname.cname,0.7). To specify the similarity algorithm for this operator, add the name of the algorithm as the third parameter surrounded with single quotes (e.g. join_incidents_cname.customer_name ~ (flatten_revenue_cname.cname,0.7,'Jaro'))
-
Run the view obtained in step 5. The join is made correctly, despite the fact that the customer names in the input views are not exactly the same.
Selection Views
A Selection view executes the relational algebra operations of selection (filtering) and projection on an input view. This view can belong to a different database.
To create a selection view, click Selection on the File > New menu or right-click on the Server Explorer and click Selection on the New menu.
The Tool will open the “Selection view” dialog and it will add the view that is currently selected in the Server Explorer. To change the input view, click the button X to remove the current view from the “Model” tab and drag another view from the Server Explorer.
In our example, we will drag the view incidents_sales created in the section Creating Join Views.
The “Selection view” dialog has five tabs:
-
Model: tab where you have to drag the view that will be the source of the new view. You can drag a view from another database.
In this tab, you can add view parameters.
-
Where Conditions: tab that allows you to add WHERE conditions to the definition of the view.
If you add WHERE conditions, you have to select one of the WITH CHECK OPTION clause options to decide if you want to force every row that is inserted or updated, to conform to the definition of this view (see section Use of WITH CHECK OPTION of the VQL Guide). The available options are:
-
None: nothing is checked when a client executes INSERT operations on this view.
-
Local: the selection condition of this view is checked when a row is inserted or updated. It does not check the conditions of the lower-level views (the views that participate in the definition of this view).
-
Cascade: the projection conditions of this view and the lower-level views are checked when a row is inserted or updated.
If the “Automatic simplification of queries” is enabled, when executing an INSERT/UPDATE/DELETE query over a derived view, the Server assumes that this view was created with the option “WITH CHECK OPTION”. As a result, the Server checks that the data inserted/updated/deleted meets the WHERE condition of the definition of the view.
To check if this option is enabled, click “Queries optimization” on the menu Administration > Server configuration.
-
-
Group By: tab that allows you to add GROUP BY fields to the view.
-
Output: tab that allows you to configure the output of the view. That is, renaming the view and its fields, add derived attributes, define the primary key of the view, etc.
-
Metadata: tab that allows you to define the folder where the new view will be stored and provide a description for the new view.
The tabs Where Conditions, Group By, Output and Metadata work in the same way as in the Union view dialog. The section Creating Union Views explains in more detail how to use them.
You will now create two views for our example:
-
Right-click on the view incidents_sales in the Server Explorer and click Selection, on the menu File > New. You can see that the Tool has added this view to the Model tab.
-
In the Where Conditions tab, add the condition revenue>600 to obtain data only of those clients with an average monthly volume of sales that exceeds 600 euros.
-
In the Output tab, rename the view to pref_clients_inc_sales.
After this, click Save to create the view. Then, the Tool will display the schema of the new view.
Now, you have to create the second view, which has a GROUP BY field:
-
Right-click on the view pref_clients_inc_sales in the Server Explorer and click Select, on the File > New menu.
-
In the Group By tab, do the following:
-
Select the Use group by check box.
-
Select the taxTd and revenue fields and click Add >>.
-
-
In the Output tab, do the following:
-
Rename the view to inc_grouped_by_pref_clients.
-
Click New Aggr. expression. The name of the new field isnum_incidents and the expression is COUNT (*).
-
-
Click Save.
Creating Conditions with the Compound Values Editor
NOTE: The Compound values editor is not currently available on the Design Studio and it is necessary to write the value directly using VQL syntax. Notice that future updates will extend the current options to add the Compound values editor.
All the dialogs to create and edit derived views (join, union, minus, intersect, flatten and selection) have a Where Conditions tab to add WHERE conditions to the definition of the view. When using the Simple condition mode of this tab, you have two options to create constants of compound types (register or array):
-
Write the value directly in VQL syntax. E.g. { ROW( 'B78596011' ), ROW( 'B78596012' ) }
See section Conditions with Compound Values of the VQL Guide for more details about this syntax.
-
Or, use the Compound values editor. To open it, click beside the right operand (only available for register or array fields).
As an example, we explain how to create a new Selection view that has a WHERE condition with a compound value. But first, we need to create a new base view:
-
Open the Web Service data source sales (created in the section Importing SOAP Web Service Sources) and click Create base view.
-
Click Create base view beside the operation getSumRevenuebyTaxTds.
-
Rename the new base view to RevenueSum .
-
Rename its input attribute to clients and the return one, to totalrevenue.
-
The clients attribute is an array of registers of the type getsumrevenuebyids_in0. Virtual DataPort generates this type automatically when it creates the view.
-
Each element in this array will be a register of the type getsumrevenuebyids_in0_string, which is also automatically generated by Virtual DataPort.
-
Rename the element of the clients array to taxTd.
-
Click Save to create the base view.
Parameters of Derived Views
A view can have View Parameters in addition to regular columns. View parameters are useful when you want to build a view containing a where condition but you do not want to specify a fixed filter value on the view definition, as you want it to be dependent on the query instead.
For example, let us say that we have two views:
-
A view client with these fields: name, income, and state.
-
And a view wealthy_client_by_state defined as:
CREATE VIEW wealthy_client_by_state AS
SELECT state, COUNT(*)
FROM client
WHERE income > 1000000
GROUP BY state
There is a limitation in the second view: the limit of income to consider a client wealthy is static. Therefore, we have to know this limit before creating the view. If we wanted to change this limit at runtime, we could remove the WHERE condition and add the field INCOME to the GROUP BY fields. But then, we would have to add this field to the GROUP BY
clause and we might not want to do that.
To avoid this problem, add a “View parameter” to the view. When you add a view parameter to a view, that parameter will be part of the output schema and you can use it in the same places you use a regular field.
Using view parameters, you can modify our previous example so you can specify the limit of income at runtime. To do this, edit the view and follow these steps:
-
Click the Model tab and then, click Edit beside “View parameters”. In this dialog, click Add new parameter to add the new view parameter.
Set the name of the parameter to wealthy_client_income_limit, select the type long and set the default value of the parameter to 1000000.
-
Click Where Conditions and set the condition income > wealthy_client_income_limit.
The view will have a new output field wealthy_client_income_limit, which you can use to change the condition at runtime. For example,
SELECT *
FROM wealthy_client_by_state
WHERE wealthy_client_income_limit = 250000
Will take into account clients with an income greater than 250,000.
As the parameter has a default value, if you execute
SELECT *
FROM wealthy_client_by_state
The query will take into account clients with an income greater than 1,000,000.
Considerations about View Parameters
View parameters are a special type of fields and they behave differently so take the following into account.
-
There are two ways of assigning a value to a view parameter:
-
In the WHERE condition of the query, with the operator = : WHERE <view parameter> = <value>
Where <value> is one of these:
-
A literal
-
An expression that only uses literals, not other columns. For example: WHERE <view parameter> = ADDYEAR( LOCALTIMESTAMP, -10)
-
Another view parameter.
-
-
In the JOIN condition of the query, also with the operator
=
, not with other operators. In this case, you can use the name of the column of the other view. For example: <view A> INNER JOIN <view B> ON <view A>.<column 1> = <view B>.<view parameter>
-
-
If the view parameter does not have a default value, it is mandatory for the query to assign a value to the parameter. Otherwise, the execution engine will reject the query.
-
If a query has a condition that compares a view parameter with another value, using an operator other than
=
, this query will use this condition to filter the results, not to assign a value to the parameter. For example, if the query has this condition: WHERE wealthy_client_income_limit > 100000This condition uses the operator > so it is not assigning a value to the view parameter “wealthy_client_income_limit”. In this case, if this view parameter does not have a default value, the query will fail. If this parameter has a default value, the query will use the default value of this parameter and it will return the rows that meet the condition “wealthy_client_income_limit > 100000”.
-
If a query uses the operator NOT, the query is not assigning a value to the parameter even if it uses the operator =. For example: WHERE NOT (wealthy_client_income_limit = 100000)
This condition is NOT assigning a value to the parameter.
-
If a query uses the operator
IN
to assign a value to a parameter, the result is a UNION. For example:FROM wealthy_client_by_state
WHERE wealthy_client_income_limit IN (100000, 2000000)
The result of this query is equivalent to this:
SELECT ...
FROM wealthy_client_by_state
WHERE wealthy_client_income_limit = 100000
UNION
SELECT ...
FROM wealthy_client_by_state
WHERE wealthy_client_income_limit = 2000000
-
Do not assign the result of a subquery to a view parameter because it will not be considered an assignment of value. For example, let us say that you execute this query:
SELECT *
FROM wealthy_client_by_state
WHERE wealthy_client_income_limit = ( SELECT MAX(income_limit) FROM state_limit WHERE state = 'CA' )The result of the subquery will not be assigned to the parameter wealthy_client_income_limit, even if you use the operator “=”. The condition will be used to filter the results and if the parameter does not have a default value, the execution engine will reject the query.
When Should You Use View Parameters
There are four main situations where view parameters are useful:
-
Enforce a filter condition
-
Group By Bypass
-
Outer Join Bypass
-
Partitioned UNIONs
Enforce a Filter Condition
You can add a parameter to force the user to always query a view specifying some filter. For instance, in cases where a facts table contains a high volume of data, you could use view parameters to force users to always query within a range of dates (start_date, end_date).
Group By Bypass
When you define an aggregation view, there are situations where you may want to query the view specifying a filter condition over a field that is not part of the group by. The previous example is a group by bypass scenario.
Outer Join Bypass
In a similar way as the group by bypass, there are situations where you may want to define an OUTER JOIN applying a filter on the inner side without filtering any of the rows coming from the outer side.
For example, let us say we are a retailer company and we want to obtain the list of all our products, and for the ones that have been returned by customers, we want the reason why the product was returned. We could create a view items_return_details as:
CREATE VIEW items_return_details AS
SELECT p.prod_name, p.prod_category, r.reason_cat, r.comments
FROM item p LEFT JOIN returns r ON(p.product_id = r.product_id)
Now imagine that we want the same information from all products but we only want the return information if the product was returned within the last 12 months. If we use the following query:
SELECT prod_name, prod_category, reason_cat, comments
FROM items_return_details
WHERE return_date >= addmonth(CURRENT_TIMESTAMP, -12)
We would not get the complete list of products, but only the ones that were returned in the last 12 months. The reason for this is that in SQL the WHERE clause is applied after the JOIN clause. Therefore, after obtaining all the products with the return information it will filter all the ones that were not returned in the last 12 months, and that includes all the products that were not returned at all.
Editing the view items_return_details and adding the condition to the ‘WHERE conditions’ tab would have the same problem as the conditions specified there are applied after the JOIN, not before.
In order to get the desired query, the SQL should look like the following:
SELECT p.prod_name, p.prod_category, r.reason_cat, r.comments
FROM PRODUCT p LEFT JOIN
(SELECT * from returns WHERE return_date >= addmonth(CURRENT_TIMESTAMP, -12)) r
ON(p.product_id = r.product_id)
But if you want to use a predefined view like items_return_details and still allow certain filters over the INNER side of an OUTER, you can follow these steps:
-
Create an auxiliary view over the view that you want to filter. In this case, we can create a view called recent_returns as a select view over returns.
-
Define a new view parameter in this auxiliary view that will pass the value to the filter condition. In our example we can create a view parameter called num_months.
-
Add a where condition using the new view parameter. In this case: return_date >= addmonth(now(), -1 * num_months).
-
Edit the view containing the OUTER JOIN to use the new auxiliary view instead.
As we are not providing a default value for num_months, this parameter becomes mandatory. Therefore, all the queries would have to provide the number of months that you want to check. For example:
SELECT prod_name, prod_category, reason_cat, comments
FROM items_return_details
WHERE num_months = 12
In this case, although the WHERE condition is set after the JOIN, as num_months is not a regular field but a view parameter, num_months = 12 is considered a parameter assignment and it will not filter any of the rows coming from the LEFT JOIN. If we don’t want to force the user to specify a number of months, and we want it to be optional, we could edit our recent_returns view in the following way:
-
Edit the view parameter to add a new default value 0
-
Change the condition to: return_date >= addmonth(CURRENT_TIMESTAMP, -1 * num_months) OR num_months = 0
Partitioned UNIONs
When you are building a partitioned UNION, each union branch needs to have a condition specifying the partitioning criterion. However, sometimes the tables do not contain a column that can be used for defining this criterion and you can use a view parameter instead. For example, let us imagine our company sells products worldwide and the sales information is partitioned in three systems:
-
One containing the information for EMEA (Europe, the Middle East and Africa)
-
One containing the information for America
-
One containing the information for APAC (Asia-Pacific)
The partition in this case is made by region, but we do not have a column in sales specifying a region. In order to do the partitioned UNION you can create the intermediate selections using view parameters. For example, for the APAC region you can:
-
Create a selection over the sales table from the APAC partition
-
Create a new view parameter ‘region‘ on that view with default value ‘APAC’
-
Add a WHERE condition region = ‘APAC’ on that new view
The same process would be needed for the views from the other partitions, replacing the ‘APAC’ value for the region corresponding to each partition.
Flatten Views
Virtual DataPort supports modeling data types with a complex structure using the types “register” and “array”. An element of the type array can be considered a sub-view and an array type always has an associated register type that acts like the schema of the sub-view it is modeling.
Sometimes you may need to “flatten” a compound field that contains an array of registers. This is particularly frequent when processing XML and Web service base views. This section describes how to do it.
NOTE: It is also possible to perform the inverse operation of “flatten” to create array elements from sets of tuples or register elements from several fields. This can be done by defining a GROUPBY operation using the NEST or REGISTER functions. See information about the NEST and REGISTER aggregation functions in the VQL Guide.
NOTE: If you need to “flatten” the fields of a register and not an array, do not create a Flatten view. Instead, create a Selection view and in the “Output” tab of the view, right-click on the register and click Project subfields of…
To follow the examples of this guide, before explaining how to create a Flatten view you have to create a Web service base view over the “sales” Web service data source (this data source was created in the section Importing SOAP Web Service Sources):
-
Double-click on the data source “sales”, in the Server Explorer and then, click Create base view.
-
Click Create base view beside the operation getAverageMonthlyRevenue and select Do not stream output.
-
Rename the base view to average_revenue_array.
-
Click Save ().
This view returns an array of elements. Therefore, it is very difficult to combine its data with data from other views. Thus, you will create a “flatten” view with the two attributes of the array: taxId and revenue.
The Flatten view dialog is very similar to the Union view dialog, except that it does not have the Associations tab. Read the section Creating Union Views to learn how the tabs of the Flatten view work.
In the “Model” tab of the view, you can add “View parameters”. The section Parameters of Derived Views explains what are view parameters and the rules you need to follow when using them.
Follow these steps to create the Flatten view of the example:
-
Right-click on the view average_revenue_array and click Flatten on the menu New.
You can see that the Tool has added this view to the Model tab. You can add a view from another database.
-
In the “Model” tab, you can see a selector next to each array field. You can select as many arrays as you want to flatten. Each time you change the selected array fields, the preview is updated showing the output schema. Following with this example, select the array return. After this, the flatten preview has two attributes: taxId and revenue.
-
In the Output tab, rename the new view to flat_revenue.
-
Click Save to create the view.
Follow these steps to create the Flatten view of the example using the Design Studio:
-
click on the three-dots icon next to average_revenue_array and click Flatten on the menu New.
You can see that the Tool has added this view to the Model tab. You can add a view from another database.
-
In the “Model” tab, click the flatten icon highlighted in this figure next to the array return. After this, the target view has two attributes: taxId and revenue.
-
In the Output tab, rename the new view to flat_revenue.
-
Click Save to create the view.
Intersect Views
An Intersect view executes the intersection operation, which selects the common rows of two or more input views. These views can belong to other databases.
For example, if we have three views with the following contents:
View_A
A | B |
---|---|
1 | a |
2 | b |
4 | d |
View_B
A | B |
---|---|
1 | a |
2 | b |
6 | f |
View_C
A | B |
---|---|
2 | b |
5 | e |
6 | f |
If we create an intersection view over the views “View_A”, “View_B” and “View_C” and query this new view, the result will be the following:
View_A Intersect View_B Intersect View_C
A | B |
---|---|
2 | b |
The result of querying this view is the common rows of the three input views.
The wizard to create new intersect views is very similar to the wizard of union views (see section Creating Union Views)
To create an intersect view, click Intersect on the File > New menu or right-click the Server Explorer and click Intersect on the New menu.
The Tool will open the “Intersect view” dialog and it will add the view(s) that are currently selected in the Server Explorer. To add more views, click the Model tab and drag the views from the Server Explorer to this tab.
IMPORTANT: All the input views of an intersect view must have the same number of fields.
The “Intersect view” dialog has six tabs:
-
Model: tab where you drag the input views and associate their fields. The wizard automatically associates the fields with the same name, but you have to associate the rest.
It is mandatory to add associations between all the fields of the input views.
In the View order box, you have to indicate the order in which the intersection(s) are performed. The view order does not affect the results of the query, but can have a great impact over its performance.
E.g. if the view order is View_a ∩ View_b ∩ View_c, the Server performs the intersection between View_a and View_b and then, between this intermediate result and View_c.If the view order is View_a ∩ (View_b ∩ View_c), the Server performs the intersection between View_b and View_c and then, between this intermediate result and View_a.
In the “Model” tab of the view, you can add “View parameters”. The section Parameters of Derived Views explains what are view parameters and the rules you need to follow when using them.
-
Associations: tab that lists the associations between fields. To delete one, click on .
-
Where Conditions: tab that allows you to add WHERE conditions to the definition of the view.
-
Group By: tab that allows you to add GROUP BY fields to the view.
-
Output: tab that allows you to configure the output of the view. That is, renaming the view and its fields, add derived attributes, etc.
-
Metadata: tab that allows you to define the folder where the new view will be stored and provide a description for the new view.
The tabs Where Conditions, Group By, Output and Metadata work in the same way as in the Union view dialog. The section Creating Union Views explains in more detail how to use them.
Minus Views
A Minus view selects all the rows from the first input view and then removes from the result the rows of the second view. If the minus view has a third input view, its rows are also removed from the result. Minus views can have two or more input views. These views can belong to different databases.
For example, if we have three views with the following contents:
View_A
A | B |
---|---|
1 | a |
2 | b |
4 | d |
View_B
A | B |
---|---|
1 | a |
2 | b |
6 | f |
View_C
A | B |
---|---|
2 | b |
5 | e |
6 | f |
If we create a minus view over the views View_A, View_B and View_C and query this new view, the Server executes the query View_A minus View_B first. The result is the following:
View_A Minus View_B
A | B |
---|---|
2 | b |
4 | d |
View_B contains the first row of View_A so it is removed from the result.
Then, the Server executes <first intermediate result> minus View_C.
The result of querying the view View A Minus View B Minus View C is the following:
View_A Minus View_B Minus View_C
A | B |
---|---|
4 | d |
As you can see, the second row of View_A is removed from the result because View_C also contains this row.
The wizard to create new minus views is very similar to the wizard of union views (see section Creating Union Views)
To create a minus view, click Minus on the File > New menu or right-click the Server Explorer and click Minus on the menu New.
The Tool will open the “Minus view” dialog and it will add the view(s) that are currently selected in the Server Explorer. To add more views, click the Model tab and drag the views from the Server Explorer to this tab.
Important
All the input views of a minus view must have the same number of fields.
The “Minus view” dialog has six tabs:
-
Model: tab where you drag the input views of the new minus view and associate the fields of the views. The wizard automatically associates the fields with the same name, but you have to associate the rest.
It is mandatory to add associations between all the fields of the input views.
In the View order box, you have to indicate the order in which the minus operation is performed. Note that the view order may affect the results of the view.
E.g. if the view order is View_a \ View_b \ View_c, the Server performs the minus operation between View_a and View_b and then, between this intermediate result and View_c.
If the view order is View_c \ View_b \ View_a, the Server performs the minus operation between View_c and View_c and then, between this intermediate result and View_a. Although in both “View orders” the views involved are the same, the result may be different.
In this tab, you can add “View parameters”. The section Parameters of Derived Views explains what are view parameters and the rules you need to follow when using them.
-
Associations: tab that lists the associations between fields. You can delete them by clicking on .
-
Where Conditions: tab that allows you to add WHERE conditions to the definition of the view.
-
Group By: tab that allows you to add GROUP BY attributes to the view.
-
Output: tab that allows you to configure the output of the view. That is, renaming the view and its fields, add derived attributes, etc.
-
Metadata: tab that allows you to define the folder where the new view will be stored and provide a description for the new view.
The tabs Where Conditions, Group By, Output and Metadata work in the same way as in the Union view dialog (see section Creating Union Views).
Interface Views
Interface views are a special type of views that consist only of a definition of fields and a reference to another view. Interface views decouple the consumer of the view from the implementation of the view. The two main use cases are:
-
Contract-based development:
-
The client application and the manager of the project in Denodo agree on a view schema.
-
Afterwards, the team that develops the views in Denodo builds the implementation view for this interface view.
-
-
Data model abstraction:
-
Useful in projects that migrate the underlying data stores to the cloud but you want to make sure the view schema does not change.
-
When the Server executes a query that involves the interface, the query is delegated to the implementation view. Therefore, the operations that can be executed on an interface view (SELECT, INSERT, UPDATE or DELETE) are the same that can be performed on its implementation view.
The status of an interface view can be:
-
Ok: all the fields of the interface view have a valid “implementation expression”. The queries involving this view will work.
-
Without implementation: the interface was created without an implementation view or the implementation view has been deleted. The queries involving this view will fail.
-
Invalid: the interface has an implementation view, but some of the fields of the interface do not have an implementation expression or it is invalid. This may happen if you create an interface with an implementation view and later, the implementation is modified so the expressions are no longer valid. The queries involving this view will fail.
Consider this regarding the “source type properties” of the fields of an interface view (e.g. Type, Type size…):
-
The fields of the interface views may or may not define their own “source type properties”:
-
If in an interface view, a field does not have its own properties, Virtual DataPort propagates these properties from the implementation view to the interface view. Thus, if the implementation view changes, the properties of the fields of the interface view will change as well.
-
If in an interface view, a field defines its own property values, these will not be updated when the implementation view changes.
For example, let us say we have an interface view “I_1” with a field called “f_1” of type “text”, without an explicit source type, and that in its implementation view, the source type of “f_1” is “VARCHAR” with length 15. If you query “I_1”, the type of “f_1” reported by the Denodo JDBC driver will be “VARCHAR(15)”.
If you edit the implementation view and change the source type to “NVARCHAR”, the driver will report the type “NVARCHAR”. But if you leave the implementation view with “VARCHAR” and you edit the interface column to “NVARCHAR”, the driver will report “NVARCHAR” and future changes of the property in the implementation view will not be reflected on the interface.
-
-
Because of the above, to change the “source type properties” of a field, do it in the “Edit view” dialog of the implementation view and not in the interface view if you want to always propagate implementation values.
See more about these properties and why are they useful in the section Viewing the Schema of a Base View.
-
If you use a view as a template to build an interface, by dropping the view on the “Definition” tab, all the field properties of the dropped view will be copied to the interface, that is, properties will be set explicitly at interface level.
When dropping a view used by the implementation of an interface or used directly as the implementation of an interface, the Administration Tool will warn that such interface will change to Without implementation status.
To create an interface view, click Interface on the File > New menu, or right-click on the Server Explorer and click Interface on the New menu. The definition of the interface will be populated with the fields of the selected view.
The Interface View wizard has three tabs:
-
Definition: in this tab, you can configure the definition of the view:
-
Change the name of the interface view.
-
Manage the name and type of the fields of the interface:
-
If you want to base the definition of the interface on the schema of an existing view, drag a view to this tab; the existing schema will be replaced with the schema of the dragged view.
-
To add a new field to the interface, click New field.
In the box beside the name of the field, you can select the type of the field. By default, this box only contains the basic data types (int, float, text, etc.). If you want to select a compound field, select the Enable compound types check box.
-
To delete a field from the interface, select its check box and click Remove selected.
-
If you have deleted a field accidentally and you want to restore it, click Restore fields. The Tool will display the “Restore Fields” dialog where you have to select the fields that you want to add again to the interface view.
-
-
-
Implementation: in this tab, you define how the fields of the interface view are mapped to the fields of the implementation view.
First, you have to add an implementation view by dragging it from the Server Explorer. The implementation view can belong to a different database than the interface view.
When adding the view, the fields with the same name are automatically linked. You can link the others by dragging a field of the definition to the field you want to link in the implementation view. Alternatively, enter a field name or an expression below, in the “Implementation Expression” column.
In the figure Creating a new interface view: “Implementation” tab you can see the following:
-
A field can be mapped to a field of the implementation (ttime and taxid) or to a complex expression (summary).
The implementation expressions must have the same type or compatible with the types of the definition. For example, if the interface has a field F_1 of type double, the implementation expression must have type double or a compatible type: int, long or float. If, for example, the field is type int, at runtime the values of this field are converted to the type of the interface, double.
-
The fields of an interface view that are mapped to a complex expression are not updateable. That is, INSERT, UPDATE or DELETE operations that involve these fields will fail.
-
The implementation view can have more fields than the definition. The extra fields are just ignored. In the figure Creating a new interface view: “Implementation” tab, the field “inc_type” of the implementation view is not mapped to any field of the definition.
When there is an error in the interface view, this tab is marked with the icon and a tooltip explaining the error. The possible reasons are:
-
The interface view does not have an implementation view.
-
Some fields of the definition do not have an “Implementation expression”.
-
-
Metadata: tab that allows you to define the folder where the interface will be stored and provide a description.