Denodo - Creating Data Sources from Box
Box integrations use Box “applications” which are authenticated with the OAuth 2.0 specification. OAuth 2.0 is a token-based authentication system. This document will provide instructions for the creation of the Box application and the configuration of OAuth 2.0.
Files
Best Practice: Use CSV files rather than Excel files. CSV files tend to be more stable in Denodo in general and we have also noticed that Excel files accessed over HTTP paths have limitations. Excel limitations can be avoided by converting them to CSV before integrating them with Denodo.
Best Practice: Use the same basic format for all your CSV files. Not only is consistency a good idea, but it will make the Box integration more stable as well. In this document we will standardize on using the comma ( , ) character for the column delimiter, \n for the end of line delimiter, and the first row will have column headers.
Each file in Box is given a unique ID. When you click on a file in box the number just to the right of “file/” in the URL is the file ID.
Example: https://app.box.com/file/1471507741297. The file ID is 1471507741297. You will need to know the IDs of the files you want to integrate in Denodo.
Creating Box Custom Applications
The first step is to create an "application" in the Box Dev Console. This creates the account that will be used to authenticate into Box.
- Log into the Box Developer Console: https://uofi.app.box.com/developers/console
- Select the Custom App option.
- NOTE: If you have already created an app, click the Create New App button in the top right.
- Enter a name for the app according to your naming standards in the App Name field.
- Enter a concise Description.
- For Purpose, select Integration.
- For Categories, select Collaboration.
- For Which external system are you integrating with?, enter “Denodo” into the field.
- For Who is building this application?, select Customer.
- Click Next.
- Select User Authentication (OAuth 2.0).
- Click Create App.
- Select the Configuration tab.
- Scroll down to the OAuth 2.0 Credentials section.
- Copy and Paste the Client ID and Client Secret to a convenient, easily accessible location, such as Keepass.
- NOTE: The developer token is not necessary for our use. However, you will need the Client ID and Client Secret for the data source configuration in Denodo.
- In the OAuth 2.0 Redirect URIs section, add the OAuth 2.0 Redirect URI of the environment you'll be using with this app and click Add. The URIs for the different environments in Denodo are listed below:
Production: https://datahub.uillinois.edu/oauth/2.0/redirectURL.jsp
QA: https://qa.datahub.uillinois.edu:9443/oauth/2.0/redirectURL.jsp
Dev: https://dev.datahub.uillinois.edu:9443/oauth/2.0/redirectURL.jsp
- In the Application Scopes section, make sure the check boxes are checked for both Read all files and folders stored in Box and Write all files and folders stored in Box.
- Click the Save Changes button in the top right.
- Click the My Apps option on the left will bring you back to your list of application.
- Click the Create New App button in the top right and create 2 more Box applications - one for the QA environment and one for the Production environment,
- After creating these three applications, your list of apps should include three for Denodo.
OAuth 2.0 Tokens
This document will not cover all the details of the OAuth 2.0 specification, but it is helpful to know a little bit about what is happening so the next steps make a little more sense.
In the above steps we created a Box Custom Application which generated a Client ID and Client Secret. You can think of this Client ID and secret like an application service account, with a password. However, the Client ID and Secret are not sufficient to fully authenticate into Box and access data. Instead, the client application, Denodo in this case, will request an access token from the Box OAuth server. When Denodo asks for an access token the OAuth server returns both an access token and a refresh token. This access token is used to make requests for data within Box. However, the access token is only good for about an hour. After that time the client application (Denodo) will need to use the refresh token to request a new access token from the OAuth server. The OAuth server returns a new access token and a new refresh token. This new refresh token must be used to make the next request for a new access token.
Denodo will perform the actions of using refresh tokens to get new access tokens, but there are things that cause this process to become unstable. The suggestions in this document are meant to help avoid those issues.
Create a Date Source in Denodo
For this document we will use CSV files. In this example we have 3 CSV files of course data, instructor salary data, and awarded grant data.
Best Practice: Only use the Box App credentials in a single Denodo connection. Creating more than one Denodo connection that re-uses the same Box App Client ID will cause the access tokens associated with that Client ID to interrupt one another, and your integration will become unstable.
Since we do not want to use the same Box app credentials for more than one connection, it is a good idea to make the data source reusable. The below instructions show how to make a reusable CSV connection to Box.
- In Denodo Design Studio, click the context menu for 01 - Data Sources and select New > Data source > Delimited file.
- Give the data source a name.
- For Data route, select HTTP Client.
- Leave the HTTP method set to GET, and add the following URL to the Base URL field.
https://api.box.com/2.0/files/@FileID/content
The link above uses an interpolation variable for the File ID, which is what will make this connection reusable.
- Expand the Authentication section, and select OAuth 2.0 from the Authorization options.
- Leave the Authorization Grant value set to Authorization code grant, and paste the Client identifier and Client secret from your Box app in their respective fields.
- Click the link titled Launch the OAuth 2.0 credentials wizard to help you obtain these credentials, which should now be enabled. If the link is not enabled, go to the Tools tab at the top right and select OAuth credentials wizards > OAuth 2.0 wizard.
- Populate the Token endpoint URL and the Authorization server URL fields with the following values.
Token endpoint URL: https://account.box.com/api/oauth2/token
Authorization server URL: https://account.box.com/api/oauth2/authorize
- For Redirect URI, select the Other radio button and populate that field with one of the following redirect URIs, based on which environment you're working with.
Production: https://datahub.uillinois.edu/oauth/2.0/redirectURL.jsp
QA: https://qa.datahub.uillinois.edu:9443/oauth/2.0/redirectURL.jsp
Dev: https://dev.datahub.uillinois.edu:9443/oauth/2.0/redirectURL.jsp
- Expand 2. Generate the Authorization URL, click the Generate link to populate the Authorization URL field, and then click the Open URL link.
- If a web page opens asking you to Grant access to Box, click on that blue button to grant access.
- Copy the URL on line 3 of the web page that opens.
- Expand 3. Paste the authorization response URL and paste the link you just copied.
- Expand 4. Obtain the OAuth 2.0 credentials and click the OAuth 2.0 credentials link.
- You should receive the following confirmation that the OAuth 2.0 credentials were obtained.
- Click the Copy credentials to the clipboard link, and then click OK.
- Scroll down to the Column delimiter field and add a comma ( , ), and check the box next to Header.
- Click Save.
- Once the data source is saved, we can click Test Connection, which will bring up an Interpolation variable screen where we'll need to enter a File ID for the first CSV file in Box. See the Files section above to learn how to obtain these File IDs. Click OK.
Best practice: Once your connection is initially tested, avoid using the “Test Connection” feature. We have found that doing so could introduce an issue with the authentication.
Create Base Views
In this document we will create 3 base view to illustrate how to reuse the box data source for multiple CSV files.
- Open the Data Source created above and click the Create Base View link.
- You will be prompted for a FILEID. Enter the File ID of the first CSV file and click Ok.
- In the context menu for the base view we just created, select Rename and give the base view a new name.
- Drag the base view from the 01 - Connectivity / 01 - Data Sources folder to the 01 - Connectivity / 02 - Base Views folder.
- Repeat the steps above to create base views for the other two CSV files. When complete, you should have one data source and three base views.
Create Derived Views that Encode the File IDs in the View
There is one aspect of this solution to make the data source reusable that is a little annoying. If we were to execute these base views, we would be prompted for the File ID each time. If we use the wrong File ID, our query may not return any data. We can take care of that by creating integrated views of these base views.
There are a few ways to accomplish creating these integrated views.
- Using the GUI
- In the context menu for the base view, select New > Selection.
- On the Where Conditions tab, enter fileid = 1652491525507
- Using a VQL script
- Run the following script in the VQL Shell to create the integrated view and have them placed in the 02 - Integration folder. Note: your integrated view name, folder names, base view name and fileid will be unique to your data source and VDB.
CREATE OR REPLACE VIEW iv_box_csvs_course_schedule_data FOLDER = ' / ahall1 / 02 - Integration'
AS
select * from bv_box_csvs_course_schedule_data where fileid = '1652491525507';
Now, we can query these integrated views directly, without having to enter a File ID each time.
Note: You may need to click the File tab at the top and select Refresh to see the new integrated views you just created.