Topics Map > Technology > Denodo
Denodo - Setting up Caching
Setting Up a Cache Database
To configure a cache database, use these steps.
- Provision a JDBC compatible database.
- This happens outside of Denodo. Perhaps this is a request to your DBA team to create a database for this purpose.
- JDBC compatible databases include SQL Server, Oracle, MySQL, Postgres, etc.
NOTE: A full list of supported databases can be found here: https://community.denodo.com/docs/html/browse/8.0/en/vdp/administration/cache_module/cache_module
- From the Denodo Design Studio, create a new JDBC data source called ds_cache.
- From the data sources folder, select the context menu, then select New, then select Data Source, then select JDBC.
- Configure the connection for the provisioned cache database.
- In Denodo, select the context menu, then select Administration, then select Database management.
- Select the context menu for your VDB, then select Cache.
- On the Cache configuration for database X screen,
- Disable Default Configuration.
- Enable Cache status.
- Disable Default data source.
- Enable Specify and existing data source as cache.
- Specify your database name.
- Select ds_cache.
- Optional: Enter a Default Time To Live.
- The time to live is how long data in the cache will be valid before becoming invalid. Data in an invalid state will not be returned when querying the view from the cache.
- Never expire is recommended to avoid confusion with queries in the future.
- Optional: Enable maintenance and enter a Maintenance period.
- It is recommended to Disable maintenance and create a Scheduler job to handle cache maintenance. This allows more control over when cache maintenance takes place. It is recommended that cache maintenance be performed at an off-peak time. Instructions for creating a Scheduler job for cache maintenance is in the next portion of this guide.
- If you cannot create a Scheduler job, then enable maintenance. Running maintenance daily is generally sufficient.
- Select Ok.
Setting Up Cache Maintenance
When data in the Denodo cache expires or is made invalid, it sets a flag on that row in a table within the cache database indicating that it is invalid; it does not physically remove the row. Cache maintenance cleans invalid rows out of the cache tables in the database
Cache maintenance can be performed by executing the built-in stored procedure CLEAN_CACHE_DATABASE. It is recommended to use the Denodo Scheduler for both the maintenance of the cache, and for tasks that keep the cache data up to date. This section focuses on cache maintenance.
- Login to Denodo Scheduler
- If you haven't already, create a project to hold your Scheduler content.
- Select Projects, then Add Project.
- Enter a name and description, then select save.
- If you haven't already, create a connection to your VDB.
- Select Data Sources, then Add data source, then VDP
- Enter the appropriate Project, Name, Connection URI, and user credentials.
- Select Jobs, then Add Job, then VDP.
NOTE: Do not select VDPCache. - Select the appropriate project, and enter a name and description.
- Select the Extraction section tab.
- Provide the appropriate data source and enter "call clean_cache_database('[name of database]');" in Parameterized query.
- Select the Triggers section tab.
- Select Add Trigger
- This is the section that defines your schedule which uses Cron.
- To run the job at 4AM every day simply put 4 for the Hours box.
- Select Save
- Once saved, you will see an overview of your job. Notice that the “Next execution” value is 4 AM the following day.
- To test your job, select the context menu on the far right and select Start.
Cache usage
First let’s consider the types of things you would do with a cache database, as this may influence how we configure the cache.
API-based sources
One of Denodo’s most powerful features is its ability to server up real-time data. However, there are use cases where real time data isn’t needed and use cases where things break down when attempting to do them in real time. The use of API based sources tends to break down, especially if you’re trying to run an analytic workload. So far we have talked about Salesforce, Box, and SharePoint integrations – all of these are API based integrations.
Analytic workloads tend to use a large volume of data. Pulling large volumes of data, or even medium volumes of data, over API interfaces can be very slow. Furthermore, many API based sources will limit the amount of data you can pull in a single call creating a situation that is difficult to manage. Finally, since these sources are not databases, Denodo cannot push the query logic down to those sources, meaning that Denodo will have to do the query processing itself, consuming valuable resources within the Denodo platform.
Enabling the cache for base views that are sourced from API connections leads to many advantages:
- The query doesn’t have to call the API and pull the data through a slow interface. Instead, the data will come from the cache DB giving you a much faster throughput of the data.
- If all the views used in a query are cached, then Denodo can push down the entire query to the cache DB instead of doing a lot of processing in the Denodo platform.
- You can even set up indices on views that are cached to even further boost performance of pushed down queries.
Views with large amounts of processing
Views that involve integrating serval systems, complex transformation logic, or heavy aggregations can consume a lot of system resources to process. The cache can help with this situation. By enabling the cache, the processing of the complex view can happen once at the time of caching. Consumers of the view will then pull from the cache where the heavy processing has already been completed.
Cache a View
To cache a view simply do the following.
- Open the view that you want to cache.
- Select Options, then select a cache option.
- Select Save.
There are several cache modes, each with their own nuances. Below is a brief description of each mode that is meant to be a very quick guide to help decide which cache mode is best for your needs. This is to designed to be brief, not complete. You can find complete and detailed information about the cache including the cache modes here:
https://community.denodo.com/docs/html/browse/8.0/en/vdp/administration/cache_module/cache_module
Cache Mode - Full without Incremental
This is the simplest cache mode and used the most frequently. In this cache mode, all queries against the view will come from the cache. If the cache has not yet been loaded, the query will return no rows.
This is a good mode to use if you have views built on slow sources (like an API based source) that do not change frequently or you do not need real-time data.
To load or refresh the cache:
- Select the Execution Panel of the view.
- Select the box for Store results in cache, Retrieve all rows, and Invalidate existing results.
- Select Execute.
Executing in the above fashion will query the source, and all rows retrieved from the source will be loaded into the cache. Any rows that were in the cache from a previous load will be invalidated, thus only the rows from this execution will be present in the cache.
You must load the cache explicitly either manually (like described above) or using Scheduler.
Cache Mode – Full with Incremental
This mode is designed to produce more up-to-date data. When a view with this cache mode is queried, Denodo will pull from the cache, but will also query the source for any rows updated since the cache was last refreshed. The rows are then merged together before completing the request.
This method requires the following:
- The source must have a reliable timestamp of when the row was last updated and that timestamp must be synchronized with the Denodo server.
- The view must be a base view and have a primary key.
- The source must allow push downs of the query with a where clause containing the last update timestamp.
Loading of the cache still needs to be done explicitly either manually or using Scheduler.
Cache Mode – Partial
Partial cache is tricky, so all of the details will not be explained here. The short of it is that when you query a view in this cache mode Denodo will see if the data is in the cache, if so, it uses the data in the cache, if not, it pulls it from the source. If Explicit loads is disabled, then queries that must pull from the source are then loaded in the cache so that the next query for the same data will use the cache.
Details on this cache mode can be found here: https://community.denodo.com/docs/html/browse/8.0/en/vdp/administration/cache_module/cache_modes/partial_mode