ETL Cron Process:
To aid in delivering reports accurately and quickly, ELIS Reports builds a data warehouse
of data from your Moodle/ELIS site. The ETL Cron
process is the tool that performs this function. ETL stands for "Extract, Transform, Load" - this describes what the cron process is doing - we extract data from your Moodle logs, ELIS and Moodle tables, etc. We transform this data into easily retrievable forms - for example for the time in course
data we run an algorithm to estimate time spent, and load the results into the data warehouse. Finally, we load the extracted and transformed results into the data warehouse for retrieval by the ELIS Reports BI
Technically the ELIS data warehouse is optimized for reading data, whereas many of the tables it pulls that data from are optimized for writing data. By extracting data from write optimized tables and loading it into read optimized tables in the data warehouse, we make the retrieval of the data much more efficient and rapid. This, along with the algorithmic processing described above, enables ELIS Reports to deliver very large recordset reports in a reasonable timeframe (see test results below for a ~50,000 user site on a Remote-Learner Level 4 cluster).
To avoid having the ETL process impact site performance, we process this data in small chunks. Every 5 minutes the process is initiated - however if the process detects the server is under heavy load from user activity or other processes, it will cancel itself, and check at the next 5 minute interval. This means that most processing is automatically done at times when your site is not under load from users or other processes. This optimization also means that processing on an active site is never really 'done' as there is always new data being delivered to the log tables by user activity. For example in the site below, out of ~5.7 million log records, there are 40 records remaining to be processed.
This small number of unprocessed records won't greatly impact the accuracy of the reports. For example if a user is showing they have spent 40 hours on the site, and today they spend another hour, their 41st hour might not show up in reports until the ETL process has caught up with today's records. If you consistently see a high number (>200) in the 'remain to be processed', then you may need to have a larger server or more server resources assigned to your RHEV
instance - contact your Remote-Learner sales representative to inquire about adding more resources to your server. A separate reporting server (dedicated only to running reports) may also be an option.
ELIS 1.9.1 has a new message that is displayed when the ETL process is in progress. The message will display the number of records processed and the number of records that remain to be processed. The following image is showing an example.
The following data provides some rough numbers for performance changes in the PHP report engine between the ELIS 1.9.1 and 1.9.2 versions.
Some specific notes:
- these numbers specifically reflect the performance related to running the Individual User and Individual Course Progress reports for a user with a small dataset, with about 50,000 users present on the site
- memory usage numbers represent the "peak" usage within the execution
- run times are measured using firebug, with the exception of the report export cases, which were measured through PHP
|Scenario Description ||Memory Usage (1.9.1 code) ||Run Time (1.9.1 code) ||Memory Usage (1.9.2 code) ||Run Time (1.9.2 code) ||Comments|
|- Individual User Report - Click on report and view the parameters screen for the first time after logging in||183.7 MB ||35.2s (onload: 31.86s) ||88.7 MB||9.14s (onload: 20.38s)|
|- Individual Course Progress Report - Click on report and view the parameters screen for the first time after logging in||183.9 MB ||36.68s (onload: 33.56s)||88.7 MB||19.28s (onload: 21.44s)|
|- Individual User Report - Running the report for the RL Admin user||298.1 MB ||25.66s ||77.7 MB||5.58s|
|- Individual Course Progress Report - Running the report for the RL Admin user||297.2 MB ||24.81s||77.8 MB||5.18s|
|- Individual User Report - Returning to the parameter screen after running the report||125.6 MB ||13.56s||77.7 MB||6.21s|
|- Individual Course Progress Report - Returning to the parameter screen after running the report||125.9 MB ||16.42s ||77.8 MB||17.15s||This is now slower because the report parameters are not cached |
|- Individual User Report - Hitting the "Refresh" button ||164.3 MB ||20.13s ||18.7 MB||0.421s|
|- Individual Course Progress Report - Hitting the "Refresh" button||163.2 MB ||25.7s||19.0 MB||0.414s|
|- Individual User Report - Sorting by first column||163.9 MB ||20.3s||18.7 MB||0.531s |
|- Individual Course Progress Report - Sorting by first column||163,7 MB ||25.48s||19.0 MB||0.607s|
|- Individual User Report - CSV export||162.8 MB ||8.4809s||19.6 MB||0.1761s|
|- Individual Course Progress Report - CSV Export||162.8 MB ||13.5448||19.7 MB||0.1699s|
|-Individual User Report - PDF Export||162.8 MB ||8.5597s||20.2 MB||0.1872s|
|- Individual Course Progress Report - PDF Export||162.9 MB ||13.8428||20.4 MB||0.2033s|