|By Jessica Qiu||
|May 26, 2014 11:55 AM EDT||
Recently, a performance issue occurs in a big detail report, where the original data volume in the two data tables as it involved has exceeded 3 million rows; the data volume that the report can retrieve after filtering remains at round 2 million rows. As we expected before, by using paged retrieve method, this report would present a much higher performance. However, for the fact that the user also needs to export the data from the report, the only option is that the data retrieve can be achieved from Oracle at a time. As the JDBC of Oracle runs quite slowly, it takes least 6-7 minutes to retrieve the data of more than 2 million rows from the Oracle. In general, such performance cannot meet the expectation of the customer.
Through great efforts we have made for a long period, this issue was finally solved by a way that the retrieve time can be cut down to dozens of seconds, thus to achieve the performance as the user requires. The solution mainly involves the use of parallel mechanism of esProc, and by multiple JDBCs, to achieve the synchronization of data retrieves.
HP ProLiant DL380 G7, CPU Intel Xeon E5620*2, RAM 20G, HDD Raid5 1T
Oracle11g64-bit, esProc, CentOS 6.0, JDK 1.6u43
Data Volumes of 2 Data Tables:
VM: 100K rows, 168 cols; and 65k rows after filtering
CT: 3 million rows, 16 cols; and 1.95 million rows after filtering
By utilizing the parallel mechanism of esProc, the task of retrieving data from the two Oracle tables can be divided into a number of sub-tasks, which can be executed synchronously. And then the retrieved results are merged into one for each table. Finally, the two tables with merged results are joined to return the set of results for the report.
Main Program Script
First, the database connection is created, in order to fetch the records from the two tables. Second, the data is segmented as per the number of sub-tasks (the screen shot shows the case of 16 sub-tasks), and the query range is assigned for a subroutine. Third, the subroutine is called by using callx to complete the query process. In the end, the main program is available to merge the results and produce a set of the end results which will then be returned to the report.
It is easy for the subroutine to complete the query process within a specified query range.
Pre-parallel (No Parallel)
As a result, here we see a leap from 400s to 40s in the data retrieve process.
1. What is described here covers only the part on how the data retrieve process can be expedited and how the esProc is used to generate the data source for the report in a fast way. However, the description on design and presentation of report is not mentioned here.
2. When the parallel retrieve process is used, it is required to create a connection with the database by each subtask, for example, connect(conn), which can be found in the subroutine script;
3. Parallel retrieve process is suitable for the case where the Oracle resources are spare (e.g., the number of connections is still far below the upper limit). In other words, this feature can enable the report to make full use of the database resources.
Dec. 6, 2016 12:30 PM EST Reads: 2,088
Dec. 6, 2016 12:30 PM EST Reads: 998
Dec. 6, 2016 12:28 PM EST
Dec. 6, 2016 11:54 AM EST
Dec. 6, 2016 11:45 AM EST Reads: 274
Dec. 6, 2016 11:30 AM EST Reads: 1,013
Dec. 6, 2016 11:15 AM EST Reads: 415
Dec. 6, 2016 11:00 AM EST Reads: 561
Dec. 6, 2016 10:30 AM EST Reads: 957
Dec. 6, 2016 10:30 AM EST Reads: 1,172
Dec. 6, 2016 10:15 AM EST Reads: 543
Dec. 6, 2016 09:15 AM EST Reads: 2,146
Dec. 6, 2016 09:00 AM EST Reads: 3,967
Dec. 6, 2016 09:00 AM EST Reads: 5,813
Dec. 6, 2016 09:00 AM EST Reads: 713