By Jessica Qiu
May 26, 2014 11:55 AM EDT
Recently, we ran into a performance issue with a large detail report: the two data tables involved hold more than 3 million rows of raw data between them, and even after filtering the report still has to retrieve about 2 million rows. As expected, paged retrieval would give this report much better performance; however, because the user also needs to export the report data, the only option is to retrieve all of the data from Oracle in a single pass. Oracle's JDBC driver is quite slow, so retrieving more than 2 million rows takes at least 6-7 minutes, which falls well short of the customer's expectations.
After considerable effort, we finally solved the issue: retrieval time was cut to a few dozen seconds, meeting the user's performance requirement. The solution relies mainly on esProc's parallel mechanism, using multiple JDBC connections to retrieve the data concurrently.
Hardware: HP ProLiant DL380 G7, CPU Intel Xeon E5620*2, RAM 20G, HDD RAID5 1T
Software: Oracle 11g 64-bit, esProc, CentOS 6.0, JDK 1.6u43
Data Volumes of the 2 Data Tables:
VM: 100K rows, 168 columns; 65K rows after filtering
CT: 3 million rows, 16 columns; 1.95 million rows after filtering
With esProc's parallel mechanism, the task of retrieving data from the two Oracle tables is divided into a number of sub-tasks that execute concurrently. The sub-task results are then merged into one result set per table, and finally the two merged tables are joined to produce the result set for the report.
Main Program Script
First, database connections are created to fetch the records from the two tables. Second, the data is segmented according to the number of sub-tasks (the screenshot shows the case of 16 sub-tasks), and each sub-task is assigned its query range. Third, the subroutine is called with callx to execute the queries. Finally, the main program merges the results into a single result set, which is returned to the report.
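The segmentation step amounts to dividing the filtered row range evenly among the sub-tasks. A small Python sketch of that arithmetic (split_ranges is a hypothetical helper, not part of the esProc script):

```python
def split_ranges(total, n):
    """Divide the row range 1..total into n contiguous (start, end) segments."""
    base, extra = divmod(total, n)
    ranges, start = [], 1
    for i in range(n):
        size = base + (1 if i < extra else 0)  # spread any remainder evenly
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# 16 sub-tasks over the 1.95-million-row filtered CT table
segs = split_ranges(1_950_000, 16)
print(segs[0], segs[-1])
```

Each (start, end) pair becomes the query range handed to one sub-task.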
The subroutine itself simply executes the query over its specified range.
Pre-parallel (No Parallel)
As a result, data retrieval time drops from 400 seconds to about 40 seconds.
1. This article covers only how the data retrieval process is expedited and how esProc quickly generates the data source for the report; report design and presentation are not discussed here.
2. With parallel retrieval, each sub-task must create its own connection to the database, e.g., connect(conn), as seen in the subroutine script.
3. Parallel retrieval suits cases where Oracle has resources to spare (e.g., the number of connections is still far below the upper limit). In other words, this feature lets the report make full use of the database's resources.
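Note 3 implies keeping the number of concurrent sub-task connections well below the database's connection limit. A minimal sketch of capping concurrency with a bounded thread pool (the limit values here are hypothetical):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

DB_CONNECTION_LIMIT = 100   # hypothetical Oracle connection ceiling
N_TASKS = 16                # parallel sub-tasks, one connection each

peak, active = 0, 0
lock = threading.Lock()

def simulated_subtask(_):
    # Track how many sub-tasks (hence connections) run at once
    global peak, active
    with lock:
        active += 1
        peak = max(peak, active)
    # ... open a connection, run the range query, close the connection ...
    with lock:
        active -= 1

# The pool size bounds concurrency: never more than N_TASKS connections open
with ThreadPoolExecutor(max_workers=N_TASKS) as pool:
    list(pool.map(simulated_subtask, range(N_TASKS)))

assert peak <= N_TASKS < DB_CONNECTION_LIMIT
```

Choosing N_TASKS is a trade-off: more sub-tasks shorten retrieval but consume more of the database's connection budget.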