Top Three Salespersons Ranking by Monthly Sales Amount

R language and esProc on complex logic computation

Both R and esProc handle stepwise computation well, but they differ in the details. The following example compares them:

A company’s sales department wants to identify its outstanding salespersons, that is, those whose sales amounts rank among the top 3 in every month from January of this year through the previous month. The data comes from the order table salesOrder in an MSSQL database. Its main fields are the order ID (ordered), the salesperson’s name (name), the sales amount (sales), and the order date (salesDate).

The solution is broadly as follows:

1. Compute the beginning dates of this year and this month, and filter the data by date.

2. Group by month and salesperson, and compute the sales amount of each salesperson in each month.

3. Group by month, and compute the rankings of sales amount in each group.

4. Keep only the top 3 salespersons in each group.

5. Compute the intersection of the groups, that is, the salespersons who are among the top 3 in every month.

The R solution is shown below:

01   library(RODBC)

02   odbcDataSources()

03   conn<-odbcConnect("sqlsvr")

04   originalData<-sqlQuery(conn,'select * from salesOrder')

05   odbcClose(conn)

06   starTime<-as.POSIXlt(paste(format(Sys.Date(),'%Y'),'-01-01',sep=''))

07   endTime<-as.POSIXlt(paste(format(Sys.Date(),'%Y'),format(Sys.Date(),'%m'),'01',sep='-'))

08   fTimeData<-subset(originalData,salesDate>=starTime&salesDate<endTime)

09   gNameMonth<-aggregate(fTimeData$sales,list(fTimeData$name,format(fTimeData$salesDate,'%m')),sum)

10   names(gNameMonth)<-c('name','month','monthSales')

11   gNameMonth$rank<- do.call(c, tapply(gNameMonth$monthSales, gNameMonth$month,function(x) rank(-x)) )

12   rData<-subset(gNameMonth,rank<=3)

13   nameList<- split(rData$name, rData$month)

14   Reduce(intersect, nameList)

The esProc solution is shown below:

[Image: esProc script, cells A1 to A8]

Now let’s compare the two solutions, starting with database access:

The R solution handles data access in lines 01 to 05. This takes a few more steps, but they are routine operations and perfectly acceptable.

The esProc solution lets you enter the SQL statement directly in cell A1, which is quite convenient.

In terms of database access, R and esProc differ only slightly, and both are convenient.

Second, compare the date functions:

The R solution computes the beginning dates of this year and this month in lines 06 and 07. This shows that R is rich in basic functions.

The esProc solution does the same computation in A2 and B2. Its pdate function computes the beginning date of this month directly, which is very convenient.

For date functions, esProc seems slightly better, although R has a huge number of third-party libraries, and one of them may well offer a date function that is easier to use.
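Even without third-party libraries, the two dates can be computed somewhat more compactly in base R. The sketch below is only an illustration, not part of the original solution: the variable names today, yearStart and monthStart are assumptions, and today is fixed to a hypothetical date so that the result is reproducible.

```r
# Base R only. `today` is a hypothetical fixed date used for illustration;
# in practice it would be Sys.Date().
today <- as.Date("2014-03-18")

# format() fills in the year (and month) from `today`;
# the remaining parts of the pattern are literal text.
yearStart  <- as.Date(format(today, "%Y-01-01"))
monthStart <- as.Date(format(today, "%Y-%m-01"))

print(yearStart)
print(monthStart)
```

Working with Date values rather than POSIXlt avoids time-zone details, although whether that suits the salesDate column depends on how the database driver returns it.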

The focal point is stepwise computation:

First, filter by date, group by month and salesperson, and sum the sales amounts. R does this in lines 08 and 09, and esProc in cells A3 and A4. The difference is not great.
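To make the filtering and grouping step concrete, here is a minimal sketch that runs without a database. The orders data frame and its values are made up for illustration, and it uses the formula interface of aggregate rather than the list form used in the R solution; startTime and endTime are hard-coded stand-ins for the computed dates.

```r
# Hypothetical sample rows standing in for the salesOrder table.
orders <- data.frame(
  name      = c("Ann", "Bob", "Ann", "Bob", "Ann"),
  sales     = c(100, 200, 50, 300, 70),
  salesDate = as.Date(c("2014-01-05", "2014-01-20", "2014-01-25",
                        "2014-02-03", "2014-03-09")),
  stringsAsFactors = FALSE
)

startTime <- as.Date("2014-01-01")  # beginning of the year
endTime   <- as.Date("2014-03-01")  # beginning of the current month (excluded)

# Keep only orders dated in [startTime, endTime).
inRange <- subset(orders, salesDate >= startTime & salesDate < endTime)

# Total sales per salesperson per month.
inRange$month <- format(inRange$salesDate, "%m")
monthly <- aggregate(sales ~ name + month, data = inRange, FUN = sum)
names(monthly)[names(monthly) == "sales"] <- "monthSales"

print(monthly)
```

The March order is dropped by the filter, and Ann's two January orders are summed into a single row.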

Proceed with the computation. Following the most straightforward line of analysis, the next steps should be: 1. Group by month; 2. Add a ranking field to each group and compute the rankings; 3. Filter by ranking, keeping only the salespersons whose sales amounts rank in the top 3 of each group; 4. Finally, compute the intersection of the data in each group.

The corresponding R code is in lines 10 to 14, in the order 2->3->1->4. In other words, it ranks the data within each group across the whole table, and only then groups. Have you noticed anything awkward? Although the ranking is within the group, R users have to rank first and then group! This is because R is weak at grouping first and then processing. To write the statements in the natural order 1->2->3->4, R users need a strong technical background to handle complex iterative expressions. Under these conditions, thinking in reverse greatly simplifies the code.
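For readers curious what the 1->2->3->4 order might look like in R, the following is one possible sketch using split and lapply with an anonymous function. It is not the article's solution: the data values are made up, the names byMonth, top3 and alwaysTop3 are assumptions, and ties.method = "min" is a choice made here (the R solution above uses the default tie handling of rank).

```r
# Hypothetical monthly totals, in the shape produced by the aggregation step.
gNameMonth <- data.frame(
  name       = c("Ann", "Bob", "Cid", "Dee", "Ann", "Bob", "Cid", "Dee"),
  month      = c("01", "01", "01", "01", "02", "02", "02", "02"),
  monthSales = c(500, 400, 300, 200, 100, 450, 350, 250),
  stringsAsFactors = FALSE
)

# 1. Group by month: one data frame per month.
byMonth <- split(gNameMonth, gNameMonth$month)

# 2 and 3. Inside each group, rank by sales (largest first)
# and keep the names ranked in the top 3.
top3 <- lapply(byMonth, function(g) {
  g$rank <- rank(-g$monthSales, ties.method = "min")
  g$name[g$rank <= 3]
})

# 4. Intersect the per-month name sets.
alwaysTop3 <- Reduce(intersect, top3)
print(alwaysTop3)
```

Here the argument g of the anonymous function plays the role of the current group, so the function body has to be written out explicitly for each per-group operation.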

The esProc solution completes the same computation in cells A5 to A8 and requires no reverse thinking. esProc users can simply follow the intuitive order 1->2->3->4. This is because esProc provides the ingenious ~ notation. The ~ represents the current member taking part in the computation. In this case, ~ is each two-dimensional table in the group (corresponding to a data.frame in R or a resultSet in SQL). Thus ~.monthSales refers to a column of the current two-dimensional table. By comparison, R users have to resort to rather complicated means such as loops to access the current member, which is certainly more troublesome.

In this comparison, esProc is more intuitive and has the advantage.

Next, let’s look at how each tool lets you inspect intermediate results.

R lets users view the result of each step at any time by clicking the variable name in RStudio and similar tools.

esProc provides only one official tool: click a cell to view the result of that step.

In this respect, esProc does not differ much from R. Since R supports a great many third-party tools, one of them may present the results better.

Then, let’s compare their ability to reference results.

In the R solution, users only need to assign the result of each step to a variable to reference it conveniently in later steps.

esProc users can also define variables for this purpose. However, using the cell name as the variable name is more convenient and saves the trouble of thinking up a meaningful name.

Next, let’s compare how they compute set intersections.

In the last step, the intersection of the data in every group is computed. R provides the low-level intersect function; combined with the Reduce function, it can compute the intersection of multiple groups of data.

esProc provides the isect function to compute the intersection of multiple sets, which is quite convenient.

Comparatively, R’s Reduce function leaves more room for creativity, while esProc is easier to use.
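A small sketch may help show why Reduce is the more general tool. The name sets below are made up, and the variables common and steps are assumptions for illustration. Reduce folds a two-argument function over a list, so it works with intersect as well as with any other binary function, and accumulate = TRUE exposes the intermediate results.

```r
# Hypothetical top-3 name sets for three months.
nameList <- list(
  "01" = c("Ann", "Bob", "Cid"),
  "02" = c("Bob", "Cid", "Dee"),
  "03" = c("Cid", "Bob", "Eve")
)

# Fold intersect over the list:
# intersect(intersect(month 01, month 02), month 03)
common <- Reduce(intersect, nameList)
print(common)

# accumulate = TRUE returns the running intersection after each month.
steps <- Reduce(intersect, nameList, accumulate = TRUE)
print(steps)
```

Replacing intersect with union in the same call would give everyone who reached the top 3 in at least one month.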

As the above case shows, R boasts abundant basic functions and a huge number of third-party library functions.

For access to data members, esProc provides an excellent notation, particularly for multi-level grouping. By comparison, R relies more on loop statements.

Both the esProc and R solutions perform excellently in terms of interactivity.

About esProc: http://www.raqsoft.com/product-esproc

More Stories By Jessica Qiu

Jessica Qiu is the editor of Raqsoft. She provides press releases for data computation and data analytics.
