Welcome!

Article

Top Three Salespersons Ranking by Monthly Sales Amount

R language and esProc on complex logic computation

Both R language and esProc have the outstanding ability to perform the stepwise computations. However, in the particulars they differ from each other. A comparison between them will be done by the following example:

A company’s Sales department wants to select out the outstanding salespersons through statistics, that is, the salespersons whose sales amounts are always among the top 3 in each month from the January this year to the previous year. The data is mainly from the order table of MSSQL database: salesOrder, and the main fields include the ID of order: ordered, name of salesperson: name, sales amount: sales, and date of order: salesDate.

The solution is like this substantially:

1. Compute the beginning dates of this year and this month, and filter the data by date.

2. Group by month and salesperson, and compute the sales amount of each salesperson in each month.

3. Group by month, and compute the rankings of sales amount in each group.

4. Filter out the top 3 salespersons from each group.

5. Compute the set of intersections of each group, that is, salespersons always among the top 3 in each month.

The solution of R language is as shown below:

01   library(RODBC)

02   odbcDataSources()

03   conn<-odbcConnect("sqlsvr")

04   originalData<-sqlQuery(conn,'select * from salesOrder')

05   odbcClose(conn)

06   starTime<-as.POSIXlt(paste(format(Sys.Date(),'%Y'),'-01-01',sep=''))

07   endTime<-as.POSIXlt(paste(format(Sys.Date(),'%Y'),format(Sys.Date(),'%m'),'01',sep='-'))

08   fTimeData<-subset(originalData,salesDate>=starTime&salesDate<endTime)

09   gNameMonth<-aggregate(fTimeData$sales,list(fTimeData$name,format(fTimeData$salesDate,'%m')),sum)

10   names(gNameMonth)<-c('name','month','monthSales')

11   gNameMonth$rank<- do.call(c, tapply(gNameMonth$monthSales, gNameMonth$month,function(x) rank(-x)) )

12   rData<-subset(gNameMonth,rank<=3)

13   nameList<- split(rData$name, rData$month)

14   Reduce(intersect, nameList)

The solution of esProc is as shown below:

esProc

Then, let’s compare the two solutions by checking the database access firstly:

R language solution implements the data access from Line01 to 05 through relatively a few more steps, and this is acceptable considering it as the normal operations.

esProc solution allows for directly inputting SQL statements in the cell A1, which is quite convenient.

In respect of database access, R language and esProc differ to each other slightly. Both solutions are convenient.

Secondly, compare the time function:

R language solution computes the beginning dates of this year and this month through line 06-07. Judging from this point, R language is abundant in the basic functions.

esProc solution completes the same computation in A2 and B2, in which pdate function can be used to compute the beginning date of this month directly, which is very convenient.

In respect of date function, it seems that esProc is slightly better, while R language has a huge amount of 3rd-party-function library, and maybe there is any date function that is easier to use.

The focal point is stepwise computation:

Firstly, filter by date, group by month and sales person and then summarize by sales amount. The above functionalities are implemented respectively in line 8-9 for R language and cell A3-A4 for esProc. The difference is not great.

Proceed with the computation. According to the a bit straightforward thought of analysis, the steps followed should be: 1 Group by month; 2 Add the field of ranking in the group, and compute the rankings; 3 Filter by ranking, and only keep the salespersons that achieved the sales amounts ranking the top 3 in each group; 4. Finally, compute the set of intersection on the basis of the data in each group.

The corresponding codes of R language are from line 10 – 14 in the order of 2->3->1->4. In order words, rank the data in each group throughout the whole table, and then group. Have you noticed anything awkward? Although it is the ranking within the group, users of R language have to sort first and then group! This is because R language is weak in the ability to group first and then process. To barely compose the statements following the train of thought of 1->2->3->4, users of R language must have a strong technical background to handle the complex iteration statement expressions. The style of reverse thinking on this condition will greatly simplify the codes.

esProc solution completes the similar computation in the cell A5 – A8, not requiring any reverse thought. esProc users can simply follow their intuitive thinking of 1->2->3->4. This is because that esProc provides the ingenious representing style of ~. The ~ represents the current member that takes part in the computation. For this case, the ~ is each 2-dimension table in the group (corresponds to the data.frame of R language or the resutSet of SQL). In this way, ~.monthSales can be used to represent a certain column of the current 2-dimension table. By compassion, users of R language can only resort to some rather complicated means like loops to access the current member, which is more troublesome for sure.

With regard to this comparison, esProc is more intuitive with relatively more advantageous.

Next, let’s study on their abilities in computing the intermediate results.

R language allows users to view the result of each step by clicking the variable name at any time, with RStudio and other tools.

esProc provides only one official tool, that is, click the cell to view the result of this step.

Regarding this ability, esProc does not differ from R language much. Considering that R language supports for a great many of 3rd party tools, maybe there is any tool capable of providing the better observed results.

Then, let compare their abilities to reference the result.

R language users are only required to define a variable for the result of computation in each step to conveniently reference the result in the steps followed with regard to the R language solution.

esProc users can also define variables to reference, however, using the cell name as the variable name is more convenient and saves the trouble of finding a meaningful name.

Next, let’s compare their performances on set of intersection.

In the last step, the intersection set of data of every group are to be computed. R language provides the intersect function at the bottom layer, using together with Reduce function, the intersection set of multiple groups of data can be computed.

esProc provides isect function to compute the set of intersection on multiple sets, which is quite convenient.

Comparatively, R language provides the Reduce function of greater imaginary space, and esProc is easier.

As it can be seen from the above case, R language boasts the abundant fundamental functions and a huge amount of library functions from the 3rd party.

In respect of data member access, esProc provides the excellent representing style, in particular the grouping at multi-levels. By comparison, R language relies more on the loop statements.

Both esProc and R language solutions have excellent performances in respect of interaction.

About esProc: http://www.raqsoft.com/product-esproc

More Stories By Jessica Qiu

Jessica Qiu is the editor of Raqsoft. She provides press releases for data computation and data analytics.

Latest Stories
All organizations that did not originate this moment have a pre-existing culture as well as legacy technology and processes that can be more or less amenable to DevOps implementation. That organizational culture is influenced by the personalities and management styles of Executive Management, the wider culture in which the organization is situated, and the personalities of key team members at all levels of the organization. This culture and entrenched interests usually throw a wrench in the work...
SYS-CON Events announced today that Calligo, an innovative cloud service provider offering mid-sized companies the highest levels of data privacy and security, has been named "Bronze Sponsor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Calligo offers unparalleled application performance guarantees, commercial flexibility and a personalised support service from its globally located cloud plat...
SYS-CON Events announced today that Massive Networks will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Massive Networks mission is simple. To help your business operate seamlessly with fast, reliable, and secure internet and network solutions. Improve your customer's experience with outstanding connections to your cloud.
In his session at 20th Cloud Expo, Mike Johnston, an infrastructure engineer at Supergiant.io, discussed how to use Kubernetes to set up a SaaS infrastructure for your business. Mike Johnston is an infrastructure engineer at Supergiant.io with over 12 years of experience designing, deploying, and maintaining server and workstation infrastructure at all scales. He has experience with brick and mortar data centers as well as cloud providers like Digital Ocean, Amazon Web Services, and Rackspace. H...
FinTechs use the cloud to operate at the speed and scale of digital financial activity, but are often hindered by the complexity of managing security and compliance in the cloud. In his session at 20th Cloud Expo, Sesh Murthy, co-founder and CTO of Cloud Raxak, showed how proactive and automated cloud security enables FinTechs to leverage the cloud to achieve their business goals. Through business-driven cloud security, FinTechs can speed time-to-market, diminish risk and costs, maintain continu...
21st International Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Me...
IoT is at the core or many Digital Transformation initiatives with the goal of re-inventing a company's business model. We all agree that collecting relevant IoT data will result in massive amounts of data needing to be stored. However, with the rapid development of IoT devices and ongoing business model transformation, we are not able to predict the volume and growth of IoT data. And with the lack of IoT history, traditional methods of IT and infrastructure planning based on the past do not app...
SYS-CON Events announced today that DXWorldExpo has been named “Global Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Digital Transformation is the key issue driving the global enterprise IT business. Digital Transformation is most prominent among Global 2000 enterprises and government institutions.
SYS-CON Events announced today that Datera, that offers a radically new data management architecture, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera is transforming the traditional datacenter model through modern cloud simplicity. The technology industry is at another major inflection point. The rise of mobile, the Internet of Things, data storage and Big...
"Outscale was founded in 2010, is based in France, is a strategic partner to Dassault Systémes and has done quite a bit of work with divisions of Dassault," explained Jackie Funk, Digital Marketing exec at Outscale, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Calligo has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Calligo is an innovative cloud service provider offering mid-sized companies the highest levels of data privacy. Calligo offers unparalleled application performance guarantees, commercial flexibility and a personalized support service from its globally located cloud platfor...
"At the keynote this morning we spoke about the value proposition of Nutanix, of having a DevOps culture and a mindset, and the business outcomes of achieving agility and scale, which everybody here is trying to accomplish," noted Mark Lavi, DevOps Solution Architect at Nutanix, in this SYS-CON.tv interview at @DevOpsSummit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We were founded in 2003 and the way we were founded was about good backup and good disaster recovery for our clients, and for the last 20 years we've been pretty consistent with that," noted Marc Malafronte, Territory Manager at StorageCraft, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"DivvyCloud as a company set out to help customers automate solutions to the most common cloud problems," noted Jeremy Snyder, VP of Business Development at DivvyCloud, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Kubernetes is an open source system for automating deployment, scaling, and management of containerized applications. Kubernetes was originally built by Google, leveraging years of experience with managing container workloads, and is now a Cloud Native Compute Foundation (CNCF) project. Kubernetes has been widely adopted by the community, supported on all major public and private cloud providers, and is gaining rapid adoption in enterprises. However, Kubernetes may seem intimidating and complex ...