Big Data Analytics
in Higher Education
Walaa Adel Mahmoud
for Science and Technology and Maritime Transport,
Abstract: Big Data
gives educational institutions an opportunity to use their information
technology resources strategically to improve educational quality, guide
students to higher rates of completion, and improve student persistence and
outcomes. This paper explores the attributes of big data that are relevant to
educational institutions, investigates the factors influencing the adoption of big
data and analytics in learning institutions, and seeks to establish the
factors that hinder the use of big data in institutions of higher learning. Using
the dataset “Academic Ranking of World Universities, 2003-2017”, we forecast
how university management and faculty could adapt to changes to improve their
education and thereby the ranking of their universities in the upcoming years.
Microsoft SQL Server Data Mining Add-ins for Excel
was employed as the mining tool for predicting the trend in
university ranking. The paper concentrates on predictive analysis
of university ranking using forecasting based on data mining techniques.
Keywords: Big Data analytics, Mining Big Data, Education.
Big Data is an emerging field within education, and a number of scholars
have contended that the Big Data framework is well positioned to address some of
the key challenges currently facing higher education [1]. Global trends are
affecting education; in addition, political and
social changes are putting pressure on institutions of higher education to respond to these rapid
changes effectively and on time. In the context of the strategic planning of
higher education, Big Data Analytics is relevant because both regular
and distance education generate new data that can support
decision making [2]. The sheer volume of data generated makes decision making
difficult; however, if higher educational institutes trace data they can adapt
Knowledge discovery and data mining approaches have been
utilized to make sense of the unstructured data. There are several techniques
or algorithms that are helpful in extracting the characteristics of the data
and building a pattern. Big data has found its place in education and is
predicted to be extensively implemented in institutions of higher education.
Analytics can be defined as the process of determining,
assessing, and interpreting meaning from volumes of data. It is commonly
divided into three categories: descriptive, predictive and
prescriptive. Predictive analysis can serve many segments of society because it can
reveal hidden relationships that may not be apparent with descriptive modeling.
Advances in analytics play an important role in higher education planning.
Data analytics not only helps in analyzing the points below but can also support predictive
modeling for faculty, administrators and student groups who are looking
for reliable results about university rankings on which to base
their decisions. Using the dataset “Academic Ranking of World Universities,
2003-2017”, we forecast how university management and
faculty could adapt to changes to improve their education and thereby the ranking
of their universities in the upcoming years. Microsoft SQL Server Data Mining
Add-ins for Excel was employed as the mining tool for predicting the
trend in university ranking. This research paper concentrates on predictive
analysis of university ranking using forecasting based on data mining techniques.
1.1 The Contribution of this paper
The paper was guided by the following specific
There is limited research into big data in higher education, despite growing
interest in exploring and unlocking the value of the increasing data within
the higher education environment.
This paper contributes to the conceptual and theoretical understanding of Big Data
and Analytics within higher education.
It introduces the notion of Big Data and outlines its relevance to higher
education.
It describes the opportunities this growing research area brings to higher
education as well as major challenges associated with its exploration and
2. Big Data and analytics in higher education
Big Data describes data that is
fundamentally too big and moves too fast, thus exceeding the processing
capacity of conventional database systems [4]. Its key properties
include Volume, Velocity, Veracity, Variety and Value. In addition to
these properties, the stages required to unlock the value of data are data
collection, data analysis, visualization and application. Common analysis
techniques include classification, clustering and regression.
Big Data is a knowledge system
that is already changing the objects of knowledge and social theory in many
fields while also having the potential to transform management decision-making
theory (Boyd & Crawford, 2012). Big Data incorporates the emergent research
field of learning analytics (Long & Siemens, 2011), which is already a
growing area in education. However, research in learning analytics has largely
been limited to examining indicators of individual student and class
performance. Big Data brings new opportunities and challenges for institutions
of higher education. Long and
Siemens (2011) indicated that Big Data presents the most dramatic framework for
efficiently utilizing the vast array of data and ultimately shaping the future
of higher education. The application of Big Data in higher education was also
echoed by Wagner and Ice (2012), who noted that technological developments have
certainly served as catalysts for the move towards the growth of analytics in
In the context of higher
education, Big Data connotes the interpretation of a wide range of
administrative and operational data-gathering processes aimed at assessing
institutional performance and progress in order to predict future performance
and identify potential issues related to academic programming, research,
teaching and learning (Hrabowski, Suess & Fritz, 2011a, 2011b; Picciano,
2012). Others indicated that, to meet the demands of improved productivity,
higher education has to bring the tool of analytics into the system. As noted
earlier, a number of scholars have contended that the Big
Data framework is well positioned to address some of the key challenges
currently facing higher education (see, e.g., Siemens, 2011).
At this early stage much of the
work on analytics within higher education is coming from interdisciplinary
research, spanning the fields of Educational Technology, Statistics,
Mathematics, Computer Science and Information Science. A core element of the
current work on analytics in education is centered on data mining.
Big Data in higher education also
covers database systems that store large quantities of longitudinal data on
students, right down to very specific transactions and activities on learning
and teaching. When students interact with learning technologies, they leave
behind data trails that can reveal their sentiments, social connections,
intentions and goals. Researchers can use such data to examine patterns of
student performance over time, from one semester to another or from one year to
another.
On a higher level, it could be
argued that the added value of Big Data is the ability to identify useful data
and turn it into usable information by identifying patterns and deviations from
patterns. Long and Siemens (2011) indicated that Big Data is now well positioned
to start addressing some of the key challenges currently facing higher
education. An OECD (2013) report suggested that it may be the foundation on
which higher education can both reinvent its business model and bring together
the evidence to help make decisions about educational outcomes.
From an organizational learning
perspective, it is well understood that institutional effectiveness and
adaptation to change relies on the analysis of appropriate data (Rowley, 1998)
and that today’s technologies enable institutions to gain insights from data
with previously unachievable levels of sophistication, speed and accuracy
(Jacqueline, 2012). As technologies continue to penetrate all facets of higher
education, valuable information is being generated by students, computer
applications and systems (Hrabowski & Suess, 2010).
Furthermore, Big Data Analytics
could be applied to examine student entries on a course assessment, discussion
board entries, blog entries or wiki activity, which could generate thousands of
transactions per student per course. These data would be collected in real or
near real time as they are transacted and then analyzed to suggest courses of
action. As Siemens (2011) indicated, “learning analytics are a
foundational tool for informed change in education” and provide evidence on
which to form understanding and make informed (rather than instinctive)
decisions.
Big Data can also address the
challenges associated with finding information at the right time when data are
dispersed across several unlinked data systems in institutions. By
identifying ways of aggregating data across systems, Big Data can help improve
decision making. With large
volumes of student information, including enrollment, academic and disciplinary
records, institutions of higher education have the data sets needed to benefit
from targeted analytics. Big Data and analytics in higher education can be
transformative, altering the existing processes of administration, teaching,
learning and academic work (Baer & Campbell, 2011), contributing to policy and
practice outcomes and helping address contemporary challenges facing higher
education. Big Data can provide institutions of higher education with the predictive
tools they need to improve learning outcomes for individual students as well
as ways of ensuring academic programs are of high-quality standards. By designing
programs that collect data at every step of the students' learning processes,
universities can address student needs with customized modules, assignments,
feedback and learning trees in the curriculum that will promote better and
One of the ways higher education can
utilize Big Data tools is to analyze the performance and skill level of
individual students and create personalized learning experiences that match
their specific learning pathways. When used effectively, Big Data can help
institutions enhance the learning experience, improve student performance
across the board, reduce dropout rates and increase graduation numbers (Figure
1).
The key contribution of Big Data will
depend on the application of three data models (descriptive, relational and
predictive) and the utility of each to guide better decision making (Figure 2).
Fig 1: Key Big Data opportunities for three end-users in higher education
Fig 2: Three Big Data Analytical models in higher education
A literature review of academic
research associated with data analytics and descriptive modeling in the
Educational sector reveals the following facts:
Competition for Admissions:
The advent of ranking systems has given students and society more data with which to evaluate
the quality of an educational institute. In the past, people had
little knowledge about the quality of education offered by an educational
institute. Thanks to the extensive amount of data available in this age of
information, many organizations that rank universities have come
into existence and help college-goers choose the institute that best fits their
set of requirements [11]. However, there has been little evidence that high
competition has had positive effects on what students learn.
Student Performance (Predictive Analysis):
Research papers also pointed to a few factors
that predicted the probability of success of a student [12]. These were:
Past Performance: If a
student has a past record of scoring good grades, it
is a strong indicator of the future
performance of the student.
Demographic Outlook: Multiple research articles and
surveys also reported that students who are married performed better at their studies
than single students. The research papers also mentioned that the
older the student is, the higher the chances of a better GPA are.
Subject Choice: Various studies
have shown that students who chose math and
honors in high school were more likely to succeed in undergraduate and graduate
studies than students who chose other subjects.
Other Factors: There were
some other factors noted in the research that proved to be strong indicators of
students’ success. These included the performance of a student in online
classes and the ratio of credits attempted to credits completed.
Academics & Business
Intelligence: The research reviewed found
that business intelligence was hardly used in the educational sector [13].
However, it has tremendous potential and can be used by educational institutes
to increase enrollment numbers as well as to sift through student
Machine Learning: Another
angle on data analytics in educational institutes explored in the
research literature concerns machine learning algorithms. The C4.5
algorithm, which is essentially a decision tree algorithm, can be used to
design predictive models from the student data that has been
accumulated over the years [14, 15].
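To make the decision tree idea concrete, the sketch below computes the gain ratio, the split criterion C4.5 uses to choose which attribute to branch on. It is a minimal illustration and not the model built in this paper: the student records, the attribute names (`past_grades`, `took_math`) and the target `passed` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a categorical attribute."""
    total = len(rows)
    base = entropy([r[target] for r in rows])
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Expected entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([r[attribute] for r in rows])
    if split_info == 0:
        return 0.0
    return (base - remainder) / split_info


# Hypothetical student records (illustrative only, not from the paper's data).
students = [
    {"past_grades": "high", "took_math": "yes", "passed": "yes"},
    {"past_grades": "high", "took_math": "no", "passed": "yes"},
    {"past_grades": "low", "took_math": "yes", "passed": "no"},
    {"past_grades": "low", "took_math": "no", "passed": "no"},
]

# The attribute with the largest gain ratio becomes the root of the tree.
best = max(["past_grades", "took_math"],
           key=lambda a: gain_ratio(students, a, "passed"))
```

In this toy data `past_grades` separates the outcomes perfectly, so it is selected as the first split; a full C4.5 implementation would then recurse on each branch and also handle continuous attributes and pruning.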
This paper focuses on the
Microsoft SQL Server Data Mining Add-ins for Excel. A sample data
set, “Academic Ranking of World Universities, 2003-2017”, was extracted and
taken through the lifecycle of a data mining process, which includes
formulating and refining data, evaluating and analyzing mining models, and
predicting results with the use of a spreadsheet. For this process, the user must
have installed Microsoft Excel with the Table Analysis and Data Mining Client
add-ins. Since the approach was based on Table Analysis Tools, we had to
convert our raw data into a table format supported by Excel.
The steps involved in the
process were Data Preparation, Data Modeling, Accuracy and Validation, and
Model Usage. In the Data Preparation step, picking the correct attributes
from the source (exploring data), removing the outliers (cleaning data) and
splitting the data set into samples (partitioning data) were the common
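The three preparation activities named above (exploring, cleaning and partitioning) can be sketched in plain Python. This is an illustration under stated assumptions rather than what the Excel add-in does internally: the interquartile-range rule for outliers, the 70/30 split, and the column names `name` and `score` are all hypothetical choices.

```python
import random
import statistics


def select_columns(rows, columns):
    """Exploring: keep only the attributes relevant to the analysis."""
    return [{c: r[c] for c in columns} for r in rows]


def remove_outliers(rows, column, k=1.5):
    """Cleaning: drop rows outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    values = [r[column] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[column] <= high]


def partition(rows, test_fraction=0.3, seed=0):
    """Partitioning: shuffle reproducibly, then split into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical raw table; the last row carries an obvious data-entry error.
raw = [{"name": f"U{i}", "score": s, "notes": ""}
       for i, s in enumerate([20, 22, 24, 25, 26, 27, 28, 30, 31, 500])]

explored = select_columns(raw, ["name", "score"])
clean = remove_outliers(explored, "score")
train, test = partition(clean)
```

Fixing the random seed keeps the split reproducible, which matters when several mining models are later compared against the same test set.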
Several data models are supported
by the add-in, such as Clustering, Decision Tree, Time-Series, Pie Chart,
Neural Networks, Sequencing Clusters and Histogram.
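As an illustration of forecasting a ranking indicator over time, the sketch below fits a least-squares straight line to yearly scores and extrapolates one year ahead. This is a deliberately simple stand-in and not the add-in's Time-Series model; the function names and the yearly scores are hypothetical.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares fit of values = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(years, values, target_year):
    """Extrapolate the fitted trend to `target_year`."""
    slope, intercept = fit_linear_trend(years, values)
    return slope * target_year + intercept


# Hypothetical total scores for one university (not taken from the dataset).
years = [2013, 2014, 2015, 2016, 2017]
scores = [40.0, 42.0, 44.0, 46.0, 48.0]

predicted_2018 = forecast(years, scores, 2018)
```

A straight line only captures a steady trend; it ignores seasonality and autocorrelation, so it is best read as a baseline against which a richer time-series model can be compared.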
The Accuracy and Validation step evaluates
the estimated models against the test data. The Classification Matrix,
Accuracy Chart and Profit Chart are a few of the evaluation tools. Model
Usage has two phases: in the browse phase, we explore the
patterns in the output; in the second phase, we query the model to make predictions from
new data.
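A classification matrix simply counts how often each actual class was predicted as each class on the held-out test data. The sketch below shows the idea with hypothetical labels (`top100` and `other`); it does not reproduce the add-in's output format.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count (actual, predicted) label pairs over the test set."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of test cases where the prediction matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical test-set labels and model predictions.
actual = ["top100", "top100", "other", "other", "other"]
predicted = ["top100", "other", "other", "other", "top100"]

matrix = classification_matrix(actual, predicted)
overall = accuracy(actual, predicted)
```

The diagonal entries, such as `matrix[("top100", "top100")]`, are the correct predictions; the off-diagonal entries show which classes the model confuses, which a single accuracy figure hides.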
Our dataset, “Academic Ranking of
World Universities, 2003-2017”, had various factors on which the descriptive and
predictive modeling was done. Some of the factors were:
a) Alumni (around 10% of the total): the number of alumni who win Nobel Prizes and other
b) Award (around 20% of the total): the
total number of staff winning Nobel Prizes.
c) Highly Cited (HiCi) (20%): the number of Highly Cited Researchers in twenty-one different subject
categories.
d) Publication (PUB) (20%): the count of papers indexed in the Science
Citation Index and Social Science Citation Index in 2017.
e) Per Capita Performance (PCP)
(10%): the weighted scores of the above-stated five values divided by the count of
full-time equivalent academic staff.
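The way such weights combine into a single score, and how a ranking follows from sorting on a chosen parameter, can be sketched as below. The weights are the ones listed above; because only some of the factors are listed, they sum to 80% rather than 100%. The universities and indicator scores are hypothetical, and the helper names are illustrative only.

```python
# Weights as listed in the text; they sum to 0.80 because not every factor is listed.
WEIGHTS = {"alumni": 0.10, "award": 0.20, "hici": 0.20, "pub": 0.20, "pcp": 0.10}


def weighted_score(indicators, weights=WEIGHTS):
    """Weighted sum of the indicator scores (each assumed to lie on a 0-100 scale)."""
    return sum(weights[name] * indicators[name] for name in weights)


def rank_by(universities, key):
    """Names ordered from the highest to the lowest value of `key`."""
    ordered = sorted(universities, key=lambda u: u[key], reverse=True)
    return [u["name"] for u in ordered]


# Hypothetical indicator scores (not taken from the dataset).
universities = [
    {"name": "University A", "alumni": 50, "award": 40, "hici": 60, "pub": 70, "pcp": 30},
    {"name": "University B", "alumni": 20, "award": 10, "hici": 30, "pub": 80, "pcp": 50},
]

scored = [{**u, "total": weighted_score(u)} for u in universities]
by_total = rank_by(scored, "total")
by_pcp = rank_by(scored, "pcp")
```

In this toy data the two orderings differ: University A leads on the weighted total while University B leads on PCP alone, which illustrates why the ranking of a university depends on the parameter chosen.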
The ranking of a university changes
depending on the parameter used. For example, in Figure 3, on
the basis of PCP in 2017, the ranking for a university is high (good) for a
higher score. In 2017, Harvard Institute of Technology had the lowest score of
PCP, so its ranking was the best (6). This analysis is helpful for universities,
which can focus on improving their PCP score, which depends on the above-stated
indicators. More publications, more HiCi researchers and more awards can help them get a better
ranking.
Fig 3: Ranking of universities in USA based on PCP score.
Fig 3: Ranking of universities in world
Similar models were generated using the data
mining add-in to analyze the factors and their influence on improving or
worsening the ranking of universities. In addition, our research
found that criteria such as cultural, economic and historical stature cannot be
the basis on which universities are ranked. Such ranking criteria may
mislead students when they choose a university for their future.
6. Challenges of implementation
Some of the issues faced while implementing
the data mining process for analyzing the trend in the university ranking were:
a. Data Fog situation, accuracy, multiple
truths and extraction of data.
b. Finding the correct and related data set
for the research.
c. Cleaning and refining the data set
according to the requirements of the software.
d. Lack of data governance.
e. Understanding the algorithms provided by
the data-mining add-in.
Through the proper use of big data
analytics, revolutionary development in the education sector could be
achieved. Despite some inherent challenges, big data analytics can provide
customized learning environments to learners, reduce potential dropouts
and failures, and support the development of long-term learning plans. All of these are possible
through the effective development and use of big data analytics in
educational institutions. Microsoft SQL Server Data Mining Add-ins for Excel, the
tool used, could provide meaningful predictions on which universities can take corrective
measures to enhance the quality of their education system and improve their faculty's
contribution to society. Further, descriptive modeling can help
evaluate the teaching staff and their excellence in imparting education.
This study provided vital information on which universities can formulate
new policies. They can design strategies according to the parameters on which they are
falling behind. However, incorporating data mining
techniques into their current systems will not be an easy endeavor for universities. Bringing
changes to the existing setup would require enormous transformation in
terms of cost, resources and tools.
1. Siemens, G., How
data and analytics can improve education, July 2011. Retrieved on August, 2011. 8.
2. Amorim, J.A., et
al. Big Data Analytics in the Public Sector: Improving the Strategic Planning
in World Class Universities. in Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC),
2013 International Conference on. 2013.
3. Daniel, B., Big
Data and analytics in higher education: Opportunities and challenges. British
Journal of Educational Technology, 2015. 46(5): p. 904-920.
4. Manyika, J., et
al., Big data: The next frontier for innovation, competition, and productivity.
5. Reyes, J., The
skinny on big data in education: Learning analytics simplified. TechTrends:
Linking Research & Practice to Improve Learning, 2015. 59(2): p. 75- 80.
6. Bichsel, J., Analytics
in higher education: Benefits, barriers, progress, and recommendations. 2012.
7. Demchenko, Y., E.
Gruengard, and S. Klous. Instructional Model for Building Effective Big Data
Curricula for Online and Campus Education. in Cloud Computing Technology and
Science (CloudCom), 2014 IEEE 6th International Conference on. 2014.
8. Michalik, P., J.
Stofa, and I. Zolotova. Concept definition for Big Data architecture in the
education system. in Applied Machine Intelligence and Informatics (SAMI), 2014
IEEE 12th International Symposium on. 2014.
9. Elias, T.,
Learning Analytics: The
Definitions, the Processes, and the Potential. 2011.
10. Kantardzic, M.,
Data mining: concepts, models, methods, and algorithms. 2011: John Wiley &
Sons.
11. M’Hammed, A., H.
Wu, and Y. Cherng- Jyh, Using Data Mining for Predicting Relationships between
Online Question Theme and Final Grade. Journal of Educational Technology &
Society, 2012. 15(3): p. 77-88.
12. Ramesh, V., P.
Parkavi, and K. Ramar, Predicting student performance: a statistical and data
mining approach. International Journal of Computer Applications, 2013. 63(8):
13. mar Pal, A.K. and
S. Pal, Analysis and Mining of Educational Data for Predicting the Performance
of Students. 2013.
14. Bound, J., B.
Hershbein, and B.T. Long, Playing the Admissions Game: Student Reactions to
Increasing College Competition. The Journal of Economic Perspectives, 2009.
23(4): p. 119-146.
15. Guster, D. and C.
Brown, The application of business intelligence to higher education: Technical
and managerial perspectives. J. of Information Technology Management, 2012.
16. Marsh, O., Maurovich-Horvat, L., &
Stevenson, O. (2014). Big Data and Education: What’s the Big Idea. Big Data and
Education conference. UCL
17. Hervatis, V., Loe, A., Barman, L.,
O’Donoghue, J., & Zary, N. (2015). A Conceptual Analytics Model for an
Outcome-Driven Quality Management Framework as Part of Professional Healthcare
Education. JMIR Medical Education , 1 (2)