Data Skills Gap – Addressing the Shortages

Learning Light, in partnership with VeryViz Ltd and Sheffield Hallam University (SHU), is developing a software environment that uses a unique and proven approach to make data query construction in Structured Query Language (SQL) easy to access and understand. It builds on our proven approach to visualisation techniques, pioneered in spreadsheets such as Excel.

SQLview will provide an environment for practical training, development and awareness building of SQL skills, helping with the ‘democratisation of data analytics’, as Gartner puts it. SQLview will help reduce user errors and slips and thus improve the overall quality of data analyses conducted using SQL. No other general solution of this nature is currently available.

SQL Visualisation

The challenge addressed is the need, across education, training and business, to support people working with data. SQL, the standard relational database query language, is the bread and butter of basic data analytics and related ICT courses and training. However, as a computer language it can be difficult to master and, in the worst case, users of SQL do not fully understand what their queries express.

With the growing proliferation of data sources and greater expectations on non-expert users to engage with data querying and analysis, there is a demand to support users more effectively in their understanding and use of SQL queries. The SQLview tool will provide novel means of supporting SQL tutors, learners and users. In doing so, it will benefit education and training in the area and enhance skill development, resulting in more efficient training and a lower likelihood of data analysis errors.

The innovation to be developed is a novel visualisation for the common database query language, SQL. It will draw upon existing research expertise at SHU and will be developed iteratively, informed by early adopter feedback. The SQLview tool will be designed to make the SQL used in Relational Database Management Systems (RDBMS) accessible and understandable.

The tool will be developed as one of a range of visualisation solutions provided by VeryViz Ltd. This will help establish VeryViz’s distinctive position in the market for visualisation tools that support the use of complex languages. For Learning Light, these visualisation tools will help diversify its products and services into the edtech market. Learning Light is a shareholder in VeryViz and a distributor of e-learning materials through its e-learning centre operation. For SHU, the collaboration will help establish research impact (SHU has licensed its spreadsheet visualisation technology to VeryViz).

 

Scope of Our Project

The project aims to develop an innovative visualisation technology with immediate benefit to a significant international market: education and training focused on databases and data analysis. Our approach uses early user feedback to ensure market relevance.

The primary objective of the project is to develop a prototype product providing visualisations capable of supporting non-expert trainees and learners (and their trainers and educators). This will provide a novel tool within the education and training market, with high potential appeal in related non-expert markets. Hence a second-stage market will be far broader and aligned with the democratisation of data analytics, as argued by Gartner and noted by Innovate UK and NESTA.

The SQLview tool will build upon visualisation expertise within VeryViz and relevant academic expertise in the area. The core innovation will support comprehension of language syntax and semantics (using illustrative data samples); this is the unique feature of VeryViz’s existing tool, EQUS. Our development approach ensures that the visualisation reflects how users perceive expressions within the target language (SQL).

We believe that SQLview will have a significant impact on the widely reported Big Data skills gap and will provide a major export opportunity, as SQL is used worldwide. The global database market is worth $33 billion; the UK market is worth $3.3 billion.

In making our market projections we have analysed the size of the SQL market, the SQL training market, the numbers of data scientists/computer scientists studying in HE, the skills shortages in data analytics highlighted by UKCES, BIS and market analysts such as McKinsey, levels of UK job vacancies requiring SQL skills, along with projected growth of Business Intelligence, Big Data and Business analytics requirements as a market segment.  

 

The Need for SQLview

Visualisation is a popular concept within data analysis and Big Data. However, existing visualisation approaches tend to focus upon data visualisation, indicating the scales, distributions and trends evident within given data. This enables the human interpretation of data. The SQLview approach is distinct in that it focuses upon visualising the query language – the language by which reports, results and findings, and even data visualisations are defined. User errors within such queries are frequently only evident when the outcomes they generate are patently wrong – if that never happens, queries containing mistakes may be repeatedly used and their outputs mistakenly trusted.
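As an illustration of this kind of silent error (a hypothetical example, not taken from SQLview), the Python sketch below uses the standard-library `sqlite3` module as a stand-in RDBMS. The query intended to list customers who have never placed an order is syntactically valid, runs without complaint and returns an empty result, which looks plausible rather than patently wrong. The `customers` and `orders` tables, their data and the `run` helper are invented for the example.

```python
import sqlite3


def run(conn, sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    -- One order was recorded without a customer, so customer_id is NULL.
    INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# Intended meaning: "customers who have never placed an order".
# Because the subquery returns a NULL, NOT IN evaluates to NULL (not TRUE)
# for every customer without an order, so the query silently returns no rows.
buggy = run(conn, """
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY name
""")

# NOT EXISTS is unaffected by the NULL and returns the intended rows.
fixed = run(conn, """
    SELECT name FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY name
""")

print(buggy)  # []
print(fixed)  # [('Bob',), ('Cat',)]
```

The empty list arises from SQL’s three-valued logic: comparing against a NULL yields unknown, so `NOT IN` can never be true once the subquery contains a NULL. Nothing in the output signals the mistake, which is the situation in which a visualisation of what each part of the query does might help a user notice it.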

Existing visual support for SQL queries offers limited syntax support (see MS SQL Server) and limited query expressiveness (see MS Access). Common support tools allow queries to be trialled (and formatted to be easier to read). However, none that we have found combines query structure (syntax) with query function (semantics). Two good illustrations of semantic visualisation are QueryVis (see: http://queryviz.com/online) and SnowFlakeJoins (see https://snowflakejoins.com/). These, however, provide no syntactic support and thus encourage a trial-and-error approach to understanding and correcting mistakes.

We have already established the conceptual design of visualisations for ‘languages’ in similar contexts. Specifically, we have developed a visualisation approach for spreadsheets using these concepts, called EQUS; the EQUS tool is licensed to VeryViz by SHU. After spreadsheets (principally Excel), the most common means of manipulating data is within relational database management systems (RDBMS). An assessment of the growing market in data analytics and of the democratisation of data analytics indicates the potential value of SQLview.

We have noted the comments from NESTA, Innovate UK and Gartner regarding the democratisation of data through building much greater understanding of how data can be manipulated. We believe SQLview will play a part in this, as corporate decision makers will be able to gain insight into database queries through visualisation and assess their efficacy and, indeed, their ethical dimensions.

 

SQLview – Innovation to Enhance Data Skills

The innovation being developed is a novel visualisation of standard database query languages that will allow users to recognise both syntax and semantics easily. The syntax shows the elements that combine to form a query; the semantics indicate what those elements do, individually and in combination. The combined visualisation will help reduce errors in such queries and thus support exploratory development and user confidence. No other general solution of this sort currently exists.
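To make the distinction between syntax and semantics concrete, the following sketch pairs each clause of a small query (its syntax) with the rows that clause produces from an illustrative data sample (its semantics). It is a minimal, hypothetical illustration using Python’s `sqlite3` module; the `sales` table, its data and the `stages` list are assumptions made for the example and do not describe how SQLview generates its visualisations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('North', 'A', 100), ('North', 'B', 50),
        ('South', 'A', 70),  ('South', 'B', 20);
""")

# Hypothetical stages: each pairs a clause as the user wrote it (syntax)
# with a query that exposes that clause's effect on the sample (semantics).
stages = [
    ("FROM sales",
     "SELECT * FROM sales ORDER BY rowid"),
    ("WHERE amount >= 50",
     "SELECT * FROM sales WHERE amount >= 50 ORDER BY rowid"),
    ("SELECT region, SUM(amount) + GROUP BY region",
     "SELECT region, SUM(amount) FROM sales WHERE amount >= 50 "
     "GROUP BY region ORDER BY region"),
]

# Run each cumulative query and keep the rows produced at that stage.
results = []
for clause, sql in stages:
    rows = conn.execute(sql).fetchall()
    results.append((clause, rows))
    print(f"{clause}\n    -> {rows}")
```

Reading the stages in order shows, for instance, that the `WHERE` filter removes the 20-unit South sale before aggregation, which is why South totals 70 rather than 90. Building each stage as a cumulative query keeps the sketch simple; a real tool would more likely derive the stages from a parsed query than from hand-written strings.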

SQLview will improve upon the current state of the art by focusing on query visualisation that combines syntactic and semantic information. Support for query comprehension within existing tools includes syntax highlighting, auto-formatting and some semantic visualisation tools, but no tool attempts to combine these two related perspectives.

Many tools cater for expert developers’ needs and thus focus on sophisticated semantic details. Our focus on the training, education and non-expert user markets means that the support our visualisation provides must rely less on detailed technical competence. Our visualisation approach nevertheless focuses specifically on these details, presenting them in a way that helps disabuse users of misconceptions about what a query is doing.

The focus of the innovation in SQLview is the design and dynamic generation of visualisations of users’ queries. Initial research focused on inferring relevant information from a given query to make the visualisation as informative as possible. In addition, SQLview will address the challenges of handling a malformed query (one with incorrect syntax) and of helping the user map from the visualisation back to the query text that might need changing.
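One simple way to detect a malformed query and point back to the offending text is sketched below, again using Python’s `sqlite3` module as a stand-in parser. The helper name `check_query` and the `customers` table are hypothetical. Prefixing the statement with SQLite’s `EXPLAIN` compiles it without running it; the error-location step assumes SQLite’s usual `near "TOKEN": syntax error` message shape, which is an observed convention rather than a documented guarantee, so the returned offset is best effort only.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")


def check_query(sql):
    """Return None if SQLite accepts the query, else (message, offset).

    `offset` is a best-effort character position of the token SQLite
    complained about, or None if it cannot be found in the query text.
    """
    try:
        # EXPLAIN compiles the statement without executing it, so the
        # check has no side effects on the data.
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        message = str(exc)
        # Assumed message shape for syntax errors: near "TOKEN": syntax error
        match = re.search(r'near "([^"]*)"', message)
        offset = sql.find(match.group(1)) if match else -1
        return message, (offset if offset >= 0 else None)


print(check_query("SELECT name FROM customers"))   # None
print(check_query("SELECT name, FROM customers"))  # message and offset
```

Searching for the first occurrence of the reported token is deliberately naive: it can point at the wrong place if the token appears more than once, and it cannot help when the parser reports only that the input is incomplete. It shows the shape of the mapping from an error back to the query text, not a full recovery strategy.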

VeryViz is currently developing and promoting a visualisation approach for spreadsheets called EQUS. A prototype demonstration of EQUS is viewable on YouTube:

https://www.youtube.com/watch?v=yn5dYIYUpeQ&feature=youtu.be

The project will help develop the company’s offering in the domain of visualisation. For Learning Light, VeryViz’s product range represents tools highly relevant to its expertise in the learning and e-learning markets. (Learning Light is a shareholder in VeryViz.)

For the research partner (SHU), the SQLview project strengthens the University’s strategic engagement with knowledge transfer, local industry, entrepreneurship and research impact.

The project will broaden the domains in which VeryViz provides solutions, while keeping to a similar business model grounded primarily in education and training. VeryViz will be able to demonstrate its ability to innovate and provide solutions across domains, which will strengthen its reputation and help develop new business opportunities. At present, Excel is the dominant tool for Business Intelligence.

VeryViz seeks to become a market leading company in providing visualisation tools for Business Intelligence, Business Analytics and Big Data and SQLview is the second of the planned tool sets.

 

Data / Analytics Terminology

 

What is Business Analytics?

Business Analytics is the intersection of business and technology, offering new opportunities for a competitive advantage. Business analytics unlocks the predictive potential of data analysis to improve financial performance, strategic management, and operational efficiency.

Analytics is “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.” Data analytics software and advanced analytics techniques include predictive analytics, text analytics and text mining, customer analytics and data mining.[1]

 

What is BI?

BI is the “computer-based techniques used in spotting, digging-out, and analyzing ‘hard’ business data, such as sales revenue by products or departments or associated costs and incomes. Objectives of BI implementations include (1) understanding of a firm’s internal and external strengths and weaknesses, (2) understanding of the relationship between different data for better decision making, (3) detection of opportunities for innovation, and (4) cost reduction and optimal deployment of resources.” (Business Dictionary). The most widely used BI tool is Microsoft Excel.[2]

BI is about “sense and respond.” Analytics is about “anticipate and shape” models.

 

What is Big Data?

Big Data refers to data scenarios that grow so large (petabytes and more) that they become awkward to work with using traditional database management tools. The challenge stems from data volume, flow velocity and the difficulty of extracting signal from noise. Big Data is spawning new tools that combine significant processing power and parallelism with statistical, machine learning or pattern recognition techniques.

Data visualization tools, including mashups, executive dashboards, performance scorecards and other data visualization technology, are becoming a major category.[3] (Practical Analytics).

This trend towards data analytics, business intelligence and Big Data is creating considerable (projected) skills shortages and an immediate operational issue: principally, a lack of understanding of the potential among managers outside the mega-corporations.

 

The Global Database and Data Management Market

The global database market is worth $33 billion (Gartner). The UK market is worth $3.3 billion. We estimate that France and Germany have markets of similar size and that the USA has a market worth $19 billion.

In the UK, we estimate that the addressable training market for SQL is £40 million from the total ICT training market of £700 million. We anticipate this market to grow to £60 million as the UK addresses its skills shortages.

The UK requirement is to create 56,000 Big Data professionals per annum (SAS). McKinsey & Co. has estimated that by 2018 the USA alone could face a shortfall of 140,000 to 190,000 people with deep analytical skills. In 2016, IT Jobs Watch reported 22,000 jobs requiring SQL skills, and SQL skills attracted the highest pay rates of all IT professionals.

The UKCES 2015 Employer Skills Survey found that 29% reported shortages of complex numerical/statistical skills. National Skill Academy research (2011) found that 75% of employers reported shortages in database-related skills (second only to spreadsheet skills). The USA is the largest market and employs over 100,000 database administrators alone; we see this market as worth £360 million for SQL skills training. We estimate global expenditure on SQL training to be in excess of £550 million per annum and growing rapidly. The current SQL training market (outside education) is served by books (1078 on Amazon).

 

Database / SQL Courses

Hot Courses offers 80 courses on databases and 693 online courses for SQL. MOOCs (edX lists 102 courses) and next-generation platforms such as Udemy (172 courses), along with boot camps and code camps, are becoming popular. In education, over 70,000 students in the UK are taking Computing courses likely to include SQL as an element (provided by over 80% of UK universities).

The international market includes over 600 universities offering relevant courses (topuniversities.com). Our plan is to engage with education publishers and corporate training providers to distribute SQLview across the world into education and corporations respectively. 

Training list publisher Hot Courses offers 80 courses on databases and 693 online courses for SQL.

  •         A search of www.amazon.co.uk for SQL returns 17,255 results across 16 pages, equivalent to 1078 books.
  •         There are 172 courses available on Udemy addressing SQL and databases.
  •         The edX MOOC platform lists 102 courses relevant to databases and SQL.
  •         Reed, a market-leading training provider, offers 73 Big Data courses and 675 related to databases.
  •         QA, another commercial training provider (with a strong Microsoft focus), offers 79 courses addressing SQL, including SQL Server (29), database applications (14), HP (5) and MySQL (3).
  •         With the exception of MOOCs, SQL courses are mainly not low-value courses but expensive, multi-day, tutor-led classroom or blended-learning courses, or online courses with very considerable practical application work.
  •         Code camps and boot camps are a growing phenomenon across the world, and SQL skills appear to be in demand.
  •         A recent Udemy SQL boot camp saw 2974 students enrolled.

All of these statistics show that there is a demand, and therefore market, for SQL training.

IT skills training is one of the markets with a high propensity to train using technology, and the sector is enjoying strong growth.

Research for Forfas (Oxford Economics, 2014c) in Ireland identifies three types of skills associated with Big Data:

  • ‘deep analytical talent’, based on a combination of advanced statistical, analytical and machine learning skills;
  • ‘Big Data and analytics savvy roles’ for individuals who understand the value that can be extracted from Big Data, interpret results and use them to inform business decisions;
  • ‘supporting technology roles’, fulfilled by those who develop and implement the hardware and software.

 

 

Benefits for Other Sectors

The democratisation of analytics – Gartner says that as analytics has become increasingly strategic to most businesses and central to most business roles, every business is an analytics business, every business process is an analytics process and every person is an analytics user.

“It is no longer possible for chief marketing officers (CMOs) to be experts only in branding and ad placement,” said Mr. Bertram. “They must also be customer analytics experts. The same is true for the chief HR, supply chain and financial roles in most industries.”

To meet the time-to-insight demanded by today’s competitive business environment, many organisations want to democratise analytics capabilities via self-service.[4]

An additional application the project may address is SQL interoperability between different RDBMS, which would improve organisations’ productivity.

Additional benefits flowing from the project include furthering Sheffield Hallam University’s expertise in data visualisation techniques, leading to further products for higher-level languages such as R. As VeryViz Ltd develops, a series of ecosystems is emerging around it: consultancy and training businesses using the VeryViz tool sets for risk management, troubleshooting or data analytics consultancy work; and a new ecosystem of teachers and trainers using the VeryViz tools as a component in learning modules or courses on database technologies that can be marketed through course aggregation platforms such as Udemy or TES.

 

Further Reading:      

 

[1] https://practicalanalytics.co/2013/07/14/blessed-are-the-mid-markets-for-they-shall-scale-big-data/

[2] https://practicalanalytics.co/2013/07/14/blessed-are-the-mid-markets-for-they-shall-scale-big-data/

[3] https://practicalanalytics.co/2013/07/14/blessed-are-the-mid-markets-for-they-shall-scale-big-data/

[4] http://www.gartner.com/newsroom/id/3198917