Research methodology

1.     Introduction

This document outlines the PQRS research methodology and describes how data on solar PV installations in South Africa and other regions of Africa is obtained, cleaned, verified and applied.

The PQRS PV database is considered to be a living research document and is updated regularly as and when new data is made available.

Processes described below are used to collect information and data for the purpose of monitoring trends and issuing reports on Solar PV sector growth. These reports allow stakeholders participating in the data collection initiative to make informed business decisions, promoting the use of PV as a renewable source of energy.

2.     Data gathering

Research methods for obtaining data include:

  1. Collecting data through publication and online magazine research, telephonic interviews, online surveys, social media sites, and data submitted by installers and suppliers.
  2. Noting information on installations while presenting training sessions to candidates, and gathering both present and historical information at other industry events and meetings.

3.     Data categorization

Diagram 3 below shows how data is categorized and the number of permutations that exist for a single listing. For more info on listings, please see the section called “Descriptors”.

4.     Data sources – verification and cleaning

Any form of data needs to be cleaned before it is added to the database. Our data is scrutinized internally for errors and checked against existing data for possible duplication.

4.1.  Publication & online magazine research

How new items are added to the data set:

A number of online magazines and articles have been used to reference data and to add data to the database. These include, among others:

  • Engineering news

Listing reference nr. AAB041 can be seen at the following link: http://www.engineeringnews.co.za/print-version/solar-farms-move-to-the-cities-2016-08-18. The magazine, however, lists the site as 1056kWp, whereas the PQRS listing shows 1100kWp because of the way in which the listing was submitted by one of the stakeholders involved in the project. The site was then researched further to find confirmation of the installation from other sources. Values used in listings may vary slightly from values published in articles, or vice versa. Variations are described further in this document under the section discussing “error margin”.

4.2.     Telephonic interviews

Telephonic interviews were used, for example, to determine the total solar PV sales for 2016. These sales values were then used to determine the percentage of installed capacity that has been listed on the PQRS database.
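The coverage calculation described above can be sketched as follows. This is a minimal illustration only: the function name and the figures are hypothetical placeholders, not PQRS results, and the source does not state the exact formula used.

```python
def listed_share(listed_kwp: float, sold_kwp: float) -> float:
    """Return the percentage of sold PV capacity that appears as listings."""
    if sold_kwp <= 0:
        raise ValueError("total sales must be positive")
    return 100.0 * listed_kwp / sold_kwp


# Hypothetical figures for illustration only (not PQRS data).
share = listed_share(listed_kwp=45_000, sold_kwp=150_000)
print(f"{share:.1f}% of sold capacity is listed")
```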

4.3.     Online surveys

An online survey was used for “the storage market analysis”, a report prepared for an American-based Li-ion battery manufacturer. During this survey, more than 100 installers referenced 457 different installations.

4.4.     Social Media

Some listings may be sourced from and/or cross-referenced with the Facebook page or website of the installer or EPC.

4.5.     Data from installers and suppliers

Installers and suppliers are encouraged to submit installation data. Listings submitted by installers are cross-referenced when suppliers submit data. Multiple stakeholders regularly submit the same sites, so these listings are scanned for duplication.
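One way such a duplication scan could work is sketched below. The matching rule (normalized project name, year and rounded size) and all field names are assumptions made for illustration; the source does not describe the actual matching logic, and any flagged group would still need manual review.

```python
from collections import defaultdict


def duplicate_key(listing: dict) -> tuple:
    """Build a comparison key from fields expected to match for the same site."""
    # Ignore case, spaces and punctuation in the project name.
    name = "".join(ch for ch in listing["project_name"].lower() if ch.isalnum())
    return (name, listing["annum"], round(listing["size_kwp"]))


def find_possible_duplicates(listings: list) -> list:
    """Group reference numbers of listings that share the same key."""
    groups = defaultdict(list)
    for listing in listings:
        groups[duplicate_key(listing)].append(listing["reference_nr"])
    return [refs for refs in groups.values() if len(refs) > 1]


# Hypothetical listings: X001 and X002 describe the same site.
listings = [
    {"reference_nr": "X001", "project_name": "Example Mall", "annum": 2016, "size_kwp": 500.0},
    {"reference_nr": "X002", "project_name": "example-mall", "annum": 2016, "size_kwp": 500.2},
    {"reference_nr": "X003", "project_name": "Example Farm", "annum": 2015, "size_kwp": 60.0},
]
print(find_possible_duplicates(listings))
```

A key this strict would miss submissions whose sizes differ by more than rounding, so a tolerance on size may be needed in practice.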

5.     Error Margins for data collection

Data integrity is a primary focus for the PQRS database. Duplication is avoided through various means of confirmation and by cross-referencing data sets provided by a variety of sources, as indicated in the section called “Verification and cleaning”. A number of organizations, including universities and government institutions, have checked and verified the integrity of our database.

Errors cannot be ruled out, however, and could be present in the following instances.

5.1. Typing errors

Data may be captured incorrectly by hand: either the wrong information is entered, or the right information is entered into the wrong cells.

5.2.  DC vs AC generation

A certain percentage of the power generated is lost in the conversion from DC to AC and through other inefficiencies in the system. Some individuals will therefore provide the AC rating of the system, while others might submit the DC nameplate rating.

5.3.  Inverter vs PV generation sizing

Although this type of error has a negligible impact on the overall data, it is raised as a possible area of concern because it affects totals. These errors typically occur only in small to very small systems. For example, a contractor may install a 5kW inverter with only 1kWp of PV on the roof; 5kWp may then be submitted as the system size although only 1kWp was installed.

5.4.  Interpretation

Contractors might not know how to interpret system sizing and may list a system as a 5kW system because it generates 5kWh of energy per day, not because it is rated at 5kWp. Although not a regular occurrence, this type of listing has been identified as being present in the data to some extent. The impact on total installed capacity would be negligible, as these instances occur mainly on very small systems.

5.5.  Terminology

Terminology is often misunderstood by installers, for example the difference between off-grid, hybrid, backup and grid-tied installations. Again, these instances occur predominantly in the small to very small system sizes, in Ranges 4 and 5.

5.6.  Type of installation

Some installers consider a grid-connected system that is only used to drive a UPS-type inverter in a dedicated circuit to be ‘off-grid’. In our data this type of system may be reflected as grid tied, because municipalities widely accept it as such: it is technically tied to the grid even though it might not be able to feed back. This type of instance will only really occur on very small systems and won’t have a visible effect on the overall listed generation capacity.

5.7.  Generation and power factor

kVA vs kW. Transformers are rated in kVA, and certain inverters may be rated in kVA but sold in kW. Because kW and kVA values can differ, it should be noted that all figures quoted in the data are DC kWp values at STC, as per the data label on the module.
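The difference between the two ratings comes from the standard relationship kW = kVA × power factor. The sketch below illustrates it; the function name and the example values are hypothetical and are not taken from PQRS data.

```python
def kva_to_kw(kva: float, power_factor: float) -> float:
    """Convert apparent power (kVA) to real power (kW) for a given power factor."""
    if not 0 < power_factor <= 1:
        raise ValueError("power factor must be in the range (0, 1]")
    return kva * power_factor


# Hypothetical example: a 5 kVA inverter operating at a power factor of 0.8.
print(f"{kva_to_kw(5.0, 0.8):.1f} kW")
```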

6.     Error margins for reports

Four levels of estimation are noted as being relevant to reports issued by PQRS.

It must be noted that the margins of error are average values assigned to each level. They have no mathematical basis, owing to the varying degree of reporting from a broad segment of industry.

Creating measurable margins of error for each level of reporting is not being considered at this stage, because of the number of possible permutations in which these error-reporting instances may occur. A more accurate basis for the margins of error will be reconsidered when sufficient information is available to simulate different models that could enhance accuracy.

In summary, for the time being PQRS margins of error are estimated and not mathematically calculated.

The following margins of error are provided for each level of reporting respectively:

Conservative – 5% up or down

Realistic – 10% up or down

Accelerated – 15% up or down

Wild Card – 20% up or down
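Applying these margins to a reported value can be sketched as below. The margins are the ones listed above; the function name and the example value are hypothetical.

```python
# Estimated margins of error per reporting level, as listed above.
MARGINS = {
    "Conservative": 0.05,
    "Realistic": 0.10,
    "Accelerated": 0.15,
    "Wild Card": 0.20,
}


def error_band(value: float, level: str) -> tuple:
    """Return the (low, high) range for a value at the given reporting level."""
    margin = MARGINS[level]
    return value * (1 - margin), value * (1 + margin)


# Hypothetical reported capacity of 1000 kWp at the Realistic level.
low, high = error_band(1000.0, "Realistic")
print(f"{low:.0f} to {high:.0f} kWp")
```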

7.     Missing data / Bulk data / Supplemental data

Let’s face it: we don’t have all the data. In other words, we don’t have 100% of all installations listed. Data not reflected in our database is called “missing data”, for lack of a better term. Missing data can be calculated in many instances. Data collection for the solar PV industry is like building a puzzle in which some pieces are known, some are more obvious than others, and some are missing. Missing data are the pieces of the puzzle that are either not obvious or can’t be seen.

8. Predicting trends using data

Data tells a story. Sufficient volumes of data will tell the same story, provided that the analysis is viewed from the same angle. Data visualization allows us to discover trends that are locked within data sets. These trends are stories that want to be told. An in-depth understanding of the trends allows an analyst to see the story.

Trend prediction is done on a basis of statistical probability.

The PQRS database and data reporting are multi-dimensional. Reporting can be represented by a three-dimensional, vector-space-type image with an X, Y and Z axis, as indicated in diagram 1 below.

A vector is defined as “a quantity (nr of installations) that has direction (over time) as well as magnitude” (specific size of installation). From a statistical point of view, it is possible to predict the position of sales by picturing the x-value, y-value and z-value as points in space and viewing the data as actively changing, with the z-axis being present time moving along the x-axis. The calculation will not, however, produce a linear result as indicated in the diagram, because of the dynamic nature of the PV industry. The diagram is therefore simply a visual expression of the concept.

1.1. Historical trends

Trends are determined using known data. This method is good for understanding historical trends but not ideal for determining future growth/potential.

1.2. Future predictions

Future growth and projections are determined using a combination of existing and estimated statistics, preferably applied in triangulated arguments that increase accuracy and allow comparison.

9. Reports

The PQRS PV database is considered to be a living research document and is updated regularly as and when new data is made available. Data is usually supplied with some of the columns left blank and some information omitted, so ideal data is not always available and certain anomalies exist within the data. These anomalies result in error margins.

Four levels of estimation are relevant to the data and reporting sets issued by PQRS. More than one level of estimation may be used in a single report: one graph may be issued against one of the four levels, while a graph in another section of the same report may apply another level.

These four levels are:

1.2.1. Conservative

A conservative approach represents less than the anticipated value that could be argued, estimated or indicated.

1.2.2. Realistic

Realistic represents the values as submitted by contractors and stakeholders.

DC peak power is based on the PV module nameplate rating at STC. Realistic values are captured in the column labeled “size” in the data sheet, using the DC nameplate rating for modules.

1.2.3. Accelerated

Accelerated represents estimated values used for issuing reports that reflect a progressive approach to data. This approach would also be used to compensate for the lack of other available data sources.

1.2.4. Wild Card

A Wild Card approach may represent an opinion or perceived values where there is no specific evidence to support the content published, or where the evidence may not represent the greater portion of industry. Wild card approaches are also used when a certain value is suspected, and could indicate the general sense and perceived trends experienced with specific values over a period of time. To summarize:

Wild card = Guess = Hunch. We do not like using “wild cards” as they have no statistical, scientific or mathematical basis.

10. A listing

A listing is a single row of data containing information about a solar PV installation, and is used in various forms whether or not the data is considered to be ideal. Table 1 below shows a spreadsheet with 3 individual listings. The 3 listings represent 17 systems.

1.1. Ideal data

In summary, “ideal data” is data that can be or has been confirmed by more than one source and has been cleaned. Where possible, our data is cross-referenced between sources to reduce duplication and to confirm the various parameters associated with each listing.

In order for data to be considered “ideal”, preferred data metrics need to be quantitative, categorical, time related and detailed (un-aggregated) thereby allowing data to be confirmed through more than one source.

  1. Quantitative, meaning that each listing has one or more numeric metrics associated with it.
  2. Categorical, meaning that data can be organized into a finite number of categories such as agricultural, off-grid, C&I, medical, residential, etc.
  3. Time related, meaning that each listing is linked to a function of time and is at least imprinted with the year of installation.
  4. Detailed, meaning that each listing combines single data points, captured in columns, that could be used to validate or confirm the various data sources, and contains a project name that can act as a point of reference.
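The four criteria above, together with confirmation by more than one source, can be expressed as a simple check. This is a sketch under stated assumptions: the field names, the sector list and the rule that “Unknown” does not count as a category are illustrative choices, not the PQRS implementation.

```python
# Sector names as given under the "Sector" column descriptor; excluding
# "Unknown" here is an assumption made for this sketch.
KNOWN_SECTORS = {"Agricultural", "C & I", "Residential"}


def ideal_data_checks(listing: dict) -> dict:
    """Evaluate one listing against the four preferred metrics plus source count."""
    size = listing.get("size_kwp")
    annum = listing.get("annum")
    name = listing.get("project_name") or ""
    sources = listing.get("sources") or []
    return {
        "quantitative": isinstance(size, (int, float)) and size > 0,
        "categorical": listing.get("sector") in KNOWN_SECTORS,
        "time_related": isinstance(annum, int),
        "detailed": bool(name.strip()),
        "multiple_sources": len(set(sources)) > 1,
    }


def is_ideal(listing: dict) -> bool:
    """A listing counts as ideal only when every check passes."""
    return all(ideal_data_checks(listing).values())


# Hypothetical listing for illustration only.
example = {
    "project_name": "Example Site",
    "size_kwp": 100.0,
    "sector": "C & I",
    "annum": 2016,
    "sources": ["installer", "supplier"],
}
print(is_ideal(example))
```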

11. Columns

When data collection started in 2014, only 5 columns were populated and included in the original research. Within the spreadsheet, labeled columns describe the parameters associated with each listing, and there are now more than 37 columns for each listing, as labeled below. Not all columns contain data: some columns were added at the request of stakeholders requiring specific information, and are added as and when required and labeled accordingly.

For more information on column descriptors, please see the section labeled “Column Descriptors”.

The term “Unknown”

PQRS aims to capture only actual installations. When installations are listed, certain cells in the row may be left blank. The listing may therefore indicate the name of the installation as well as the capacity, but the province, module used, type of application or other aspects may be left blank. Within the data, the empty cells are populated with the term “unknown”.
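Populating blank cells can be sketched as follows. The column names and the sample row are hypothetical, and treating whitespace-only text as blank is an assumption of this example.

```python
def fill_unknown(row: dict, columns: list) -> dict:
    """Return a copy of the row with blank or missing cells set to 'unknown'."""
    filled = {}
    for column in columns:
        value = row.get(column)
        is_blank = value is None or (isinstance(value, str) and not value.strip())
        filled[column] = "unknown" if is_blank else value
    return filled


# Hypothetical listing with the province blank and the module brand missing.
columns = ["project_name", "size_kwp", "province", "module_brand"]
row = {"project_name": "Example Site", "size_kwp": 50.0, "province": ""}
print(fill_unknown(row, columns))
```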

 

This table shows columns 1 to 37 in the database and their respective column descriptors:

Column nr.  Column Descriptor     Column nr.  Column Descriptor
1           Range nr.             20          Inverter Brand
2           Nr of systems         21          Module brand
3           Country               22          Support mech brand
4           Province              23          Charge controller
5           Project name          24          Battery brand
6           Reference nr          25          Battery bank
7           Sector                26          Physical address
8           Activity              27          Co-ordinates
9           Size                  28          Date checker
10          Stage                 29          Verification
11          Date commissioned     30          Supplier
12          Date Checker          31          Inverter ref
13          Annum                 32          City / Town / Suburb/Municipality
14          Type                  33          Installation detail
15          Project Lead 1        34          Funding
16          Project Lead 2        35          Exchange rate at the time
17          Installer 1           36          Rand per watt
18          Installer 2           37          System Cost ex VAT
19          Project Manager

1.1. Column Descriptors

This section describes the general terms and phrases used as column headers, which define the label for each column in data and reports when these are circulated or published.

1.1.1. Range

PQRS draws a distinction between 6 ranges of PV installations, which can be seen in table 3 and are visually represented in diagram 2. The ranges show that, for the same or similar generation capacity, there are more systems in range 5 than in range 0, hence the pyramid used as a visual representation.

1.1.2. Nr of systems

Number of systems may reflect instances where, for example, a development included a number of homes and the organization submitting the data did not want to list each installation individually. The default value is 1 unless the listing represents multiple installations. Where the listing represents 10 installations, the value in the column would be 10.
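The distinction between listings and systems can be illustrated with a short count. The rows below are hypothetical; they are chosen only to show how 3 listings can represent 17 systems, and are not the rows of Table 1.

```python
def count_systems(listings: list) -> int:
    """Sum the 'Nr of systems' column, defaulting to 1 where it is absent."""
    return sum(listing.get("nr_of_systems", 1) for listing in listings)


# Hypothetical listings: one single site and two bulk submissions.
listings = [
    {"reference_nr": "X001", "nr_of_systems": 1},
    {"reference_nr": "X002", "nr_of_systems": 10},
    {"reference_nr": "X003", "nr_of_systems": 6},
]
print(len(listings), "listings represent", count_systems(listings), "systems")
```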

1.1.3. Country

Reflects the country in which the installation was done.

1.1.4. Province

Reflects the province in which the installation was done.

1.1.5. Project name

A reference name for the project. In some cases the contractor submitting the listing may alter the name to make the project “anonymous”, thereby not disclosing the name of the client, as some clients wish to remain anonymous for a number of reasons.

1.1.6. Reference nr.

Unique reference numbers are assigned to each listing. This number is generated by PQRS except when it is provided by an online portal, in which case the number assigned to the listing by the online portal is used as a reference.

1.1.7. Sector

Four types of sectors are provided for in the data, i.e. Agricultural; C & I; Residential; Unknown.

1.1.8. Activity

Describes the activity of the application. Under C & I, for example, an installation could be sub-categorized as Retail, Medical, Factory, etc.

1.1.9. Size

Indicates system size in kWp. This value is an indicator of the module generation capacity and not the inverter rating installed on site.

1.1.10. Stage

Some projects are constructed over a period of time. A client may build a 100kWp system in three stages because of the way in which capital is made available, or for a variety of other reasons. Each stage would be captured individually, for example Stage 1 = 30kWp, Stage 2 = 50kWp and Stage 3 = 20kWp.
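Because each stage is captured as its own row, the total size of a project is the sum of its stages. The sketch below uses the stage sizes from the example above; the project name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def total_by_project(stage_rows: list) -> dict:
    """Sum the size of every stage captured for each project."""
    totals = defaultdict(float)
    for project, _stage, size_kwp in stage_rows:
        totals[project] += size_kwp
    return dict(totals)


# (project name, stage, size in kWp), one row per stage.
stage_rows = [
    ("Example Site", 1, 30.0),
    ("Example Site", 2, 50.0),
    ("Example Site", 3, 20.0),
]
print(total_by_project(stage_rows))
```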

1.1.11. Date commissioned

The month and year in which the installation was completed. PQRS generally uses only the Annum value in reports. When a contractor submits information and the month is not known, June is used as a default.
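The June default, and the way the Date Checker and Annum columns relate to this value, can be sketched as follows. The function names, the use of the first day of the month and the text format are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

DEFAULT_MONTH = 6  # June is used when the month is not known
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def date_commissioned(year: int, month: Optional[int] = None) -> date:
    """Build the commissioning date, falling back to June for a missing month."""
    # The day is not recorded, so the first of the month is assumed here.
    return date(year, month or DEFAULT_MONTH, 1)


def date_checker(commissioned: date) -> str:
    """Re-write the commissioning date in a text format."""
    return f"{MONTHS[commissioned.month - 1]} {commissioned.year}"


commissioned = date_commissioned(2016)  # month unknown
print(date_checker(commissioned), "| Annum:", commissioned.year)
```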

1.1.12. Date Checker

The information written into the date commissioned column that has been re-written in a text format.

1.1.13. Annum

Provides the year in which the system was installed.

1.1.14. Type

  • Backup: a system that does not contain PV at the time of installation but has the capacity for PV to be connected at any stage. This category also contains systems listed as UPSs.
  • Off Grid: systems installed in the absence of municipal or utility power.
  • Hybrid: a PV system with an inverter that can be configured as off-grid, grid tied or hybrid, and that is installed in the vicinity of a grid.
  • Grid tied: typically a PV system without storage that is connected to a grid and has the ability to feed back into the grid.
  • Mixed: listings that contain multiple installations within a single listing and include a combination of Backup, Hybrid and Off Grid systems. This type of listing occurs where a contractor submits information in bulk without separating listings or making individual system information available.
  • Unknown

1.1.15. Project Lead 1 & 2

This label is assigned to stakeholders involved in a solar PV project. It is a generic name that describes the relationship of a contractor, developer, salesman, EPC or any other type of stakeholder to a solar PV installation. The JBCC contracts use the terms “principal agent” and “agent” to describe the various responsibilities of stakeholders in a project.

JBCC definition: “AGENT: The entity dealing with specific aspects of the works appointed by the employer or delegated by the principal agent”. With PV projects there are a number of stakeholders, and each stakeholder may fulfill a different and unique function. This is similar to the construction industry, but differs in the sense that there are parties that may be selling. The South African business environment is diverse, with contractors and subcontractors fulfilling different roles for different applications, types and sizes of systems. All stakeholders are broken down into 5 categories over 3 basic responsibilities, so 5 stakeholders can be listed against a single project. All 5 stakeholders are then credited with the same system, as all of them were involved in the project in one way or another.

When a project is listed, however, the same stakeholder is assigned all three roles by default.

The 3 responsibilities are sales and design, installation, and project management. Sales and design are labeled Project Lead.

 

1.1.16. Project Lead 2

See Project Lead 1

1.1.17. Installer 1

Used to describe the installer involved in the project

1.1.18. Installer 2

Used to describe the installer appointed as a subcontractor on the same project.

1.1.19. Project Manager

Used to describe the project manager responsible for managing the project dynamics, operations and construction phases of a project.

1.1.20. Inverter Brand

Describes the brand name of the inverter

1.1.21. Module brand

Describes the brand name of the module

1.1.22. Support mechanism brand

Describes the brand name of the mounting structure

1.1.23. Charge controller

Describes the brand name of the charge controller

1.1.24. Battery brand

Describes the brand name of the battery used in the installation.

1.1.25. Battery bank

Describes the number of batteries and the energy storage capacity.

1.1.26. Physical address

Where the installation is based

1.1.27. Co-ordinates

GPS or spatial co-ordinates

1.1.28. Date checker

A duplication of the Date Checker column (column 12, described in 1.1.12).

1.1.29. Verification

Verification tests the source and confirms that the installation described in the specific listing exists.

1.1.30. Supplier

The supplier information is indicated in this column

1.1.31. Inverter ref

The inverter reference or model number is included

1.1.32. City / Town / Suburb/Municipality

Municipal region

1.1.33. Installation detail

Ground mount, rooftop, carport and other interesting site related information that should be noted

1.1.34. Funding

PPA, Cash, Rent to own, financed, outright purchase, added to bond, etc

1.1.35. Exchange rate at the time

Used when cost of system is available

1.1.36. Cost in currency per watt

Cost per watt in the currency used in the country of installation.
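A cost per watt figure can be derived from the System Cost ex VAT and Size columns, as sketched below. The source does not state the formula, so dividing the system cost by the DC size in watts is an assumption, and the function name and figures are hypothetical.

```python
def cost_per_watt(system_cost_ex_vat: float, size_kwp: float) -> float:
    """Return the cost per watt (peak) in the currency of the system cost."""
    if size_kwp <= 0:
        raise ValueError("system size must be positive")
    # Size is captured in kWp, so convert to Wp before dividing.
    return system_cost_ex_vat / (size_kwp * 1000.0)


# Hypothetical 10 kWp system costing 150 000 (ex VAT) in local currency.
print(round(cost_per_watt(150_000.0, 10.0), 2))
```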

1.1.37. System Cost ex VAT

Cost for the complete system

1.1.38. Inspection nr.

Used in cases where an inspection was done on site.