# UK Civil Society Almanac 2018 / How to Create a Database for Charity Commission Data

## What Data is Available

NCVO use the data obtained from the Charity Commission to produce the Almanac and since this data is publicly available, we have put together this guide to help you make the data accessible to your organisation.

The NCVO Research Team receive a lot of requests for information about charities and so having a database containing monthly data updates means we have the ability to answer questions at our fingertips.  We can answer a range of queries on charity accounts such as as income, expenditure or classifications with the most up-to-date information that the Charity Commission hold.  It also means that we can analyse and break down the data in whichever way suits our needs at any given time.

The contents of each table in the data files will be explained in the instructions below but to get you started, here is some general information on what to expect from the Charity Commission:

Information on file types

Each month, the Charity Commission produce three folders that contain the ability to access the latest financial information received from each registered charity.  Because charity accounts do not have to adhere to normal strict guidelines on when they return their accounts after the end of the financial year (as long as it’s returned within the ten months after this date), the Charity Commission can receive information for a given financial year in the months following 31st March.  This means that the Charity Commission give a regular update on what they receive and that is what the three files they place on their site refer to.

The web page that contains the files can be found on their site here.  The three files are called:

• Charity register extract (Zip file, 120Mb)
• SIR data (Zip file, 43.1Mb)
• Table build scripts (Zip file, 8Kb)

The zipped folder that contains the actual data and the one that NCVO downloads is called ‘Charity register extract’.  The files inside are in the file format ‘.bcp’, which stands for Bulk Copy Program.  Without getting too technical, all we need to know about this file format is that it is used to import data into a Microsoft SQL Server database.  However, MS SQL Server is an enterprise piece of software and as such is chargeable to all organisations, including charities.  Microsoft do produce a free, “Express”, version of the software but it does have size limitations (a maximum of 10GB of data storage per database) that can have significant impact later on if data capacity needs are not reasonably or properly anticipated.

Presumably, the Charity Commission use MS SQL Server but here at NCVO, we do not. The reason being that there are a number of excellent “Open Source” alternatives that can be utilised instead. One such ‘Open Source’ database server is called ‘MySQL’ and it is this that NCVO uses once the .bcp files have been translated into MySQL format.

If you are using MS SQL Server then you do still have the option to import the data in the .bcp source files using the Command Prompt, or MS SQL Server itself, to a MS SQL Server database but we do not use this method so unfortunately are unable to give you any guidance. Please do this at your own caution; our instructions relate only to MySQL.

## Instructions to Create a Database

David Kane, a former senior researcher at NCVO, has created a ‘GitHub repository’ that contains the necessary files, a link to the Charity Commission web page, explanations of the tables and, finally, instructions on how to create the CSV spreadsheets and a database using Python: You can find the repository here – it is free to download the files.  The instructions below have been taken from those found in David’s repository but with a few extra bits to help clarify the process.

If you have some technical expertise then you can simply follow the instructions found in David’s repository to access the data.

Need more help? We have described the process in more detail below.

We have now updated the scripts to be compatible with Python version 3.5 but you can continue to use them with versions of 2.7 if you prefer

### Converting from .bcp to .csv

The ‘Charity register extract’ file, once downloaded, is called ‘RegPlusExtract_xx_20yy’ (where ‘xx’ is the month and ‘yy’ is the year) and contains a list of .bcp files, as explained above.  Initially, we translate the files into CSV files and from there we upload to our database.  To do this, we use various scripts, made with the Python programming language, that read the data in the ‘.bcp’ files and output it into spreadsheets, and from these into tables that we have created in our Almanac database.  We have tables that correspond to the file names – for example, we have ‘cc_financial’ that relates to ‘extract_financial.bcp’ – and we have custom tables that have been made to accommodate a combination of the others.  The instructions below relate to the tables for which the Charity Commission provide data.

#### Steps

First use these python scripts to extract the table files from the ZIP and transform into CSV files. (Never used python before? Start here)

2. Open the command line and navigate to the folder where you want to store the ‘import.py’ script
• Hold the ‘Shift’ key and click the right mouse button on the folder to bring up the menu
• Select ‘Open command window here’
3. At the cursor, type:

python import.py \path\to\download\zip\RegPlusExtract_xx_20yy.zip

This command will ask the ‘import.py’ script to find the zip folder downloaded from the Charity Commission on your computer, convert the .bcp files to CSV files and then place them in the folder in which you opened the command window.

Notes on executing the script

• Please remember to type the correct file name and folder path in the above command, and take note of the space after ‘import.py’ and before you tell the program where the folder is located. For example, if you were downloading and using October 2016’s Charity Commission zip folder and you placed the script and the zip folder in folder called ‘cc_downloads’ on your C drive then the following command would work:

python import.py C:\cc_downloads\RegPlusExtract_October_2016.zip

• If you are using a folder with spaces in its name then please use speech marks around the path and file name in the command, like so:

python import.py "C:\cc downloads folder\RegPlusExtract_October_2016.zip"

Example

As an example of how to store and execute the files, you could do the following:

• Create a folder called ‘import_charities’ on your ‘C’ drive and inside this folder create two additional folders called ‘script’ and ‘data’.  You should then place the import script in ‘script’ and the zip file in the relevant month folder (‘cc_201701’ for January 2017) inside the ‘data’ folder.
• Open the command line interface in the ‘script’ folder, type

 python import.py "C:\import_charities\data\cc_201701\RegPlusExtract_January_2017.zip"

The command line interface should look like this:

• Press the enter key on the keyboard

If you’re unsure about getting the correct file path then you can hold ‘Shift’ and right-click on the file to bring up the menu, then choose ‘Copy as path’ to copy to your clipboard and paste this into the command prompt next to ‘python import.py’ (to paste this, you’ll have to right click in the command prompt to bring up the copy and paste menu).

#### What the file contains

The Charity Commission data extract is in the form of a ZIP folder that contains 15 files. The included files are shown below but you can also find the Charity Commission’s guide to the fields found in each file here.

#### Field names

extract_charity : Gives the main details about each charity (one record per charity). Contains the contact information and address of each charity.

extract_registration : Details of the date that charities were registered and removed, along with a code representing the reason for removal. Codes can be looked up from extract_remove_ref table. Note that some charities have more than one entry in this table, if (for example) they were removed in error and then reregistered.

extract_name : Contains a record of any name the charity has informed the Charity Commission that it works under. This includes: former names that are no longer used, alternative names, acronyms and abbreviations, and names of subsidiary or merged organisations.

extract_class : Contains a record of each classification category associated with a charity. Categories cover three broad areas:

• theme (eg health or education)
• activity (eg providing services, providing health)
• beneficiaries (eg older people, animals)

Each charity can have multiple categories across these three areas, and there is no “primary” value available.

extract_trustee : Contains the name of the trustees associated with each charity

extract_objects : Contains the charitable objects of a charity. The objects are split into text strings, and there are multiple rows for each charity depending on how long their objects are. The objects then have to be concatenated to produce the full objects.

extract_charity_aoo : Contains structured geographical data relating to the areas in which the charity operates. Most of the codes relate to common geographical aras, such as Local Authorities, or countries, but there are some additional codes such as “Throughout England and Wales” or “Throughout Greater London”.

extract_ar_submit : Contains details of when the charity’s annual returns were submitted.

extract_main_charity : Contains one record for every main registered charity (but doesn’t include removed or subsidiary charities). This contains additional details such as their website, email address and company number.

extract_acct_submit : Contains a record of when the charity’s accounts were submitted.

extract_financial : Contains a row for each year for each charity, showing what its income and expenditure was. Tends to only include the last five years or so.

extract_partb : Contains a row for each year for each charity, with more detailed financial information. Charities only have to fill in this information if their income is greater than £500,000 in that year.

Also included are a number of reference tables which list codes and values for some of the fields used in the data.

extract_class_ref : Lookup reference for the Charity Commission classification scheme, used in the extract_class table.

extract_remove_ref : Lookup reference for codes used to indicate the reasons why a charity has been removed from the register of charities, used in the extract_registration table.

extract_aoo_ref : Lookup reference for codes used in the extract_charity_aoo table.

Note: Subsidiary vs main charity

The data in these tables relates to both subsidiary and main charities, although not every table has records for both. Subsidiary charities are attached to a main charity, and share their registration number (usually shown as regno). The main charity within each group has a subno of 0, whereas actual subsidiaries have subno greater than 0. In general, when analysing the data these subsidiaries are excluded.

#### Importing the data into a spreadsheet for analysis

Because the files form a relational database, they aren’t particularly suitable for analysis in a program that relies on single tables, such as Excel. If you want to look at the files in Excel, the best place to start is extract_charity.csv, which includes a record for every charity.

After opening the CSV in Excel, filter by subno = 0 to get a list of “main” charities (see note above).

To perform more sophisticated analyses it may be possible to open two files and use a VLOOKUP function to merge the two sources. But generally it is easier to import the data into a database and use it there.

### Creating the database structure

The data is designed to be used in a database. If you use a SQL database, such as MySQL, you can use the included table-definitions.sql file to create the needed files.  This should be run in an SQL client, such as phpMyAdmin or HeidiSQL (see below).  If you are new to SQL, then you can find the basics of it here.

### Importing the data into a database

You can then use something like PhpMyAdmin to import the data into the files. The SQL table definitions have been tested with MySQL but may be suitable for use with other SQL databases with some adjustment.

You can also use the included MS Access template to set up an Access database, which you can then import the CSV files into.

When navigating the database, generally the unique identifier is the regno field (the charity’s registration number). But make sure you also build relationships with the subno key which identifies subsidiary charities.

#### HeidiSQL

An alternative way to import the data into the database is to use an open source piece of software called HeidiSQL, which is an SQL client for you to manage the database server (i.e., your database/s).  It allows you to connect to multiple databases, manage what they do and query them as well.  One important function it has is the ability to import data from CSV files, which you have ready!

You can use the instructions found on this page to complete the importing.  It is fairly straightforward to do but there are a few things to know beforehand:

• You import the data table by table
• The source column names (the ones in your CSV files) need to be exactly the same as the destination column names (the ones setup using the table-definitions.sql  file from above)
• Truncate (delete) the data if you have data in the table from a previous erroneous import
• To do this, select ‘Truncate destination table before import’ under ‘Options’

After you are happy that the tables in the CSVs contains all the data in the correct columns with the correct names then you are ready to import.  You will need to do this separately for each table that you want to import to your database.

Once the CSV files have been imported into the correct tables then you have you created your very own Charity Commission charity database! Happy querying!!

Share:

Published: 21-06-2016

https://data.ncvo.org.uk/a/almanac18/how-to-create-a-database-for-charity-commission-data/