%%capture
%load_ext sql
%sql sqlite:///dbs/w02/library.db
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Week 02: Library Survey Tables Case Study
0. Creating the Library Survey Tables
In the US, the Institute of Museum and Library Services (IMLS) measures library activity as part of its annual Public Libraries Survey. The survey collects data from more than 9000 library administrative entities, defined by the survey as agencies that provide library services to a particular locality. Data includes the number of branches, staff, books, hours open per year, etc. To teach the concepts below, we will build three tables containing the data from the survey related to the years of 2016, 2017 and 2018. For doing so, we read from the CSV files downloaded from their website. More especifically, some columns will be selected in the process to reduce the amount of non used attributes.
We are running SQL queries in a Jupyter environment.
This will open our database library.db
for us. Don’t bother yourself with the config details. That is a trick to run the SQL queries in this environment.
Now we turn to our tables. We have to create 3 tables. Let’s start with the table for 2018.
%%sql
-- Creating the 2018 Public Libraries Survey table
-- We drop an old copy of the table, if it exists.
;
DROP TABLE IF EXISTS libraries_2018
CREATE TABLE libraries_2018 (
stabr text NOT NULL,
fscskey text CONSTRAINT fscskey_2018_pkey PRIMARY KEY,
libid text NOT NULL,
libname text NOT NULL,
address text NOT NULL,
city text NOT NULL,zip text NOT NULL,
county text NOT NULL,
phone text NOT NULL,
c_relatn text NOT NULL,
c_legbas text NOT NULL,
c_admin text NOT NULL,
c_fscs text NOT NULL,
geocode text NOT NULL,
lsabound text NOT NULL,
startdate text NOT NULL,
enddate text NOT NULL,
popu_lsa integer NOT NULL,
popu_und integer NOT NULL,
centlib integer NOT NULL,
branlib integer NOT NULL,
bkmob integer NOT NULL,8,2) NOT NULL,
totstaff numeric(
bkvol integer NOT NULL,
ebook integer NOT NULL,
audio_ph integer NOT NULL,
audio_dl integer NOT NULL,
video_ph integer NOT NULL,
video_dl integer NOT NULL,
ec_lo_ot integer NOT NULL,
subscrip integer NOT NULL,
hrs_open integer NOT NULL,
visits integer NOT NULL,
reference integer NOT NULL,
regbor integer NOT NULL,
totcir integer NOT NULL,
kidcircl integer NOT NULL,
totpro integer NOT NULL,
gpterms integer NOT NULL,
pitusr integer NOT NULL,
wifisess integer NOT NULL,
obereg text NOT NULL,
statstru text NOT NULL,
statname text NOT NULL,
stataddr text NOT NULL,10,7) NOT NULL,
longitude numeric(10,7) NOT NULL
latitude numeric(; )
* sqlite:///dbs/w02/library.db
Done.
Done.
[]
This is an empty table. To fill the table, we need to convert our CSV entries to entitites in our database.
Here I came up with another trick. I created a subprocess that opens SQLite3 and calls the command to convert a CSV file into a table inside the database. The code is as follows:
import subprocess
import tempfile
import os
# Paths and settings
= "dbs/w02/pls_fy2018_libraries.csv"
csv_path = "dbs/w02/library.db"
db_path = "libraries_2018"
table_name
# Step 1: Create a temporary CSV file without the header
with open(csv_path, 'r', encoding='utf-8') as original, tempfile.NamedTemporaryFile('w', delete=False, newline='', encoding='utf-8') as noheader:
next(original) # skip the header
for line in original:
noheader.write(line)= noheader.name
temp_csv_path
# Step 2: Build the SQLite shell command
= f"""
sqlite_cmd .mode csv
.import '{temp_csv_path}' {table_name}
"""
# Step 3: Run the command with subprocess
"sqlite3", db_path], input=sqlite_cmd, text=True)
subprocess.run([
# Step 4: Clean up the temp file
os.remove(temp_csv_path)
print(f"✅ Imported '{csv_path}' into '{table_name}' in '{db_path}' (header skipped)")
✅ Imported 'dbs/w02/pls_fy2018_libraries.csv' into 'libraries_2018' in 'dbs/w02/library.db' (header skipped)
Let’s see if the data was loaded? Let’s peak on the 10 first rows.
%%sql
* FROM libraries_2018
SELECT 10; LIMIT
* sqlite:///dbs/w02/library.db
Done.
stabr | fscskey | libid | libname | address | city | zip | county | phone | c_relatn | c_legbas | c_admin | c_fscs | geocode | lsabound | startdate | enddate | popu_lsa | popu_und | centlib | branlib | bkmob | totstaff | bkvol | ebook | audio_ph | audio_dl | video_ph | video_dl | ec_lo_ot | subscrip | hrs_open | visits | reference | regbor | totcir | kidcircl | totpro | gpterms | pitusr | wifisess | obereg | statstru | statname | stataddr | longitude | latitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AK | AK0001 | AK0001-002 | ANCHOR POINT PUBLIC LIBRARY | 34020 NORTH FORK ROAD | ANCHOR POINT | 99556 | KENAI PENINSULA | 9072355692 | NO | NP | SO | Y | CI1 | N | 7/1/17 | 6/30/18 | 2057 | 2040 | 1 | 0 | 0 | 0.67 | 18201 | 0 | 179 | 0 | 3320 | 0 | 0 | 9 | 1245 | 6032 | 4 | 1726 | 11316 | 2294 | 60 | 8 | 1495 | 1182 | 08 | 00 | 00 | 00 | -151.825 | 59.77965 |
AK | AK0002 | AK0002-011 | ANCHORAGE PUBLIC LIBRARY | 3600 DENALI STREET | ANCHORAGE | 99503 | ANCHORAGE | 9073432892 | NO | CO | MO | Y | MA1 | N | 1/1/17 | 12/31/17 | 295365 | 292940 | 1 | 4 | 0 | 76.65 | 370812 | 21587 | 30963 | 12992 | 76404 | 0 | 14 | 7311 | 11400 | 723272 | 54306 | 135828 | 1574942 | 553651 | 1924 | 165 | 126846 | 90135 | 08 | 00 | 00 | 00 | -149.876 | 61.18744 |
AK | AK0003 | AK0003-002 | ANDERSON COMMUNITY LIBRARY | 101 FIRST STREET | ANDERSON | 99744 | DENALI | 9075822628 | NO | CI | SO | Y | CI1 | N | 7/1/17 | 6/30/18 | 269 | 267 | 1 | 0 | 0 | 0.75 | 15314 | 0 | 201 | 0 | 1403 | 0 | 0 | 0 | 480 | 592 | 300 | 113 | 1137 | 264 | 4 | 5 | 225 | 0 | 08 | 00 | 00 | 00 | -149.187 | 64.34363 |
AK | AK0006 | AK0006-002 | KUSKOKWIM CONSORTIUM LIBRARY | 420 CHIEF EDDIE HOFFMAN HIGHWAY | BETHEL | 99559 | BETHEL | 9075434516 | NO | MJ | SO | Y | CI1 | N | 7/1/17 | 6/30/18 | 6135 | 6085 | 1 | 0 | 0 | 3 | 34860 | 21587 | 425 | 12992 | 3782 | 0 | 0 | 5 | 2040 | 51000 | 1530 | 1912 | 14067 | 3746 | 406 | 8 | 6600 | 5716 | 08 | 00 | 00 | 00 | -161.771 | 60.79114 |
AK | AK0007 | AK0007-002 | BIG LAKE PUBLIC LIBRARY | 3140 SOUTH BIG LAKE ROAD | BIG LAKE | 99652 | MATANUSKA-SUSITNA | 9078617635 | NO | CO | SO | Y | CO1 | N | 7/1/17 | 6/30/18 | 12847 | 12742 | 1 | 0 | 0 | 3 | 28698 | 21587 | 2140 | 12992 | 3805 | 0 | 0 | 47 | 2551 | 62249 | 887 | 2890 | 36670 | 15691 | 363 | 18 | 15022 | 8125 | 08 | 00 | 00 | 00 | -149.819 | 61.5475 |
AK | AK0008 | AK0008-002 | CANTWELL COMMUNITY LIBRARY | 1 SCHOOL ROAD | CANTWELL | 99729 | DENALI | 9077682372 | NO | NP | SO | Y | CI1 | N | 7/1/17 | 6/30/18 | 202 | 200 | 1 | 0 | 0 | 0.8 | 10376 | 21587 | 275 | 12992 | 562 | 0 | 0 | 15 | 720 | 2500 | 390 | 183 | 2828 | 1198 | 45 | 1 | 50 | 95 | 08 | 00 | 00 | 00 | -148.9 | 63.3912 |
AK | AK0011 | AK0011-002 | CHINIAK PUBLIC LIBRARY | 43318 SPRUCE WAY | CHINIAK | 99615 | KODIAK ISLAND | 9075120880 | NO | NP | SO | N | CI1 | N | 7/1/17 | 6/30/18 | 44 | 44 | 1 | 0 | 0 | 0 | 2300 | 21587 | 0 | 12992 | 257 | 0 | 0 | 18 | 132 | 302 | 0 | 39 | 465 | 301 | 27 | 2 | 75 | 220 | 08 | 00 | 00 | 00 | -152.231 | 57.61245 |
AK | AK0014 | AK0014-002 | COLD BAY PUBLIC LIBRARY | 10 BARANOV ROAD | COLD BAY | 99571 | ALEUTIANS EAST | 9075322878 | NO | NP | SO | Y | CI1 | N | 7/1/17 | 6/30/18 | 63 | 62 | 1 | 0 | 0 | 0.25 | 3226 | 0 | 445 | 0 | 2673 | 0 | 0 | 0 | 400 | 1080 | 31 | 36 | 121 | 11 | 5 | 4 | 235 | 893 | 08 | 00 | 00 | 00 | -160.693 | 55.69687 |
AK | AK0015 | AK0015-002 | COOPER LANDING COMMUNITY LIBRARY | 18511 BEAN CREEK ROAD | COOPER LANDING | 99572 | KENAI PENINSULA | 9075951241 | NO | NP | SO | N | CI1 | N | 7/1/17 | 6/30/18 | 265 | 263 | 1 | 0 | 0 | 0 | 4163 | 0 | 192 | 0 | 926 | 0 | 0 | 1 | 1092 | 2452 | 70 | 834 | 2232 | 772 | 11 | 4 | 395 | 357 | 08 | 00 | 00 | 00 | -149.823 | 60.49404 |
AK | AK0016 | AK0016-002 | CORDOVA PUBLIC LIBRARY | 601 FIRST STREET | CORDOVA | 99574 | VALDEZ-CORDOVA | 9074246667 | NO | CI | SO | Y | CI1 | N | 1/1/17 | 12/31/17 | 2360 | 2341 | 1 | 0 | 0 | 5 | 17888 | 21587 | 560 | 12992 | 1200 | 0 | 0 | 48 | 2575 | 20844 | 1024 | 2527 | 14435 | 4298 | 282 | 6 | 5357 | 6997 | 08 | 00 | 00 | 00 | -145.758 | 60.54716 |
Now, lets count how many rows are in this table.
%%sql
*) FROM libraries_2018; SELECT count(
* sqlite:///dbs/w02/library.db
Done.
count(*) |
---|
9261 |
The same should be done for the years 2016 and 2017. The code below is long, but identical to what we’ve done above.
%%sql
-- Creating the 2016 Public Libraries Survey table
-- We drop an old copy of the table, if it exists.
;
DROP TABLE IF EXISTS libraries_2016;
DROP TABLE IF EXISTS libraries_2017
CREATE TABLE libraries_2016 (
stabr text NOT NULL,
fscskey text CONSTRAINT fscskey_2018_pkey PRIMARY KEY,
libid text NOT NULL,
libname text NOT NULL,
address text NOT NULL,
city text NOT NULL,zip text NOT NULL,
county text NOT NULL,
phone text NOT NULL,
c_relatn text NOT NULL,
c_legbas text NOT NULL,
c_admin text NOT NULL,
c_fscs text NOT NULL,
geocode text NOT NULL,
lsabound text NOT NULL,
startdate text NOT NULL,
enddate text NOT NULL,
popu_lsa integer NOT NULL,
popu_und integer NOT NULL,
centlib integer NOT NULL,
branlib integer NOT NULL,
bkmob integer NOT NULL,8,2) NOT NULL,
totstaff numeric(
bkvol integer NOT NULL,
ebook integer NOT NULL,
audio_ph integer NOT NULL,
audio_dl integer NOT NULL,
video_ph integer NOT NULL,
video_dl integer NOT NULL,
ec_lo_ot integer NOT NULL,
subscrip integer NOT NULL,
hrs_open integer NOT NULL,
visits integer NOT NULL,
reference integer NOT NULL,
regbor integer NOT NULL,
totcir integer NOT NULL,
kidcircl integer NOT NULL,
totpro integer NOT NULL,
gpterms integer NOT NULL,
pitusr integer NOT NULL,
wifisess integer NOT NULL,
obereg text NOT NULL,
statstru text NOT NULL,
statname text NOT NULL,
stataddr text NOT NULL,10,7) NOT NULL,
longitude numeric(10,7) NOT NULL
latitude numeric(;
)
CREATE TABLE libraries_2017 (
stabr text NOT NULL,
fscskey text CONSTRAINT fscskey_2018_pkey PRIMARY KEY,
libid text NOT NULL,
libname text NOT NULL,
address text NOT NULL,
city text NOT NULL,zip text NOT NULL,
county text NOT NULL,
phone text NOT NULL,
c_relatn text NOT NULL,
c_legbas text NOT NULL,
c_admin text NOT NULL,
c_fscs text NOT NULL,
geocode text NOT NULL,
lsabound text NOT NULL,
startdate text NOT NULL,
enddate text NOT NULL,
popu_lsa integer NOT NULL,
popu_und integer NOT NULL,
centlib integer NOT NULL,
branlib integer NOT NULL,
bkmob integer NOT NULL,8,2) NOT NULL,
totstaff numeric(
bkvol integer NOT NULL,
ebook integer NOT NULL,
audio_ph integer NOT NULL,
audio_dl integer NOT NULL,
video_ph integer NOT NULL,
video_dl integer NOT NULL,
ec_lo_ot integer NOT NULL,
subscrip integer NOT NULL,
hrs_open integer NOT NULL,
visits integer NOT NULL,
reference integer NOT NULL,
regbor integer NOT NULL,
totcir integer NOT NULL,
kidcircl integer NOT NULL,
totpro integer NOT NULL,
gpterms integer NOT NULL,
pitusr integer NOT NULL,
wifisess integer NOT NULL,
obereg text NOT NULL,
statstru text NOT NULL,
statname text NOT NULL,
stataddr text NOT NULL,10,7) NOT NULL,
longitude numeric(10,7) NOT NULL
latitude numeric(; )
* sqlite:///dbs/w02/library.db
Done.
Done.
Done.
Done.
[]
# Importing the 2017 Public Libraries Survey data
# Paths and settings
= "dbs/w02/pls_fy2017_libraries.csv"
csv_path = "dbs/w02/library.db"
db_path = "libraries_2017"
table_name
# Step 1: Create a temporary CSV file without the header
with open(csv_path, 'r', encoding='utf-8') as original, tempfile.NamedTemporaryFile('w', delete=False, newline='', encoding='utf-8') as noheader:
next(original) # skip the header
for line in original:
noheader.write(line)= noheader.name
temp_csv_path
# Step 2: Build the SQLite shell command
= f"""
sqlite_cmd .mode csv
.import '{temp_csv_path}' {table_name}
"""
# Step 3: Run the command with subprocess
"sqlite3", db_path], input=sqlite_cmd, text=True)
subprocess.run([
# Step 4: Clean up the temp file
os.remove(temp_csv_path)
print(f"✅ Imported '{csv_path}' into '{table_name}' in '{db_path}' (header skipped)")
✅ Imported 'dbs/w02/pls_fy2017_libraries.csv' into 'libraries_2017' in 'dbs/w02/library.db' (header skipped)
# Importing the 2016 Public Libraries Survey data
# Paths and settings
= "dbs/w02/pls_fy2016_libraries.csv"
csv_path = "dbs/w02/library.db"
db_path = "libraries_2016"
table_name
# Step 1: Create a temporary CSV file without the header
with open(csv_path, 'r', encoding='utf-8') as original, tempfile.NamedTemporaryFile('w', delete=False, newline='', encoding='utf-8') as noheader:
next(original) # skip the header
for line in original:
noheader.write(line)= noheader.name
temp_csv_path
# Step 2: Build the SQLite shell command
= f"""
sqlite_cmd .mode csv
.import '{temp_csv_path}' {table_name}
"""
# Step 3: Run the command with subprocess
"sqlite3", db_path], input=sqlite_cmd, text=True)
subprocess.run([
# Step 4: Clean up the temp file
os.remove(temp_csv_path)
print(f"✅ Imported '{csv_path}' into '{table_name}' in '{db_path}' (header skipped)")
✅ Imported 'dbs/w02/pls_fy2016_libraries.csv' into 'libraries_2016' in 'dbs/w02/library.db' (header skipped)
Let’s check the rows count for each of the tables now.
%%sql
2016 AS 'Year', count(*) AS NumRows FROM libraries_2016
SELECT
UNION ALL2017, count(*) AS NumRows FROM libraries_2017
SELECT
UNION ALL2018, count(*) AS NumRows FROM libraries_2018; SELECT
* sqlite:///dbs/w02/library.db
Done.
Year | NumRows |
---|---|
2016 | 9252 |
2017 | 9245 |
2018 | 9261 |
1.1 Counting Distinct Values in a Column
We can combine DISTINCT
with combinations of values like count()
. This will return a count of distinct values from a column.
%%sql
SELECT count(libname); FROM libraries_2017
* sqlite:///dbs/w02/library.db
Done.
count(libname) |
---|
9245 |
%%sql
SELECT count(DISTINCT libname); FROM libraries_2017
* sqlite:///dbs/w02/library.db
Done.
count(DISTINCT libname) |
---|
8455 |
Removing duplicates reduces the number of library names. It happens that some libraries share their names with other agencies. We will see how we can see the duplicates soon.
1.2 Finding Maximum and Minimum Values using max() and min()
The max()
and min()
funtions return the largest and smallest values in a column. In our case, we will be using them to detect some annomalies in the data. Let’s check the number of visits for 2018.
%%sql
max(visits), min(visits)
SELECT ; FROM libraries_2018
* sqlite:///dbs/w02/library.db
Done.
max(visits) | min(visits) |
---|---|
16686945 | -3 |
The value of -3 is unexpected. It happens that the negative values for this database indicate “not applicable”, used when a library has closed temporarily or permanently. This is a terrible way of coding missing data, but we can cope with that.
We could use WHERE min(visits > 0)
to guarantee that only positive numbers are in the game.
A better way of handling these situations would be using NULL values and adding a column to hold text explaining why the value is NULL.
1.3 Aggregating with GROUP BY
We can use the GROUP BY
clause to group rows that have the same values in specified columns into summary rows. The GROUP BY
statement is often used with aggregate functions like COUNT()
, SUM()
, AVG()
, etc., to perform operations on each group of data.
Let’s see how many libraries are in each state. But first, let’s check which states are in the database.
%%sql
SELECT stabr
FROM libraries_2018
GROUP BY stabr; ORDER BY stabr
* sqlite:///dbs/w02/library.db
Done.
stabr |
---|
AK |
AL |
AR |
AS |
AZ |
CA |
CO |
CT |
DC |
DE |
FL |
GA |
GU |
HI |
IA |
ID |
IL |
IN |
KS |
KY |
LA |
MA |
MD |
ME |
MI |
MN |
MO |
MP |
MS |
MT |
NC |
ND |
NE |
NH |
NJ |
NM |
NV |
NY |
OH |
OK |
OR |
PA |
RI |
SC |
SD |
TN |
TX |
UT |
VA |
VI |
VT |
WA |
WI |
WV |
WY |
Notice there are no duplicates in state names. Those are the 50 states plus Washington D.C. and several U.S. territories. We don’t have to worry about duplicates in this case.
We could also have grouped by city and state, using two columns instead of one. Let’s see how would that work.
%%sql
SELECT city, stabr
FROM libraries_2018
GROUP BY city, stabr
ORDER BY city, stabr20; LIMIT
* sqlite:///dbs/w02/library.db
Done.
city | stabr |
---|---|
ABBEVILLE | AL |
ABBEVILLE | LA |
ABBEVILLE | SC |
ABBOTSFORD | WI |
ABERDEEN | ID |
ABERDEEN | SD |
ABERNATHY | TX |
ABILENE | KS |
ABILENE | TX |
ABINGDON | IL |
ABINGDON | VA |
ABINGTON | CT |
ABINGTON | MA |
ABINGTON | PA |
ABIQUIU | NM |
ABSECON | NJ |
ACCOMAC | VA |
ACKERMAN | MS |
ACKLEY | IA |
ACTON | MA |
We will limit the number of rows to 10, so we can see the result. The LIMIT
clause is used to specify the maximum number of records to return. As you can see, the results are sorted by the first column, which is the city name and then by the state name. The ORDER BY
clause is used to sort the result set in ascending or descending order. By default, it sorts in ascending order. We could have used the DESC
keyword to sort in descending order. I will leave this as an exercise for you.
By combinint GROUP BY
with an aggregate function, we can get the number of libraries in each state. Let’s see how many libraries are in each state.
%%sql
*) as total
SELECT stabr, count(
FROM libraries_2018
GROUP BY stabr
ORDER BY total DESC20; LIMIT
* sqlite:///dbs/w02/library.db
Done.
stabr | total |
---|---|
NY | 756 |
IL | 623 |
TX | 560 |
IA | 544 |
PA | 451 |
MI | 398 |
WI | 381 |
MA | 369 |
KS | 325 |
NJ | 295 |
ME | 263 |
OH | 251 |
NE | 245 |
IN | 236 |
NH | 225 |
AL | 224 |
CA | 218 |
CT | 192 |
TN | 186 |
VT | 184 |
This is returning the values in the stabr
column and the count of how many rows have a given stabr
value. The COUNT(*)
function counts all rows in each group. The GROUP BY
clause groups the result set by the stabr
column, and the ORDER BY
clause sorts the result set by the count of libraries in descending order.
Library agencies can have many branches. Not all of them will provide an outlet for the public to walk in and borrow books. Some of them are just a warehouse for books, or a place to store the books that are not in use. We can see how many branches are in each state by checking the centlib
and branlib
columns. You can use the SUM()
function in this case. This will be left as an exercise for you.
We can also use GROUP BY
to group by multiple columns. For example, we can verify how many libraries changed their addresses by state. The column stataddr
contains a code indicating whether the agency’s address changed or not. The code is as follows:
- 00 - No change
- 07 - Moved to a new address
- 15 - Minor address change
The code below will show how many libraries changed their address by state.
%%sql
*) as total
SELECT stabr, stataddr, count(
FROM libraries_2018
GROUP BY stabr, stataddr
ORDER BY stabr, stataddr20; LIMIT
* sqlite:///dbs/w02/library.db
Done.
stabr | stataddr | total |
---|---|---|
AK | 00 | 82 |
AL | 00 | 220 |
AL | 07 | 3 |
AL | 15 | 1 |
AR | 00 | 58 |
AR | 07 | 1 |
AR | 15 | 1 |
AS | 00 | 1 |
AZ | 00 | 88 |
AZ | 15 | 1 |
CA | 00 | 180 |
CA | 07 | 2 |
CA | 15 | 36 |
CO | 00 | 113 |
CT | 00 | 190 |
CT | 07 | 1 |
CT | 15 | 1 |
DC | 00 | 1 |
DE | 00 | 21 |
FL | 00 | 79 |
As expected, most libraries did not change their address.
📙 Visits over the years
We saw before some negative numbers in the visits
column due to the way data was collected. Now we turn to see the trends in library visits over the years. We will use the SUM()
function to get the total number of visits for each year. The code below will show how many visits were made in each year. We will discard the negative values for this analysis. The SUM()
function will ignore the negative values, so we don’t have to worry about them.
%%sql
sum(visits) AS visits_2018
SELECT
FROM libraries_2018> 0; WHERE visits
* sqlite:///dbs/w02/library.db
Done.
visits_2018 |
---|
1292348697 |
%%sql
sum(visits) AS visits_2017
SELECT
FROM libraries_2017> 0; WHERE visits
* sqlite:///dbs/w02/library.db
Done.
visits_2017 |
---|
1319803999 |
%%sql
sum(visits) AS visits_2016
SELECT
FROM libraries_2016> 0; WHERE visits
* sqlite:///dbs/w02/library.db
Done.
visits_2016 |
---|
1355648987 |
We went from 1.36 billion visits in 2016 to 1.32 billion in 2017, and then to 1.29 billion in 2018. This is a decrease of about 5% over the three years.
But we know that the number of libraries change as they close or open. To have a better idea of the trend, we will only consider libraries present in the three years. We can do this by using the INNER JOIN
clause. The code below will show how many visits were made in each year, but only for libraries that were present in all three years.
%%sql
sum(pls18.visits) AS visits_2018,
SELECT sum(pls17.visits) AS visits_2017,
sum(pls16.visits) AS visits_2016
FROM libraries_2018 AS pls18= pls17.fscskey
JOIN libraries_2017 AS pls17 ON pls18.fscskey = pls16.fscskey
JOIN libraries_2016 AS pls16 ON pls18.fscskey >= 0
WHERE pls18.visits >= 0
AND pls17.visits >= 0; AND pls16.visits
* sqlite:///dbs/w02/library.db
Done.
visits_2018 | visits_2017 | visits_2016 |
---|---|---|
1278148838 | 1319325387 | 1355078384 |
The results didn’t change much. We can also take a look at the number of wifi accesses the year in the same way we did for the visitors.
%%sql
sum(pls18.wifisess) AS wifisess_2018,
SELECT sum(pls17.wifisess) AS wifisess_2017,
sum(pls16.wifisess) AS wifisess_2016
FROM libraries_2018 AS pls18= pls17.fscskey
JOIN libraries_2017 AS pls17 ON pls18.fscskey = pls16.fscskey
JOIN libraries_2016 AS pls16 ON pls18.fscskey >= 0
WHERE pls18.wifisess >= 0
AND pls17.wifisess >= 0; AND pls16.wifisess
* sqlite:///dbs/w02/library.db
Done.
wifisess_2018 | wifisess_2017 | wifisess_2016 |
---|---|---|
349767271 | 311336231 | 234926102 |
Even though the number of visitors decreased, the number of wifi accesses increased. This is a clear indication that people are using the library for more than just borrowing books. They are also using it as a place to study and work.
To investigate if the decrease in the number of visitors is a trend in all regions, we must group by state. The code below will show how many visits were made in each year, but only for libraries that were present in all three years.
%%sql
SELECT pls2018.stabr,sum(pls2018.visits) AS visits_2018,
sum(pls2017.visits) AS visits_2017,
sum(pls2016.visits) AS visits_2016,
round ( (
sum(pls2018.visits) - sum(pls2017.visits) ) * 1.0 / sum(pls2017.visits) ) * 100, 1
(
) AS visits_ratio_2018_2017,round ( ( (sum(pls2017.visits) - sum(pls2016.visits) ) * 1.0 /
sum(pls2016.visits) ) * 100, 1) AS visits_ratio_2017_2016
FROM libraries_2018 AS pls2018= pls2017.fscskey
JOIN libraries_2017 AS pls2017 ON pls2018.fscskey = pls2016.fscskey
JOIN libraries_2016 AS pls2016 ON pls2018.fscskey >= 0
WHERE pls2018.visits >= 0
AND pls2017.visits >= 0
AND pls2016.visits
GROUP BY pls2018.stabr; ORDER BY visits_ratio_2018_2017 DESC
* sqlite:///dbs/w02/library.db
Done.
stabr | visits_2018 | visits_2017 | visits_2016 | visits_ratio_2018_2017 | visits_ratio_2017_2016 |
---|---|---|---|---|---|
SD | 3824804 | 3699212 | 3722376 | 3.4 | -0.6 |
MT | 4332900 | 4215484 | 4298268 | 2.8 | -1.9 |
FL | 68423689 | 66697122 | 70991029 | 2.6 | -6.0 |
ND | 2216377 | 2162189 | 2201730 | 2.5 | -1.8 |
ID | 8179077 | 8029503 | 8597955 | 1.9 | -6.6 |
DC | 3632539 | 3593201 | 3930763 | 1.1 | -8.6 |
UT | 15326963 | 15295494 | 16096911 | 0.2 | -5.0 |
NH | 7045010 | 7028800 | 7236567 | 0.2 | -2.9 |
ME | 6746380 | 6731768 | 6811441 | 0.2 | -1.2 |
DE | 4122181 | 4117904 | 4125899 | 0.1 | -0.2 |
OK | 13399265 | 13491194 | 13112511 | -0.7 | 2.9 |
WY | 3338772 | 3367413 | 3536788 | -0.9 | -4.8 |
MA | 39926583 | 40453003 | 40427356 | -1.3 | 0.1 |
WA | 37338635 | 37916034 | 38634499 | -1.5 | -1.9 |
MN | 22952388 | 23326303 | 24033731 | -1.6 | -2.9 |
NM | 6908686 | 7036582 | 7178428 | -1.8 | -2.0 |
VA | 33913162 | 34563079 | 35649602 | -1.9 | -3.0 |
KS | 13483333 | 13737900 | 13699223 | -1.9 | 0.3 |
NY | 97921323 | 100012193 | 103081304 | -2.1 | -3.0 |
WI | 30097183 | 30865470 | 31442577 | -2.5 | -1.8 |
AL | 14188647 | 14583055 | 15637164 | -2.7 | -6.7 |
MI | 44758918 | 46052561 | 46734166 | -2.8 | -1.5 |
CO | 31085356 | 31975615 | 32011432 | -2.8 | -0.1 |
NJ | 40947978 | 42181061 | 42429576 | -2.9 | -0.6 |
CA | 146656984 | 151056672 | 155613529 | -2.9 | -2.9 |
CT | 20423515 | 21051597 | 21603777 | -3.0 | -2.6 |
RI | 5490076 | 5669309 | 5778025 | -3.2 | -1.9 |
PA | 40885876 | 42243049 | 44105513 | -3.2 | -4.2 |
OR | 19592295 | 20244499 | 20391927 | -3.2 | -0.7 |
IN | 30836051 | 31849195 | 33363879 | -3.2 | -4.5 |
IA | 16674976 | 17245764 | 17753953 | -3.3 | -2.9 |
NV | 9334070 | 9684935 | 9733359 | -3.6 | -0.5 |
NE | 7449868 | 7726127 | 7873829 | -3.6 | -1.9 |
VT | 3526357 | 3673501 | 3721332 | -4.0 | -1.3 |
SC | 13989511 | 14567585 | 15802934 | -4.0 | -7.8 |
AK | 3268073 | 3402486 | 3467234 | -4.0 | -1.9 |
IL | 63466887 | 66166082 | 67336230 | -4.1 | -1.7 |
NC | 31263894 | 32621293 | 33605264 | -4.2 | -2.9 |
MD | 24976429 | 26089963 | 27481583 | -4.3 | -5.1 |
AZ | 23439707 | 24584201 | 25315276 | -4.7 | -2.9 |
OH | 68176967 | 71895854 | 74119719 | -5.2 | -3.0 |
WV | 4944242 | 5231251 | 5231443 | -5.5 | -0.0 |
MO | 24663467 | 26117633 | 27065546 | -5.6 | -3.5 |
KY | 16910828 | 17909495 | 18028488 | -5.6 | -0.7 |
LA | 16227594 | 17211007 | 20262385 | -5.7 | -15.1 |
TX | 66168387 | 70514138 | 70975901 | -6.2 | -0.7 |
TN | 18102460 | 19396554 | 18701973 | -6.7 | 3.7 |
GA | 26835701 | 28816233 | 27987249 | -6.9 | 3.0 |
AR | 9551686 | 10358181 | 10596035 | -7.8 | -2.2 |
GU | 75119 | 81572 | 71813 | -7.9 | 13.6 |
MS | 7602710 | 8581994 | 8915406 | -11.4 | -3.7 |
HI | 3456131 | 4135229 | 4490320 | -16.4 | -7.9 |
AS | 48828 | 67848 | 63166 | -28.0 | 7.4 |
1.4 Filtering Groups with HAVING
The HAVING
clause is used to filter records that work on summarized GROUP BY results. It is similar to the WHERE
clause, but it is used for filtering groups instead of individual rows.
Let’s try to compare libraries located in a state with similar characteristics. For that, we will use HAVING
to place conditions on gorups created by aggregating.
%%sql
SELECT pls2018.stabr,sum(pls2018.visits) AS visits_2018,
sum(pls2017.visits) AS visits_2017,
sum(pls2016.visits) AS visits_2016,
round ( (
sum(pls2018.visits) - sum(pls2017.visits) ) * 1.0 / sum(pls2017.visits) ) * 100, 1
(
) AS visits_ratio_2018_2017,round ( ( (sum(pls2017.visits) - sum(pls2016.visits) ) * 1.0 /
sum(pls2016.visits) ) * 100, 1) AS visits_ratio_2017_2016
FROM libraries_2018 AS pls2018= pls2017.fscskey
JOIN libraries_2017 AS pls2017 ON pls2018.fscskey = pls2016.fscskey
JOIN libraries_2016 AS pls2016 ON pls2018.fscskey >= 0
WHERE pls2018.visits >= 0
AND pls2017.visits >= 0
AND pls2016.visits
GROUP BY pls2018.stabrsum(pls2018.visits) > 50000000
HAVING ; ORDER BY visits_ratio_2018_2017 DESC
* sqlite:///dbs/w02/library.db
Done.
stabr | visits_2018 | visits_2017 | visits_2016 | visits_ratio_2018_2017 | visits_ratio_2017_2016 |
---|---|---|---|---|---|
FL | 68423689 | 66697122 | 70991029 | 2.6 | -6.0 |
NY | 97921323 | 100012193 | 103081304 | -2.1 | -3.0 |
CA | 146656984 | 151056672 | 155613529 | -2.9 | -2.9 |
IL | 63466887 | 66166082 | 67336230 | -4.1 | -1.7 |
OH | 68176967 | 71895854 | 74119719 | -5.2 | -3.0 |
TX | 66168387 | 70514138 | 70975901 | -6.2 | -0.7 |
We are including only rows with a sum greater than 50 million visitors in 2018. We got six states as a result of this arbitrary condition. Only Florida experienced an increase in the number of visitors from 2017 to 2018.