Convert CSV files into a SQLite database

Overview

csvs-to-sqlite


Convert CSV files into a SQLite database. Browse and publish that SQLite database with Datasette.

Basic usage:

csvs-to-sqlite myfile.csv mydatabase.db

This will create a new SQLite database called mydatabase.db containing a single table, myfile, containing the CSV content.
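Conceptually, the conversion is straightforward. A minimal stdlib-only sketch of the core idea (the real tool uses pandas for parsing and column type inference, and all names here are illustrative):

```python
import csv
import io
import sqlite3

# Stand-in for the contents of myfile.csv
csv_text = "name,age\nalice,34\nbob,29\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
rows = list(reader)

# The real tool writes mydatabase.db; we use an in-memory database here
conn = sqlite3.connect(":memory:")
cols = ", ".join('"%s" TEXT' % c for c in header)
conn.execute('CREATE TABLE "myfile" (%s)' % cols)
conn.executemany(
    'INSERT INTO "myfile" VALUES (%s)' % ", ".join("?" * len(header)), rows
)

print(conn.execute('SELECT name, age FROM "myfile"').fetchall())
# → [('alice', '34'), ('bob', '29')]
```

Unlike this sketch, csvs-to-sqlite samples the data and assigns appropriate SQLite column types rather than importing everything as TEXT.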

You can provide multiple CSV files:

csvs-to-sqlite one.csv two.csv bundle.db

The bundle.db database will contain two tables, one and two.

Because multiple files are accepted, shell wildcards work too:

csvs-to-sqlite ~/Downloads/*.csv my-downloads.db

If you pass a path to one or more directories, the script will recursively search those directories for CSV files and create tables for each one.

csvs-to-sqlite ~/path/to/directory all-my-csvs.db
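The directory-scanning behaviour can be illustrated with a short, hypothetical sketch: every `*.csv` found anywhere beneath the given directory becomes a table named after the file.

```python
import tempfile
from pathlib import Path

# Build a throwaway directory tree with two CSV files, one nested
root = Path(tempfile.mkdtemp())
(root / "nested").mkdir()
(root / "one.csv").write_text("a\n1\n")
(root / "nested" / "two.csv").write_text("b\n2\n")

# A recursive glob like this is all the scan conceptually requires
found = sorted(p.stem for p in root.rglob("*.csv"))
print(found)  # → ['one', 'two'] — each would become a table of that name
```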

Handling TSV (tab-separated values)

You can use the -s option to specify a different delimiter. If you want to use a tab character you'll need to apply shell escaping like so:

csvs-to-sqlite my-file.tsv my-file.db -s $'\t'

Refactoring columns into separate lookup tables

Let's say you have a CSV file that looks like this:

county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7

(Real example taken from the Open Elections project)

You can convert selected columns into separate lookup tables using the --extract-column option (short name: -c). For example:

csvs-to-sqlite openelections-data-*/*.csv \
    -c county:County:name \
    -c precinct:Precinct:name \
    -c office -c district -c party -c candidate \
    openelections.db

The format is as follows:

column_name:optional_table_name:optional_table_value_column_name

If you just specify the column name e.g. -c office, the following table will be created:

CREATE TABLE "office" (
    "id" INTEGER PRIMARY KEY,
    "value" TEXT
);

If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

CREATE TABLE "Precinct" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);

The original tables will be created like this:

CREATE TABLE "ca__primary__san_francisco__precinct" (
    "county" INTEGER,
    "precinct" INTEGER,
    "office" INTEGER,
    "district" INTEGER,
    "party" INTEGER,
    "candidate" INTEGER,
    "votes" INTEGER,
    FOREIGN KEY (county) REFERENCES County(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES Precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id)
);

They will be populated with IDs that reference the new derived tables.
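The extraction step can be sketched in a few lines of stdlib Python (this is an illustration of the behaviour, not the tool's actual implementation; table and column names are taken from the example above):

```python
import sqlite3

# Simplified source rows: (county, party) pairs from the election CSV
rows = [("Clark", "REP"), ("Clark", "REP"), ("Clark", "DEM")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "party" ("id" INTEGER PRIMARY KEY, "value" TEXT)')
conn.execute(
    'CREATE TABLE "results" ("county" TEXT, "party" INTEGER, '
    "FOREIGN KEY (party) REFERENCES party(id))"
)

# Each unique value gets one lookup row; the original column stores its id
ids = {}
for county, party in rows:
    if party not in ids:
        cur = conn.execute('INSERT INTO "party" ("value") VALUES (?)', (party,))
        ids[party] = cur.lastrowid
    conn.execute('INSERT INTO "results" VALUES (?, ?)', (county, ids[party]))

print(conn.execute('SELECT value FROM "party" ORDER BY id').fetchall())
# → [('REP',), ('DEM',)]
print(conn.execute('SELECT party FROM "results"').fetchall())
# → [(1,), (1,), (2,)]
```

Repeated values collapse to a single lookup row, which is why extraction can shrink large files with low-cardinality columns considerably.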

Installation

$ pip install csvs-to-sqlite

csvs-to-sqlite now requires Python 3. If you are running Python 2 you can install the last version to support Python 2:

$ pip install csvs-to-sqlite==0.9.2

csvs-to-sqlite --help

Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME

  PATHS: paths to individual .csv files or to directories containing .csvs

  DBNAME: name of the SQLite database file to create

Options:
  -s, --separator TEXT            Field separator in input .csv
  -q, --quoting INTEGER           Control field quoting behavior per csv.QUOTE_*
                                  constants. Use one of QUOTE_MINIMAL (0),
                                  QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or
                                  QUOTE_NONE (3).

  --skip-errors                   Skip lines with too many fields instead of
                                  stopping the import

  --replace-tables                Replace tables if they already exist
  -t, --table TEXT                Table to use (instead of using CSV filename)
  -c, --extract-column TEXT       One or more columns to 'extract' into a
                                  separate lookup table. If you pass a simple
                                  column name that column will be replaced with
                                  integer foreign key references to a new table
                                  of that name. You can customize the name of
                                  the table like so:     state:States:state_name
                                  
                                  This will pull unique values from the 'state'
                                  column and use them to populate a new 'States'
                                  table, with an id column primary key and a
                                  state_name column containing the strings from
                                  the original column.

  -d, --date TEXT                 One or more columns to parse into ISO
                                  formatted dates

  -dt, --datetime TEXT            One or more columns to parse into ISO
                                  formatted datetimes

  -df, --datetime-format TEXT     One or more custom date format strings to try
                                  when parsing dates/datetimes

  -pk, --primary-key TEXT         One or more columns to use as the primary key
  -f, --fts TEXT                  One or more columns to use to populate a full-
                                  text index

  -i, --index TEXT                Add index on this column (or a compound index
                                  with -i col1,col2)

  --shape TEXT                    Custom shape for the DB table - format is
                                  csvcol:dbcol(TYPE),...

  --filename-column TEXT          Add a column with this name and populate with
                                  CSV file name

  --fixed-column <TEXT TEXT>...   Populate column with a fixed string
  --fixed-column-int <TEXT INTEGER>...
                                  Populate column with a fixed integer
  --fixed-column-float <TEXT FLOAT>...
                                  Populate column with a fixed float
  --no-index-fks                  Skip adding index to foreign key columns
                                  created using --extract-column (default is to
                                  add them)

  --no-fulltext-fks               Skip adding full-text index on values
                                  extracted using --extract-column (default is
                                  to add them)

  --just-strings                  Import all columns as text strings by default
                                  (and, if specified, still obey --shape,
                                  --date/datetime, and --datetime-format)

  --version                       Show the version and exit.
  --help                          Show this message and exit.
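The integer values accepted by -q come straight from Python's csv module quoting constants:

```python
import csv

# The -q option takes the integer value of one of these constants
print(csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE)
# → 0 1 2 3
```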

Comments
  • No create index option - please create new release

    No create index option - please create new release

    Hi Simon,

    Love this wonderful tool! Thanks a million for making it and promoting SQLite!

    The README shows an option for creating an index: https://github.com/simonw/csvs-to-sqlite/blob/master/README.md

    -i, --index TEXT Add index on this column (or a compound index with -i col1,col2)

    I don't seem to have that version in my csvs-to-sqlite:

    % csvs-to-sqlite --help
    Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME
    
      PATHS: paths to individual .csv files or to directories containing .csvs
    
      DBNAME: name of the SQLite database file to create
    
    Options:
      -s, --separator TEXT       Field separator in input .csv
      --replace-tables           Replace tables if they already exist
      -c, --extract-column TEXT  One or more columns to 'extract' into a separate
                                 lookup table. If you pass a simple column name
                                 that column will be replaced with integer foreign
                                 key references to a new table of that name. You
                                 can customize the name of the table like so:
    
                                     --extract-column state:States:state_name
    
                                 This will pull unique values from the 'state'
                                 column and use them to populate a new 'States'
                                 table, with an id column primary key and a
                                 state_name column containing the strings from the
                                 original column.
      -f, --fts TEXT             One or more columns to use to populate a full-
                                 text index
      --version                  Show the version and exit.
      --help                     Show this message and exit.
    
    % csvs-to-sqlite --version
    csvs-to-sqlite, version 0.7
    

    Is this an option in an unreleased build of csvs-to-sqlite? If so, when do you think it would be ready?

    Thanks!

    opened by jungle-boogie 7
  • Add support for loading CSVs directly from URLs

    Add support for loading CSVs directly from URLs

    Closes #4

    This adds a new parameter type to support paths or URLs on the CLI.

    Something like csvs-to-sqlite https://data.stadt-zuerich.ch/dataset/wirtschaft_preise_ziw_basis2010/resource/54c7ae15-f673-41af-a7cd-b861b8c6744e/download/ziwbasis2010apr2010apr2016.csv blah.db now works.

    opened by betatim 6
  • Ability to merge multiple CSV files into a single table (and store CSV filename in a column)

    Ability to merge multiple CSV files into a single table (and store CSV filename in a column)

    If you feed the tool a number of CSVs there should be an option to have them all go into the same database table, optionally with a column that gets populated with the name of the CSV file:

    csvs-to-sqlite ~/openelections-data/ --table=combined_table --filename-column=source
    
    opened by simonw 5
  • Allow appending files to an existing SQLite database

    Allow appending files to an existing SQLite database

    Tool currently quits with an error if you try this.

    Should it be possible to update existing tables with new data? Not sure about this. We certainly can’t handle schema changes. We would need to be told a column to treat as a “primary key”.

    Given how cheap it is to recreate the database from scratch I’m inclined to say it’s not worth bothering with table updates.

    opened by simonw 5
  • add

    add "fixed-column" option

    For colname:value, add a column colname and populate with value.

    The use case here is that I have a snakemake workflow that produces a lot of csv files with a structure like samples/<sample_id>/<condition_name>/this.csv. I want to import them all into the same table, where sample_id and condition_name are columns. I tried to match how --filename-column works, but allow for multiple new columns to be added.

    I haven't yet added any tests, but if you think this option would be a useful addition, I will add some tests to cover the basic use case and the combination with shape.

    opened by williamrowell 4
  • Interpret all columns as TEXT data type

    Interpret all columns as TEXT data type

    Is it possible to interpret all columns as TEXT datatype (through a flag maybe?) I think the columns values are sampled and then column datatype is guessed. If there is an incompatible value in some row, then that row seems to be skipped. So instead, is it possible I load everything as TEXT data type? I just need the data in some format(for data comparison purposes), But all csv data must go into the table without skipping rows.

    Thanks Thyag

    opened by tsundara 4
  • csvs-to-sqlite has a hard requirement on package versions.

    csvs-to-sqlite has a hard requirement on package versions.

    Currently version required is hardcoded:

    https://github.com/simonw/csvs-to-sqlite/blob/dccbf65b37bc9eed50e9edb80a42f257e93edb1f/setup.py#L22-L28

    The latest Pandas version is 0.21.0, this makes csvs-to-sqlite unusable with it by default and unusable with any package that requires the latest pandas.

    Is there a reason why a specific version is enforced and not "0.20.3 or later"?

    opened by mratsim 4
  • ".\" added to table name in SQLITE database file

    Hello Simon, thanks a lot for your great tool and sharing.

    I am using it to convert a bunch of csv files in a folder. i get a .db file as output; anyhow my table names are marked with a "."chars which do not allow me to perform further sqlite queries...could you please advice me how to avoid this issue? Below my cmd input and output: C:~Documents\Profesional\2018-02-15>csvs-to-sqlite C:~Documents\Profesional\2018-02-15 output.db extract_columns=() Loaded 4 dataframes c:\users\joel0\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\generic.py:1362: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores. chunksize=chunksize, dtype=dtype) Created output.db from 4 CSV files

    thanks in advance

    opened by joel0977 4
  • --fts list-of-columns option

    --fts list-of-columns option

    Creates a FTS5 table populated with the content of the specified columns: https://sqlite.org/fts5.html (falls back on FTS4 if FTS5 is not supported)

    Would make it trivial to add fulltext search to data from a CSV file.

    opened by simonw 4
  • Columns are typed

    Columns are typed "REAL" if they are integers with some NaN/blanks

    This is bad. If a column has all integers and some blanks it should result in an INTEGER.

    Example: this CSV https://github.com/openelections/openelections-data-ca/blob/master/2016/20161108__ca__general__yolo__precinct.csv produces this SQL table:

    CREATE TABLE "2016/20161108__ca__general__yolo__precinct" (
    "county" TEXT,
      "precinct" INTEGER,
      "office" INTEGER,
      "district" REAL,
      "party" REAL,
      "candidate" INTEGER,
      "votes" INTEGER
    ,
    FOREIGN KEY (county) REFERENCES county(id),
        FOREIGN KEY (party) REFERENCES party(id),
        FOREIGN KEY (precinct) REFERENCES precinct(id),
        FOREIGN KEY (office) REFERENCES office(id),
        FOREIGN KEY (candidate) REFERENCES candidate(id))
    
    opened by simonw 4
  • --shape option for specifying the

    --shape option for specifying the "shape" of the resulting table

    This option will allow you to tell the command exactly what columns should be created in the new table and what their types should be.

    For example:

    csvs-to-sqlite votes.csv votes.db --shape "county:Cty,votes:Vts(REAL)"
    

    This will produce a table with just two columns: Cty and Vts. Those columns will correspond to the county and votes columns in the original CSV.

    The Cty column will use the default type detected by pandas - but the Vts column will be forced to be a REAL column instead.

    opened by simonw 3
  • Column `filing_id` not found despite its manifest existence

    Column `filing_id` not found despite its manifest existence

    I am trying to load 18 CSV files from CorpWatch into a SQLite database using csvs-to-sqlite. I will spare you the deprecation warnings for error_bad_lines.

    michael$ time /Users/michael/Library/Python/3.8/bin/csvs-to-sqlite ~/Downloads/corpwatch_api_tables_csv ~/Temp/corpwatch.db -s $'\t'
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (8,10) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (17) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (3,6,7) have mixed types. Specify dtype option on import or set low_memory=False.
      return pd.read_csv(
    Loaded 18 dataframes
    Traceback (most recent call last):
      File "/Users/michael/Library/Python/3.8/bin/csvs-to-sqlite", line 8, in <module>
        sys.exit(cli())
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/cli.py", line 251, in cli
        df.to_sql(df.table_name, conn, if_exists="append", index=False)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/core/generic.py", line 2951, in to_sql
        return sql.to_sql(
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 697, in to_sql
        return pandas_sql.to_sql(
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 2190, in to_sql
        return table.insert(chunksize, method)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 950, in insert
        num_inserted = exec_insert(conn, keys, chunk_iter)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 1896, in _execute_insert
        conn.executemany(self.insert_statement(num_rows=1), data_list)
    sqlite3.OperationalError: table ./company_filings has no column named filing_id
    
    real	1m30.052s
    user	1m12.763s
    sys	0m16.640s
    

    So the complaint is about company_filings.csv not having a filing_id. Yet:

    michael$ head ~/Downloads/corpwatch_api_tables_csv/company_filings.csv
    filing_id cik year quarter period_of_report filing_date form_10k_url sec_21_url
    47 1000180 2008 1 20071230 2008-02-25 http://www.sec.gov/Archives/edgar/data/1000180/0000950134-08-003259.txt http://www.sec.gov/Archives/edgar/data/1000180/000095013408003259/f38194exv21w1.htm
    104 1000209 2008 1 20071231 2008-03-13 http://www.sec.gov/Archives/edgar/data/1000209/0001193125-08-055644.txt http://www.sec.gov/Archives/edgar/data/1000209/000119312508055644/dex211.htm
    121 1000228 2008 1 20071229 2008-02-26 http://www.sec.gov/Archives/edgar/data/1000228/0000950123-08-002119.txt http://www.sec.gov/Archives/edgar/data/1000228/000095012308002119/y50229exv21w1.htm
    152 1000229 2008 1 0 2008-02-22 http://www.sec.gov/Archives/edgar/data/1000229/0001000229-08-000005.txt NULL
    174 1000230 2008 1 0 2008-01-29 http://www.sec.gov/Archives/edgar/data/1000230/0001193125-08-014289.txt NULL
    186 1000232 2008 1 20071231 2008-03-31 http://www.sec.gov/Archives/edgar/data/1000232/0001000232-08-000005.txt http://www.sec.gov/Archives/edgar/data/1000232/000100023208000005/exhibit21subsidiaries.txt
    213 1000234 2008 1 20070924 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000003.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000003/ex21.htm
    214 1000234 2008 1 20071231 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000004.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000004/ex21.htm
    462 1000623 2008 1 20071231 2008-03-07 http://www.sec.gov/Archives/edgar/data/1000623/0001047469-08-002365.txt http://www.sec.gov/Archives/edgar/data/1000623/000104746908002365/a2183428zex-21.htm

    
    Why is this column not being found?
    opened by MichaelTiemannOSC 1
  • low_memory=False not a valid option

    low_memory=False not a valid option

    Hi, csvs-to-sqlite suggests to use low_memory=False, but there is no way to use such option from the command line, and it is not mentioned in the documentation.

    /home/user/.local/pipx/venvs/csvs-to-sqlite/lib/python3.8/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (14) have mixed types. Specify dtype option on import or set low_memory=False.
      return pd.read_csv(
    

    https://github.com/simonw/csvs-to-sqlite/blob/a8a37a016790dc93270c74e32d0a5051bc5a0f4d/csvs_to_sqlite/utils.py#L43

    opened by frafra 0
  • error_bad_lines argument has been deprecated

    error_bad_lines argument has been deprecated

    I get a deprecation warning on the latest stable.

    csvs_to_sqlite/utils.py:38: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.
    
    opened by frafra 0
  • option to lower case column names, replace spaces with underscores, and remove reserved characters from column names

    option to lower case column names, replace spaces with underscores, and remove reserved characters from column names

    It's generally easier if the column names of a SQLite table are lower case, have no spaces, and have no reserved characters. It would be nice if there were an option in this tool to transform the column names accordingly.

    opened by fgregg 0
  • Optionally maintain checksums of CSV files for faster updates

    Optionally maintain checksums of CSV files for faster updates

    Wanted to see if there is interest in a patch that helps speed up our workflows significantly, or if there are any further ideas for improving on such a feature. If this is out of scope for this project, I'm happy to continue maintaining my fork of this project.

    Use Case

    We currently maintain a folder of >200 CSV files with a total of a few hundred megabytes, and have a CI step that builds these CSVs into a sqlite database. These CSV files get updated 2-3 times a day, but only small changes are made to them. Currently, running csvs-to-sqlite with the --replace-tables flag takes roughly 6-7 minutes, which is too long for our use case.

    Solution

    Add a --update-tables flag that maintains a checksum hash of each CSV file in a table called .csvs-meta (happy to change this or make it configurable), and only reads the csv and loads the dataframe if the checksum has changed.

    Forked Version Here

    opened by dkaoster 0
Releases(1.3)
  • 1.3(Nov 18, 2021)

    • New options for populating a fixed column - so every inserted row will have the same column with the same value. Options are --fixed-column column-name string-value, --fixed-column-int column-name integer-value and --fixed-column-float column-name float-value. Thanks, William Rowell. #81
    Source code(tar.gz)
    Source code(zip)
  • 1.2(Nov 3, 2020)

  • 1.1(Aug 9, 2020)

  • 1.0(Aug 3, 2019)

  • 0.9.2(Jul 3, 2019)

  • 0.9.1(Jun 24, 2019)

  • 0.9(Jan 17, 2019)

    • Support for loading CSVs directly from URLs, thanks @betatim - #38
    • New -pk/--primary-key options, closes #22
    • Create FTS index for extracted column values
    • Added --no-fulltext-fks option, closes #32
    • Now using black for code formatting
    • Bumped versions of dependencies
    Source code(tar.gz)
    Source code(zip)
  • 0.8(Apr 24, 2018)

    • -d and -df options for specifying date/datetime columns, closes #33

    • Maintain lookup tables in SQLite, refs #17

    • --index option to specify which columns to index, closes #24

    • Test confirming --shape and --filename-column and -c work together #25

    • Use usecols when loading CSV if shape specified

    • --filename-column is now compatible with --shape, closes #10

    • --no-index-fks option

      By default, csvs-to-sqlite creates an index for every foreign key column that is added using the --extract-column option.

      For large tables, this can dramatically increase the size of the resulting database file on disk. The new --no-index-fks option allows you to disable this feature to save on file size.

      Refs #24 which will allow you to explicitly list which columns SHOULD have an index created.

    • Added --filename-column option, refs #10

    • Fixes for Python 2, refs #25

    • Implemented new --shape option - refs #25

    • --table option for specifying table to write to, refs #10

    • Updated README to cover --skip-errors, refs #20

    • Add --skip-errors option (#20) [Jani Monoses]

    • Less verbosity (#19) [Jani Monoses]

      Only log extract_columns info when that option is passed.

    • Add option for field quoting behaviour (#15) [Jani Monoses]

    Source code(tar.gz)
    Source code(zip)
  • 0.7(Nov 26, 2017)

  • 0.6.1(Nov 25, 2017)

  • 0.6(Nov 24, 2017)

    SQLite full-text search support

    • Added --fts option for setting up SQLite full-text search.

      The --fts option will create a corresponding SQLite FTS virtual table, using the best available version of the FTS module.

      https://sqlite.org/fts5.html https://www.sqlite.org/fts3.html

      Usage:

      csvs-to-sqlite my-csv.csv output.db -f column1 -f column2
      

      Example generated with this option: https://sf-trees-search.now.sh/

      Example search: https://sf-trees-search.now.sh/sf-trees-search-a899b92?sql=select+*+from+Street_Tree_List+where+rowid+in+%28select+rowid+from+Street_Tree_List_fts+where+Street_Tree_List_fts+match+%27grove+london+dpw%27%29%0D%0A

      Will be used in https://github.com/simonw/datasette/issues/131

    • --fts and --extract-column now cooperate.

      If you extract a column and then specify that same column in the --fts list, csvs-to-sqlite now uses the original value of that column in the index.

      Example using CSV from https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq

      csvs-to-sqlite Street_Tree_List.csv trees-fts.db \
          -c qLegalStatus -c qSpecies -c qSiteInfo \
          -c PlantType -c qCaretaker -c qCareAssistant \
          -f qLegalStatus -f qSpecies -f qAddress \
          -f qSiteInfo -f PlantType -f qCaretaker \
          -f qCareAssistant -f PermitNotes
      

      Closes #9

    • Handle column names with spaces in them.

    • Added csvs-to-sqlite --version option.

      Using http://click.pocoo.org/5/api/#click.version_option

    Source code(tar.gz)
    Source code(zip)
  • 0.5(Nov 19, 2017)

    Now handles columns with integers and nulls correctly

    Pandas does a good job of figuring out which SQLite column types should be used for a DataFrame - with one exception: due to a limitation of NumPy it treats columns containing a mixture of integers and NaN (blank values) as being of type float64, which means they end up as REAL columns in SQLite.

    http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

    To fix this, we now check whether a float64 column actually consists solely of NaN and integer-valued floats (checked using v.is_integer() in Python). If that is the case, we override the column type to be INTEGER instead.

    See #5 - also a8ab524 and 0997b7b
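    The check described above can be sketched with plain Python floats (a simplified illustration, not the tool's actual pandas-based code):

    ```python
    import math

    # A float column is "really" integers if every non-NaN value
    # satisfies float.is_integer()
    def looks_like_integer_column(values):
        return all(math.isnan(v) or v.is_integer() for v in values)

    print(looks_like_integer_column([1.0, 2.0, float("nan")]))  # → True
    print(looks_like_integer_column([1.5, 2.0]))                # → False
    ```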

    Source code(tar.gz)
    Source code(zip)
  • 0.3(Nov 17, 2017)

    • Mechanism for converting columns into separate tables

      Let's say you have a CSV file that looks like this:

      county,precinct,office,district,party,candidate,votes
      Clark,1,President,,REP,John R. Kasich,5
      Clark,2,President,,REP,John R. Kasich,0
      Clark,3,President,,REP,John R. Kasich,7
      

      (Real example from https://github.com/openelections/openelections-data-sd/blob/master/2016/20160607__sd__primary__clark__precinct.csv )

      You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

      csvs-to-sqlite openelections-data-*/*.csv \
          -c county:County:name \
          -c precinct:Precinct:name \
          -c office -c district -c party -c candidate \
          openelections.db
      

      The format is as follows:

      column_name:optional_table_name:optional_table_value_column_name
      

      If you just specify the column name e.g. -c office, the following table will be created:

      CREATE TABLE "office" (
          "id" INTEGER PRIMARY KEY,
          "value" TEXT
      );
      

      If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

      CREATE TABLE "Precinct" (
          "id" INTEGER PRIMARY KEY,
          "name" TEXT
      );
      

      The original tables will be created like this:

      CREATE TABLE "ca__primary__san_francisco__precinct" (
          "county" INTEGER,
          "precinct" INTEGER,
          "office" INTEGER,
          "district" INTEGER,
          "party" INTEGER,
          "candidate" INTEGER,
          "votes" INTEGER,
          FOREIGN KEY (county) REFERENCES County(id),
          FOREIGN KEY (party) REFERENCES party(id),
          FOREIGN KEY (precinct) REFERENCES Precinct(id),
          FOREIGN KEY (office) REFERENCES office(id),
          FOREIGN KEY (candidate) REFERENCES candidate(id)
      );
      

      They will be populated with IDs that reference the new derived tables.

      Closes #2

    Source code(tar.gz)
    Source code(zip)
Owner
Simon Willison