Convert CSV files into a SQLite database

Overview

csvs-to-sqlite

PyPI Changelog Tests License

Convert CSV files into a SQLite database. Browse and publish that SQLite database with Datasette.

Basic usage:

csvs-to-sqlite myfile.csv mydatabase.db

This will create a new SQLite database called mydatabase.db containing a single table, myfile, containing the CSV content.

You can provide multiple CSV files:

csvs-to-sqlite one.csv two.csv bundle.db

The bundle.db database will contain two tables, one and two.

This means you can use wildcards:

csvs-to-sqlite ~/Downloads/*.csv my-downloads.db

If you pass a path to one or more directories, the script will recursively search those directories for CSV files and create tables for each one.

csvs-to-sqlite ~/path/to/directory all-my-csvs.db

Handling TSV (tab-separated values)

You can use the -s option to specify a different delimiter. If you want to use a tab character you'll need to apply shell escaping like so:

csvs-to-sqlite my-file.tsv my-file.db -s $'\t'

Refactoring columns into separate lookup tables

Let's say you have a CSV file that looks like this:

county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7

(Real example taken from the Open Elections project)

You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

csvs-to-sqlite openelections-data-*/*.csv \
    -c county:County:name \
    -c precinct:Precinct:name \
    -c office -c district -c party -c candidate \
    openelections.db

The format is as follows:

column_name:optional_table_name:optional_table_value_column_name

If you just specify the column name e.g. -c office, the following table will be created:

CREATE TABLE "office" (
    "id" INTEGER PRIMARY KEY,
    "value" TEXT
);

If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

CREATE TABLE "Precinct" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);

The original tables will be created like this:

CREATE TABLE "ca__primary__san_francisco__precinct" (
    "county" INTEGER,
    "precinct" INTEGER,
    "office" INTEGER,
    "district" INTEGER,
    "party" INTEGER,
    "candidate" INTEGER,
    "votes" INTEGER,
    FOREIGN KEY (county) REFERENCES County(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES Precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id)
);

They will be populated with IDs that reference the new derived tables.

Installation

$ pip install csvs-to-sqlite

csvs-to-sqlite now requires Python 3. If you are running Python 2 you can install the last version to support Python 2:

$ pip install csvs-to-sqlite==0.9.2

csvs-to-sqlite --help

Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME

  PATHS: paths to individual .csv files or to directories containing .csvs

  DBNAME: name of the SQLite database file to create

Options:
  -s, --separator TEXT            Field separator in input .csv
  -q, --quoting INTEGER           Control field quoting behavior per csv.QUOTE_*
                                  constants. Use one of QUOTE_MINIMAL (0),
                                  QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or
                                  QUOTE_NONE (3).

  --skip-errors                   Skip lines with too many fields instead of
                                  stopping the import

  --replace-tables                Replace tables if they already exist
  -t, --table TEXT                Table to use (instead of using CSV filename)
  -c, --extract-column TEXT       One or more columns to 'extract' into a
                                  separate lookup table. If you pass a simple
                                  column name that column will be replaced with
                                  integer foreign key references to a new table
                                  of that name. You can customize the name of
                                  the table like so:     state:States:state_name
                                  
                                  This will pull unique values from the 'state'
                                  column and use them to populate a new 'States'
                                  table, with an id column primary key and a
                                  state_name column containing the strings from
                                  the original column.

  -d, --date TEXT                 One or more columns to parse into ISO
                                  formatted dates

  -dt, --datetime TEXT            One or more columns to parse into ISO
                                  formatted datetimes

  -df, --datetime-format TEXT     One or more custom date format strings to try
                                  when parsing dates/datetimes

  -pk, --primary-key TEXT         One or more columns to use as the primary key
  -f, --fts TEXT                  One or more columns to use to populate a full-
                                  text index

  -i, --index TEXT                Add index on this column (or a compound index
                                  with -i col1,col2)

  --shape TEXT                    Custom shape for the DB table - format is
                                  csvcol:dbcol(TYPE),...

  --filename-column TEXT          Add a column with this name and populate with
                                  CSV file name

  --fixed-column 
   
    ...   Populate column with a fixed string
  --fixed-column-int 
    
     ...
                                  Populate column with a fixed integer
  --fixed-column-float 
     
      ...
                                  Populate column with a fixed float
  --no-index-fks                  Skip adding index to foreign key columns
                                  created using --extract-column (default is to
                                  add them)

  --no-fulltext-fks               Skip adding full-text index on values
                                  extracted using --extract-column (default is
                                  to add them)

  --just-strings                  Import all columns as text strings by default
                                  (and, if specified, still obey --shape,
                                  --date/datetime, and --datetime-format)

  --version                       Show the version and exit.
  --help                          Show this message and exit.


     
    
   
Comments
  • No create index option - please create new release

    No create index option - please create new release

    Hi Simon,

    Love this wonderful tool! Thanks a million for using it and prompting sqlite!

    The README shows an option for creating an index: https://github.com/simonw/csvs-to-sqlite/blob/master/README.md

    -i, --index TEXT Add index on this column (or a compound index with -i col1,col2)

    I don't seem to have that version in my csvs-to-sqlite:

    % csvs-to-sqlite --help
    Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME
    
      PATHS: paths to individual .csv files or to directories containing .csvs
    
      DBNAME: name of the SQLite database file to create
    
    Options:
      -s, --separator TEXT       Field separator in input .csv
      --replace-tables           Replace tables if they already exist
      -c, --extract-column TEXT  One or more columns to 'extract' into a separate
                                 lookup table. If you pass a simple column name
                                 that column will be replaced with integer foreign
                                 key references to a new table of that name. You
                                 can customize the name of the table like so:
    
                                     --extract-column state:States:state_name
    
                                 This will pull unique values from the 'state'
                                 column and use them to populate a new 'States'
                                 table, with an id column primary key and a
                                 state_name column containing the strings from the
                                 original column.
      -f, --fts TEXT             One or more columns to use to populate a full-
                                 text index
      --version                  Show the version and exit.
      --help                     Show this message and exit.
    
    % csvs-to-sqlite --version
    csvs-to-sqlite, version 0.7
    

    Is this an option in an unreleased build of csvs-to-sqlite? If so, when do you think it would be ready?

    Thanks!

    opened by jungle-boogie 7
  • Add support for loading CSVs directly from URLs

    Add support for loading CSVs directly from URLs

    Closes #4

    This adds a new parameter type to support paths or URLs on the CLI.

    Something like csvs-to-sqlite https://data.stadt-zuerich.ch/dataset/wirtschaft_preise_ziw_basis2010/resource/54c7ae15-f673-41af-a7cd-b861b8c6744e/download/ziwbasis2010apr2010apr2016.csv blah.db now works.

    opened by betatim 6
  • Ability to merge multiple CSV files into a single table (and store CSV filename in a column)

    Ability to merge multiple CSV files into a single table (and store CSV filename in a column)

    If you feed the tool a number of CSVs there should be an option to have them all go into the same database table, optionally with a column that gets populated with the name of the CSV file:

    csvs-to-sqlite ~/openelections-data/ --table=combined_table --filename-column=source
    
    opened by simonw 5
  • Allow appending files to an existing SQLite database

    Allow appending files to an existing SQLite database

    Tool currently quits with an error if you try this.

    Should it be possible to update existing tables with new data? Not sure about this. We certainly can’t handle schema changes. We would need to be told a column to treat as a “primary key”.

    Given how cheap it is to recreate the database from scratch I’m inclined to say it’s not worth bothering with table updates.

    opened by simonw 5
  • add

    add "fixed-column" option

    For colname:value, add a column colname and populate with value.

    The use case here is that I have a snakemake workflow that produces a lot of csv files with a structure like: samples/<sample_id>/<condition_name>/this.csv I want to import them all into the same table, where sample_id and condition_name are columns. I tried to match how --filename_column works, but allow for multiple new columns to be added.

    I haven't yet added any tests, but if you think this option would be a useful addition, I will add some tests to cover the basic use case and the combination with shape.

    opened by williamrowell 4
  • Interpret all columns as TEXT data type

    Interpret all columns as TEXT data type

    Is it possible to interpret all columns as TEXT datatype (through a flag maybe?) I think the columns values are sampled and then column datatype is guessed. If there is an incompatible value in some row, then that row seems to be skipped. So instead, is it possible I load everything as TEXT data type? I just need the data in some format(for data comparison purposes), But all csv data must go into the table without skipping rows.

    Thanks Thyag

    opened by tsundara 4
  • csvs-to-sqlite has a hard requirement on package versions.

    csvs-to-sqlite has a hard requirement on package versions.

    Currently version required is hardcoded:

    https://github.com/simonw/csvs-to-sqlite/blob/dccbf65b37bc9eed50e9edb80a42f257e93edb1f/setup.py#L22-L28

    The latest Pandas version is 0.21.0, this makes csvs-to-sqlite unusable with it by default and unusable with any package that requires the latest pandas.

    Is there a reason why a specific version is enforced and not "0.20.3 or later"?

    opened by mratsim 4
  • ".\" added to table name in SQLITE database file

    Hello Simon, thanks a lot for your great tool and sharing.

    I am using it to convert a bunch of csv files in a folder. i get a .db file as output; anyhow my table names are marked with a "."chars which do not allow me to perform further sqlite queries...could you please advice me how to avoid this issue? Below my cmd input and output: C:~Documents\Profesional\2018-02-15>csvs-to-sqlite C:~Documents\Profesional\2018-02-15 output.db extract_columns=() Loaded 4 dataframes c:\users\joel0\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\generic.py:1362: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores. chunksize=chunksize, dtype=dtype) Created output.db from 4 CSV files

    thanks in advance

    opened by joel0977 4
  • --fts list-of-columns option

    --fts list-of-columns option

    Creates a FTS5 table populated with the content of the specified columns: https://sqlite.org/fts5.html (falls back on FTS4 if FTS5 is not supported)

    Would make it trivial to add fulltext search to data from a CSV file.

    opened by simonw 4
  • Columns are typed

    Columns are typed "REAL" if they are integers with some NaN/blanks

    This is bad. If a column has all integers and some blanks it should result in an INTEGER.

    Example: this CSV https://github.com/openelections/openelections-data-ca/blob/master/2016/20161108__ca__general__yolo__precinct.csv produces this SQL table:

    CREATE TABLE "2016/20161108__ca__general__yolo__precinct" (
    "county" TEXT,
      "precinct" INTEGER,
      "office" INTEGER,
      "district" REAL,
      "party" REAL,
      "candidate" INTEGER,
      "votes" INTEGER
    ,
    FOREIGN KEY (county) REFERENCES county(id),
        FOREIGN KEY (party) REFERENCES party(id),
        FOREIGN KEY (precinct) REFERENCES precinct(id),
        FOREIGN KEY (office) REFERENCES office(id),
        FOREIGN KEY (candidate) REFERENCES candidate(id))
    
    opened by simonw 4
  • --shape option for specifying the

    --shape option for specifying the "shape" of the resulting table

    This option will allow you to tell the command exactly what columns should be created in the new table and what their types should be.

    For example:

    csvs-to-sqlite votes.csv votes.db --shape "county:Cty,votes:Vts(REAL)"
    

    This will produce a table with just two columns: Cty and Vts. Those columns will correspond to the county and votes columns in the original CSV.

    The Cty column will use the default type detected by pandas - but the Vts column will be forced to be a REAL column instead.

    opened by simonw 3
  • Column `filing_id` not found despite its manifest existence

    Column `filing_id` not found despite its manifest existence

    I am trying to load 18 CSV files from CorpWatch into a SQLite database using csvs-to-sqlite. I will spare you the deprecation warnings for error_bad_lines.

    michael$ time /Users/michael/Library/Python/3.8/bin/csvs-to-sqlite ~/Downloads/corpwatch_api_tables_csv ~/Temp/corpwatch.db -s $'\t'
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (8,10) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (17) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (3,6,7) have mixed types. Specify dtype option on import or set low_memory=False.
      return pd.read_csv(
    Loaded 18 dataframes
    Traceback (most recent call last):
      File "/Users/michael/Library/Python/3.8/bin/csvs-to-sqlite", line 8, in <module>
        sys.exit(cli())
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/cli.py", line 251, in cli
        df.to_sql(df.table_name, conn, if_exists="append", index=False)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/core/generic.py", line 2951, in to_sql
        return sql.to_sql(
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 697, in to_sql
        return pandas_sql.to_sql(
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 2190, in to_sql
        return table.insert(chunksize, method)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 950, in insert
        num_inserted = exec_insert(conn, keys, chunk_iter)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 1896, in _execute_insert
        conn.executemany(self.insert_statement(num_rows=1), data_list)
    sqlite3.OperationalError: table ./company_filings has no column named filing_id
    
    real	1m30.052s
    user	1m12.763s
    sys	0m16.640s
    

    So the complaint is about company_filings.csv not having a filing_id. Yet:

    michael$ head ~/Downloads/corpwatch_api_tables_csv/company_filings.csv filing_id cik year quarter period_of_report filing_date form_10k_url sec_21_url 47 1000180 2008 1 20071230 2008-02-25 http://www.sec.gov/Archives/edgar/data/1000180/0000950134-08-003259.txt http://www.sec.gov/Archives/edgar/data/1000180/000095013408003259/f38194exv21w1.htm 104 1000209 2008 1 20071231 2008-03-13 http://www.sec.gov/Archives/edgar/data/1000209/0001193125-08-055644.txt http://www.sec.gov/Archives/edgar/data/1000209/000119312508055644/dex211.htm 121 1000228 2008 1 20071229 2008-02-26 http://www.sec.gov/Archives/edgar/data/1000228/0000950123-08-002119.txt http://www.sec.gov/Archives/edgar/data/1000228/000095012308002119/y50229exv21w1.htm 152 1000229 2008 1 0 2008-02-22 http://www.sec.gov/Archives/edgar/data/1000229/0001000229-08-000005.txt NULL 174 1000230 2008 1 0 2008-01-29 http://www.sec.gov/Archives/edgar/data/1000230/0001193125-08-014289.txt NULL 186 1000232 2008 1 20071231 2008-03-31 http://www.sec.gov/Archives/edgar/data/1000232/0001000232-08-000005.txt http://www.sec.gov/Archives/edgar/data/1000232/000100023208000005/exhibit21subsidiaries.txt 213 1000234 2008 1 20070924 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000003.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000003/ex21.htm 214 1000234 2008 1 20071231 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000004.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000004/ex21.htm 462 1000623 2008 1 20071231 2008-03-07 http://www.sec.gov/Archives/edgar/data/1000623/0001047469-08-002365.txt http://www.sec.gov/Archives/edgar/data/1000623/000104746908002365/a2183428zex-21.htm

    
    Why is this column not being found?
    opened by MichaelTiemannOSC 1
  • low_memory=False not a valid option

    low_memory=False not a valid option

    Hi, csvs-to-sqlite suggests to use low_memory=False, but there is no way to use such option from the command line, and it is not mentioned in the documentation.

    /home/user/.local/pipx/venvs/csvs-to-sqlite/lib/python3.8/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (14) have mixed types. Specify dtype option on import or set low_memory=False.
      return pd.read_csv(
    

    https://github.com/simonw/csvs-to-sqlite/blob/a8a37a016790dc93270c74e32d0a5051bc5a0f4d/csvs_to_sqlite/utils.py#L43

    opened by frafra 0
  • error_bad_lines argument has been deprecated

    error_bad_lines argument has been deprecated

    I get a deprecation warning on the latest stable.

    csvs_to_sqlite/utils.py:38: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.
    
    opened by frafra 0
  • option to lower case column names, replace space with underscore, and remove reservered character from column names

    option to lower case column names, replace space with underscore, and remove reservered character from column names

    it's generally easier to if the column names of a sqlite table are lower case, have no spaces, and have no reserved characters. It would be nice if there was an option in this tool to make transform the columns names thus.

    opened by fgregg 0
  • Optionally maintain checksums of CSV files for faster updates

    Optionally maintain checksums of CSV files for faster updates

    Wanted to see if there is interest in a patch that helps speed up our workflows significantly, or if there are any further ideas for improving on such a feature. If this is out of scope for this project, I'm happy to continue maintaining my fork of this project.

    Use Case

    We currently maintain a folder of >200 CSV files with a total of a few hundred megabytes, and have a CI step that builds these CSVs into a sqlite database. These CSV files get updated 2-3 times a day, but only small changes are made to them. Currently, running csvs-to-sqlite with the --replace-tables flag takes roughly 6-7 minutes, which is too long for our use case.

    Solution

    Add a --update-tables flag that maintains a checksum hash of each CSV file in a table called .csvs-meta (happy to change this or make it configurable), and only reads the csv and loads the dataframe if the checksum has changed.

    Forked Version Here

    opened by dkaoster 0
Releases(1.3)
  • 1.3(Nov 18, 2021)

    • New options for populating a fixed column - so every inserted row will have the same column with the same value. Options are --fixed-column column-name string-value, --fixed-column-int column-name integer-value and --fixed-column-float column-name float-value. Thanks, William Rowell. #81
    Source code(tar.gz)
    Source code(zip)
  • 1.2(Nov 3, 2020)

  • 1.1(Aug 9, 2020)

  • 1.0(Aug 3, 2019)

  • 0.9.2(Jul 3, 2019)

  • 0.9.1(Jun 24, 2019)

  • 0.9(Jan 17, 2019)

    • Support for loading CSVs directly from URLs, thanks @betatim - #38
    • New -pk/--primary-key options, closes #22
    • Create FTS index for extracted column values
    • Added --no-fulltext-fks option, closes #32
    • Now using black for code formatting
    • Bumped versions of dependencies
    Source code(tar.gz)
    Source code(zip)
  • 0.8(Apr 24, 2018)

    • -d and -df options for specifying date/datetime columns, closes #33

    • Maintain lookup tables in SQLite, refs #17

    • --index option to specify which columns to index, closes #24

    • Test confirming --shape and --filename-column and -c work together #25

    • Use usecols when loading CSV if shape specified

    • --filename-column is now compatible with --shape, closes #10

    • --no-index-fks option

      By default, csvs-to-sqlite creates an index for every foreign key column that is added using the --extract-column option.

      For large tables, this can dramatically increase the size of the resulting database file on disk. The new --no-index-fks option allows you to disable this feature to save on file size.

      Refs #24 which will allow you to explicitly list which columns SHOULD have an index created.

    • Added --filename-column option, refs #10

    • Fixes for Python 2, refs #25

    • Implemented new --shape option - refs #25

    • --table option for specifying table to write to, refs #10

    • Updated README to cover --skip-errors, refs #20

    • Add --skip-errors option (#20) [Jani Monoses]

    • Less verbosity (#19) [Jani Monoses]

      Only log extract_columns info when that option is passed.

    • Add option for field quoting behaviour (#15) [Jani Monoses]

    Source code(tar.gz)
    Source code(zip)
  • 0.7(Nov 26, 2017)

  • 0.6.1(Nov 25, 2017)

  • 0.6(Nov 24, 2017)

    SQLite full-text search support

    • Added --fts option for setting up SQLite full-text search.

      The --fts option will create a corresponding SQLite FTS virtual table, using the best available version of the FTS module.

      https://sqlite.org/fts5.html https://www.sqlite.org/fts3.html

      Usage:

      csvs-to-sqlite my-csv.csv output.db -f column1 -f column2
      

      Example generated with this option: https://sf-trees-search.now.sh/

      Example search: https://sf-trees-search.now.sh/sf-trees-search-a899b92?sql=select+*+from+Street_Tree_List+where+rowid+in+%28select+rowid+from+Street_Tree_List_fts+where+Street_Tree_List_fts+match+%27grove+london+dpw%27%29%0D%0A

      Will be used in https://github.com/simonw/datasette/issues/131

    • --fts and --extract-column now cooperate.

      If you extract a column and then specify that same column in the --fts list, csvs-to-sqlite now uses the original value of that column in the index.

      Example using CSV from https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq

      csvs-to-sqlite Street_Tree_List.csv trees-fts.db \
          -c qLegalStatus -c qSpecies -c qSiteInfo \
          -c PlantType -c qCaretaker -c qCareAssistant \
          -f qLegalStatus -f qSpecies -f qAddress \
          -f qSiteInfo -f PlantType -f qCaretaker \
          -f qCareAssistant -f PermitNotes
      

      Closes #9

    • Handle column names with spaces in them.

    • Added csvs-to-sqlite --version option.

      Using http://click.pocoo.org/5/api/#click.version_option

    Source code(tar.gz)
    Source code(zip)
  • 0.5(Nov 19, 2017)

    Now handles columns with integers and nulls in correctly

    Pandas does a good job of figuring out which SQLite column types should be used for a DataFrame - with one exception: due to a limitation of NumPy it treats columns containing a mixture of integers and NaN (blank values) as being of type float64, which means they end up as REAL columns in SQLite.

    http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

    To fix this, we now check to see if a float64 column actually consists solely of NaN and integer-valued floats (checked using v.is_integer() in Python). If that is the case, we over-ride the column type to be INTEGER instead.

    See #5 - also a8ab524 and 0997b7b

    Source code(tar.gz)
    Source code(zip)
  • 0.3(Nov 17, 2017)

    • Mechanism for converting columns into separate tables

      Let's say you have a CSV file that looks like this:

      county,precinct,office,district,party,candidate,votes
      Clark,1,President,,REP,John R. Kasich,5
      Clark,2,President,,REP,John R. Kasich,0
      Clark,3,President,,REP,John R. Kasich,7
      

      (Real example from https://github.com/openelections/openelections-data-sd/blob/ master/2016/20160607__sd__primary__clark__precinct.csv )

      You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

      csvs-to-sqlite openelections-data-*/*.csv \
          -c county:County:name \
          -c precinct:Precinct:name \
          -c office -c district -c party -c candidate \
          openelections.db
      

      The format is as follows:

      column_name:optional_table_name:optional_table_value_column_name
      

      If you just specify the column name e.g. -c office, the following table will be created:

      CREATE TABLE "party" (
          "id" INTEGER PRIMARY KEY,
          "value" TEXT
      );
      

      If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

      CREATE TABLE "Precinct" (
          "id" INTEGER PRIMARY KEY,
          "name" TEXT
      );
      

      The original tables will be created like this:

      CREATE TABLE "ca__primary__san_francisco__precinct" (
          "county" INTEGER,
          "precinct" INTEGER,
          "office" INTEGER,
          "district" INTEGER,
          "party" INTEGER,
          "candidate" INTEGER,
          "votes" INTEGER,
          FOREIGN KEY (county) REFERENCES County(id),
          FOREIGN KEY (party) REFERENCES party(id),
          FOREIGN KEY (precinct) REFERENCES Precinct(id),
          FOREIGN KEY (office) REFERENCES office(id),
          FOREIGN KEY (candidate) REFERENCES candidate(id)
      );
      

      They will be populated with IDs that reference the new derived tables.

      Closes #2

    Source code(tar.gz)
    Source code(zip)
Owner
Simon Willison
Simon Willison
Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

Jason R. Coombs 1k Dec 28, 2022
ZipFly is a zip archive generator based on zipfile.py

ZipFly is a zip archive generator based on zipfile.py. It was created by Buzon.io to generate very large ZIP archives for immediate sending out to clients, or for writing large ZIP archives without m

Buzon 506 Jan 04, 2023
A python module to parse text files with contains secret variables.

A python module to parse text files with contains secret variables.

0 Dec 05, 2022
ValveVMF - A python library to parse Valve's VMF files

ValveVMF ValveVMF is a Python library for parsing .vmf files for the Source Engi

pySourceSDK 2 Jan 02, 2022
A JupyterLab extension that allows opening files and directories with external desktop applications.

A JupyterLab extension that allows opening files and directories with external desktop applications.

martinRenou 0 Oct 14, 2021
Various converters to convert value sets from CSV to JSON, etc.

ValueSet Converters Tools for converting value sets in different formats. Such as converting extensional value sets in CSV format to JSON format able

Health Open Terminology Ecosystem 4 Sep 08, 2022
Organizer is a python program that organizes your downloads folder

Organizer Organizer is a python program that organizes your downloads folder, it can run as a service and so will start along with the system, and the

Gustavo 2 Oct 18, 2021
A Python script to organize your files in a given directory.

File-Organizer A Python script to organize your files in a given directory. It organizes your files based on the file extension and moves them into sp

Imira Randeniya 1 Sep 11, 2022
Find potentially sensitive files

find_files Find potentially sensitive files This script searchs for potentially sensitive files based off of file name or string contained in the file

4 Aug 20, 2022
CleverCSV is a Python package for handling messy CSV files.

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line applicati

The Alan Turing Institute 1k Dec 19, 2022
FileGenerator - File Generator for sites that accepts documents

File Generator for sites that accepts documents This code generates files as per

Shaunak 2 Mar 19, 2022
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".

the problem What directory should your app use for storing user data? If running on macOS, you should use: ~/Library/Application Support/AppName If

ActiveState Software 948 Dec 31, 2022
Listreqs is a simple requirements.txt generator. It's an alternative to pipreqs

⚡ Listreqs Listreqs is a simple requirements.txt generator. It's an alternative to pipreqs. Where in Pipreqs, it helps you to Generate requirements.tx

Soumyadip Sarkar 4 Oct 15, 2021
Simple Python File Manager

This script lets you automatically relocate files based on their extensions. Very useful from the downloads folder !

Aimé Risson 22 Dec 27, 2022
A simple tool to find and replace all the matches of a regular expression in file(s).

FindREp A simple tool to find and replace all the matches of a regular expression in file(s). You can either select the file(s) directly or select a f

Biraj 5 Oct 18, 2022
Measure file similarity in a many-to-many fashion

Mesi Mesi is a tool to measure the similarity in a many-to-many fashion of long-form documents like Python source code or technical writing. The outpu

GatorEducator 3 Feb 02, 2022
PyDeleter - delete a specifically formatted file in a directory or delete all other files

PyDeleter If you want to delete a specifically formatted file in a directory or delete all other files, PyDeleter does it for you. How to use? 1- Down

Amirabbas Motamedi 1 Jan 30, 2022
Simple, convenient and cross-platform file date changing library. 📝📅

Simple, convenient and cross-platform file date changing library.

kubinka0505 15 Dec 18, 2022
A simple bulk file renamer, written in python.

Python File Editor A simple bulk file renamer, written in python. There are two functions, the bulk rename and the bulk file extention change. Bulk Fi

Sam Bloomfield 2 Dec 22, 2021
Get Your TXT File Length !.

TXTLen Get Your TXT File Length !. Hi 👋 , I'm Alireza A Python Developer Boy 🔭 I’m currently working on my C# projects 🌱 I’m currently Learning CSh

Alireza Hasanzadeh 1 Jan 06, 2022