tiflolinux.org - GNU Social
  • Login

Bienvenido

  • Public

    • Public
    • Network
    • Groups
    • Popular
    • People

Conversation

Notices

  1. Jorge Sanz (xurxosanz@mastodon.social)'s status on Tuesday, 21-Jun-2022 16:15:32 CEST Jorge Sanz Jorge Sanz

    Tested locally with a 1M records file, and this is a super neat and approachable way to quickly inspect CSV files

    RT @simonw@twitter.com

    TIL you can run SQL queries directly against CSV files as a one-liner using the default sqlite3 command line utility
    https://til.simonwillison.net/sqlite/one-line-csv-operations

    🐦🔗: https://twitter.com/simonw/status/1539038172599836672

    In conversation Tuesday, 21-Jun-2022 16:15:32 CEST from mastodon.social permalink

    Attachments


    1. https://files.mastodon.social/media_attachments/files/108/514/781/758/489/312/original/99ceb63d059bebf5.jpg

    2. https://files.mastodon.social/media_attachments/files/108/514/781/838/936/250/original/8ffdb7814f8d599c.png
    3. One-liner for running queries against CSV files with SQLite
      from @simonw
      I figured out how to run a SQL query directly against a CSV file using the `sqlite3` command-line utility: sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' This uses the special `:memory:` filename to open an in-memory database. Then it uses two `-cmd` options to turn on CSV mode and import the `taxi.csv` file into a table called `taxi`. Then it runs the SQL query. You can get `taxi.csv` by downloading the compressed file from [here](https://github.com/multiprocessio/dsq/blob/43e72ff1d2c871082fed0ae401dd59e2ff9f6cfe/testdata/taxi.csv.7z) and running: 7z e -aos taxi.csv.7z I figured this out while commenting on [this issue](https://github.com/multiprocessio/dsq/issues/70). The output looks like this: ``` "",128020,32.2371511482553 0,42228,17.0214016766151 1,1533197,17.6418833067999 2,286461,18.0975870711456 3,72852,17.9153958710923 4,25510,18.452774990196 5,50291,17.2709248175672 6,32623,17.6002964166367 7,2,87.17 8,2,95.705 9,1,113.6 ``` Add `-cmd '.mode column'` to output in columns instead: ``` $ sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' -cmd '.mode column' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' passenger_count COUNT(*) AVG(total_amount) --------------- -------- ----------------- 128020 32.2371511482553 0 42228 17.0214016766151 1 1533197 17.6418833067999 2 286461 18.0975870711456 3 72852 17.9153958710923 4 25510 18.452774990196 5 50291 17.2709248175672 6 32623 17.6002964166367 7 2 87.17 8 2 95.705 9 1 113.6 ``` Or use `-cmd '.mode markdown'` to get a Markdown table: | passenger_count | COUNT(*) | AVG(total_amount) | |-----------------|----------|-------------------| | | 128020 | 32.2371511482553 | | 0 | 42228 | 17.0214016766151 | | 1 | 1533197 | 17.6418833067999 | | 2 | 286461 | 18.0975870711456 | | 3 | 72852 | 17.9153958710923 | | 4 | 25510 | 18.452774990196 | | 5 | 50291 | 17.2709248175672 | | 6 | 32623 | 17.6002964166367 | | 7 | 2 | 87.17 | | 8 | 2 | 95.705 | | 9 | 1 | 113.6 | A full list of output modes can be seen like this: ``` % sqlite3 -cmd '.help mode' .mode MODE ?TABLE? Set output mode MODE is one of: ascii Columns/rows delimited by 0x1F and 0x1E box Tables using unicode box-drawing characters csv Comma-separated values column Output in columns. (See .width) html HTML code insert SQL insert statements for TABLE json Results in a JSON array line One value per line list Values delimited by "|" markdown Markdown table format quote Escape answers as for SQL table ASCII-art table tabs Tab-separated values tcl TCL list elements ``` ## Other options There are a whole bunch of other tools that can be used for this kind of thing! My own [sqlite-utils memory](https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/) command can load data from JSON, CSV or TSV into an in-memory database and run a query against it. It's a LOT slower than using `sqlite3` directly though. [dsq](https://github.com/multiprocessio/dsq) is a tool that does this kind of thing (and a lot more). Author Phil Eaton compiled [a collection of benchmarks](https://github.com/multiprocessio/dsq#benchmark) of other similar tools, and his [benchmarking script](https://github.com/multiprocessio/dsq/blob/43e72ff1d2c871082fed0ae401dd59e2ff9f6cfe/scripts/benchmark.sh) demonstrates how to use each one of them.
    4. No result found on File_thumbnail lookup.
      https://twitter.com/simonw/status/1539038172599836672
      from Simon Willison

    Feeds

    • Activity Streams
    • RSS 2.0
    • Atom
    • Help
    • About
    • FAQ
    • TOS
    • Privacy
    • Source
    • Version
    • Contact

    tiflolinux.org - GNU Social is a social network, courtesy of tiflolinux.org. It runs on GNU social, version 2.0.1-beta0, available under the GNU Affero General Public License.

    Creative Commons Attribution 3.0 All tiflolinux.org - GNU Social content and data are available under the Creative Commons Attribution 3.0 license.