Back to blog

How to store data in SQLite: the complete guide from first table to production-ready database

SQLite runs inside every Android and iOS device, Python's standard library, and most embedded systems on the planet. The entire database lives in a single file, with no network layer, daemon, or config files to manage. That zero-overhead model makes it the default choice for web scrapers, mobile apps, CLI tools, and data pipelines that need structured storage without server complexity. This guide covers the full lifecycle: schema design, inserts, queries, security, and debugging.

TL;DR

  • sqlite3 is in Python's standard library. No installation needed.
  • Always use ? placeholders, never f-strings or string formatting in queries.
  • Enable two PRAGMAs at connection start: foreign_keys = ON and journal_mode = WAL.
  • Wrap every multi-step write in a transaction. Wrap every connection in a context manager.
  • executemany() inside a single transaction is 50–100x faster than looping execute().
  • Add STRICT to table definitions if you're on SQLite 3.37+. Type affinity will silently accept wrong types otherwise.
  • Store the database path in a config variable. Centralize connection logic in one helper function.
  • When you hit concurrent write bottlenecks or need network access, move to PostgreSQL. Your SQL transfers directly.

Prerequisites

You'll need Python 3.6+ and basic familiarity with functions, context managers, and exception handling. The sqlite3 module ships with the standard library; nothing to install.

Basic SQL is assumed. You should recognize SELECT, INSERT, UPDATE, and DELETE. This guide covers SQLite-specific behavior, not SQL fundamentals.

For the platform-specific sections: Android examples use Kotlin and Android Studio; Flutter examples use Dart with sqflite or drift in your pubspec.yaml; Windows examples use .NET 6+ with the Microsoft.Data.Sqlite NuGet package.

To run queries interactively, grab DB Browser for SQLite.

Setting up a SQLite database and managing connections

The difference between an SQLite project that runs cleanly for two years and one that starts breaking after three weeks usually has nothing to do with queries. It comes down to how you open and close connections.

Creating and initializing the database

Because sqlite3 is part of the standard Python library, a working database connection requires nothing beyond a Python installation.

import sqlite3
# Creates the file if it doesn't exist; opens it if it does
conn = sqlite3.connect("inventory.db")

If inventory.db doesn't exist yet, SQLite will create it. If it does, SQLite will open it. There's no separate CREATE DATABASE step like in PostgreSQL.

For testing or single-run pipelines where data doesn't need to persist beyond the process, use an in-memory database:

conn = sqlite3.connect(":memory:")

In-memory databases are fast and clean up automatically when the connection closes. They're a popular go-to for unit tests: spin up a fresh database, run logic against it, and throw it away without any teardown.
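A minimal sketch of that pattern; the `make_test_db` helper and trimmed-down schema here are hypothetical, just to illustrate the throwaway-database idea:

```python
import sqlite3

def make_test_db():
    """Spin up a throwaway in-memory database with the schema under test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn

conn = make_test_db()
conn.execute("INSERT INTO products (name) VALUES ('Widget')")
assert conn.execute("SELECT COUNT(*) FROM products").fetchone()[0] == 1
conn.close()  # the database vanishes with the connection -- no teardown needed
```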

Note: File extension conventions matter for tooling. .sqlite is recognized by most GUI browsers like DB Browser for SQLite; .db is common in framework tutorials. Pick one and stay consistent. Mixing them is a surprisingly common source of "why is my table empty?" debugging sessions.

Scattering connect() calls across a codebase leads to multiple connection objects, some never closed, some pointing at the wrong file. Wrap initialization in a class instead:

import sqlite3

class DatabaseManager:
    def __init__(self, db_path: str):
        self.db_path = db_path

    def initialize(self):
        with sqlite3.connect(self.db_path) as conn:
            conn.execute("""
                CREATE TABLE IF NOT EXISTS products (
                    id INTEGER PRIMARY KEY,
                    name TEXT NOT NULL,
                    category TEXT,
                    unit_price REAL,
                    stock_count INTEGER DEFAULT 0,
                    created_at TEXT DEFAULT CURRENT_TIMESTAMP
                )
            """)
            conn.commit()

IF NOT EXISTS makes this safe to call on every startup without throwing errors on tables that already exist. Startup logic should be boring and reliable.

Libraries like SQLAlchemy, Flutter's sqflite, and Microsoft.Data.Sqlite all follow the same principle – wrap raw SQLite behind a structured interface so that the rest of the application never touches the database file directly.

Equivalent patterns in other environments

  • SQLAlchemy – create_engine() + Base.metadata.create_all(engine)
  • Flutter – openDatabase() with an onCreate callback
  • .NET – Microsoft.Data.Sqlite with an InitializeDatabase() method

Getting the database open is the straightforward part. Keeping the connection lifecycle clean is where most projects develop problems.

Managing connections reliably

First, decide on your connection lifetime – short-lived or persistent.

Short-lived connections open for a single operation or transaction and close immediately. They're the right default for scripts, request handlers, and anything that runs periodically.

Persistent connections stay open for the lifetime of a process and make sense when connection overhead is measurable, for example, desktop applications or long-running daemons with continuous write activity.

For most use cases, short-lived connections are safer. They release locks immediately, leave no dangling state, and fail loudly if something goes wrong.

Python's context manager pattern handles commit and rollback automatically:

with sqlite3.connect("inventory.db") as conn:
    conn.execute(
        "INSERT INTO products (name, unit_price) VALUES (?, ?)",
        ("Widget", 9.99)
    )
# Commits on clean exit, rolls back on exception -- automatically

As you can see, there's no commit or try/except boilerplate. One caveat: sqlite3's connection context manager commits or rolls back, but it does not close the connection. For larger projects, it's recommended to centralize connection logic in a single helper that also handles closing, so that every part of the codebase opens connections the same way:

from contextlib import contextmanager
import sqlite3

DB_PATH = "inventory.db"  # Store in a config or env variable -- never hardcode it

@contextmanager
def get_db_connection():
    conn = sqlite3.connect(DB_PATH)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

If you need to rename the file, switch to a config variable, or add logging, you change it in one place instead of hunting through the codebase.

It’s important to never hold a connection open across slow operations. A common anti-pattern: open a connection, fire a network request, wait for the response, then write the result. SQLite holds a write lock for that entire wait. In a small script this is harmless, but in a scraper making 500 requests per run, it adds up.

To fix this, simply finish the slow work first, then open the connection only when you're ready to write. For projects with heavier concurrent read requirements, enable WAL mode early: 

conn.execute("PRAGMA journal_mode=WAL")

WAL (Write-Ahead Logging) allows readers and writers to operate concurrently rather than taking turns. You can add this at initialization and rarely think about it again.
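The collect-first, write-later fix described above looks like this in outline; `fetch()` here is a hypothetical stand-in for whatever slow network call your script actually makes:

```python
import sqlite3

def fetch(url):
    # Stand-in for a slow network call (hypothetical)
    return (url, len(url))

urls = ["https://example.com/a", "https://example.com/b"]

# Do the slow network work first, with no database connection open...
rows = [fetch(u) for u in urls]

# ...then open the connection only for the brief write
with sqlite3.connect(":memory:") as conn:
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, size INTEGER)")
    conn.executemany("INSERT INTO pages (url, size) VALUES (?, ?)", rows)
```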

With connections handled reliably, the next decision that shapes everything downstream is the schema.

Designing the database schema

A schema is a contract. It defines what data your application is allowed to store and what guarantees the database will enforce. Time spent sketching it before writing any SQL pays back every time you avoid hunting down a bug caused by a null that shouldn't exist or a duplicate that should have been blocked.

SQLite's type system – a trap worth knowing

SQLite uses type affinity rather than strict types. The five affinity types are INTEGER, TEXT, REAL, BLOB, and NUMERIC, but declaring a column as any of them doesn't lock it down. A column declared as INTEGER will silently accept a string without raising an error. Store "9.99" instead of 9.99 in a unit_price column and SQLite won't complain, but when you run a SUM across that column three months later, the result will be wrong, and nothing will tell you why.
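You can watch the trap happen in a few lines; this sketch uses a throwaway in-memory table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (unit_price REAL)")

# REAL affinity coerces numeric-looking text, but arbitrary text slips through
conn.execute("INSERT INTO products VALUES (?)", ("9.99",))         # stored as the real 9.99
conn.execute("INSERT INTO products VALUES (?)", ("not a price",))  # stored as text, no error

for typ, val in conn.execute("SELECT typeof(unit_price), unit_price FROM products"):
    print(typ, val)
# real 9.99
# text not a price
```

Neither insert raises; the second one silently poisons every aggregate over the column.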

SQLite 3.37+ has an opt-in fix. STRICT tables enforce actual type constraints at write time:

CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    unit_price REAL NOT NULL
) STRICT;

When building something for production, always add STRICT from the start, because retrofitting type safety into an existing schema is painful.

Creating the tables

For the rest of this guide, the working example is a product inventory tracker with suppliers.

with sqlite3.connect("inventory.db") as conn:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            category TEXT,
            unit_price REAL CHECK(unit_price > 0),
            stock_count INTEGER DEFAULT 0,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS suppliers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            country TEXT
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS product_suppliers (
            product_id INTEGER NOT NULL,
            supplier_id INTEGER NOT NULL,
            PRIMARY KEY (product_id, supplier_id),
            FOREIGN KEY (product_id) REFERENCES products(id),
            FOREIGN KEY (supplier_id) REFERENCES suppliers(id)
        )
    """)

A few decisions worth noting:

  • DEFAULT 0 on stock_count means application code never has to pass an initial value.
  • DEFAULT CURRENT_TIMESTAMP on created_at is filled out automatically.
  • CHECK(unit_price > 0) enforces a business rule at the database level. Even if the application has a bug, a negative price cannot be committed.

Foreign key enforcement is disabled by default in SQLite, which trips up developers coming from PostgreSQL. You can enable it at the start of every connection:

conn.execute("PRAGMA foreign_keys = ON")

Foreign keys are one constraint type. The others worth defining at schema time are primary keys, UNIQUE, and CHECK. Together they cover the data integrity guarantees that would otherwise require enforcement in application code, scattered across every function that touches the database.
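A quick demonstration of why the PRAGMA matters, as a minimal sketch with a throwaway schema. Note the `isolation_level=None` (autocommit): PRAGMA foreign_keys is a no-op inside an open transaction, and Python's default mode would have one open after the first INSERT:

```python
import sqlite3

# isolation_level=None -> autocommit, so the PRAGMA takes effect immediately
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        supplier_id INTEGER REFERENCES suppliers(id)
    );
""")

# Enforcement is OFF by default: this orphan row is silently accepted
conn.execute("INSERT INTO products (supplier_id) VALUES (999)")

conn.execute("PRAGMA foreign_keys = ON")
try:
    conn.execute("INSERT INTO products (supplier_id) VALUES (999)")
except sqlite3.IntegrityError as e:
    print(e)  # FOREIGN KEY constraint failed
```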

Primary keys and additional constraints

INTEGER PRIMARY KEY is an alias for SQLite's internal rowid. Any row that doesn't explicitly set this column gets an auto-incremented value. For junction tables like product_suppliers, define a composite primary key instead:

PRIMARY KEY (product_id, supplier_id)

This prevents the same product-supplier pair from being recorded twice, which is exactly what a junction table guarantees.

UNIQUE constraints prevent duplicates on non-primary-key columns:

name TEXT NOT NULL UNIQUE

CHECK constraints push basic business rules down into the database layer – the last line of defense against bad data:

unit_price REAL CHECK(unit_price > 0)
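Violating any of these constraints raises sqlite3.IntegrityError rather than silently storing bad data. A minimal sketch with a throwaway table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE suppliers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    )
""")
conn.execute("INSERT INTO suppliers (name) VALUES ('Acme')")

try:
    conn.execute("INSERT INTO suppliers (name) VALUES ('Acme')")
    duplicate_blocked = False
except sqlite3.IntegrityError as e:
    duplicate_blocked = True
    print(e)  # UNIQUE constraint failed: suppliers.name
```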

Schema tips:

  • Sketch the schema before writing any SQL. A quick diagram of tables, required vs. optional columns, and relationships takes ten minutes to prepare and prevents a lot of ALTER TABLE pain down the road.
  • Use snake_case for all table and column names. Avoid SQL-reserved words as identifiers. SQLite parses ORDER and GROUP as keywords first, so a table named order breaks any query that references it unless you wrap the name in quotes every single time.
  • Plan for schema migrations from the start. SQLite's ALTER TABLE supports ADD COLUMN, RENAME COLUMN, RENAME TO, and (since 3.35.0) DROP COLUMN, with the restriction that dropped columns cannot be primary keys, part of an index, or referenced by a foreign key. Any significant structural change requires recreating the table. Keeping schema creation SQL in version control from day one makes this far less painful.

With the schema defined, the next step is getting data in, with one rule applicable to every insert.

Inserting data

Three insert-time defaults cause problems: how values are passed into queries, how large datasets are handled, and how duplicate records behave. Each has a safer alternative worth knowing before writing any production code.

Parameterized queries are the only correct default

Every single-row insert should look like this:

with get_db_connection() as conn:
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO products (name, category, unit_price, stock_count) VALUES (?, ?, ?, ?)",
        ("Widget Pro", "Hardware", 29.99, 100)
    )
    print(f"New row ID: {cursor.lastrowid}")

The ? placeholders are where the safety comes from. The database driver handles the binding between placeholder and value. Whatever gets passed in, even if it contains SQL syntax, gets treated as literal data and never executed as a command. The alternative (building INSERT statements with f-strings or %-formatting) is the primary SQL injection vector in Python codebases, because user input gets interpreted as SQL rather than data.

For queries with many columns where positional order becomes error-prone, switch to named placeholders:

cursor.execute(
    """INSERT INTO products (name, category, unit_price)
       VALUES (:name, :category, :unit_price)""",
    {"name": "Widget Pro", "category": "Hardware", "unit_price": 29.99}
)

Named placeholders make it obvious what's happening where and are resilient to column reordering.

Bulk inserts with executemany()

Looping execute() over a list of records and committing each one individually is the most common SQLite performance mistake. The difference is unnoticeable at 10 rows, but the overhead adds up quickly at 50,000.

executemany() inside a single transaction fixes this:

products = [
    {"name": "Bolt Pack", "category": "Fasteners", "unit_price": 4.99, "stock_count": 500},
    {"name": "Hex Key Set", "category": "Tools", "unit_price": 12.49, "stock_count": 75},
    {"name": "Cable Tie", "category": "Accessories", "unit_price": 2.99, "stock_count": 1000},
]

with get_db_connection() as conn:
    conn.executemany(
        "INSERT INTO products (name, category, unit_price, stock_count) VALUES (:name, :category, :unit_price, :stock_count)",
        products
    )

Batching inside a single transaction is consistently 50× to 100× faster than per-row commits on large datasets. The code change is two lines.

INSERT OR REPLACE and INSERT OR IGNORE

Bulk inserts assume all data is new. Every scraper and sync pipeline eventually hits the case where a record might already exist.

INSERT OR REPLACE treats the incoming row as authoritative and overwrites the existing one:

conn.execute(
    "INSERT OR REPLACE INTO products (id, name, unit_price) VALUES (?, ?, ?)",
    (1, "Widget Pro", 31.99)
)

INSERT OR IGNORE skips the row silently if it would violate a unique constraint. Re-running the same script against the same data produces no errors and no duplicates:

conn.execute(
    "INSERT OR IGNORE INTO products (name, unit_price) VALUES (?, ?)",
    ("Widget Pro", 29.99)
)

Insertion checklist

  • Wrap bulk inserts in a single transaction. Per-row commits on large datasets are catastrophically slow.
  • Retrieve auto-assigned primary keys with cursor.lastrowid immediately after an insert.
  • Validate and clean data before inserting. SQLite's type affinity silently accepts wrong types, storing bad data without complaint.

Collecting data before storing it?

Decodo's Web Scraping API returns clean, structured output that drops straight into your SQLite pipeline with minimal validation overhead.

Querying, updating, and deleting data

SELECT * is fine for exploration, but for anything running in production, name your columns explicitly. It makes the code readable, reduces memory usage, and prevents silent breakage when someone adds a column to the table later.

Querying with SELECT

with get_db_connection() as conn:
    conn.row_factory = sqlite3.Row  # Access columns by name, not index position
    cursor = conn.cursor()
    cursor.execute("""
        SELECT id, name, unit_price, stock_count
        FROM products
        WHERE (category = ? OR category = ?) AND unit_price < ?
        ORDER BY unit_price ASC
        LIMIT 10 OFFSET 0
    """, ("Hardware", "Tools", 50.00))
    rows = cursor.fetchall()
    for row in rows:
        print(f"{row['name']}: ${row['unit_price']}")

For partial text matching, use LIKE with a % wildcard, for example, WHERE name LIKE ? with "%Widget%" as the bound value matches any name containing that string.
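A short sketch of that pattern on a throwaway table; note the wildcard goes in the bound value, not the SQL string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [("Widget Pro",), ("Mega Widget",), ("Gadget",)])

# The % wildcard lives in the parameter, so the query stays parameterized
cursor = conn.execute("SELECT name FROM products WHERE name LIKE ?", ("%Widget%",))
print(cursor.fetchall())  # [('Widget Pro',), ('Mega Widget',)]
```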

conn.row_factory = sqlite3.Row lets you access result columns by name (row['name']) rather than by index (row[0]). Index-based access is brittle, and adding a column in the middle of the schema breaks every index-based access downstream. However, Row objects and tuples coming out of SQLite are raw database types, so parsing them into the data structures your application actually uses is a separate step worth thinking about deliberately.

Pick the fetch method based on what you’re expecting back:

  • fetchall() loads the full result set into memory. It works for small datasets, but can be a problem for large ones.
  • fetchone() retrieves a single row. You can use this for lookups where you’d expect a 0 or 1 result.
  • fetchmany(n) pulls n rows at a time, which is better for large result sets that you want to process in chunks.
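A fetchmany() loop in outline, using a throwaway table; the chunk size of 250 is arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO products (id) VALUES (?)", [(i,) for i in range(1, 1001)])

cursor = conn.execute("SELECT id FROM products")
total = 0
while True:
    chunk = cursor.fetchmany(250)  # at most 250 rows in memory at a time
    if not chunk:
        break
    total += len(chunk)            # stand-in for real per-chunk processing

print(total)  # 1000
```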

Updating records with UPDATE

When a product price changes or stock levels shift, target the exact rows you want to modify using UPDATE with a WHERE clause:

with get_db_connection() as conn:
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE products SET unit_price = ?, stock_count = ? WHERE id = ?",
        (34.99, 90, 1)
    )
    print(f"Rows updated: {cursor.rowcount}")

The WHERE clause is mandatory. An UPDATE without a WHERE clause modifies every row in the table. Check cursor.rowcount in scripts that run unattended, so you know exactly how many rows were affected.

Deletion follows the same discipline – the WHERE clause is the only thing standing between a targeted operation and a full table wipe.

Deleting records with DELETE

Before running any DELETE in production, run the equivalent SELECT first to confirm you’re targeting the right rows. The deletion itself is straightforward:

with get_db_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("DELETE FROM products WHERE id = ?", (1,))

For data that needs an audit trail, a soft delete is a safer default than physically removing rows:

-- Add during table creation
is_deleted INTEGER DEFAULT 0
-- Mark a row as deleted
UPDATE products SET is_deleted = 1 WHERE id = ?
-- Query only active rows
SELECT * FROM products WHERE is_deleted = 0

Soft deletes preserve history, allow undoing, and make debugging accidental deletions much easier. Use the following query to expire the oldest records from a log or cache table without touching newer ones:

DELETE FROM logs
WHERE id IN (
    SELECT id FROM logs ORDER BY created_at ASC LIMIT 100
)

Single-statement operations are straightforward; transactions ensure multiple writes succeed or fail as a unit.

Transactions for multi-step operations

Some operations only make sense together. Decrementing stock and recording the sale have to both succeed or both fail. Here's how to enforce that with an explicit transaction:

def record_sale(conn, product_id: int, quantity: int, sale_price: float):
    try:
        conn.execute("BEGIN")
        conn.execute(
            "UPDATE products SET stock_count = stock_count - ? WHERE id = ?",
            (quantity, product_id)
        )
        conn.execute(
            "INSERT INTO sales (product_id, quantity, sale_price) VALUES (?, ?, ?)",
            (product_id, quantity, sale_price)
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise

The context manager pattern handles commit and rollback automatically for simpler cases. Explicit BEGIN/COMMIT/ROLLBACK gives finer control when the logic is more involved.

Transactions protect consistency across operations. The next layer of protection runs at every individual statement, keeping bad data out in the first place.

Protecting data integrity and preventing SQL injection

Most data problems trace back to one of three failure points: a query built from raw string input, a constraint that was never defined, or a value that looked valid, but wasn't. The first is a security issue and the latter two are data quality problems, each with a different fix.

Why parameterized queries are non-negotiable

SQL injection is worth understanding concretely. The code below shows what happens when a query is built from raw string input, and how a simple name lookup becomes a full table dump with a crafted input value.

# NEVER do this
user_input = "' OR '1'='1"
query = f"SELECT * FROM products WHERE name = '{user_input}'"
conn.execute(query)
# Executes: SELECT * FROM products WHERE name = '' OR '1'='1'
# Returns every row in the table regardless of access controls.
# For multi-statement attacks, executescript() is the real risk:
conn.executescript(f"SELECT * FROM products WHERE name = '{user_input}'")
# executescript() runs all statements -- this is where DROP TABLE attacks land.

Parameterized queries close this off entirely:

# Safe regardless of what user_input contains
cursor.execute("SELECT * FROM products WHERE name = ?", (user_input,))

This applies to every statement type. WHERE clauses in SELECT, UPDATE, and DELETE are just as vulnerable as INSERT. Any query touching user-supplied input needs parameterized values.

When building pipelines that write external data into SQLite, parameterized queries are the minimum bar – every value arriving from outside the application is untrusted input until proven otherwise. Using a data collection layer like Decodo's Web Scraping API reduces this burden because structured, clean output means less defensive handling before the data reaches the write path.

Enforcing integrity with constraints

Constraints handle a different class of problem – valid-looking data that violates business rules. They run as a second layer of defense even when the application code has a bug:

  • NOT NULL – prevents accidental null values in required fields
  • UNIQUE – prevents duplicate entries on non-primary-key columns
  • CHECK – prevents business-rule violations, e.g., negative prices
  • FOREIGN KEY – prevents orphaned records and referential inconsistency

As noted in the schema section, foreign key enforcement is easy to forget and difficult to debug when missing. Enable it at the start of every connection:

conn.execute("PRAGMA foreign_keys = ON")

Transactions as an integrity tool

Constraints protect individual rows, while transactions protect operations that span multiple rows or tables. SQLite transactions are ACID-compliant, meaning all changes commit together (or none do), constraints are checked on commit, concurrent operations don't interfere, and committed data persists after a crash.

Without a transaction, a crash between the UPDATE and the INSERT in record_sale leaves stock decremented but no sale recorded: two tables out of sync, with no error to flag it. You can verify by running this snippet:

# Simulate a mid-operation failure without a transaction
conn.execute("UPDATE products SET stock_count = stock_count - 1 WHERE id = ?", (1,))
raise RuntimeError("Simulated crash") # INSERT never runs. Stock is now wrong.

Wrap both writes in a transaction and the rollback restores the original state automatically.

For simpler cases, the context manager handles commit and rollback without BEGIN/COMMIT boilerplate. Use explicit BEGIN and COMMIT when the logic branches mid-transaction, or you need finer control over save points.
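Savepoints deserve a quick illustration. This sketch uses isolation_level=None for fully manual transaction control (an assumption for the demo, not something the guide's helper uses):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE logs (msg TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO logs VALUES ('kept')")
conn.execute("SAVEPOINT risky")
conn.execute("INSERT INTO logs VALUES ('discarded')")
conn.execute("ROLLBACK TO risky")  # undoes only the work after the savepoint
conn.execute("COMMIT")

print(conn.execute("SELECT msg FROM logs").fetchall())  # [('kept',)]
```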

Input validation before the database

Parameterized queries prevent injection. They don't validate that a value makes sense. A negative price passes through a placeholder without complaint, lands in the database, and breaks margin calculations. Catch bad data in application code before it ever reaches the write path:

def validate_product(name: str, unit_price: float, stock_count: int):
    if not name or not name.strip():
        raise ValueError("Product name is required")
    if unit_price <= 0:
        raise ValueError(f"unit_price must be positive, got {unit_price}")
    if stock_count < 0:
        raise ValueError(f"stock_count can't be negative, got {stock_count}")

Validation in application code, constraints in the schema, and parameterized queries at the statement level each cover a different failure mode; leave any one of the three out and a gap remains.

Now that we’ve covered the fundamentals – schema, inserts, queries, and integrity – we can move on to data that doesn’t fit SQLite's native types, queries that span multiple tables, and the project contexts where SQLite's strengths matter most.

Advanced techniques: complex data types, joins, and specialized projects

Once the schema is stable and queries become more demanding, three challenges tend to surface: storing data that doesn't map cleanly to SQLite's native types, pulling data across multiple tables, and choosing the right patterns for specific project contexts. This section covers all three.

Storing complex data types

SQLite has no native array or list type. When you need to store tags, a set of URLs, or a dictionary of config values, choose between these two options.

Option 1 is serializing to JSON. It’s quick to implement, but SQL can't query into the values:

import json

tags = ["electronics", "sale", "featured"]

with get_db_connection() as conn:
    conn.execute(
        "INSERT INTO products (name, tags) VALUES (?, ?)",
        ("Widget Pro", json.dumps(tags))
    )

# Reading back
cursor.execute("SELECT tags FROM products WHERE id = ?", (1,))
row = cursor.fetchone()
tags = json.loads(row["tags"])

Option 2 is a normalized child table. More code, but fully queryable:

CREATE TABLE product_tags (
    product_id INTEGER NOT NULL REFERENCES products(id),
    tag TEXT NOT NULL,
    PRIMARY KEY (product_id, tag)
);

WHERE tag = 'sale' now works at the SQL level, indexes apply, and joins become possible. Use JSON for list values that will never be queried directly, and a child table when they will.

A BLOB column stores binary data like small images, compressed payloads, or serialized objects directly alongside their metadata, retrieving both in a single query with no separate file lookup. Where that coupling matters, BLOB keeps the data model simple. Where it doesn't, storing large BLOBs inflates the database file, slows backups, and makes the file unwieldy to move around.

For anything above a few hundred kilobytes that doesn't need to travel with its metadata, store the file on disk and keep only the path in the database. When the data is small enough to live inline, inserting a BLOB is straightforward:

with open("thumbnail.png", "rb") as f:
    image_data = f.read()

conn.execute(
    "INSERT INTO product_images (product_id, image_data) VALUES (?, ?)",
    (1, image_data)
)

Combining tables with JOIN

When data lives across multiple tables, JOIN brings it together. INNER JOIN returns only rows with matching entries in both tables:

cursor.execute("""
    SELECT p.name, p.unit_price, s.name AS supplier_name
    FROM products p
    INNER JOIN product_suppliers ps ON p.id = ps.product_id
    INNER JOIN suppliers s ON ps.supplier_id = s.id
    WHERE p.category = ?
""", ("Hardware",))

LEFT JOIN includes all rows from the primary table even when no match exists in the joined table:

cursor.execute("""
    SELECT p.name, p.unit_price, s.name AS supplier_name
    FROM products p
    LEFT JOIN product_suppliers ps ON p.id = ps.product_id
    LEFT JOIN suppliers s ON ps.supplier_id = s.id
""")

Always use table-qualified column names (for example, p.name, s.name) when joined tables share column names.

For complex multi-step queries, Common Table Expressions (CTEs) are far more readable than deeply-nested subqueries. Each CTE is a named mini-table and the final SELECT reads like a sentence:

WITH low_stock AS (
    SELECT id, name, stock_count
    FROM products
    WHERE stock_count < 10
),
recent_sales AS (
    SELECT product_id, SUM(quantity) AS total_sold
    FROM sales
    WHERE created_at > date('now', '-30 days')
    GROUP BY product_id
)
SELECT ls.name,
       ls.stock_count,
       COALESCE(rs.total_sold, 0) AS sold_last_30_days
FROM low_stock ls
LEFT JOIN recent_sales rs ON ls.id = rs.product_id
ORDER BY rs.total_sold DESC;

SQLite in specialized project contexts

  • Web scraper persistence. SQLite is a natural fit as the storage layer for Python web scrapers that accumulate records across runs. Using Decodo's Web Scraping API as the data collection layer means structured, clean output arrives ready to insert, reducing the validation burden before it hits the database. INSERT OR IGNORE prevents duplicates across runs, and timestamped rows allow detecting what has changed since the last scrape:
conn.execute("""
    CREATE TABLE IF NOT EXISTS scraped_products (
        url TEXT PRIMARY KEY,
        name TEXT,
        price REAL,
        scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT OR IGNORE INTO scraped_products (url, name, price) VALUES (?, ?, ?)",
    (product_url, product_name, product_price)
)

The same pattern applies to AI web scraping pipelines where collected data feeds a model or analysis step. SQLite provides a lightweight, queryable store without adding infrastructure.

  • IoT and sensor logging. SQLite's footprint is small enough to run on single-board computers. For time-series data, the index is the critical schema decision. Without one, time-range queries do full table scans:
CREATE TABLE sensor_readings (
    id INTEGER PRIMARY KEY,
    device_id TEXT NOT NULL,
    value REAL NOT NULL,
    recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_sensor_time ON sensor_readings(device_id, recorded_at);
  • CLI tools and developer utilities. SQLite gives a persistent user state without asking anyone to install anything extra. Here's a simple key-value store built with Click:
import os
import sqlite3
import click

# sqlite3 won't expand "~" itself, so resolve it explicitly
DB_PATH = os.path.expanduser("~/.config/mytool/data.db")

@click.command()
@click.argument("key")
@click.argument("value")
def store(key, value):
    """Store a key-value pair persistently."""
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS config (key TEXT PRIMARY KEY, value TEXT)")
        conn.execute(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            (key, value)
        )
    click.echo(f"Stored {key}")

The JSON vs. child table decision, the JOIN patterns, and the project-specific schema choices above are all expressions of the same underlying principle – match the data structure to how you need to query it. Get that right, and the rest follows.

Using SQLite across platforms and with abstraction layers

The SQL in this guide transfers directly across every platform below. What changes is the layer between the code and the database engine. Each platform presents the same core trade-off: raw SQL gives direct control with minimal setup, while an abstraction layer adds compile-time safety and reduces boilerplate at the cost of setup time and generated queries you didn't write.

Python: sqlite3 vs. SQLAlchemy

The built-in sqlite3 module covers most Python use cases without installation. You can use it when every query is explicitly written in the source code, and the project is small enough that an ORM would add more ceremony than clarity.

SQLAlchemy Core earns its place when query conditions are determined at runtime, filtering by whichever columns the user selects, for example, and the alternative is concatenating strings to build dynamic SQL. The key advantage over raw sqlite3 is programmatic query construction, assembling filters, joins, and ORDER BY clauses in code without touching a string.

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, Float, select, and_

engine = create_engine("sqlite:///inventory.db")
metadata = MetaData()

products = Table("products", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
    Column("category", String),
    Column("unit_price", Float),
)
metadata.create_all(engine)

category = "Hardware"  # e.g., chosen by the user at runtime; may be None

# Build the query programmatically -- no string concatenation
filters = [products.c.unit_price < 50.0]
if category:
    filters.append(products.c.category == category)

query = select(products.c.name, products.c.unit_price).where(and_(*filters))

with engine.connect() as conn:
    for row in conn.execute(query):
        print(row.name, row.unit_price)

SQLAlchemy ORM maps Python classes to tables and generates SQL from method calls. It pays off for larger applications with complex data models where the abstraction genuinely reduces code. For simple, read-heavy scripts like pulling data, transforming it, and writing a report, the ORM setup cost outweighs the benefit. Raw sqlite3 ships faster and stays easier to read.

Android: SQLiteOpenHelper vs. Room

SQLiteOpenHelper is the traditional approach. Override onCreate() to handle schema creation and onUpgrade() for migrations:

class InventoryDatabase(context: Context) :
    SQLiteOpenHelper(context, "inventory.db", null, 1) {

    override fun onCreate(db: SQLiteDatabase) {
        db.execSQL("""
            CREATE TABLE products (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                name TEXT NOT NULL,
                unit_price REAL,
                stock_count INTEGER DEFAULT 0
            )
        """)
    }

    override fun onUpgrade(db: SQLiteDatabase, oldVersion: Int, newVersion: Int) {
        // Handle schema migrations per version increment
    }
}

Room is recommended for any new Android development. It generates SQL at compile time, catches type mismatches before runtime, and eliminates the boilerplate that SQLiteOpenHelper leaves to you:

@Entity(tableName = "products")
data class Product(
    @PrimaryKey(autoGenerate = true) val id: Int = 0,
    @ColumnInfo(name = "name") val name: String,
    @ColumnInfo(name = "unit_price") val unitPrice: Double,
    @ColumnInfo(name = "stock_count") val stockCount: Int = 0
)

@Dao
interface ProductDao {
    @Query("SELECT * FROM products WHERE unit_price < :maxPrice")
    fun getAffordableProducts(maxPrice: Double): List<Product>

    @Insert(onConflict = OnConflictStrategy.REPLACE)
    fun insertProduct(product: Product): Long
}

SQLiteOpenHelper ships faster for a screen or two with simple queries. Once the schema involves multiple tables, relationships, and LiveData or Flow integration, Room pays for itself. For complex analytical queries that Room's annotations can't express cleanly, @RawQuery earns its place.

Flutter: sqflite and Drift

sqflite uses raw SQL, returns Futures for every operation, and requires no setup beyond the package itself. Use it when direct SQL control is enough and the project doesn't justify the Drift setup cost.

import 'package:sqflite/sqflite.dart';
import 'package:path/path.dart';

Future<Database> openInventoryDatabase() async {
  return openDatabase(
    join(await getDatabasesPath(), 'inventory.db'),
    onCreate: (db, version) {
      return db.execute('''
        CREATE TABLE products (
          id INTEGER PRIMARY KEY AUTOINCREMENT,
          name TEXT NOT NULL,
          unit_price REAL,
          stock_count INTEGER DEFAULT 0
        )
      ''');
    },
    version: 1,
  );
}

Future<void> insertProduct(Database db, Map<String, dynamic> product) async {
  await db.insert('products', product,
      conflictAlgorithm: ConflictAlgorithm.replace);
}

Drift generates type-safe Dart query code from table definitions. Its relationship to sqflite mirrors Room's relationship to SQLiteOpenHelper: more setup overhead, more compile-time safety, and a better fit for large applications with complex schemas and reactive UI requirements. For a small app with two or three tables, sqflite ships faster.

Windows: Microsoft.Data.Sqlite and Entity Framework Core

Microsoft.Data.Sqlite gives thin, explicit control without abstraction overhead:

using Microsoft.Data.Sqlite;

// "using var" disposes the connection when it goes out of scope
using var connection = new SqliteConnection("Data Source=inventory.db");
connection.Open();

var createCmd = connection.CreateCommand();
createCmd.CommandText = @"
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        unit_price REAL
    )";
createCmd.ExecuteNonQuery();

var insertCmd = connection.CreateCommand();
insertCmd.CommandText =
    "INSERT INTO products (name, unit_price) VALUES ($name, $price)";
insertCmd.Parameters.AddWithValue("$name", "Widget Pro");
insertCmd.Parameters.AddWithValue("$price", 29.99);
insertCmd.ExecuteNonQuery();

For Windows apps that need full ORM features, Entity Framework Core with the SQLite provider adds migrations, LINQ queries, and change tracking:

// Define the model
public class Product {
    public int Id { get; set; }
    public string Name { get; set; }
    public double UnitPrice { get; set; }
    public int StockCount { get; set; }
}

// Define the context
public class InventoryContext : DbContext {
    public DbSet<Product> Products { get; set; }
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlite("Data Source=inventory.db");
}

// Query with LINQ - EF Core generates the SQL
using var context = new InventoryContext();
var affordable = context.Products
    .Where(p => p.UnitPrice < 50.0)
    .OrderBy(p => p.UnitPrice)
    .ToList();

The decision rule mirrors the other platforms: Microsoft.Data.Sqlite ships faster for small projects with straightforward queries; Entity Framework Core pays off when the data model is complex, migrations matter, and LINQ queries would otherwise be hand-rolled SQL.

Enable query logging during development and verify the SQL the framework is actually generating. Abstractions produce unexpected queries, and EXPLAIN QUERY PLAN won't catch what you're not looking at.
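In Python, for instance, the standard library's Connection.set_trace_callback hook captures every statement SQLite actually executes (SQLAlchemy users get the equivalent by passing echo=True to create_engine). A minimal sketch:

```python
import sqlite3

captured = []
conn = sqlite3.connect(":memory:")
conn.set_trace_callback(captured.append)  # log each executed statement

conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products (name) VALUES (?)", ("Widget Pro",))

for stmt in captured:
    print(stmt)
conn.close()
```

In a real project, swap the list for your logger and leave the callback enabled in development builds only.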

SQLite is also widely used beyond the platforms covered here. If you're working with Go or comparing language options for a data-heavy project, this guide to Go vs. Python covers how the two languages approach the same class of problems.

Final thoughts

SQLite's reputation is deserved, but the thing most tutorials don't mention is how often it outlasts the projects that were supposed to outgrow it. The concurrent write limit sounds alarming until you measure actual write frequency and realize most applications never get close to it. The single-file model sounds fragile until you've spent a week debugging a misconfigured PostgreSQL connection in a staging environment.

Migrate when sustained write contention shows up in your logs, not in anticipation of scale. Most projects never reach that point.
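One way to make contention visible is to time writes and log the slow ones. This is a hypothetical helper, not from the guide; the 0.1-second threshold is an assumption to tune per workload:

```python
import sqlite3
import time

# Hypothetical helper: times each write so lock contention shows up
# in logs before it becomes a migration debate.
def timed_write(conn, sql, params=()):
    start = time.perf_counter()
    with conn:  # commit on success, rollback on exception
        conn.execute(sql, params)
    elapsed = time.perf_counter() - start
    if elapsed > 0.1:  # threshold is an assumption; tune per workload
        print(f"slow write ({elapsed:.3f}s): {sql}")
    return elapsed

conn = sqlite3.connect("inventory.db", timeout=5.0)
conn.execute(
    "CREATE TABLE IF NOT EXISTS products (id INTEGER PRIMARY KEY, name TEXT)"
)
elapsed = timed_write(
    conn, "INSERT INTO products (name) VALUES (?)", ("Widget Pro",)
)
conn.close()
```

If slow-write lines start appearing under normal load rather than in spikes, that is the sustained contention worth acting on.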

The one place SQLite will genuinely surprise you is schema migrations. ALTER TABLE limitations are manageable early and painful late. Keeping your schema-creation SQL in version control from day one and treating every structural change as a migration script rather than a one-off fix is the habit that makes the difference when the schema needs to evolve six months in.

When you do eventually move to PostgreSQL, the SQL will transfer directly. That's rarer than it should be in this ecosystem, and it's worth choosing tools with that property.

Store data efficiently

Decodo's Web Scraping API delivers structured, clean output at scale so your SQLite pipeline gets reliable input from day one.

About the author

Mykolas Juodis

Head of Marketing

Mykolas is a seasoned digital marketing professional with over a decade of experience, currently leading the Marketing department in the web data gathering industry. His extensive background in digital marketing, combined with his deep understanding of proxies and web scraping technologies, allows him to bridge the gap between technical solutions and practical business applications.


Connect with Mykolas via LinkedIn.

All information on Decodo Blog is provided on an as-is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

What's the difference between SQLite and MySQL or PostgreSQL?

SQLite is a serverless, file-based engine that runs inside your application process. MySQL and PostgreSQL run as separate server processes, support multiple simultaneous writers, and are designed for multi-user networked deployments. SQLite trades concurrency and network access for simplicity and zero configuration. For single-user apps, prototypes, mobile apps, and read-heavy workloads with moderate traffic, SQLite is enough. High concurrent write traffic or multi-user access is the signal to move to PostgreSQL.

Is SQLite suitable for production applications?

Yes, but for the right workload. SQLite powers production databases in Android and iOS apps, desktop tools, embedded systems, and many web applications with low-to-moderate traffic. It handles thousands of queries per second on modest hardware as long as reads dominate. The limitations are concurrent writes and network access.

Can multiple processes write to the same SQLite database simultaneously?

SQLite supports multiple concurrent readers, but only one writer at a time. Write operations acquire an exclusive lock on the database file. WAL mode (PRAGMA journal_mode=WAL) improves this by allowing readers and writers to operate simultaneously, but multiple concurrent writers still queue. If your application requires multiple simultaneous writers, PostgreSQL is the better choice.
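Enabling and verifying WAL mode takes one line from Python's sqlite3 (the file name is illustrative; the mode persists in the database file once set):

```python
import sqlite3

conn = sqlite3.connect("inventory.db")
# The PRAGMA returns the mode now in effect -- "wal" on success
mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
print(mode)
conn.close()
```

Checking the return value matters: WAL is refused for in-memory databases and on some network filesystems, and SQLite signals that by returning the old mode instead of raising an error.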


© 2018-2026 decodo.com (formerly smartproxy.com). All Rights Reserved