Loading Data into Google BigQuery


According to Google,

  • BigQuery is Google’s fully managed, NoOps, low-cost analytics database.
  • With BigQuery, users can query terabytes of data without having any infrastructure to manage or needing a database administrator.
  • BigQuery uses SQL and takes advantage of the pay-as-you-go model.
  • BigQuery lets you focus on analyzing data to find meaningful insights.

In this post, we’re going to look at how to load data into BigQuery.

Creating a New Dataset in BigQuery

  • Navigate to the BigQuery service (under the Big Data section).

  • Click on the name of your project, then click Create dataset.

  • Set the Dataset ID to my_dataset. Use the default values for the other fields.

  • Click Create dataset.

  • A dataset called my_dataset should be created under your project name.
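
If you prefer the command line, the same dataset can also be created from the Cloud Shell with the bq tool. Here's a minimal sketch; it assumes your default project is already set, so adjust it to your setup.

bq mk --dataset my_dataset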


Loading Local Data into a BigQuery Table

Next, we’re going to load a local CSV file into a BigQuery table.

  • Select the my_dataset dataset and click Create table.

  • Use the following information as the table options.

- Source:

Create table from: Upload
Choose File: select the local CSV file
File format: CSV

- Table name: my_table

- Schema:

Check "Auto Detect (tip: Not seeing the checkbox? Ensure the file format is CSV and not Avro)"

- Advanced options:

Leave at default values

  • Click Create table.

The local CSV data has been loaded into the BigQuery table (my_table).
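
The same upload can also be done from the Cloud Shell with bq load instead of the web UI. Here's a minimal sketch; data.csv is a hypothetical file name, and the command assumes the file's layout is something schema autodetection can work with.

bq load \
--source_format=CSV \
--autodetect \
my_dataset.my_table \
./data.csv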


Running SQL Queries

Since the table is already populated with the data from the local CSV file, we can now execute some queries.

Let’s try the simplest one: select the first 5 rows from my_table.

#standardSQL
SELECT
  *
FROM
  my_dataset.my_table
LIMIT 5
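
The same query can also be run from the Cloud Shell with bq query. A small sketch; the --nouse_legacy_sql flag switches bq to standard SQL, matching the #standardSQL prefix above.

bq query --nouse_legacy_sql \
'SELECT * FROM my_dataset.my_table LIMIT 5'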

Loading Data from Google Cloud Storage into BigQuery

Previously, we loaded data from a CSV file stored locally.

In this subsection, we’re going to perform the same action, but this time the CSV file is located in Google Cloud Storage (GCS).

  • In the Cloud Shell, execute the following command.
bq load \
--source_format=CSV \
--autodetect \
--noreplace  \
my_dataset.my_table \
gs://<BUCKET_NAME>/<path_to_csv_file>
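
A couple of notes on the flags: --autodetect lets BigQuery infer the schema from the CSV, and --noreplace appends the new rows to my_table instead of overwriting it. To confirm the load succeeded, one option (a sketch, assuming the table from the previous section already exists) is to check the row count:

bq query --nouse_legacy_sql \
'SELECT COUNT(*) AS row_count FROM my_dataset.my_table'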

Creating Tables from Other Tables in BigQuery

In this scenario, we’d like to extract a certain number of rows from a BigQuery table and store those rows in a new BigQuery table.

For example, we’d like to create a table called my_even_numbers_table from my_table. For simplicity, let’s assume that my_table consists of a single integer column called num. We want to select all the even numbers from my_table and store them in my_even_numbers_table.

In the Query Editor, execute the following command.

#standardSQL
CREATE TABLE
  my_dataset.my_even_numbers_table AS
SELECT
  *
FROM
  my_dataset.my_table
WHERE
  MOD(num, 2) = 0
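
The same result can also be obtained from the Cloud Shell by pointing bq query at a destination table. A minimal sketch; by default it should fail if my_even_numbers_table already exists (pass --replace or --append_table to change that), which matches the behaviour of CREATE TABLE above.

bq query --nouse_legacy_sql \
--destination_table=my_dataset.my_even_numbers_table \
'SELECT * FROM my_dataset.my_table WHERE MOD(num, 2) = 0'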