bigquery

This library implements various methods for working with the Google BigQuery APIs.

Installation

$ pip install --upgrade gcloud-aio-bigquery

Usage

We’re still working on documentation – for now, you can use the smoke test as an example.

Emulators

For testing purposes, you may want to use gcloud-aio-bigquery along with a local emulator. Setting the $BIGQUERY_EMULATOR_HOST environment variable to the address of your emulator should be enough to do the trick.
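A minimal sketch of setting the variable from Python rather than from the shell. The address `localhost:9050` and the helper name `use_bigquery_emulator` are hypothetical placeholders. Setting the variable before the library is imported is an assumption made here to be safe; the text above only states that setting it should be enough.

```python
import os

# Hypothetical address; replace with wherever your emulator listens.
EMULATOR_ADDRESS = 'localhost:9050'


def use_bigquery_emulator(address: str = EMULATOR_ADDRESS) -> str:
    """Point gcloud-aio-bigquery at a local emulator via the environment."""
    os.environ['BIGQUERY_EMULATOR_HOST'] = address
    return os.environ['BIGQUERY_EMULATOR_HOST']


# Assumption: do this before importing or constructing any client,
# in case the variable is read early.
use_bigquery_emulator()
```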

Submodules

Package Contents

Classes

Disposition

Write dispositions for jobs that write to a destination table.

SchemaUpdateOption

Schema update options for jobs that write to a destination table.

SourceFormat

Source data formats for load jobs.

Dataset

Job

Table

Functions

query_response_to_dict(response)

Convert a query response to a dictionary.

Attributes

SCOPES

__version__

class bigquery.Disposition(*args, **kwds)

Bases: enum.Enum

Create a collection of name/value pairs.

Example enumeration:

>>> class Color(Enum):
...     RED = 1
...     BLUE = 2
...     GREEN = 3

Access them by:

  • attribute access:

    >>> Color.RED
    <Color.RED: 1>
    
  • value lookup:

    >>> Color(1)
    <Color.RED: 1>
    
  • name lookup:

    >>> Color['RED']
    <Color.RED: 1>
    

Enumerations can be iterated over, and know how many members they have:

>>> len(Color)
3
>>> list(Color)
[<Color.RED: 1>, <Color.BLUE: 2>, <Color.GREEN: 3>]

Methods can be added to enumerations, and members can have their own attributes – see the documentation for details.

WRITE_APPEND = 'WRITE_APPEND'
WRITE_EMPTY = 'WRITE_EMPTY'
WRITE_TRUNCATE = 'WRITE_TRUNCATE'
class bigquery.SchemaUpdateOption(*args, **kwds)

Bases: enum.Enum

ALLOW_FIELD_ADDITION = 'ALLOW_FIELD_ADDITION'
ALLOW_FIELD_RELAXATION = 'ALLOW_FIELD_RELAXATION'
bigquery.SCOPES = ['https://www.googleapis.com/auth/bigquery.insertdata', 'https://www.googleapis.com/auth/bigquery']
class bigquery.SourceFormat(*args, **kwds)

Bases: enum.Enum

AVRO = 'AVRO'
CSV = 'CSV'
DATASTORE_BACKUP = 'DATASTORE_BACKUP'
NEWLINE_DELIMITED_JSON = 'NEWLINE_DELIMITED_JSON'
ORC = 'ORC'
PARQUET = 'PARQUET'
class bigquery.Dataset(dataset_name=None, project=None, service_file=None, session=None, token=None, api_root=None)

Bases: bigquery.bigquery.BigqueryBase

Parameters:
  • dataset_name (Optional[str]) –

  • project (Optional[str]) –

  • service_file (Optional[Union[str, IO[AnyStr]]]) –

  • session (Optional[requests.Session]) –

  • token (Optional[gcloud.aio.auth.Token]) –

  • api_root (Optional[str]) –

async list_tables(session=None, timeout=60, params=None)

List tables in a dataset.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

  • params (Optional[Dict[str, Any]]) –

Return type:

Dict[str, Any]
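As a sketch of working with the returned dictionary, the helper below pulls the table IDs out of one page of results. It assumes the response follows the BigQuery REST shape for table listings (a `tables` list whose entries carry a `tableReference`), and the helper name `list_table_ids` is hypothetical. The `dataset` argument is anything that behaves like `bigquery.Dataset`.

```python
from typing import Any, List


async def list_table_ids(dataset: Any) -> List[str]:
    """Return the table IDs found in one page of a list_tables() response."""
    response = await dataset.list_tables()
    # Assumed REST shape: {'tables': [{'tableReference': {'tableId': ...}}]}
    # An empty dataset may omit the 'tables' key entirely.
    return [
        entry['tableReference']['tableId']
        for entry in response.get('tables', [])
    ]
```

Further pages, if any, would need the page token passed back through `params`; that is left out here to keep the sketch short.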

async list_datasets(session=None, timeout=60, params=None)

List datasets in the current project.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

  • params (Optional[Dict[str, Any]]) –

Return type:

Dict[str, Any]

async get(session=None, timeout=60, params=None)

Get a specific dataset in the current project.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

  • params (Optional[Dict[str, Any]]) –

Return type:

Dict[str, Any]

async insert(dataset, session=None, timeout=60)

Create a dataset in the current project.

Parameters:
  • dataset (Dict[str, Any]) –

  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]
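The `dataset` argument is a plain dictionary. A minimal sketch of building one, assuming it mirrors the BigQuery REST dataset resource (`datasetReference` plus an optional `location`); the helper names and the default location are illustrative, not part of this library.

```python
from typing import Any, Dict


def make_dataset_resource(project: str, dataset_id: str,
                          location: str = 'US') -> Dict[str, Any]:
    """Build a dataset resource body (assumed REST shape)."""
    return {
        'datasetReference': {
            'projectId': project,
            'datasetId': dataset_id,
        },
        'location': location,
    }


async def create_dataset(dataset_client: Any, project: str,
                         dataset_id: str) -> Dict[str, Any]:
    """Create a dataset through an object that behaves like bigquery.Dataset."""
    return await dataset_client.insert(
        make_dataset_resource(project, dataset_id))
```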

async delete(dataset_name=None, session=None, timeout=60)

Delete a dataset in the current project.

Parameters:
  • dataset_name (Optional[str]) –

  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]

class bigquery.Job(job_id=None, project=None, service_file=None, session=None, token=None, api_root=None, location=None)

Bases: bigquery.bigquery.BigqueryBase

Parameters:
  • job_id (Optional[str]) –

  • project (Optional[str]) –

  • service_file (Optional[Union[str, IO[AnyStr]]]) –

  • session (Optional[requests.Session]) –

  • token (Optional[gcloud.aio.auth.Token]) –

  • api_root (Optional[str]) –

  • location (Optional[str]) –

static _make_query_body(query, write_disposition, use_query_cache, dry_run, use_legacy_sql, destination_table)
Parameters:
  • query (str) –

  • write_disposition (bigquery.bigquery.Disposition) –

  • use_query_cache (bool) –

  • dry_run (bool) –

  • use_legacy_sql (bool) –

  • destination_table (Optional[Any]) –

Return type:

Dict[str, Any]

_config_params(params=None)
Parameters:

params (Optional[Dict[str, Any]]) –

Return type:

Dict[str, Any]

async get_job(session=None, timeout=60)

Get the specified job resource by job ID.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]

async get_query_results(session=None, timeout=60, params=None)

Get the query results of the specified job by job ID.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

  • params (Optional[Dict[str, Any]]) –

Return type:

Dict[str, Any]
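One possible way to collect every row of a finished query is to follow page tokens through `params`. This sketch assumes the response uses the REST field names `jobComplete`, `rows` and `pageToken`, and that `maxResults` and `pageToken` are accepted as query parameters. The helper name `fetch_all_rows` and its polling interval are hypothetical.

```python
import asyncio
from typing import Any, Dict, List


async def fetch_all_rows(job: Any, page_size: int = 1000,
                         poll_seconds: float = 1.0) -> List[Dict[str, Any]]:
    """Collect raw rows from every page of a job's query results."""
    rows: List[Dict[str, Any]] = []
    params: Dict[str, Any] = {'maxResults': page_size}
    while True:
        page = await job.get_query_results(params=params)
        if not page.get('jobComplete', True):
            # The job is still running: wait, then ask again.
            await asyncio.sleep(poll_seconds)
            continue
        rows.extend(page.get('rows', []))
        token = page.get('pageToken')
        if not token:
            return rows
        # Request the next page with the token from the previous one.
        params = {'maxResults': page_size, 'pageToken': token}
```

The rows come back in the raw API format; `query_response_to_dict` can be applied to each page if dictionaries keyed by field name are preferred.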

async cancel(session=None, timeout=60)

Cancel the specified job by job ID.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]

async query(query_request, session=None, timeout=60)

Runs a query synchronously and returns the query results if the query completes within the specified timeout.

Parameters:
  • query_request (Dict[str, Any]) –

  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]
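A sketch of what a `query_request` might look like, assuming it is passed through as the body of the REST `jobs.query` call (`query`, `useLegacySql`, `timeoutMs`). The helper names are hypothetical. Note that `timeoutMs` in the body is the server-side wait for the query, which is presumably separate from the `timeout` argument of the method itself.

```python
from typing import Any, Dict


def make_query_request(sql: str, timeout_ms: int = 10000) -> Dict[str, Any]:
    """Build a query request body (assumed REST field names)."""
    return {
        'query': sql,
        'useLegacySql': False,
        'timeoutMs': timeout_ms,
    }


async def run_query(job: Any, sql: str) -> Dict[str, Any]:
    """Run a query through an object that behaves like bigquery.Job."""
    response = await job.query(make_query_request(sql))
    if not response.get('jobComplete', False):
        # The query outlived timeoutMs; its rows must be fetched later.
        job_id = response.get('jobReference', {}).get('jobId')
        raise RuntimeError(f'query still running as job {job_id}')
    return response
```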

async insert(job, session=None, timeout=60)

Insert a new asynchronous job.

Parameters:
  • job (Dict[str, Any]) –

  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]
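The `job` argument is again a plain dictionary. The sketch below builds a query job that writes to a destination table, assuming the body follows the REST job resource (`configuration.query`). The helper names are hypothetical; the `WRITE_TRUNCATE` string matches the value of `Disposition.WRITE_TRUNCATE` listed above.

```python
from typing import Any, Dict, Optional


def make_query_job(sql: str, project: str, dataset_id: str,
                   table_id: str) -> Dict[str, Any]:
    """Build a query job resource (assumed REST shape)."""
    return {
        'configuration': {
            'query': {
                'query': sql,
                'useLegacySql': False,
                'destinationTable': {
                    'projectId': project,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
                'writeDisposition': 'WRITE_TRUNCATE',
            },
        },
    }


async def submit_job(job_client: Any, job: Dict[str, Any]) -> Optional[str]:
    """Insert the job and return the job ID reported by the API, if any."""
    response = await job_client.insert(job)
    return response.get('jobReference', {}).get('jobId')
```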

async insert_via_query(query, session=None, write_disposition=Disposition.WRITE_EMPTY, timeout=60, use_query_cache=True, dry_run=False, use_legacy_sql=True, destination_table=None)

Create a table from the result of a query.

Parameters:
  • query (str) –

  • session (Optional[requests.Session]) –

  • write_disposition (bigquery.bigquery.Disposition) –

  • timeout (int) –

  • use_query_cache (bool) –

  • dry_run (bool) –

  • use_legacy_sql (bool) –

  • destination_table (Optional[Any]) –

Return type:

Dict[str, Any]

async result(session=None)
Parameters:

session (Optional[requests.Session]) –

Return type:

Dict[str, Any]

async delete(session=None, job_id=None, timeout=60)

Delete the specified job by job ID.

Parameters:
  • session (Optional[requests.Session]) –

  • job_id (Optional[str]) –

  • timeout (int) –

Return type:

Dict[str, Any]

class bigquery.Table(dataset_name, table_name, project=None, service_file=None, session=None, token=None, api_root=None)

Bases: bigquery.bigquery.BigqueryBase

Parameters:
  • dataset_name (str) –

  • table_name (str) –

  • project (Optional[str]) –

  • service_file (Optional[Union[str, IO[AnyStr]]]) –

  • session (Optional[requests.Session]) –

  • token (Optional[gcloud.aio.auth.Token]) –

  • api_root (Optional[str]) –

static _mk_unique_insert_id(row)
Parameters:

row (Dict[str, Any]) –

Return type:

str

_make_copy_body(source_project, destination_project, destination_dataset, destination_table)
Parameters:
  • source_project (str) –

  • destination_project (str) –

  • destination_dataset (str) –

  • destination_table (str) –

Return type:

Dict[str, Any]

static _make_insert_body(rows, *, skip_invalid, ignore_unknown, template_suffix, insert_id_fn)
Parameters:
  • rows (List[Dict[str, Any]]) –

  • skip_invalid (bool) –

  • ignore_unknown (bool) –

  • template_suffix (Optional[str]) –

  • insert_id_fn (Callable[[Dict[str, Any]], str]) –

Return type:

Dict[str, Any]

_make_load_body(source_uris, project, autodetect, source_format, write_disposition, ignore_unknown_values, schema_update_options)
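Parameters:
  • source_uris (List[str]) –

  • project (str) –

  • autodetect (bool) –

  • source_format (bigquery.bigquery.SourceFormat) –

  • write_disposition (bigquery.bigquery.Disposition) –

  • ignore_unknown_values (bool) –

  • schema_update_options (List[bigquery.bigquery.SchemaUpdateOption]) –

Return type: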

Dict[str, Any]

_make_query_body(query, project, write_disposition, use_query_cache, dry_run)
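Parameters:
  • query (str) –

  • project (str) –

  • write_disposition (bigquery.bigquery.Disposition) –

  • use_query_cache (bool) –

  • dry_run (bool) –

Return type: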

Dict[str, Any]

async create(table, session=None, timeout=60)

Create the table specified by tableId in the dataset.

Parameters:
  • table (Dict[str, Any]) –

  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]

async patch(table, session=None, timeout=60)

Patch an existing table specified by tableId in the dataset.

Parameters:
  • table (Dict[str, Any]) –

  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]

async delete(session=None, timeout=60)

Deletes the table specified by tableId from the dataset.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]

async get(session=None, timeout=60)

Gets the specified table resource by table ID.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

Dict[str, Any]

async insert(rows, skip_invalid=False, ignore_unknown=True, session=None, template_suffix=None, timeout=60, *, insert_id_fn=None)

Streams data into BigQuery.

By default, each row is assigned a unique insertId. This can be customized by supplying an insert_id_fn which takes a row and returns an insertId.

In cases where at least one row has successfully been inserted and at least one row has failed to be inserted, the Google API will return a 2xx (successful) response along with an insertErrors key in the response JSON containing details on the failing rows.

Parameters:
  • rows (List[Dict[str, Any]]) –

  • skip_invalid (bool) –

  • ignore_unknown (bool) –

  • session (Optional[requests.Session]) –

  • template_suffix (Optional[str]) –

  • timeout (int) –

  • insert_id_fn (Optional[Callable[[Dict[str, Any]], str]]) –

Return type:

Dict[str, Any]
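The two points above, a custom `insert_id_fn` and the `insertErrors` key in a 2xx response, can be sketched as follows. The helper names are hypothetical. Hashing the row content to get a repeatable insertId is one possible choice, not the library's default, and the `index` field inside each `insertErrors` entry is assumed from the REST API response format.

```python
import hashlib
import json
from typing import Any, Dict, List


def stable_insert_id(row: Dict[str, Any]) -> str:
    """Derive an insertId from the row's content.

    Identical rows get identical IDs, so a retried insert reuses them.
    """
    payload = json.dumps(row, sort_keys=True, default=str).encode('utf-8')
    return hashlib.sha256(payload).hexdigest()


async def insert_rows(table: Any,
                      rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Insert rows and return the ones the API reported as failed."""
    response = await table.insert(rows, insert_id_fn=stable_insert_id)
    # A 2xx response can still carry per-row failures.
    failed = {error['index'] for error in response.get('insertErrors', [])}
    return [row for index, row in enumerate(rows) if index in failed]
```

An empty return value means no row was rejected; otherwise the caller can log or retry the returned rows.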

async insert_via_copy(destination_project, destination_dataset, destination_table, session=None, timeout=60)

Copy this BigQuery table to another BigQuery table.

Parameters:
  • destination_project (str) –

  • destination_dataset (str) –

  • destination_table (str) –

  • session (Optional[requests.Session]) –

  • timeout (int) –

Return type:

bigquery.job.Job

async insert_via_load(source_uris, session=None, autodetect=False, source_format=SourceFormat.CSV, write_disposition=Disposition.WRITE_TRUNCATE, timeout=60, ignore_unknown_values=False, schema_update_options=None)

Loads entities from storage to BigQuery.

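Parameters:
  • source_uris (List[str]) –

  • session (Optional[requests.Session]) –

  • autodetect (bool) –

  • source_format (bigquery.bigquery.SourceFormat) –

  • write_disposition (bigquery.bigquery.Disposition) –

  • timeout (int) –

  • ignore_unknown_values (bool) –

  • schema_update_options (Optional[List[bigquery.bigquery.SchemaUpdateOption]]) –

Return type: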

bigquery.job.Job
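A minimal sketch of starting a load and checking on it afterwards. It relies on the defaults shown in the signature (CSV source, `WRITE_TRUNCATE`) and turns on `autodetect`. The helper name is hypothetical, and the `status.state` field is assumed from the REST job resource returned by `get_job`.

```python
from typing import Any, List


async def load_from_storage(table: Any, uris: List[str]) -> str:
    """Start a load job and return the state reported right afterwards."""
    # Defaults per the signature: CSV input, existing data is truncated.
    job = await table.insert_via_load(uris, autodetect=True)
    details = await job.get_job()
    # Assumed REST shape: {'status': {'state': 'PENDING'|'RUNNING'|'DONE'}}
    return details.get('status', {}).get('state', 'UNKNOWN')
```

The state is read once, so a caller that needs the load to finish would repeat the `get_job` call until it reports `DONE`.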

async insert_via_query(query, session=None, write_disposition=Disposition.WRITE_EMPTY, timeout=60, use_query_cache=True, dry_run=False)

Create a table from the result of a query.

Parameters:
  • query (str) –

  • session (Optional[requests.Session]) –

  • write_disposition (bigquery.bigquery.Disposition) –

  • timeout (int) –

  • use_query_cache (bool) –

  • dry_run (bool) –

Return type:

bigquery.job.Job

async list_tabledata(session=None, timeout=60, params=None)

List the content of a table in rows.

Parameters:
  • session (Optional[requests.Session]) –

  • timeout (int) –

  • params (Optional[Dict[str, Any]]) –

Return type:

Dict[str, Any]
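The returned dictionary holds rows in the raw API layout. As a sketch, assuming each row is `{'f': [{'v': ...}, ...]}` as in the REST `tabledata.list` response and that the table has no nested or repeated fields, the cell values can be flattened like this (the helper name and the `maxResults` parameter are assumptions):

```python
from typing import Any, List


async def read_rows_as_lists(table: Any,
                             max_results: int = 100) -> List[List[Any]]:
    """Return one page of table data as plain lists of cell values."""
    response = await table.list_tabledata(params={'maxResults': max_results})
    # Each row is {'f': [cell, ...]} and each cell is {'v': value}.
    return [
        [cell.get('v') for cell in row.get('f', [])]
        for row in response.get('rows', [])
    ]
```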

bigquery.query_response_to_dict(response)

Convert a query response to a dictionary.

API responses for job queries are packed into a difficult-to-use format. This method deserializes a response into a list of rows, with each row being a dictionary mapping field names to the row’s values.

This method also handles converting the values according to the schema defined in the response (e.g. into built-in Python types).

Parameters:

response (Dict[str, Any]) –

Return type:

List[Dict[str, Any]]
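To illustrate the packed format described above, here is a simplified stand-in, not the library's implementation. It assumes the REST layout (`schema.fields` for names, `rows[].f[].v` for values), handles flat schemas only, and skips the type conversion that `query_response_to_dict` performs. The function name `rows_to_dicts` and the sample data are made up for the example.

```python
from typing import Any, Dict, List


def rows_to_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each cell with its field name; values stay as returned."""
    names = [field['name'] for field in response['schema']['fields']]
    return [
        {name: cell['v'] for name, cell in zip(names, row['f'])}
        for row in response.get('rows', [])
    ]


sample_response = {
    'schema': {'fields': [
        {'name': 'name', 'type': 'STRING'},
        {'name': 'age', 'type': 'INTEGER'},
    ]},
    'rows': [{'f': [{'v': 'alice'}, {'v': '30'}]}],
}

# The raw API value for the INTEGER column is still the string '30' here.
print(rows_to_dicts(sample_response))
```

In this stand-in the integer stays a string, which is the gap the schema-aware conversion in `query_response_to_dict` is described as closing.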

bigquery.__version__