bigquery¶
This library implements various methods for working with the Google BigQuery APIs.
Installation¶
$ pip install --upgrade gcloud-aio-bigquery
Usage¶
We’re still working on documentation – for now, you can use the smoke test as an example.
Emulators¶
For testing purposes, you may want to use gcloud-aio-bigquery along with a local emulator. Setting the $BIGQUERY_EMULATOR_HOST environment variable to the address of your emulator should be enough to do the trick.
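For example, if your emulator listens on port 9050 (the host and port here are illustrative; use whatever address your emulator actually binds):

```shell
# Point the client at a locally running BigQuery emulator.
export BIGQUERY_EMULATOR_HOST=localhost:9050
```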
Submodules¶
Package Contents¶
Classes¶
Disposition — Create a collection of name/value pairs.
SchemaUpdateOption — Create a collection of name/value pairs.
SourceFormat — Create a collection of name/value pairs.
Dataset
Job
Table
Functions¶
query_response_to_dict — Convert a query response to a dictionary.
Attributes¶
- class bigquery.Disposition(*args, **kwds)¶
Bases:
enum.Enum
Create a collection of name/value pairs.
Example enumeration:

>>> class Color(Enum):
...     RED = 1
...     BLUE = 2
...     GREEN = 3

Access them by:

attribute access:

>>> Color.RED
<Color.RED: 1>

value lookup:

>>> Color(1)
<Color.RED: 1>

name lookup:

>>> Color['RED']
<Color.RED: 1>

Enumerations can be iterated over, and know how many members they have:

>>> len(Color)
3
>>> list(Color)
[<Color.RED: 1>, <Color.BLUE: 2>, <Color.GREEN: 3>]

Methods can be added to enumerations, and members can have their own attributes – see the documentation for details.
- WRITE_APPEND = 'WRITE_APPEND'¶
- WRITE_EMPTY = 'WRITE_EMPTY'¶
- WRITE_TRUNCATE = 'WRITE_TRUNCATE'¶
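The members above are the string values BigQuery's jobs API expects for a write disposition. As a minimal sketch of how such an enum behaves (this mirrors the members listed above rather than importing the library itself):

```python
from enum import Enum

class Disposition(Enum):
    """Illustrative mirror of bigquery.Disposition; not the library's class."""
    WRITE_APPEND = 'WRITE_APPEND'      # append rows to the existing table
    WRITE_EMPTY = 'WRITE_EMPTY'        # fail if the table already contains data
    WRITE_TRUNCATE = 'WRITE_TRUNCATE'  # overwrite the table's contents

# Members can be reached by value or by name, as the enum docs describe.
assert Disposition('WRITE_APPEND') is Disposition.WRITE_APPEND
assert Disposition['WRITE_EMPTY'].value == 'WRITE_EMPTY'
```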
- class bigquery.SchemaUpdateOption(*args, **kwds)¶
Bases:
enum.Enum
Create a collection of name/value pairs.
- ALLOW_FIELD_ADDITION = 'ALLOW_FIELD_ADDITION'¶
- ALLOW_FIELD_RELAXATION = 'ALLOW_FIELD_RELAXATION'¶
- bigquery.SCOPES = ['https://www.googleapis.com/auth/bigquery.insertdata', 'https://www.googleapis.com/auth/bigquery']¶
- class bigquery.SourceFormat(*args, **kwds)¶
Bases:
enum.Enum
Create a collection of name/value pairs.
- AVRO = 'AVRO'¶
- CSV = 'CSV'¶
- DATASTORE_BACKUP = 'DATASTORE_BACKUP'¶
- NEWLINE_DELIMITED_JSON = 'NEWLINE_DELIMITED_JSON'¶
- ORC = 'ORC'¶
- PARQUET = 'PARQUET'¶
- class bigquery.Dataset(dataset_name=None, project=None, service_file=None, session=None, token=None, api_root=None)¶
Bases:
bigquery.bigquery.BigqueryBase
- Parameters:
dataset_name (Optional[str]) –
project (Optional[str]) –
service_file (Optional[Union[str, IO[AnyStr]]]) –
session (Optional[requests.Session]) –
token (Optional[gcloud.aio.auth.Token]) –
api_root (Optional[str]) –
- async list_tables(session=None, timeout=60, params=None)¶
List tables in a dataset.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
params (Optional[Dict[str, Any]]) –
- Return type:
Dict[str, Any]
- async list_datasets(session=None, timeout=60, params=None)¶
List datasets in the current project.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
params (Optional[Dict[str, Any]]) –
- Return type:
Dict[str, Any]
- async get(session=None, timeout=60, params=None)¶
Get a specific dataset in the current project.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
params (Optional[Dict[str, Any]]) –
- Return type:
Dict[str, Any]
- async insert(dataset, session=None, timeout=60)¶
Create a dataset in the current project.
- Parameters:
dataset (Dict[str, Any]) –
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async delete(dataset_name=None, session=None, timeout=60)¶
Delete a dataset in the current project.
- Parameters:
dataset_name (Optional[str]) –
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- class bigquery.Job(job_id=None, project=None, service_file=None, session=None, token=None, api_root=None, location=None)¶
Bases:
bigquery.bigquery.BigqueryBase
- Parameters:
job_id (Optional[str]) –
project (Optional[str]) –
service_file (Optional[Union[str, IO[AnyStr]]]) –
session (Optional[requests.Session]) –
token (Optional[gcloud.aio.auth.Token]) –
api_root (Optional[str]) –
location (Optional[str]) –
- static _make_query_body(query, write_disposition, use_query_cache, dry_run, use_legacy_sql, destination_table)¶
- Parameters:
query (str) –
write_disposition (bigquery.bigquery.Disposition) –
use_query_cache (bool) –
dry_run (bool) –
use_legacy_sql (bool) –
destination_table (Optional[Any]) –
- Return type:
Dict[str, Any]
- _config_params(params=None)¶
- Parameters:
params (Optional[Dict[str, Any]]) –
- Return type:
Dict[str, Any]
- async get_job(session=None, timeout=60)¶
Get the specified job resource by job ID.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async get_query_results(session=None, timeout=60, params=None)¶
Get the specified jobQueryResults by job ID.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
params (Optional[Dict[str, Any]]) –
- Return type:
Dict[str, Any]
- async cancel(session=None, timeout=60)¶
Cancel the specified job by job ID.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async query(query_request, session=None, timeout=60)¶
Run a query synchronously and return the query results if the query completes within the specified timeout.
- Parameters:
query_request (Dict[str, Any]) –
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async insert(job, session=None, timeout=60)¶
Insert a new asynchronous job.
- Parameters:
job (Dict[str, Any]) –
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async insert_via_query(query, session=None, write_disposition=Disposition.WRITE_EMPTY, timeout=60, use_query_cache=True, dry_run=False, use_legacy_sql=True, destination_table=None)¶
Create a table as the result of a query.
- Parameters:
query (str) –
session (Optional[requests.Session]) –
write_disposition (bigquery.bigquery.Disposition) –
timeout (int) –
use_query_cache (bool) –
dry_run (bool) –
use_legacy_sql (bool) –
destination_table (Optional[Any]) –
- Return type:
Dict[str, Any]
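For orientation, the request body such a query job sends to jobs.insert has roughly the shape below. The field layout follows the public BigQuery jobs API and is an assumption about what _make_query_body produces, not a copy of the library's code:

```python
def make_query_body(query, write_disposition='WRITE_EMPTY',
                    use_query_cache=True, dry_run=False,
                    use_legacy_sql=True):
    # Sketch of a jobs.insert body for a query job, per the public
    # BigQuery jobs API (configuration.query / configuration.dryRun).
    return {
        'configuration': {
            'query': {
                'query': query,
                'writeDisposition': write_disposition,
                'useQueryCache': use_query_cache,
                'useLegacySql': use_legacy_sql,
            },
            'dryRun': dry_run,
        },
    }

body = make_query_body('SELECT 1', dry_run=True)
```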
- async result(session=None)¶
- Parameters:
session (Optional[requests.Session]) –
- Return type:
Dict[str, Any]
- async delete(session=None, job_id=None, timeout=60)¶
Delete the specified job by job ID.
- Parameters:
session (Optional[requests.Session]) –
job_id (Optional[str]) –
timeout (int) –
- Return type:
Dict[str, Any]
- class bigquery.Table(dataset_name, table_name, project=None, service_file=None, session=None, token=None, api_root=None)¶
Bases:
bigquery.bigquery.BigqueryBase
- Parameters:
dataset_name (str) –
table_name (str) –
project (Optional[str]) –
service_file (Optional[Union[str, IO[AnyStr]]]) –
session (Optional[requests.Session]) –
token (Optional[gcloud.aio.auth.Token]) –
api_root (Optional[str]) –
- static _mk_unique_insert_id(row)¶
- Parameters:
row (Dict[str, Any]) –
- Return type:
str
- _make_copy_body(source_project, destination_project, destination_dataset, destination_table)¶
- Parameters:
source_project (str) –
destination_project (str) –
destination_dataset (str) –
destination_table (str) –
- Return type:
Dict[str, Any]
- static _make_insert_body(rows, *, skip_invalid, ignore_unknown, template_suffix, insert_id_fn)¶
- Parameters:
rows (List[Dict[str, Any]]) –
skip_invalid (bool) –
ignore_unknown (bool) –
template_suffix (Optional[str]) –
insert_id_fn (Callable[[Dict[str, Any]], str]) –
- Return type:
Dict[str, Any]
- _make_load_body(source_uris, project, autodetect, source_format, write_disposition, ignore_unknown_values, schema_update_options)¶
- Parameters:
source_uris (List[str]) –
project (str) –
autodetect (bool) –
source_format (bigquery.bigquery.SourceFormat) –
write_disposition (bigquery.bigquery.Disposition) –
ignore_unknown_values (bool) –
schema_update_options (List[bigquery.bigquery.SchemaUpdateOption]) –
- Return type:
Dict[str, Any]
- _make_query_body(query, project, write_disposition, use_query_cache, dry_run)¶
- Parameters:
query (str) –
project (str) –
write_disposition (bigquery.bigquery.Disposition) –
use_query_cache (bool) –
dry_run (bool) –
- Return type:
Dict[str, Any]
- async create(table, session=None, timeout=60)¶
Create the table specified by tableId in the dataset.
- Parameters:
table (Dict[str, Any]) –
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async patch(table, session=None, timeout=60)¶
Patch an existing table specified by tableId in the dataset.
- Parameters:
table (Dict[str, Any]) –
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async delete(session=None, timeout=60)¶
Deletes the table specified by tableId from the dataset.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async get(session=None, timeout=60)¶
Gets the specified table resource by table ID.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async insert(rows, skip_invalid=False, ignore_unknown=True, session=None, template_suffix=None, timeout=60, *, insert_id_fn=None)¶
Streams data into BigQuery.
By default, each row is assigned a unique insertId. This can be customized by supplying an insert_id_fn which takes a row and returns an insertId.
In cases where at least one row has successfully been inserted and at least one row has failed to be inserted, the Google API will return a 2xx (successful) response along with an insertErrors key in the response JSON containing details on the failing rows.
- Parameters:
rows (List[Dict[str, Any]]) –
skip_invalid (bool) –
ignore_unknown (bool) –
session (Optional[requests.Session]) –
template_suffix (Optional[str]) –
timeout (int) –
insert_id_fn (Optional[Callable[[Dict[str, Any]], str]]) –
- Return type:
Dict[str, Any]
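An insert_id_fn is simply a callable from a row dict to a string. The sketch below derives a deterministic insertId by hashing the row's contents, so retries of the same row deduplicate; note the hashing scheme is our own illustrative choice, not the library's default (which generates a unique id per row), and the request-body shape follows the public tabledata.insertAll API:

```python
import hashlib
import json

def insert_id_fn(row):
    # Deterministic insertId: hash of the row's canonical JSON form.
    payload = json.dumps(row, sort_keys=True).encode('utf-8')
    return hashlib.sha256(payload).hexdigest()

rows = [{'name': 'alice', 'score': 10}]

# tabledata.insertAll pairs each row with its insertId in the request body.
body = {
    'rows': [{'insertId': insert_id_fn(row), 'json': row} for row in rows],
    'skipInvalidRows': False,
    'ignoreUnknownValues': True,
}
```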
- async insert_via_copy(destination_project, destination_dataset, destination_table, session=None, timeout=60)¶
Copy a BigQuery table to another BigQuery table.
- Parameters:
destination_project (str) –
destination_dataset (str) –
destination_table (str) –
session (Optional[requests.Session]) –
timeout (int) –
- Return type:
Dict[str, Any]
- async insert_via_load(source_uris, session=None, autodetect=False, source_format=SourceFormat.CSV, write_disposition=Disposition.WRITE_TRUNCATE, timeout=60, ignore_unknown_values=False, schema_update_options=None)¶
Loads entities from storage into BigQuery.
- Parameters:
source_uris (List[str]) –
session (Optional[requests.Session]) –
autodetect (bool) –
source_format (bigquery.bigquery.SourceFormat) –
write_disposition (bigquery.bigquery.Disposition) –
timeout (int) –
ignore_unknown_values (bool) –
schema_update_options (Optional[List[bigquery.bigquery.SchemaUpdateOption]]) –
- Return type:
Dict[str, Any]
- async insert_via_query(query, session=None, write_disposition=Disposition.WRITE_EMPTY, timeout=60, use_query_cache=True, dry_run=False)¶
Create a table as the result of a query.
- Parameters:
query (str) –
session (Optional[requests.Session]) –
write_disposition (bigquery.bigquery.Disposition) –
timeout (int) –
use_query_cache (bool) –
dry_run (bool) –
- Return type:
Dict[str, Any]
- async list_tabledata(session=None, timeout=60, params=None)¶
List the content of a table in rows.
- Parameters:
session (Optional[requests.Session]) –
timeout (int) –
params (Optional[Dict[str, Any]]) –
- Return type:
Dict[str, Any]
- bigquery.query_response_to_dict(response)¶
Convert a query response to a dictionary.
API responses for job queries are packed into a difficult-to-use format. This method deserializes a response into a list of rows, with each row being a dictionary mapping field names to that row's values.
This method also handles converting the values according to the schema defined in the response (e.g. into builtin Python types).
- Parameters:
response (Dict[str, Any]) –
- Return type:
List[Dict[str, Any]]
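The packed format in question nests each row's cells under 'f'/'v' keys alongside a schema, per the BigQuery REST API. A stdlib-only sketch of the conversion is below; it is deliberately simplified (flat rows only, and only a few scalar types coerced), unlike the real helper:

```python
def response_to_rows(response):
    # Map each cell back to its field name and coerce a few scalar types
    # according to the schema. Simplified: no nested/repeated fields.
    fields = response['schema']['fields']
    converters = {'INTEGER': int, 'FLOAT': float,
                  'BOOLEAN': lambda v: v == 'true'}
    rows = []
    for row in response.get('rows', []):
        record = {}
        for field, cell in zip(fields, row['f']):
            convert = converters.get(field['type'], lambda v: v)
            value = cell['v']
            record[field['name']] = convert(value) if value is not None else None
        rows.append(record)
    return rows

# A minimal example response in the packed 'f'/'v' shape.
sample = {
    'schema': {'fields': [{'name': 'name', 'type': 'STRING'},
                          {'name': 'score', 'type': 'INTEGER'}]},
    'rows': [{'f': [{'v': 'alice'}, {'v': '10'}]}],
}
```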
- bigquery.__version__¶