IdaDataBase

Connect to Netezza

class nzpyida.base.IdaDataBase(dsn, uid='', pwd='', autocommit=True, verbose=False)[source]

An IdaDataBase instance represents a reference to a remote Netezza Warehouse database. This is an abstraction layer for the remote connection. The IdaDataBase interface provides several functions that enable basic database administration in pythonic syntax.

You can use either ODBC or JDBC or NZPY to connect to the database. The default connection type is ODBC, which is the standard connection type for Windows users. To establish an ODBC connection, download an IBM Netezza driver and set up your ODBC connection by specifying your connection protocol, port, and hostname. An ODBC connection on Linux or Mac might require more settings. For more information about how to establish an ODBC connection, see the pypyodbc documentation.

To connect with JDBC, install the optional external package jaydebeapi, download the ibm jdbc driver, and save it in your local nzpyida folder. If you put the jdbc driver in the CLASSPATH variable or the folder that contains it, it will work too. A C++ compiler adapted to the current python version, operating system, and architecture may also be required to install jaydebeapi.

To connect with NZPY, specify the required parameters in dict format, there is no need to install any drivers.

The instantiation of an IdaDataBase object is a mandatory step before creating IdaDataFrame objects because IdaDataFrames require an IdaDataBase as a parameter to be initialized. By convention, use only one instance of IdaDataBase per database. However, you can use several instances of IdaDataFrame per connection.

Methods

`add_column`(idadf[, column, colname, ncat])	Add physically a column to the dataset.
`add_column_id`(idadf[, column_id, destructive])	Add an ID column to an IdaDataFrame.
`append`(idadf, df[, maxnrow])	Append rows of a DataFrame to an IdaDataFrame.
`as_idadataframe`(dataframe[, tablename, ...])	Upload a dataframe and return its corresponding IdaDataFrame.
`close`()	Close the IdaDataBase connection.
`commit`()	Commit operations in the database.
`delete_column`(idadf, column_name[, destructive])	Delete a column in an idaDataFrame.
`drop_model`(modelname)	Drop a model in the database.
`drop_table`(tablename)	Drop a table in the database.
`drop_view`(viewname)	Drop a view in the database.
`exists_model`(modelname)	Check if a model exists in self.
`exists_table`(tablename)	Check if a table exists in self.
`exists_table_or_view`(objectname)	Check if a table or view exists in self.
`exists_view`(viewname)	Check if a view exists in self.
`ida_query`(query[, silent, first_row_only, ...])	Prepare, execute and format the result of a query in a dataframe or in a Tuple.
`ida_scalar_query`(query[, silent, autocommit])	Prepare and execute a query and return only the first element as a string.
`is_model`(modelname)	Check if an object is a model in self.
`is_table`(tablename)	Check if an object is a table in self.
`is_table_or_view`(objectname)	Check if an object is a table or a view in self.
`is_view`(viewname)	Check if an object is a view in self.
`reconnect`()	Try to reopen the connection.
`rename`(idadf, newname)	Rename a table referenced by an IdaDataFrame in Db2 Warehouse.
`rollback`()	Rollback operations in the database.
`show_models`()	Show models that are available in the database.
`show_tables`([show_all])	Show tables and views that are available in self.
`to_def_case`(text)	Converts the given object to the default case on this database.

__init__(dsn, uid='', pwd='', autocommit=True, verbose=False)[source]

Open a database connection.

Parameters:

dsnstr: Data Source Name (as specified in your ODBC settings) or JDBC URL string or NZPY dict
uidstr, optional: User ID.
pwdstr, optional: User password.
autocommitbool, default: True: If True, automatically commits all operations.
verbosebool, defaukt: True: If True, prints all SQL requests that are sent to the database.

Returns:

IdaDataBase object

Raises:

ImportError

JayDeBeApi is not installed.

IdaDataBaseError

uid and pwd are defined both in uid, pwd parameters and dsn.
The ‘db2jcc4.jar’ file is not in the ibmdbpy site-package repository.

Examples

ODBC connection, userID and password are stored in ODBC settings:

>>> IdaDataBase(dsn="BLUDB") # ODBC Connection
<ibmdbpy.base.IdaDataBase at 0x9bec860>

ODBC connection, userID and password are not stored in ODBC settings:

>>> IdaDataBase(dsn="BLUDB", uid="<UID>", pwd="<PWD>")
<ibmdbpy.base.IdaDataBase at 0x9bec860>

JDBC connection, full JDBC string:

>>> jdbc='jdbc:db2://<HOST>:<PORT>/<DBNAME>:user=<UID>;password=<PWD>'
>>> IdaDataBase(dsn=jdbc)
<ibmdbpy.base.IdaDataBase at 0x9bec860>

JDBC connectiom, JDBC string and seperate userID and password:

>>> jdbc = 'jdbc:db2://<HOST>:<PORT>/<DBNAME>'
>>> IdaDataBase(dsn=jdbc, uid="<UID>", pwd="<PWD>")
<ibmdbpy.base.IdaDataBase at 0x9bec860>

NZPY connection, userID and password are stored in NZPY dict:

nzpy_dsn ={

“database”:”<DBNAME>”,: “port” :<PORT>,

“host” : “<HOST>”, “securityLevel”:<SECURITYLEVEL>, “logLevel”:<LOGLEVEL>, “user”:”<UID>”, “password”:”<PWD>” }

>>> IdaDataBase(dsn=nzpy_dsn)
<ibmdbpy.base.IdaDataBase at 0x9bec860>

NZPY connection, NZPY dict and seperate userID and password

nzpy_dsn ={

“database”:”<DBNAME>”,: “port” :<PORT>,

“host” : “<HOST>”, “securityLevel”:<SECURITYLEVEL>, “logLevel”:<LOGLEVEL>

}

>>> IdaDataBase(dsn=nzpy_dsn, uid="<UID>", pwd="<PWD>")

Attributes:

data_source_namestr: Name of the referring DataBase.
_con_typestr: Type of the connection, either ‘odbc’ or ‘jdbc’ or ‘nzpy’.
_connection_stringstr: Connection string use for connecting via ODBC or JDBC or NZPY.
_conconnection object: Connection object to the remote Database.
_database_system: str: Underlying database system, either ‘db2’ or ‘netezza’
_database_name: str: The name of the database the application is connected to.
_idadfslist: List of IdaDataFrame objects opened under this connection.

Methods

DataBase Exploration

current_schema

IdaDataBase.current_schema()[source]

Get the current user schema name as a string.

Returns:

str: User’s schema name.

Examples

>>> idadb.current_schema()
'DASHXXXXXX'

show_tables

IdaDataBase.show_tables(show_all=False)[source]

Show tables and views that are available in self. By default, this function shows only tables that belong to a user’s specific schema.

Parameters:

show_allbool: If True, all table and view names in the database are returned, not only those that belong to the user’s schema.

Returns:

DataFrame: A data frame containing tables and views names in self with some additional information (TABSCHEMA, TABNAME, OWNER, TYPE).

Notes

show_tables implements a cache strategy. The cache is stored when the user calls the method with the argument show_all set to True. This improves performance because database table look ups are a very common operation. The cache gets updated each time a table or view is created or refreshed, each time a table or view is deleted, or when a new IdaDataFrame is opened.

Examples

>>> ida_db.show_tables()
     TABSCHEMA           TABNAME       OWNER TYPE
0    DASHXXXXXX            SWISS  DASHXXXXXX    T
1    DASHXXXXXX             IRIS  DASHXXXXXX    T
2    DASHXXXXXX     VIEW_TITANIC  DASHXXXXXX    V
...
>>> ida_db.show_tables(show_all = True)
     TABSCHEMA           TABNAME       OWNER TYPE
0    DASHXXXXXX            SWISS  DASHXXXXXX    T
1    DASHXXXXXX             IRIS  DASHXXXXXX    T
2    DASHXXXXXX     VIEW_TITANIC  DASHXXXXXX    V
2      SYSTOOLS      IDAX_MODELS  DASH101631    A
...

show_models

IdaDataBase.show_models()[source]

Show models that are available in the database.

Returns:

DataFrame

Examples

>>> idadb.show_models()
    MODELSCHEMA               MODELNAME       OWNER
0   DASHXXXXXX  KMEANS_10857_1434974511  DASHXXXXXX
1   DASHXXXXXX  KMEANS_11726_1434977692  DASHXXXXXX
2   DASHXXXXXX  KMEANS_11948_1434976568  DASHXXXXXX

exists_table_or_view

IdaDataBase.exists_table_or_view(objectname)[source]

Check if a table or view exists in self.

Parameters:

objectnamestr: Name of the table or view to check.

Returns:

bool

Raises:

TypeError: The object exists but is not of the expected type.

Examples

>>> idadb.exists_table_or_view("NOT_EXISTING")
False
>>> idadb.exists_table_or_view("TABLE_OR_VIEW")
True
>>> idadb.exists_table_or_view("NO_TABLE_NOR_VIEW")
TypeError : "NO_TABLE_NOR_VIEW" exists in schema '?' but of type '?'

exists_table

IdaDataBase.exists_table(tablename)[source]

Check if a table exists in self.

Parameters:

tablenamestr: Name of the table to check.

Returns:

bool

Raises:

TypeError: The object exists but is not of the expected type.

Examples

>>> idadb.exists_table("NOT_EXISTING")
False
>>> idadb.exists_table("TABLE")
True
>>> idadb.exists_table("NO_TABLE")
TypeError : "tablename" exists in schema "?" but of type '?'

exists_view

IdaDataBase.exists_view(viewname)[source]

Check if a view exists in self.

Parameters:

viewnamestr: Name of the view to check.

Returns:

bool

Raises:

TypeError: The object exists but is not of the expected type.

Examples

>>> idadb.exists_view("NOT_EXISTING")
False
>>> idadb.exists_view("VIEW")
True
>>> idadb.exists_view("NO_VIEW")
TypeError : "viewname" exists in schema "?" but of type '?'

exists_model

IdaDataBase.exists_model(modelname)[source]

Check if a model exists in self.

Parameters:

modelnamestr: Name of the model to check. It should contain only alphanumeric characters and underscores. All lower case characters will be converted to upper case characters.

Returns:

bool

Raises:

TypeError: The object exists but is not of the expected type.

Examples

>>> idadb.exists_model("MODEL")
True
>>> idadb.exists_model("NOT_EXISTING")
False
>>> idadb.exists_model("NO_MODEL")
TypeError : NO_MODEL exists but is not a model (of type '?')

is_table_or_view

IdaDataBase.is_table_or_view(objectname)[source]

Check if an object is a table or a view in self.

Parameters:

objectnamestr: Name of the object to check.

Returns:

bool

Raises:

ValueError: objectname doesn’t exist in the database.

Examples

>>> idadb.is_table_or_view("NO_TABLE")
False
>>> idadb.is_table_or_view("TABLE")
True
>>> idadb.is_table_or_view("NOT_EXISTING")
ValueError : NO_EXISTING does not exist in database

is_table

IdaDataBase.is_table(tablename)[source]

Check if an object is a table in self.

Parameters:

tablenamestr: Name of the table to check.

Returns:

bool

Raises:

ValueError: The object doesn’t exist in the database.

Examples

>>> idadb.is_table("NO_TABLE")
False
>>> idadb.is_table("TABLE")
True
>>> idadb.is_table("NOT_EXISTING")
ValueError : NO_EXISTING does not exist in database

is_view

IdaDataBase.is_view(viewname)[source]

Check if an object is a view in self.

Parameters:

viewnamestr: Name of the view to check.

Returns:

bool

Raises:

ValueError: The object doesn’t exist in the database.

Examples

>>> idadb.is_view("NO_VIEW")
False
>>> idadb.is_view("VIEW")
True
>>> idadb.is_view("NOT_EXISTING")
ValueError : NO_EXISTING does not exist in database

is_model

IdaDataBase.is_model(modelname)[source]

Check if an object is a model in self.

Parameters:

modelnamestr: Name of the model to check. It should contain only alphanumeric characters and underscores. All lower case characters will be converted to upper case characters.

Returns:

bool

Raises:

ValueError: The object doesn’t exist in the database.

Examples

>>> idadb.is_model("MODEL")
True
>>> idadb.is_model("NO_MODEL")
False
>>> idadb.is_model("NOT_EXISTING")
ValueError : NOT_EXISTING doesn't exist in database

ida_query

IdaDataBase.ida_query(query, silent=False, first_row_only=False, autocommit=False)[source]

Prepare, execute and format the result of a query in a dataframe or in a Tuple. If nothing is expected to be returned for the SQL command, nothing is returned.

Parameters:

querystr: Query to be executed.
silent: bool, default: False: If True, the query is not printed in the python console even if the verbosity mode is activated (VERBOSE environment variable is equal to “True”).
first_row_onlybool, default: False: If True, only the first row of the result is returned as a Tuple.
autocommit: bool, default: False: If True, the autocommit function is available.

Returns:

DataFrame or Tuple (if first_row_only=False)

Notes

If first_row_only argument is True, then even if the actual result of the query is composed of several rows, only the first row will be returned.

Examples

>>> idadb.ida_query("SELECT * FROM IRIS LIMIT 5")
   sepal_length  sepal_width  petal_length  petal_width species
         5.1          3.5           1.4          0.2  setosa
         4.9          3.0           1.4          0.2  setosa
         4.7          3.2           1.3          0.2  setosa
         4.6          3.1           1.5          0.2  setosa
         5.0          3.6           1.4          0.2  setosa

>>> idadb.ida_query("SELECT COUNT(*) FROM IRIS")
(150, 150, 150, 150)

ida_scalar_query

IdaDataBase.ida_scalar_query(query, silent=False, autocommit=False)[source]

Prepare and execute a query and return only the first element as a string. If nothing is returned from the SQL query, an error occurs.

Parameters:

querystr: Query to be executed.
silent: bool, default: False: If True, the query will not be printed in python console even if verbosity mode is activated.
autocommit: bool, default: False: If True, the autocommit function is available.

Returns:

str or Number

Notes

Even if the actual result of the query is composed of several columns and several rows, only the first element (top-left) will be returned.

Examples

>>> idadb.ida_scalar_query("SELECT TRIM(CURRENT_SCHEMA) from SYSIBM.SYSDUMMY1")
'DASHXXXXX'

Upload DataFrames

as_idadataframe

IdaDataBase.as_idadataframe(dataframe, tablename=None, clear_existing=False, primary_key=None, indexer=None)[source]

Upload a dataframe and return its corresponding IdaDataFrame. The target table (tablename) will be created or replaced if the option clear_existing is set to True.

To add data to an existing tables, see IdaDataBase.append

Parameters:

dataframeDataFrame: Data to be uploaded, contained in a Pandas DataFrame.
tablenamestr, optional: Name to be given to the table created in the database. It should contain only alphanumeric characters and underscores. All lower case characters will be converted to upper case characters. If not given, a valid tablename is generated (for example, DATA_FRAME_X where X is a random number).
clear_existingbool: If set to True, a table will be replaced when a table with the same name already exists in the database.
primary_keystr: Name of a column to be used as primary key.

Returns:

IdaDataFrame

Raises:

TypeError

Argument dataframe is not of type pandas.DataFrame.
The primary key argument is not a string.

NameError

The name already exists in the database and clear_existing is False.
The primary key argument doesn’t correspond to a column.

PrimaryKeyError

The primary key contains non unique values.

Notes

This function is not intended to be used to add data to an existing table, rather to create a new table from a dataframe. To add data to an existing table, please consider using IdaDataBase.append

Examples

>>> from nzpyida.sampledata.iris import iris
>>> idadb.as_idadataframe(iris, "IRIS")
<ibmdbpy.frame.IdaDataFrame at 0xb34a898>
>>> idadb.as_idadataframe(iris, "IRIS")
NameError: IRIS already exists, choose a different name or use clear_existing option.
>>> idadb.as_idadataframe(iris, "IRIS2")
<ibmdbpy.frame.IdaDataFrame at 0xb375940>
>>> idadb.as_idadataframe(iris, "IRIS", clear_existing = True)
<ibmdbpy.frame.IdaDataFrame at 0xb371cf8>

Delete DataBase Objects

drop_table

IdaDataBase.drop_table(tablename)[source]

Drop a table in the database.

Parameters:

tablenamestr: Name of the table to drop.

Raises:

ValueError: If the object does not exist.
TypeError: If the object is not a table.

Notes

This operation cannot be undone if autocommit mode is activated.

Examples

>>> idadb.drop_table("TABLE")
True
>>> idadb.drop_table("NO_TABLE")
TypeError : NO_TABLE  exists in schema '?' but of type '?'
>>> idadb.drop_table("NOT_EXISTING")
ValueError : NO_EXISTING doesn't exist in database

drop_view

IdaDataBase.drop_view(viewname)[source]

Drop a view in the database.

Parameters:

viewnamestr: Name of the view to drop.

Raises:

ValueError: If the object does not exist.
TypeError: If the object is not a view.

Notes

This operation cannot be undone if autocommit mode is activated.

Examples

>>> idadb.drop_view("VIEW")
True
>>> idadb.drop_view("NO_VIEW")
TypeError : NO_VIEW exists in schema '?' but of type '?'
>>> idadb.drop_view("NOT_EXISTING")
ValueError : NO_EXISTING doesn't exist in database

drop_model

IdaDataBase.drop_model(modelname)[source]

Drop a model in the database.

Parameters:

modelnamestr: Name of the model to drop.

Raises:

ValueError: If the object does not exist.
TypeError: if the object exists but is not a model.

Notes

This operation cannot be undone if autocommit mode is activated.

Examples

>>> idadb.drop_model("MODEL")
True
>>> idadb.drop_model("NO_MODEL")
TypeError : NO_MODEL exists in schema '?' but of type '?'
>>> idadb.drop_model("NOT_EXISTING")
ValueError : NOT_EXISTING does not exist in database

DataBase Modification

rename

IdaDataBase.rename(idadf, newname)[source]

Rename a table referenced by an IdaDataFrame in Db2 Warehouse.

Parameters:

idadfIdaDataFrame: IdaDataFrame object referencing the table to rename.
newnamestr: Name to be given to self. It should contain only alphanumeric characters and underscores. All lower case characters will be converted to upper case characters. The new name should not already exist in the database.

Raises:

ValueError: The new tablename is not valid.
TypeError: Rename function is supported only for table type.
NameError: The name of the object to be created is identical to an existing name.

Notes

Upper case characters and numbers, optionally separated by underscores “_”, are valid characters.

add_column_id

IdaDataBase.add_column_id(idadf, column_id='ID', destructive=False)[source]

Add an ID column to an IdaDataFrame.

Parameters:

idadfIdaDataFrame: IdaDataFrame object to which an ID column will be added
column_idstr: Name of the ID column to add
destructivebool: If set to True, the column will be added phisically in the database. This can take time. If set to False, the column will be added virtually in a view and a new IdaDataFrame is returned.

Raises:

TypeError: idadf is not an IdaDataFrame.
ValueError: The given column name already exists in the DataBase.

Notes

The non-destructive creation of column IDs is not reliable, because row IDs are recalculated on the fly in a non-deterministic way each time a new view is produced. On the contrary, creating them destructively i.e physically is reliable but can take time. If no sorting has been done whatsoever before, row IDs will be created at random. Improvement idea: create ID columns in a non-destructive way and base them on the sorting of a set of columns, defined by the user, or all columns if no column combination results in unique identifiers.

delete_column

IdaDataBase.delete_column(idadf, column_name, destructive=False)[source]

Delete a column in an idaDataFrame.

Parameters:

idadfIdaDataFrame: The IdaDataframe in which a column should be deleted.
column_namestr: Name of the column to delete.
destructivebool: If set to True, the column is deleted in the database. Otherwise, it is deleted virtually, creating a view for the IdaDataFrame.

Raises:

TypeError: column_name should be a string.
ValueError: column_name refers to a column that doesn’t exist in self.

append

IdaDataBase.append(idadf, df, maxnrow=None)[source]

Append rows of a DataFrame to an IdaDataFrame. The DataFrame must have the same structure (same column names and datatypes). Optionally, the DataFrame to be added can be splitted into several chunks. This improves performance and prevents SQL overflows. By default, chunks are limited to 100.000 cells.

Parameters:

idadfIdaDataFrame: IdaDataFrame that receives data from dataframe df.
dfDataFrame: Dataframe whose rows are added to IdaDataFrame idadf.
maxnrowint, optional: number corresponding to the maximum number of rows for each chunks.

Raises:

TypeError

maxnrow should be an interger.
Argument idadf should be an IdaDataFrame.
Argument df should be a pandas DataFrame.

ValueErrpr

maxnrow should be greater than 1 or nleft blank.
Other should be a Pandas DataFrame.
Other dataframe has not the same number of columns as self.
Some columns in other have different names that are different from the names of the columns in self.

Connection Management

commit

IdaDataBase.commit()[source]

Commit operations in the database.

Notes

All changes that are made in the database after the last commit, including those in the child IdaDataFrames, are commited.

If the environment variable ‘VERBOSE’ is set to True, the commit operations are notified in the console.

rollback

IdaDataBase.rollback()[source]

Rollback operations in the database.

Notes

All changes that are made in the database after the last commit, including those in the child IdaDataFrames, are discarded.

close

IdaDataBase.close()[source]

Close the IdaDataBase connection.

Notes

If the environment variable ‘AUTOCOMMIT’ is set to True, then all changes after the last commit are committed, otherwise they are discarded.

reconnect

IdaDataBase.reconnect()[source]: Try to reopen the connection.