1. Data Types
A long time ago at university, I learned my first high-level programming language, Pascal, from the book by Niklaus Wirth:
Algorithms + Data Structures = Programs
But I didn’t really learn much about databases studying computer science, as they were taught by another department (business systems). So maybe now’s the time to start with an imaginary book called something like:
Queries + Data Types = Databases?
Data types have a long and important history in computing, driven initially by word lengths and machine data types but becoming more powerful and abstract as computer science matured. One popular 1980’s magazine was even named after a data type (“BYTE”, early microprocessors such as the 8008, Z80, and 6800 were characterized by BYTE/8-bit word sizes, in an era when the PDP-11 had 16-bit words, the VAX had 32-bit words, and the Cray-1 a massive 64-bit word size).
2. PostgreSQL Data Types
“Data Types” is a popular PostgreSQL search, so I decided to do some investigation of my own into why they are so important. First of all, why do data types matter in PostgreSQL? Doing some preliminary research I found out that data types in PostgreSQL are important for at least the following aspects (possibly more!):
- As column data types when creating a table
- For functions and operators
- For constraints
- For creating types and domains, and
- When using PostgreSQL from a programming language (e.g. PostgreSQL to/from Python, and “C”).
PostgreSQL has a lot of built-in data types that are described in Chapter 8 of the documentation. And you can add new data types, so I guess there are really an infinite number of data types possible.
There’s a table that enumerates at least 43 built-in data types, and reveals that along with the official name some types have aliases (used internally for historical reasons). For example “real” has the alias “float4” (a single precision 4-byte floating-point number).
Here’s the full table which shows the variety of data types available:
Name | Aliases | Description |
bigint | int8 | signed eight-byte integer |
bigserial | serial8 | autoincrementing eight-byte integer |
bit [ (n) ] | fixed-length bit string | |
bit varying [ (n) ] | varbit [ (n) ] | variable-length bit string |
boolean | bool | logical Boolean (true/false) |
box | rectangular box on a plane | |
bytea | binary data (“byte array”) | |
character [ (n) ] | char [ (n) ] | fixed-length character string |
character varying [ (n) ] | varchar [ (n) ] | variable-length character string |
cidr | IPv4 or IPv6 network address | |
circle | circle on a plane | |
date | calendar date (year, month, day) | |
double precision | float8 | double precision floating-point number (8 bytes) |
inet | IPv4 or IPv6 host address | |
integer | int, int4 | signed four-byte integer |
interval [ fields ] [ (p) ] | time span | |
json | textual JSON data | |
jsonb | binary JSON data, decomposed | |
line | infinite line on a plane | |
lseg | line segment on a plane | |
macaddr | MAC (Media Access Control) address | |
macaddr8 | MAC (Media Access Control) address (EUI-64 format) | |
money | currency amount | |
numeric [ (p, s) ] | decimal [ (p, s) ] | exact numeric of selectable precision |
path | geometric path on a plane | |
pg_lsn | PostgreSQL Log Sequence Number | |
pg_snapshot | user-level transaction ID snapshot | |
point | geometric point on a plane | |
polygon | closed geometric path on a plane | |
real | float4 | single precision floating-point number (4 bytes) |
smallint | int2 | signed two-byte integer |
smallserial | serial2 | autoincrementing two-byte integer |
serial | serial4 | autoincrementing four-byte integer |
text | variable-length character string | |
time [ (p) ] [ without time zone ] | time of day (no time zone) | |
time [ (p) ] with time zone | timetz | time of day, including time zone |
timestamp [ (p) ] [ without time zone ] | date and time (no time zone) | |
timestamp [ (p) ] with time zone | timestamptz | date and time, including time zone |
tsquery | text search query | |
tsvector | text search document | |
txid_snapshot | user-level transaction ID snapshot (deprecated; see pg_snapshot) | |
uuid | universally unique identifier | |
xml | XML data |
Table 1: Postgres Data Types (Name, Alias, Description)
But how do you know what you can do with each data type? Chapter 9 documents which functions and operators are applicable to each data type. The documentation also says that each data type has an external representation, which raises the question of what these “external representations” are either in standard SQL data types or for a specific programming language.
3. Using PostgreSQL From Java—the PgJDBC Driver
How do you use PostgreSQL from Java? With JDBC! (Java Database Connectivity). There’s a PostgreSQL JDBC Driver (PgJDBC for short) which allows Java programs to connect using standard, database independent, Java code. It’s an open source Pure Java (Type 4, which talks native PostgreSQL protocol) driver and is well documented.
It’s easy to download PostgreSQL, install it, and start the database server running. You also need to download the JDBC driver.
Connecting to the database is easy from jdbc:
1 |
Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/?user=name&password=abc&ssl=false"); |
To create tables you need to specify PostgreSQL data types for all the columns. For example, here’s a simple test table with integer id (which is the primary key) and value columns:
1 2 3 4 |
CREATE TABLE test1 ( id integer PRIMARY KEY; value integer; ); |
You can then do simple things in Java/PgJDBC using statements:
1 2 3 4 5 |
Statement st = conn.createStatement(); ResultSet rs = st.executeQuery("INSERT INTO test1 VALUES (1,1000)"); rs = st.executeQuery("SELECT * FROM test1 WHERE id = 1"); while (rs.next()) System.out.print("Result = " + rs.getInt("value")); |
4. Why Do Data Types Matter in JDBC?
For simple Statements, the INSERT statement is just a string, so you don’t need a real data type yet. However, to extract the returned value you do need to know the PostgreSQL type, and also the corresponding Java data type if you are going to do anything more useful than just print it out (i.e. store it, process it):
1 |
int value = rs.getInt("value"); |
However, to use Prepared Statements you need to use the correct data types (lines 2, 3, 6, and 9) for both INSERT and SELECT, otherwise you’ll get a run time SQL exception:
1 2 3 4 5 6 7 8 9 |
PreparedStatement pst1 = conn.prepareStatement("INSERT INTO test1(id, value) VALUES (?, ?)"); <strong>pst1.setInt(1, 2);</strong> <strong>pst1.setInt(2, 2000);</strong> int rowsInserted = pst1.executeUpdate(); PreparedStatement pst2 pst = conn.prepareStatement("SELECT * FROM test1 WHERE id = ?"); <strong>pst2.setInt(1, 2);</strong> rs = pst2.executeQuery(); while (rs.next()) <strong>int value = rs.getInt("value");</strong> |
Note that rs.getString(“value”) also seems to work ok for the basic data types (i.e. automatic casting), which is a feature we’ll find useful below.
5. Mappings From PostgreSQL Data Types to SQL/JDBC Data Types
But how do you know which Java data types the PostgreSQL data types actually map to in advance?
Some searching resulted in the discovery of the getTypeInfo() method in the java.sql.DatabaseMetaData Interface, which is an interface implemented by driver vendors to let users know the capabilities of the database in combination with the JDBC driver. The getTypeInfo() method retrieves a description of all the data types supported by the database, ordered by DATA_TYPE and then by how closely they map to the corresponding JDBC SQL type. There’s potentially lots of useful information available, but I was mainly interested in TYPE_NAME and DATA_TYPE, which is the SQL data type from java.sql.Types. Initially it was reporting DATA_TYPE as integers (corresponding to the constants in the ENUM java.sql.Types), which wasn’t very useful, but then I managed to get it to report the constant name as follows:
1 2 3 |
rs = conn.getMetaData().getTypeInfo()) while (rs.next()) System.out.println(rs.getString("TYPE_NAME") + "\t" + JDBCType.valueOf(rs.getInt("DATA_TYPE")).getName()); |
Running this reveals the complete list of PostgreSQL data types and their mapping to SQL/JDBC Data types, for a total of 183 data types. Removing all the (many) OTHER and ARRAY (e.g. _record etc) data types leaves us with this more manageable table:
PostgreSQL Data Type | SQL Data Type |
bool | BIT |
bit | BIT |
int8 | BIGINT |
bigserial | BIGINT |
oid | BIGINT |
bytea | BINARY |
char | CHAR |
bpchar | CHAR |
numeric | NUMERIC |
int4 | INTEGER |
serial | INTEGER |
int2 | SMALLINT |
smallserial | SMALLINT |
float4 | REAL |
float8 | DOUBLE |
money | DOUBLE |
name | VARCHAR |
text | VARCHAR |
varchar | VARCHAR |
date | DATE |
time | TIME |
timetz | TIME |
timestamp | TIMESTAMP |
timestamptz | TIMESTAMP |
cardinal_number | DISTINCT |
character_data | DISTINCT |
sql_identifier | DISTINCT |
time_stamp | DISTINCT |
yes_or_no | DISTINCT |
xml | SQLXML |
refcursor | REF_CURSOR |
Table 2: Mappings From PostgreSQL to SQL/JDBC Data Types
Note that the PostgreSQL data types are using the aliases from Table 1. But this is only part of the answer. How do we know what Java data types correspond to the SQL data types?
6. Mappings From SQL/JDBC Data Types to Java Data Types
After doing some more searching, the best I could do to find mappings from SQL data types to Java data types was to consult the actual JDBC specifications, Appendix B tables. Here’s my solution using Appendix B.4, Java object types mapped to JDBC types.
PostgreSQL Data Type | SQL/JDBC Data Type | Java Type |
bool | BIT | boolean |
bit | BIT | boolean |
int8 | BIGINT | long |
bigserial | BIGINT | long |
oid | BIGINT | long |
bytea | BINARY | byte[] |
char | CHAR | String |
bpchar | CHAR | String |
numeric | NUMERIC | java.math.BigDecimal |
int4 | INTEGER | int |
serial | INTEGER | int |
int2 | SMALLINT | short |
smallserial | SMALLINT | short |
float4 | REAL | float |
float8 | DOUBLE | double |
money | DOUBLE | double |
name | VARCHAR | String |
text | VARCHAR | String |
varchar | VARCHAR | String |
date | DATE | java.sql.Date |
time | TIME | java.sql.Time |
timetz | TIME | java.sql.Time |
timestamp | TIMESTAMP | java.sql.Timestamp |
timestamptz | TIMESTAMP | java.sql.Timestamp |
cardinal_number | DISTINCT | Mapping of underlying type |
character_data | DISTINCT | Mapping of underlying type |
sql_identifier | DISTINCT | Mapping of underlying type |
time_stamp | DISTINCT | Mapping of underlying type |
yes_or_no | DISTINCT | Mapping of underlying type |
xml | SQLXML | java.sql.SQLXML |
refcursor | REF_CURSOR | Undefined |
_abc | ARRAY | java.sql.array |
Table 3: Complete Mappings from PostgreSQL to SQL/JDBC to Java Data Types
Notes:
1 REF_CURSOR doesn’t appear in the jdbc appendices, but is mentioned in section “13.3.3.4 REF Cursor Support” of the specification, and may map to Types.REF_CURSOR.
2 _abc stands for one of many ARRAY data types available in PostgreSQL (_record to _yes_or_no).
7. Type Conversion
So now we know the correct Java types to use, everything should just magically work correctly right? Well, mostly, as long as you remember what the PostgreSQL data types used in the table columns are, and use the correct PostgreSQL to Java mappings. As an experiment I tried a few data types including int2/int4/int8 (short, int, long), char/text (String), numeric (BigDecimal). The main thing to watch out for is conversion/casting between different sized types—going from longer to short types results in run-times errors (if you are lucky) or truncation (if you are unlucky).
How about a really simple type such as bit, bool, and boolean? That should be idiot proof? True? (or False or Unknown…). Let’s try and see what happens (in my next PostgreSQL blog)!
8. Conclusions
In this blog I discovered what data types are available in PostgreSQL (a lot), and hopefully determined the definitive mapping from PostgreSQL to SQL/JDBC to Java data types. However, even armed with this information you have to be careful about type conversion/casting, and watch out for run-time errors, truncation, or loss of information.
And we’ve really only started to scratch the surface of PostgreSQL data types, as PostgreSQL allows for custom data types, user defined types, arrays, and OTHER data types (e.g. JSON). I was also surprised to discover that PostgreSQL is really an ORDBMS (an object-relational DB), and has support for some object-oriented features such as table inheritance and function overloading. How these all work from JDBC would be more than enough to fill my imaginary database book, or several actual (but currently chimerical) future blogs.
Transparent, fair, and flexible pricing for your data infrastructure: See Instaclustr Pricing Here or learn about our PostgreSQL Consulting service.