![]() ![]() The CASE expression for multiple row updates is still one of my favorite queries, if for only the ease of typing in a controlled environment where many individual UPDATE queries are the other alternative. One of those, on a table with 11 columns while the other had just 2 columns, both of which were of a number data type. I realize these tests are subjective at best. As with what was stated in some of the comments over on the Hacker News thread referenced above, it normally “just depends” on many factors to which may or may not be the optimal choice. I will say that using the CASE expression is a bit easier to type out, but not necessarily the best choice on all occasions. The timing is somewhat even between both sets of operations on the table with just 2 columns. I’ll run the single CASE expression first: location=# i /case_insert.sqlĪnd then the individual UPDATE’s: location=# i /case_insert.sql ![]() I am omitting them here for on-screen brevity and display) ( To note: SQL commands from the first set of operations are used with the appropriate modification(s). Location=# INSERT INTO proj_nums(proj_code) SELECT proj_code FROM data_staging I’ll create a table with 2 columns (composed of a SERIAL data type for the PRIMARY KEY and an INTEGER for the proj_code column) and move over the data: location=# CREATE TABLE proj_nums(n_id SERIAL PRIMARY KEY, proj_code INTEGER) I’m curious of any differences in a table with the same number of rows, but fewer columns, hence the next series of tests. This first test included the entire table, with all columns. The individual INSERT‘s took about half the amount of time as the single CASE. Let’s check the values: location=# SELECT DISTINCT proj_code FROM data_staging Īnd the timing (Note: I’ll do the math in a query since timing did not report whole seconds this run): location=# SELECT round((795.610 + 116.268 + 239.007 + 72.699 + 162.199 + 1987.857 + 321.223) / 1000, 3) AS seconds sql source file for this series of comparisons: BEGIN Īnd those results, location=# i /case_insert.sql I’ll ROLLBACK (not shown) these changes so I can run individual INSERT statements to test those as well.īelow reflects the modifications to the. Here are the reflected changes in the table so far: location=# SELECT DISTINCT proj_code FROM data_staging Just over half a million rows in 6+ seconds. Let’s run this file and check what the timing reports: location=# i /case_insert.sql Let’s go ahead and run a single CASE expression updating all of the proj_code column values and see what the timing reports. Now all proj_code columns have an INTEGER value. Prior to running these tests, I’ll go ahead and ALTER the proj_code column to type INTEGER: BEGIN ĪLTER TABLE data_staging ALTER COLUMN proj_code SET DATA TYPE INTEGER USING NULLIF(trim(both '"' FROM proj_code), '')::INTEGER Īnd cleanup that NULL column value (which is represented by the ELSE ’00’ in the exploratory CASE expression above), setting it to an arbitrary number, 10, with this UPDATE: UPDATE data_staging SELECT NULLIF(trim(both '"' FROM proj_code), '') AS p_code FROM data_staging Let’s use a CTE for that, then SELECT from it: location=# WITH cleaned_nums AS ( I’ll use trim to remove quotes from the values and cast to an INT and determine how many rows exist for each individual value: Here is an exploratory query to determine its current values: location=# SELECT DISTINCT proj_code FROM data_staging We have around half a million rows of data in this table.įor this first comparison, I will UPDATE the proj_code column. Location=# SELECT COUNT(*) FROM data_staging location=# d data_staging Ĭolumn | Type | Collation | Nullable | Default Any ‘analysis’ or UPDATES/INSERTS to this data set, is not reflective of actual ‘real-world’ GPS/GIS operations and is not intended as such. The sample data set I’m using, is found at this link here.īut keep in mind, the data itself is used in this example because it’s a decent sized set with multiple columns. Note: To start, I created a ‘staging’ table with all TEXT type columns to get the data loaded. Xubuntu Linux 16.04.3 LTS (Xenial Xerus).In this blog post, I want to observe comparisons between that particular query, and one involving multiple single UPDATE statements. One of those, in particular, a multi-row UPDATE with a single CASE expression, sparked up an interesting conversation over on Hacker News. In a previous blog post My Favorite PostgreSQL Queries and Why They Matter, I visited interesting queries meaningful to me as I learn, develop, and grow into a SQL developer role. ![]()
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |