<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Sql on despatches</title><link>https://icle.es/tags/sql/</link><description>Recent content in Sql on despatches</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 18 Mar 2026 15:13:17 +0000</lastBuildDate><atom:link href="https://icle.es/tags/sql/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL performing huge updates</title><link>https://icle.es/2011/11/06/postgresql-performing-huge-updates-1106/</link><pubDate>Sun, 06 Nov 2011 12:45:41 +0000</pubDate><guid>https://icle.es/2011/11/06/postgresql-performing-huge-updates-1106/</guid><description>&lt;p>PostgreSQL is a pretty powerful database server and will work with almost any
settings thrown at it. It is really good at making do with what it has and
performing as it is asked.&lt;/p>
&lt;p>We found this out recently while trying to update every row in a table with
over eight million entries. In our first few attempts, the update took over 24
hours to complete, which was far too long for an update script.&lt;/p>
&lt;p>Our investigation of this led us to the pgsql_tmp folder and the work_mem
configuration parameter.&lt;/p>
&lt;p>While the query was executing, we checked the pgsql_tmp folder to see how
much space was being used in there. We already knew about this folder from past
experience: a server had been running out of disk space rapidly, and we had
narrowed the problem down to this folder. By cancelling the query that owned the tmp
files in here, we were able to free up literally gigabytes of disk space...&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL is a pretty powerful database server and will work with almost any
settings thrown at it. It is really good at making do with what it has and
performing as it is asked.</p>
<p>We found this out recently while trying to update every row in a table with
over eight million entries. In our first few attempts, the update took over 24
hours to complete, which was far too long for an update script.</p>
<p>Our investigation of this led us to the pgsql_tmp folder and the work_mem
configuration parameter.</p>
<p>While the query was executing, we checked the pgsql_tmp folder to see how
much space was being used in there. We already knew about this folder from past
experience: a server had been running out of disk space rapidly, and we had
narrowed the problem down to this folder. By cancelling the query that owned the
temporary files in here, we were able to free up literally gigabytes of disk space.</p>
<p>We had found roughly half a gig of temporary files in here. This led us to
investigate the configuration file.</p>
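<p>The temporary-file usage described above can also be checked from SQL rather
than from the file system. The following is a minimal sketch that assumes
PostgreSQL 9.2 or later: the <code>temp_files</code>/<code>temp_bytes</code>
statistics and the <code>pid</code>, <code>state</code> and <code>query</code>
columns of <code>pg_stat_activity</code> appeared in 9.2. Older releases name the
pid and query columns <code>procpid</code> and <code>current_query</code> and have
no <code>state</code> column. The pid passed to <code>pg_cancel_backend</code> is a
placeholder, so that line is left commented out.</p>

```sql
-- Temporary files written per database since its statistics were last reset.
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
ORDER BY temp_bytes DESC;

-- Active statements, longest-running first, to find the one spilling to disk.
SELECT pid, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC;

-- Cancel the offending statement by its pid (12345 is a placeholder).
-- SELECT pg_cancel_backend(12345);
```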
<p>The one parameter that stuck out was work_mem, which was set to the default of
1MB. That probably makes sense under most circumstances, but not in this one.
According to the PostgreSQL documentation:</p>
<blockquote>
<p><code>work_mem</code> (<code>integer</code>)</p>
<p>Specifies the amount of memory to be used by internal sort operations and hash
tables before switching to temporary disk files. The value defaults to one
megabyte (<code>1MB</code>). Note that for a complex query, several sort or hash
operations might be running in parallel; each one will be allowed to use as
much memory as this value specifies before it starts to put data into
temporary files. Also, several running sessions could be doing such operations
concurrently. So the total memory used could be many times the value
of <code>work_mem</code>; it is necessary to keep this fact in mind when choosing the
value. Sort operations are used for <code>ORDER BY</code>, <code>DISTINCT</code>, and merge joins.
Hash tables are used in hash joins, hash-based aggregation, and hash-based
processing of <code>IN</code> subqueries.</p></blockquote>
<p>This tells us that the total memory used could be several times the value set
here. For example, ten concurrent sessions each running two sort or hash
operations could in principle claim 10GB with work_mem at 512MB. Setting it to
half a gig would therefore probably be a terrible idea for a heavily utilised
production server. However, for a one-off migration in which we need to update
over 8,000,000 rows, it can be a good temporary fix.</p>
<p>After raising work_mem to 512MB, we found that no more temporary files were
created and the whole update was done in memory.</p>
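<p>Rather than changing work_mem in postgresql.conf, the larger value can be
confined to the migration itself. The following is a minimal sketch that assumes
the update runs in a single transaction. Memory units are case-sensitive, so the
value has to be <code>512MB</code> rather than <code>512mb</code>. Setting
<code>log_temp_files</code> is optional and needs superuser rights.</p>

```sql
BEGIN;

-- Applies only until COMMIT or ROLLBACK; other sessions keep the server default.
SET LOCAL work_mem = '512MB';

-- Optional (superuser only): log every temporary file created, whatever its size,
-- so the server log shows whether the update still spills to disk.
SET LOCAL log_temp_files = 0;

-- ... run the large UPDATE statement here ...

COMMIT;
```

<p>If the update is run outside an explicit transaction, a plain
<code>SET work_mem = '512MB';</code> does the same for the rest of the session.</p>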
<p>When updating so many rows, there are a few other things to consider.</p>
<p>Firstly, autovacuum will likely kick in several times to vacuum the table.
You'll probably want to disable this for the duration of the update statement
and run a vacuum afterwards.</p>
```sql
-- Disable autovacuum for the table and its TOAST table during the update
ALTER TABLE sometable SET (
  autovacuum_enabled = false, toast.autovacuum_enabled = false
);
```
<p>You can switch autovacuum back on once the update statement has completed:</p>
```sql
-- Re-enable autovacuum for the table and its TOAST table
ALTER TABLE sometable SET (
  autovacuum_enabled = true, toast.autovacuum_enabled = true
);
```
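<p>The manual vacuum afterwards might look like the sketch below. It uses the same
placeholder table name, <code>sometable</code>. <code>VACUUM</code> cannot run
inside a transaction block, so issue it on its own once the update has committed.
Adding <code>ANALYZE</code> also refreshes the planner statistics, which changing
every row will have made stale.</p>

```sql
-- Reclaim the dead row versions left by the update and refresh statistics.
VACUUM (VERBOSE, ANALYZE) sometable;

-- Check the remaining dead tuples and when the table was last vacuumed.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'sometable';
```

<p>If the table had no per-table autovacuum settings before the migration,
<code>ALTER TABLE sometable RESET (autovacuum_enabled, toast.autovacuum_enabled);</code>
removes the overrides entirely instead of pinning them to <code>true</code>.</p>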
<p>A few other settings worth looking at are:</p>
<ul>
<li>fsync: I usually have this set to off anyway, since the servers are
practically fully redundant</li>
<li>checkpoint_segments: I changed this to roughly 5 times the original value
(check the log to see whether it says that it's checkpointing too often)</li>
<li>checkpoint_completion_target: I changed this to 0.9</li>
</ul>
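<p>Before editing postgresql.conf, it helps to see the current values. A minimal
sketch is below. For reference, the shipped default for checkpoint_segments in
releases of that era was 3, so if it was still at the default, roughly five times
the original value means about 15. checkpoint_segments was removed in PostgreSQL
9.5 in favour of <code>max_wal_size</code>, so the query asks for both and returns
whichever exists. The log message to watch for is the <code>checkpoint_warning</code>
one, which reports that checkpoints are occurring too frequently.</p>

```sql
-- Current values of the settings discussed above, with units and origin.
SELECT name, setting, unit, context, source
FROM pg_settings
WHERE name IN ('fsync', 'checkpoint_segments', 'max_wal_size',
               'checkpoint_completion_target', 'checkpoint_warning');

-- After editing postgresql.conf, make the server re-read it. All of these
-- settings have context 'sighup', so a reload is enough; no restart needed.
SELECT pg_reload_conf();
```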
<p>With all of these changes in place, we were able to bring the total time of the update
down to a few hours.</p>]]></content:encoded></item><item><title>Tracking progress of an update statement</title><link>https://icle.es/2011/11/02/tracking-progress-of-an-update-statement-1101/</link><pubDate>Wed, 02 Nov 2011 19:59:02 +0000</pubDate><guid>https://icle.es/2011/11/02/tracking-progress-of-an-update-statement-1101/</guid><description>&lt;p>Sometimes there is a need to execute a long running update statement. This
update statement might be modifying millions of rows, as was the case when we
went hunting for a way to track the progress of the update. Hunting around took
us to &lt;a href="http://archives.postgresql.org/pgsql-admin/2002-07/msg00286.php">http://archives.postgresql.org/pgsql-admin/2002-07/msg00286.php&lt;/a>. In our
particular case, we are using PostgreSQL, but this should work with any database
server that provides sequences. Our original SQL was of the form:&lt;/p>
```sql
update only table1 t1
set amount = t2.price
from table2 t2
where t1.id = t2.id;
```
&lt;p>There is, of course, no way of figuring out how many rows have already been
updated. The first step was to create a sequence:&lt;/p>
```sql
CREATE SEQUENCE seq_progress START 1;
```</description><content:encoded><![CDATA[<p>Sometimes there is a need to execute a long running update statement. This
update statement might be modifying millions of rows, as was the case when we
went hunting for a way to track the progress of the update. Hunting around took
us to <a href="http://archives.postgresql.org/pgsql-admin/2002-07/msg00286.php">http://archives.postgresql.org/pgsql-admin/2002-07/msg00286.php</a>. In our
particular case, we are using PostgreSQL, but this should work with any database
server that provides sequences. Our original SQL was of the form:</p>
```sql
update only table1 t1
set amount = t2.price
from table2 t2
where t1.id = t2.id;
```
<p>There is, of course, no way of figuring out how many rows have already been
updated. The first step was to create a sequence:</p>
```sql
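-- An ordinary (non-temporary) sequence: temporary objects are visible only to
-- the session that created them, so a second connection could not read it.
CREATE SEQUENCE seq_progress START 1;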
```
<p>We can then use this sequence in the update statement so that each row updated
also increments the sequence:</p>
```sql
update only table1 t1
set amount = t2.price
from table2 t2
where nextval('seq_progress') != 0  -- always true; just bumps the counter per row processed
and t1.id = t2.id;
```
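<p>Once the query is running, you can open another connection to the database.
Sequence increments are never rolled back and take effect immediately, so that
connection sees the counter move even though the update has not yet committed. To
get an indication of how far it has got, you can just run the following:</p>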
```sql
select nextval('seq_progress');
```
<p>Bear in mind that this also increments the sequence by 1, but if you have
millions of rows (which is really the only case in which this would be useful), a
few additional increments are hardly going to make a difference.</p>
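<p>If you would rather not bump the counter at all, you can read the sequence like
a one-row table. This is a minimal sketch. It assumes the monitoring connection can
see <code>seq_progress</code>, which means creating it as an ordinary sequence
rather than a <code>TEMPORARY</code> one, because temporary objects are private to
the session that created them. The percentage uses the planner's row estimate for
<code>table1</code> (<code>pg_class.reltuples</code>) instead of a slow
<code>count(*)</code>, and it assumes most rows of <code>table1</code> have a match
in <code>table2</code>.</p>

```sql
-- Read the counter without advancing it.
SELECT last_value FROM seq_progress;

-- Approximate percentage done, based on the estimated row count of table1.
SELECT s.last_value,
       round(100.0 * s.last_value / greatest(c.reltuples, 1)::numeric, 1) AS pct_done
FROM seq_progress AS s,
     pg_class AS c
WHERE c.oid = 'table1'::regclass;
```

<p>Once the update has committed, <code>DROP SEQUENCE seq_progress;</code> removes
the counter.</p>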
<p>Good luck and have fun!</p>]]></content:encoded></item></channel></rss>