<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Python on despatches</title><link>https://icle.es/tags/python/</link><description>Recent content in Python on despatches</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 17 Jul 2025 15:36:08 +0100</lastBuildDate><atom:link href="https://icle.es/tags/python/index.xml" rel="self" type="application/rss+xml"/><item><title>Setup Whisper</title><link>https://icle.es/2025/07/15/setup-whisper/</link><pubDate>Tue, 15 Jul 2025 08:24:28 +0100</pubDate><guid>https://icle.es/2025/07/15/setup-whisper/</guid><description>&lt;p>I wanted to generate chapter markers from a devlog audio recording using
OpenAI&amp;rsquo;s Whisper, and figured I&amp;rsquo;d run it locally. Whisper is Python-based, and
I&amp;rsquo;m on Arch. What could go wrong?&lt;/p>
&lt;p>Turns out… not much, but it still took a few hops.&lt;/p>
&lt;h2 id="choosing-the-right-setup">Choosing the Right Setup&lt;/h2>
&lt;p>I already had Python installed, but rather than littering system Python or
managing a bunch of ad hoc virtualenvs, I decided to do it properly — with
Poetry.&lt;/p></description><content:encoded><![CDATA[<p>I wanted to generate chapter markers from a devlog audio recording using
OpenAI&rsquo;s Whisper, and figured I&rsquo;d run it locally. Whisper is Python-based, and
I&rsquo;m on Arch. What could go wrong?</p>
<p>Turns out… not much, but it still took a few hops.</p>
<h2 id="choosing-the-right-setup">Choosing the Right Setup</h2>
<p>I already had Python installed, but rather than littering system Python or
managing a bunch of ad hoc virtualenvs, I decided to do it properly — with
Poetry.</p>
```bash
sudo pacman -S poetry
poetry new whisper-transcriber
cd whisper-transcriber
```
<p>So far so good.</p>
<h2 id="pytorch--cuda-the-pypy-pitfall">PyTorch + CUDA: the PyPy Pitfall</h2>
<p>My first attempt to install <code>torch</code>, <code>torchvision</code>, and <code>torchaudio</code> failed in a
confusing way — no versions found at all. The clue was in the command: I&rsquo;d
accidentally run it with <code>pip-pypy3</code>. PyTorch doesn&rsquo;t build wheels for PyPy.
CPython only.</p>
<h2 id="sorting-out-python-versions">Sorting Out Python Versions</h2>
<p>My system Python was 3.13. PyTorch had just released 3.13 wheels for <code>torch</code>,
but not yet for <code>torchaudio</code> — version mismatch. I used <code>pyenv</code> to install 3.12
instead:</p>
```bash
pyenv install 3.12.3
```
<p>Updated <code>pyproject.toml</code>:</p>
```toml
python = ">=3.12,<3.14"
```
<p>And re-pointed Poetry:</p>
```bash
poetry env use $(pyenv prefix 3.12.3)/bin/python
```
<p>Poetry ignored me the first time because 3.13 was still hardcoded. After
recreating the environment and verifying the version, I was ready.</p>
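<p>A quick way to do that verification is to ask the interpreter itself. The sketch below is only an illustration: <code>check_interpreter</code> is a hypothetical helper, and the supported set of <code>(3, 12)</code> is an assumption taken from the situation described in this post, not a statement about current PyTorch wheels.</p>
```python
import platform
import sys


def check_interpreter(implementation, version, supported=((3, 12),)):
    """Return a list of reasons this interpreter may not find PyTorch wheels."""
    problems = []
    # PyTorch publishes wheels for CPython only, so PyPy finds no versions at all
    if implementation != "CPython":
        problems.append(f"{implementation} is not CPython")
    # Assumption: only these (major, minor) versions have a full set of wheels
    if tuple(version[:2]) not in supported:
        problems.append(f"Python {version[0]}.{version[1]} is not in {supported}")
    return problems


problems = check_interpreter(platform.python_implementation(), sys.version_info)
print(problems or "interpreter looks fine for the PyTorch wheels")
```
<p>Running it through <code>poetry run python</code> checks the environment Poetry actually uses, rather than whatever <code>python</code> happens to be first on the path.</p>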
<h2 id="pep-668-the-externally-managed-false-alarm">PEP 668: the &ldquo;Externally Managed&rdquo; False Alarm</h2>
<p>Even inside the Poetry shell, Arch&rsquo;s patched Python refused the install with
the PEP 668 <code>externally-managed-environment</code> error, which points at
<code>--break-system-packages</code> as the override. This check is meant to protect system
Python, but it was firing inside a fully isolated Poetry environment, so it was
safe to ignore. I added the flag:</p>
```bash
poetry run pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu121 \
  --break-system-packages
```
<p>Worked perfectly.</p>
<h2 id="the-result">The Result</h2>
```bash
poetry run whisper output000.mp3 --model base --output_format json
```
<p>Transcribed 2.5 hours of audio, timestamped segments ready for chapter
generation. All local, GPU-accelerated, isolated from system Python, and
repeatable.</p>
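<p>Turning those segments into chapter candidates can be sketched as follows. Whisper&rsquo;s JSON output carries a <code>segments</code> list whose entries have <code>start</code> and <code>text</code> fields; everything else here is an assumption for illustration, in particular the rule of one marker per fixed interval and the helper names <code>format_timestamp</code> and <code>candidate_chapters</code>.</p>
```python
import json


def format_timestamp(seconds):
    """Format a second count as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def candidate_chapters(segments, interval_seconds=600):
    """Pick the first segment at or after each interval boundary."""
    chapters = []
    next_boundary = 0.0
    for segment in segments:
        if segment["start"] >= next_boundary:
            chapters.append((format_timestamp(segment["start"]), segment["text"].strip()))
            next_boundary = segment["start"] + interval_seconds
    return chapters


# Inline sample in the shape of Whisper's JSON output (made-up text)
sample = json.loads("""
{"segments": [
  {"start": 0.0, "end": 4.0, "text": " Welcome to the devlog."},
  {"start": 125.5, "end": 130.0, "text": " Still on the intro."},
  {"start": 640.2, "end": 645.0, "text": " Now the build system."},
  {"start": 3725.0, "end": 3730.0, "text": " Wrapping up."}
]}
""")

for stamp, text in candidate_chapters(sample["segments"]):
    print(stamp, text)
```
<p>The picked lines are only starting points; the titles still need a human (or another pass) to turn them into proper chapter names.</p>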
<hr>
<h2 id="in-summary">In Summary</h2>
<p>If you&rsquo;re on Arch and want Whisper with CUDA:</p>
<ol>
<li>Use <code>poetry</code> + <code>pyenv</code></li>
<li>Set Python to 3.12 (not 3.13)</li>
<li>Install torch with <code>--break-system-packages</code> and the <code>cu121</code> index</li>
<li>Whisper just works</li>
</ol>
<p>A bit of fiddling up front, but now it&rsquo;s a solid local tool — one less cloud
dependency to think about.</p>
]]></content:encoded></item><item><title>Automated Posting to BlueSky &amp; Reddit</title><link>https://icle.es/2025/07/01/automated-posting-to-bluesky-reddit/</link><pubDate>Tue, 01 Jul 2025 10:09:47 +0100</pubDate><guid>https://icle.es/2025/07/01/automated-posting-to-bluesky-reddit/</guid><description>&lt;p>I tend to be pretty impatient and when I&amp;rsquo;m doing something, I want to just
finish it off. Unfortunately, the world works better for me when I work to its
schedule.&lt;/p>
&lt;p>Every time I finish a video for &lt;a href="https://icle.es/endeavours/shri-codes.md">shri codes&lt;/a>,
while I am still in the zone, I want to post to all the places (YouTube, BlueSky
and Reddit). However, this is usually the worst time to share these if I want to
get some decent traffic and raise awareness.&lt;/p></description><content:encoded><![CDATA[<p>I tend to be pretty impatient and when I&rsquo;m doing something, I want to just
finish it off. Unfortunately, the world works better for me when I work to its
schedule.</p>
<p>Every time I finish a video for <a href="https://icle.es/endeavours/shri-codes.md">shri codes</a>,
while I am still in the zone, I want to post to all the places (YouTube, BlueSky
and Reddit). However, this is usually the worst time to share these if I want to
get some decent traffic and raise awareness.</p>
<p>I&rsquo;ve been remembering to post on the relevant days at reasonable times, but this
process is annoying at best, interrupts flow and takes up cognitive load.</p>
<p>I wanted to automate it. I&rsquo;ve got to say that automating these two seemingly
simple tasks was rife with unexpected complexity.</p>
<p>My first challenge was trying to get <code>rule_python</code> to work. In the end, I did
not succeed and gave up.</p>
<p>Getting <code>pylyzer</code> to work in neovim was also a challenge - another one that I
gave up on.</p>
<p>I briefly gave up on python altogether and went with go, and I made stellar
progress until I got to the bit about actually posting to <code>BlueSky</code> - ChatGPT
had (once again) lied to me (shame on me for not verifying its claims). The
library it wanted me to use was a hallucination, and did not exist. I then
realised that there was no real library for reddit integration either.</p>
<p>Back to python, and trusty <code>poetry</code> to see me through.</p>
<h2 id="scheduled-elements">Scheduled Elements</h2>
<p>There are four elements to getting scheduling to work:</p>
<h3 id="youtube-scheduling">YouTube Scheduling</h3>
<p>This part was the easiest. The platform is kind enough to provide an option to
schedule release of videos, and we&rsquo;ll use that!</p>
<h3 id="scheduled-publish-for-blog">Scheduled Publish for Blog</h3>
<p><code>hugo</code> supports this out of the box. The bigger challenge was how to get GitHub
Actions to regenerate the site when relevant. In the end, I identified the
window during the week when I want to be publishing.</p>
<p>10am - 4pm Mon - Fri seemed like a decent slot. GitHub Actions cron schedules
run in UTC, though, and do not follow summer time. I opted for 10am - 3pm, which
seemed the better option.</p>
<p>My GitHub action for publishing takes one minute to execute. If I run the action
every 30 minutes, for five hours a day, five days a week, over a four-week month:</p>
<p><code>2 * 5 * 5 * 4 = 200</code></p>
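<p>The same estimate as a small calculation, which makes it easy to try other cadences. <code>monthly_runner_minutes</code> is a hypothetical helper, and the four-week month and one-minute run time are the rough figures used above.</p>
```python
def monthly_runner_minutes(runs_per_hour, hours_per_day, days_per_week,
                           weeks_per_month=4, minutes_per_run=1):
    """Rough monthly GitHub Actions usage for one scheduled workflow."""
    runs = runs_per_hour * hours_per_day * days_per_week * weeks_per_month
    return runs * minutes_per_run


one_workflow = monthly_runner_minutes(runs_per_hour=2, hours_per_day=5, days_per_week=5)
print(one_workflow)      # every 30 minutes, one workflow
print(2 * one_workflow)  # the same cadence for two workflows
print(monthly_runner_minutes(runs_per_hour=60, hours_per_day=5, days_per_week=5))  # every minute
```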
<p><a href="https://github.com/drone-ah/wordsonsand/blob/main/.github/workflows/hugo.yaml">.github/workflows/hugo.yaml</a></p>
```yaml
on:
  schedule:
    - cron: "*/30 12-15 * * 1-5"
  # Runs on pushes targeting the default branch
  push:
    branches:
      - main
```
<p>I will need to run a second one for the despatches (below) as well, which would
mean around 400 minutes each month. While there are no limits for public repos,
it felt a little abusive to run it every minute.</p>
<p>Once this has been running safely for a while, I&rsquo;ll consider bumping the
cadence.</p>
<h4 id="cron-is-unreliable-on-github-actions">Cron is Unreliable on GitHub Actions</h4>
<p>After I got this all ready with the two workflows set up to run on GitHub
Actions, I waited, and waited, and nothing happened.</p>
<p><a href="https://docs.github.com/en/actions/reference/events-that-trigger-workflows#schedule">GA schedule doc</a>
states:</p>
<blockquote>
<p>The schedule event can be delayed during periods of high loads of GitHub
Actions workflow runs. High load times include the start of every hour. If the
load is sufficiently high enough, some queued jobs may be dropped. To decrease
the chance of delay, schedule your workflow to run at a different time of the
hour.</p></blockquote>
<p>The
<a href="https://upptime.js.org/blog/2021/01/22/github-actions-schedule-not-working/">upptime post about GitHub Actions schedule not working</a>
includes some suggested workarounds, namely:</p>
<ul>
<li><a href="https://ifttt.com/">IFTTT</a> - seems to be limited to a maximum of hourly</li>
<li><a href="https://cloud.google.com/scheduler/docs/">Google Cloud Scheduler</a> - could be
a good solution but a bit of a sledgehammer</li>
<li><a href="https://cronhub.io/">Cronhub</a> - starts at $19/mo</li>
</ul>
<p>I also discovered:</p>
<ul>
<li><a href="https://cron-job.org/">cron-job.org</a> - haven&rsquo;t tried this yet, but looks
viable</li>
</ul>
<p>I was going to try out cron-job.org when ChatGPT suggested a simpler
alternative - a simple workflow that only triggered the relevant workflows.</p>
<p>According to ChatGPT, the more complex a workflow, the more likely it is to be
dropped. It makes sense, of course, and while I wasn&rsquo;t fully convinced, I
decided to
<a href="https://github.com/drone-ah/wordsonsand/blob/main/.github/workflows/cron.yaml">give it a go</a>.</p>
<p>It&rsquo;s only been 10 minutes, but it has completed one run already - which is
promising, but the original run also ran once.</p>
<p>I&rsquo;ll have to keep an eye on the reliability of this.</p>
<h4 id="switched-to-cron-joborg">Switched to <code>cron-job.org</code></h4>
<p>While the above strategy was OK, I wanted something more reliable, so I switched
to <a href="https://console.cron-job.org">cron-job.org</a></p>
<p>I created a new access token, restricted to the repo and with two additional
permissions:</p>
<ul>
<li>actions: read &amp; write</li>
<li>contents: read (to read the workflow file, ChatGPT suggests)</li>
</ul>
<p>I then set up an HTTP call to:</p>
<p><code>https://api.github.com/repos/&lt;gh-username&gt;/&lt;repo-name&gt;/actions/workflows/&lt;workflow-filename&gt;/dispatches</code></p>
<ul>
<li><code>&lt;gh-username&gt;</code>: use your github username from the url</li>
<li><code>&lt;repo-name&gt;</code>: name of your repo, again from the url</li>
<li><code>&lt;workflow-filename&gt;</code>: The filename of the workflow you want to trigger</li>
</ul>
<p>To trigger my hugo run, I used:
<code>https://api.github.com/repos/drone-ah/wordsonsand/actions/workflows/hugo.yaml/dispatches</code></p>
<p>Under advanced, I set the following Headers:</p>
<ul>
<li><code>Accept</code>: <code>application/vnd.github+json</code></li>
<li><code>Authorization</code>: <code>token &lt;personal-access-token&gt;</code></li>
<li><code>Content-Type</code>: <code>application/json</code></li>
<li><code>User-Agent</code>: <code>cronjob</code></li>
</ul>
<p>Set <code>Request method</code> to <code>POST</code></p>
<p><code>Request body</code>:</p>
```json
{
  "ref": "main"
}
```
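<p>For testing the same call outside cron-job.org, here is a minimal sketch of the request using only the Python standard library. <code>build_dispatch_request</code> is a hypothetical helper name; the URL, headers and body mirror the settings above.</p>
```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, token, ref="main"):
    """Build (but do not send) the POST that triggers a workflow run."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {token}",
        "Content-Type": "application/json",
        "User-Agent": "cronjob",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_dispatch_request(
    "drone-ah", "wordsonsand", "hugo.yaml", "<personal-access-token>"
)
print(request.get_method(), request.full_url)
# Sending needs a real token; GitHub answers 204 No Content on success:
# urllib.request.urlopen(request)
```
<p>Note that this endpoint only accepts the call if the target workflow declares a <code>workflow_dispatch</code> trigger.</p>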
<h3 id="bluesky">BlueSky</h3>
<p>Posting to BlueSky was far more complicated than I anticipated. All the
complexity was around its requirement to separate the post out into facets. I
recognise and value the semantic content such a process would output. However,
I could not find an algorithm or any details on how to extract the facets from
some text, e.g. markdown.</p>
<p>I referenced some code from a couple of sources for a stopgap solution to
address urls and hashtags.</p>
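<p>To give a flavour of what such a stopgap involves, here is a minimal sketch (not the code I referenced). A facet marks a UTF-8 byte range of the post text and attaches a feature such as a link or a tag. The regular expressions and the helper names <code>byte_span</code> and <code>extract_facets</code> are assumptions for illustration, and they will miss plenty of edge cases.</p>
```python
import re

URL_RE = re.compile(r"https?://\S+")
# A hashtag must start the text or follow whitespace, so URL fragments are skipped
TAG_RE = re.compile(r"(?<!\S)#(\w+)")


def byte_span(text, start, end):
    """Convert character offsets into UTF-8 byte offsets."""
    return len(text[:start].encode("utf-8")), len(text[:end].encode("utf-8"))


def extract_facets(text):
    """Return link and tag facets for the URLs and hashtags found in text."""
    facets = []
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;:!?)")  # drop trailing punctuation
        start, end = byte_span(text, match.start(), match.start() + len(url))
        facets.append({
            "index": {"byteStart": start, "byteEnd": end},
            "features": [{"$type": "app.bsky.richtext.facet#link", "uri": url}],
        })
    for match in TAG_RE.finditer(text):
        start, end = byte_span(text, match.start(), match.end())
        facets.append({
            "index": {"byteStart": start, "byteEnd": end},
            "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": match.group(1)}],
        })
    return facets


for facet in extract_facets("Héllo https://icle.es #python"):
    print(facet)
```
<p>The byte offsets are the part that is easy to get wrong: any non-ASCII character before a link shifts the range, which is why the sketch encodes the prefix instead of reusing the character positions.</p>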
<p>And then, I found <a href="https://github.com/dmoggles/blueskysocial">blueskysocial</a></p>
<h3 id="reddit">Reddit</h3>
<p>You first need to <a href="https://www.reddit.com/prefs/apps/">register an app</a> on
reddit, from a page I don&rsquo;t seem to be able to get from anywhere except a direct
link.</p>
<p>Once I registered a <code>personal script</code>, which lets any of the developers
registered on that client post, I tried to log in and was faced with:</p>
<p><code>prawcore.exceptions.OAuthException: invalid_grant error processing request</code></p>
<h4 id="red-herrings">Red Herrings</h4>
<p>I tried directly with curl:</p>
```bash
curl -u "$CLIENT_ID:$CLIENT_SECRET" \
  -d "grant_type=password&username=$USERNAME&password=$PASSWORD" \
  -A "$APP_NAME" \
  https://www.reddit.com/api/v1/access_token
```
<p>and I got a similar error:</p>
<p><code>{&quot;error&quot;: &quot;invalid_grant&quot;}</code></p>
<p>After stumbling around for a while, verifying and re-verifying the credentials,
I also set up a brand new account using password auth (mine was originally
oauth). It also returned the same error.</p>
<p>Some resources that I followed:</p>
<ul>
<li><a href="https://www.reddit.com/r/redditdev">redditdev</a></li>
<li><a href="https://github.com/reddit/reddit/wiki/OAuth2-Quick-Start-Example">OAuth2 Quick Start Example</a></li>
</ul>
<p>While I was looking around, I noticed in tiny little letters on the page to
<a href="https://www.reddit.com/prefs/apps/">register an app</a>, when you create a new
app:</p>
<blockquote>
<p>By creating an app, you agree to Reddit&rsquo;s Developer Terms and Data Api Terms.
<strong>You must also
<a href="https://www.reddit.com/r/reddit.com/wiki/api/#wiki_read_the_full_api_terms_and_sign_up_for_usage">register to use the API</a>.</strong></p></blockquote>
<p>(Emphasis mine)</p>
<p>I followed the instructions on that page, which felt more like red tape, but
easy enough for an app that is only intended to post on a schedule.</p>
<p>Alas, this too did not help!</p>
<h4 id="final-solution">Final Solution</h4>
<p>Perhaps not surprisingly, the final solution was to not use the password, but
get a refresh token instead.</p>
<p>You can do this manually on the browser. Start by going to the following URL:</p>
<p><code>https://www.reddit.com/api/v1/authorize?client_id=YOUR_CLIENT_ID&amp;response_type=code&amp;state=xyz&amp;redirect_uri=http://localhost&amp;duration=permanent&amp;scope=identity,submit,read</code></p>
<ul>
<li><code>YOUR_CLIENT_ID</code>: Replace this with the client id from your reddit app</li>
<li><code>redirect_uri</code>: This value (<code>http://localhost</code> in the example) has to
match the <code>redirect_uri</code> setting in your app</li>
<li><code>scope</code>: Update to the scopes you are looking for.
<a href="https://www.reddit.com/api/v1/scopes">/api/v1/scopes</a> will return the list of
valid scopes and their descriptions.</li>
<li><code>state</code>: can be any value. Reddit echoes it back in the redirect in the next
step, where it should match what you sent</li>
</ul>
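<p>If you would rather not assemble that URL by hand, it can be built with the standard library, which also takes care of the escaping. This is a sketch: <code>build_authorize_url</code> is a hypothetical helper and the client id shown is a placeholder.</p>
```python
import secrets
from urllib.parse import urlencode

AUTHORIZE_ENDPOINT = "https://www.reddit.com/api/v1/authorize"


def build_authorize_url(client_id, redirect_uri, scopes, state):
    """Build the URL to open in the browser to approve the app."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "state": state,
        "redirect_uri": redirect_uri,  # must match the app settings exactly
        "duration": "permanent",       # permanent is what yields a refresh token
        "scope": ",".join(scopes),
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


state = secrets.token_urlsafe(16)  # random value to compare against the redirect
print(build_authorize_url("YOUR_CLIENT_ID", "http://localhost",
                          ["identity", "submit", "read"], state))
```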
<p>The browser will then ask your permission (of the scopes you defined). If you
approve, the browser will redirect to localhost (or whatever url you define for
the redirect above).</p>
<p>This redirect will likely fail, but that&rsquo;s ok. There is one parameter in the URL
that you are looking for - <code>code</code></p>
<p>In my case, I got something like:</p>
<p><code>http://localhost:8080/?state=xyz&amp;code=RilF7XDhRTr7o7B-iov2gpdDgum5pA#_</code></p>
<p>(don&rsquo;t worry - that code isn&rsquo;t the actual one)</p>
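<p>Pulling the <code>code</code> out of that URL can also be scripted, which avoids copying the trailing <code>#_</code> by accident. A minimal sketch, where <code>extract_code</code> is a hypothetical helper:</p>
```python
from urllib.parse import parse_qs, urlsplit


def extract_code(redirect_url, expected_state):
    """Return the authorization code from the URL the browser was sent to."""
    parts = urlsplit(redirect_url)  # "#_" ends up in parts.fragment, not the query
    query = parse_qs(parts.query)
    if "error" in query:
        raise ValueError(f"authorisation failed: {query['error'][0]}")
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state does not match the value sent in the first step")
    return query["code"][0]


print(extract_code(
    "http://localhost:8080/?state=xyz&code=RilF7XDhRTr7o7B-iov2gpdDgum5pA#_", "xyz"
))
```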
<p>You want to take the code, but without the <code>#_</code> at the end and substitute it in
the following:</p>
```bash
curl -X POST -A "despatcher" --user "$CLIENT_ID:$CLIENT_SECRET" \
  --data "grant_type=authorization_code&code=$CODE&redirect_uri=http://localhost:8080" \
  https://www.reddit.com/api/v1/access_token
```
<ul>
<li><code>CLIENT_ID</code>: The app id from your app settings page (again)</li>
<li><code>CLIENT_SECRET</code>: The secret from your app settings page</li>
<li><code>CODE</code>: The code that was in the URL above</li>
<li><code>redirect_uri</code>: exactly the same <code>redirect_uri</code> as above, and in the app
settings</li>
</ul>
```json
{
  "access_token": "<access-token>",
  "token_type": "bearer",
  "expires_in": 86400,
  "refresh_token": "<refresh_token>",
  "scope": "read submit identity"
}
```
<ul>
<li><code>access_token</code>: You can use this to auth, but not so useful for long term use
as it will expire</li>
<li><code>refresh_token</code>: more useful as it can be used to get a new access token. Pass
to <code>praw</code></li>
</ul>
<p><a href="https://github.com/drone-ah/wordsonsand/tree/main/tools/despatcher/despatch.py">tools/despatcher/despatch.py</a></p>
```python
import os

import praw

# Credentials come from the environment, so nothing secret lives in the repo
client_id = os.environ.get("APP_REDDIT_CLIENT_ID")
client_secret = os.environ.get("APP_REDDIT_CLIENT_SECRET")
refresh_token = os.environ.get("APP_REDDIT_REFRESH_TOKEN")

reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    refresh_token=refresh_token,
    user_agent="despatcher",
)

print(reddit.user.me())
```
<h2 id="posting--tracking">Posting &amp; Tracking</h2>
<p>Once a post has been submitted, it is important that we log it somehow.
Otherwise, we&rsquo;ll end up posting it again (and again (and again)).</p>
<p>The cleanest solution I could think of was to update the markdown file, then
commit and push the change. This will also help to keep a log of it.</p>
<p><a href="https://github.com/drone-ah/wordsonsand/blob/main/.github/workflows/despatcher.yaml">.github/workflows/despatcher.yaml</a></p>
```yaml
- name: Commit and push if changed
  run: |
    git config user.name "drone-ah bot"
    git config user.email "github.actions@drone-ah.com"

    if ! git diff --quiet; then
      git add -u
      git commit -m "auto: log post submissions"
      git push
    else
      echo "No changes to commit"
    fi
```
<h2 id="partial-successes">Partial Successes</h2>
<p>Now, I thought I&rsquo;d covered the worst offenders for risk of repeated posting, but
I&rsquo;d missed one case.</p>
<p>What happens when something gets posted, then the script errors?</p>
<p>Well, the git commit won&rsquo;t happen - and sadly this happened to me. My apologies
to the nice folks at <a href="https://www.reddit.com/r/selfhosted/">r/selfhosted</a> who
got a handful of my posts about automated posting - eek :(</p>
<p>Embarrassment aside, it identified at least one fix - probably two. Extra
embarrassing because something like this has happened to me before - many years
ago - but you live!</p>
<p>The first update is to get GitHub Actions to carry on even if there is an error:</p>
<p><a href="https://github.com/drone-ah/wordsonsand/blob/main/.github/workflows/despatcher.yaml">.github/workflows/despatcher.yaml</a></p>
```yaml
- name: Run despatcher script
  working-directory: tools/despatcher
  continue-on-error: true
  run: poetry run ./despatch.py ../../despatches/
```
<p>The second fix is to catch any errors from the dispatchers.</p>
<p><a href="https://github.com/drone-ah/wordsonsand/tree/main/tools/despatcher/despatch.py">tools/despatcher/despatch.py</a></p>
```python
try:
    ptype = p.get("type")
    if ptype == "bluesky":
        url = post_bluesky(p)

    if ptype == "reddit":
        url = post_reddit(p)
except Exception as e:
    print(f"[ERROR] Failed to post to {ptype} for {path}: {e}")
    continue  # Skip to the next file
```
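<p>To show why this matters, here is a self-contained sketch of the same pattern with stand-in posters. The names <code>despatch_all</code>, <code>fake_bluesky</code> and <code>fake_reddit</code> are hypothetical and this is not the real <code>despatch.py</code>; the point is only that a failure in one post no longer stops the successes from being recorded.</p>
```python
def despatch_all(posts, posters):
    """Post each item; keep the successes even when another post fails."""
    posted, failed = {}, {}
    for path, post in posts.items():
        ptype = post.get("type")
        try:
            posted[path] = posters[ptype](post)
        except Exception as e:
            print(f"[ERROR] Failed to post to {ptype} for {path}: {e}")
            failed[path] = str(e)
            continue  # Skip to the next file
    return posted, failed


def fake_bluesky(post):
    # Stand-in for a real poster: pretends the post succeeded
    return f"https://bsky.example/{post['title']}"


def fake_reddit(post):
    # Stand-in for a real poster: always fails
    raise RuntimeError("invalid_grant")


posts = {
    "a.md": {"type": "bluesky", "title": "hello"},
    "b.md": {"type": "reddit", "title": "hello"},
}
posted, failed = despatch_all(posts, {"bluesky": fake_bluesky, "reddit": fake_reddit})
print(posted)
print(failed)
```
<p>Whatever ends up in <code>posted</code> is what needs writing back to the markdown files before the commit step runs, so a later failure cannot cause a repeat post.</p>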
<h2 id="wrap-up">Wrap Up</h2>
<p>In the end, what I thought was a two hour job took me two days, but such is the
life of a software engineer (probably everyone).</p>
<p>I am looking forward to seeing how it works, and a little scared that it&rsquo;ll go
off and do random things in my name - but we&rsquo;ll see.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://icle.es/tools/despatcher/">Code Repo</a></li>
</ul>
<h2 id="updates">Updates</h2>
<ul>
<li>2025-07-08: Switch to <code>cron-job.org</code></li>
<li>2025-07-02: Add note about GA cron unreliability</li>
<li>2025-07-02: Add details of handling partial success</li>
</ul>
]]></content:encoded></item></channel></rss>