Parallelizing SFTP Downloads in Ruby - Nestor G Pestelos Jr (ngpestelos)

*When building the right thing costs the same as building the quick thing*

*Published February 8, 2026*

---

A vendor sync job had a 20GB file that took more than a day over single-connection SFTP. A CLI tool called lftp already does parallel downloads over SSH. But wrapping that in a proper Ruby gem with integrity verification, resume support, and automatic retry? Nobody builds a real library for a file transfer problem.

I built it in an afternoon with an AI coding assistant.

---

## The Problem

SFTP runs over a single SSH connection. Ruby's `net-sftp` gem is reliable, but for a 20GB file at 2MB/s, the best case is roughly 3 hours (20,000MB at 2MB/s is 10,000 seconds). Our connection had more bandwidth available. The protocol just wouldn't use it. On bad days the transfer would stall, time out, and start over from zero.

lftp, a command-line transfer program that's been around since 1996, already solves this:

```bash
lftp -c "
open sftp://user:pass@host
pget -n 8 /remote/large_file.zip -o /local/large_file.zip
"
```

`pget` splits the file into byte ranges and downloads the segments over 8 parallel SSH connections. 20GB goes from 1 day to less than 1 hour. It handles resume, retries, and timeouts. The hard part was already solved, just not as a Ruby library.

---

## From Script to Gem

In the past, that working lftp command would have stayed a shell script in `bin/`. It would work. It would be fragile. Nobody would touch it because nobody would want to understand the lftp flag combinations.

Here's what the gem actually constructs under the hood:

```
lftp -c "set net:timeout 60; set net:max-retries 15; set net:reconnect-interval-base 10; set sftp:connect-program 'ssh -o StrictHostKeyChecking=no'; open sftp://user:password@host; pget -n 8 -c /remote/large_file.zip -o /local/large_file.zip"
```

Timeout tuning, retry configuration, reconnection intervals, SSH options, segment count, resume flags. That's the kind of incantation that lives in a script nobody maintains.
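
To make the shape of that command concrete, here is a minimal sketch of how such a string could be assembled from options. The `LftpCommand` module, its method names, and its option keys are hypothetical and are not taken from the gem; only the lftp settings and `pget` flags come from the command shown above. The sketch does not escape special characters in credentials or paths, which a real implementation would need to handle.

```ruby
# Hypothetical helper (not the gem's API): builds the lftp script shown above.
module LftpCommand
  # Defaults mirror the values in the article's example command.
  DEFAULTS = {
    timeout: 60,            # set net:timeout
    max_retries: 15,        # set net:max-retries
    reconnect_interval: 10, # set net:reconnect-interval-base
    segments: 8,            # pget -n
    resume: true            # pget -c
  }.freeze

  # Returns the script string that is passed to `lftp -c`.
  # NOTE: no escaping is done here; odd characters in the password or
  # paths would break the command. This is a sketch, not production code.
  def self.script(host:, user:, password:, remote_path:, local_path:, **opts)
    o = DEFAULTS.merge(opts)

    pget = ['pget', '-n', o[:segments].to_s]
    pget << '-c' if o[:resume]
    pget += [remote_path, '-o', local_path]

    [
      "set net:timeout #{o[:timeout]}",
      "set net:max-retries #{o[:max_retries]}",
      "set net:reconnect-interval-base #{o[:reconnect_interval]}",
      # Copied from the article's command; it disables host key checking.
      "set sftp:connect-program 'ssh -o StrictHostKeyChecking=no'",
      "open sftp://#{user}:#{password}@#{host}",
      pget.join(' ')
    ].join('; ')
  end

  # Argument vector form, so the caller can skip the shell entirely.
  def self.argv(**kwargs)
    ['lftp', '-c', script(**kwargs)]
  end
end
```

Returning an argument array rather than one shell string means the result can be handed to something like `Open3.capture3` without a second layer of shell quoting, which is one less place for the incantation to go wrong.
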
When the next project needs parallel SFTP, someone writes another script from scratch.

I was using Claude Code for other parts of this codebase. On a whim, I described what I wanted: a gem that wraps lftp's pget, handles connection options, verifies zip integrity after download, and resumes interrupted transfers. The kind of spec that would normally sit in a backlog for months.

The first working version took an afternoon. Not because AI wrote perfect code on the first try. It didn't. But the iteration cycle collapsed. Describe a feature, get an implementation, test it against a real server, describe what's wrong, get a fix. The back-and-forth that normally stretches across days (reading man pages, figuring out which flags interact, handling the dozen ways a subprocess can fail) compressed into a conversation.

The expensive part of building a gem isn't any single function. It's the accumulated friction of dozens of small decisions.

So the production code ended up looking like this:

```ruby
ParallelSftp.download(
  host: ENV['SFTP_HOST'],
  user: ENV['SFTP_USER'],
  password: ENV['SFTP_PASSWORD'],
  remote_path: '/exports/daily_feed.zip',
  local_path: Rails.root.join('tmp', 'vendor_daily.zip').to_s
)
```

One method call. That lftp incantation, the subprocess management, the file verification, all behind a Ruby interface that anyone on the team can read.

---

## The Features You'd Normally Skip

A throwaway script downloads the file. A proper gem handles the things that go wrong.

**Zip integrity.** Parallel downloads can corrupt files in a subtle way. lftp reassembles segments, and occasionally the boundaries produce a bad file even though lftp reports success. The gem runs `unzip -t` after every download. If that check fails, the gem retries at the same segment count (corruption is often transient), then halves the segments (fewer boundaries, less risk), then falls back to a single connection, then gives up.
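
The fallback ladder described above can be sketched in a few lines. This is an illustration under stated assumptions, not the gem's implementation: `segment_ladder`, `zip_ok?`, and `download_with_fallback` are hypothetical names, the error class is a stand-in that borrows the `ZipIntegrityError` name, and the actual download is left to a block supplied by the caller.

```ruby
# Stand-in error class for this sketch (the name is borrowed from the article).
class ZipIntegrityError < StandardError; end

# Segment counts to try, in order: the first attempt, one retry at the same
# count, then half the segments, then a single connection.
def segment_ladder(segments)
  ladder = [segments, segments]
  half = segments / 2
  ladder << half if half > 1    # fewer boundaries, less risk
  ladder << 1 if segments > 1   # last resort: one connection
  ladder
end

# Runs `unzip -t` on the file and reports whether it passed.
# Assumes an `unzip` binary is on the PATH.
def zip_ok?(path)
  system('unzip', '-t', path, out: File::NULL, err: File::NULL) == true
end

# Yields each segment count to the block, which performs the download and
# returns the local path. Returns the first path that passes verification,
# or raises once the ladder is exhausted.
def download_with_fallback(segments: 8, verify: method(:zip_ok?))
  segment_ladder(segments).each do |n|
    path = yield(n)
    return path if verify.call(path)
  end
  raise ZipIntegrityError, "zip still corrupt after #{segment_ladder(segments).size} attempts"
end
```

With the default of 8 segments, the block is called with 8, 8, 4, and 1 before the error is raised. In this sketch the block is also the natural place to delete a corrupt file before the next attempt, since a resumed download would otherwise pick up the bad bytes; whether the gem does this is an assumption.
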
That degradation strategy is the kind of thing you'd sketch on a whiteboard and never implement in a script. Maybe two hours of work for an edge case that happens once a month. You'd skip it. In a gem where the total development cost was an afternoon, there was no reason not to build it right.

**Resume.** lftp tracks each segment's byte offset in a status file. Kill the process, restart it, and it picks up where it left off. The gem exposes this with `resume: true`. The one requirement: download paths must be deterministic so retries find the partial file instead of starting fresh. With that, your job's `retry_on` and lftp's resume work together automatically. A deploy mid-download doesn't mean starting over.

**Error specificity.** `ConnectionError`, `ZipIntegrityError`, `DownloadError` instead of a generic failure. Monitoring can route each alert appropriately without a catch-all rescue.

Every one of these features exists because the cost of building them dropped low enough that skipping them stopped making sense.

---

## Results

Before: a daily sync that started before lunch and maybe finished by end of day. Downloads that failed at 80% meant starting over. Corrupt zips discovered hours later when the import blew up. The whole team knew which day was "vendor sync day" because that was the day things broke.

After: the sync runs in under 40 minutes. Interrupted downloads resume. Corrupt files are caught and retried automatically. The SFTP step became boring, which is exactly what infrastructure should be.

There are cases where this doesn't make sense: files under 100MB, non-zip formats that can't be verified, servers that restrict concurrent connections. But for the problem that started this, it worked.

lftp's pget has existed for years. The integrity strategy is straightforward once you write it down. The insight was that an AI coding assistant collapsed the economics of "doing it right."
The gap between a disposable script and a proper library (tests, edge case handling, integrity verification, automatic retry) used to be days of work. Now it's an afternoon.

When building the right thing costs the same as building the quick thing, you build the right thing.

---

- **GitHub**: [parallel_sftp](https://github.com/ngpestelos/parallel_sftp)
- **lftp project**: [github.com/lavv17/lftp](https://github.com/lavv17/lftp)