No description
Find a file
2021-11-30 15:44:50 +01:00
classes Add option to ignore a directory on a vessel 2021-11-30 15:44:50 +01:00
.gitignore Add sqlite3 journal file to .gitignore 2021-11-26 07:06:40 +01:00
__main__.py Current code status 2021-11-20 15:40:07 +01:00
const.py Fixing Dog Handler usage 2021-11-25 10:40:25 +01:00
README.md Add option to ignore a directory on a vessel 2021-11-30 15:44:50 +01:00
requirements.txt Current state 2021-11-22 11:14:38 +01:00
settings.example.ini Fix value in default settings file, add README 2021-11-30 08:58:29 +01:00
worker.py Logic improvements 2021-11-25 16:31:49 +01:00

ContentMonster

ContentMonster is a Python package used to replicate the contents of directories on one server ("shore") to other servers ("vessels") using SFTP over unstable network connections. The files are split into smaller chunks which are transferred separately and reassembled on the server.

It comes with a daemon application (worker.py) which monitors the configured local directories for changes and instantly pushes them to the vessels. Once a file has been replicated to all vessels, it is moved to a "processed" subdirectory of its source directory and removed from the queue.

Prerequisites

ContentMonster is written in Python3 and makes use of syntactical features introduced in Python 3.8. It depends on two packages installable by pip, paramiko (for SSH/SFTP connections) and watchdog (to monitor local directories for changes).

It was tested on Ubuntu 21.04 and Debian 10, but I don't see a reason why it would not work on other Unixoids or even Windows (although it might need some changes to properly work on the latter) as all dependencies are platform-independent.

Vessels (destination servers) need to have an SSH server with SFTP support. This has been tested with a default OpenSSH server as well as a Dropbear server with OpenSSH's sftp-server. They also have to provide the cat command which is used to reassemble the uploaded chunks.

Installation

It is recommended that you use a virtual environment in order to maintain a clean Python environment independent from system updates and other Python projects on the same host. Note that you may have to install the venv package from your OS's package repositories first (on Debian-based distributions: apt install python3-venv).

In a terminal, navigate to the ContentMonster directory, then (assuming you are running bash) execute the following commands:

python3 -m venv venv  # Create a virtual environment in the "venv" subdirectory
. venv/bin/activate  # Activate the virtual environment (just in case)
pip install -Ur requirements.txt  # Install the package dependencies (paramiko/watchdog)

Configuration

The application is configured using the settings.ini file. Start off by copying the provided settings.example.ini to settings.ini and opening it in a text editor. Note that all keys and values are case-sensitive. Required keys are identified as such in the comments below, all other keys are optional. The file consists of (at least) three sections:

MONSTER

The MONSTER section contains a few global configuration options for the application:

[MONSTER]
ChunkSize = 10485760  # Size of individual chunks in bytes (default: 10 MiB)

Directory

You can configure as many directories to be replicated as you want by adding multiple Directory sections. The directories are replicated to the same location on the vessels that they are located at on the shore.

[Directory sampledir]  # Each directory needs a unique name - here: "sampledir"
Location = /home/user/replication  # Required: File system location of the directory

Note: Currently, the same Location value is used on both the shore and the vessels, although this may be configurable in a future version. The directory has to be writable by the configured users on all of the configured vessels. In the above example, files are taken from /home/user/replication on the shore and put into /home/user/replication on each of the vessels.

Vessel

You can configure as many vessels to replicate your files to as you want by adding multiple Vessel sections. All configured directories are replicated to all vessels by default, but you can use the IgnoreDirs directive to exclude a directory from a given vessel. If you want to use an SSH key to authenticate on the vessels, make sure that it is picked up by the local SSH agent (i.e. you can login using the key when connecting with the ssh command).

[Vessel samplevessel]  # Each vessel needs a unique name - here: "samplevessel"
Address = example.com  # Required: Hostname / IP address of the vessel
TempDir = /tmp/.ContentMonster  # Temporary directory for uploaded chunks (default: /tmp/.ContentMonster) - needs to be writable
Username = replication  # Username to authenticate as on the vessel (default: same as user running ContentMonster)
Password = verysecret  # Password to use to authenticate on the vessel (default: none, use SSH key)
Passphrase = moresecret  # Passphrase of the SSH key you use to authenticate (default: none, key has no passphrase)
Port = 22  # Port of the SSH server on the vessel (default: 22)
IgnoreDirs = sampledir, anotherdir  # Names of directories *not* to replicate to this vessel, separated by commas

Running

To run the application after creating the settings.ini, navigate to ContentMonster's base directory in a terminal and make sure you are in the right virtual environment:

. venv/bin/activate

Then, you can run the worker like this:

python worker.py

Keep an eye on the output for the first minute or so, to check for any issues during initialization.

systemd Service

You may want to run ContentMonster as a systemd service to make sure it starts automatically after a system reboot. Assuming that it is installed into /opt/ContentMonster/ following the instructions above and supposed to run as the replication user, something like this should work:

[Unit]
Description=ContentMonster
After=syslog.target network.target

[Service]
Type=simple
User=replication
WorkingDirectory=/opt/ContentMonster/
ExecStart=/opt/ContentMonster/venv/python /opt/ContentMonster/worker.py
Restart=on-abort

[Install]
WantedBy=multi-user.target

Write this to /etc/systemd/system/contentmonster.service, then enable the service like this:

systemctl daemon-reload
systemctl enable --now contentmonster
systemctl status contentmonster  # Check that the service started properly

The service should now start automatically after every reboot. You can use commands like systemctl status contentmonster and journalctl -xeu contentmonster to keep an eye on the status of the service.