[PARALLEL] Do you have a big task to execute and you are short on time? Split it into smaller ones and try GNU Parallel
GNU Parallel is available as a package for most common Linux distributions. It works much like xargs, but the tasks it reads from standard input are executed in the background and in parallel, to maximize the use of all your CPU cores/threads.
Sometimes there are tasks that simply take a lot of time when implemented with classic scripting, for example:
- you have a directory that contains a lot of subdirectories, i.e. /srv/data/images/01 /srv/data/images/02 .. /srv/data/images/99. Each subdir from 00 to 99 contains a lot of small files, and you want to archive each one into its own tarball, like 00.tar.gz, 01.tar.gz and so on. You could implement it with a for loop in bash, but you would notice that it takes too much time. With GNU Parallel you can do it faster with seq -w 00 99 | parallel tar -C /srv/data/images/ -czf /srv/data/{}.tar.gz {} and you will obtain /srv/data/00.tar.gz, /srv/data/01.tar.gz and so on.
- you have a big file on one server (let's say greater than 100GB) and you want to distribute it to ten servers. You could scp to each server one by one, or do it with parallel. Suppose you have a text file with the IP address of each server: ~]$ cat /tmp/server-list 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 ... and that your ssh key is already on each server so you can log in without a password. Then you can do it with cat /tmp/server-list | parallel --progress rsync -aq /data/bigfile user@{}:/data/bigfile rsync will then be spawned once per server, and you will copy the same file to multiple servers at the same time.
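Before launching the real copy, parallel's --dry-run flag prints the command it would execute for each input line, which is handy for verifying the {} substitution. This sketch uses the example server list and paths from above (no actual transfer happens):

```shell
# Build a sample server list, then preview the rsync command that
# would be spawned for each address; drop --dry-run to really copy.
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.3 > /tmp/server-list
cat /tmp/server-list | parallel --dry-run rsync -aq /data/bigfile user@{}:/data/bigfile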
If you liked it, please vote for the post and share the channel with your friends: http://t.me/linuxcheatsheet Bye! G.