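- The Linux split command breaks a large file into smaller pieces, and the join command merges the lines of two files that share a common field. Both are very helpful when you are manipulating large files. This article explains how to use the Linux split and join commands with descriptive examples.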
- Join and split command syntax:
- join [OPTION]… FILE1 FILE2
- split [OPTION]… [INPUT [PREFIX]]
- Linux Split Command Examples
- 1. Basic Split Example
- Here is a basic example of the split command.
- $ split split.zip
- $ ls
- split.zip xab xad xaf xah xaj xal xan xap xar xat xav xax xaz xbb xbd xbf xbh xbj xbl xbn
- xaa xac xae xag xai xak xam xao xaq xas xau xaw xay xba xbc xbe xbg xbi xbk xbm xbo
- So we see that the file split.zip was split into smaller files named x**, where ** is a two-character suffix (aa, ab, ac, ...) added by default. Also, by default each x** file contains 1000 lines.
- $ wc -l *
- 40947 split.zip
- 1000 xaa
- 1000 xab
- 1000 xac
- 1000 xad
- 1000 xae
- 1000 xaf
- 1000 xag
- 1000 xah
- 1000 xai
- ...
- ...
- ...
- So the output above confirms that by default each x** file contains 1000 lines.
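- To try the default behavior without a large zip file, here is a minimal sketch, assuming GNU coreutils and a POSIX shell; the input name sample.txt is hypothetical. It generates a 2500-line file, splits it with default settings, prints the line counts, and uses cmp to confirm that concatenating the pieces in name order rebuilds the original.

```shell
#!/bin/sh
# Work in a throwaway directory so the x** pieces do not clutter anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical input: 2500 numbered lines.
seq 1 2500 > sample.txt

# Default split: 1000 lines per piece, output names xaa, xab, ...
split sample.txt
wc -l x*

# The suffixes sort in creation order, so cat rebuilds the original file.
cat x* | cmp - sample.txt && echo "reassembled file is identical"
```

- With 2500 lines, the pieces should hold 1000, 1000 and 500 lines.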
- 2. Change the Suffix Length using -a option
- As discussed in example 1 above, the default suffix length is 2, but it can be changed using the -a option.
- The following example uses a suffix length of 5 for the split files.
- $ split -a5 split.zip
- $ ls
- split.zip xaaaac xaaaaf xaaaai xaaaal xaaaao xaaaar xaaaau xaaaax xaaaba xaaabd xaaabg xaaabj xaaabm
- xaaaaa xaaaad xaaaag xaaaaj xaaaam xaaaap xaaaas xaaaav xaaaay xaaabb xaaabe xaaabh xaaabk xaaabn
- xaaaab xaaaae xaaaah xaaaak xaaaan xaaaaq xaaaat xaaaaw xaaaaz xaaabc xaaabf xaaabi xaaabl xaaabo
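- Each extra suffix character multiplies the number of available names by 26: a fixed length of 2 allows 26^2 = 676 names, and a length of 3 allows 26^3 = 17576. The sketch below (assuming GNU coreutils; sample.txt is a hypothetical input) applies -a3 to a 2500-line file, which should produce xaaa, xaab and xaac.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
seq 1 2500 > sample.txt     # hypothetical 2500-line input

# Suffix length 3: three pieces of at most 1000 lines, named xaaa, xaab, xaac.
split -a3 sample.txt
ls x*
```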
- Note: Earlier we also discussed other file manipulation utilities: tac, rev, and paste.
- 3. Customize Split File Size using -b option
- The size of each output split file can be controlled using the -b option.
- In this example, the split files are created with a size of 200000 bytes each.
- $ split -b200000 split.zip
- $ ls -lart
- total 21084
- drwxrwxr-x 3 himanshu himanshu 4096 Sep 26 21:20 ..
- -rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xad
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xac
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xab
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xaa
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xah
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xag
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xaf
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xae
- -rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xar
- ...
- ...
- ...
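- The following sketch, assuming GNU coreutils (file names are hypothetical), splits a 500000-byte file of zero bytes into 200000-byte pieces, so the last piece holds the remaining 100000 bytes. It also uses the K unit suffix, which GNU split interprets as 1024 bytes: 100K pieces give four full 102400-byte files plus a final 90400-byte file (500000 - 4 × 102400).

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical 500000-byte binary input made of zero bytes.
head -c 500000 /dev/zero > blob.bin

# Fixed-size pieces of 200000 bytes each (the last one is shorter).
split -b200000 blob.bin
wc -c x*

# Unit suffix: 100K means 100 * 1024 = 102400 bytes; "kib_" is a hypothetical prefix.
split -b 100K blob.bin kib_
wc -c kib_*
```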
- 4. Create Split Files with Numeric Suffix using -d option
- As seen in the examples above, the output files are named x**, where ** are letters. You can change this to numbers using the -d option.
- In the following example, the split files have numeric suffixes.
- $ split -d split.zip
- $ ls
- split.zip x01 x03 x05 x07 x09 x11 x13 x15 x17 x19 x21 x23 x25 x27 x29 x31 x33 x35 x37 x39
- x00 x02 x04 x06 x08 x10 x12 x14 x16 x18 x20 x22 x24 x26 x28 x30 x32 x34 x36 x38 x40
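- The synopsis split [OPTION]… [INPUT [PREFIX]] also accepts an output prefix in place of the default x. A minimal sketch, assuming GNU coreutils (sample.txt and part_ are hypothetical names), combines -d with a prefix: 2500 lines at the default 1000 lines per piece should yield part_00, part_01 and part_02.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
seq 1 2500 > sample.txt     # hypothetical 2500-line input

# -d gives numeric suffixes; the PREFIX operand "part_" replaces the default "x".
split -d sample.txt part_
ls part_*
```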
- 5. Customize the Number of Split Chunks using -n option
- To control the number of chunks, use the -n option.
- This example will create 50 chunks of split files.
- $ split -n50 split.zip
- $ ls
- split.zip xac xaf xai xal xao xar xau xax xba xbd xbg xbj xbm xbp xbs xbv
- xaa xad xag xaj xam xap xas xav xay xbb xbe xbh xbk xbn xbq xbt xbw
- xab xae xah xak xan xaq xat xaw xaz xbc xbf xbi xbl xbo xbr xbu xbx
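- With -n N, the file size is divided into N chunks. In this sketch, assuming GNU coreutils (file names hypothetical), a 1000-byte file split with -n4 should give four 250-byte chunks. Byte-based chunking can cut a line in half, so for text the sketch also uses GNU split's l/N form, which keeps whole lines together and lets the chunk sizes vary.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical 1000-byte input: four chunks of 1000 / 4 = 250 bytes each.
head -c 1000 /dev/zero > data.bin
split -n4 data.bin
wc -c x*

# l/4: four output files, each ending on a line boundary.
seq 1 100 > lines.txt
split -n l/4 lines.txt lines_
wc -l lines_*
```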
- 6. Avoid Zero Sized Chunks using -e option
- When splitting a relatively small file into a large number of chunks, it's good to avoid zero-sized chunks, as they do not add any value. This can be done using the -e option.
- Here is an example:
- $ split -n50 testfile
- $ ls -lart x*
- -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xag
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaf
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xae
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xad
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xac
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xab
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaa
- -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbx
- -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbw
- -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbv
- ...
- ...
- ...
- So we see that lots of zero-sized chunks were produced in the above output. Now, let's use the -e option and see the results:
- $ split -n50 -e testfile
- $ ls
- split.zip testfile xaa xab xac xad xae xaf
- $ ls -lart x*
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaf
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xae
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xad
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xac
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xab
- -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaa
- So we see that no zero sized chunk was produced in the above output.
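- To see the effect of -e in isolation, the sketch below (assuming GNU coreutils; tiny.txt and the two directory names are hypothetical) splits a 3-byte file into 5 chunks twice. Since there are fewer bytes than chunks, some chunks must be empty: the plain run should leave five files, while the -e run should keep only the three non-empty ones.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'abc' > tiny.txt     # hypothetical 3-byte input

# Without -e: all 5 chunks are written, including the empty ones.
mkdir plain && (cd plain && split -n5 ../tiny.txt)

# With -e: empty chunks are not written at all.
mkdir elided && (cd elided && split -n5 -e ../tiny.txt)

echo "without -e: $(ls plain | wc -l) files"
echo "with -e: $(ls elided | wc -l) files"
```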
- 7. Customize Number of Lines using -l option
- The number of lines per output split file can be customized using the -l option.
- As seen in the example below, the split files are created with 20000 lines each, and the last one holds the remaining lines.
- $ split -l20000 split.zip
- $ ls
- split.zip testfile xaa xab xac
- $ wc -l x*
- 20000 xaa
- 20000 xab
- 947 xac
- 40947 total
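- The number of pieces follows from ceiling division: with L lines and N lines per piece, split writes ceil(L / N) files. For the example above, ceil(40947 / 20000) = 3, and the last piece holds 40947 - 2 × 20000 = 947 lines, matching the wc output. The sketch below (GNU coreutils and a POSIX shell assumed; sample.txt is hypothetical) computes the same prediction with shell integer arithmetic for a 45-line file and compares it with the actual number of pieces.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
seq 1 45 > sample.txt       # hypothetical 45-line input
lines_per_piece=20

# Ceiling division with integers: (L + N - 1) / N.
total=$(wc -l < sample.txt)
expected=$(( (total + lines_per_piece - 1) / lines_per_piece ))

split -l "$lines_per_piece" sample.txt
actual=$(ls x* | wc -l)
echo "expected $expected pieces, got $actual"
```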
- Get Detailed Information using --verbose option
- To get a diagnostic message each time a new split file is opened, use the --verbose option as shown below.
- $ split -l20000 --verbose split.zip
- creating file `xaa'
- creating file `xab'
- creating file `xac'
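- Because --verbose prints one line per output file, it can also serve as a simple log. In this sketch (GNU coreutils assumed; sample.txt and log.txt are hypothetical), splitting 5 lines two at a time should create three files and therefore three "creating file" messages. The quoting around file names varies between coreutils versions (the output above uses `xaa'), so the sketch counts the messages rather than parsing the names.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C             # keep the messages in English
seq 1 5 > sample.txt        # hypothetical 5-line input

# Two lines per piece: 5 lines produce 3 pieces, each announced once.
split -l2 --verbose sample.txt > log.txt 2>&1
cat log.txt
grep -c 'creating file' log.txt
```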
- Linux Join Command Examples
- 8. Basic Join Example
- By default, join matches the lines of the two input files on their first field, and prints the common field followed by the remaining fields of each file.
- Here is an example:
- $ cat testfile1
- 1 India
- 2 US
- 3 Ireland
- 4 UK
- 5 Canada
- $ cat testfile2
- 1 NewDelhi
- 2 Washington
- 3 Dublin
- 4 London
- 5 Toronto
- $ join testfile1 testfile2
- 1 India NewDelhi
- 2 US Washington
- 3 Ireland Dublin
- 4 UK London
- 5 Canada Toronto
- So we see that a file containing countries was joined with another file containing capitals on the basis of the first field.
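- A minimal reproduction, assuming GNU coreutils and a POSIX shell, recreates the two files with printf in a throwaway directory. One extra line, 6 Japan, is added to testfile1 (hypothetical data) to show that a line whose key has no match in the other file is left out of the default output.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # byte-wise collation, so the numeric keys 1..6 are in order

printf '%s\n' '1 India' '2 US' '3 Ireland' '4 UK' '5 Canada' '6 Japan' > testfile1
printf '%s\n' '1 NewDelhi' '2 Washington' '3 Dublin' '4 London' '5 Toronto' > testfile2

# Pair lines on field 1: key, then the rest of file 1, then the rest of file 2.
join testfile1 testfile2 > joined.txt
cat joined.txt
```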
- 9. Join Works on Sorted Input
- Both files supplied to join must be sorted on the join field. If either of them is not, join prints a warning and the out-of-order entries may not be joined.
- In this example, testfile1 is not sorted (5 Canada comes before 4 UK), so join displays a warning and 4 UK is not joined.
- $ cat testfile1
- 1 India
- 2 US
- 3 Ireland
- 5 Canada
- 4 UK
- $ cat testfile2
- 1 NewDelhi
- 2 Washington
- 3 Dublin
- 4 London
- 5 Toronto
- $ join testfile1 testfile2
- 1 India NewDelhi
- 2 US Washington
- 3 Ireland Dublin
- join: testfile1:5: is not sorted: 4 UK
- 5 Canada Toronto
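- The usual remedy is to sort both inputs on the join field before joining. The sketch below (GNU coreutils assumed; sorted1, sorted2 and joined.txt are hypothetical names) sorts with sort -k 1b,1, meaning "key is field 1, ignoring leading blanks", and sets LC_ALL=C so that sort and join compare keys the same way. All five countries should then be paired, including 4 UK.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # same collation for sort and join

# testfile1 is out of order (5 before 4), as in the example above.
printf '%s\n' '1 India' '2 US' '3 Ireland' '5 Canada' '4 UK' > testfile1
printf '%s\n' '1 NewDelhi' '2 Washington' '3 Dublin' '4 London' '5 Toronto' > testfile2

# Sort each file on the join field (field 1) before joining.
sort -k 1b,1 testfile1 > sorted1
sort -k 1b,1 testfile2 > sorted2
join sorted1 sorted2 > joined.txt
cat joined.txt
```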
- 10. Ignore Case using -i option
- When comparing fields, differences in case can be ignored using the -i option, as shown below. Without -i, b and B do not match, so the b US line is missing from the first output.
- $ cat testfile1
- a India
- b US
- c Ireland
- d UK
- e Canada
- $ cat testfile2
- a NewDelhi
- B Washington
- c Dublin
- d London
- e Toronto
- $ join testfile1 testfile2
- a India NewDelhi
- c Ireland Dublin
- d UK London
- e Canada Toronto
- $ join -i testfile1 testfile2
- a India NewDelhi
- b US Washington
- c Ireland Dublin
- d UK London
- e Canada Toronto
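- 11. Verify that Input is Sorted using --check-order option
- With --check-order, join stops with an error as soon as it finds an unsorted line. Here is an example. Since testfile1 is unsorted near the end (f Australia comes before e Canada), an error is produced in the output.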
- $ cat testfile1
- a India
- b US
- c Ireland
- d UK
- f Australia
- e Canada
- $ cat testfile2
- a NewDelhi
- b Washington
- c Dublin
- d London
- e Toronto
- $ join --check-order testfile1 testfile2
- a India NewDelhi
- b US Washington
- c Ireland Dublin
- d UK London
- join: testfile1:6: is not sorted: e Canada
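- 12. Do Not Check the Sort Order using --nocheck-order option
- This is the opposite of the previous example. No check of the sort order is done, so no error message is displayed. Note, however, that e Canada is still missing from the output: disabling the check does not make join handle unsorted input correctly.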
- $ join --nocheck-order testfile1 testfile2
- a India NewDelhi
- b US Washington
- c Ireland Dublin
- d UK London
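- In scripts, the exit status is what distinguishes the two options. The sketch below (GNU coreutils assumed; the output file names are hypothetical) reuses the unsorted testfile1 from example 11. --check-order should stop with a non-zero status and a "not sorted" diagnostic. --nocheck-order should exit with status 0 but print only four lines, because e Canada is skipped without any warning.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # stable collation and untranslated diagnostics

printf '%s\n' 'a India' 'b US' 'c Ireland' 'd UK' 'f Australia' 'e Canada' > testfile1
printf '%s\n' 'a NewDelhi' 'b Washington' 'c Dublin' 'd London' 'e Toronto' > testfile2

# --check-order: the disorder is fatal, so the exit status is non-zero.
if join --check-order testfile1 testfile2 > checked.txt 2> errors.txt; then
  check_status=0
else
  check_status=$?
fi

# --nocheck-order: no diagnostic, but the out-of-order line is silently lost.
join --nocheck-order testfile1 testfile2 > unchecked.txt
nocheck_status=$?

echo "--check-order exit status: $check_status"
echo "--nocheck-order exit status: $nocheck_status"
cat errors.txt
```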
- 13. Print Unpairable Lines using -a option
- If the lines of the two input files cannot all be paired, the -a FILENUM option (for example, -a1) also prints the lines from file FILENUM that could not be paired. FILENUM is the file number (1 or 2).
- In the following example, using -a1 produces the last line of testfile1, f Australia, which has no pair in testfile2.
- $ cat testfile1
- a India
- b US
- c Ireland
- d UK
- e Canada
- f Australia
- $ cat testfile2
- a NewDelhi
- b Washington
- c Dublin
- d London
- e Toronto
- $ join testfile1 testfile2
- a India NewDelhi
- b US Washington
- c Ireland Dublin
- d UK London
- e Canada Toronto
- $ join -a1 testfile1 testfile2
- a India NewDelhi
- b US Washington
- c Ireland Dublin
- d UK London
- e Canada Toronto
- f Australia
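- The -a option can be given for both files at once. In this sketch (GNU coreutils assumed; the data, including g Wellington, is hypothetical), each file has one key missing from the other, and -a1 -a2 keeps both unpairable lines, which amounts to a full outer join.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C

# Key "f" exists only in file 1, key "g" only in file 2.
printf '%s\n' 'a India' 'b US' 'f Australia' > testfile1
printf '%s\n' 'a NewDelhi' 'b Washington' 'g Wellington' > testfile2

# -a1 -a2 prints paired lines plus the unpairable lines of both files.
join -a1 -a2 testfile1 testfile2 > outer.txt
cat outer.txt
```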
- 14. Print Only Unpaired Lines using -v option
- In the above example, both paired and unpaired lines were produced in the output. If only the unpaired lines are desired, use the -v FILENUM option as shown below.
- $ join -v1 testfile1 testfile2
- f Australia
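- The same option works for the second file with -v2. The sketch below (GNU coreutils assumed; data and file names are hypothetical) lists the unpairable lines of each file separately, which is a quick way to find keys missing from one side.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C

printf '%s\n' 'a India' 'b US' 'f Australia' > testfile1
printf '%s\n' 'a NewDelhi' 'b Washington' 'g Wellington' > testfile2

# -v1: lines of file 1 without a partner; -v2: the same for file 2.
join -v1 testfile1 testfile2 > only1.txt
join -v2 testfile1 testfile2 > only2.txt
cat only1.txt only2.txt
```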
- 15. Join Based on Different Columns from Both Files using -1 and -2 options
- By default, the first column of each file is used for comparison before joining. You can change this with -1 FIELD (the join field of file 1) and -2 FIELD (the join field of file 2).
- In the following example, the first column of testfile1 is compared with the second column of testfile2 to produce the join output.
- $ cat testfile1
- a India
- b US
- c Ireland
- d UK
- e Canada
- $ cat testfile2
- NewDelhi a
- Washington b
- Dublin c
- London d
- Toronto e
- $ join -1 1 -2 2 testfile1 testfile2
- a India NewDelhi
- b US Washington
- c Ireland Dublin
- d UK London
- e Canada Toronto
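- Each input must be sorted on its own join field, so when -2 2 is used, testfile2 has to be ordered by its second column. This sketch (GNU coreutils assumed; sorted2 and joined.txt are hypothetical names) starts from a deliberately shuffled testfile2 and sorts it with sort -k 2b,2 first. As in the output above, join prints the join field first, then the remaining fields of file 1, then those of file 2.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C

printf '%s\n' 'a India' 'b US' 'c Ireland' 'd UK' 'e Canada' > testfile1
# Hypothetical shuffled order; the key is in field 2.
printf '%s\n' 'Toronto e' 'NewDelhi a' 'Dublin c' 'Washington b' 'London d' > testfile2

# Sort testfile2 on its join field (field 2), then join field 1 with field 2.
sort -k 2b,2 testfile2 > sorted2
join -1 1 -2 2 testfile1 sorted2 > joined.txt
cat joined.txt
```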