v1ral_ITS

Split and Join terminal commands

Jun 10th, 2020
  1. The Linux split and join commands are very helpful when you are manipulating large files. This article explains how to use both commands with descriptive examples.
  2.  
  3. Join and split command syntax:
  4.  
  5.     join [OPTION]... FILE1 FILE2
  6.     split [OPTION]... [INPUT [PREFIX]]
  7.  
  8. Linux Split Command Examples
  9. 1. Basic Split Example
  10.  
  11. Here is a basic example of split command.
  12.  
  13. $ split split.zip
  14.  
  15. $ ls
  16. split.zip  xab  xad  xaf  xah  xaj  xal  xan  xap  xar  xat  xav  xax  xaz  xbb  xbd  xbf  xbh  xbj  xbl  xbn
  17. xaa        xac  xae  xag  xai  xak  xam  xao  xaq  xas  xau  xaw  xay  xba  xbc  xbe  xbg  xbi  xbk  xbm  xbo
  18.  
  19. The file split.zip was split into smaller files named x**, where ** is the two-character suffix that split adds by default. Also by default, each x** file contains up to 1000 lines.
  20.  
  21. $ wc -l *
  22.    40947 split.zip
  23.     1000 xaa
  24.     1000 xab
  25.     1000 xac
  26.     1000 xad
  27.     1000 xae
  28.     1000 xaf
  29.     1000 xag
  30.     1000 xah
  31.     1000 xai
  32. ...
  33. ...
  34. ...
  35.  
  36. So the output above confirms that by default each x** file contains 1000 lines.
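The default behavior is easy to reproduce on a file with a known line count. The following is a minimal sketch, assuming GNU coreutils; sample.txt and the temporary directory are illustrative names that do not appear in the original example.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.txt is a hypothetical 2500-line text file (one number per line).
seq 1 2500 > sample.txt

# With no options, split writes chunks of up to 1000 lines named xaa, xab, ...
split sample.txt

# Expect 1000, 1000 and 500 lines in xaa, xab and xac.
wc -l xa?
```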
  37. 2. Change the Suffix Length using -a option
  38.  
  39. As discussed in example 1 above, the default suffix length is 2. But this can be changed by using -a option.
  40.  
  41. The following example uses a suffix of length 5 for the split files.
  42.  
  43. $ split -a5 split.zip
  44. $ ls
  45. split.zip  xaaaac  xaaaaf  xaaaai  xaaaal  xaaaao  xaaaar  xaaaau  xaaaax  xaaaba  xaaabd  xaaabg  xaaabj  xaaabm
  46. xaaaaa     xaaaad  xaaaag  xaaaaj  xaaaam  xaaaap  xaaaas  xaaaav  xaaaay  xaaabb  xaaabe  xaaabh  xaaabk  xaaabn
  47. xaaaab     xaaaae  xaaaah  xaaaak  xaaaan  xaaaaq  xaaaat  xaaaaw  xaaaaz  xaaabc  xaaabf  xaaabi  xaaabl  xaaabo
  48.  
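As a small sketch of the same option on a text file, the commands below combine -a with the optional PREFIX argument from the syntax line. The names sample.txt and part_ are assumptions chosen for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2500 > sample.txt

# -a 5 asks for five-character suffixes; "part_" is the optional PREFIX
# argument, which replaces the default "x".
split -a 5 sample.txt part_

# Lists part_aaaaa, part_aaaab and part_aaaac.
ls part_*
```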
  49. Note: Earlier we also discussed other file manipulation utilities: tac, rev, paste.
  50. 3. Customize Split File Size using -b option
  51.  
  52. The size of each output file can be controlled using the -b option.
  53.  
  54. In this example, the split files were created with a size of 200000 bytes.
  55.  
  56. $ split -b200000 split.zip
  57.  
  58. $ ls -lart
  59. total 21084
  60. drwxrwxr-x 3 himanshu himanshu     4096 Sep 26 21:20 ..
  61. -rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
  62. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xad
  63. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xac
  64. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xab
  65. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaa
  66. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xah
  67. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xag
  68. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaf
  69. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xae
  70. -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xar
  71. ...
  72. ...
  73. ...
  74.  
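Byte-sized pieces are typically put back together with cat. The sketch below is one way to check that round trip; blob.bin, piece_ and rebuilt.bin are hypothetical names, and the random file merely stands in for split.zip.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# blob.bin is a hypothetical 500000-byte binary file.
head -c 500000 /dev/urandom > blob.bin

# Cut it into pieces of at most 200000 bytes: 200000 + 200000 + 100000.
split -b 200000 blob.bin piece_

# The shell expands piece_* in sorted order, so cat restores the original byte order.
cat piece_* > rebuilt.bin

# cmp prints nothing and exits with status 0 when the two files are identical.
cmp blob.bin rebuilt.bin && echo "rebuilt.bin matches blob.bin"
```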
  75. 4. Create Split Files with Numeric Suffix using -d option
  76.  
  77. As seen in the examples above, output file names have the format x**, where ** are letters. You can switch to a numeric suffix using the -d option.
  78.  
  79. Here is an example that produces numeric suffixes on the split files.
  80.  
  81. $ split -d split.zip
  82. $ ls
  83. split.zip  x01  x03  x05  x07  x09  x11  x13  x15  x17  x19  x21  x23  x25  x27  x29  x31  x33  x35  x37  x39
  84. x00        x02  x04  x06  x08  x10  x12  x14  x16  x18  x20  x22  x24  x26  x28  x30  x32  x34  x36  x38  x40
  85.  
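A short sketch of -d on a small text file, assuming GNU split; sample.txt is an illustrative name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2500 > sample.txt

# -d switches the suffix from letters to digits, starting at 00.
split -d sample.txt

# Lists x00, x01 and x02.
ls x0?
```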
  86. 5. Customize the Number of Split Chunks using -n option
  87.  
  88. To control the number of chunks, use the -n option.
  89.  
  90. This example will create 50 chunks of split files.
  91.  
  92. $ split -n50 split.zip
  93. $ ls
  94. split.zip  xac  xaf  xai  xal  xao  xar  xau  xax  xba  xbd  xbg  xbj  xbm  xbp  xbs  xbv
  95. xaa        xad  xag  xaj  xam  xap  xas  xav  xay  xbb  xbe  xbh  xbk  xbn  xbq  xbt  xbw
  96. xab        xae  xah  xak  xan  xaq  xat  xaw  xaz  xbc  xbf  xbi  xbl  xbo  xbr  xbu  xbx
  97.  
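With a plain number, GNU split's -n divides the input by byte count, so a chunk can end in the middle of a line. The sketch below contrasts that with the l/N form, which only breaks at line boundaries. The file and prefix names are assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2500 > sample.txt

# -n 4 divides the input by byte count into exactly 4 files; a line may be cut in two.
split -n 4 sample.txt bytes_

# -n l/4 also produces 4 files but only breaks at line boundaries.
split -n l/4 sample.txt lines_

ls bytes_* lines_*
```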
  98. 6. Avoid Zero Sized Chunks using -e option
  99.  
  100. When splitting a relatively small file into a large number of chunks, it's good to avoid zero-sized chunks, as they do not add any value. This can be done using the -e option.
  101.  
  102. Here is an example:
  103.  
  104. $ split -n50 testfile
  105.  
  106. $ ls -lart x*
  107. -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xag
  108. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaf
  109. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xae
  110. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xad
  111. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xac
  112. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xab
  113. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaa
  114. -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbx
  115. -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbw
  116. -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbv
  117. ...
  118. ...
  119. ...
  120.  
  121. So we see that lots of zero-sized chunks were produced in the above output. Now, let's use the -e option and see the results:
  122.  
  123. $ split -n50 -e testfile
  124. $ ls
  125. split.zip  testfile  xaa  xab  xac  xad  xae  xaf
  126.  
  127. $ ls -lart x*
  128. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaf
  129. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xae
  130. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xad
  131. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xac
  132. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xab
  133. -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaa
  134.  
  135. So we see that no zero sized chunk was produced in the above output.
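The effect can be counted directly. This is a minimal sketch assuming GNU split; tiny.txt and the two prefixes are hypothetical names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# tiny.txt is a hypothetical 6-byte file, far smaller than the requested chunk count.
printf 'abcdef' > tiny.txt

# Without -e, all 50 output files are created and most of them are empty.
split -n 50 tiny.txt all_

# With -e, files that would be empty are never created.
split -n 50 -e tiny.txt nonempty_

ls all_* | wc -l        # 50 files
ls nonempty_* | wc -l   # 6 files
```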
  136. 7. Customize Number of Lines using -l option
  137.  
  138. Number of lines per output split file can be customized using the -l option.
  139.  
  140. As seen in the example below, split files are created with at most 20000 lines each.
  141.  
  142. $ split -l20000 split.zip
  143.  
  144. $ ls
  145. split.zip  testfile  xaa  xab  xac
  146.  
  147. $ wc -l x*
  148.    20000 xaa
  149.    20000 xab
  150.      947 xac
  151.    40947 total
  152.  
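The same idea on a smaller scale, as a sketch with an illustrative 2500-line sample.txt:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2500 > sample.txt

# Each chunk holds at most 800 lines; the last one gets whatever is left over.
split -l 800 sample.txt

# Expect 800, 800, 800 and 100 lines in xaa through xad.
wc -l xa?
```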
  153. Get Detailed Information using --verbose option
  154.  
  155. To get a diagnostic message each time a new split file is opened, use the --verbose option as shown below.
  156.  
  157. $ split -l20000 --verbose split.zip
  158. creating file `xaa'
  159. creating file `xab'
  160. creating file `xac'
  161.  
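In a script it can be handy to keep these messages in a log file. The sketch below assumes GNU split; split.log and sample.txt are illustrative names, and the exact quoting of file names in the messages varies between coreutils versions.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2500 > sample.txt

# --verbose reports each output file as it is created; the messages are
# saved to split.log here so they can be inspected afterwards.
split --verbose -l 1000 sample.txt > split.log 2>&1

# One message per output file (xaa, xab, xac).
cat split.log
```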
  162. Linux Join Command Examples
  163. 8. Basic Join Example
  164.  
  165. By default, the join command matches lines from the two input files on their first field and prints one combined line for each matching pair.
  166.  
  167. Here is an example:
  168.  
  169. $ cat testfile1
  170. 1 India
  171. 2 US
  172. 3 Ireland
  173. 4 UK
  174. 5 Canada
  175.  
  176. $ cat testfile2
  177. 1 NewDelhi
  178. 2 Washington
  179. 3 Dublin
  180. 4 London
  181. 5 Toronto
  182.  
  183. $ join testfile1 testfile2
  184. 1 India NewDelhi
  185. 2 US Washington
  186. 3 Ireland Dublin
  187. 4 UK London
  188. 5 Canada Toronto
  189.  
  190. So we see that a file containing countries was joined with another file containing capitals on the basis of first field.
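A self-contained sketch of the same join, with the input files created inline; countries.txt, capitals.txt and joined.txt are hypothetical names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical files that share a key in field 1, both already sorted.
printf '%s\n' '1 India' '2 US' '3 Ireland' > countries.txt
printf '%s\n' '1 NewDelhi' '2 Washington' '3 Dublin' > capitals.txt

# Each output line is: key, remaining fields of file 1, remaining fields of file 2.
join countries.txt capitals.txt > joined.txt
cat joined.txt
```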
  191. 9. Join works on Sorted List
  192.  
  193. The join command expects both input files to be sorted on the join field. If either file is not sorted, join prints a warning and the out-of-order entry is not joined.
  194.  
  195. In this example, testfile1 is not sorted, so join displays a warning message.
  196.  
  197. $ cat testfile1
  198. 1 India
  199. 2 US
  200. 3 Ireland
  201. 5 Canada
  202. 4 UK
  203.  
  204. $ cat testfile2
  205. 1 NewDelhi
  206. 2 Washington
  207. 3 Dublin
  208. 4 London
  209. 5 Toronto
  210.  
  211. $ join testfile1 testfile2
  212. 1 India NewDelhi
  213. 2 US Washington
  214. 3 Ireland Dublin
  215. join: testfile1:5: is not sorted: 4 UK
  216. 5 Canada Toronto
  217.  
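The usual remedy is to sort the file on the join field first. The following is a sketch under that assumption; the file names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# countries.txt is deliberately out of order on its key (5 comes before 4).
printf '%s\n' '1 India' '2 US' '3 Ireland' '5 Canada' '4 UK' > countries.txt
printf '%s\n' '1 NewDelhi' '2 Washington' '3 Dublin' '4 London' '5 Toronto' > capitals.txt

# Sort on field 1 only, then join the sorted copy.
sort -k 1,1 countries.txt > countries.sorted
join countries.sorted capitals.txt > joined.txt
cat joined.txt
```

Note that join compares keys as text, so the input should be sorted as text too (plain sort, not sort -n).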
  218. 10. Ignore Case using -i option
  219.  
  220. When comparing fields, differences in case can be ignored using the -i option, as shown below.
  221.  
  222. $ cat testfile1
  223. a India
  224. b US
  225. c Ireland
  226. d UK
  227. e Canada
  228.  
  229. $ cat testfile2
  230. a NewDelhi
  231. B Washington
  232. c Dublin
  233. d London
  234. e Toronto
  235.  
  236. $ join testfile1 testfile2
  237. a India NewDelhi
  238. c Ireland Dublin
  239. d UK London
  240. e Canada Toronto
  241.  
  242. $ join -i testfile1 testfile2
  243. a India NewDelhi
  244. b US Washington
  245. c Ireland Dublin
  246. d UK London
  247. e Canada Toronto
  248.  
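How mixed-case keys sort depends on the locale, so the sketch below pins LC_ALL=C to keep the result predictable. The data and file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: the first key differs only in case between the two files.
printf '%s\n' 'a India' 'b US' 'c Ireland' > countries.txt
printf '%s\n' 'A NewDelhi' 'b Washington' 'c Dublin' > capitals.txt

# LC_ALL=C pins the comparison rules so the result does not depend on the locale.
LC_ALL=C join countries.txt capitals.txt > strict.txt      # "a" and "A" do not match
LC_ALL=C join -i countries.txt capitals.txt > nocase.txt   # "a" and "A" are paired

# strict.txt has 2 lines, nocase.txt has 3.
wc -l strict.txt nocase.txt
```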
  249. 11. Verify that Input is Sorted using --check-order option
  250.  
  251. Here is an example. Since testfile1 is unsorted towards the end, an error is produced in the output.
  252.  
  253. $ cat testfile1
  254. a India
  255. b US
  256. c Ireland
  257. d UK
  258. f Australia
  259. e Canada
  260.  
  261. $ cat testfile2
  262. a NewDelhi
  263. b Washington
  264. c Dublin
  265. d London
  266. e Toronto
  267.  
  268. $ join --check-order testfile1 testfile2
  269. a India NewDelhi
  270. b US Washington
  271. c Ireland Dublin
  272. d UK London
  273. join: testfile1:6: is not sorted: e Canada
  274.  
  275. 12. Do not Check the Sort Order using --nocheck-order option
  276.  
  277. This is the opposite of the previous example. The sort order is not checked in this example, so no error message is displayed.
  278.  
  279. $ join --nocheck-order testfile1 testfile2
  280. a India NewDelhi
  281. b US Washington
  282. c Ireland Dublin
  283. d UK London
  284.  
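In a script, the difference between the two options shows up in the exit status. The sketch below assumes GNU join; the data, the file names and the status variable are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: the last two lines of countries.txt are out of order.
printf '%s\n' 'a India' 'b US' 'd UK' 'c Ireland' > countries.txt
printf '%s\n' 'a NewDelhi' 'b Washington' 'c Dublin' 'd London' > capitals.txt

# --check-order treats disorder as an error, so the exit status is non-zero.
if LC_ALL=C join --check-order countries.txt capitals.txt > checked.txt 2> checked.err; then
    status=ok
else
    status=failed
fi
echo "check-order: $status"

# --nocheck-order skips the check; out-of-place lines may be dropped without any message.
LC_ALL=C join --nocheck-order countries.txt capitals.txt > unchecked.txt
cat unchecked.txt
```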
  285. 13. Print Unpairable Lines using -a option
  286.  
  287. If the lines of the two input files cannot all be paired one to one, the -a[FILENUM] option also prints the lines from file FILENUM that have no match in the other file. FILENUM is the file number (1 or 2).
  288.  
  289. In the following example, using -a1 produced the last line of testfile1 (f Australia), which had no pair in testfile2.
  290.  
  291. $ cat testfile1
  292. a India
  293. b US
  294. c Ireland
  295. d UK
  296. e Canada
  297. f Australia
  298.  
  299. $ cat testfile2
  300. a NewDelhi
  301. b Washington
  302. c Dublin
  303. d London
  304. e Toronto
  305.  
  306. $ join testfile1 testfile2
  307. a India NewDelhi
  308. b US Washington
  309. c Ireland Dublin
  310. d UK London
  311. e Canada Toronto
  312.  
  313. $ join -a1 testfile1 testfile2
  314. a India NewDelhi
  315. b US Washington
  316. c Ireland Dublin
  317. d UK London
  318. e Canada Toronto
  319. f Australia
  320.  
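A compact sketch of -a with inline data (hypothetical file names):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: key "c" exists only in countries.txt.
printf '%s\n' 'a India' 'b US' 'c Australia' > countries.txt
printf '%s\n' 'a NewDelhi' 'b Washington' > capitals.txt

# -a 1 keeps unmatched lines from file 1 in addition to the joined lines.
join -a 1 countries.txt capitals.txt > left.txt
cat left.txt
```

Passing both -a 1 and -a 2 keeps the unmatched lines from either file.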
  321. 14. Print Only Unpaired Lines using -v option
  322.  
  323. In the above example, both paired and unpaired lines were produced in the output. If only the unpaired lines are desired, use the -v option as shown below.
  324.  
  325. $ join -v1 testfile1 testfile2
  326. f Australia
  327.  
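Like -a, the -v option takes a file number, so it can report the leftovers of either file. A minimal sketch with hypothetical data:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: "c" appears only in countries.txt, "d" only in capitals.txt.
printf '%s\n' 'a India' 'b US' 'c Australia' > countries.txt
printf '%s\n' 'a NewDelhi' 'b Washington' 'd London' > capitals.txt

# -v 1 prints only the lines of file 1 that have no partner in file 2.
join -v 1 countries.txt capitals.txt > only_countries.txt

# -v 2 does the same for file 2.
join -v 2 countries.txt capitals.txt > only_capitals.txt

cat only_countries.txt only_capitals.txt
```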
  328. 15. Join Based on Different Columns from Both Files using -1 and -2 option
  329.  
  330. By default, the first column of each file is used for comparing before joining. You can change this behavior using the -1 and -2 options.
  331.  
  332. In the following example, the first column of testfile1 was compared with the second column of testfile2 to produce the join command output.
  333.  
  334. $ cat testfile1
  335. a India
  336. b US
  337. c Ireland
  338. d UK
  339. e Canada
  340.  
  341. $ cat testfile2
  342. NewDelhi a
  343. Washington b
  344. Dublin c
  345. London d
  346. Toronto e
  347.  
  348. $ join -1 1 -2 2 testfile1 testfile2
  349. a India NewDelhi
  350. b US Washington
  351. c Ireland Dublin
  352. d UK London
  353. e Canada Toronto
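
The same join can be sketched with inline data; the file names below are assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: the key is field 1 in countries.txt but field 2 in capitals.txt.
printf '%s\n' 'a India' 'b US' 'c Ireland' > countries.txt
printf '%s\n' 'NewDelhi a' 'Washington b' 'Dublin c' > capitals.txt

# -1 1 selects field 1 of the first file, -2 2 selects field 2 of the second.
join -1 1 -2 2 countries.txt capitals.txt > joined.txt
cat joined.txt
```

Each file has to be sorted on its own join field; for the second file here that would be sort -k 2,2.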