In short, this trick addresses the following scenario.
Suppose we have the following text, one item per line:
cccc
aaaa
bbbb
dddd
bbbb
cccc
aaaa
We need to remove the duplicates. That alone is easy with sort -u, but suppose I also want to keep the text in its original order. For example, aaaa appears twice. I only want to drop the second aaaa, and the first aaaa, which comes before bbbb, should still come before bbbb after deduplication. So the expected output is
cccc
aaaa
bbbb
dddd
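As a quick check of why sort -u alone is not enough, the sketch below recreates the sample input in a temporary file (the file is only a stand-in for illustration) and runs sort -u on it. The duplicates disappear, but the lines come out in sorted order rather than in the order in which they first appeared.

```shell
# Recreate the sample input, one item per line, in a temporary file.
sample=$(mktemp)
printf '%s\n' cccc aaaa bbbb dddd bbbb cccc aaaa > "$sample"

# sort -u removes duplicates but also reorders the lines.
sorted_unique=$(sort -u "$sample")
printf '%s\n' "$sorted_unique"
# prints aaaa, bbbb, cccc, dddd: unique, but cccc is no longer first

rm -f "$sample"
```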
Of course, the problem itself is not difficult, and it is easy to solve in C++ or Python. But as the saying goes, there is no need for a sledgehammer to crack a nut: if a problem can be solved with shell commands, that is always our first choice. The answer is given at the end. First, here is how I came to think about it.
Sometimes we want to add our own directories to the PATH environment variable. This is usually written in the ~/.bashrc file. For example, to add the directory $HOME/bin
export PATH=$HOME/bin:$PATH
This prepends $HOME/bin to PATH so that it is searched first. When we run source ~/.bashrc, the $HOME/bin directory is added to PATH. Now suppose that next time we add another directory, for example
export PATH=$HOME/local/bin:$HOME/bin:$PATH
When we run source ~/.bashrc again, $HOME/bin ends up with two entries in PATH. This does no harm in practice, but for anyone with a touch of obsessive-compulsiveness it is intolerable. So the problem becomes: remove the repeated entries from PATH while keeping the original order, so that whatever came first before deduplication still comes first afterwards. The order matters because, when you run a command, the shell searches the directories in PATH starting from the first one.
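The duplication described above can be reproduced without touching the real PATH. The following is a minimal sketch that assumes a scratch variable demo_path and a made-up home directory /home/alice, both chosen purely for illustration. It replays the two source ~/.bashrc runs and counts how often the bin directory appears.

```shell
# Hypothetical values for illustration only; the real PATH is not modified.
demo_home="/home/alice"
demo_path="/usr/bin:/bin"

# First "source ~/.bashrc": the export line prepends $HOME/bin.
demo_path="$demo_home/bin:$demo_path"

# Second "source ~/.bashrc", after the line was edited to add local/bin too.
demo_path="$demo_home/local/bin:$demo_home/bin:$demo_path"

# Show one entry per line; /home/alice/bin now appears twice.
printf '%s\n' "$demo_path" | tr ':' '\n'

# Count the lines that are exactly /home/alice/bin.
dup_count=$(printf '%s\n' "$demo_path" | tr ':' '\n' | grep -cxF "$demo_home/bin")
echo "$dup_count"
```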
All right, after all that it is time to reveal the answer. Taking the data from the beginning of the article as an example, and assuming the input file is in.txt, the command is as follows
cat -n in.txt | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2-
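These are all simple shell commands. Each one is explained briefly below
cat -n in.txt : print the text with a line number in front of each line, separated by \t
sort -k2,2 -k1,1n : sort the input; the primary key is the second field, the secondary key is the first field, compared numerically
uniq -f1 : remove duplicate lines, ignoring the first field in the comparison; the first field is still included in the output
sort -k1,1n : sort the input by the first field, compared numerically
cut -f2- : print the second field and everything after it; the default delimiter is \t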
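To make the explanation of each stage concrete, the sketch below builds the pipeline up one command at a time on the sample data. It writes the sample to a temporary file rather than in.txt so that it can be run anywhere. The variable names are illustrative.

```shell
# Recreate the sample input in a temporary file (a stand-in for in.txt).
input=$(mktemp)
printf '%s\n' cccc aaaa bbbb dddd bbbb cccc aaaa > "$input"

# Stage 1: number the lines so the original order can be restored later.
cat -n "$input"

# Stage 2: group identical lines together, earliest occurrence first.
cat -n "$input" | sort -k2,2 -k1,1n

# Stage 3: keep only the first line of each group. Field 1 (the line
# number) is skipped in the comparison but still printed.
cat -n "$input" | sort -k2,2 -k1,1n | uniq -f1

# Stages 4 and 5: sort back by line number, then drop the number column.
result=$(cat -n "$input" | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2-)
printf '%s\n' "$result"
# prints cccc, aaaa, bbbb, dddd

rm -f "$input"
```

One caveat: the sample lines contain no spaces. Because -k2,2 only covers the first whitespace-separated word after the line number, input lines with embedded blanks may not group correctly with this exact key.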
You can start with the first command and add the others one at a time, looking at the actual output at each step. That makes it easier to understand. So how do we deal with the repeated paths in PATH? Taking the earlier example again, we only need to convert with tr before and after
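export PATH=$HOME/local/bin:$HOME/bin:$PATH
export PATH=`echo "$PATH" | tr ':' '\n' | cat -n | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2- | tr '\n' ':' | sed 's/:$//'`
The final sed strips the trailing colon that the last tr leaves behind. An empty entry in PATH is treated as the current directory, so it is better not to keep one.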
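The same idea can be wrapped in a small function so it can be tried on any colon-separated string before it is applied to PATH. This is only a sketch: dedup_colon_list is a hypothetical name, and paste -sd: is swapped in for the final tr so that no trailing colon is produced.

```shell
# dedup_colon_list is a hypothetical helper name, not a standard command.
# It prints its colon-separated argument with later duplicates removed,
# keeping the first occurrence of each entry in its original position.
dedup_colon_list() {
    printf '%s\n' "$1" |
        tr ':' '\n' |
        cat -n |
        sort -k2,2 -k1,1n |
        uniq -f1 |
        sort -k1,1n |
        cut -f2- |
        paste -sd: -
}

# Illustrative input with /opt/bin and /usr/bin repeated.
dedup_colon_list "/opt/local/bin:/opt/bin:/usr/bin:/opt/bin:/usr/bin"
# prints /opt/local/bin:/opt/bin:/usr/bin
```

With such a helper in ~/.bashrc, the PATH line could then be written as export PATH=$(dedup_colon_list "$PATH").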
In fact, managing PATH this way has a problem. Suppose that, after running the commands above, we want to remove $HOME/bin from PATH. Simply changing the lines as follows is not enough
export PATH=$HOME/local/bin:$PATH
export PATH=`echo "$PATH" | tr ':' '\n' | cat -n | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2- | tr '\n' ':' | sed 's/:$//'`
Since $HOME/bin has already been added to $PATH, this does not remove it. Perhaps the best approach is to know exactly which paths are needed and set PATH explicitly instead of appending to it.
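A minimal sketch of the explicit approach follows. The directory list is an assumption for illustration only, and the real list depends on the system. Because the value does not refer to the old $PATH, running it repeatedly always gives the same result, and a directory is removed simply by leaving it out of the list.

```shell
# Illustrative directory list; adjust it to what your system needs.
# $HOME/bin is simply left out, so it is gone after the next source.
new_path="$HOME/local/bin:/usr/local/bin:/usr/bin:/bin"

# In ~/.bashrc this would be written as: export PATH="$new_path"
# Here the value is only printed, one entry per line, so the real PATH
# of the current shell stays untouched.
printf '%s\n' "$new_path" | tr ':' '\n'

# Number of entries in the explicit list.
entry_count=$(printf '%s\n' "$new_path" | tr ':' '\n' | wc -l)
```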
Shell command tips: deduplicating text while preserving the original order