
How can I translate a shell script to Perl?


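I have a shell script, and a pretty big one. Now my boss says I must rewrite it in Perl. Is there any way to write a Perl script and use the existing shell code as is inside it? Something similar to Inline::C.

Is there something like Inline::Shell? I had a look at the Inline module, but it only supports languages.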



1> Daniel C. So..:

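I'll answer seriously. I do not know of any program that translates a shell script into Perl, and I doubt any interpreter module would give you the performance benefits. So I'll outline how I would go about it.

Now, you want to reuse as much of your code as possible. In that case, I suggest selecting pieces of that code, writing a Perl version of each, and then calling the Perl script from the main script. That will let you do the conversion in small steps, check that the converted part works, and gradually improve your Perl knowledge.

Since you can call external programs from a Perl script, you can even replace some of the bigger logic with Perl, and call smaller shell scripts (or other commands) from Perl to do whatever you don't yet feel comfortable converting. So you'll have a shell script calling a Perl script calling another shell script. In fact, I did exactly that with my own very first Perl script.

Of course, it's important to choose well what to convert. Below I'll explain how patterns common in shell scripts are written in Perl, so that you can identify them in your script and create replacements with as much cut and paste as possible.

First, both Perl scripts and shell scripts are code+functions. That is, anything which is not a function declaration is executed in the order it is encountered. You don't need to declare functions before use, though. That means the general layout of the script can be preserved, though the ability to keep things in memory (like a whole file, or a processed form of it) makes it possible to simplify tasks.

A Perl script, in Unix, starts with something like this: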

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
#other libraries

(rest of the code)

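Obviously, the first line points to the command used to run the script, just like normal shells do. The following two "use" lines make the language more strict, which should decrease the number of bugs you run into because you don't know the language well (or plain did something wrong). The third "use" line imports the "Dumper" function of the "Data::Dumper" module. It's useful for debugging purposes. If you want to know the value of an array or hash table, just print Dumper(whatever).

Note also that comments are just like the shell's: lines starting with "#".

Now, you call external programs and pipe to or from them. For example: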

open THIS, "cat $ARGV[0] |";

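That will run cat, passing "$ARGV[0]", which would be $1 in the shell: the first argument passed to the script. The result will be piped into your Perl script through "THIS", which you can use to read from it, as I'll show later.

You can use "|" at the beginning or end of the string to indicate the mode "pipe to" or "pipe from" and specify a command to be run. You can also use ">" or ">>" at the beginning to open a file for writing with or without truncation, "<" to explicitly open a file for reading (the default), or "+<" and "+>" for reading and writing. Notice that the latter will truncate the file first.

Another syntax for "open", which avoids problems with files that have such characters in their names, takes the opening mode as a second argument: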

open THIS, "-|", "cat $ARGV[0]";

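This will do the same thing. The mode "-|" stands for "pipe from" and "|-" stands for "pipe to". The rest of the modes can be used as they were (>, >>, <, +>, +<). There is more to open than this, but it should suffice for most things.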
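As a rough illustration of the modes just listed, here is a minimal sketch. It assumes a scratch file created with File::Temp and uses lexical filehandles ("my $out" and so on) rather than the bareword handles used elsewhere in this answer; all variable names are made up for the example. The pipe is opened in list form, so no shell is involved, and $^X (the running perl binary) stands in for an external command.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (hypothetical; any writable path works).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# ">" truncates the file and writes to it.
open my $out, ">", $path or die "Could not open $path for writing: $!\n";
print $out "first\n";
close $out;

# ">>" appends without truncating.
open my $append, ">>", $path or die "Could not open $path for appending: $!\n";
print $append "second\n";
close $append;

# "<" reads the file directly, no external "cat" needed.
open my $in, "<", $path or die "Could not open $path for reading: $!\n";
my @lines = <$in>;
close $in;

# "-|" reads from a command; the list form bypasses the shell entirely.
open my $pipe, "-|", $^X, "-e", 'print "piped\n"' or die "Could not run $^X: $!\n";
my $piped = <$pipe>;
close $pipe;

print @lines, $piped;
```

The same pattern works with bareword handles such as THIS; the lexical form is used here only so the handles go away when the variables do.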

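But you should avoid calling external programs as much as possible. You could open the file directly, by doing open THIS, "$ARGV[0]";, for example, and get much better performance.

So, what external programs could you cut out? Well, almost everything. But let's stay with the basics: cat, grep, cut, head, tail, uniq, wc, sort.

CAT

Well, there isn't much to be said about this one. Just remember to read the file only once, if possible, and keep it in memory. If the file is huge you won't do that, of course, but there are almost always ways to avoid reading a file more than once.

Anyway, the basic syntax for cat would be: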

my $filename = "whatever";
open FILE, "$filename" or die "Could not open $filename!\n";
while(<FILE>) {
  print $_;
}
close FILE;

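This opens a file and prints all its contents ("while(<FILE>)" will loop until EOF, assigning each line to "$_"), and then closes it again.

If I wanted to direct the output to another file, I could do this: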

my $filename = "whatever";
my $anotherfile = "another";
open (FILE, "$filename") || die "Could not open $filename!\n";
open OUT, ">", "$anotherfile" or die "Could not open $anotherfile for writing!\n";
while(<FILE>) {
  print OUT $_;
}
close FILE;
close OUT;

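This will print each line to the file indicated by "OUT". You can use STDIN, STDOUT and STDERR in the appropriate places as well, without having to open them first. In fact, "print" defaults to STDOUT, and "die" defaults to "STDERR".

Notice also the "or die ..." and "|| die ...". The operators "or" and "||" mean the following command is executed only if the first one returns false (which means an empty string, a null reference, 0, and the like). The die command stops the script with an error message.

The main difference between "or" and "||" is precedence. If "or" were replaced by "||" in the first example above, it would not work as expected, because the line would be interpreted as: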

open FILE, ("$filename" || die "Could not open $filename!\n");

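Which is not at all what is expected. As "or" has a lower precedence, it works. In the line where "||" is used, the parameters to open are passed between parentheses, making it possible to use "||".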
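The difference is easy to see with a file that cannot be opened. The following is only a sketch: the path is a hypothetical one assumed not to exist, and eval is used just to catch the die so both cases can be shown side by side.

```perl
use strict;
use warnings;

my $missing = "/nonexistent/dir/file.txt";   # hypothetical path assumed not to exist

# "||" binds tighter than the comma, so it applies to the filename only.
# The filename is a true value, so die is never evaluated and the failed
# open goes unnoticed.
my $ok_tight = eval { open(my $fh, "<", $missing || die "never reached\n"); 1 };

# "or" binds looser than the whole open call, so the failure is caught.
my $ok_loose = eval { open(my $fh, "<", $missing) or die "Could not open $missing\n"; 1 };
my $error = $@;

print "with ||: ", ($ok_tight ? "no error raised" : "died"), "\n";
print "with or: ", ($ok_loose ? "no error raised" : "died: $error");
```

With the "||" form the script carries on with a handle that was never opened, which is the kind of bug that shows up much later as a "readline on closed filehandle" warning.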

Alas, there is something which does pretty much what cat does:

while(<>) {
  print $_;
}

This will print all files named on the command line, or anything passed through STDIN.

GREP

So, how would our "grep" script work? I'll assume "grep -E", because that's easier in Perl than plain grep. Anyway:

my $pattern = $ARGV[0];
shift @ARGV;
while(<>) {
        print $_ if /$pattern/o;
}

The "o" passed to $pattern instructs Perl to compile that pattern only once, thus gaining you speed. Note the style "something if cond". It means it will only execute "something" if the condition is true. Finally, "/$pattern/", alone, is the same as "$_ =~ m/$pattern/", which means compare $_ with the regex pattern indicated. If you want plain substring matching instead (what "grep -F" does), you could write:

print $_ if index($_, $pattern) >= 0;

CUT

Usually, you do better using regex groups to get the exact string than cut. What you would do with "sed", for instance. Anyway, here are two ways of reproducing cut:

while(<>) {
  my @array = split ",";
  print $array[3], "\n";
}

That will get you the fourth column of every line, using "," as separator. Note @array and $array[3]. The @ sigil means "array" should be treated as an, well, array. It will receive an array composed of each column in the currently processed line. Next, the $ sigil means array[3] is a scalar value. It will return the column you are asking for.

This is not a good implementation, though, as "split" will scan the whole string. I once reduced a process from 30 minutes to 2 seconds just by not using split -- the lines were rather large, though. Anyway, the following performs better if the lines are expected to be big and the columns you want are near the start:

while(<>) {
  my ($column) = /^(?:[^,]*,){3}([^,]*),/;
  print $column, "\n";
}

This leverages regular expressions to get the desired information, and only that.

If you want positional columns, you can use:

while(<>) {
  print substr($_, 5, 10), "\n";
}

Which will print 10 characters starting from the sixth (again, 0 means the first character).

HEAD

This one is pretty simple:

my $printlines = abs(shift);
my $lines = 0;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    $lines = 0;
    $current = $ARGV;
  }
  print "$_" if $lines < $printlines;
  $lines++;
}

Things to note here. I use "ne" to compare strings. Now, $ARGV will always point to the current file, being read, so I keep track of them to restart my counting once I'm reading a new file. Also note the more traditional syntax for "if", right along with the post-fixed one.

I also use a simplified syntax to get the number of lines to be printed. When you use "shift" by itself it will assume "shift @ARGV". Also, note that shift, besides modifying @ARGV, will return the element that was shifted out of it.

As with a shell, there is no distinction between a number and a string -- you just use it. Even things like "2"+"2" will work. In fact, Perl is even more lenient, cheerfully treating anything non-number as a 0, so you might want to be careful there.
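A small sketch of that leniency follows. The variable names are made up for the example, and the "numeric" warning is switched off locally only so the non-number case runs silently under "use warnings".

```perl
use strict;
use warnings;

my $sum    = "2" + "2";   # "+" treats both strings as numbers
my $joined = "2" . "2";   # "." concatenates them instead

my $text = "abc";
my $odd;
{
  no warnings 'numeric';  # silence the "isn't numeric" warning for this block
  $odd = $text + 1;       # "abc" counts as 0
}

print "$sum $joined $odd\n";
```

Leaving the warning enabled is usually the better choice in a real script, since it points straight at input that was supposed to be a number and wasn't.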

This script is very inefficient, though, as it reads the WHOLE file, not only the required lines. Let's improve it, and see a couple of important keywords in the process:

my $printlines = abs(shift);
my @files;
if(scalar(@ARGV) == 0) {
  @files = ("-");
} else {
  @files = @ARGV;
}
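for my $file (@files) {
  if ($file eq "-") {
    # "-" stands for STDIN, so duplicate that handle instead of opening a file
    open FILE, "<&", \*STDIN or next;
  } else {
    next unless -f $file && -r $file;
    open FILE, "<", $file or next;
  }
  my $lines = 0;

  while(<FILE>) {
    last if $lines == $printlines;
    print "$_";
    $lines++;
  }

  close FILE;
}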

The keywords "next" and "last" are very useful. First, "next" will tell Perl to go back to the loop condition, getting the next element if applicable. Here we use it to skip a file unless it is truly a file (not a directory) and readable. It will also skip if we couldn't open the file even then.

Then "last" is used to immediately jump out of a loop. We use it to stop reading the file once we have reached the required number of lines. It's true we read one line too many, but having "last" in that position shows clearly that the lines after it won't be executed.

There is also "redo", which will go back to the beginning of the loop, but without reevaluating the condition nor getting the next element.

TAIL

I'll do a little trick here.

my $skiplines = abs(shift);
my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
  shift @lines if $#lines == $skiplines;
}
print @lines;

Ok, I'm combining "push", which appends a value to an array, with "shift", which takes something from the beginning of an array. If you want a stack, you can use push/pop or shift/unshift. Mix them, and you have a queue. I keep my queue at no more than $skiplines elements by checking $#lines, which gives me the index of the last element in the array. You could also get the number of elements in @lines with scalar(@lines).

UNIQ

Now, uniq only eliminates repeated consecutive lines, which should be easy with what you have seen so far. So I'll eliminate all of them:

my $current = "";
my %lines;
while(<>) {
  if($ARGV ne $current) {
    undef %lines;
    $current = $ARGV;
  }
  print $_ unless defined($lines{$_});
  $lines{$_} = "";
}

Now here I'm keeping the whole file in memory, inside %lines. The use of the % sigil indicates this is a hash table. I'm using the lines as keys, and storing nothing as value -- as I have no interest in the values. I check whether the key exists with "defined($lines{$_})", which will test if the value associated with that key is defined or not; the keyword "unless" works just like "if", but with the opposite effect, so it only prints a line if the line is NOT defined.

Note, too, the syntax $lines{$_} = "" as a way to store something in a hash table. Note the use of {} for hash table, as opposed to [] for arrays.
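For comparison, the behavior of the real uniq (dropping only consecutive repeats) needs no hash at all, just the previous line. This is a sketch written as a function over a list of lines rather than over <>, so it can be tried in isolation; the name uniq_consecutive is made up for the example.

```perl
use strict;
use warnings;

# Drop a line only when it equals the line immediately before it.
sub uniq_consecutive {
  my @lines = @_;
  my @result;
  my $previous;
  for my $line (@lines) {
    next if defined($previous) && $line eq $previous;
    push @result, $line;
    $previous = $line;
  }
  return @result;
}

my @output = uniq_consecutive("a\n", "a\n", "b\n", "a\n");
print @output;
```

The last "a" survives because it does not directly follow another "a", which is exactly where this differs from the hash-based version above.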

WC

This will actually use a lot of stuff we have seen:

my $current;
my %lines;
my %words;
my %chars;
while(<>) {
  $lines{"$ARGV"}++;
  $chars{"$ARGV"} += length($_);
  $words{"$ARGV"} += scalar(grep {$_ ne ""} split /\s/);
}

for my $file (keys %lines) {
  print "$lines{$file} $words{$file} $chars{$file} $file\n";
}

Three new things. Two are the "+=" operator, which should be obvious, and the "for" expression. Basically, a "for" will assign each element of the array to the variable indicated. The "my" is there to declare the variable, though it's unneeded if declared previously. I could have an @array variable inside those parentheses. The "keys %lines" expression will return as an array the keys (the filenames) which exist for the hash table "%lines". The rest should be obvious.

The third thing, which I actually added only revising the answer, is the "grep". The format here is:

grep { code } array

It will run "code" for each element of the array, passing the element as "$_". Then grep will return all elements for which the code evaluates to "true" (not 0, not "", etc). This avoids counting empty strings resulting from consecutive spaces.

Similar to "grep" there is "map", which I won't demonstrate here. Instead of filtering, it will return an array formed by the results of "code" for each element.
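For the curious, a minimal sketch of the two side by side, using a made-up list of words:

```perl
use strict;
use warnings;

my @words = ("apple", "", "kiwi", "banana");

# grep filters: keep only the non-empty strings.
my @nonempty = grep { $_ ne "" } @words;

# map transforms: build a new list holding the length of each kept string.
my @lengths = map { length($_) } @nonempty;

print "@nonempty\n";
print "@lengths\n";
```

Both take a block and a list and set "$_" to each element in turn; the only difference is whether the block's result decides membership (grep) or becomes the new element (map).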

SORT

Finally, sort. This one is easy too:

my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print sort @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
}
print sort @lines;

Here, "sort" will sort the array. Note that sort can receive a function to define the sorting criteria. For instance, if I wanted to sort numbers I could do this:

my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print sort {$a <=> $b} @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
}
print sort {$a <=> $b} @lines;

Here "$a" and "$b" receive the elements to be compared. "<=>" returns -1, 0 or 1 depending on whether the number is less than, equal to or greater than the other. For strings, "cmp" does the same thing.
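The difference between the default string ordering and a numeric comparison shows up as soon as the numbers have different lengths. A small sketch with made-up values:

```perl
use strict;
use warnings;

my @values = (10, 9, 100, 1);

my @as_strings = sort @values;                 # default: compares as strings
my @as_numbers = sort { $a <=> $b } @values;   # numeric, ascending
my @descending = sort { $b <=> $a } @values;   # swap $a and $b to reverse

print "@as_strings\n";
print "@as_numbers\n";
print "@descending\n";
```

The string sort puts 9 last because the character "9" sorts after "1", regardless of how many digits follow.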

HANDLING FILES, DIRECTORIES & OTHER STUFF

As for the rest, basic mathematical expressions should be easy to understand. You can test certain conditions about files this way:

for my $file (@ARGV) {
  print "$file is a file\n" if -f "$file";
  print "$file is a directory\n" if -d "$file";
  print "I can read $file\n" if -r "$file";
  print "I can write to $file\n" if -w "$file";
}

I'm not trying to be exhaustive here; there are many other such tests. I can also do "glob" patterns, like shell's "*" and "?", like this:

for my $file (glob("*")) {
  print $file;
  print "*" if -x "$file" && ! -d "$file";
  print "/" if -d "$file";
  print "\t";
}

If you combined that with "chdir", you can emulate "find" as well:

sub list_dir($$) {
  my ($dir, $prefix) = @_;
  my $newprefix = $prefix;
  if ($prefix eq "") {
    $newprefix = $dir;
  } else {
    $newprefix .= "/$dir";
  }
  chdir $dir;
  for my $file (glob("*")) {
    print "$prefix/" if $prefix ne "";
    print "$dir/$file\n";
    list_dir($file, $newprefix) if -d "$file";
  }
  chdir "..";
}

list_dir(".", "");

Here we see, finally, a function. A function is declared with the syntax:

sub name (params) { code }

Strictly speaking, "(params)" is optional. The declared parameter I used, "($$)", means I'm receiving two scalar parameters. I could have "@" or "%" in there as well. The array "@_" has all the parameters passed. The line "my ($dir, $prefix) = @_" is just a simple way of assigning the first two elements of that array to the variables $dir and $prefix.

This function does not return anything (it's a procedure, really), but you can have functions which return values just by adding "return something;" to it, and have it return "something".
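A minimal sketch of a function that does return a value. The helper name get_column and its three parameters are made up for the example; it simply wraps the split-based "cut" shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: return the requested column (counting from 0) of a line.
sub get_column {
  my ($line, $separator, $index) = @_;
  chomp $line;                                   # works on our copy only
  my @columns = split /\Q$separator\E/, $line;   # \Q...\E: separator is literal
  return $columns[$index];
}

my $phone = get_column("John Doe,(555) 1234-4321\n", ",", 1);
print "$phone\n";
```

The function is declared without a prototype here, which sidesteps the "called too early to check prototype" warning regardless of where the sub sits in the file.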

The rest of it should be pretty obvious.

MIXING EVERYTHING

Now I'll present a more involved example. I'll show some bad code to explain what's wrong with it, and then show better code.

For this first example, I have two files: names.txt, which has names and phone numbers, and systems.txt, with systems and the name of the person responsible for each. Here they are:

names.txt

John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555

systems.txt

Sales, Jane Doe
Inventory, John Doe
Payment, That Guy

I want, then, to print the first file, with the system appended to the name of the person, if that person is responsible for that system. The first version might look like this:

#!/usr/bin/perl

use strict;
use warnings;

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

sub get_system($) {
  my ($name) = @_;
  my $system = "";

  open FILE, "systems.txt";

  while(<FILE>) {
    next unless /$name/;  # no /o here: $name differs on every call
    ($system) = /([^,]*)/;
  }

  close FILE;

  return $system;
}

This code won't work, though. Perl will complain that the function was used too early for the prototype to be checked, but that's just a warning. It will give an error on line 8 (the first while loop), complaining about a readline on a closed filehandle. What happened here is that "FILE" is global, so the function get_system is changing it. Let's rewrite it, fixing both things:

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /$name/;  # no /o here: $name differs on every call
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

This won't give any errors or warnings, nor will it work. It returns just the systems, but not the names and phone numbers! What happened? Well, we are making a reference to "$_" after calling get_system, but, by reading the file, get_system is overwriting the value of $_!

To avoid that, we'll make $_ local inside get_system. This will give it a local scope, and the original value will then be restored once returned from get_system:

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";
  local $_;

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /$name/;  # no /o here: $name differs on every call
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

And that still doesn't work! It prints a newline between the name and the system. Well, Perl reads the line including any newline it might have. There is a neat command which will remove newlines from strings, "chomp", which we'll use to fix this problem. And since not every name has a system, we might, as well, avoid printing the comma when that happens:

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";
  local $_;

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /$name/;  # no /o here: $name differs on every call
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  chomp;
  print $_;
  print ", $system" if $system ne "";
  print "\n";
}

close FILE;

That works, but it also happens to be horribly inefficient. We read the whole systems file for every line in the names file. To avoid that, we'll read all data from systems once, and then use that to process names.

Now, sometimes a file is so big you can't read it into memory. When that happens, you should try to read into memory any other file needed to process it, so that you can do everything in a single pass for each file. Anyway, here is the first optimized version of it:

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /([^,]*),(.*)/;
  $systems{$name} = $system;
}
close SYSTEMS;

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^([^,]*),/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

Unfortunately, it doesn't work. No system ever appears! What has happened? Well, let's look into what "%systems" contains, by using Data::Dumper:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /([^,]*),(.*)/;
  $systems{$name} = $system;
}
close SYSTEMS;

print Dumper(%systems);

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^([^,]*),/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

The output will be something like this:

$VAR1 = ' Jane Doe';
$VAR2 = 'Sales';
$VAR3 = ' That Guy';
$VAR4 = 'Payment';
$VAR5 = ' John Doe';
$VAR6 = 'Inventory';
John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555

Those $VAR1/$VAR2/etc. are how Dumper displays a hash table passed to it this way. The odd numbers are the keys, and the succeeding even numbers are the values. Now we can see that each name in %systems has a preceding space! Silly regex mistake, let's fix it:

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /^\s*([^,]*?)\s*,\s*(.*?)\s*$/;
  $systems{$name} = $system;
}
close SYSTEMS;

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^\s*([^,]*?)\s*,/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

So, here, we are aggressively removing any spaces from the beginning or end of name and system. There are other ways to form that regex, but that's beside the point. There is still one problem with this script, which you'll have seen if your "names.txt" and/or "systems.txt" files have an empty line at the end. The warnings look like this:

Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
John Doe, (555) 1234-4321, Inventory
Jane Doe, (555) 5555-5555, Sales
The Boss, (666) 5555-5555
Use of uninitialized value in hash element at ./exemplo3e.pl line 19, <NAMES> line 4.

What happened here is that nothing went into the "$name" variable when the empty line was processed. There are many ways around that, but I choose the following:

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt" or die "Could not open systems.txt!";
while(<SYSTEMS>) {
  my ($system, $name) = /^\s*([^,]+?)\s*,\s*(.+?)\s*$/;
  $systems{$name} = $system if defined $name;
}
close SYSTEMS;

open NAMES, "names.txt" or die "Could not open names.txt!";
while(<NAMES>) {
  my ($name) = /^\s*([^,]+?)\s*,/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined($name) && defined($systems{$name});
  print "\n";
}
close NAMES;

The regular expressions now require at least one character for name and system, and we test to see if "$name" is defined before we use it.

CONCLUSION

Well, then, these are the basic tools to translate a shell script. You can do MUCH more with Perl, but that was not your question, and it wouldn't fit here anyway.

Just as a basic overview of some important topics,

A Perl script that might be attacked by hackers needs to be run with the -T option, so that Perl will complain about any vulnerable input which has not been properly handled.

There are libraries, called modules, for database access, handling XML and related formats, Telnet, HTTP & other protocols. In fact, there are myriads of modules which can be found at CPAN.

As mentioned by someone else, if you make use of AWK or SED, you can translate those into Perl with A2P and S2P.

Perl can be written in an Object Oriented way.

There are multiple versions of Perl. As of this writing, the stable one is 5.8.8 and there is a 5.10.0 available. There is also a Perl 6 in development, but experience has taught everyone not to wait too eagerly for it.

There is a free, good, hands-on, hard & fast book about Perl called Learning Perl The Hard Way. Its style is similar to this very answer. It might be a good place to go from here.

I hope this helped.

DISCLAIMER

I'm NOT trying to teach Perl, and you will need to have at least some reference material. There are guidelines to good Perl habits, such as using "use strict;" and "use warnings;" at the beginning of the script, to make it less lenient of badly written code, or using STDOUT and STDERR on the print lines, to indicate the correct output pipe.

This is stuff I agree with, but I decided it would detract from the basic goal of showing patterns for common shell script utilities.


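I have to upvote this, if only for the sheer comprehensiveness and effort!
A great answer to a not-so-great question.
Please use the 3-argument form of `open()` wherever possible. http://perldoc.perl.org/functions/open.html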

2> Brian Agnew..:

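I don't know what's in your shell script, but don't forget there are tools like

    a2p - awk-to-perl

    s2p - sed-to-perl

and perhaps more. It's worth taking a look around.

You may find that, thanks to Perl's power and features, it's not such a big job, because you may have been jumping through hoops with various bash features and utility programs to do something that Perl does natively.

As with any migration project, it's useful to have some canned regression tests to run against both solutions, so if you don't have those, I'd generate them first.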



3> j_random_hac..:

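I'm surprised no one has yet mentioned the Shell module that is included with core Perl, which lets you execute external commands using function-call syntax. For example (adapted from the synopsis):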

use Shell qw(cat ps cp);
$passwd = cat '

Provided you use parentheses, you can even call other programs in the $PATH that you didn't mention on the "use" line, e.g.:

gcc('-o', 'foo', 'foo.c');

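Note that Shell gathers up the subprocess's STDOUT and returns it as a string or array. This simplifies scripting, but it is not the most efficient way to go, and it may cause trouble if you rely on a command's output being unbuffered.

The module docs mention some shortcomings, such as that shell internal commands (e.g. cd) cannot be called using the same syntax. In fact, they recommend that the module not be used for production systems! But it could certainly be a helpful crutch to lean on until you get your code ported across to "proper" Perl.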
