Runaway Check Tutorial
TheCAT
11/26/2014
Objective: automate the process of examining all running processes on a list of Linux/Unix-based servers and obtaining a list of processes that fall outside guidelines. These guidelines can be found in internal documentation like the DCM, but basically they set cutoffs for CPU use, memory use and so on. The results should indicate which user, which process and which machine are involved. At that point further consideration can be given on a case-by-case basis.
The goal of this lecture is to give you some legos to build things with. The legos you get here today will be most of the legos you need to build a runaway script with, but these legos can be used for other things. 'Cause who doesn't like legos, except unseen legos in the middle of the night with bare feet?
Minimum Command Requirements:
ssh
ps
Optional Command Requirements:
netgrouplist
ssh-keygen
ssh-agent
ssh-add
uptime
grep
echo
awk
sed
cut
>>
>
Possible Syntax:
for do done
if then else
` `
$
Tools:
vim/emacs/nano
Likely the end results of this script will be more complicated than the previous lesson's results. Again, when considering functionality examine the guidelines for runaway scripts in our in-house documentation.
Overview:
One way to tackle this is to use an ssh for-loop. Basically we set up a for-loop over every computer involved, ssh to each machine, get the desired output, parse it to our satisfaction, and dump the results into a RESULTS.out. From there, as previously indicated, we would examine things on a case-by-case basis.
Recall we will need some starting syntax at the top of our file:
#! /bin/bash
and we will ultimately need to chmod our script to be executable:
me@machine$ chmod 700 my_script.bash
That being said let's talk about keys.
Keys:
If you haven't got an ssh-key you will need one. Currently there are over 50 machines that need to be examined and you can't be entering in your password every time you run the script ... for each machine.
So, ssh to your favorite machine. I recommend a cs compute powerhouse, not an irc box or lab box. Once there, generate an ssh-key if you don't have one.
yourname@machine$ ssh-keygen -t rsa -b 4096
and when prompted for a passphrase, give it a decent one that you can remember. Accept the default file name given.
Now let's get your key onto machines where it has to go. The thing is though your key has a passphrase associated with it, so every time you use your key you now have to answer the passphrase - so how is that different than just entering the password to the machine?
We can use an agent to bypass that. The agent will store our passphrase for us so we don't have to enter it over and over.
yourname@machine$ ssh-agent bash
yourname@machine$ ssh-add -t 3600
Enter passphrase for /u/yourname/.ssh/id_rsa:
yourname@machine$
First we started an agent with a new bash shell. Go ahead and up-arrow. All your previous commands are gone. If you type exit (don't do this now) it will appear that nothing happened, but actually it closes the agent shell. Then we allowed the agent to have access to our key for 3,600 seconds. The clock is ticking.
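Since the clock is ticking, it can help to have the script check the agent before it starts looping. Here is a minimal sketch, assuming OpenSSH's ssh-add; the helper name agent_ready is made up for this example.

```shell
#!/bin/bash
# agent_ready is a hypothetical helper name, not part of any standard tool.
agent_ready() {
    # ssh-add -l exits 0 when the agent lists at least one key, 1 when the
    # agent holds no keys, and 2 when no agent can be contacted at all.
    ssh-add -l > /dev/null 2>&1
}

if agent_ready; then
    echo "agent is holding a key, good to go"
else
    echo "no key available: run 'ssh-agent bash' and then 'ssh-add'" >&2
fi
```

Once the 3,600 seconds run out the agent drops the key, so this check would start failing again until you re-run ssh-add.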
So now let's transfer our public key to the machines we want to get to. We are going to do this in a for-loop on the command line to practice our syntax.
yourname@machine$ for machine in aaa bbb ccc; do ssh-copy-id yourname@$machine; done
We will still have to enter our password to ssh to those machines, but only one last time. Why did I pick those three machines? Hint: NFS mounted directories.
Since our directories are NFS mounted, did we actually have to use ssh-copy-id at all?
Now we can get around without passwords as long as we have an agent running. You need to remember the agent when running/testing your script.
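If you forget the agent, a plain ssh will stop and ask for a password in the middle of your loop. One way to find out up front which machines accept your key is sketched below. It assumes OpenSSH's BatchMode and ConnectTimeout options; the helper name can_login and the five second timeout are illustrative choices, not anything from the DCM.

```shell
#!/bin/bash
# can_login is a hypothetical helper name.
can_login() {
    # BatchMode=yes makes ssh fail instead of stopping to ask for a password.
    # ConnectTimeout=5 gives up on an unreachable machine after five seconds.
    # The remote command "true" does nothing; we only want ssh's exit status.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true > /dev/null 2>&1
}

# Machines to test are given as arguments, e.g.  ./key_check.bash aaa bbb ccc
for machine in "$@"; do
    if can_login "$machine"; then
        echo "$machine: ok"
    else
        echo "$machine: key login failed"
    fi
done
```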
But what machines to use? This is in the DCM but we'll go over it a bit more here.
From your favorite machine:
yourname@machine$ netgrouplist -l
This gives you a listing of all our machines grouped by various categories. We are concerned with the following three: linux-login-sys, ece-secure-sys and cs-secure-sys.
yourname@machine$ netgrouplist cs-secure-sys
Because our inventory and the names of machines change constantly, we want to make certain we use whatever is most current. As such we are either going to (A) dynamically import the machines from these three lists straight into our script or (B) put them all in one list and use the contents of that list in our script. Remember the name of each machine is part of a for-loop just like it was in our command line example.
Here is how you could get a list into the script. You use backticks:
#! /bin/bash
for list in `netgrouplist -l`; do
echo $list
done
Go ahead and make this script, chmod it accordingly and run it. How would you change it to get a list of machines instead of a list of lists? How would you change it to get three lists of machines?
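Option (B), keeping the machines in one file, uses the same backtick trick on a different command. This is only a sketch: machines.txt is a made-up file name, the machine names are placeholders, and the duplicate entry is there on the assumption that a machine could show up in more than one list.

```shell
#!/bin/bash
# machines.txt is a hypothetical file name. In real use you would fill it
# from netgrouplist output; placeholder names are used here so the sketch
# runs anywhere. Note that bbb appears twice on purpose.
printf '%s\n' ccc aaa bbb bbb > machines.txt

count=0
# sort -u puts the names in order and drops the duplicate.
for machine in `sort -u machines.txt`; do
    echo "$machine"
    count=$((count + 1))
done
rm machines.txt
```

The catch with a file is that it goes stale as inventory changes, so you would need to regenerate it before each run.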
What if you want to put the list results in a file? There are many commands that put their results at stdout (the screen). But what if you want those results in a file?
Change the above script as follows:
#! /bin/bash
for machine in `netgrouplist cs-secure-sys`; do
echo $machine >> RESULTS.out
done
Now run it:
yourname@machine$ ./your_script.bash
yourname@machine$ ls
RESULTS.out
We could have gotten the same thing as follows:
yourname@machine$ netgrouplist cs-secure-sys >> MYRESULTS.out
The >> is called a redirection operator and it redirects stdout to a file of your choosing. Specifically, it appends data to that file; it doesn't overwrite it. If no file by that name exists, it is created.
The > operator is very similar EXCEPT that it overwrites data. Any old data in the file will be lost.
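To see the difference between the two operators for yourself, here is a small sketch that works on a scratch file made with mktemp, so nothing you care about gets clobbered. The variable names are just for the demo.

```shell
#!/bin/bash
demo=$(mktemp)                 # scratch file so nothing real gets clobbered

echo "first"  >  "$demo"       # file now holds: first
echo "second" >> "$demo"       # appended, file holds: first, second
after_append=$(cat "$demo")

echo "third"  >  "$demo"       # overwritten, file holds only: third
after_overwrite=$(cat "$demo")

rm -f "$demo"
echo "after >>: $after_append"
echo "after  >: $after_overwrite"
```

This matters for the runaway script: with >> a RESULTS.out left over from yesterday keeps growing, so decide whether each run should start from a clean file.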
Now that we can get to machines let's get that data about those processes.
yourname@machine$ ps
yourname@machine$ ps -ef
yourname@machine$ man ps
As you can see ps is a very powerful tool. Once you have had a chance to read the man page a bit, take a look at the examples portion. You should be able to get some ideas about how to construct your ps.
I am specifically NOT going to tell you what the flags are for ps that you will want. You should read the man page and get some ideas, then experiment with the results locally, and incorporate that into your script.
Look at the requirements document (DCM) and see what matches in the man page. Once you have built your script and come up with ideas that work (on your own) I will be happy to answer any questions about how to make it better.
For now we will practice the fundamentals with ps -ef
But ps gives us some output and we want to be able to separate that output based on results. One way to eliminate some output is with grep. Grep stands for Grab REgular exPressions. No, I am lying, but it should.
So we can use grep to exclude things, like maybe anything that is a daemon or root (if we want), with -v, i.e.:
yourname@machine$ ps -ef | grep -v root > test.out
In reality, if we are looking for runaway processes, do we really want to reject a process because it is being run by root? What if that process owned by root is busted or, worse, shouldn't be running as a root process? I would want to know about a process consuming 100% CPU even if it is run by root.
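There is a second wrinkle with grep -v root: it throws away any line with "root" anywhere in it, not just lines owned by root. The sketch below shows this on canned data shaped like ps -ef output (the users and commands are invented) and compares it with an awk test on the first field only.

```shell
#!/bin/bash
# Invented sample lines in the shape of ps -ef output.
sample='UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Nov25 ?        00:00:02 /sbin/init
alice     4242     1 99 09:14 ?        03:10:55 ./simulate --chroot /tmp/a
bob       5151     1  0 09:20 pts/3    00:00:00 vim notes.txt'

# grep -v drops every line containing "root", including alice's --chroot job.
loose=$(echo "$sample" | grep -v root)

# awk looks only at the first field, so only root-owned lines are dropped.
strict=$(echo "$sample" | awk '$1 != "root"')

echo "grep -v root keeps:"; echo "$loose"
echo "awk on field 1 keeps:"; echo "$strict"
```

Here alice's job is the one burning CPU, and the grep version silently hides it.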
Now we know how to get some results to play with, let's do that.
#! /bin/bash
for machine in `netgrouplist cs-secure-sys`; do
ssh $machine ps -ef >> temp.out
done
# ps -ef labels its first column UID, so that is the header word to skip
awk '($1 != "UID") && ($2 > 32000)' temp.out >> RESULTS.out
rm temp.out
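The awk line in that script works on whitespace-separated fields: $1 is the first column, $2 the second, and so on. Here is a small sketch of the same idea on canned data, so you can experiment without ssh. The sample lines are invented, and the all-digits guard is one way of skipping header lines, not the only one.

```shell
#!/bin/bash
# Canned lines in the shape of ps -ef output, so the sketch runs anywhere.
sample='UID        PID  PPID  C STIME TTY          TIME CMD
alice    40210     1 99 09:14 ?        03:10:55 ./simulate
bob       5151     1  0 09:20 pts/3    00:00:00 vim notes.txt
carol    33007  5151  2 09:31 pts/3    00:00:01 make -j4'

# $2 ~ /^[0-9]+$/ keeps only lines whose second field is all digits, which
# throws away header lines; $2 > 32000 is then a numeric comparison.
# { print $1, $2 } prints just the first two fields of each surviving line.
picked=$(echo "$sample" | awk '$2 ~ /^[0-9]+$/ && $2 > 32000 { print $1, $2 }')
echo "$picked"
```

Swap the field number and the cutoff for whichever column of your own ps output matches the guidelines.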
What did we get?
yourname@machine$ cat RESULTS.out
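One thing those results do not tell you is which machine each line came from, and the objective asks for the machine as well as the user and process. A possible fix is to tag each line inside the loop, sketched below. The function fetch_ps is a hypothetical stand-in that returns canned data so the sketch runs without a network; in the real script you would call ssh there instead.

```shell
#!/bin/bash
# fetch_ps is a hypothetical stand-in so the sketch runs without a network.
# In the real script its body would be:  ssh "$1" ps -ef
fetch_ps() {
    printf '%s\n' \
        'UID        PID  PPID  C STIME TTY          TIME CMD' \
        'alice    40210     1 99 09:14 ?        03:10:55 ./simulate' \
        'bob       5151     1  0 09:20 pts/3    00:00:00 vim notes.txt'
}

out=$(mktemp)
for machine in aaa bbb; do
    # NR > 1 skips the header line of each machine's output;
    # print host, $0 puts the machine name in front of the original line.
    fetch_ps "$machine" | awk -v host="$machine" 'NR > 1 { print host, $0 }' >> "$out"
done
tagged=$(cat "$out")
rm -f "$out"
echo "$tagged"
```

With the machine name in column one, every later awk field number shifts up by one, so adjust your filters to match.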
You now have most of the tools you need to construct a runaway script. You will need to spend more time with ps and awk (or sed) to actually get the info you need and parse it. Consider adding uptime as well.
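As a starting point for uptime, here is a sketch that pulls the one-minute load average out of a line of uptime output using sed and cut. It assumes the Linux style of output ("load average:" followed by comma-separated numbers); the sample line and the cutoff of 4 are invented for the demo, so check the DCM for real limits.

```shell
#!/bin/bash
# Invented sample in the Linux uptime format. In the real script this would
# come from something like:  ssh $machine uptime
sample=' 10:14:01 up 12 days,  3:02,  4 users,  load average: 7.95, 6.10, 2.05'

# Strip everything up to and including "load average: ".
loads=$(echo "$sample" | sed 's/.*load averages*: *//')
# The first comma-separated value is the one-minute average.
one_min=$(echo "$loads" | cut -d, -f1)

max_load=4    # hypothetical cutoff, not taken from the DCM
# bash cannot compare decimals, so let awk do it: exit status 0 means "over".
if awk -v load="$one_min" -v max="$max_load" 'BEGIN { exit !(load > max) }'; then
    busy=yes
else
    busy=no
fi
echo "one-minute load: $one_min  over cutoff: $busy"
```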