Sun Circle

Friday, October 19, 2012

Tuesday, May 8, 2012

Show the files changed in a commit: git show --name-only [SHA]
Show the files that differ between two commits: git diff --name-only SHA1 SHA2
Show those files with their change status (A, M, D, ...): git diff --name-status SHA1 SHA2
Show the log with each commit's changed files and their status: git log --name-status [SHA]

Saturday, July 9, 2011

Using dm-crypt and ecryptfs for data encryption

First of all, dm-crypt is a device-mapper target in the Linux kernel that provides an encrypted block device. eCryptfs is a stacked (layered) filesystem that sits on top of another filesystem and provides an encrypted view of it.

dm-crypt: To use dm-crypt, it is better to use the cryptsetup tool than the original dmsetup tool for device mapper. On an Ubuntu box, you need to install it and the hashalot tool first:
sudo apt-get install cryptsetup hashalot
Then run modprobe dm-crypt to load the kernel module. To use a device, say /dev/sdb1, create a mapping like this (su to root to avoid sudo):
cryptsetup -c [cipher-string] -b `blockdev --getsize /dev/sdb1` -h ripemd160 create mapper1 /dev/sdb1
You can find more information in man cryptsetup. Here -c specifies the cipher in the format <cipher>-<mode>-<IV generator>, e.g. aes-cbc-plain uses the AES algorithm in CBC cipher mode with a 'plain' IV generator. -h ripemd160 specifies the hash used to derive the key from your passphrase, which the cryptsetup command above will ask for. This command creates a mapping called mapper1 under /dev/mapper. You can then treat it as a hard drive: create a filesystem on it first if needed, then mount it.
The current dm-crypt doesn't allow ECB mode. However, you can easily find the code that prevents ECB in /drivers/md/dm-crypt.c and comment it out.

To delete the mapping:

cryptsetup remove mapper1

eCryptfs is simpler. Just take any empty folder, say /home/name/data, and mount eCryptfs on it:
mount -t ecryptfs /home/name/data /home/name/data
This will hide /home/name/data behind the encrypted view provided by eCryptfs. The mount command will ask you some questions; just answer them as you like. Oh, and don't forget to modprobe ecryptfs to load the kernel module first.

Now you might want to measure the performance. Every benchmark has its own features, but one important thing is to clear the page cache before each read test, so that reads actually hit the disk:
echo 3 > /proc/sys/vm/drop_caches

By the way, both of them work well with KGPU's AES cipher with a little change.

More tips:

OK, more about RAID:

To set up a RAID6 array with 10 disks and a 128KB chunk size using RAM disks:

mdadm --create /dev/md0 /dev/ram[0-9] -n 10 -l 6 --chunk=128

To mark a disk as faulty:

mdadm --manage --set-faulty /dev/md0 /dev/ramX

To stop and tear down the array:

mdadm --stop /dev/md0

And about dd:

Use iflag=direct (for reads) or oflag=direct (for writes) to bypass the page cache and test real disk performance.

Tuesday, June 21, 2011

Tips from Perl SNMP and Expect project

First of all, a good place to obtain MIBs and query them is SNMP-Link: http://www.snmplink.org/OnLineMIB/.
To use Expect with Perl, here is a code snippet from the Emulab snmpit tool:


    my $spawn_cmd = "telnet $self->{NAME}";
    my $exp = new Expect();
    if (!$exp) {
        # upper layer will check this
        return undef;
    }
    $exp->raw_pty(0);
    $exp->log_stdout(0);
    $exp->spawn($spawn_cmd)
    or die "Cannot spawn $spawn_cmd: $!\n";

    # have to send a newline to make login prompt come out
    $exp->send("\n");
    $exp->expect($CLI_TIMEOUT,
         ["login:", sub {
             my $e = shift;
             $e->send("admin\n");
             exp_continue;}],
         ["password:", sub {
             my $e = shift;
             $e->send($self->{PASSWORD}."\n");
             exp_continue;}],
         [timeout => sub {
             my $e = shift;
             $e->hard_close();
             die "timeout when connect to switch!\n";}],
         [qr/$self->{CLI_PROMPT}/, sub {;}]);

    # Disable paging in CLI output so that no "next page"
    # will bother us.
    $exp->send("disable clipaging\n");
    $exp->expect($CLI_TIMEOUT,
         [qr/$CLI_ERROR_PATTERN/,
          sub {
              my $e = shift;
              die "Incorrect command and result: ".$e->match().
              $e->after()."\n";}],
         [qr/$CLI_PROMPT/ => sub {;}],
         [timeout => sub {
             my $e = shift;
             $e->hard_close();
             die "timeout when connect to switch!\n";}]);

Tuesday, April 19, 2011

Introducing KGPU

The OS augmentation with GPU I mentioned in the last post turns out to be a framework on Linux, the KGPU system. It is now available on Github and Google Code.

To run it, a CUDA-enabled GPU is required, preferably one with compute capability >= 2.0. There is a demo GPU service that provides the AES algorithm, and a block cipher for the Linux kernel that lets other kernel code call the GPU cipher. There is also a modified eCryptfs, an encrypted filesystem in Linux; this special eCryptfs calls the GPU cipher in place of the CPU one. When reading and writing with large buffers (>= 8 or 16KB), it can be 1.7x to 2.5x faster, or even more. You can run IOzone on it to see how much faster the GPU-cipher-based eCryptfs runs than the CPU one.

We are going to implement the next demo app for KGPU. If you are interested in it, whether in implementing a new app or a new service, feel free to contact me. (The KGPU framework is quite simple now, so I hope others can figure out how to add services and apps easily... but if not, I can help you.)

Friday, March 11, 2011

Lessons learned from the HotOS rejection

We submitted a paper to HotOS'11 and it was recently rejected. Although it is not comfortable to admit the weaknesses pointed out by the reviewers, the lessons are still valuable for future research and academic life.

I rethought the points we made and the contributions we emphasized in that paper, and was surprised to find that we had actually headed in the wrong direction, one with very little overlap with the interests of HotOS.

About the paper: The paper is about augmenting OSes with the GPU. We used some programming tricks to make GPU code callable from a Linux kernel.

Some lessons:

We filled most of the paper with technical challenges and solutions, yet those challenges can easily be eliminated by very-near-future GPU development. More surprisingly, we even pointed out the near-future solutions to those challenges ourselves, but still spent almost two of the five allowed pages on fluff to show off our techniques. This is definitely not acceptable for HotOS, which focuses on long-term trends and solutions. The review comments also pointed out this misdirected writing.
We failed to identify the problem. Only recently, while revising my research proposal, did I realize that we never stated any problem in that paper. The wide deployment of GPUs and the fact that OSes ignore them are absolutely not problems! They are the current state. The problem is that, as applications evolve, the OS needs more computing resources than current CPUs can provide; moreover, current OSes are not designed around partitioning-based parallel algorithms, so they cannot process a single large request by splitting it and handling the pieces on different cores. Simply put, the lack of computing resources and the obsolete parallel design of OSes are the two problems we need to solve. It just so happens that GPUs are widely deployed and mostly idle, and that they have massively parallel processing power.
We failed to defend ourselves by explaining why GPUs will make a difference when previously existing cryptographic accelerators did not. The answer is simple: GPUs are widely deployed and available on almost every machine, while other accelerators are not. We did mention the wide deployment of GPUs, but we did not make the comparison EXPLICITLY.
I did not discuss the idea with other experts and professionals to get responses and comments for improving and consolidating it. An original idea can easily be a weak one. Comments, feedback, and criticism would let me strengthen it against attack, perhaps by changing it into a totally different idea or direction.
I need to do things as early as possible rather than putting them off until the deadline, so that there is time to circulate the work for feedback and comments, and hence to improve it.

Hope it will be getting better.

The paper link (pdf).

Tuesday, February 15, 2011

A not-so-standard non-blocking I/O implementation

It is not easy to write a proc file module, a char device, or similar filesystem-related code with non-blocking I/O support. But if you happen to have full control of both the userspace and the kernel-space code, there is a trick for getting non-blocking behavior.

Simply return -EAGAIN from the read syscall implementation as soon as you find that no data is available. For example:

static int reqfs_read(char *buf, char **bufloc,
                      off_t offset, int buflen,
                      int *eof, void *data)
{
    int ret;
    struct list_head *r;
    struct kgpu_req *req = NULL;
    ret = 0;

    spin_lock(&reqlock);

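    if (!list_empty(&reqs)) {
        /* Dequeue the oldest pending request. */
        r = reqs.next;
        list_del(r);
        req = list_entry(r, struct kgpu_req, list);
        if (req) {
            /*
             * Caution: copy_to_user() may sleep, so production code
             * should call it after dropping the spinlock and should
             * check its return value.
             */
            copy_to_user(buf, (char*)&(req->kureq), sizeof(struct ku_request));
            ret = sizeof(struct ku_request);
        }
    } else {
        /* Nothing queued: ask userspace to try again later. */
        ret = -EAGAIN;
    }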

    spin_unlock(&reqlock);

    if (ret > 0 && req) {
        spin_lock(&rtdreqlock);

        INIT_LIST_HEAD(&req->list);
        list_add_tail(&req->list, &rtdreqs);

        spin_unlock(&rtdreqlock);
    }

    return ret;
}

Ignore the locking and list handling. If no data is available, which is indicated by an empty list, the read of the proc fs item returns -EAGAIN to tell the userspace code to try again. In userspace, the read syscall returns -1 and errno is set to EAGAIN. Both the return value and errno must be checked explicitly, so that the try-again response is not treated as a real error.
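
The userspace side of this trick can be sketched as follows. This is only a minimal illustration, not the KGPU code: try_read_request is a hypothetical helper name, and it assumes the file descriptor behaves like the proc entry above, failing with EAGAIN when nothing is queued.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of KGPU): try to read one request.
 * Returns  1 if data was read (*nread holds the byte count),
 *          0 if the kernel side reported "no data yet" (EAGAIN),
 *         -1 on a genuine error (errno is left set by read()).
 */
static int try_read_request(int fd, void *buf, size_t len, ssize_t *nread)
{
    ssize_t n = read(fd, buf, len);

    if (n >= 0) {
        *nread = n;
        return 1;
    }
    /* read() returned -1: only now is errno meaningful. */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;   /* not a fault, just try again later */
    return -1;
}
```

A caller would typically loop on this helper, sleeping or doing other work whenever it returns 0, and only bail out when it returns -1.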

Sunday, January 9, 2011

Use kernel crypto lib

An Emacs tip first: M-q reformats (fills) the current paragraph.

I only dealt with encryption and decryption. Compression and hashing seem similar, but they still need some hacking of their own if you try to use them.

First you need to allocate a cipher, which is identified by both the cipher mode (say ECB) and the cipher algorithm (say AES). The API is struct crypto_blkcipher *crypto_alloc_blkcipher(char* name, type, mask). I have no idea about the type and mask, so I simply set them to 0. The name has the format "CipherMode(CipherAlgorithm)", e.g. "ecb(aes)". You can find examples in /proc/crypto; load all the modules from /lib/modules/kernel-version/kernel/crypto and then check /proc/crypto to see more. An algorithm has two IDs: name and driver. The name may be shared by several implementations, such as a hardware-accelerated one, but the driver is unique. So you can specify the driver to allocate exactly what you want, e.g. "ecb(aes-generic)" to avoid getting "ecb(aes-asm)", since 'aes-asm' has a higher priority than 'aes-generic' when a cipher is allocated.

Data is represented by a scatterlist, which you can think of as a triple (page_address, offset, len). It is a struct scatterlist. You can initialize an array of scatterlist objects with sg_init_table(scatterlist*, n). Don't be confused by the name scatterlist: each object is actually a single element, because it just holds the three important fields described by the triple. To set the triple, use sg_set_buf(scatterlist*, void* buf, len), which makes this scatterlist element represent a buffer that starts at buf and has a length of len.
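
To make the triple concrete, here is a small userspace model of it. This is not the kernel's struct scatterlist or sg_set_buf (the real element stores a page reference rather than an address); struct sg_model, sg_model_set_buf, and the 4 KB page size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KB pages */

/* Userspace stand-in for the (page_address, offset, len) triple. */
struct sg_model {
    uintptr_t page_addr;    /* start of the page that contains the buffer */
    unsigned int offset;    /* where the buffer starts inside that page */
    unsigned int length;    /* how many bytes this element describes */
};

/* Model of "describe the buffer at buf with length len". */
static void sg_model_set_buf(struct sg_model *sg, const void *buf,
                             unsigned int len)
{
    uintptr_t addr = (uintptr_t)buf;

    sg->page_addr = addr & ~(uintptr_t)(MODEL_PAGE_SIZE - 1);
    sg->offset = (unsigned int)(addr & (MODEL_PAGE_SIZE - 1));
    sg->length = len;
}
```

In this model a whole-page buffer, like the ones used in the test further down, always ends up with offset 0 and length PAGE_SIZE.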

Before encryption or decryption, call crypto_blkcipher_setkey(struct crypto_blkcipher*, char* key, key_len) to set a key of length key_len for the cipher. Then save the cipher in a struct blkcipher_desc. (See the code)

Then you can do real things with crypto_blkcipher_encrypt(struct blkcipher_desc*, scatterlist *dst, scatterlist *src, len) and crypto_blkcipher_decrypt(struct blkcipher_desc*, scatterlist *dst, scatterlist *src, len).

Finally, free the cipher with crypto_free_blkcipher(struct crypto_blkcipher*).

Here is an example test of AES128-ECB:

#include <linux/module.h>
#include <linux/init.h>
#include <linux/types.h>
#include <linux/crypto.h>
#include <linux/scatterlist.h>
#include <linux/gfp.h>
#include <linux/err.h>
#include <linux/timex.h>

#define MAX_BLK_SIZE (64*1024*1024)
#define MIN_BLK_SIZE (16)

#define TEST_TIMES 100

/* Test ECB(AES-128) */
void test_aes(void)
{
    struct crypto_blkcipher *tfm;
    struct blkcipher_desc desc;
    u32 bs;
    int i,j;
    u32 npages;
    
    struct scatterlist *src;
    struct scatterlist *dst;
    char *buf;
    char **ins, **outs;
    
    unsigned int ret;
    
    u8 key[] = {0x00, 0x01, 0x02, 0x03, 0x05, 0x06, 0x07, 
        0x08, 0x0A, 0x0B, 0x0C, 0x0D, 0x0F, 0x10, 0x11, 0x12};
    
    npages = MAX_BLK_SIZE/PAGE_SIZE;
    
    src = kmalloc(npages*sizeof(struct scatterlist), __GFP_ZERO|GFP_KERNEL);
    if (!src) {
        printk("taes ERROR: failed to alloc src\n");        
        return;
    }
    dst = kmalloc(npages*sizeof(struct scatterlist), __GFP_ZERO|GFP_KERNEL);
    if (!dst) {
        printk("taes ERROR: failed to alloc dst\n");
        kfree(src);        
        return;
    }
    ins = kmalloc(npages*sizeof(char*), __GFP_ZERO|GFP_KERNEL);
    if (!ins) {
        printk("taes ERROR: failed to alloc ins\n");
        kfree(src);
        kfree(dst);        
        return;
    }
    outs = kmalloc(npages*sizeof(char*), __GFP_ZERO|GFP_KERNEL);
    if (!outs) {
        printk("taes ERROR: failed to alloc outs\n");
        kfree(src);
        kfree(dst);
        kfree(ins);        
        return;
    }
    
    tfm = crypto_alloc_blkcipher("ecb(aes-generic)", 0, 0);
    
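    if (IS_ERR(tfm)) {
        printk("failed to load transform for %s: %ld\n", "ecb(aes-generic)",
            PTR_ERR(tfm));
        /* tfm is an error pointer here, so free only the arrays */
        kfree(src);
        kfree(dst);
        kfree(ins);
        kfree(outs);
        return;
    }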
    desc.tfm = tfm;
    desc.flags = 0;
    
    ret = crypto_blkcipher_setkey(tfm, key, sizeof(key));
    if (ret) {
        printk("setkey() failed flags=%x\n",
                crypto_blkcipher_get_flags(tfm));
         goto out;
    }
    
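    /* both tables must be initialized before sg_set_buf() is used on them */
    sg_init_table(src, npages);
    sg_init_table(dst, npages);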
    for (i=0; i<npages; i++) {
        buf = (void *)__get_free_page(GFP_KERNEL);
        if (!buf) {
            printk("taes ERROR: alloc free page error\n");
            goto free_err_pages;
        }
        ins[i] = buf;
        sg_set_buf(src+i, buf, PAGE_SIZE);
        buf = (void *)__get_free_page(GFP_KERNEL);
        if (!buf) {
            printk("taes ERROR: alloc free page error\n");
            goto free_err_pages;
        }
        outs[i] = buf;
        sg_set_buf(dst+i, buf, PAGE_SIZE);
    }
    
    for (bs = MIN_BLK_SIZE; bs <= MAX_BLK_SIZE; bs*=2) {
        struct timeval t0, t1;
        long int enc, dec;

        do_gettimeofday(&t0);
        for (j=0; j<TEST_TIMES; j++) {            
            if (j%2==0)
                ret = crypto_blkcipher_encrypt(&desc, dst, src, bs);
            else
                ret = crypto_blkcipher_encrypt(&desc, src, dst, bs);
            if (ret) {
                printk("taes ERROR: enc error\n");
                goto free_err_pages;
            }
        }
        do_gettimeofday(&t1);
        enc = 1000000*(t1.tv_sec-t0.tv_sec) + 
            ((int)(t1.tv_usec) - (int)(t0.tv_usec));
            
        do_gettimeofday(&t0);
        for (j=0; j<TEST_TIMES; j++) {            
            if (j%2==0)
                ret = crypto_blkcipher_decrypt(&desc, src, dst, bs);
            else
                ret = crypto_blkcipher_decrypt(&desc, dst, src, bs);
            if (ret) {
                printk("taes ERROR: dec error\n");
                goto free_err_pages;
            }
        }
        do_gettimeofday(&t1);
        dec = 1000000*(t1.tv_sec-t0.tv_sec) + 
            ((int)(t1.tv_usec) - (int)(t0.tv_usec));
        
        printk("Size %u, enc %ld, dec %ld\n",
            bs, enc, dec);
    }
    
    
free_err_pages:
    for (i=0; i<npages && ins[i]; i++){        
        free_page((unsigned long)ins[i]);
    }
    for (i=0; i<npages && outs[i]; i++){
        free_page((unsigned long)outs[i]);
    }
out:
    kfree(src);
    kfree(dst);
    kfree(ins);
    kfree(outs);
    crypto_free_blkcipher(tfm);    
}

Tuesday, December 21, 2010

Convert userspace virtual address to physical address, x86_64

It is somewhat strange that kernel module developers have to write their own function to convert a userspace VA to a PA. It is not very hard, but it is not so easy either if you are not familiar with the complex types that represent the page tables.

It is much easier to convert a kernel VA to a PA, but userspace is a different story, especially for an mmap-ed memory area that is mapped by a device driver, in which case get_user_pages can't work.

OK, I wrote one and am posting it here in case someone, or I, will use it in the future. It can only handle x86_64, which has a full four-level page table. I didn't include any headers, because headers always change.

static int bad_address(void *p)
{
    unsigned long dummy;
    return probe_kernel_address((unsigned long*)p, dummy);
}

/*
 * map any virtual address of the current process to its
 * physical one.
 */
static unsigned long any_v2p(unsigned long vaddr)
{
    pgd_t *pgd = pgd_offset(current->mm, vaddr);
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;

    /* to lock the page */
    struct page *pg;
    unsigned long paddr = 0; /* returned as 0 when an entry is not present */

    if (bad_address(pgd)) {
        printk(KERN_ALERT "[nskk] Alert: bad address of pgd %p\n", pgd);
        goto bad;
    }
    if (!pgd_present(*pgd)) {
        printk(KERN_ALERT "[nskk] Alert: pgd not present %lx\n", pgd_val(*pgd));
        goto out;
    }

    pud = pud_offset(pgd, vaddr);
    if (bad_address(pud)) {
        printk(KERN_ALERT "[nskk] Alert: bad address of pud %p\n", pud);
        goto bad;
    }
    if (!pud_present(*pud) || pud_large(*pud)) {
        printk(KERN_ALERT "[nskk] Alert: pud not present or large %lx\n", pud_val(*pud));
        goto out;
    }

    pmd = pmd_offset(pud, vaddr);
    if (bad_address(pmd)) {
        printk(KERN_ALERT "[nskk] Alert: bad address of pmd %p\n", pmd);
        goto bad;
    }
    if (!pmd_present(*pmd) || pmd_large(*pmd)) {
        printk(KERN_ALERT "[nskk] Alert: pmd not present or large %lx\n", pmd_val(*pmd));
        goto out;
    }

    pte = pte_offset_kernel(pmd, vaddr);
    if (bad_address(pte)) {
        printk(KERN_ALERT "[nskk] Alert: bad address of pte %p\n", pte);
        goto bad;
    }
    if (!pte_present(*pte)) {
        printk(KERN_ALERT "[nskk] Alert: pte not present %lx\n", pte_val(*pte));
        goto out;
    }

    pg = pte_page(*pte);
    paddr = (pte_val(*pte) & PHYSICAL_PAGE_MASK) | (vaddr&(PAGE_SIZE-1));

out:
    return paddr;
bad:
    printk(KERN_ALERT "[nskk] Alert: Bad address\n");
    return 0;
}
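
For intuition, the arithmetic behind the walk above can be reproduced in plain userspace C. This is only a sketch under stated assumptions: 4 KB pages, four levels with 9 index bits each, and a page-frame field in bits 12 to 51 of the PTE (the kernel's PHYSICAL_PAGE_MASK may be narrower depending on the version). The names split_va, make_paddr, and MODEL_FRAME_MASK are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the page frame lives in bits 12..51 of a PTE value. */
#define MODEL_FRAME_MASK 0x000ffffffffff000ULL

/* Index into each of the four tables, plus the offset inside the page. */
struct va_parts {
    unsigned int pgd, pud, pmd, pte;
    unsigned int offset;
};

/* Slice a virtual address into four 9-bit indexes and a 12-bit offset. */
static struct va_parts split_va(uint64_t vaddr)
{
    struct va_parts p;

    p.pgd = (unsigned int)((vaddr >> 39) & 0x1ff);
    p.pud = (unsigned int)((vaddr >> 30) & 0x1ff);
    p.pmd = (unsigned int)((vaddr >> 21) & 0x1ff);
    p.pte = (unsigned int)((vaddr >> 12) & 0x1ff);
    p.offset = (unsigned int)(vaddr & 0xfff);
    return p;
}

/* Same formula as the last step of any_v2p: frame bits | page offset. */
static uint64_t make_paddr(uint64_t pte_value, uint64_t vaddr)
{
    return (pte_value & MODEL_FRAME_MASK) | (vaddr & 0xfff);
}
```

For example, the address 0x7f1234567abc splits into indexes 0xfe, 0x48, 0x1a2, 0x167 and offset 0xabc. The flag bits in the low 12 bits of the PTE and any high bits such as NX are masked away before the offset is merged in.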