当前位置:  开发笔记 > 编程语言 > 正文

高可用性存储

如何解决《高可用性存储》经验,为你挑选了1个好方法。

我想通过NFS和CIFS提供2 TB左右.我正在寻找一个2(或更多)服务器解决方案,以实现高可用性,并尽可能在服务器之间实现负载平衡.有关群集或高可用性解决方案的任何建议吗?

这是商业用途,计划在未来几年内增长到5-10 TB.我们的设施几乎每天24小时,每周六天.我们可能有15-30分钟的停机时间,但我们希望尽量减少数据丢失.我想尽量减少凌晨3点的电话.

我们目前在Solaris上运行一台带有ZFS的服务器,我们正在考虑用于HA部分的AVS,但是我们在Solaris上遇到了一些小问题(CIFS实现不能与Vista一起使用等).

我们已经开始关注了

GFS上的DRDB(GFS用于分布式锁定功能)

Gluster(需要客户端部分,没有本地CIFS支持吗?)

Windows DFS(doc说文件关闭后只复制?)

我们正在寻找一个提供数据的"黑匣子".

我们目前在ZFS中对数据进行快照,并通过网络将快照发送到远程数据中心进行异地备份.

我们最初的计划是每10到15分钟就有一台第二台机器和rsync.失败的问题在于,正在进行的生产过程将丢失15分钟的数据并留在"中间".从一开始,它们几乎更容易开始,而不是找出中间拾取的位置.这就是驱使我们看待HA解决方案的原因.



1> Tony Dodd..:

我最近使用DRBD作为后端部署了hanfs,在我的情况下,我正在运行主动/待机模式,但我也在主/主模式下使用OCFS2成功测试了它.遗憾的是,关于如何最好地实现这一点的文档并不多,大多数存在的文章几乎没有用处.如果您沿着drbd路线前进,我强烈建议您加入drbd邮件列表,并阅读所有文档.这是我写的ha/drbd设置和脚本来处理ha的失败:


DRBD8是必需的 - 这是由drbd8-utils和drbd8-source提供的.一旦安装了这些(我相信它们是由backports提供的),你可以使用模块助手安装它 - ma ai drbd8.要么depmod -a,要么重新启动,如果你depmod -a,你需要modprobe drbd.

你需要一个后端分区用于drbd,不要制作这个分区LVM,否则你会遇到各种各样的问题.不要将LVM放在drbd设备上,否则你会遇到各种各样的问题.

Hanfs1:

/etc/drbd.conf

global {
        usage-count no;
}
common {
        protocol C;
        disk { on-io-error detach; }
}
resource export {
        syncer {
                rate 125M;
        }
        on hanfs2 {
                address         172.20.1.218:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
        }
        on hanfs1 {
                address         172.20.1.219:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
       }
}

Hanfs2的/etc/drbd.conf:

global {
        usage-count no;
}
common {
        protocol C;
        disk { on-io-error detach; }
}
resource export {
        syncer {
                rate 125M;
        }
        on hanfs2 {
                address         172.20.1.218:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
        }
        on hanfs1 {
                address         172.20.1.219:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
       }
}

配置完成后,我们需要调出drbd.

drbdadm create-md export
drbdadm attach export
drbdadm connect export

我们现在必须执行数据的初始同步 - 显然,如果这是一个全新的drbd集群,那么选择哪个节点并不重要.

完成后,你需要在你的drbd设备上使用mkfs.yourchoiceoffiles - 上面配置中的设备是/ dev/drbd1.http://www.drbd.org/users-guide/p-work.html是一个在使用drbd时阅读的有用文档.

心跳

安装heartbeat2.(非常简单,apt-get install heartbeat2).

每台机器上的/etc/ha.d/ha.cf应包括:

hanfs1:

logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120

ucast eth1 172.20.1.218

auto_failback no

node hanfs1 node hanfs2

hanfs2:

logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120

ucast eth1 172.20.1.219

auto_failback no

node hanfs1 node hanfs2

两个ha盒子上的/etc/ha.d/haresources应该是相同的:

hanfs1 IPaddr::172.20.1.230/24/eth1
hanfs1  HeartBeatWrapper

我写了一个包装器脚本来处理故障转移场景中由nfs和drbd引起的特性.此脚本应存在于每台计算机上的/etc/ha.d/resources.d/中.

!/bin/bash heartbeat fails hard. so this is a wrapper to get around that stupidity I'm just wrapping the heartbeat scripts, except for in the case of umount as they work, mostly

if [[ -e /tmp/heartbeatwrapper ]]; then runningpid=$(cat /tmp/heartbeatwrapper) if [[ -z $(ps --no-heading -p $runningpid) ]]; then echo "PID found, but process seems dead. Continuing." else
echo "PID found, process is alive, exiting."
exit 7
fi
fi

echo $$ > /tmp/heartbeatwrapper

if [[ x$1 == "xstop" ]]; then

/etc/init.d/nfs-kernel-server stop #>/dev/null 2>&1

NFS init script isn't LSB compatible, exit codes are 0 no matter what happens. Thanks guys, you really make my day with this bullshit. Because of the above, we just have to hope that nfs actually catches the signal to exit, and manages to shut down its connections. If it doesn't, we'll kill it later, then term any other nfs stuff afterwards. I found this to be an interesting insight into just how badly NFS is written.

sleep 1

#we don't want to shutdown nfs first!
#The lock files might go away, which would be bad.

#The above seems to not matter much, the only thing I've determined
#is that if you have anything mounted synchronously, it's going to break
#no matter what I do.  Basically, sync == screwed; in NFSv3 terms.      
#End result of failing over while a client that's synchronous is that   
#the client hangs waiting for its nfs server to come back - thing doesn't
#even bother to time out, or attempt a reconnect.                        
#async works as expected - it insta-reconnects as soon as a connection seems
#to be unstable, and continues to write data.  In all tests, md5sums have   
#remained the same with/without failover during transfer.                   

#So, we first unmount /export - this prevents drbd from having a shit-fit
#when we attempt to turn this node secondary.                            

#That's a lie too, to some degree. LVM is entirely to blame for why DRBD
#was refusing to unmount.  Don't get me wrong, having /export mounted doesn't
#help either, but still.                                                     
#fix a usecase where one or other are unmounted already, which causes us to terminate early.

if [[ "$(grep -o /varlibnfs/rpc_pipefs /etc/mtab)" ]]; then                                 
    for ((test=1; test <= 10; test++)); do                                                  
        umount /export/varlibnfs/rpc_pipefs  >/dev/null 2>&1                                
        if [[ -z $(grep -o /varlibnfs/rpc_pipefs /etc/mtab) ]]; then                        
            break                                                                           
        fi                                                                                  
        if [[ $? -ne 0 ]]; then                                                             
            #try again, harder this time                                                    
            umount -l /var/lib/nfs/rpc_pipefs  >/dev/null 2>&1                              
            if [[ -z $(grep -o /varlibnfs/rpc_pipefs /etc/mtab) ]]; then                    
                break                                                                       
            fi                                                                              
        fi                                                                                  
    done                                                                                    
    if [[ $test -eq 10 ]]; then                                                             
        rm -f /tmp/heartbeatwrapper                                                         
        echo "Problem unmounting rpc_pipefs"                                                
        exit 1                                                                              
    fi                                                                                      
fi                                                                                          

if [[ "$(grep -o /dev/drbd1 /etc/mtab)" ]]; then                                            
    for ((test=1; test <= 10; test++)); do                                                  
        umount /export  >/dev/null 2>&1                                                     
        if [[ -z $(grep -o /dev/drbd1 /etc/mtab) ]]; then                                   
            break                                                                           
        fi                                                                                  
        if [[ $? -ne 0 ]]; then                                                             
            #try again, harder this time                                                    
            umount -l /export  >/dev/null 2>&1                                              
            if [[ -z $(grep -o /dev/drbd1 /etc/mtab) ]]; then                               
                break                                                                       
            fi                                                                              
        fi                                                                                  
    done                                                                                    
    if [[ $test -eq 10 ]]; then                                                             
        rm -f /tmp/heartbeatwrapper                                                         
        echo "Problem unmount /export"                                                      
        exit 1                                                                              
    fi                                                                                      
fi                                                                                          


#now, it's important that we shut down nfs. it can't write to /export anymore, so that's fine.
#if we leave it running at this point, then drbd will screwup when trying to go to secondary.  
#See contradictory comment above for why this doesn't matter anymore.  These comments are left in
#entirely to remind me of the pain this caused me to resolve.  A bit like why churches have Jesus
#nailed onto a cross instead of chilling in a hammock.                                           

pidof nfsd | xargs kill -9 >/dev/null 2>&1

sleep 1                                   

if [[ -n $(ps aux | grep nfs | grep -v grep) ]]; then
    echo "nfs still running, trying to kill again"   
    pidof nfsd | xargs kill -9 >/dev/null 2>&1       
fi                                                   

sleep 1

/etc/init.d/nfs-kernel-server stop #>/dev/null 2>&1

sleep 1

#next we need to tear down drbd - easy with the heartbeat scripts
#it takes input as resourcename start|stop|status                
#First, we'll check to see if it's stopped                       

/etc/ha.d/resource.d/drbddisk export status >/dev/null 2>&1
if [[ $? -eq 2 ]]; then                                    
    echo "resource is already stopped for some reason..."  
else                                                       
    for ((i=1; i <= 10; i++)); do                          
        /etc/ha.d/resource.d/drbddisk export stop >/dev/null 2>&1
        if [[ $(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2) == "Secondary/Secondary" ]] || [[ $(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2) == "Secondary/Unknown" ]]; then                                                                                                                             
            echo "Successfully stopped DRBD"                                                                                                             
            break                                                                                                                                        
        else                                                                                                                                             
            echo "Failed to stop drbd for some reason"                                                                                                   
            cat /proc/drbd                                                                                                                               
            if [[ $i -eq 10 ]]; then                                                                                                                     
                    exit 50                                                                                                                              
            fi                                                                                                                                           
        fi                                                                                                                                               
    done                                                                                                                                                 
fi                                                                                                                                                       

rm -f /tmp/heartbeatwrapper                                                                                                                              
exit 0                                                                                                                                                   

elif [[ x$1 == "xstart" ]]; then

#start up drbd first
/etc/ha.d/resource.d/drbddisk export start >/dev/null 2>&1
if [[ $? -ne 0 ]]; then                                   
    echo "Something seems to have broken. Let's check possibilities..."
    testvar=$(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2)       
    if [[ $testvar == "Primary/Unknown" ]] || [[ $testvar == "Primary/Secondary" ]]
    then                                                                           
        echo "All is fine, we are already the Primary for some reason"             
    elif [[ $testvar == "Secondary/Unknown" ]] || [[ $testvar == "Secondary/Secondary" ]]
    then                                                                                 
        echo "Trying to assume Primary again"                                            
        /etc/ha.d/resource.d/drbddisk export start >/dev/null 2>&1                       
        if [[ $? -ne 0 ]]; then                                                          
            echo "I give up, something's seriously broken here, and I can't help you to fix it."
            rm -f /tmp/heartbeatwrapper                                                         
            exit 127                                                                            
        fi                                                                                      
    fi                                                                                          
fi                                                                                              

sleep 1                                                                                         

#now we remount our partitions                                                                  

for ((test=1; test <= 10; test++)); do                                                          
    mount /dev/drbd1 /export >/tmp/mountoutput                                                  
    if [[ -n $(grep -o export /etc/mtab) ]]; then                                               
        break                                                                                   
    fi                                                                                          
done                                                                                            

if [[ $test -eq 10 ]]; then                                                                     
    rm -f /tmp/heartbeatwrapper                                                                 
    exit 125                                                                                    
fi                                                                                              


#I'm really unsure at this point of the side-effects of not having rpc_pipefs mounted.          
#The issue here, is that it cannot be mounted without nfs running, and we don't really want to start
#nfs up at this point, lest it ruin everything.                                                     
#For now, I'm leaving mine unmounted, it doesn't seem to cause any problems.                        

#Now we start up nfs.

/etc/init.d/nfs-kernel-server start >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "There's not really that much that I can do to debug nfs issues."
    echo "probably your configuration is broken.  I'm terminating here."
    rm -f /tmp/heartbeatwrapper
    exit 129
fi

#And that's it, done.

rm -f /tmp/heartbeatwrapper
exit 0

elif [[ "x$1" == "xstatus" ]]; then

#Lets check to make sure nothing is broken.

#DRBD first
/etc/ha.d/resource.d/drbddisk export status >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

#mounted?
grep -q drbd /etc/mtab >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

#nfs running?
/etc/init.d/nfs-kernel-server status >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

echo "running"
rm -f /tmp/heartbeatwrapper
exit 0

fi

完成上述所有操作后,您只需要配置/ etc/exports即可

/export 172.20.1.0/255.255.255.0(rw,sync,fsid=1,no_root_squash)

然后,这只是在两台机器上启动心跳并在其中一台机器上发出hb_takeover的情况.您可以通过确保您发出接管的那个是主要的 - 检查/ proc/drbd,设备是否正确安装以及您可以访问nfs来测试它是否正常工作.

-

祝你好运.对我来说,从头开始设置它是一种非常痛苦的经历.

推荐阅读
小色米虫_524
这个屌丝很懒,什么也没留下!
DevBox开发工具箱 | 专业的在线开发工具网站    京公网安备 11010802040832号  |  京ICP备19059560号-6
Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved devBox.cn 开发工具箱 版权所有