A First Look at Sqoop: A Handy Tool for Exchanging Data Between HDFS and Relational Databases

Author: 个性2402852463 | 2021-08-26 23:35


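Sqoop is a tool for transferring data between Hadoop and an RDBMS.
Configuration is fairly simple.
Download the latest Sqoop package from the Apache website.
Download URL: http://www.apache.org/dist/sqoop/1.99.1/
Note that the sqoop import command line used below belongs to the Sqoop 1 client (the 1.4.x releases); the 1.99.x line is Sqoop 2, which has a different architecture and command set.
Extract the package on the server. The server must already have a JDK, Hadoop, and Hive installed.
Configuration:
conf/sqoop-env.sh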
#Set path to where bin/hadoop is available
export HADOOP_HOME=/home/hadoop/hadoop-0.20.205.0
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive-0.8.1
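At this point you can start experimenting. We mainly use Sqoop together with Hive: in practice, data from a relational database is loaded into Hive and stored in HDFS, where it is available for large-scale computation.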
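
Before the first run it can help to confirm that the two paths set in conf/sqoop-env.sh really point at working installations. The following is a minimal sketch under that assumption; check_home is a hypothetical helper, not part of Sqoop, and it only checks that an executable launcher exists under bin/.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sqoop): verify that a *_HOME directory
# contains the expected launcher under bin/.
check_home() {
    name=$1   # variable name, used only in messages
    dir=$2    # directory the variable points to
    tool=$3   # launcher expected at $dir/bin/$tool
    if [ -z "$dir" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -x "$dir/bin/$tool" ]; then
        echo "$name=$dir has no executable bin/$tool"
        return 1
    fi
    echo "$name ok"
}

# Usage, with the variables exported by conf/sqoop-env.sh:
# check_home HADOOP_HOME "$HADOOP_HOME" hadoop
# check_home HIVE_HOME "$HIVE_HOME" hive
```

The function returns a non-zero status on failure, so it can guard a longer script with `check_home ... || exit 1`.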

Sqoop provides the following commands (or functions):

  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

Here we mainly use the import command. The export command has similar syntax.


Example

./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-drop-import-delims --hive-overwrite --num-mappers 6


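The command above imports the data of the tablename table in the local database dbname into the hivetable table of hivedb.

The most common parameters are not explained here; the Hive-related ones are:

--hive-import: marks Hive as the destination of this import.

--hive-table: names the target table in Hive.

--hive-drop-import-delims: this one is important. When data is imported from the database into HDFS, special characters in the data cause problems when MapReduce parses the records. For example, a text column in the database may contain characters such as \t or \n. With this option, Sqoop drops \n, \r, and \01 from string fields during the import.

--hive-overwrite: overwrites the existing data if the Hive table already exists.

--num-mappers: sets the number of mapper tasks that run this import.
Another important parameter is --direct. It imports data through the database's own dump facility, which performs better than the example above, but it cannot be combined with --hive-drop-import-delims. Which form to use therefore depends on your own database.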
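
To see why dropping delimiters matters, the sketch below writes one record whose text field contains a newline and counts the resulting lines, then repeats the count after removing \n, \r, and \001. The helper names record_lines and drop_delims and the sample value are hypothetical, and tr only mimics the effect of the option; it is not how Sqoop implements it.

```shell
# Count the text lines produced when one record (id + text field) is written out.
record_lines() {
    printf '1\t%s\n' "$1" | wc -l | tr -d ' '
}

# Mimic the effect of the option: drop \n, \r and \001 from a string field.
drop_delims() {
    printf '%s' "$1" | tr -d '\n\r\001'
}

# Hypothetical value of a TEXT column that contains an embedded newline.
field=$(printf 'first line\nsecond line')

record_lines "$field"                    # the single record spills over two lines
record_lines "$(drop_delims "$field")"   # one line per record again
```

A line-oriented reader would treat the first output as two rows, which is the kind of parsing problem the option is meant to avoid.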

The arguments of the sqoop import command are listed below.

Common arguments:

Argument                   Description
--connect                  Specify JDBC connect string
--connection-manager       Specify connection manager class to use
--driver                   Manually specify JDBC driver class to use
--hadoop-home              Override $HADOOP_HOME
--help                     Print usage instructions
-P                         Read password from console
--password                 Set authentication password
--username                 Set authentication username
--verbose                  Print more information while working
--connection-param-file    Optional properties file that provides connection parameters

Hive arguments:

Argument                     Description
--hive-home                  Override $HIVE_HOME
--hive-import                Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite             Overwrite existing data in the Hive table
--create-hive-table          If set, the job fails if the target Hive table exists. By default this property is false.
--hive-table                 Sets the table name to use when importing to Hive
--hive-drop-import-delims    Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement    Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key         Name of the Hive field on which partitions are sharded
--hive-partition-value       String value that serves as the partition key for the data imported into Hive in this job
--map-column-hive            Override default mapping from SQL type to Hive type for configured columns


Some reference examples follow.

Filtering with a condition

sqoop import --table test --columns "id,name" --where "id>400"

Using the dump facility

sqoop import --connect jdbc:mysql://server.foo.com/db --table bar --direct -- --default-character-set=latin1

Redefining column types

sqoop import ... --map-column-java id=String,value=Integer

Defining delimiters

sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --fields-terminated-by '\t' --lines-terminated-by '\n' --optionally-enclosed-by '\"'
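
Since --direct cannot be combined with --hive-drop-import-delims, a small wrapper can make the choice explicit. The sketch below only prints the command it would run; build_import and its mode names "direct" and "clean" are hypothetical, and the connection values are the placeholders used earlier in this article. It uses -P so that the password is read from the console instead of appearing on the command line.

```shell
# Hypothetical helper: print a sqoop import command for a Hive import.
# Mode "direct" uses the database dump path; mode "clean" drops Hive
# delimiters from string fields. The two are kept mutually exclusive.
build_import() {
    mode=$1
    jdbc_url=$2
    table=$3
    hive_table=$4
    user=$5
    case "$mode" in
        direct) extra="--direct" ;;
        clean)  extra="--hive-drop-import-delims" ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # -P makes Sqoop read the password from the console.
    echo "sqoop import --connect $jdbc_url --username $user -P" \
         "--table $table --hive-import --hive-table $hive_table $extra"
}

# Dry run with the placeholder names from the example earlier in the article.
build_import clean jdbc:mysql://localhost:3306/dbname tablename hivedb.hivetable dbuser
```

Printing rather than executing keeps the sketch safe to try on a machine without Sqoop; the printed line can be reviewed and then run by hand.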

    
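Original article: "A First Look at Sqoop: A Handy Tool for Exchanging Data Between HDFS and Relational Databases". Thanks to the original author for sharing.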




    

    

    