I think I'm missing something but can't figure out what. I want to load data through SQLContext and JDBC using a specific SQL statement, such as:
select top 1000 text from table1 with (nolock) where threadid in ( select distinct id from table2 with (nolock) where flag=2 and date >= '1/1/2015' and userid in (1, 2, 3) )
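Which SQLContext method should I use? The examples I've seen always specify a table name along with a lower bound and an upper bound.
Thanks in advance.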
You should pass a valid subquery as the dbtable argument. For example, in Scala:
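// The subquery must be wrapped in parentheses and given an alias*.
// Keep SQL comments off the last line of the string: Spark appends
// its own clauses after whatever is passed as dbtable.
val query = """(SELECT TOP 1000
  -- and the rest of your query
  -- ...
) AS tmp"""
val url: String = ??? // JDBC connection URL of your database
val jdbcDF = sqlContext.read.format("jdbc")
  .options(Map("url" -> url, "dbtable" -> query))
  .load()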
* Hive Language Manual, SubQueries: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
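Applied to the query from the question, the dbtable value could be built roughly as follows. This is only a sketch: the object name DbtableSubquery, the helper asDbtable and the alias tmp are made up for illustration and are not part of Spark's API. Only the resulting string is handed to Spark.

```scala
object DbtableSubquery {
  // Hypothetical helper (not a Spark API): wraps a SELECT statement in
  // parentheses and appends an alias, so the result can be used wherever
  // a table name is expected in a FROM clause.
  def asDbtable(sql: String, alias: String = "tmp"): String = {
    val body = sql.trim.stripSuffix(";") // drop a trailing semicolon, if any
    s"($body) AS $alias"
  }

  // The query from the question, unchanged apart from line breaks.
  val sql: String =
    """select top 1000 text from table1 with (nolock)
      |where threadid in (
      |  select distinct id from table2 with (nolock)
      |  where flag=2 and date >= '1/1/2015' and userid in (1, 2, 3)
      |)""".stripMargin

  // Value to pass as the "dbtable" option.
  val query: String = asDbtable(sql)

  def main(args: Array[String]): Unit = println(query)
}
```

The resulting query string goes into the dbtable option exactly as in the read call above. The lower and upper bounds seen in other examples belong to the optional partitioning parameters and can be omitted when a single partition is acceptable.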