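cqlsh doesn't allow nested queries, so I can't export just the selected data to CSV. I'm trying to export the data I need (about 200,000 rows with one column) from Cassandra like this: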
echo "SELECT distinct imei FROM listener.snapshots;" > select.cql
bin/cqlsh -f select.cql > output.txt
It just hangs forever without any error, and the file doesn't grow.
If I run strace on the last line, I get a lot of lines like:
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
And --debug only gives me:
cqlsh --debug -f select.cql > output.txt
Using CQL driver:
What's wrong? Is there a better way to get the distinct partition keys from a large C* table?
I use CAPTURE:
cqlsh> CAPTURE 'temp.csv'
Now capturing query output to 'temp.csv'.
cqlsh> SELECT distinct imei FROM listener.snapshots;
---MORE---
---MORE---
---MORE---
---MORE---
.
.
.
cqlsh>
cqlsh>
Then keep pressing Enter until it finishes.
A faster option is to turn paging off:
cqlsh> PAGING off
Disabled Query paging.
cqlsh> CAPTURE 'temp.csv'
Now capturing query output to 'temp.csv'.
cqlsh> SELECT distinct imei FROM listener.snapshots;
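It pulls the data into the file right away (if you get an OperationTimedOut, you should edit the timeout settings in cassandra.yaml).
I can't believe this is the fast way... I know I can export the data with Spark using CassandraSQLContext, but it isn't that fast when I need to build an RDD by querying C* for a single column of a very large table (~2B rows) and print the values to a file: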
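// Imports assumed for the Spark Cassandra Connector 1.x API used below.
import java.io.File

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.cassandra.CassandraSQLContext

val conf = new SparkConf().setAppName("ExtractDistinctImeis")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val connector = CassandraConnector(conf)
val cc = new CassandraSQLContext(sc)

// Run the query through Spark SQL, then pull every IMEI back to the driver.
val snapshots_imeis = cc.sql("select distinct imei from listener.snapshots").map(row => row(0).toString)
val imeis = snapshots_imeis.collect

// Loan pattern: open a PrintWriter, run op on it, and always close it.
def printToFile(f: java.io.File)(op: java.io.PrintWriter => Unit): Unit = {
  val p = new java.io.PrintWriter(f)
  try { op(p) } finally { p.close() }
}

printToFile(new File("/path/to/file.txt")) { p =>
  imeis.foreach(p.println)
}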
With Spark it took 3.5 hours! With CAPTURE I managed to get my file in 3 minutes / 3 seconds.
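For a non-interactive version of the paged CAPTURE approach above, the same query could probably be run from a small script with the DataStax Python driver, which fetches result pages transparently while you iterate. This is only a sketch under assumptions: the contact point 127.0.0.1, the fetch_size of 1000 and the helper names write_first_column and export_distinct_imeis are all made up for illustration, and it has not been timed against the numbers above.

```python
def write_first_column(rows, out):
    """Write the first column of every row to `out`, one value per line.

    `rows` can be any iterable of indexable rows, e.g. a driver result set.
    Returns the number of rows written.
    """
    count = 0
    for row in rows:
        out.write(f"{row[0]}\n")
        count += 1
    return count


def export_distinct_imeis(path, hosts=("127.0.0.1",), fetch_size=1000):
    """Stream the distinct partition keys into `path` (hypothetical helper)."""
    # Imported lazily so write_first_column works without the driver installed.
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(list(hosts))  # assumption: a node reachable on these hosts
    try:
        session = cluster.connect()
        # fetch_size sets the page size; iterating the result pulls the next
        # page on demand, so the full key set is never held in memory.
        stmt = SimpleStatement(
            "SELECT DISTINCT imei FROM listener.snapshots",
            fetch_size=fetch_size,
        )
        with open(path, "w") as out:
            return write_first_column(session.execute(stmt), out)
    finally:
        cluster.shutdown()
```

Calling export_distinct_imeis("/path/to/file.txt") would write one imei per line and return the row count. Because rows are streamed page by page, nothing has to be collected on the client first, unlike the collect call in the Spark job.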