A better way to load MongoDB data into a DataFrame using Pandas and PyMongo?

Question

I have a 0.7 GB MongoDB database of tweets that I'm trying to load into a DataFrame. However, the load fails with the following error:

MemoryError:

My code is as follows:

cursor = tweets.find() #Where tweets is my collection
tweet_fields = ['id']
result = DataFrame(list(cursor), columns = tweet_fields)

I've tried the methods in the following answers, each of which at some point builds a list of every document in the collection before loading it into the DataFrame.

https://stackoverflow.com/a/17805626/2297475
https://stackoverflow.com/a/16255680/2297475

However, another answer that discusses list() says this approach only suits small data sets, because everything is loaded into memory.

https://stackoverflow.com/a/13215411/2297475

I think that is the source of the error in my case: there is too much data to load into memory at once. What other method can I use?

Answer 1

Accepted answer

I've modified my code to the following:

cursor = tweets.find(fields=['id'])
tweet_fields = ['id']
result = DataFrame(list(cursor), columns = tweet_fields)

By adding the fields parameter to find(), I restricted the output: instead of loading every field of every tweet, only the selected fields are fetched and loaded into the DataFrame. Everything works fine now. (In PyMongo 3.0 and later, this parameter is named projection.)
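If restricting the fields is still not enough, one further option is to avoid calling list() on the whole cursor and convert the documents in chunks instead. The following is only a sketch under stated assumptions, not part of the original answer: the helper name cursor_to_frame and the chunk_size default are made up for illustration, and the sample documents stand in for a real cursor so that the example runs without a database. The finished DataFrame still has to fit in memory; what this avoids is holding every raw document as a Python dict at the same time.

```python
from itertools import islice

import pandas as pd


def cursor_to_frame(cursor, columns, chunk_size=10_000):
    """Build a DataFrame from an iterable of documents, a chunk at a time.

    `cursor` can be any iterable of dicts, for example a PyMongo cursor.
    Hypothetical helper: the name and the default chunk size are assumptions.
    """
    cursor = iter(cursor)
    frames = []
    while True:
        # Pull at most chunk_size documents; this list is dropped once the
        # chunk has been converted to a DataFrame.
        chunk = list(islice(cursor, chunk_size))
        if not chunk:
            break
        frames.append(pd.DataFrame(chunk, columns=columns))
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# Stand-in documents so the sketch runs without a MongoDB server.
docs = [{"_id": i, "id": 1000 + i, "text": "..."} for i in range(5)]
frame = cursor_to_frame(docs, columns=["id"], chunk_size=2)

# With a real collection (assumed here to be named `tweets`), the call
# would look roughly like this on PyMongo 3.0 or later:
#   cursor = tweets.find({}, projection=["id"])
#   result = cursor_to_frame(cursor, columns=["id"])
```

Combining the projection with chunked conversion keeps both the amount of data sent by the server and the number of intermediate Python objects small; a suitable chunk size depends on the document size and available memory.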
