IT박스

Reducing MongoDB database file size

itboxs 2020. 6. 2. 19:01

I once had a large (>3 GB) MongoDB database. Since then documents have been deleted, and I expected the database files to shrink accordingly.

But MongoDB keeps the allocated space, so the files are still large.

I read here that the admin command mongod --repair is used to free unused space, but I don't have enough free disk space to run it.

Does anyone know a way to reclaim the unused space?


Update: with the compact command and WiredTiger, it looks like the extra disk space will actually be released to the OS.


Update: in v1.9+ there is a compact command.

This command performs the compaction "inline". It still needs some extra space, but not nearly as much.


MongoDB compacts files by:

  • Copying the files to a new location
  • Looping through the documents and reordering / rewriting them
  • Replacing the original files with the new ones
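Conceptually, the copy-rewrite-replace cycle above looks like this (a minimal Python sketch that simulates a data file as a list of text records; the "DELETED" record format is invented for illustration and is not MongoDB's on-disk layout):

```python
import os
import tempfile

def compact_file(path):
    """Rewrite a record file into a temp copy, skipping records
    flagged as deleted, then atomically replace the original."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # 1. Copy surviving records to a new file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as out, open(path) as src:
        # 2. Loop through the records, keeping only live ones.
        for line in src:
            if not line.startswith("DELETED"):
                out.write(line)
    # 3. Replace the original file with the compacted copy.
    os.replace(tmp_path, path)

# Demo: a "data file" where half the records were deleted.
with open("demo.dat", "w") as f:
    for i in range(10):
        f.write(("DELETED " if i % 2 else "LIVE ") + f"record-{i}\n")

before = os.path.getsize("demo.dat")
compact_file("demo.dat")
after = os.path.getsize("demo.dat")
print(before, "->", after)  # the file shrank
```

This is also why the operation needs extra disk space: the old and new copies coexist until the final replace.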

You can perform this "compaction" by running mongod --repair, or by connecting directly and running db.repairDatabase().

In both cases you need enough space to copy the files. I don't know why you don't have enough space to perform the compaction, but if you have another machine with more space, there are a couple of options:

  1. Export the database to another machine with Mongo installed (using mongoexport), then import that same database (using mongoimport). This produces a new, compacted database. Now you can stop the original mongod and replace its database files with the new ones.
  2. Stop the current mongod, copy the database files to the bigger machine, and run the repair there. You can then move the new database files back to the original machine.

There is no good way to do the compaction "in place" with current Mongo. And Mongo can definitely eat up a lot of space.

The best strategy for compaction right now is to run a Master-Slave setup. You can then compact the slave, let it catch up, and switch over. I know that's still a little hairy. Maybe the Mongo team will come up with better in-place compaction, but I don't think it's high on their list. Drive space is currently assumed to be cheap (and it usually is).


I had the same problem, and solved it by simply doing this on the command line:

mongodump -d databasename
echo 'db.dropDatabase()' | mongo databasename
mongorestore dump/databasename

It looks like Mongo v1.9+ supports compact!

> db.runCommand( { compact : 'mycollectionname' } )

See the docs: http://docs.mongodb.org/manual/reference/command/compact/

"Unlike repairDatabase, the compact command does not require double disk space to do its work. It does require a small amount of additional space while working. Additionally, compact is faster."


Compact all collections in the current database:

db.getCollectionNames().forEach(function (collectionName) {
    print('Compacting: ' + collectionName);
    db.runCommand({ compact: collectionName });
});

If you need to run a full repair, use the repairpath option. Point it at a disk with more free space.

For example, on my Mac I used:

mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair

Update: per MongoDB Core Server Ticket 4266, you may need to add --nojournal to avoid an error:

mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair --nojournal

Starting with Mongo version 2.8 you can use compression. With the WiredTiger engine you get three compression levels; mmap (the default in 2.6) provides no compression.

Here is an example of how much space you could save for 16 GB of data:

[image: chart comparing on-disk size for the different compression options]

The data is taken from this article.
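WiredTiger's block compressors include snappy (the default) and zlib; the effect can be approximated with Python's built-in zlib module on some repetitive JSON-like data (the sample documents below are made up for illustration):

```python
import json
import zlib

# Fabricated sample documents with repetitive field names,
# similar in shape to typical MongoDB collections.
docs = [{"user_id": i, "status": "active", "country": "KR"} for i in range(1000)]
raw = json.dumps(docs).encode()

compressed = zlib.compress(raw, 6)  # zlib at a mid compression level
ratio = len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, zlib: {len(compressed)} bytes, ratio: {ratio:.2f}")
```

Field names repeat in every document, which is exactly the kind of redundancy block compression exploits, so real collections often shrink dramatically.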


There are two ways to free up space, depending on the storage engine:

1. MMAPv1 engine:

Command: db.repairDatabase()

NOTE: repairDatabase requires free disk space equal to the size of your current data set plus 2 gigabytes. If the volume that holds dbpath lacks sufficient space, you can mount a separate volume and use that for the repair. When mounting a separate volume for repairDatabase you must run repairDatabase from the command line and use the --repairpath switch to specify the folder in which to store temporary repair files. E.g.: if the DB size is 120 GB, then 120 + 2 = 122 GB of free disk space is required.
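The rule quoted above (current data set size plus 2 GB) can be captured in a tiny helper; the function name is mine, not MongoDB's:

```python
def repair_free_space_gb(data_set_gb):
    """Free disk space MMAPv1 repairDatabase needs:
    the current data set size plus 2 GB for temporary repair files."""
    return data_set_gb + 2

# A 120 GB data set needs 122 GB of free space for the repair copy.
print(repair_free_space_gb(120))  # 122
```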

Another way is to do it collection-wise, with the command: db.runCommand({compact: 'collectionName'})

2. WiredTiger: the space is reclaimed automatically.


There has been considerable confusion over space reclamation in MongoDB, and some recommended practices are downright dangerous in certain deployment types. More details below:

TL;DR repairDatabase attempts to salvage data from a standalone MongoDB deployment that is trying to recover from disk corruption. If it recovers space, that is purely a side effect. Recovering space should never be the primary consideration for running repairDatabase.

Recover space in a standalone node

WiredTiger: For a standalone node with WiredTiger, running compact will release space to the OS, with one caveat: the compact command on WiredTiger on MongoDB 3.0.x was affected by the bug SERVER-21833, which was fixed in MongoDB 3.2.3. Prior to that version, compact on WiredTiger could silently fail.

MMAPv1: Due to the way MMAPv1 works, there is no safe and supported method to recover space using the MMAPv1 storage engine. compact in MMAPv1 will defragment the data files, potentially making more space available for new documents, but it will not release space back to the OS.

You may be able to run repairDatabase if you fully understand the consequences of this potentially dangerous command (see below), since repairDatabase essentially rewrites the whole database by discarding corrupt documents. As a side effect, this will create new MMAPv1 data files without any fragmentation on it and release space back to the OS.

For a less adventurous method, running mongodump and mongorestore may be possible as well in an MMAPv1 deployment, subject to the size of your deployment.

Recover space in a replica set

For replica set configurations, the best and the safest method to recover space is to perform an initial sync, for both WiredTiger and MMAPv1.

If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries, before finally stepping down the primary and performing an initial sync on it. The rolling initial sync method is the safest way to perform replica set maintenance, and as a bonus it involves no downtime.
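The rolling order described above (every secondary first, the stepped-down primary last) can be sketched as a small planner; the member names and the helper itself are hypothetical, not a MongoDB API:

```python
def rolling_sync_order(members, primary):
    """Return the order in which to re-sync replica set members:
    all secondaries first, the (stepped-down) primary last, so a
    majority stays available throughout the maintenance."""
    secondaries = [m for m in members if m != primary]
    return secondaries + [primary]

order = rolling_sync_order(["node-a", "node-b", "node-c"], primary="node-a")
print(order)  # ['node-b', 'node-c', 'node-a']
```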

Please note that the feasibility of doing a rolling initial sync also depends on the size of your deployment. For extremely large deployments, it may not be feasible to do an initial sync, and thus your options are somewhat more limited. If WiredTiger is used, you may be able to take one secondary out of the set, start it as a standalone, run compact on it, and rejoin it to the set.

Regarding repairDatabase

Please don't run repairDatabase on replica set nodes. This is very dangerous, as mentioned on the repairDatabase page and described in more detail below.

The name repairDatabase is a bit misleading, since the command doesn't attempt to repair anything. The command was intended to be used when there's disk corruption on a standalone node, which could lead to corrupt documents.

The repairDatabase command could be more accurately described as "salvage database". That is, it recreates the databases by discarding corrupt documents, in an attempt to get the database into a state where you can start it and salvage intact documents from it.

In MMAPv1 deployments, this rebuilding of the database files releases space to the OS as a side effect. Releasing space to the OS was never the purpose.

Consequences of repairDatabase on a replica set

In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run repairDatabase on a replica set node, there is a chance that the node contains undetected corruption, and repairDatabase will dutifully remove the corrupt documents for you.

Predictably, this leaves that node with a different dataset from the rest of the set. If an update then happens to hit one of those missing documents, the whole set could crash.

To make matters worse, it is entirely possible that this situation could stay dormant for a long time, only to strike suddenly with no apparent reason.


If a large chunk of data is deleted from a collection and the collection never reuses the deleted space for new documents, that space needs to be returned to the operating system so that it can be used by other databases or collections. You will need to run a compact or repair operation in order to defragment the disk space and regain the usable free space.

The behavior of the compaction process depends on the MongoDB storage engine, as follows:

db.runCommand({ compact: '<collection-name>' })

MMAPv1

The compaction operation defragments data files and indexes. However, it does not release space to the operating system. The operation is still useful for defragmenting and creating more contiguous space for reuse by MongoDB, but it is of no use when the free disk space is very low.

Up to 2 GB of additional disk space is required during the compaction operation.

A database-level lock is held during the compaction operation.

WiredTiger

The WiredTiger engine provides compression by default, which consumes less disk space than MMAPv1.

The compact process releases free space to the operating system, and only minimal disk space is required to run it. WiredTiger also blocks all operations on the database, as it needs a database-level lock.

For the MMAPv1 engine, compact does not return space to the operating system; you need to run a repair operation to release the unused space.

db.runCommand({repairDatabase: 1})

MongoDB 3.0 and higher has a new storage engine, WiredTiger. In my case switching engines reduced disk usage from 100 GB to 25 GB.


Database files cannot be shrunk in place. While "repairing" a database, the only thing the mongo server can do is delete some of its files. If a large amount of data has been deleted, the mongo server will "release" (delete) some of its existing files during the repair.


In general compact is preferable to repairDatabase. But one advantage of repair over compact is that you can issue repair to the whole cluster, whereas with compact you have to log into each shard, which is kind of annoying.


When I had the same problem, I stopped my mongo server and started it again with the command:

mongod --repair

Before running the repair operation, check that you have enough free space on your disk (at minimum, the size of your database).


mongod --repair is not recommended in the case of a sharded cluster.

If you are using a replica set or sharded cluster, use the compact command; it rewrites and defragments all data and index files of a collection. Syntax:

db.runCommand( { compact : "collection_name" } )

When used with force: true, compact runs on the primary of a replica set, e.g. db.runCommand({ compact: "collection_name", force: true })

Other points to consider:
  • It blocks operations, so it is recommended to execute it in a maintenance window.
  • If the replica set members run on different servers, it needs to be executed on each member separately.
  • In a sharded cluster, compact needs to be executed on each shard member separately. It cannot be executed against a mongos instance.


For standalone mode you can use compact or repair.

For a sharded cluster or replica set, in my experience, after running compact on the primary followed by compact on the secondary, the size of the primary database was reduced, but not the secondary. You might want to resync the member to reduce the size of the secondary database. By doing this you may find that the secondary database ends up even smaller than the primary; I guess the compact command doesn't really compact the collection. So I ended up switching the primary and secondary of the replica set and doing a member resync again.

My conclusion is that the best way to reduce the size of a sharded cluster or replica set is to resync a member, switch primary and secondary, and resync again.


One way I managed to do it. It doesn't guarantee the safety of your existing data; try it at your own risk.

Delete the data files directly and restart mongod.

For example, on Ubuntu (default data path: /var/lib/mongodb), there were several files with names like collection.#. I kept collection and deleted all of the numbered files.

If there is no critical data in the database, this seems to be the easier way.

Source: https://stackoverflow.com/questions/2966687/reducing-mongodb-database-file-size
