Why is .Contains slow?

IT박스

itboxs 2020. 12. 29. 06:51

Why is .Contains slow? What is the most efficient way to get multiple entities by primary key?

What's the most efficient way to select multiple entities by primary key?

public IEnumerable<Models.Image> GetImagesById(IEnumerable<int> ids)
{

    //return ids.Select(id => Images.Find(id));       //is this cool?
    return Images.Where( im => ids.Contains(im.Id));  //is this better, worse or the same?
    //is there a (better) third way?

}

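I realise that I could just run some performance tests to compare, but I am wondering whether there is in fact a better way than both, and I am looking for some enlightenment on what the difference between these two queries is, if any, once they have been 'translated'.

Update: With the addition of InExpression in EF6, the performance of processing Enumerable.Contains improved dramatically. The analysis in this answer is great but has been largely obsolete since 2013.

Using Contains in Entity Framework is actually very slow. It is true that it translates into an IN clause in SQL and that the SQL query itself executes fast. But the problem and the performance bottleneck is the translation of your LINQ query into SQL. The expression tree that is created gets expanded into a long chain of OR concatenations, because there is no native expression that represents an IN. When the SQL is generated, this expression of many ORs is recognized and collapsed back into the SQL IN clause.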
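To make the "long chain of ORs" more concrete, here is a minimal sketch that builds such a chain by hand with System.Linq.Expressions. It is only an illustration of the tree shape described above, not Entity Framework's actual internal code; the names OrChainDemo, BuildOrChain and CountOrNodes are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class OrChainDemo
{
    // Builds x => false || x == ids[0] || x == ids[1] || ... as an expression tree.
    // Hypothetical helper: it only mimics the shape of the expanded Contains call.
    public static Expression<Func<int, bool>> BuildOrChain(IReadOnlyList<int> ids)
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");

        // Start from "false" so that an empty list yields a predicate matching nothing.
        Expression body = Expression.Constant(false);
        foreach (int id in ids)
        {
            // Each id adds one more OrElse node on top of the previous tree.
            body = Expression.OrElse(body, Expression.Equal(x, Expression.Constant(id)));
        }
        return Expression.Lambda<Func<int, bool>>(body, x);
    }

    // Walks down the left spine of the tree and counts the OrElse nodes.
    public static int CountOrNodes(Expression<Func<int, bool>> predicate)
    {
        int count = 0;
        Expression node = predicate.Body;
        while (node is BinaryExpression b && b.NodeType == ExpressionType.OrElse)
        {
            count++;
            node = b.Left;
        }
        return count;
    }

    public static void Main()
    {
        var predicate = BuildOrChain(new[] { 3, 5, 8 });
        Console.WriteLine(predicate);               // shows the nested OR structure
        Console.WriteLine(CountOrNodes(predicate)); // one OrElse node per id
    }
}
```

The tree grows by one node per id, so a list of 20000 ids means a tree 20000 levels deep that has to be built, visited and collapsed again before any SQL is sent.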

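This does not mean that using Contains is worse than issuing one query per element in your ids collection (your first option). It is probably still better, at least for collections that are not too large. But for large collections it is really bad. I remember that some time ago I tested a Contains query with about 12,000 elements: it worked, but it took around a minute even though the query in SQL executed in less than a second.

It might be worth testing the performance of a combination of multiple roundtrips to the database with a smaller number of elements in the Contains expression for each roundtrip.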

This approach, and also the limitations of using Contains with Entity Framework, is shown and explained here:

Why does the Contains() operator degrade Entity Framework's performance so dramatically?
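The "several roundtrips with fewer elements each" idea could be sketched roughly as follows. This is a sketch under assumptions, not the code from the linked question: ContainsBatching, Batch and QueryInBatches are hypothetical names, and the query is passed in as a delegate so that the example runs without a database. With EF the delegate would be something like batch => context.Set<MyEntity>().Where(e => batch.Contains(e.ID)).ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsBatching
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // last, partially filled batch
    }

    // Runs queryBatch once per batch of distinct ids and concatenates the results.
    public static List<TEntity> QueryInBatches<TEntity>(
        IEnumerable<int> ids,
        int batchSize,
        Func<List<int>, IEnumerable<TEntity>> queryBatch)
    {
        var result = new List<TEntity>();
        foreach (List<int> batch in Batch(ids.Distinct(), batchSize))
            result.AddRange(queryBatch(batch));
        return result;
    }

    public static void Main()
    {
        // In-memory stand-in for a table whose IDs run from 1 to 100.
        List<int> table = Enumerable.Range(1, 100).ToList();
        var ids = new[] { 5, 17, 17, 42, 250 };

        List<int> found = QueryInBatches(ids, 2,
            batch => table.Where(id => batch.Contains(id)));

        Console.WriteLine(found.Count); // prints 3 (ids 5, 17 and 42 exist)
    }
}
```

The right batch size is something to measure: smaller batches keep each translation cheap but add roundtrips.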

It is possible that a raw SQL command will perform best in this situation, which would mean that you call dbContext.Database.SqlQuery<Image>(sqlString) or dbContext.Images.SqlQuery(sqlString), where sqlString is the SQL shown in @Rune's answer.

Edit

Here are some measurements:

I ran this on a table with 550000 records and 11 columns (IDs start from 1 without gaps) and picked 20000 ids at random:

using (var context = new MyDbContext())
{
    Random rand = new Random();
    var ids = new List<int>();
    for (int i = 0; i < 20000; i++)
        ids.Add(rand.Next(550000));

    Stopwatch watch = new Stopwatch();
    watch.Start();

    // here are the code snippets from below

    watch.Stop();
    var msec = watch.ElapsedMilliseconds;
}

Test 1

var result = context.Set<MyEntity>()
    .Where(e => ids.Contains(e.ID))
    .ToList();

Result -> msec = 85.5 sec

Test 2

var result = context.Set<MyEntity>().AsNoTracking()
    .Where(e => ids.Contains(e.ID))
    .ToList();

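Result -> msec = 84.5 sec

This tiny effect of AsNoTracking is very unusual. It indicates that the bottleneck is not object materialization (and not SQL, as shown below).

For both tests, SQL Profiler shows that the SQL query arrives at the database very late. (I didn't measure exactly, but it was later than 70 seconds.) Obviously the translation of this LINQ query into SQL is very expensive.

Test 3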

var values = new StringBuilder();
values.AppendFormat("{0}", ids[0]);
for (int i = 1; i < ids.Count; i++)
    values.AppendFormat(", {0}", ids[i]);

var sql = string.Format(
    "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
    values);

var result = context.Set<MyEntity>().SqlQuery(sql).ToList();

Result -> msec = 5.1 sec

Test 4

// same as Test 3 but this time including AsNoTracking
var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();

Result -> msec = 3.8 sec

This time the effect of disabling tracking is more noticeable.

Test 5

// same as Test 3 but this time using Database.SqlQuery
var result = context.Database.SqlQuery<MyEntity>(sql).ToList();

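Result -> msec = 3.7 sec

My understanding is that context.Database.SqlQuery<MyEntity>(sql) is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking(), so no difference is expected between Test 4 and Test 5.

(The length of the result sets was not always the same because of possible duplicates after the random id selection, but it was always between 19600 and 19640 elements.)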
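The varying result size follows from how the ids are drawn: rand.Next(550000) can return the same value more than once, and it can also return 0, which matches no row when the IDs start at 1. A small sketch to see how many distinct values such a draw produces is shown below; DuplicateIdDemo, CountDistinctDraws and the seed are made up for this illustration and are not part of the original test setup.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateIdDemo
{
    // Draws `draws` random values in [0, maxExclusive) and counts the distinct ones.
    public static int CountDistinctDraws(int draws, int maxExclusive, int seed)
    {
        var rand = new Random(seed);          // fixed seed only to make runs repeatable
        var distinct = new HashSet<int>();
        for (int i = 0; i < draws; i++)
            distinct.Add(rand.Next(maxExclusive));
        return distinct.Count;
    }

    public static void Main()
    {
        int distinct = CountDistinctDraws(20000, 550000, seed: 12345);
        Console.WriteLine($"{distinct} distinct ids out of 20000 draws");
    }
}
```

Deduplicating the list before querying (for example with ids.Distinct()) would make the number of requested ids and the number of returned rows directly comparable.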

Edit 2

Test 6

Even 20000 roundtrips to the database are faster than using Contains:

var result = new List<MyEntity>();
foreach (var id in ids)
    result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));

Result -> msec = 73.6 sec

Note that I have used SingleOrDefault instead of Find. Using the same code with Find is very slow (I cancelled the test after several minutes) because Find calls DetectChanges internally. Disabling auto change detection (context.Configuration.AutoDetectChangesEnabled = false) leads to roughly the same performance as SingleOrDefault. Using AsNoTracking reduces the time by one or two seconds.

Tests were done with database client (console app) and database server on the same machine. The last result might get significantly worse with a "remote" database due to the many roundtrips.

The second option is definitely better than the first. The first option will result in ids.Length queries to the database, while the second option can use an 'IN' operator in the SQL query. It will basically turn your LINQ query into something like the following SQL:

SELECT *
FROM ImagesTable
WHERE id IN (value1,value2,...)

where value1, value2 etc. are the values of your ids variable. Be aware, however, that I think there may be an upper limit on the number of values that can be serialized into a query in this way. I'll see if I can find some documentation...

I am using Entity Framework 6.1 and found, using your code, that it is better to use:

return db.PERSON.Find(id);

rather than:

return db.PERSON.FirstOrDefault(x => x.ID == id);

The question "Performance of Find() vs. FirstOrDefault" has some thoughts on this.

Well, I recently had a similar problem, and the best way I found was to insert the list of ids into a temp table and then do a join.

private List<Foo> GetFoos(IEnumerable<long> ids)
{
    var sb = new StringBuilder();
    sb.Append("DECLARE @Temp TABLE (Id bigint PRIMARY KEY)\n");

    foreach (var id in ids)
    {
        sb.Append("INSERT INTO @Temp VALUES ('");
        sb.Append(id);
        sb.Append("')\n");
    }

    sb.Append("SELECT f.* FROM [dbo].[Foo] f inner join @Temp t on f.Id = t.Id");

    return this.context.Database.SqlQuery<Foo>(sb.ToString()).ToList();
}

It's not a pretty way, but for large lists it is very performant.

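Converting the table to an array with ToArray() before filtering avoids the expensive LINQ-to-SQL translation, because the Contains check then runs in memory. Note that this loads every row of the table into memory first, so it only makes sense for small tables. You can do it this way:

    return Images.ToArray().Where(im => ids.Contains(im.Id));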

ReferenceURL : https://stackoverflow.com/questions/8107439/why-is-contains-slow-most-efficient-way-to-get-multiple-entities-by-primary-ke
