map_suite_geocoder_performance_guide

Differences

This shows you the differences between two versions of the page.

map_suite_geocoder_performance_guide [2019/03/20 09:55]
tgwikiupdate
map_suite_geocoder_performance_guide [2019/03/21 02:49] (current)
tgwikiupdate
Line 1: Line 1:
 ====== Map Suite Geocoder Performance Guide ====== ====== Map Suite Geocoder Performance Guide ======
-Welcome to Map Suite Geocoder performance test and promotion guide, we strongly recommend you read the presentation of Map Suite Geocoder to know what is it and how is it work before reading this guide. If you have already understood Map Suite Geocoder, let's go on with this guide.+Welcome to the Map Suite Geocoder performance testing and improvement guide. We strongly recommend reading the introduction to [[thinkgeo_sdk_geocoding|Map Suite Geocoder]] first to learn what it is and how it works. If you are already familiar with Map Suite Geocoder, read on.
  
 The purpose of this guide is to help you understand the following: The purpose of this guide is to help you understand the following:
-  * A simple introduction for inner structure and workflow of Map Suite Geocoder. +  * An introduction to best practices for Map Suite Geocoder. 
-  * What is the bottleneck of Map Suite Geocoder performance. +  * Using multiple threads to improve batch query performance. 
-  * How to improve the performance of batch query. +  * The performance bottlenecks of Map Suite Geocoder. 
-===== Structure and Workflow ​===== +===== Best Practice ​===== 
-This section will introduce the structure and workflow of Map Suite Geocoder roughly.+This section introduces best practices for using Map Suite Geocoder.
 ==== Data Source ==== ==== Data Source ====
-The Geocoder uses a native data source with an optimized set of United States street based on the [[https://www.census.gov/geo/maps-data/data/tiger.html|TIGER® data]] from the [[https://www.census.gov|U.S. Census Bureau]]. Below Figure 1 shows a partial source data files.+The Geocoder uses native source data: an optimized set of United States street data based on the [[https://www.census.gov/geo/maps-data/data/tiger.html|TIGER® data]] from the [[https://www.census.gov|U.S. Census Bureau]]. Figure 1 below shows some of the source data files.
  
 {{:​map_suite_geocoder_performance_guide_001.png}} {{:​map_suite_geocoder_performance_guide_001.png}}
 \\ \\
-//Figure 1. Data source files.//+//Figure 1. Source ​Data Files.//
  
 The latest data, from 2018, is available with purchase; see our [[http://blog.thinkgeo.com/2018/12/10/map-suite-geocoder-2018-data-refresh|blog]] for details. The latest data, from 2018, is available with purchase; see our [[http://blog.thinkgeo.com/2018/12/10/map-suite-geocoder-2018-data-refresh|blog]] for details.
-==== Open and Close Geocoder ====+==== Open Close Geocoder ====
 After instantiating a //**UsaGeocoder**// in your code, you must call the //**Open**// function before running any match queries, and we recommend calling the //**Close**// function when you have finished querying. The code looks like: After instantiating a //**UsaGeocoder**// in your code, you must call the //**Open**// function before running any match queries, and we recommend calling the //**Close**// function when you have finished querying. The code looks like:
 <code csharp> <code csharp>
Line 24: Line 24:
 usaGeocoder.Close();​ usaGeocoder.Close();​
 </​code>​ </​code>​
-When opening a Geocoder, some source data you specified would be preprocessed and loaded into memory, most of them are index data that would be read frequently in match query. These data would be released from memory after close Geocoder.+When a Geocoder is opened, some of the source data you specified is preprocessed and loaded into memory; most of it is index data that is read frequently during match queries. This data is released from memory when the Geocoder is closed.
  
-You may see the //**MatchMode**// and //**StreetNumberMatchingMode**// when initializing the //**UsaGeocoder**//, they can be used to specify different match policies. It's important that the more strict policy the less match query time and vice versa. All code snippets in this guide use //**Default MatchMode**// with //**ExactMatch**// and //**Default StreetNumberMatchingMode**// with //**ExactMatch**//.+You may notice the //**MatchMode**// and //**StreetNumberMatchingMode**// parameters when initializing the //**UsaGeocoder**//; they specify different match policies. Importantly, the stricter the policy, the shorter the match query time, and vice versa. All code snippets in this guide use //**Default MatchMode**// with //**ExactMatch**// and //**Default StreetNumberMatchingMode**// with //**ExactMatch**//.
 ==== Match Query ==== ==== Match Query ====
 The Geocoder provides three overloaded //**Match**// functions for match queries; see the code snippet below: The Geocoder provides three overloaded //**Match**// functions for match queries; see the code snippet below:
Line 38: Line 38:
   * The 2nd call, with //**streetAddress**// and //**zip**// parameters, is the fastest: only geocoding is performed and the match results are returned directly.   * The 2nd call, with //**streetAddress**// and //**zip**// parameters, is the fastest: only geocoding is performed and the match results are returned directly.
  
-  * The 3rd call with //**streetAddress**//, //**city**// and //**state**// parameters is slower than 2nd call, the different is that the results would be handled to filter city and state before returning.+  * The 3rd call, with //**streetAddress**//, //**city**// and //**state**// parameters, is slower than the 2nd call because the results are filtered by city and state before being returned.
  
 To measure the best-case performance of the Geocoder, we use the 2nd call in this guide. To measure the best-case performance of the Geocoder, we use the 2nd call in this guide.
-===== Bottleneck of Performance ​===== +===== Benchmark ​===== 
-In this section we will discuss what is the bottleneck of Map Suite Geocoder performance. +This section shows how to use multiple threads to speed up geocoding for very large batches of queries, then presents our benchmark results.
-==== IO ==== +
-The most time spent of Geocoder is on IO. Due to the size of source data is about 9.06 GB, Geocoder cannot load all of them into memory when opening, if we do that, it will spend a lot of time to open Geocoder. We already had optimized and made the Geocoder to load necessary source data when opening, and added in memory cache to increase the read speed, it still would read data from files when doing match query, the IO is the one of a bottleneck of performance. +
-==== Normalize Text ==== +
-The another bottleneck is input/​output text normalization,​ customer wants to input various of text to do the match query, Geocoder have to split and explain them at first, then compare ​the normalized text cluster with cache or file data to do the match. The matched ​results ​also need to be normalized as the Geocoder result. We always keep to update our normalization algorithm to make it more faster and accurate. +
-==== Performance Test ==== +
-Below Figure 2 is the test result ​that did 1,000 queries, the execution time is 4.669 seconds: +
- +
-{{:​map_suite_geocoder_performance_guide_002.png}} +
-\\ +
-//Figure 2. Performance test with 1,000 queries.//​ +
- +
-And the below Figure 3 is the test result that did 10,000 queries, the execution time is 16.588 seconds: +
- +
-{{:​map_suite_geocoder_performance_guide_003.png}} +
-\\ +
-//Figure 3. Performance test with 10,000 queries.//​ +
- +
-It's obvious that the most time spent is on IO (includes file or cache read), then normalization. +
-===== Performance Improvement ===== +
-In this section we will help you to improve the geocoding when doing huge queries.+
 ==== Using Multi-thread ==== ==== Using Multi-thread ====
-With the limitation of cache and file read, Geocoder doesn't support inner multi-thread to do the batch query. But it's bad if customer wants to do a huge query with more than 100,000 input text, below code snippet will help you to improve the query performance:+Because of limitations in how cached and file data are read, the Geocoder does not use multiple threads internally for batch queries. That is a problem if you need to run a very large batch of more than 100,000 input texts; the code snippet below will help you improve query performance:
 <code csharp> <code csharp>
 // convert original custom data to address-zip map list // convert original custom data to address-zip map list
Line 132: Line 112:
 Task.WaitAll(tasks.ToArray());​ Task.WaitAll(tasks.ToArray());​
 </​code>​ </​code>​
-We preprocess the input texts to split them to chunks, then generate tasks to do the batch queries, each task maintains independent Geocoder to avoid multi-thread error. Below Figure 4 shows the elapsed time compare:+We preprocess the input texts by splitting them into eight chunks, then create eight tasks to run the batch queries. Each task maintains its own independent Geocoder to avoid multi-threading errors.
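
Because the diff above elides most of the guide's own snippet, here is a minimal sketch of the chunk-per-task pattern the paragraph describes. It is not the guide's code: the names //**ChunkedBatch**//, //**Split**//, //**Run**// and //**processChunk**// are hypothetical, and the placeholder lambda stands in for the real work. In practice, each //**processChunk**// call would create, open, query and close its own geocoder instance.

<code csharp>
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ChunkedBatch
{
    // Splits 'items' into at most 'chunkCount' contiguous chunks of near-equal size.
    public static List<List<T>> Split<T>(IReadOnlyList<T> items, int chunkCount)
    {
        if (chunkCount < 1) throw new ArgumentOutOfRangeException(nameof(chunkCount));

        var chunks = new List<List<T>>();
        int chunkSize = (items.Count + chunkCount - 1) / chunkCount; // ceiling division
        for (int start = 0; start < items.Count; start += chunkSize)
        {
            int length = Math.Min(chunkSize, items.Count - start);
            chunks.Add(items.Skip(start).Take(length).ToList());
        }
        return chunks;
    }

    // Starts one task per chunk and waits for all of them.
    // 'processChunk' (hypothetical) must not share mutable state between calls;
    // in the guide's scenario it would own a separate geocoder per task.
    public static TResult[][] Run<TInput, TResult>(
        IReadOnlyList<TInput> items,
        int chunkCount,
        Func<List<TInput>, TResult[]> processChunk)
    {
        Task<TResult[]>[] tasks = Split(items, chunkCount)
            .Select(chunk => Task.Run(() => processChunk(chunk)))
            .ToArray();

        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        // Illustrative input only: 20 made-up ZIP-like strings.
        List<string> zips = Enumerable.Range(0, 20).Select(i => (66000 + i).ToString()).ToList();

        // Placeholder work: a real handler would run match queries on its own geocoder.
        int[][] results = ChunkedBatch.Run(zips, 8, chunk => chunk.Select(z => z.Length).ToArray());

        Console.WriteLine($"Processed {results.Sum(r => r.Length)} inputs in {results.Length} chunks.");
    }
}
</code>

With ceiling division, the number of chunks can be smaller than the requested count when the inputs do not divide evenly, so the sketch never creates empty tasks.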
  
-{{:map_suite_geocoder_performance_guide_004.png}}+ 
 +==== Benchmark Reports ==== 
 +To compare single-threaded and multi-threaded performance, we ran the benchmark on the machine whose hardware is shown in Figure 2:
 + 
 +{{:map_suite_geocoder_performance_guide_002.png}}
 \\ \\
-//​Figure ​4Query compare.//+//​Figure ​2Machine Hardware Device Information.//
  
-It shows that we improved about three times speed after using multi-thread. +Below Figure 3 shows the benchmark result:
-==== Hardware Performance ==== +
-We used the following machine hardware device to do the test in Figure ​4. +
-  * CPU: Intel® Core™ i7-4790 CPU @ 3.60GHz +
-  * Memory: 8.00 GB +
-  * Disk: Crucial CT500MX200SSD1 +
-  * System: Windows 10 64-bit +
-And the hardware usage when querying is: +
-  * CPU: 70% - 90% +
-  * Memory: 300 MB + +
-  * IO30 MB/Seconds+
  
-We also used different Amazon® instances to do the same test to dig the limitation of the Geocoder with multi-thread. Below Figure 5 is the test results:+{{:​map_suite_geocoder_performance_guide_003.png}} 
 +\\ 
 +//Figure 3. Single Thread & Multi-Thread Benchmark Result.// 
 + 
 +It shows that using multiple threads made queries about three times faster. Figure 4 below shows the hardware usage when using multiple threads:
 + 
 +{{:​map_suite_geocoder_performance_guide_004.png}} 
 +\\ 
 +//Figure 4. Hardware Usage Using Multi-Thread.//​ 
 + 
 +To probe the limits of the multi-threaded Geocoder, we also ran the same benchmark on different [[https://aws.amazon.com/ec2/instance-types|Amazon® instances]]. Figure 5 below shows the test result:
  
 {{:​map_suite_geocoder_performance_guide_005.png}} {{:​map_suite_geocoder_performance_guide_005.png}}
 \\ \\
-//Figure 5. Performance ​test result ​on Amazon® instance.//+//Figure 5. Amazon® Instance Benchmark Result.// 
 + 
 +Match query speed increased significantly on higher-performance hardware. Note that we set the thread count to eight because we generally keep the thread count equal to or less than the CPU core count; if you start too many threads, query speed decreases instead.
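
The thread-count rule above can be sketched as follows. //**ThreadCountPolicy**// and //**Choose**// are hypothetical names, not part of the Geocoder API. Note that //**Environment.ProcessorCount**// reports logical processors, which can exceed the number of physical cores, so treat the result as an upper bound to benchmark against rather than a guaranteed optimum.

<code csharp>
using System;

public static class ThreadCountPolicy
{
    // Caps the requested task count at the number of available processors, never below 1.
    public static int Choose(int requested, int availableProcessors)
    {
        return Math.Max(1, Math.Min(requested, availableProcessors));
    }
}

public static class Program
{
    public static void Main()
    {
        // Eight is the count used in the guide's benchmark; the cap adapts it to the current machine.
        int threadCount = ThreadCountPolicy.Choose(8, Environment.ProcessorCount);
        Console.WriteLine($"Using {threadCount} geocoding tasks.");
    }
}
</code>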
 +===== Bottleneck of Performance ​===== 
 +This section discusses the performance bottlenecks of Map Suite Geocoder.
 +==== I/O ====
 +The Geocoder spends most of its time on I/O. Because the source data is about 9.06 GB, the Geocoder cannot load all of it into memory when opening; doing so would make opening the Geocoder very slow. We have already optimized the Geocoder to load only the necessary source data when opening and added an in-memory cache to speed up data reads, but it still has to read data from files during match queries, so I/O remains one of the performance bottlenecks.
 +==== Normalize Text ====
 +The other bottleneck is input/output text normalization. Customers want to submit many different kinds of text for match queries, so the Geocoder first has to split and interpret the text, then compare the normalized text cluster with cached or file data to find the match. The matched results also need to be normalized into the Geocoder result. We continually update our normalization algorithm to make it faster and more accurate.
 +==== Performance Test ====
 +Figure 6 below shows the analysis result for **1,000** queries run one by one; the execution time is **4.669** seconds:
 + 
 +{{:​map_suite_geocoder_performance_guide_006.png}} 
 +\\ 
 +//Figure 6. Analysis Result With 1,000 Queries.//​ 
 + 
 +Figure 7 below shows the analysis result for **10,000** queries run one by one; the execution time is **16.588** seconds:
 + 
 +{{:​map_suite_geocoder_performance_guide_007.png}} 
 +\\ 
 +//Figure 7. Analysis Result With 10,000 Queries.//
  
-Note that we set the thread count to 8 because we always tend to make the thread count to match the CPU core count, if you start too much threads, the query speed would decrease instead.+It is clear that most of the time is spent on I/O (including file and cache data reads), followed by normalization.
map_suite_geocoder_performance_guide.1553075751.txt.gz · Last modified: 2019/03/20 09:55 by tgwikiupdate