Gist by @msm-code, created December 29, 2021, 02:41.

    # Karton benchmark notes

    Predictably, most of the time was spent in Redis.

    90k tasks, nothing to collect:
    ```
    [2021-12-29 00:31:53,585][INFO] Tasks: Get all tasks: 4.495668649673462
    [2021-12-29 00:31:53,586][INFO] Tasks: Tasks from queue: 0.00021696090698242188
    [2021-12-29 00:31:53,675][INFO] Tasks: Big loop: 0.08784604072570801
    [2021-12-29 00:31:53,675][INFO] Tasks: Smol loop: 1.0728836059570312e-05
    [2021-12-29 00:31:53,812][INFO] Resources: List objects: 0.003173351287841797
    [2021-12-29 00:31:58,107][INFO] Resources: Get all tasks: 4.294421672821045
    [2021-12-29 00:31:58,268][INFO] Resources: For loop: 0.15995168685913086
    [2021-12-29 00:31:58,268][INFO] Resources: Remove loop: 1.1205673217773438e-05
    [2021-12-29 00:31:58,407][INFO] GC done in 10.318462610244751 seconds
    ```
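The per-phase numbers in these logs come from simple wall-clock timing around each GC phase. A minimal sketch of such a helper (illustrative only, not the actual instrumentation; `backend.get_all_tasks` in the usage comment is a stand-in name):

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str):
    """Log the wall-clock duration of the wrapped block, mimicking the log lines above."""
    start = time.time()
    try:
        yield
    finally:
        logging.info("%s: %s", label, time.time() - start)


# usage:
# with timed("Tasks: Get all tasks"):
#     tasks = backend.get_all_tasks()
```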

    90k tasks, including 60k to collect:
    ```
    [2021-12-29 00:36:48,376][INFO] Tasks: Get all tasks: 4.544401407241821
    [2021-12-29 00:36:48,377][INFO] Tasks: Tasks from queue: 0.00017404556274414062
    [2021-12-29 00:37:01,953][INFO] Tasks: Big loop: 13.575872898101807
    [2021-12-29 00:37:01,954][INFO] Tasks: Smol loop: 9.775161743164062e-06
    [2021-12-29 00:37:02,108][INFO] Resources: List objects: 0.005395174026489258
    [2021-12-29 00:37:03,493][INFO] Resources: Get all tasks: 1.3837084770202637
    [2021-12-29 00:37:03,545][INFO] Resources: For loop: 0.05156588554382324
    [2021-12-29 00:37:03,546][INFO] Resources: Remove loop: 8.821487426757812e-06
    [2021-12-29 00:37:03,593][INFO] GC done in 20.767701625823975 seconds
    ```

    60k tasks, including 60k to collect:
    ```
    [2021-12-29 00:49:17,727][INFO] Tasks: Get all tasks: 2.212865114212036
    [2021-12-29 00:49:17,728][INFO] Tasks: Tasks from queue: 0.00018286705017089844
    [2021-12-29 00:49:26,743][INFO] Tasks: Big loop: 9.015160083770752
    [2021-12-29 00:49:26,744][INFO] Tasks: Smol loop: 1.2874603271484375e-05
    [2021-12-29 00:49:27,397][INFO] Resources: List objects: 0.5849082469940186
    [2021-12-29 00:49:27,399][INFO] Resources: Get all tasks: 0.0008819103240966797
    [2021-12-29 00:49:27,400][INFO] Resources: For loop: 3.337860107421875e-05
    [2021-12-29 00:49:37,279][INFO] Resources: Remove loop: 9.878673791885376
    [2021-12-29 00:49:37,280][INFO] GC done in 22.77230477333069 seconds
    ```

    Standard batch from now on: 20k tasks including 20k to collect:
    ```
    [2021-12-29 01:04:37,191][INFO] Tasks: Get all tasks: 1.994135856628418
    [2021-12-29 01:04:37,192][INFO] Tasks: Tasks from queue: 0.00023102760314941406
    [2021-12-29 01:04:46,747][INFO] Tasks: Big loop (record): 9.555027723312378
    [2021-12-29 01:04:46,748][INFO] Tasks: Smol loop: 1.8596649169921875e-05
    [2021-12-29 01:04:46,841][INFO] Resources: List objects: 0.005769491195678711
    [2021-12-29 01:04:46,843][INFO] Resources: Get all tasks: 0.001276254653930664
    [2021-12-29 01:04:46,843][INFO] Resources: For loop: 3.981590270996094e-05
    [2021-12-29 01:04:46,846][INFO] Resources: Remove loop: 0.002631664276123047
    [2021-12-29 01:04:46,847][INFO] GC done in 12.65667200088501 seconds
    ```

    ### Part 1: Tasks

    `Get all tasks` is (unsurprisingly) slow.
    The main loop of `gc_collect_tasks` is also very slow when there are resources to remove. Deleting all tasks with a single `DEL` call, instead of one call per task, helps:

    ```python
    def delete_tasks(self, tasks: List[Task]) -> None:
        """
        Remove multiple tasks from Redis
        :param tasks: list of Task objects
        """
        keys = [f"{KARTON_TASK_NAMESPACE}:{task.uid}" for task in tasks]
        self.redis.delete(*keys)
    ```
    Result:
    ```
    [2021-12-29 01:09:28,965][INFO] Tasks: Get all tasks: 1.9590599536895752
    [2021-12-29 01:09:28,967][INFO] Tasks: Tasks from queue: 0.00023603439331054688
    [2021-12-29 01:09:29,052][INFO] Tasks: Big loop (record): 0.08496618270874023
    [2021-12-29 01:09:29,053][INFO] Tasks: Deleting 40001 tasks
    [2021-12-29 01:09:29,216][INFO] Tasks: Big loop (delete): 0.16314005851745605
    [2021-12-29 01:09:33,712][INFO] Tasks: Big loop (metrics): 4.494740009307861
    [2021-12-29 01:09:33,713][INFO] Tasks: Smol loop: 1.5020370483398438e-05
    [2021-12-29 01:09:33,776][INFO] Resources: List objects: 0.004618406295776367
    [2021-12-29 01:09:33,778][INFO] Resources: Get all tasks: 0.0007326602935791016
    [2021-12-29 01:09:33,778][INFO] Resources: For loop: 2.384185791015625e-05
    [2021-12-29 01:09:33,780][INFO] Resources: Remove loop: 0.001951456069946289
    [2021-12-29 01:09:33,781][INFO] GC done in 7.78088116645813 seconds
    ```
    Wow! We **slashed** the time it took to delete 40k tasks from 9.5 seconds to 0.16 seconds.

    Well, almost - incrementing all the metrics still takes 4.5 seconds. One idea is to delete the metrics entirely (system-side metrics are a bit redundant with service-side metrics, though they are still a nice feature to have). But the fact that this accounts for half of the time suggests that network latency plays a huge role here. Let's try to speed it up with pipelining:

    ```python
    def increment_metrics_list(self, metric: KartonMetrics, identities: List[str]) -> None:
        p = self.redis.pipeline()
        for identity in identities:
            p.hincrby(metric.value, identity, 1)
        p.execute()
    ```
    Result:
    ```
    [2021-12-29 01:15:55,718][INFO] Tasks: Get all tasks: 2.1441361904144287
    [2021-12-29 01:15:55,719][INFO] Tasks: Tasks from queue: 0.0002739429473876953
    [2021-12-29 01:15:55,805][INFO] Tasks: Big loop (record): 0.08557796478271484
    [2021-12-29 01:15:55,806][INFO] Tasks: Deleting 40001 tasks
    [2021-12-29 01:15:55,965][INFO] Tasks: Big loop (delete): 0.15918326377868652
    [2021-12-29 01:15:56,424][INFO] Tasks: Big loop (metrics): 0.4580235481262207
    [2021-12-29 01:15:56,424][INFO] Tasks: Smol loop: 2.2649765014648438e-05
    [2021-12-29 01:15:56,494][INFO] Resources: List objects: 0.005167245864868164
    [2021-12-29 01:15:56,495][INFO] Resources: Get all tasks: 0.0008485317230224609
    [2021-12-29 01:15:56,496][INFO] Resources: For loop: 2.9325485229492188e-05
    [2021-12-29 01:15:56,498][INFO] Resources: Remove loop: 0.002457857131958008
    [2021-12-29 01:15:56,499][INFO] GC done in 3.9317240715026855 seconds
    ```
    Well, there goes another 4 seconds. Total GC time improved from ~12.7 seconds to ~3.9 seconds. Nice.

    Can we do better? Well, it depends on why `get all tasks` is slow. There are three separate operations:
    - `keys`
    - `mget`
    - deserialisation
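Put together, the fetch path presumably looks something like this (a sketch: the key prefix, the use of plain JSON, and the function shape are assumptions, not the actual implementation):

```python
import json

KARTON_TASK_NAMESPACE = "karton.task"  # assumed key prefix


def get_all_tasks(redis):
    """Fetch and parse every task, in the three phases measured separately below."""
    keys = redis.keys(f"{KARTON_TASK_NAMESPACE}:*")      # phase 1: keys
    blobs = redis.mget(keys)                             # phase 2: mget
    return [json.loads(blob) for blob in blobs if blob]  # phase 3: deserialise
```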

    I've added logging for them all. A bit surprisingly, deserialisation is by far the slowest part:
    ```
    [2021-12-29 01:26:56,155][INFO] Keys: 0.18313837051391602
    [2021-12-29 01:26:56,583][INFO] Mget: 0.4270799160003662
    [2021-12-29 01:26:58,366][INFO] Deserialise: 1.7819278240203857
    [2021-12-29 01:26:58,372][INFO] Tasks: Get all tasks: 2.40012788772583
    ```
    This is good news: it means we can optimise it in the future without redesigning everything.
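One illustrative direction (not measured here): skip building full Task objects during GC and extract only the fields the collector actually needs. The field names below are assumptions about the serialised task format:

```python
import json


def light_parse(raw: bytes):
    """Pull out just uid and status; GC only needs these to decide what to collect.

    This avoids constructing full Task objects (headers, payloads, resources)
    for tasks that will just be deleted anyway.
    """
    data = json.loads(raw)
    return data["uid"], data["status"]
```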

    ### Part 2: Resources

    But for now, let's focus on resources. I've changed the producer a bit to produce a different resource every time:

    ```python
    import sys
    import logging
    from karton.core import Config, Producer, Task, Resource

    config = Config("karton.ini")
    producer = Producer(config)

    howmany = int(sys.argv[1])
    for i in range(howmany):
        if i % 1000 == 0:
            print(i)

        # a different resource every time, so resource GC has real work to do
        contents = str(i).encode()
        resource = Resource(str(i), contents)

        task = Task({"type": "sample", "kind": "raw"})
        logging.info("pushing file to karton %s, task %s", i, task)

        task.add_resource("sample", resource)
        task.add_payload("tags", ["simple_producer"])
        task.add_payload("additional_info", ["This sample has been added by simple producer example"])
        producer.send_task(task)
    ```

    New times:
    ```
    [2021-12-29 02:24:10,000][INFO] Tasks: Get all tasks: 0.8783807754516602
    [2021-12-29 02:24:10,048][INFO] Tasks: Deleting 20000 tasks
    [2021-12-29 02:24:10,532][INFO] Tasks: Big loop: 0.48422670364379883
    [2021-12-29 02:24:11,426][INFO] Resources: List objects: 0.8508834838867188
    [2021-12-29 02:24:11,427][INFO] Resources: List objects: 10000 objects
    [2021-12-29 02:24:11,428][INFO] Resources: Get all tasks: 0.00037384033203125
    [2021-12-29 02:24:11,428][INFO] Resources: For loop: 7.62939453125e-06
    [2021-12-29 02:24:11,429][INFO] Resources: Removing 10000 resources
    [2021-12-29 02:24:26,314][INFO] Resources: Remove loop: 14.885408878326416
    [2021-12-29 02:24:26,315][INFO] GC done in 18.199631452560425 seconds
    ```

    Let's use basically the same technique and pipeline object removal. The result:

    ```
    [2021-12-29 02:21:48,814][INFO] Tasks: Get all tasks: 0.8932852745056152
    [2021-12-29 02:21:48,871][INFO] Tasks: Deleting 20000 tasks
    [2021-12-29 02:21:49,267][INFO] Tasks: Big loop: 0.39539289474487305
    [2021-12-29 02:21:50,224][INFO] Resources: List objects: 0.9117233753204346
    [2021-12-29 02:21:50,225][INFO] Resources: List objects: 10000 objects
    [2021-12-29 02:21:50,226][INFO] Resources: Get all tasks: 0.0004830360412597656
    [2021-12-29 02:21:50,226][INFO] Resources: For loop: 6.198883056640625e-06
    [2021-12-29 02:21:50,226][INFO] Resources: Removing 10000 resources
    [2021-12-29 02:21:52,237][INFO] Resources: Remove loop: 2.0103540420532227
    [2021-12-29 02:21:52,239][INFO] GC done in 5.3230650424957275 seconds
    ```

    Batching wins again, cutting the remove time from 14.89s to 2.01s.
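The pipelined removal code itself isn't shown above, but the general pattern is backend-independent: group uids and issue one bulk call per chunk. A minimal sketch (`delete_batch` is a hypothetical stand-in for whatever bulk-delete operation the storage backend offers):

```python
def remove_in_batches(object_uids, delete_batch, batch_size=1000):
    """Issue one backend call per batch_size uids instead of one round-trip per object."""
    for i in range(0, len(object_uids), batch_size):
        delete_batch(object_uids[i:i + batch_size])
```

With a bulk delete of 1000 objects per call, removing 10k resources costs 10 round-trips instead of 10000.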

    ### Part 3: Routing

    With these changes, GC should be fast enough. There is one optimisation pending (don't call `get_all_tasks` twice), but maybe it's time to focus on routing.
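That pending optimisation is mechanical: fetch the task list once and share it between both GC phases. A sketch with hypothetical method names (the phase bodies are stubs):

```python
class GarbageCollector:
    """Sketch: gc_collect fetches tasks once instead of once per phase."""

    def __init__(self, backend):
        self.backend = backend

    def gc_collect(self):
        tasks = self.backend.get_all_tasks()  # one round of keys + mget + deserialise
        self.gc_collect_tasks(tasks)
        self.gc_collect_resources(tasks)

    def gc_collect_tasks(self, tasks):
        pass  # collect finished/orphaned tasks

    def gc_collect_resources(self, tasks):
        pass  # collect resources no remaining task references
```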

    The ways to go from here are:
    - improve routing performance (by using pipelining in `process_task`, for example);
    - reverse-engineer why routing is not thread-safe and fix it, so we can scale routers;
    - implement batched routing (route multiple tasks at once to reduce round-trips).
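The third option could look roughly like this (a sketch: `queue_for` and the exact commands are assumptions about the routing logic; the point is one pipelined round-trip for the whole batch):

```python
def route_batch(redis, tasks, queue_for):
    """Route a whole batch of tasks with a single pipelined round-trip."""
    p = redis.pipeline()
    for task in tasks:
        # queue_for() decides which consumer queue matches the task's headers
        p.rpush(queue_for(task), task.uid)
    p.execute()
```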

    Sadly, it's getting a bit late here, so I won't be able to look at it today.