Support: Optimization when importing 100k attributes in MISP #10495

@obfstr

Support Questions

Hello,

I am currently running MISP 2.4.183 in a Docker environment. I have written a Python script whose purpose is to:

  • Fetch 100k IPs from a third-party source.
  • Save them in JSON format.
  • Parse the JSON file and save it in MISP JSON format.
  • Import the MISP JSON file using the MISP import API.

Fetching the attributes and saving them in MISP JSON format completes within seconds. The bottleneck arises when I try to import the attributes: the import takes a very long time.

I tried to optimize InnoDB. As a first step, I set the following options in /etc/mysql/mariadb.conf.d/50-server.cnf:

  • innodb_buffer_pool_size = 8GB
  • innodb_buffer_pool_chunk_size = 128M
  • innodb_log_file_size = 1G
  • innodb_log_buffer_size = 256M
  • innodb_flush_log_at_trx_commit = 2

With this InnoDB configuration, it took around 20 minutes to import all 100k attributes. I can't allocate 70% to 80% of system RAM to innodb_buffer_pool_size, as there are other services running too.
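To put the 20-minute figure in perspective, here is a small sketch of the arithmetic. The helper name `import_rate` is made up for illustration; only the 100k attributes and the roughly 20 minutes come from the measurement above.

```python
from typing import Tuple


def import_rate(attribute_count: int, elapsed_seconds: float) -> Tuple[float, float]:
    """Return (attributes per second, milliseconds per attribute)."""
    if attribute_count <= 0 or elapsed_seconds <= 0:
        raise ValueError("attribute_count and elapsed_seconds must be positive")
    per_second = attribute_count / elapsed_seconds
    ms_per_attribute = 1000.0 * elapsed_seconds / attribute_count
    return per_second, ms_per_attribute


# Figures from this report: 100k attributes in roughly 20 minutes.
rate, ms_each = import_rate(100_000, 20 * 60)
print(f"{rate:.1f} attributes/s, {ms_each:.1f} ms per attribute")
# prints "83.3 attributes/s, 12.0 ms per attribute"
```

So the import currently runs at roughly 83 attributes per second, or about 12 ms per attribute.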

The part of the Python script that converts and imports the MISP attributes is shown below:

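import json
import uuid
from datetime import datetime

import requests

# logger, std_print and MISP_CONFIG are defined elsewhere in the script.


def convert_to_misp(input_file, output_file):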
    # Load your input JSON
    with open(input_file, "r") as f:
        input_data = json.load(f)["data"]

    today = datetime.today().strftime("%Y-%m-%d")
    event_uuid = str(uuid.uuid4())

    # Base MISP event (NO "Event" wrapper for /events/add)
    misp_event = {
        "orgc_id": "1",
        "org_id": "1",
        "date": today,
        "threat_level_id": "3",
        "info": f"Feed {today}",
        "published": False,
        "uuid": event_uuid,
        "attribute_count": str(len(input_data)),
        "analysis": "0",
        "timestamp": str(int(datetime.now().timestamp())),
        "distribution": "0",
        "event_creator_email": "admin@admin.test",
        "Attribute": []
    }

    # Convert each IP entry into a MISP Attribute
    for entry in input_data:
        attr_uuid = str(uuid.uuid4())
        ip = entry.get("ipAddress")
        score = entry.get("abuseConfidenceScore")
        country = entry.get("countryCode")
        last_reported = entry.get("lastReportedAt")

        # Use the `score` variable read above (`abuse_score` was undefined)
        comment = f"Score: {score}, Country: {country}, Last reported: {last_reported}"

        misp_event["Attribute"].append({
            "type": "ip-dst",
            "category": "Network activity",
            "to_ids": True,
            "uuid": attr_uuid,
            "distribution": "5",
            "timestamp": str(int(datetime.now().timestamp())),
            "comment": comment,
            "value": ip,
            "disable_correlation": True
        })

    # Save locally (for record keeping)
    with open(output_file, "w") as f:
        json.dump(misp_event, f, indent=4)

    logger.info("Converted data to MISP JSON Format")
    std_print(f"[+] Converted data saved to {output_file}")

    # Upload directly via /events/add
    headers = {
        "Authorization": MISP_CONFIG["MISP_KEY"],
        "Accept": "application/json",
        "Content-Type": "application/json"
    }

    try:
        resp = requests.post(
            f"{MISP_CONFIG['MISP_URL']}/events/add",
            headers=headers,
            data=json.dumps(misp_event),
            verify=MISP_CONFIG["MISP_VERIFYCERT"]
        )
        resp.raise_for_status()  # raises an error for 4xx/5xx responses

        logger.info("Uploaded MISP Event successfully")
        std_print("Successfully uploaded MISP event!")

    except requests.exceptions.RequestException as e:
        logger.error(f"Failed to upload MISP event: {e}", exc_info=True)
        std_print(f"Failed to upload MISP event: {e}")

Is it possible to import these 100k attributes within minutes or seconds, or to optimize this in any way?
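One variation I could measure is splitting the upload into batches instead of sending a single 100k-attribute request. The sketch below only shows the batching logic. It assumes the event has already been created, and that the instance accepts a JSON list of attributes on `/attributes/add/<event_id>`, which I would need to verify against the API documentation for this MISP version. The names `chunked`, `upload_in_batches` and the `post` callable are hypothetical; in the real script, `post` would wrap `requests.post` with the same headers as above. I have not confirmed whether this changes the total import time.

```python
import json
from typing import Any, Callable, Dict, Iterator, List

Attribute = Dict[str, Any]


def chunked(items: List[Attribute], size: int) -> Iterator[List[Attribute]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_batches(
    attributes: List[Attribute],
    event_id: str,
    post: Callable[[str, str], None],
    batch_size: int = 5000,
) -> int:
    """Send attributes in batches; return how many attributes were sent.

    `post(path, body)` is a hypothetical callable that performs the HTTP
    request. The endpoint path is an assumption to check against the docs.
    """
    sent = 0
    for batch in chunked(attributes, batch_size):
        post(f"/attributes/add/{event_id}", json.dumps(batch))
        sent += len(batch)
    return sent


if __name__ == "__main__":
    # Dry run with a fake `post` that only records the calls.
    calls = []
    fake_attributes = [{"type": "ip-dst", "value": f"192.0.2.{i}"} for i in range(10)]
    total = upload_in_batches(
        fake_attributes, "1234", lambda path, body: calls.append((path, body)), batch_size=4
    )
    print(total, len(calls))
```

Passing the request function in as a parameter keeps the batching testable without a running MISP instance, and timing each batch would show whether the cost per attribute stays constant or grows as the event gets larger.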

MISP version

2.4.183

Operating System

Debian

Operating System version

11

PHP version

PHP 7.4.33 (cli) (built: Apr 12 2024 00:02:16) ( NTS )

Browser

Chrome
