- fhmfl12cg9vubrh80mui.auto.internal
- *** Found local files:
- *** * /lessons/logs/dag_id=sparkoperator_demo/run_id=manual__2024-03-14T08:58:37.174475+00:00/task_id=spark_submit_task/attempt=1.log
- [2024-03-14, 08:58:37 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: sparkoperator_demo.spark_submit_task manual__2024-03-14T08:58:37.174475+00:00 [queued]>
- [2024-03-14, 08:58:37 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: sparkoperator_demo.spark_submit_task manual__2024-03-14T08:58:37.174475+00:00 [queued]>
- [2024-03-14, 08:58:37 UTC] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
- [2024-03-14, 08:58:37 UTC] {taskinstance.py:1327} INFO - Executing <Task(SparkSubmitOperator): spark_submit_task> on 2024-03-14 08:58:37.174475+00:00
- [2024-03-14, 08:58:37 UTC] {standard_task_runner.py:57} INFO - Started process 7696 to run task
- [2024-03-14, 08:58:37 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'sparkoperator_demo', 'spark_submit_task', 'manual__2024-03-14T08:58:37.174475+00:00', '--job-id', '52', '--raw', '--subdir', 'DAGS_FOLDER/spark_dag.py', '--cfg-path', '/tmp/tmpno54b_ek']
- [2024-03-14, 08:58:37 UTC] {standard_task_runner.py:85} INFO - Job 52: Subtask spark_submit_task
- [2024-03-14, 08:58:37 UTC] {task_command.py:410} INFO - Running <TaskInstance: sparkoperator_demo.spark_submit_task manual__2024-03-14T08:58:37.174475+00:00 [running]> on host fhmfl12cg9vubrh80mui.auto.internal
- [2024-03-14, 08:58:38 UTC] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='sparkoperator_demo' AIRFLOW_CTX_TASK_ID='spark_submit_task' AIRFLOW_CTX_EXECUTION_DATE='2024-03-14T08:58:37.174475+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2024-03-14T08:58:37.174475+00:00'
- [2024-03-14, 08:58:38 UTC] {base.py:73} INFO - Using connection ID 'yarn_spark' for task execution.
- [2024-03-14, 08:58:38 UTC] {spark_submit.py:341} INFO - Spark-Submit cmd: spark-submit --master yarn --conf spark.driver.maxResultSize=20g --executor-cores 2 --executor-memory 2g --name arrow-spark /lessons/partition.py 2020-05-01 /user/master/data/events /user/kotlyarovb/data/events
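This is the fully rendered spark-submit command the operator is about to run. For reference, a minimal sketch of a DAG that would produce exactly this command: the dag_id, task_id, connection, conf, resources and application_args are taken from the log above, while the start_date and schedule are assumptions (the run_id shows a manual trigger).

# Hypothetical reconstruction of DAGS_FOLDER/spark_dag.py; only the values
# visible in the log are confirmed, the rest is assumed.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="sparkoperator_demo",
    start_date=datetime(2024, 3, 1),  # assumed
    schedule=None,                    # run_id=manual__... indicates a manual trigger
    catchup=False,
) as dag:
    spark_submit_task = SparkSubmitOperator(
        task_id="spark_submit_task",
        conn_id="yarn_spark",                    # resolves to --master yarn
        application="/lessons/partition.py",
        conf={"spark.driver.maxResultSize": "20g"},
        executor_cores=2,
        executor_memory="2g",
        name="arrow-spark",                      # the provider's default app name
        application_args=[
            "2020-05-01",                    # date partition to process
            "/user/master/data/events",     # base input path
            "/user/kotlyarovb/data/events", # base output path
        ],
    )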
- [2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: Class path contains multiple SLF4J bindings.
- [2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- [2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- [2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
- [2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
- [2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:39,535 WARN util.Utils: Your hostname, fhmfl12cg9vubrh80mui resolves to a loopback address: 127.0.1.1; using 172.16.0.24 instead (on interface eth0)
- [2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:39,536 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,805 INFO spark.SparkContext: Running Spark version 3.0.2
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,857 INFO resource.ResourceUtils: ==============================================================
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,858 INFO resource.ResourceUtils: Resources for spark.driver:
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO -
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,859 INFO resource.ResourceUtils: ==============================================================
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,859 INFO spark.SparkContext: Submitted application: EventsPartitioningJob-2020-05-01
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,926 INFO spark.SecurityManager: Changing view acls to: kotlyarovb
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,926 INFO spark.SecurityManager: Changing modify acls to: kotlyarovb
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,926 INFO spark.SecurityManager: Changing view acls groups to:
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,927 INFO spark.SecurityManager: Changing modify acls groups to:
- [2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,927 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kotlyarovb); groups with view permissions: Set(); users with modify permissions: Set(kotlyarovb); groups with modify permissions: Set()
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,165 INFO util.Utils: Successfully started service 'sparkDriver' on port 44065.
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,204 INFO spark.SparkEnv: Registering MapOutputTracker
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,241 INFO spark.SparkEnv: Registering BlockManagerMaster
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,257 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,257 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,292 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,307 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-64908467-2f69-49fa-ab31-24f1aa3616f5
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,332 INFO memory.MemoryStore: MemoryStore started with capacity 265.5 MiB
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,375 INFO spark.SparkEnv: Registering OutputCommitCoordinator
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,507 INFO util.log: Logging initialized @3082ms to org.sparkproject.jetty.util.log.Slf4jLog
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,602 INFO server.Server: jetty-9.4.34.v20201102; built: 2020-11-02T14:15:39.302Z; git: e46af88704a893fc12cb0e3bf46e2c7b48a009e7; jvm 1.8.0_392-8u392-ga-1~20.04-b08
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,629 INFO server.Server: Started @3204ms
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,681 INFO server.AbstractConnector: Started ServerConnector@7885432{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,681 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,710 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@78039eb4{/jobs,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,713 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@54095d19{/jobs/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,714 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ba62011{/jobs/job,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,715 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c97a623{/jobs/job/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,716 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3134a0e{/stages,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,716 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@12f79b1d{/stages/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,717 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@20851bbb{/stages/stage,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,718 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53ac6f58{/stages/stage/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,719 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@775dd3de{/stages/pool,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,720 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b813cf6{/stages/pool/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,720 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f5db1d6{/storage,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,721 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c3c754b{/storage/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,722 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@12e9e1f0{/storage/rdd,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,722 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4755928a{/storage/rdd/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,723 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b3eaf80{/environment,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,724 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@9c64abf{/environment/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,725 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@866e118{/executors,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,726 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@338f1432{/executors/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,726 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5a450036{/executors/threadDump,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,727 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2097c661{/executors/threadDump/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,738 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15ef8888{/static,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,739 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67a239ee{/,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,740 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@795aa966{/api,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,741 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5682c239{/jobs/job/kill,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,742 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c9db4f7{/stages/stage/kill,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,745 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.16.0.24:4040
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,966 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
- [2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,968 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,052 INFO client.RMProxy: Connecting to ResourceManager at rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/172.16.0.14:8032
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,279 INFO client.AHSProxy: Connecting to Application History server at rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/172.16.0.14:10200
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,348 INFO yarn.Client: Requesting a new application from cluster with 5 NodeManagers
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,892 INFO conf.Configuration: resource-types.xml not found
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,893 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,914 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (37888 MB per container)
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,915 INFO yarn.Client: Will allocate AM container, with 10700 MB memory including 972 MB overhead
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,915 INFO yarn.Client: Setting up container launch context for our AM
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,918 INFO yarn.Client: Setting up the launch environment for our AM container
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,929 INFO yarn.Client: Preparing resources for our AM container
- [2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,999 INFO yarn.Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/user/kotlyarovb/.sparkStaging/application_1692104774102_37394/pyspark.zip
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,194 INFO yarn.Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.9-src.zip -> hdfs://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/user/kotlyarovb/.sparkStaging/application_1692104774102_37394/py4j-0.10.9-src.zip
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,371 INFO yarn.Client: Uploading resource file:/tmp/spark-268a5017-4800-45bc-b8cc-04dcd63c2d45/__spark_conf__7761552592225624222.zip -> hdfs://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/user/kotlyarovb/.sparkStaging/application_1692104774102_37394/__spark_conf__.zip
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: Changing view acls to: kotlyarovb
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: Changing modify acls to: kotlyarovb
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: Changing view acls groups to:
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: Changing modify acls groups to:
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kotlyarovb); groups with view permissions: Set(); users with modify permissions: Set(kotlyarovb); groups with modify permissions: Set()
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,449 INFO yarn.Client: Submitting application application_1692104774102_37394 to ResourceManager
- [2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,487 INFO impl.YarnClientImpl: Submitted application application_1692104774102_37394
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:44,491 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:44,494 INFO yarn.Client:
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - client token: N/A
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - diagnostics: AM container is launched, waiting for AM container to Register with RM
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - ApplicationMaster host: N/A
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - ApplicationMaster RPC port: -1
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - queue: root.kotlyarovb
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - start time: 1710406723463
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - final status: UNDEFINED
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - tracking URL: http://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net:8088/proxy/application_1692104774102_37394/
- [2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - user: kotlyarovb
- [2024-03-14, 08:58:45 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:45,496 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
- [2024-03-14, 08:58:46 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:46,498 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
- [2024-03-14, 08:58:47 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:47,500 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
- [2024-03-14, 08:58:48 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:48,503 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
- [2024-03-14, 08:58:49 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:49,505 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,281 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net, PROXY_URI_BASES -> http://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net:8088/proxy/application_1692104774102_37394), /proxy/application_1692104774102_37394
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,507 INFO yarn.Client: Application report for application_1692104774102_37394 (state: RUNNING)
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,508 INFO yarn.Client:
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - client token: N/A
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - diagnostics: N/A
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - ApplicationMaster host: 172.16.0.34
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - ApplicationMaster RPC port: -1
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - queue: root.kotlyarovb
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - start time: 1710406723463
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - final status: UNDEFINED
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - tracking URL: http://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net:8088/proxy/application_1692104774102_37394/
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - user: kotlyarovb
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,509 INFO cluster.YarnClientSchedulerBackend: Application application_1692104774102_37394 has started running.
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,518 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33077.
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,518 INFO netty.NettyBlockTransferService: Server created on 172.16.0.24:33077
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,520 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,535 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.16.0.24, 33077, None)
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,539 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.16.0.24:33077 with 265.5 MiB RAM, BlockManagerId(driver, 172.16.0.24, 33077, None)
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,542 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.16.0.24, 33077, None)
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,543 INFO storage.BlockManager: external shuffle service port = 7337
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,544 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.16.0.24, 33077, None)
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,689 INFO ui.ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,691 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d523710{/metrics/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,733 INFO history.SingleEventLogFileWriter: Logging events to hdfs:/var/log/spark/apps/application_1692104774102_37394.inprogress
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,967 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,967 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
- [2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,978 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
- [2024-03-14, 08:58:52 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:52,193 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
- [2024-03-14, 08:58:56 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:56,382 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 2, script: , vendor: , memory -> name: memory, amount: 2048, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,495 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.16.0.25:51138) with ID 1
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,505 INFO dynalloc.ExecutorMonitor: New executor 1 has registered (new total is 1)
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,564 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,707 INFO storage.BlockManagerMasterEndpoint: Registering block manager rc1a-dataproc-d-1njdvt74bdkw9who.mdb.yandexcloud.net:41237 with 1007.8 MiB RAM, BlockManagerId(1, rc1a-dataproc-d-1njdvt74bdkw9who.mdb.yandexcloud.net, 41237, None)
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,782 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/airflow/spark-warehouse').
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,782 INFO internal.SharedState: Warehouse path is 'file:/opt/airflow/spark-warehouse'.
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,795 INFO ui.ServerInfo: Adding filter to /SQL: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,796 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6499fcc4{/SQL,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,797 INFO ui.ServerInfo: Adding filter to /SQL/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,798 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dd9a2c3{/SQL/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,799 INFO ui.ServerInfo: Adding filter to /SQL/execution: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,799 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1082fca1{/SQL/execution,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,800 INFO ui.ServerInfo: Adding filter to /SQL/execution/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,801 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b8eab3b{/SQL/execution/json,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,801 INFO ui.ServerInfo: Adding filter to /static/sql: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
- [2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,803 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4eee3791{/static/sql,null,AVAILABLE,@Spark}
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - Traceback (most recent call last):
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/lessons/partition.py", line 25, in <module>
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - main()
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/lessons/partition.py", line 17, in main
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - events = sql.read.json(f"{base_input_path}/date={date}")
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 300, in json
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 134, in deco
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "<string>", line 3, in raise_from
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - pyspark.sql.utils.AnalysisException: Path does not exist: hdfs://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/user/master/data/events/date=2020-05-01;
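This traceback is the real failure: line 17 of /lessons/partition.py reads {base_input_path}/date={date}, and that directory does not exist in HDFS. A hedged reconstruction of the script follows; only the read call on line 17 and the main() entry point are confirmed by the traceback, while the argument parsing and the app name (taken from the "Submitted application" line above) are assumptions.

# Hypothetical reconstruction of /lessons/partition.py.
import sys

from pyspark.sql import SparkSession


def main():
    date = sys.argv[1]               # "2020-05-01"
    base_input_path = sys.argv[2]    # "/user/master/data/events"
    base_output_path = sys.argv[3]   # "/user/kotlyarovb/data/events"

    sql = (
        SparkSession.builder
        .appName(f"EventsPartitioningJob-{date}")
        .getOrCreate()
    )

    # Line 17: raises AnalysisException when the date= partition is absent.
    events = sql.read.json(f"{base_input_path}/date={date}")
    # ...repartition and write to base_output_path (not visible in the traceback).


if __name__ == "__main__":
    main()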
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,545 INFO spark.SparkContext: Invoking stop() from shutdown hook
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,556 INFO server.AbstractConnector: Stopped Spark@7885432{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,558 INFO ui.SparkUI: Stopped Spark web UI at http://172.16.0.24:4040
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,562 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,597 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,598 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,606 INFO cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,633 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,642 INFO memory.MemoryStore: MemoryStore cleared
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,642 INFO storage.BlockManager: BlockManager stopped
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,651 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,655 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,663 INFO spark.SparkContext: Successfully stopped SparkContext
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,663 INFO util.ShutdownHookManager: Shutdown hook called
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,664 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-268a5017-4800-45bc-b8cc-04dcd63c2d45
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,667 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-21e5dfbf-8286-42ae-b2b2-f559faffa2e4
- [2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,669 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-268a5017-4800-45bc-b8cc-04dcd63c2d45/pyspark-bbb1e6d6-49d4-4f78-9644-fd62fa7045cd
- [2024-03-14, 08:58:59 UTC] {taskinstance.py:1824} ERROR - Task failed with exception
- Traceback (most recent call last):
- File "/usr/local/lib/python3.8/dist-packages/airflow/providers/apache/spark/operators/spark_submit.py", line 157, in execute
- self._hook.submit(self._application)
- File "/usr/local/lib/python3.8/dist-packages/airflow/providers/apache/spark/hooks/spark_submit.py", line 423, in submit
- raise AirflowException(
- airflow.exceptions.AirflowException: Cannot execute: spark-submit --master yarn --conf spark.driver.maxResultSize=20g --executor-cores 2 --executor-memory 2g --name arrow-spark /lessons/partition.py 2020-05-01 /user/master/data/events /user/kotlyarovb/data/events. Error code is: 1.
- [2024-03-14, 08:58:59 UTC] {taskinstance.py:1345} INFO - Marking task as FAILED. dag_id=sparkoperator_demo, task_id=spark_submit_task, execution_date=20240314T085837, start_date=20240314T085837, end_date=20240314T085859
- [2024-03-14, 08:58:59 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 52 for task spark_submit_task (Cannot execute: spark-submit --master yarn --conf spark.driver.maxResultSize=20g --executor-cores 2 --executor-memory 2g --name arrow-spark /lessons/partition.py 2020-05-01 /user/master/data/events /user/kotlyarovb/data/events. Error code is: 1.; 7696)
- [2024-03-14, 08:58:59 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code 1
- [2024-03-14, 08:58:59 UTC] {taskinstance.py:2653} INFO - 0 downstream tasks scheduled from follow-on schedule check
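To summarize the failure chain: the PySpark job died on the AnalysisException above (Path does not exist: hdfs://.../user/master/data/events/date=2020-05-01), spark-submit exited with code 1, and the SparkSubmitHook surfaced only the generic "Error code is: 1" AirflowException, so the root cause must be read from the subprocess log rather than the task-failure message. Check whether the partition actually exists (e.g. hdfs dfs -ls /user/master/data/events/date=2020-05-01) before re-running, or make the job fail fast with a clear message. One possible guard, going through the Hadoop FileSystem API via PySpark's private _jvm/_jsc handles; treat this as a sketch, not a stable public API.

from pyspark.sql import SparkSession


def hdfs_path_exists(spark: SparkSession, path: str) -> bool:
    """True if `path` exists on the cluster's default FileSystem."""
    jvm = spark._jvm
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
    return fs.exists(jvm.org.apache.hadoop.fs.Path(path))


# Usage, replacing the bare read on line 17 of the sketch above:
# input_path = f"{base_input_path}/date={date}"
# if not hdfs_path_exists(sql, input_path):
#     raise SystemExit(f"Input partition not found: {input_path}")
# events = sql.read.json(input_path)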