V2EX  ›  Python

Question about splitting fields in Python

    huyinjie · 2020-02-11 23:02:34 +08:00 · 3361 views

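    I want to split the following line on commas, but some of the quoted strings contain commas themselves, and those commas should not be treated as separators.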

    6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"

    I'd like it split like this:

    6898

    RAAF Williams, Laverton Base

    Laverton

    Australia

    Using Python's split method directly separates "RAAF Williams" from "Laverton Base". Is there a way to avoid that?
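For reference, a minimal check of the behaviour described above (a sketch; `line` is just the sample record pasted into a raw string so that `\N` keeps its backslash): splitting on every comma yields one piece too many, because the quoted name is cut in two.

```python
# The sample record from the question; the raw string keeps the backslash in \N
line = (
    r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",'
    r'-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
)

pieces = line.split(',')
print(len(pieces))       # 15, although the record has 14 fields
print(repr(pieces[1]))   # '"RAAF Williams'
print(repr(pieces[2]))   # ' Laverton Base"'
```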

    27 replies · last reply 2020-02-14 13:19:16 +08:00
    #1 qwjhb · 2020-02-11 23:15:20 +08:00
    exec('a=[6898,"RAAF Williams, Laverton Base","Laverton","Australia","YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"]')
    #2 retanoj · 2020-02-11 23:23:41 +08:00 via iPhone
    Great approach. This is exactly how serious command-execution / code-execution vulnerabilities get written.
    #3 retanoj · 2020-02-11 23:25:21 +08:00
    #4 wuwukai007 · 2020-02-11 23:27:34 +08:00 via Android
    Use a regular expression, assuming the fields that contain commas are quoted.
    #5 huyinjie (OP) · 2020-02-11 23:35:59 +08:00
    @wuwukai007 #4 So it looks like I'll have to split it with a pattern like ^(\d+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+)$
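A quick sketch checking a pattern of that shape against the sample record (`line` is the sample pasted as a raw string; the pattern is rebuilt programmatically as `(\d+)` plus twelve `,(.+)` groups) suggests it needs more care: it has 13 capture groups for a 14-field record, and because `(.+)` is greedy and can itself match commas, the first `(.+)` swallows the next field and the quote characters stay in the output.

```python
import re

# The sample record from the question (raw string, so \N keeps its backslash)
line = (
    r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",'
    r'-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
)

# Same shape as the pattern above: (\d+) followed by twelve greedy ,(.+) groups
pattern = re.compile(r'^(\d+)' + r',(.+)' * 12 + r'$')

groups = pattern.match(line).groups()
print(len(groups))  # 13 groups for 14 fields
print(groups[1])    # "RAAF Williams, Laverton Base","Laverton"  (two fields merged)
```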
    #7 retanoj · 2020-02-11 23:41:26 +08:00
    Sorry, that got pasted wrong.
    Try this:
    >>> import csv
    >>> list(csv.reader([your_string]))  # your_string: one line of the file
    #9 noreply69 · 2020-02-11 23:56:00 +08:00
    import csv
    s = '6898,"RAAF Williams, Laverton Base","Laverton","Australia",\\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
    splitted = list(csv.reader([s], delimiter=',', quotechar='"'))[0]
    print(splitted)
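Building on the snippet above, a sketch of what the parsed row looks like. `csv` has no concept of NULL, so `\N` comes back as a literal two-character string and every number stays a string. Mapping `\N` to `None` below assumes it is a NULL marker (the convention MySQL-style exports use), which the thread does not confirm.

```python
import csv

# The sample record from the question (raw string, so \N keeps its backslash)
line = (
    r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",'
    r'-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
)

# csv.reader accepts any iterable of lines; a one-element list parses a single line
fields = next(csv.reader([line]))

# Assumption: \N marks a missing value, so turn it into None after parsing
fields = [None if field == r'\N' else field for field in fields]

print(len(fields))  # 14
print(fields[1])    # RAAF Williams, Laverton Base
print(fields[4])    # None
```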
    #11 qwjhb · 2020-02-12 00:02:24 +08:00
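    @retanoj Does it work or not? It's a script processing text that's already in a fixed format, and you're worried someone will inject commands? Might as well get rid of exec altogether, then.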
    #12 Akkuman · 2020-02-12 00:13:53 +08:00 via Android
    import ast
    ast.literal_eval('[6898,"RAAF Williams, Laverton Base","Laverton","Australia","YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"]')
    #13 huyinjie (OP) · 2020-02-12 00:20:54 +08:00
    @noreply69 #9
    @retanoj #7

    Thanks, both snippets solve it.
    #14 huyinjie (OP) · 2020-02-12 00:24:56 +08:00
    @Akkuman #12
    @qwjhb #11
    I'm reading the text file line by line, so the part inside your brackets is effectively already split.
    #15 Akkuman · 2020-02-12 00:38:16 +08:00 via Android
    @huyinjie It isn't already split; I just added square brackets at the start and end.
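A sketch of that point: the brackets are added in code, so each line read from the file can be passed to `ast.literal_eval` unchanged. `io.StringIO` stands in for the real file, and the sample line omits `\N` (as the replies above do), since an unquoted `\N` is not valid Python syntax. Unlike `csv`, this also turns the numbers into `int`/`float`.

```python
import ast
import io

# Stand-in for the real file: one line without the \N field, as in the replies above
data = io.StringIO(
    '6898,"RAAF Williams, Laverton Base","Laverton","Australia","YLVT",'
    '-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"\n'
)

rows = []
for line in data:
    # The brackets are added here; the line from the file is used as-is
    rows.append(ast.literal_eval('[' + line.strip() + ']'))

print(rows[0][:4])  # [6898, 'RAAF Williams, Laverton Base', 'Laverton', 'Australia']
```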
    #16 retanoj · 2020-02-12 08:18:02 +08:00 via iPhone
    @qwjhb
    Yes, it works, it works.
    You know, the question "does it work?" already says a lot.
    #17 levelworm · 2020-02-12 08:43:21 +08:00
    Can that \N in the middle be dropped? If it isn't, doesn't this raise an error?
    #18 noqwerty · 2020-02-12 10:17:17 +08:00 via Android
    You can also read the whole CSV file in directly.
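A minimal sketch of reading the whole file with `csv.reader`: `io.StringIO` stands in for the file object so the example is self-contained. With a real file you would pass `open(path, newline='', encoding='utf-8')`, where `path` is hypothetical and `newline=''` is what the `csv` documentation recommends.

```python
import csv
import io

# io.StringIO stands in for open(path, newline='', encoding='utf-8'); path is hypothetical
sample_file = io.StringIO(
    '6898,"RAAF Williams, Laverton Base","Laverton","Australia",\\N,"YLVT",'
    '-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"\n'
)

with sample_file as f:
    rows = list(csv.reader(f))  # one list of strings per line

for row in rows:
    print(row[0], row[1])  # 6898 RAAF Williams, Laverton Base
```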
    #19 smallpython · 2020-02-12 10:25:05 +08:00
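    # s holds one line of the file (the sample from the question is used here)
    s = r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
    shuangyinhao_count = 0  # number of double quotes seen in the current field
    result = []
    temp = ''

    for i in s:
        if i == '"':
            shuangyinhao_count += 1
        elif i == ',':
            if shuangyinhao_count == 1:  # inside a quoted field: keep the comma as part of the field
                temp += i
            else:
                result.append(temp)
                temp = ''
        else:
            temp += i

        if shuangyinhao_count == 2:  # the quoted field has closed
            shuangyinhao_count = 0

    result.append(temp)

    print(result)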
    #20 araraloren · 2020-02-12 11:05:17 +08:00
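    Split with a regex:
    import re

    s = '6898,"RAAF Williams, Laverton Base","Laverton","Australia",\\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
    # Either a complete quoted field or a run of non-comma characters. No trailing
    # comma is required, so the last field is kept; empty fields (",,") are skipped.
    pattern = re.compile(r'"[^"]*"|[^,]+')
    print([field.strip('"') for field in pattern.findall(s)])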
    #21 chenstack · 2020-02-12 12:53:31 +08:00
    @qwjhb @retanoj To parse a string into Python objects safely, use ast.literal_eval: it only accepts literals, so function calls and (most) operators raise an exception instead of being evaluated.

    >>> import ast
    >>> ast.literal_eval('6898,"RAAF Williams, Laverton Base","Laverton","Australia","N","YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"')
    (6898, 'RAAF Williams, Laverton Base', 'Laverton', 'Australia', 'N', 'YLVT', -37.86360168457031, 144.74600219726562, 18, 10, 'O', 'Australia/Hobart', 'airport', 'OurAirports')
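To illustrate the safety point, a sketch (the helper `safe_parse` is hypothetical): a call expression that `exec` would run is rejected by `ast.literal_eval` with `ValueError`, while the unquoted `\N` from the real data fails even earlier, at parse time, with `SyntaxError`.

```python
import ast


def safe_parse(text):
    """Return the parsed literal, or the exception literal_eval raised (hypothetical helper)."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        return exc


# A call expression: exec() would run it, literal_eval refuses it without executing anything
print(repr(safe_parse('__import__("os").system("echo pwned")')))
# The unquoted \N from the data is not valid Python syntax
print(repr(safe_parse(r'6898,"Laverton",\N')))
# Plain literals are fine
print(safe_parse('6898,"Laverton"'))  # (6898, 'Laverton')
```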
    #22 huyinjie (OP) · 2020-02-12 15:04:56 +08:00
    @Akkuman #15 Thanks, that works.
    #23 huyinjie (OP) · 2020-02-12 15:48:21 +08:00
    @smallpython #19 Thanks. This is basically the getchar approach from C; it was the first method I thought of, it just felt a bit clumsy.
    #24 huyinjie (OP) · 2020-02-12 15:56:31 +08:00
    @chenstack #21
    \N has no double quotes around it, so ast.literal_eval can't parse it.
    #25 huyinjie (OP) · 2020-02-12 18:41:13 +08:00
    @levelworm #17 Yes, it raises an error if it isn't removed. You can work around it by using replace to turn it into "\N".
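One way to apply the replace idea, as a sketch: substitute the Python literal `None` for each unquoted `\N` field, then parse. A quoted `"\N"` is risky with `ast.literal_eval`, because inside a Python string literal `\N` starts a named-character escape (`\N{...}`), so the bare form raises `SyntaxError`. The regex assumes `\N` only ever appears as a whole, unquoted field.

```python
import ast
import re

# The sample record from the question (raw string, so \N keeps its backslash)
line = (
    r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",'
    r'-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
)

# Replace \N only when it is a whole field: preceded by a comma or the start of the
# line and followed by a comma or the end (assumes no such \N inside quoted text)
fixed = re.sub(r'(?<![^,])\\N(?![^,])', 'None', line)
record = ast.literal_eval('[' + fixed + ']')
print(record[4])  # None

# For comparison: quoting it yields the string literal "\N", a malformed \N escape
quoted = line.replace(r'\N', r'"\N"')
quoted_failed = False
try:
    ast.literal_eval('[' + quoted + ']')
except SyntaxError:
    quoted_failed = True
print(quoted_failed)  # True
```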
    #26 levelworm · 2020-02-13 02:33:26 +08:00
    @huyinjie Thanks. In that case, why not write a simple parser? Though I think some of the solutions above are simpler.
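Along the lines suggested here, a sketch of a small parser (the function name `split_record` is hypothetical). It is the quote-counting idea from #19 with an explicit `in_quotes` flag, and it also treats a doubled quote `""` inside a quoted field as a literal quote, the convention RFC 4180 describes. It does not handle line breaks inside quoted fields; `csv.reader` covers that and more.

```python
def split_record(line, sep=',', quote='"'):
    """Split one delimited line, keeping separators that appear inside quotes.

    Sketch only: assumes each field is either unquoted or fully quoted, with a
    literal quote written as "" inside a quoted field.
    """
    fields = []
    buf = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    buf.append(quote)  # "" inside quotes is an escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                buf.append(ch)  # separators inside quotes are kept as text
        elif ch == quote:
            in_quotes = True  # opening quote
        elif ch == sep:
            fields.append(''.join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append(''.join(buf))
    return fields


line = (
    r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",'
    r'-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
)
print(split_record(line)[1])             # RAAF Williams, Laverton Base
print(split_record('a,"say ""hi""",c'))  # ['a', 'say "hi"', 'c']
```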
    #27 larsenlouis · 2020-02-14 13:19:16 +08:00
    With pandas you can expand the fields conditionally.